Re: [PATCH v2 13/19] x86/shadow: Avoid raising faults behind the emulators back

From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Tim Deegan <tim@xen.org>
Cc: Jan Beulich <JBeulich@suse.com>, Xen-devel <xen-devel@lists.xen.org>
Subject: Re: [PATCH v2 13/19] x86/shadow: Avoid raising faults behind the emulators back
Date: Mon, 28 Nov 2016 16:04:19 +0000	[thread overview]
Message-ID: <63a04bf2-5a6b-06c8-040f-c6a3939308c1@citrix.com> (raw)
In-Reply-To: <20161128144930.GG73967@deinos.phlegethon.org>

On 28/11/16 14:49, Tim Deegan wrote:
> Hi,
>
> At 11:13 +0000 on 28 Nov (1480331610), Andrew Cooper wrote:
>> Use x86_emul_{hw_exception,pagefault}() rather than
>> {pv,hvm}_inject_page_fault() and hvm_inject_hw_exception() to cause raised
>> faults to be known to the emulator.  This requires altering the callers of
>> x86_emulate() to properly re-inject the event.
>>
>> While fixing this, fix the singlestep behaviour.  Previously, an otherwise
>> successful emulation would fail if singlestepping was active, as the emulator
>> couldn't raise #DB.  This is unreasonable from the point of view of the guest.
>>
>> We therefore tolerate #PF/#GP/SS and #DB being raised by the emulator, but
>> reject anything else as unexpected.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> +    if ( r == X86EMUL_EXCEPTION && emul_ctxt.ctxt.event_pending )
>> +    {
>> +        /*
>> +         * This emulation covers writes to shadow pagetables.  We tolerate #PF
>> +         * (from hitting adjacent pages), #GP/#SS (from segmentation errors),
>> +         * and #DB (from singlestepping).  Anything else is an emulation bug,
>> +         * or a guest playing with the instruction stream under Xen's feet.
>> +         */
>> +        if ( emul_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
>> +             (emul_ctxt.ctxt.event.vector < 32) &&
>> +             ((1u << emul_ctxt.ctxt.event.vector) &
>> +              ((1u << TRAP_debug) | (1u << TRAP_stack_error) |
>> +               (1u << TRAP_gp_fault) | (1u << TRAP_page_fault))) )
>> +        {
>> +            if ( is_hvm_vcpu(v) )
>> +                hvm_inject_event(&emul_ctxt.ctxt.event);
>> +            else
>> +                pv_inject_event(&emul_ctxt.ctxt.event);
>> +        }
>> +        else
>> +        {
>> +            if ( is_hvm_vcpu(v) )
>> +                hvm_inject_hw_exception(TRAP_gp_fault, 0);
>> +            else
>> +                pv_inject_hw_exception(TRAP_gp_fault, 0);
>> +        }
> I don't think it's OK to lob #GP into a HVM guest here -- we can hit
> this path even if the guest isn't behaving strangely.

What circumstances are you thinking of?

Unless the guest is playing with the instruction stream,  event_pending
will only be set by the set in the paths altered by this patch, which is
the four whitelisted vectors.

> I think it would be better to run this check before the X86EMUL_UNHANDLEABLE one
> and convert injections that we choose not to handle into
> X86EMUL_UNHANDLEABLE.
>
> Which I guess brings us back full circle to the behaviour we had
> before, and perhaps the right change to the earlier patch is to
> start down this road, with an explanatory comment.
>
> e.g. start with something like this, immediately after the first call
> to x86_emulate():
>
>     /*
>      * Events raised within the emulator itself used to return
>      * X86EMUL_UNHANDLEABLE because we didn't supply injection
>      * callbacks.  Now the emulator supplies those to us via
>      * ctxt.event_pending instead.  Preserve the old behaviour
>      * for now.
>      */
>      if (emul_ctxt.ctxt.event_pending)
>          r = X86EMUL_UNHANDLEABLE;
>
> And in this patch, replace it with the hunk you have above, but
> setting r = X86EMUL_UNHANDLEABLE instead of injecting #GP.
>
> Does that make sense?

It does make sense, but that goes against the comment of not unshadowing
on exception.

A lot of this revolves around how likely we are to hit the #GP[0] case. 
I assert that we shouldn't be able to hit it unless the guest is playing
games, or we have a bug in emulation.

If we don't go throwing a #GP back, we should at least leave something
obvious in the log.

>
> Also, I'm a little confused after all this as to whether the emulator
> can still return with X86EMUL_OKAY and the event_pending set.

I have now determined this not to be the case.  Apologies for my
misinformation in v1.

All exception and software interrupt cases end up returning X86_EXCEPTION.

    case 0xcd: /* int imm8 */
        swint_type = x86_swint_int;
    swint:
        rc = inject_swint(swint_type, (uint8_t)src.val,
                          _regs.eip - ctxt->regs->eip,
                          ctxt, ops) ? : X86EMUL_EXCEPTION;
        goto done;

This is why I introduced the slightly relaxed check:

if ( emul_ctxt.ctxt.event_pending )
    ASSERT(r == X86EMUL_EXCEPTION);

in the earlier patch.

> If it
> can do that then we need to inject the event come what may, because
> any other side-effects will have been committted already. 

Because of the shadow pagetables use of x86_swint_emulate_none, all
software interrupts have fault semantics, so %eip isn't moved forward;
this is left to hardware on the re-inject path.  There are no other
pieces of state change for any software interrupts.

With some re-evaluation of hindsight, I plan to make all trap injection
have fault semantics out of x86_emulate(), leaving the possible
adjustment of %eip to the lower level vendor functions, as x86_event has
an insn_len field.  In reality, its only the svm path which needs to
adjust %eip, and only in certain circumstances.

The odd-case-out is singlestep, which completes writeback then raises an
exception.

>
>> @@ -3475,6 +3503,37 @@ static int sh_page_fault(struct vcpu *v,
>>              {
>>                  perfc_incr(shadow_em_ex_fail);
>>                  TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_EMULATION_LAST_FAILED);
>> +
>> +                if ( r == X86EMUL_EXCEPTION && emul_ctxt.ctxt.event_pending )
>> +                {
>> +                    /*
>> +                     * This emulation covers writes to shadow pagetables.  We
>> +                     * tolerate #PF (from hitting adjacent pages), #GP/#SS
>> +                     * (from segmentation errors), and #DB (from
>> +                     * singlestepping).  Anything else is an emulation bug, or
>> +                     * a guest playing with the instruction stream under Xen's
>> +                     * feet.
>> +                     */
>> +                    if ( emul_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
>> +                         (emul_ctxt.ctxt.event.vector < 32) &&
>> +                         ((1u << emul_ctxt.ctxt.event.vector) &
>> +                          ((1u << TRAP_debug) | (1u << TRAP_stack_error) |
>> +                           (1u << TRAP_gp_fault) | (1u << TRAP_page_fault))) )
>> +                    {
>> +                        if ( is_hvm_vcpu(v) )
>> +                            hvm_inject_event(&emul_ctxt.ctxt.event);
>> +                        else
>> +                            pv_inject_event(&emul_ctxt.ctxt.event);
>> +                    }
>> +                    else
>> +                    {
>> +                        if ( is_hvm_vcpu(v) )
>> +                            hvm_inject_hw_exception(TRAP_gp_fault, 0);
>> +                        else
>> +                            pv_inject_hw_exception(TRAP_gp_fault, 0);
>> +                    }
>> +                }
>> +
> This looks like code duplication, but rather than trying to merge the
> two cases, I think we can drop this one entirely.  This emulation is
> optimistically trying to find the second half of a PAE PTE write -
> it's OK just to stop emulating if we hit anything this exciting.
> So we can lose the whole hunk.

At the very least we should retain the singlestep #DB injection, as it
still has trap semantics.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel