All of lore.kernel.org
 help / color / mirror / Atom feed
* Interrupt issues with hvm_emulate_one_vm_event()
@ 2017-05-25  9:40 Razvan Cojocaru
  2017-05-26 14:29 ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: Razvan Cojocaru @ 2017-05-25  9:40 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Tamas K Lengyel, Jan Beulich

Hello,

I've noticed that, with pages marked NX and vm_event emulation, we can
end up emulating an ud2, for which hvm_emulate_one() returns
X86EMUL_EXCEPTION in hvm_emulate_one_vm_event().

This, in turn, causes a hvm_inject_event() call in the context of
hvm_do_resume(), which can, if there's already a pending event there,
cause a 101 BSOD (timer-related, if I understand correctly) or loss of
input (mouse frozen, keyboard unresponsive).

After much trial and error, I've been able to confirm this by leaving a
guest on for almost a full day with this change:

     case X86EMUL_EXCEPTION:
-        hvm_inject_event(&ctx.ctxt.event);
+        if ( !hvm_event_pending(current) )
+            hvm_inject_event(&ctx.ctxt.event);

and checking that there's been no BSOD or loss of input.

However, just losing the event here, while fine to prove that this is
indeed the problem, is not OK. But I'm not sure what an elegant / robust
way of fixing this is.

Suggestions are (as always) greatly appreciated.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Interrupt issues with hvm_emulate_one_vm_event()
  2017-05-25  9:40 Interrupt issues with hvm_emulate_one_vm_event() Razvan Cojocaru
@ 2017-05-26 14:29 ` Jan Beulich
  2017-05-26 14:37   ` Razvan Cojocaru
  2017-05-26 15:11   ` Andrew Cooper
  0 siblings, 2 replies; 11+ messages in thread
From: Jan Beulich @ 2017-05-26 14:29 UTC (permalink / raw)
  To: Razvan Cojocaru; +Cc: Andrew Cooper, Tamas K Lengyel, Xen-devel

>>> On 25.05.17 at 11:40, <rcojocaru@bitdefender.com> wrote:
> I've noticed that, with pages marked NX and vm_event emulation, we can
> end up emulating an ud2, for which hvm_emulate_one() returns
> X86EMUL_EXCEPTION in hvm_emulate_one_vm_event().

Could you explain what would lead to emulation of UD2?

> This, in turn, causes a hvm_inject_event() call in the context of
> hvm_do_resume(), which can, if there's already a pending event there,
> cause a 101 BSOD (timer-related, if I understand correctly) or loss of
> input (mouse frozen, keyboard unresponsive).
> 
> After much trial and error, I've been able to confirm this by leaving a
> guest on for almost a full day with this change:
> 
>      case X86EMUL_EXCEPTION:
> -        hvm_inject_event(&ctx.ctxt.event);
> +        if ( !hvm_event_pending(current) )
> +            hvm_inject_event(&ctx.ctxt.event);
> 
> and checking that there's been no BSOD or loss of input.
> 
> However, just losing the event here, while fine to prove that this is
> indeed the problem, is not OK. But I'm not sure what an elegant / robust
> way of fixing this is.

Much depends on what the other event is: If it's an interrupt, I'd
assume there to be an ordering problem (interrupts shouldn't be
injected when there is a pending exception, their delivery instead
should be attempted on the first instruction of the exception
handler [if interrupts remain on] or whenever interrupts get
re-enabled).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Interrupt issues with hvm_emulate_one_vm_event()
  2017-05-26 14:29 ` Jan Beulich
@ 2017-05-26 14:37   ` Razvan Cojocaru
  2017-05-26 15:38     ` Jan Beulich
  2017-05-26 15:11   ` Andrew Cooper
  1 sibling, 1 reply; 11+ messages in thread
From: Razvan Cojocaru @ 2017-05-26 14:37 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Tamas K Lengyel, Xen-devel

On 05/26/17 17:29, Jan Beulich wrote:
>>>> On 25.05.17 at 11:40, <rcojocaru@bitdefender.com> wrote:
>> I've noticed that, with pages marked NX and vm_event emulation, we can
>> end up emulating an ud2, for which hvm_emulate_one() returns
>> X86EMUL_EXCEPTION in hvm_emulate_one_vm_event().
> 
> Could you explain what would lead to emulation of UD2?

If you mean in which cases does our engine mark pages NX, I'll have to
ask and get back to you. If you mean why generally would an UD2 end up
being the instruction where RIP causes an execute violation fault, I'll
have to check.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Interrupt issues with hvm_emulate_one_vm_event()
  2017-05-26 14:29 ` Jan Beulich
  2017-05-26 14:37   ` Razvan Cojocaru
@ 2017-05-26 15:11   ` Andrew Cooper
  2017-05-29  9:20     ` Razvan Cojocaru
  1 sibling, 1 reply; 11+ messages in thread
From: Andrew Cooper @ 2017-05-26 15:11 UTC (permalink / raw)
  To: Jan Beulich, Razvan Cojocaru; +Cc: Xen-devel, Tamas K Lengyel

On 26/05/17 15:29, Jan Beulich wrote:
>>>> On 25.05.17 at 11:40, <rcojocaru@bitdefender.com> wrote:
>> I've noticed that, with pages marked NX and vm_event emulation, we can
>> end up emulating an ud2, for which hvm_emulate_one() returns
>> X86EMUL_EXCEPTION in hvm_emulate_one_vm_event().
> Could you explain what would lead to emulation of UD2?
>
>> This, in turn, causes a hvm_inject_event() call in the context of
>> hvm_do_resume(), which can, if there's already a pending event there,
>> cause a 101 BSOD (timer-related, if I understand correctly) or loss of
>> input (mouse frozen, keyboard unresponsive).
>>
>> After much trial and error, I've been able to confirm this by leaving a
>> guest on for almost a full day with this change:
>>
>>      case X86EMUL_EXCEPTION:
>> -        hvm_inject_event(&ctx.ctxt.event);
>> +        if ( !hvm_event_pending(current) )
>> +            hvm_inject_event(&ctx.ctxt.event);
>>
>> and checking that there's been no BSOD or loss of input.
>>
>> However, just losing the event here, while fine to prove that this is
>> indeed the problem, is not OK. But I'm not sure what an elegant / robust
>> way of fixing this is.
> Much depends on what the other event is: If it's an interrupt, I'd
> assume there to be an ordering problem (interrupts shouldn't be
> injected when there is a pending exception, their delivery instead
> should be attempted on the first instruction of the exception
> handler [if interrupts remain on] or whenever interrupts get
> re-enabled).

I suspect it is an ordering issue, and something has processed and
interrupt before the emulation occurs as part of the vm_event reply happens.

The interrupt ordering spec indicates that external interrupts take
precedent over faults raised from executing an instruction, on the basis
that once the interrupt handler returns, the instruction will generate
the same fault again.  However, its not obvious how this is intended to
interact with interrupt windows and vmexits.  I expect we can get away
with ensuring that external interrupts are the final thing considered
for injection on the return-to-guest path.

It might be an idea to leave an assert in vmx_inject_event() that an
event is not already pending, but in the short term, this probably also
wants debugging by trying to identify what sequence of actions is
leading us to inject two events in this case (if indeed this is what is
happening).

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Interrupt issues with hvm_emulate_one_vm_event()
  2017-05-26 14:37   ` Razvan Cojocaru
@ 2017-05-26 15:38     ` Jan Beulich
  2017-05-29 13:24       ` Razvan Cojocaru
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2017-05-26 15:38 UTC (permalink / raw)
  To: Razvan Cojocaru; +Cc: Andrew Cooper, Tamas K Lengyel, Xen-devel

>>> On 26.05.17 at 16:37, <rcojocaru@bitdefender.com> wrote:
> On 05/26/17 17:29, Jan Beulich wrote:
>>>>> On 25.05.17 at 11:40, <rcojocaru@bitdefender.com> wrote:
>>> I've noticed that, with pages marked NX and vm_event emulation, we can
>>> end up emulating an ud2, for which hvm_emulate_one() returns
>>> X86EMUL_EXCEPTION in hvm_emulate_one_vm_event().
>> 
>> Could you explain what would lead to emulation of UD2?
> 
> If you mean in which cases does our engine mark pages NX, I'll have to
> ask and get back to you. If you mean why generally would an UD2 end up
> being the instruction where RIP causes an execute violation fault, I'll
> have to check.

The question was more for the latter, as I don't understand what
good could come from executing UD2 intentionally, unless the
entity doing so knows there is an emulator around to do something
sensible with it.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Interrupt issues with hvm_emulate_one_vm_event()
  2017-05-26 15:11   ` Andrew Cooper
@ 2017-05-29  9:20     ` Razvan Cojocaru
  2017-05-29 11:05       ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: Razvan Cojocaru @ 2017-05-29  9:20 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich; +Cc: Xen-devel, Tamas K Lengyel

On 05/26/17 18:11, Andrew Cooper wrote:
> On 26/05/17 15:29, Jan Beulich wrote:
>>>>> On 25.05.17 at 11:40, <rcojocaru@bitdefender.com> wrote:
>>> I've noticed that, with pages marked NX and vm_event emulation, we can
>>> end up emulating an ud2, for which hvm_emulate_one() returns
>>> X86EMUL_EXCEPTION in hvm_emulate_one_vm_event().
>> Could you explain what would lead to emulation of UD2?
>>
>>> This, in turn, causes a hvm_inject_event() call in the context of
>>> hvm_do_resume(), which can, if there's already a pending event there,
>>> cause a 101 BSOD (timer-related, if I understand correctly) or loss of
>>> input (mouse frozen, keyboard unresponsive).
>>>
>>> After much trial and error, I've been able to confirm this by leaving a
>>> guest on for almost a full day with this change:
>>>
>>>      case X86EMUL_EXCEPTION:
>>> -        hvm_inject_event(&ctx.ctxt.event);
>>> +        if ( !hvm_event_pending(current) )
>>> +            hvm_inject_event(&ctx.ctxt.event);
>>>
>>> and checking that there's been no BSOD or loss of input.
>>>
>>> However, just losing the event here, while fine to prove that this is
>>> indeed the problem, is not OK. But I'm not sure what an elegant / robust
>>> way of fixing this is.
>> Much depends on what the other event is: If it's an interrupt, I'd
>> assume there to be an ordering problem (interrupts shouldn't be
>> injected when there is a pending exception, their delivery instead
>> should be attempted on the first instruction of the exception
>> handler [if interrupts remain on] or whenever interrupts get
>> re-enabled).
> 
> I suspect it is an ordering issue, and something has processed and
> interrupt before the emulation occurs as part of the vm_event reply happens.
> 
> The interrupt ordering spec indicates that external interrupts take
> precedent over faults raised from executing an instruction, on the basis
> that once the interrupt handler returns, the instruction will generate
> the same fault again.  However, its not obvious how this is intended to
> interact with interrupt windows and vmexits.  I expect we can get away
> with ensuring that external interrupts are the final thing considered
> for injection on the return-to-guest path.
> 
> It might be an idea to leave an assert in vmx_inject_event() that an
> event is not already pending, but in the short term, this probably also
> wants debugging by trying to identify what sequence of actions is
> leading us to inject two events in this case (if indeed this is what is
> happening).

With some patience, I've been able to catch the problem: "(XEN)
vmx_inject_event(3, 14) but 0, 225 pending".

 63 /*
 64  * x86 event types. This enumeration is valid for:
 65  *  Intel VMX: {VM_ENTRY,VM_EXIT,IDT_VECTORING}_INTR_INFO[10:8]
 66  *  AMD SVM: eventinj[10:8] and exitintinfo[10:8] (types 0-4 only)
 67  */
 68 enum x86_event_type {
 69     X86_EVENTTYPE_EXT_INTR,         /* External interrupt */
 70     X86_EVENTTYPE_NMI = 2,          /* NMI */
 71     X86_EVENTTYPE_HW_EXCEPTION,     /* Hardware exception */
 72     X86_EVENTTYPE_SW_INTERRUPT,     /* Software interrupt (CD nn) */
 73     X86_EVENTTYPE_PRI_SW_EXCEPTION, /* ICEBP (F1) */
 74     X86_EVENTTYPE_SW_EXCEPTION,     /* INT3 (CC), INTO (CE) */
 75 };

so an X86_EVENTTYPE_EXT_INTR is pending when we're trying to inject an
X86_EVENTTYPE_HW_EXCEPTION, as a result of the code quoted above.

This is the patch that has produced the above output:

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index c8ef18a..3b88eca 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1807,6 +1807,12 @@ void vmx_inject_nmi(void)
                            X86_EVENT_NO_EC);
 }

+static bool hvm_get_pending_event(struct vcpu *v, struct x86_event *info)
+{
+    info->cr2 = v->arch.hvm_vcpu.guest_cr[2];
+    return hvm_funcs.get_pending_event(v, info);
+}
+
 /*
  * Generate a virtual event in the guest.
  * NOTES:
@@ -1821,6 +1827,15 @@ static void vmx_inject_event(const struct
x86_event *event)
     struct vcpu *curr = current;
     struct x86_event _event = *event;

+    if ( hvm_event_pending(current) )
+    {
+       struct x86_event ev;
+       hvm_get_pending_event(current, &ev);
+
+       printk("vmx_inject_event(%d, %d) but %d, %d pending\n",
+              _event.type, _event.vector, ev.type, ev.vector);
+    }
+
     switch ( _event.vector | -(_event.type == X86_EVENTTYPE_SW_INTERRUPT) )
     {
     case TRAP_debug:


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Interrupt issues with hvm_emulate_one_vm_event()
  2017-05-29  9:20     ` Razvan Cojocaru
@ 2017-05-29 11:05       ` Jan Beulich
  2017-05-29 11:46         ` Razvan Cojocaru
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2017-05-29 11:05 UTC (permalink / raw)
  To: Razvan Cojocaru; +Cc: Andrew Cooper, Tamas K Lengyel, Xen-devel

>>> On 29.05.17 at 11:20, <rcojocaru@bitdefender.com> wrote:
> On 05/26/17 18:11, Andrew Cooper wrote:
>> On 26/05/17 15:29, Jan Beulich wrote:
>>>>>> On 25.05.17 at 11:40, <rcojocaru@bitdefender.com> wrote:
>>>> I've noticed that, with pages marked NX and vm_event emulation, we can
>>>> end up emulating an ud2, for which hvm_emulate_one() returns
>>>> X86EMUL_EXCEPTION in hvm_emulate_one_vm_event().
>>> Could you explain what would lead to emulation of UD2?
>>>
>>>> This, in turn, causes a hvm_inject_event() call in the context of
>>>> hvm_do_resume(), which can, if there's already a pending event there,
>>>> cause a 101 BSOD (timer-related, if I understand correctly) or loss of
>>>> input (mouse frozen, keyboard unresponsive).
>>>>
>>>> After much trial and error, I've been able to confirm this by leaving a
>>>> guest on for almost a full day with this change:
>>>>
>>>>      case X86EMUL_EXCEPTION:
>>>> -        hvm_inject_event(&ctx.ctxt.event);
>>>> +        if ( !hvm_event_pending(current) )
>>>> +            hvm_inject_event(&ctx.ctxt.event);
>>>>
>>>> and checking that there's been no BSOD or loss of input.
>>>>
>>>> However, just losing the event here, while fine to prove that this is
>>>> indeed the problem, is not OK. But I'm not sure what an elegant / robust
>>>> way of fixing this is.
>>> Much depends on what the other event is: If it's an interrupt, I'd
>>> assume there to be an ordering problem (interrupts shouldn't be
>>> injected when there is a pending exception, their delivery instead
>>> should be attempted on the first instruction of the exception
>>> handler [if interrupts remain on] or whenever interrupts get
>>> re-enabled).
>> 
>> I suspect it is an ordering issue, and something has processed and
>> interrupt before the emulation occurs as part of the vm_event reply happens.
>> 
>> The interrupt ordering spec indicates that external interrupts take
>> precedent over faults raised from executing an instruction, on the basis
>> that once the interrupt handler returns, the instruction will generate
>> the same fault again.  However, its not obvious how this is intended to
>> interact with interrupt windows and vmexits.  I expect we can get away
>> with ensuring that external interrupts are the final thing considered
>> for injection on the return-to-guest path.
>> 
>> It might be an idea to leave an assert in vmx_inject_event() that an
>> event is not already pending, but in the short term, this probably also
>> wants debugging by trying to identify what sequence of actions is
>> leading us to inject two events in this case (if indeed this is what is
>> happening).
> 
> With some patience, I've been able to catch the problem: "(XEN)
> vmx_inject_event(3, 14) but 0, 225 pending".
> 
>  63 /*
>  64  * x86 event types. This enumeration is valid for:
>  65  *  Intel VMX: {VM_ENTRY,VM_EXIT,IDT_VECTORING}_INTR_INFO[10:8]
>  66  *  AMD SVM: eventinj[10:8] and exitintinfo[10:8] (types 0-4 only)
>  67  */
>  68 enum x86_event_type {
>  69     X86_EVENTTYPE_EXT_INTR,         /* External interrupt */
>  70     X86_EVENTTYPE_NMI = 2,          /* NMI */
>  71     X86_EVENTTYPE_HW_EXCEPTION,     /* Hardware exception */
>  72     X86_EVENTTYPE_SW_INTERRUPT,     /* Software interrupt (CD nn) */
>  73     X86_EVENTTYPE_PRI_SW_EXCEPTION, /* ICEBP (F1) */
>  74     X86_EVENTTYPE_SW_EXCEPTION,     /* INT3 (CC), INTO (CE) */
>  75 };
> 
> so an X86_EVENTTYPE_EXT_INTR is pending when we're trying to inject an
> X86_EVENTTYPE_HW_EXCEPTION, as a result of the code quoted above.

So this confirms our suspicion, but doesn't move us closer to a
solution. The question after all is why an external interrupt is
being delivered prior to or while emulating. As Andrew did
explain, proper behavior would be to check for external
interrupts and don't enter emulation if one is pending, or don't
check for external interrupts until the _next_ instruction
boundary. Correct architectural behavior will result either way;
the second variant merely must not continuously defer interrupts
(i.e. there need to be instruction boundaries at which hardware
of software do check for them).

I'm not that familiar with the sequence of steps when dealing
with emulation requests from an introspection agent, so I would
hope you could go through those code paths to see where
external interrupts are being checked for. Or wait - isn't your
problem that you invoke emulation out of hvm_do_resume() (via
hvm_vm_event_do_resume() -> hvm_emulate_one_vm_event()),
which happens after all other "normal" processing of a VM exit?
Perhaps emulation should be skipped there if an event is already
pending injection, as emulation not having started means we still
are on an instruction boundary?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Interrupt issues with hvm_emulate_one_vm_event()
  2017-05-29 11:05       ` Jan Beulich
@ 2017-05-29 11:46         ` Razvan Cojocaru
  2017-05-29 12:11           ` Andrew Cooper
  0 siblings, 1 reply; 11+ messages in thread
From: Razvan Cojocaru @ 2017-05-29 11:46 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Tamas K Lengyel, Xen-devel

On 05/29/17 14:05, Jan Beulich wrote:
>>>> On 29.05.17 at 11:20, <rcojocaru@bitdefender.com> wrote:
>> On 05/26/17 18:11, Andrew Cooper wrote:
>>> On 26/05/17 15:29, Jan Beulich wrote:
>>>>>>> On 25.05.17 at 11:40, <rcojocaru@bitdefender.com> wrote:
>>>>> I've noticed that, with pages marked NX and vm_event emulation, we can
>>>>> end up emulating an ud2, for which hvm_emulate_one() returns
>>>>> X86EMUL_EXCEPTION in hvm_emulate_one_vm_event().
>>>> Could you explain what would lead to emulation of UD2?
>>>>
>>>>> This, in turn, causes a hvm_inject_event() call in the context of
>>>>> hvm_do_resume(), which can, if there's already a pending event there,
>>>>> cause a 101 BSOD (timer-related, if I understand correctly) or loss of
>>>>> input (mouse frozen, keyboard unresponsive).
>>>>>
>>>>> After much trial and error, I've been able to confirm this by leaving a
>>>>> guest on for almost a full day with this change:
>>>>>
>>>>>      case X86EMUL_EXCEPTION:
>>>>> -        hvm_inject_event(&ctx.ctxt.event);
>>>>> +        if ( !hvm_event_pending(current) )
>>>>> +            hvm_inject_event(&ctx.ctxt.event);
>>>>>
>>>>> and checking that there's been no BSOD or loss of input.
>>>>>
>>>>> However, just losing the event here, while fine to prove that this is
>>>>> indeed the problem, is not OK. But I'm not sure what an elegant / robust
>>>>> way of fixing this is.
>>>> Much depends on what the other event is: If it's an interrupt, I'd
>>>> assume there to be an ordering problem (interrupts shouldn't be
>>>> injected when there is a pending exception, their delivery instead
>>>> should be attempted on the first instruction of the exception
>>>> handler [if interrupts remain on] or whenever interrupts get
>>>> re-enabled).
>>>
>>> I suspect it is an ordering issue, and something has processed and
>>> interrupt before the emulation occurs as part of the vm_event reply happens.
>>>
>>> The interrupt ordering spec indicates that external interrupts take
>>> precedent over faults raised from executing an instruction, on the basis
>>> that once the interrupt handler returns, the instruction will generate
>>> the same fault again.  However, its not obvious how this is intended to
>>> interact with interrupt windows and vmexits.  I expect we can get away
>>> with ensuring that external interrupts are the final thing considered
>>> for injection on the return-to-guest path.
>>>
>>> It might be an idea to leave an assert in vmx_inject_event() that an
>>> event is not already pending, but in the short term, this probably also
>>> wants debugging by trying to identify what sequence of actions is
>>> leading us to inject two events in this case (if indeed this is what is
>>> happening).
>>
>> With some patience, I've been able to catch the problem: "(XEN)
>> vmx_inject_event(3, 14) but 0, 225 pending".
>>
>>  63 /*
>>  64  * x86 event types. This enumeration is valid for:
>>  65  *  Intel VMX: {VM_ENTRY,VM_EXIT,IDT_VECTORING}_INTR_INFO[10:8]
>>  66  *  AMD SVM: eventinj[10:8] and exitintinfo[10:8] (types 0-4 only)
>>  67  */
>>  68 enum x86_event_type {
>>  69     X86_EVENTTYPE_EXT_INTR,         /* External interrupt */
>>  70     X86_EVENTTYPE_NMI = 2,          /* NMI */
>>  71     X86_EVENTTYPE_HW_EXCEPTION,     /* Hardware exception */
>>  72     X86_EVENTTYPE_SW_INTERRUPT,     /* Software interrupt (CD nn) */
>>  73     X86_EVENTTYPE_PRI_SW_EXCEPTION, /* ICEBP (F1) */
>>  74     X86_EVENTTYPE_SW_EXCEPTION,     /* INT3 (CC), INTO (CE) */
>>  75 };
>>
>> so an X86_EVENTTYPE_EXT_INTR is pending when we're trying to inject an
>> X86_EVENTTYPE_HW_EXCEPTION, as a result of the code quoted above.
> 
> So this confirms our suspicion, but doesn't move us closer to a
> solution. The question after all is why an external interrupt is
> being delivered prior to or while emulating. As Andrew did
> explain, proper behavior would be to check for external
> interrupts and don't enter emulation if one is pending, or don't
> check for external interrupts until the _next_ instruction
> boundary. Correct architectural behavior will result either way;
> the second variant merely must not continuously defer interrupts
> (i.e. there need to be instruction boundaries at which hardware
> of software do check for them).
> 
> I'm not that familiar with the sequence of steps when dealing
> with emulation requests from an introspection agent, so I would
> hope you could go through those code paths to see where
> external interrupts are being checked for. Or wait - isn't your
> problem that you invoke emulation out of hvm_do_resume() (via
> hvm_vm_event_do_resume() -> hvm_emulate_one_vm_event()),
> which happens after all other "normal" processing of a VM exit?
> Perhaps emulation should be skipped there if an event is already
> pending injection, as emulation not having started means we still
> are on an instruction boundary?

Indeed, we are emulating after all the vmexit processing has already
happened, in the hvm_do_resume() path you've mentioned.

Not emulating if an event is pending injection is certainly trivial to
test - however, since then we would have to lose the emulation attempt,
presumably the EPT fault event would show up again (a "duplicate" event)
in this case (since it is no longer safe to assume that we land on the
same RIP after the interrupt has been handled, and thus the next
hvm_do_resume() may not happen where we wanted to emulate).

I'll test this scenario out.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Interrupt issues with hvm_emulate_one_vm_event()
  2017-05-29 11:46         ` Razvan Cojocaru
@ 2017-05-29 12:11           ` Andrew Cooper
  2017-05-29 12:22             ` Razvan Cojocaru
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Cooper @ 2017-05-29 12:11 UTC (permalink / raw)
  To: Razvan Cojocaru, Jan Beulich; +Cc: Xen-devel, Tamas K Lengyel

On 29/05/2017 12:46, Razvan Cojocaru wrote:
> On 05/29/17 14:05, Jan Beulich wrote:
>>>>> On 29.05.17 at 11:20, <rcojocaru@bitdefender.com> wrote:
>>> On 05/26/17 18:11, Andrew Cooper wrote:
>>>> On 26/05/17 15:29, Jan Beulich wrote:
>>>>>>>> On 25.05.17 at 11:40, <rcojocaru@bitdefender.com> wrote:
>>>>>> I've noticed that, with pages marked NX and vm_event emulation, we can
>>>>>> end up emulating an ud2, for which hvm_emulate_one() returns
>>>>>> X86EMUL_EXCEPTION in hvm_emulate_one_vm_event().
>>>>> Could you explain what would lead to emulation of UD2?
>>>>>
>>>>>> This, in turn, causes a hvm_inject_event() call in the context of
>>>>>> hvm_do_resume(), which can, if there's already a pending event there,
>>>>>> cause a 101 BSOD (timer-related, if I understand correctly) or loss of
>>>>>> input (mouse frozen, keyboard unresponsive).
>>>>>>
>>>>>> After much trial and error, I've been able to confirm this by leaving a
>>>>>> guest on for almost a full day with this change:
>>>>>>
>>>>>>      case X86EMUL_EXCEPTION:
>>>>>> -        hvm_inject_event(&ctx.ctxt.event);
>>>>>> +        if ( !hvm_event_pending(current) )
>>>>>> +            hvm_inject_event(&ctx.ctxt.event);
>>>>>>
>>>>>> and checking that there's been no BSOD or loss of input.
>>>>>>
>>>>>> However, just losing the event here, while fine to prove that this is
>>>>>> indeed the problem, is not OK. But I'm not sure what an elegant / robust
>>>>>> way of fixing this is.
>>>>> Much depends on what the other event is: If it's an interrupt, I'd
>>>>> assume there to be an ordering problem (interrupts shouldn't be
>>>>> injected when there is a pending exception, their delivery instead
>>>>> should be attempted on the first instruction of the exception
>>>>> handler [if interrupts remain on] or whenever interrupts get
>>>>> re-enabled).
>>>> I suspect it is an ordering issue, and something has processed and
>>>> interrupt before the emulation occurs as part of the vm_event reply happens.
>>>>
>>>> The interrupt ordering spec indicates that external interrupts take
>>>> precedent over faults raised from executing an instruction, on the basis
>>>> that once the interrupt handler returns, the instruction will generate
>>>> the same fault again.  However, its not obvious how this is intended to
>>>> interact with interrupt windows and vmexits.  I expect we can get away
>>>> with ensuring that external interrupts are the final thing considered
>>>> for injection on the return-to-guest path.
>>>>
>>>> It might be an idea to leave an assert in vmx_inject_event() that an
>>>> event is not already pending, but in the short term, this probably also
>>>> wants debugging by trying to identify what sequence of actions is
>>>> leading us to inject two events in this case (if indeed this is what is
>>>> happening).
>>> With some patience, I've been able to catch the problem: "(XEN)
>>> vmx_inject_event(3, 14) but 0, 225 pending".
>>>
>>>  63 /*
>>>  64  * x86 event types. This enumeration is valid for:
>>>  65  *  Intel VMX: {VM_ENTRY,VM_EXIT,IDT_VECTORING}_INTR_INFO[10:8]
>>>  66  *  AMD SVM: eventinj[10:8] and exitintinfo[10:8] (types 0-4 only)
>>>  67  */
>>>  68 enum x86_event_type {
>>>  69     X86_EVENTTYPE_EXT_INTR,         /* External interrupt */
>>>  70     X86_EVENTTYPE_NMI = 2,          /* NMI */
>>>  71     X86_EVENTTYPE_HW_EXCEPTION,     /* Hardware exception */
>>>  72     X86_EVENTTYPE_SW_INTERRUPT,     /* Software interrupt (CD nn) */
>>>  73     X86_EVENTTYPE_PRI_SW_EXCEPTION, /* ICEBP (F1) */
>>>  74     X86_EVENTTYPE_SW_EXCEPTION,     /* INT3 (CC), INTO (CE) */
>>>  75 };
>>>
>>> so an X86_EVENTTYPE_EXT_INTR is pending when we're trying to inject an
>>> X86_EVENTTYPE_HW_EXCEPTION, as a result of the code quoted above.
>> So this confirms our suspicion, but doesn't move us closer to a
>> solution. The question after all is why an external interrupt is
>> being delivered prior to or while emulating. As Andrew did
>> explain, proper behavior would be to check for external
>> interrupts and don't enter emulation if one is pending, or don't
>> check for external interrupts until the _next_ instruction
>> boundary. Correct architectural behavior will result either way;
>> the second variant merely must not continuously defer interrupts
>> (i.e. there need to be instruction boundaries at which hardware
>> of software do check for them).
>>
>> I'm not that familiar with the sequence of steps when dealing
>> with emulation requests from an introspection agent, so I would
>> hope you could go through those code paths to see where
>> external interrupts are being checked for. Or wait - isn't your
>> problem that you invoke emulation out of hvm_do_resume() (via
>> hvm_vm_event_do_resume() -> hvm_emulate_one_vm_event()),
>> which happens after all other "normal" processing of a VM exit?
>> Perhaps emulation should be skipped there if an event is already
>> pending injection, as emulation not having started means we still
>> are on an instruction boundary?
> Indeed, we are emulating after all the vmexit processing has already
> happened, in the hvm_do_resume() path you've mentioned.
>
> Not emulating if an event is pending injection is certainly trivial to
> test - however, since then we would have to lose the emulation attempt,
> presumably the EPT fault event would show up again (a "duplicate" event)
> in this case (since it is no longer safe to assume that we land on the
> same RIP after the interrupt has been handled, and thus the next
> hvm_do_resume() may not happen where we wanted to emulate).
>
> I'll test this scenario out.

The problem is in .Lvmx_do_vmentry.  We call vmx_intr_assist() before
vmx_process_softirqs()

Therefore, when the EPT violation occurs and we break for introspection,
we still process pending interrupts before finally getting desheduled to
wait for the vm_event reply.  On receiving the vm_event reply, we
emulate the instruction, and find that an interrupt has already been
queued from before, but not yet delivered.

Untangling this going to be complicated.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Interrupt issues with hvm_emulate_one_vm_event()
  2017-05-29 12:11           ` Andrew Cooper
@ 2017-05-29 12:22             ` Razvan Cojocaru
  0 siblings, 0 replies; 11+ messages in thread
From: Razvan Cojocaru @ 2017-05-29 12:22 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich; +Cc: Xen-devel, Tamas K Lengyel

On 05/29/17 15:11, Andrew Cooper wrote:
> On 29/05/2017 12:46, Razvan Cojocaru wrote:
>> On 05/29/17 14:05, Jan Beulich wrote:
>>>>>> On 29.05.17 at 11:20, <rcojocaru@bitdefender.com> wrote:
>>>> On 05/26/17 18:11, Andrew Cooper wrote:
>>>>> On 26/05/17 15:29, Jan Beulich wrote:
>>>>>>>>> On 25.05.17 at 11:40, <rcojocaru@bitdefender.com> wrote:
>>>>>>> I've noticed that, with pages marked NX and vm_event emulation, we can
>>>>>>> end up emulating an ud2, for which hvm_emulate_one() returns
>>>>>>> X86EMUL_EXCEPTION in hvm_emulate_one_vm_event().
>>>>>> Could you explain what would lead to emulation of UD2?
>>>>>>
>>>>>>> This, in turn, causes a hvm_inject_event() call in the context of
>>>>>>> hvm_do_resume(), which can, if there's already a pending event there,
>>>>>>> cause a 101 BSOD (timer-related, if I understand correctly) or loss of
>>>>>>> input (mouse frozen, keyboard unresponsive).
>>>>>>>
>>>>>>> After much trial and error, I've been able to confirm this by leaving a
>>>>>>> guest on for almost a full day with this change:
>>>>>>>
>>>>>>>      case X86EMUL_EXCEPTION:
>>>>>>> -        hvm_inject_event(&ctx.ctxt.event);
>>>>>>> +        if ( !hvm_event_pending(current) )
>>>>>>> +            hvm_inject_event(&ctx.ctxt.event);
>>>>>>>
>>>>>>> and checking that there's been no BSOD or loss of input.
>>>>>>>
>>>>>>> However, just losing the event here, while fine to prove that this is
>>>>>>> indeed the problem, is not OK. But I'm not sure what an elegant / robust
>>>>>>> way of fixing this is.
>>>>>> Much depends on what the other event is: If it's an interrupt, I'd
>>>>>> assume there to be an ordering problem (interrupts shouldn't be
>>>>>> injected when there is a pending exception, their delivery instead
>>>>>> should be attempted on the first instruction of the exception
>>>>>> handler [if interrupts remain on] or whenever interrupts get
>>>>>> re-enabled).
>>>>> I suspect it is an ordering issue, and something has processed and
>>>>> interrupt before the emulation occurs as part of the vm_event reply happens.
>>>>>
>>>>> The interrupt ordering spec indicates that external interrupts take
>>>>> precedent over faults raised from executing an instruction, on the basis
>>>>> that once the interrupt handler returns, the instruction will generate
>>>>> the same fault again.  However, its not obvious how this is intended to
>>>>> interact with interrupt windows and vmexits.  I expect we can get away
>>>>> with ensuring that external interrupts are the final thing considered
>>>>> for injection on the return-to-guest path.
>>>>>
>>>>> It might be an idea to leave an assert in vmx_inject_event() that an
>>>>> event is not already pending, but in the short term, this probably also
>>>>> wants debugging by trying to identify what sequence of actions is
>>>>> leading us to inject two events in this case (if indeed this is what is
>>>>> happening).
>>>> With some patience, I've been able to catch the problem: "(XEN)
>>>> vmx_inject_event(3, 14) but 0, 225 pending".
>>>>
>>>>  63 /*
>>>>  64  * x86 event types. This enumeration is valid for:
>>>>  65  *  Intel VMX: {VM_ENTRY,VM_EXIT,IDT_VECTORING}_INTR_INFO[10:8]
>>>>  66  *  AMD SVM: eventinj[10:8] and exitintinfo[10:8] (types 0-4 only)
>>>>  67  */
>>>>  68 enum x86_event_type {
>>>>  69     X86_EVENTTYPE_EXT_INTR,         /* External interrupt */
>>>>  70     X86_EVENTTYPE_NMI = 2,          /* NMI */
>>>>  71     X86_EVENTTYPE_HW_EXCEPTION,     /* Hardware exception */
>>>>  72     X86_EVENTTYPE_SW_INTERRUPT,     /* Software interrupt (CD nn) */
>>>>  73     X86_EVENTTYPE_PRI_SW_EXCEPTION, /* ICEBP (F1) */
>>>>  74     X86_EVENTTYPE_SW_EXCEPTION,     /* INT3 (CC), INTO (CE) */
>>>>  75 };
>>>>
>>>> so an X86_EVENTTYPE_EXT_INTR is pending when we're trying to inject an
>>>> X86_EVENTTYPE_HW_EXCEPTION, as a result of the code quoted above.
>>> So this confirms our suspicion, but doesn't move us closer to a
>>> solution. The question after all is why an external interrupt is
>>> being delivered prior to or while emulating. As Andrew did
>>> explain, proper behavior would be to check for external
>>> interrupts and don't enter emulation if one is pending, or don't
>>> check for external interrupts until the _next_ instruction
>>> boundary. Correct architectural behavior will result either way;
>>> the second variant merely must not continuously defer interrupts
>>> (i.e. there need to be instruction boundaries at which hardware
>>> of software do check for them).
>>>
>>> I'm not that familiar with the sequence of steps when dealing
>>> with emulation requests from an introspection agent, so I would
>>> hope you could go through those code paths to see where
>>> external interrupts are being checked for. Or wait - isn't your
>>> problem that you invoke emulation out of hvm_do_resume() (via
>>> hvm_vm_event_do_resume() -> hvm_emulate_one_vm_event()),
>>> which happens after all other "normal" processing of a VM exit?
>>> Perhaps emulation should be skipped there if an event is already
>>> pending injection, as emulation not having started means we still
>>> are on an instruction boundary?
>> Indeed, we are emulating after all the vmexit processing has already
>> happened, in the hvm_do_resume() path you've mentioned.
>>
>> Not emulating if an event is pending injection is certainly trivial to
>> test - however, since then we would have to lose the emulation attempt,
>> presumably the EPT fault event would show up again (a "duplicate" event)
>> in this case (since it is no longer safe to assume that we land on the
>> same RIP after the interrupt has been handled, and thus the next
>> hvm_do_resume() may not happen where we wanted to emulate).
>>
>> I'll test this scenario out.
> 
> The problem is in .Lvmx_do_vmentry.  We call vmx_intr_assist() before
> vmx_process_softirqs()
> 
> Therefore, when the EPT violation occurs and we break for introspection,
> we still process pending interrupts before finally getting desheduled to
> wait for the vm_event reply.  On receiving the vm_event reply, we
> emulate the instruction, and find that an interrupt has already been
> queued from before, but not yet delivered.
> 
> Untangling this going to be complicated.

Would it not be acceptable to try to add a bool flag, e.g.
v->arch.hvm_vcpu.disable_interrupts, that would work similarly to how
v->arch.hvm_vcpu.single_step works in vmx_intr_assist()?

We'd set it to true before sending out an EPT violation event, and then
back to false in hvm_vm_event_do_resume().


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Interrupt issues with hvm_emulate_one_vm_event()
  2017-05-26 15:38     ` Jan Beulich
@ 2017-05-29 13:24       ` Razvan Cojocaru
  0 siblings, 0 replies; 11+ messages in thread
From: Razvan Cojocaru @ 2017-05-29 13:24 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Tamas K Lengyel, Xen-devel

On 05/26/17 18:38, Jan Beulich wrote:
>>>> On 26.05.17 at 16:37, <rcojocaru@bitdefender.com> wrote:
>> On 05/26/17 17:29, Jan Beulich wrote:
>>>>>> On 25.05.17 at 11:40, <rcojocaru@bitdefender.com> wrote:
>>>> I've noticed that, with pages marked NX and vm_event emulation, we can
>>>> end up emulating an ud2, for which hvm_emulate_one() returns
>>>> X86EMUL_EXCEPTION in hvm_emulate_one_vm_event().
>>>
>>> Could you explain what would lead to emulation of UD2?
>>
>> If you mean in which cases does our engine mark pages NX, I'll have to
>> ask and get back to you. If you mean why generally would an UD2 end up
>> being the instruction where RIP causes an execute violation fault, I'll
>> have to check.
> 
> The question was more for the latter, as I don't understand what
> good could come from executing UD2 intentionally, unless the
> entity doing so knows there is an emulator around to do something
> sensible with it.

I owe you an answer here: I've spoken to my introspection engine
colleague Andrei, and they purposefully put an UD2 there to terminate a
malicious process (i.e. the exception is wanted).

I've found this problem while stress-testing Xen 4.9 verifying another
patch, using our in-house user-mode test applications, which simulate
this sort of malicious behaviour.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-05-29 13:23 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-25  9:40 Interrupt issues with hvm_emulate_one_vm_event() Razvan Cojocaru
2017-05-26 14:29 ` Jan Beulich
2017-05-26 14:37   ` Razvan Cojocaru
2017-05-26 15:38     ` Jan Beulich
2017-05-29 13:24       ` Razvan Cojocaru
2017-05-26 15:11   ` Andrew Cooper
2017-05-29  9:20     ` Razvan Cojocaru
2017-05-29 11:05       ` Jan Beulich
2017-05-29 11:46         ` Razvan Cojocaru
2017-05-29 12:11           ` Andrew Cooper
2017-05-29 12:22             ` Razvan Cojocaru

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.