* Question about VPID during MOV-TO-CR3
@ 2016-09-20 17:29 Tamas K Lengyel
  2016-09-21 10:23 ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Tamas K Lengyel @ 2016-09-20 17:29 UTC (permalink / raw)
  To: xen-devel

Hi all,
I'm trying to figure out the design decision regarding the handling of
guest MOV-TO-CR3 operations and TLB flushes. AFAICT since support for
VPID has been added to Xen, every guest MOV-TO-CR3 flushes the TLB
(vmx_cr_access -> hvm_mov_to_cr -> hvm_set_cr3 -> paging_update_cr3 ->
hap_update_cr3 -> vmx_update_guest_cr -> hvm_asid_flush_vcpu). From a
TLB utilization point-of-view this seems to be rather wasteful.
Furthermore, it even breaks the guests' ability to take advantage of
PCID, as the TLB just gets flushed when a new process is scheduled.
Does anyone have an insight into what was the rationale behind this?

Thanks,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* Re: Question about VPID during MOV-TO-CR3
  2016-09-20 17:29 Question about VPID during MOV-TO-CR3 Tamas K Lengyel
@ 2016-09-21 10:23 ` Jan Beulich
  2016-09-21 14:18   ` Tamas K Lengyel
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2016-09-21 10:23 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: xen-devel

>>> On 20.09.16 at 19:29, <tamas.lengyel@zentific.com> wrote:
> I'm trying to figure out the design decision regarding the handling of
> guest MOV-TO-CR3 operations and TLB flushes. AFAICT since support for
> VPID has been added to Xen, every guest MOV-TO-CR3 flushes the TLB
> (vmx_cr_access -> hvm_mov_to_cr -> hvm_set_cr3 -> paging_update_cr3 ->
> hap_update_cr3 -> vmx_update_guest_cr -> hvm_asid_flush_vcpu). From a
> TLB utilization point-of-view this seems to be rather wasteful.
> Furthermore, it even breaks the guests' ability to take advantage of
> PCID, as the TLB just gets flushed when a new process is scheduled.
> Does anyone have an insight into what was the rationale behind this?

Since you don't quote the specific commit(s), I would guess that
this was mainly an attempt by the author(s) to keep things simple
for themselves, i.e. not having to properly think through under
which conditions less than a full TLB flush would suffice.

Jan



* Re: Question about VPID during MOV-TO-CR3
  2016-09-21 10:23 ` Jan Beulich
@ 2016-09-21 14:18   ` Tamas K Lengyel
  2016-09-21 14:44     ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Tamas K Lengyel @ 2016-09-21 14:18 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Wed, Sep 21, 2016 at 4:23 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 20.09.16 at 19:29, <tamas.lengyel@zentific.com> wrote:
>> I'm trying to figure out the design decision regarding the handling of
>> guest MOV-TO-CR3 operations and TLB flushes. AFAICT since support for
>> VPID has been added to Xen, every guest MOV-TO-CR3 flushes the TLB
>> (vmx_cr_access -> hvm_mov_to_cr -> hvm_set_cr3 -> paging_update_cr3 ->
>> hap_update_cr3 -> vmx_update_guest_cr -> hvm_asid_flush_vcpu). From a
>> TLB utilization point-of-view this seems to be rather wasteful.
>> Furthermore, it even breaks the guests' ability to take advantage of
>> PCID, as the TLB just gets flushed when a new process is scheduled.
>> Does anyone have an insight into what was the rationale behind this?
>
> Since you don't quote the specific commit(s), I would guess that
> this was mainly an attempt by the author(s) to keep things simple
> for themselves, i.e. not having to properly think through under
> which conditions less than a full TLB flush would suffice.

The commit that added VPID and the TLB flush is
e2cf9bd6e055ea678da129b776f4521f6a0b50fe x86, vmx: Enable VPID
(Virtual Processor Identification). So this has been there as long as
Xen supported VPID. The only case where flushing the TLB on a guest
MOV-TO-CR3 would possibly make sense to me is if we are running a
PV guest. But this is hvm/vmx, so why would we care about what the
guest does to its cr3 from a TLB standpoint? Wouldn't the guest OS
need to be in charge of that? With the TLBs being tagged there is no
side-effect the guest can induce on any other domain whether it
flushes its TLB or not.

Tamas


* Re: Question about VPID during MOV-TO-CR3
  2016-09-21 14:18   ` Tamas K Lengyel
@ 2016-09-21 14:44     ` Jan Beulich
  2016-09-21 15:09       ` Tamas K Lengyel
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2016-09-21 14:44 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: xen-devel

>>> On 21.09.16 at 16:18, <tamas.lengyel@zentific.com> wrote:
> On Wed, Sep 21, 2016 at 4:23 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 20.09.16 at 19:29, <tamas.lengyel@zentific.com> wrote:
>>> I'm trying to figure out the design decision regarding the handling of
>>> guest MOV-TO-CR3 operations and TLB flushes. AFAICT since support for
>>> VPID has been added to Xen, every guest MOV-TO-CR3 flushes the TLB
>>> (vmx_cr_access -> hvm_mov_to_cr -> hvm_set_cr3 -> paging_update_cr3 ->
>>> hap_update_cr3 -> vmx_update_guest_cr -> hvm_asid_flush_vcpu). From a
>>> TLB utilization point-of-view this seems to be rather wasteful.
>>> Furthermore, it even breaks the guests' ability to take advantage of
>>> PCID, as the TLB just gets flushed when a new process is scheduled.
>>> Does anyone have an insight into what was the rationale behind this?
>>
>> Since you don't quote the specific commit(s), I would guess that
>> this was mainly an attempt by the author(s) to keep things simple
>> for themselves, i.e. not having to properly think through under
>> which conditions less than a full TLB flush would suffice.
> 
> The commit that added VPID and the TLB flush is
> e2cf9bd6e055ea678da129b776f4521f6a0b50fe x86, vmx: Enable VPID
> (Virtual Processor Identification). So this has been there as long as
> Xen supported VPID. The only case where flushing the TLB on a guest
> MOV-TO-CR3 that possibly would make sense to me is if we are running a
> PV guest. But this is hvm/vmx, so why would we care about what the
> guest does to its cr3 from a TLB standpoint?

Are you forgetting that a move to CR3 needs to flush all non-global
TLB entries? Or else, why do you think no flushing needs to happen
at all?

Jan

> Wouldn't the guest OS
> need be in charge of that? With the TLBs being tagged there is no
> side-effect the guest can induce on any other domain whether it
> flushes its TLB or not.
> 
> Tamas





* Re: Question about VPID during MOV-TO-CR3
  2016-09-21 14:44     ` Jan Beulich
@ 2016-09-21 15:09       ` Tamas K Lengyel
  2016-09-21 15:16         ` Tamas K Lengyel
  0 siblings, 1 reply; 34+ messages in thread
From: Tamas K Lengyel @ 2016-09-21 15:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Wed, Sep 21, 2016 at 8:44 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 21.09.16 at 16:18, <tamas.lengyel@zentific.com> wrote:
>> On Wed, Sep 21, 2016 at 4:23 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 20.09.16 at 19:29, <tamas.lengyel@zentific.com> wrote:
>>>> I'm trying to figure out the design decision regarding the handling of
>>>> guest MOV-TO-CR3 operations and TLB flushes. AFAICT since support for
>>>> VPID has been added to Xen, every guest MOV-TO-CR3 flushes the TLB
>>>> (vmx_cr_access -> hvm_mov_to_cr -> hvm_set_cr3 -> paging_update_cr3 ->
>>>> hap_update_cr3 -> vmx_update_guest_cr -> hvm_asid_flush_vcpu). From a
>>>> TLB utilization point-of-view this seems to be rather wasteful.
>>>> Furthermore, it even breaks the guests' ability to take advantage of
>>>> PCID, as the TLB just gets flushed when a new process is scheduled.
>>>> Does anyone have an insight into what was the rationale behind this?
>>>
>>> Since you don't quote the specific commit(s), I would guess that
>>> this was mainly an attempt by the author(s) to keep things simple
>>> for themselves, i.e. not having to properly think through under
>>> which conditions less than a full TLB flush would suffice.
>>
>> The commit that added VPID and the TLB flush is
>> e2cf9bd6e055ea678da129b776f4521f6a0b50fe x86, vmx: Enable VPID
>> (Virtual Processor Identification). So this has been there as long as
>> Xen supported VPID. The only case where flushing the TLB on a guest
>> MOV-TO-CR3 that possibly would make sense to me is if we are running a
>> PV guest. But this is hvm/vmx, so why would we care about what the
>> guest does to its cr3 from a TLB standpoint?
>
> Are you forgetting that a move to CR3 needs to flush all non-global
> TLB entries? Or else, why do you think no flushing needs to happen
> at all?
>

The guest can mark entries as global or non-global but it will have no
effect on VPID; every translation is still going to be tagged by VPID
when the translation was triggered in guest context. So why does Xen
need to jump in and flush the TLB when the guest OS has likely already done
so? It will render the guest OS's use of PCID optimization useless.
But even if the guest OS didn't flush - for whatever strange reason -
it would have no effect on anything else outside the guest context, so
Xen jumping in and doing this flush is unwarranted AFAICT.

Tamas


* Re: Question about VPID during MOV-TO-CR3
  2016-09-21 15:09       ` Tamas K Lengyel
@ 2016-09-21 15:16         ` Tamas K Lengyel
  2016-09-21 15:23           ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Tamas K Lengyel @ 2016-09-21 15:16 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Wed, Sep 21, 2016 at 9:09 AM, Tamas K Lengyel
<tamas.lengyel@zentific.com> wrote:
> On Wed, Sep 21, 2016 at 8:44 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 21.09.16 at 16:18, <tamas.lengyel@zentific.com> wrote:
>>> On Wed, Sep 21, 2016 at 4:23 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>> On 20.09.16 at 19:29, <tamas.lengyel@zentific.com> wrote:
>>>>> I'm trying to figure out the design decision regarding the handling of
>>>>> guest MOV-TO-CR3 operations and TLB flushes. AFAICT since support for
>>>>> VPID has been added to Xen, every guest MOV-TO-CR3 flushes the TLB
>>>>> (vmx_cr_access -> hvm_mov_to_cr -> hvm_set_cr3 -> paging_update_cr3 ->
>>>>> hap_update_cr3 -> vmx_update_guest_cr -> hvm_asid_flush_vcpu). From a
>>>>> TLB utilization point-of-view this seems to be rather wasteful.
>>>>> Furthermore, it even breaks the guests' ability to take advantage of
>>>>> PCID, as the TLB just gets flushed when a new process is scheduled.
>>>>> Does anyone have an insight into what was the rationale behind this?
>>>>
>>>> Since you don't quote the specific commit(s), I would guess that
>>>> this was mainly an attempt by the author(s) to keep things simple
>>>> for themselves, i.e. not having to properly think through under
>>>> which conditions less than a full TLB flush would suffice.
>>>
>>> The commit that added VPID and the TLB flush is
>>> e2cf9bd6e055ea678da129b776f4521f6a0b50fe x86, vmx: Enable VPID
>>> (Virtual Processor Identification). So this has been there as long as
>>> Xen supported VPID. The only case where flushing the TLB on a guest
>>> MOV-TO-CR3 that possibly would make sense to me is if we are running a
>>> PV guest. But this is hvm/vmx, so why would we care about what the
>>> guest does to its cr3 from a TLB standpoint?
>>
>> Are you forgetting that a move to CR3 needs to flush all non-global
>> TLB entries? Or else, why do you think no flushing needs to happen
>> at all?
>>
>
> The guest can mark entries as global or non-global but it will have no
> effect on VPID; every translation is still going to be tagged by VPID
> when the translation was triggered in guest context. So why does Xen
> need to jump in and flush the TLB when the guest OS has likely already done
> so? It will render the guest OS's use of PCID optimization useless.
> But even if the guest OS didn't flush - for whatever strange reason -
> it would have no effect on anything else outside the guest context, so
> Xen jumping in and doing this flush is unwarranted AFAICT.
>

Also, Xen flushing on every MOV-TO-CR3 effectively disables the use of
global TLB entries in the guest as well. So both global TLB entries
and TLB entries tagged with PCID are disabled with this flush in
place. That seems to be a bad idea from a performance perspective..

Tamas


* Re: Question about VPID during MOV-TO-CR3
  2016-09-21 15:16         ` Tamas K Lengyel
@ 2016-09-21 15:23           ` Jan Beulich
  2016-09-21 15:30             ` Tamas K Lengyel
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2016-09-21 15:23 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: xen-devel

>>> On 21.09.16 at 17:16, <tamas.lengyel@zentific.com> wrote:
> On Wed, Sep 21, 2016 at 9:09 AM, Tamas K Lengyel
> <tamas.lengyel@zentific.com> wrote:
>> On Wed, Sep 21, 2016 at 8:44 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 21.09.16 at 16:18, <tamas.lengyel@zentific.com> wrote:
>>>> On Wed, Sep 21, 2016 at 4:23 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>> On 20.09.16 at 19:29, <tamas.lengyel@zentific.com> wrote:
>>>>>> I'm trying to figure out the design decision regarding the handling of
>>>>>> guest MOV-TO-CR3 operations and TLB flushes. AFAICT since support for
>>>>>> VPID has been added to Xen, every guest MOV-TO-CR3 flushes the TLB
>>>>>> (vmx_cr_access -> hvm_mov_to_cr -> hvm_set_cr3 -> paging_update_cr3 ->
>>>>>> hap_update_cr3 -> vmx_update_guest_cr -> hvm_asid_flush_vcpu). From a
>>>>>> TLB utilization point-of-view this seems to be rather wasteful.
>>>>>> Furthermore, it even breaks the guests' ability to take advantage of
>>>>>> PCID, as the TLB just gets flushed when a new process is scheduled.
>>>>>> Does anyone have an insight into what was the rationale behind this?
>>>>>
>>>>> Since you don't quote the specific commit(s), I would guess that
>>>>> this was mainly an attempt by the author(s) to keep things simple
>>>>> for themselves, i.e. not having to properly think through under
>>>>> which conditions less than a full TLB flush would suffice.
>>>>
>>>> The commit that added VPID and the TLB flush is
>>>> e2cf9bd6e055ea678da129b776f4521f6a0b50fe x86, vmx: Enable VPID
>>>> (Virtual Processor Identification). So this has been there as long as
>>>> Xen supported VPID. The only case where flushing the TLB on a guest
>>>> MOV-TO-CR3 that possibly would make sense to me is if we are running a
>>>> PV guest. But this is hvm/vmx, so why would we care about what the
>>>> guest does to its cr3 from a TLB standpoint?
>>>
>>> Are you forgetting that a move to CR3 needs to flush all non-global
>>> TLB entries? Or else, why do you think no flushing needs to happen
>>> at all?
>>>
>>
>> The guest can mark entries as global or non-global but it will have no
>> effect on VPID; every translation is still going to be tagged by VPID
>> when the translation was triggered in guest context. So why does Xen
>> need to jump in and flush the TLB when the guest OS has likely already done
>> so?

Likely? We can't base anything on likelihood (all the more so since,
no matter what flushing may have been done before the CR3 write, further
flushing may be necessary and mustn't be skipped). We need to
provide architecturally correct behavior, and that includes the flushing
of non-global entries. This doesn't mean we need to flush anything
ourselves, but we have to make previously created non-global TLB
entries unavailable.
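
As a toy illustration of that last sentence (a sketch under stated
assumptions, not Xen code): if TLB entries are keyed by a tag standing in
for the VPID/ASID, switching the vCPU to an unused tag makes every older
entry unreachable without erasing anything. `TaggedTlb`, `Vcpu` and `retag`
are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedTlb:
    """Toy TLB whose entries are keyed by (tag, virtual page)."""
    entries: dict = field(default_factory=dict)

    def fill(self, tag, vpage, frame):
        self.entries[(tag, vpage)] = frame

    def lookup(self, tag, vpage):
        # A lookup only hits entries created under the same tag.
        return self.entries.get((tag, vpage))


@dataclass
class Vcpu:
    tag: int  # stands in for the VPID/ASID currently assigned to the vCPU


def retag(vcpu, next_free_tag):
    """Hide all old entries by moving the vCPU to a tag never used before."""
    vcpu.tag = next_free_tag


tlb = TaggedTlb()
vcpu = Vcpu(tag=1)
tlb.fill(vcpu.tag, 0x1000, 0xAAAA)

hit_before = tlb.lookup(vcpu.tag, 0x1000)      # found under tag 1
retag(vcpu, next_free_tag=2)
hit_after = tlb.lookup(vcpu.tag, 0x1000)       # nothing under tag 2
stale_still_stored = (1, 0x1000) in tlb.entries
```

Note that in this model the switch hides every entry of the old tag, global
or not, which is coarser than what a CR3 write strictly requires.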

>> It will render the guest OS's use of PCID optimization useless.
>> But even if the guest OS didn't flush - for whatever strange reason -
>> it would have no effect on anything else outside the guest context, so
>> Xen jumping in and doing this flush is unwarranted AFAICT.
> 
> Also, Xen flushing on every MOV-TO-CR3 effectively disables the use of
> global TLB entries in the guest as well. So both global TLB entries
> and TLB entries tagged with PCID are disabled with this flush in
> place. That seems to be a bad idea from a performance perspective..

I didn't say what gets done right now looks to be optimal.

Jan



* Re: Question about VPID during MOV-TO-CR3
  2016-09-21 15:23           ` Jan Beulich
@ 2016-09-21 15:30             ` Tamas K Lengyel
  2016-09-21 18:26               ` Tamas K Lengyel
  2016-09-22  8:56               ` Jan Beulich
  0 siblings, 2 replies; 34+ messages in thread
From: Tamas K Lengyel @ 2016-09-21 15:30 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Wed, Sep 21, 2016 at 9:23 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 21.09.16 at 17:16, <tamas.lengyel@zentific.com> wrote:
>> On Wed, Sep 21, 2016 at 9:09 AM, Tamas K Lengyel
>> <tamas.lengyel@zentific.com> wrote:
>>> On Wed, Sep 21, 2016 at 8:44 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>> On 21.09.16 at 16:18, <tamas.lengyel@zentific.com> wrote:
>>>>> On Wed, Sep 21, 2016 at 4:23 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>>> On 20.09.16 at 19:29, <tamas.lengyel@zentific.com> wrote:
>>>>>>> I'm trying to figure out the design decision regarding the handling of
>>>>>>> guest MOV-TO-CR3 operations and TLB flushes. AFAICT since support for
>>>>>>> VPID has been added to Xen, every guest MOV-TO-CR3 flushes the TLB
>>>>>>> (vmx_cr_access -> hvm_mov_to_cr -> hvm_set_cr3 -> paging_update_cr3 ->
>>>>>>> hap_update_cr3 -> vmx_update_guest_cr -> hvm_asid_flush_vcpu). From a
>>>>>>> TLB utilization point-of-view this seems to be rather wasteful.
>>>>>>> Furthermore, it even breaks the guests' ability to take advantage of
>>>>>>> PCID, as the TLB just gets flushed when a new process is scheduled.
>>>>>>> Does anyone have an insight into what was the rationale behind this?
>>>>>>
>>>>>> Since you don't quote the specific commit(s), I would guess that
>>>>>> this was mainly an attempt by the author(s) to keep things simple
>>>>>> for themselves, i.e. not having to properly think through under
>>>>>> which conditions less than a full TLB flush would suffice.
>>>>>
>>>>> The commit that added VPID and the TLB flush is
>>>>> e2cf9bd6e055ea678da129b776f4521f6a0b50fe x86, vmx: Enable VPID
>>>>> (Virtual Processor Identification). So this has been there as long as
>>>>> Xen supported VPID. The only case where flushing the TLB on a guest
>>>>> MOV-TO-CR3 that possibly would make sense to me is if we are running a
>>>>> PV guest. But this is hvm/vmx, so why would we care about what the
>>>>> guest does to its cr3 from a TLB standpoint?
>>>>
>>>> Are you forgetting that a move to CR3 needs to flush all non-global
>>>> TLB entries? Or else, why do you think no flushing needs to happen
>>>> at all?
>>>>
>>>
>>> The guest can mark entries as global or non-global but it will have no
>>> effect on VPID; every translation is still going to be tagged by VPID
>>> when the translation was triggered in guest context. So why does Xen
>>> need to jump in and flush the TLB when the guest OS has likely already done
>>> so?
>
> Likely? We can't base anything on likelihood (the more that no matter
> what flushing may have been done before the CR3 write, further
> flushing may be necessary and mustn't be skipped). We need to
> provide architecturally correct behavior, and that includes the flushing
> of non-global entries. This doesn't mean we need to flush anything
> ourselves, but we have to make previously created non-global TLB
> entries unavailable.

What I'm saying is that the guest OS should be in charge of managing
its own TLB when VPID is in use. Whether it does flush the TLB or not
is not of our concern. If it's a sane OS it will likely flush when it
needs to, but we should not be jumping in and doing it as we do right
now. We are actually breaking the architectural behavior by forcing a
flush, MOV-TO-CR3 doesn't by itself flush the TLB on real hardware.
Also, there are no non-global TLB entries we need to flush as long as
we are using VPID. Any translation used by Xen or by any other domain
will have a different VPID, so there is no chance of stale TLB entries
being an issue.

Tamas


* Re: Question about VPID during MOV-TO-CR3
  2016-09-21 15:30             ` Tamas K Lengyel
@ 2016-09-21 18:26               ` Tamas K Lengyel
  2016-09-22  9:00                 ` Jan Beulich
  2016-09-22  8:56               ` Jan Beulich
  1 sibling, 1 reply; 34+ messages in thread
From: Tamas K Lengyel @ 2016-09-21 18:26 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Wed, Sep 21, 2016 at 9:30 AM, Tamas K Lengyel
<tamas.lengyel@zentific.com> wrote:
> On Wed, Sep 21, 2016 at 9:23 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 21.09.16 at 17:16, <tamas.lengyel@zentific.com> wrote:
>>> On Wed, Sep 21, 2016 at 9:09 AM, Tamas K Lengyel
>>> <tamas.lengyel@zentific.com> wrote:
>>>> On Wed, Sep 21, 2016 at 8:44 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>> On 21.09.16 at 16:18, <tamas.lengyel@zentific.com> wrote:
>>>>>> On Wed, Sep 21, 2016 at 4:23 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>>>> On 20.09.16 at 19:29, <tamas.lengyel@zentific.com> wrote:
>>>>>>>> I'm trying to figure out the design decision regarding the handling of
>>>>>>>> guest MOV-TO-CR3 operations and TLB flushes. AFAICT since support for
>>>>>>>> VPID has been added to Xen, every guest MOV-TO-CR3 flushes the TLB
>>>>>>>> (vmx_cr_access -> hvm_mov_to_cr -> hvm_set_cr3 -> paging_update_cr3 ->
>>>>>>>> hap_update_cr3 -> vmx_update_guest_cr -> hvm_asid_flush_vcpu). From a
>>>>>>>> TLB utilization point-of-view this seems to be rather wasteful.
>>>>>>>> Furthermore, it even breaks the guests' ability to take advantage of
>>>>>>>> PCID, as the TLB just gets flushed when a new process is scheduled.
>>>>>>>> Does anyone have an insight into what was the rationale behind this?
>>>>>>>
>>>>>>> Since you don't quote the specific commit(s), I would guess that
>>>>>>> this was mainly an attempt by the author(s) to keep things simple
>>>>>>> for themselves, i.e. not having to properly think through under
>>>>>>> which conditions less than a full TLB flush would suffice.
>>>>>>
>>>>>> The commit that added VPID and the TLB flush is
>>>>>> e2cf9bd6e055ea678da129b776f4521f6a0b50fe x86, vmx: Enable VPID
>>>>>> (Virtual Processor Identification). So this has been there as long as
>>>>>> Xen supported VPID. The only case where flushing the TLB on a guest
>>>>>> MOV-TO-CR3 that possibly would make sense to me is if we are running a
>>>>>> PV guest. But this is hvm/vmx, so why would we care about what the
>>>>>> guest does to its cr3 from a TLB standpoint?
>>>>>
>>>>> Are you forgetting that a move to CR3 needs to flush all non-global
>>>>> TLB entries? Or else, why do you think no flushing needs to happen
>>>>> at all?
>>>>>
>>>>
>>>> The guest can mark entries as global or non-global but it will have no
>>>> effect on VPID; every translation is still going to be tagged by VPID
>>>> when the translation was triggered in guest context. So why does Xen
>>>> need to jump in and flush the TLB when the guest OS has likely already done
>>>> so?
>>
>> Likely? We can't base anything on likelihood (the more that no matter
>> what flushing may have been done before the CR3 write, further
>> flushing may be necessary and mustn't be skipped). We need to
>> provide architecturally correct behavior, and that includes the flushing
>> of non-global entries. This doesn't mean we need to flush anything
>> ourselves, but we have to make previously created non-global TLB
>> entries unavailable.
>
> What I'm saying is that the guest OS should be in charge of managing
> its own TLB when VPID is in use. Whether it does flush the TLB or not
> is not of our concern. If it's a sane OS it will likely flush when it
> needs to, but we should not be jumping in and doing it as we do right
> now. We are actually breaking the architectural behavior by forcing a
> flush, MOV-TO-CR3 doesn't by itself flush the TLB on real hardware.
> Also, there are no non-global TLB entries we need to flush as long as
> we are using VPID. Any translation used by Xen or by any other domain
> will have a different VPID, so there is no chance of stale TLB entries
> being an issue.
>

So reading through the Intel SDM the following bits are relevant here:

28.3.3.1 Operations that Invalidate Cached Mappings
The following operations invalidate cached mappings as indicated:
• Operations that architecturally invalidate entries in the TLBs or
  paging-structure caches independent of VMX operation (e.g., the
  INVLPG and INVPCID instructions) invalidate linear mappings and
  combined mappings. They are required to do so only for the current
  VPID (but, for combined mappings, all EP4TAs). Linear mappings for
  the current VPID are invalidated even if EPT is in use. Combined
  mappings for the current VPID are invalidated even if EPT is not in
  use.

To me this reads that the CPU will automatically handle the TLB
flushing for all operations that would normally do so when running
without a hypervisor, but only within the context of the VPID. While
it doesn't list MOV-TO-CR3 specifically, I'm sure it falls into this
same category regarding non-global TLB entries that would be flushed
by it. Thus, there is no need for the VMM to step in and do anything in
this regard if my interpretation is correct.
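
To make the "only for the current VPID" scoping concrete, here is a minimal
sketch that models the TLB as a dictionary keyed by (VPID, virtual page). It
ignores PCID and global pages for brevity, and `invlpg` here is a
hypothetical Python function, not the instruction itself.

```python
def invlpg(tlb, current_vpid, vpage):
    """Return a copy of tlb without the entry for vpage under current_vpid."""
    return {key: frame for key, frame in tlb.items()
            if key != (current_vpid, vpage)}


tlb = {
    (1, 0x1000): 0xA000,  # guest running with VPID 1
    (2, 0x1000): 0xB000,  # another guest, same virtual page, VPID 2
}

# The guest with VPID 1 invalidates its own mapping of page 0x1000.
after = invlpg(tlb, current_vpid=1, vpage=0x1000)
```

The entry tagged with VPID 2 survives, which is the isolation property the
quoted SDM text describes.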

Tamas


* Re: Question about VPID during MOV-TO-CR3
  2016-09-21 15:30             ` Tamas K Lengyel
  2016-09-21 18:26               ` Tamas K Lengyel
@ 2016-09-22  8:56               ` Jan Beulich
  2016-09-22 10:35                 ` Tamas K Lengyel
  1 sibling, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2016-09-22  8:56 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: xen-devel

>>> On 21.09.16 at 17:30, <tamas.lengyel@zentific.com> wrote:
> What I'm saying is that the guest OS should be in charge of managing
> its own TLB when VPID is in use. Whether it does flush the TLB or not
> is not of our concern. If it's a sane OS it will likely flush when it
> needs to, but we should not be jumping in and doing it as we do right
> now. We are actually breaking the architectural behavior by forcing a
> flush, MOV-TO-CR3 doesn't by itself flush the TLB on real hardware.

I continue to not understand where you take this from. Writes to
CR3 have always been doing TLB flushes - full ones prior to the
introduction of global pages, and flushes of only non-global entries
nowadays. In fact prior to the introduction of INVLPG and CR4
there was no other way to flush TLBs.
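
A minimal sketch of those two behaviours, assuming PCID is disabled and
modelling the TLB as a plain dictionary. `write_cr3`, its
`global_pages_enabled` parameter and the entry layout are hypothetical and
only serve the illustration.

```python
def write_cr3(tlb, global_pages_enabled=True):
    """Return the TLB contents that survive a CR3 write in this toy model.

    Without global pages every entry is dropped; with them, only the
    entries marked global survive.
    """
    if not global_pages_enabled:
        return {}
    return {vpage: e for vpage, e in tlb.items() if e["global"]}


tlb = {
    0x00001000: {"frame": 0xA000, "global": False},  # per-process mapping
    0xFFFF8000: {"frame": 0xB000, "global": True},   # kernel mapping
}

after_modern = write_cr3(tlb)                              # non-global dropped
after_legacy = write_cr3(tlb, global_pages_enabled=False)  # everything dropped
```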

Jan



* Re: Question about VPID during MOV-TO-CR3
  2016-09-21 18:26               ` Tamas K Lengyel
@ 2016-09-22  9:00                 ` Jan Beulich
  2016-09-22 10:39                   ` Tamas K Lengyel
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2016-09-22  9:00 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: xen-devel

>>> On 21.09.16 at 20:26, <tamas.lengyel@zentific.com> wrote:
> So reading through the Intel SDM the following bits are relevant here:
> 
> 28.3.3.1 Operations that Invalidate Cached Mappings
> The following operations invalidate cached mappings as indicated:
> ● Operations that architecturally invalidate entries in the TLBs or
>   paging-structure caches independent of VMX operation (e.g., the
>   INVLPG and INVPCID instructions) invalidate linear mappings and
>   combined mappings. They are required to do so only for the current
>   VPID (but, for combined mappings, all EP4TAs). Linear mappings for
>   the current VPID are invalidated even if EPT is in use. Combined
>   mappings for the current VPID are invalidated even if EPT is not in
>   use.
> 
> To me this reads that the CPU will automatically handle the TLB
> flushing for all operations that would normally do so when running
> without a hypervisor, but only within the context of the VPID. While
> it doesn't list MOV-TO-CR3 specifically, I'm sure it falls into this
> same category regarding non-global TLB entries that would be flushed
> by it. Thus, there is no need for the VMM to step in and do anything in
> this regard if my interpretation is correct.

Well, that would be true if a CR3 write intercept meant the CPU
first does its job, and only then invokes the hypervisor. Such
intercepts, however, get invoked before the CPU starts doing
anything the instruction would require to be done (except for
a few exception checks, like CPL). Hence the hypervisor has to
do everything the CPU would normally do on its own.
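
The consequence can be sketched as follows: since the intercept handler has
to perform the invalidation itself, it can choose between emulating just what
the instruction would drop and invalidating the whole VPID. This is a toy
model, not Xen's implementation; `Entry`, `emulate_precise` and
`emulate_flush_vpid` are hypothetical names. It assumes CR4.PCIDE is set, so
bits 11:0 of the CR3 value select the PCID and bit 63 requests that nothing
be invalidated, as I understand the SDM's description of MOV to CR3.

```python
from typing import NamedTuple

NOFLUSH = 1 << 63  # bit 63 of the value written to CR3 (CR4.PCIDE = 1)


class Entry(NamedTuple):
    vpid: int
    pcid: int
    vpage: int
    is_global: bool


def emulate_precise(tlb, vpid, cr3_value):
    """Drop only the non-global entries of the target PCID in this VPID."""
    if cr3_value & NOFLUSH:
        return list(tlb)
    pcid = cr3_value & 0xFFF
    return [e for e in tlb
            if not (e.vpid == vpid and e.pcid == pcid and not e.is_global)]


def emulate_flush_vpid(tlb, vpid):
    """Drop everything tagged with this VPID, global entries included."""
    return [e for e in tlb if e.vpid != vpid]


tlb = [
    Entry(vpid=1, pcid=1, vpage=0x1000, is_global=False),
    Entry(vpid=1, pcid=2, vpage=0x2000, is_global=False),
    Entry(vpid=1, pcid=1, vpage=0xF000, is_global=True),
    Entry(vpid=2, pcid=1, vpage=0x1000, is_global=False),
]

precise = emulate_precise(tlb, vpid=1, cr3_value=0x5000 | 1)
untouched = emulate_precise(tlb, vpid=1, cr3_value=NOFLUSH | 0x5000 | 1)
coarse = emulate_flush_vpid(tlb, vpid=1)
```

In this model the precise variant keeps the guest's global entry and the
entry of the other PCID, while the whole-VPID variant keeps only the other
guest's entry.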

Jan


* Re: Question about VPID during MOV-TO-CR3
  2016-09-22  8:56               ` Jan Beulich
@ 2016-09-22 10:35                 ` Tamas K Lengyel
  2016-09-22 11:27                   ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Tamas K Lengyel @ 2016-09-22 10:35 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel



On Sep 22, 2016 02:56, "Jan Beulich" <JBeulich@suse.com> wrote:
>
> >>> On 21.09.16 at 17:30, <tamas.lengyel@zentific.com> wrote:
> > What I'm saying is that the guest OS should be in charge of managing
> > its own TLB when VPID is in use. Whether it does flush the TLB or not
> > is not of our concern. If it's a sane OS it will likely flush when it
> > needs to, but we should not be jumping in and doing it as we do right
> > now. We are actually breaking the architectural behavior by forcing a
> > flush, MOV-TO-CR3 doesn't by itself flush the TLB on real hardware.
>
> I continue to not understand where you take this from. Writes to
> CR3 have always been doing TLB flushes - full ones prior to the
> introduction of global pages, and flushes of only non-global entries
> nowadays. In fact prior to the introduction of INVLPG and CR4
> there was no other way to flush TLBs.
>

Yes, I meant it doesn't completely flush the TLB as we do right now when
invalidating the whole VPID.

Tamas


* Re: Question about VPID during MOV-TO-CR3
  2016-09-22  9:00                 ` Jan Beulich
@ 2016-09-22 10:39                   ` Tamas K Lengyel
  2016-09-22 11:35                     ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Tamas K Lengyel @ 2016-09-22 10:39 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel


On Sep 22, 2016 03:00, "Jan Beulich" <JBeulich@suse.com> wrote:
>
> >>> On 21.09.16 at 20:26, <tamas.lengyel@zentific.com> wrote:
> > So reading through the Intel SDM the following bits are relevant here:
> >
> > 28.3.3.1
> > Operations that Invalidate Cached Mappings
> > The following operations invalidate cached mappings as indicated:
> > ● Operations that architecturally invalidate entries in the TLBs or
> > paging-structure caches independent of VMX
> > operation (e.g., the INVLPG and INVPCID instructions) invalidate
> > linear mappings and combined mappings. 1
> > They are required to do so only for the current VPID (but, for
> > combined mappings, all EP4TAs). Linear
> > mappings for the current VPID are invalidated even if EPT is in use. 2
> > Combined mappings for the current
> > VPID are invalidated even if EPT is not in use.
> >
> > To me this reads that the CPU will automatically handle the TLB
> > flushing for all operations that would normally do so when running
> > without a hypervisor, but only within the context of the VPID. While
> > it doesn't list MOV-TO-CR3 specifically, I'm sure it falls into this
> > same category regarding non-global TLB entries that would be flushed
> > by it. Thus, there is no need for the VMM to step in do anything in
> > this regard if my interpretation is correct.
>
> Well, that would be true if a CR3 write intercept meant the CPU
> first does its job, and only then invokes the hypervisor. Such
> intercepts, however, get invoked before the CPU starts doing
> anything the instruction would require to be done (except for
> a few exception checks, like CPL). Hence the hypervisor has to
> do everything the CPU would normally do on its own.
>

Has that been verified, though? The SDM doesn't say that enabling CR3-load
exiting changes the TLB operations the CPU would otherwise perform. So
while I can see this actually being the case, I can't find anything
official that says so.

Tamas

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Question about VPID during MOV-TO-CR3
  2016-09-22 10:35                 ` Tamas K Lengyel
@ 2016-09-22 11:27                   ` Jan Beulich
  2016-09-22 11:37                     ` Tamas K Lengyel
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2016-09-22 11:27 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: xen-devel

>>> On 22.09.16 at 12:35, <tamas.lengyel@zentific.com> wrote:
> On Sep 22, 2016 02:56, "Jan Beulich" <JBeulich@suse.com> wrote:
>>
>> >>> On 21.09.16 at 17:30, <tamas.lengyel@zentific.com> wrote:
>> > What I'm saying is that the guest OS should be in charge of managing
>> > its own TLB when VPID is in use. Whether it does flush the TLB or not
>> > is not of our concern. If it's a sane OS it will likely flush when it
>> > needs to, but we should not be jumping in and doing it as we do right
>> > now. We are actually breaking the architectural behavior by forcing a
>> > flush, MOV-TO-CR3 doesn't by itself flush the TLB on real hardware.
>>
>> I continue to not understand where you take this from. Writes to
>> CR3 have always been doing TLB flushes - full ones prior to the
>> introduction of global pages, and flushes of only non-global entries
>> nowadays. In fact prior to the introduction of INVLPG and CR4
>> there was no other way to flush TLBs.
> 
> Yes, I meant it doesn't completely flush the TLB as we do right now when
> invalidating the whole VPID.

But then what architectural behavior do you see broken? Flushing
more than is required is always permitted. (And again - I'm all for
improvements here, we just need to be careful to not remove
flushing that is architecturally required.)
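
To make the "more than required" point concrete, here is a toy model of
the two flush scopes under discussion. It is only a sketch: the struct and
both functions are hypothetical and are not Xen or hardware definitions. It
assumes CR4.PGE is set and no PCID is in use, and treats "invalidating the
whole VPID" as dropping every entry carrying that tag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical TLB entry, for illustration only. */
struct tlb_entry {
    bool valid;
    bool global;     /* translation came from a PTE with the G bit set */
    unsigned vpid;   /* tag of the vCPU that filled the entry */
};

/* What a CR3 write must drop: non-global entries of the current VPID. */
size_t flush_on_cr3_write(struct tlb_entry *tlb, size_t n, unsigned vpid)
{
    size_t dropped = 0;

    for (size_t i = 0; i < n; i++) {
        if (tlb[i].valid && tlb[i].vpid == vpid && !tlb[i].global) {
            tlb[i].valid = false;
            dropped++;
        }
    }
    return dropped;
}

/* Whole-VPID invalidation: global entries of that VPID are dropped too. */
size_t flush_whole_vpid(struct tlb_entry *tlb, size_t n, unsigned vpid)
{
    size_t dropped = 0;

    for (size_t i = 0; i < n; i++) {
        if (tlb[i].valid && tlb[i].vpid == vpid) {
            tlb[i].valid = false;
            dropped++;
        }
    }
    return dropped;
}
```

In this model the second function always drops a superset of what the first
one drops, so it stays architecturally safe; the cost is that global
entries have to be refilled after every CR3 write.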

Jan


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Question about VPID during MOV-TO-CR3
  2016-09-22 10:39                   ` Tamas K Lengyel
@ 2016-09-22 11:35                     ` Jan Beulich
  0 siblings, 0 replies; 34+ messages in thread
From: Jan Beulich @ 2016-09-22 11:35 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: xen-devel

>>> On 22.09.16 at 12:39, <tamas.lengyel@zentific.com> wrote:
> On Sep 22, 2016 03:00, "Jan Beulich" <JBeulich@suse.com> wrote:
>>
>> >>> On 21.09.16 at 20:26, <tamas.lengyel@zentific.com> wrote:
>> > So reading through the Intel SDM the following bits are relevant here:
>> >
>> > 28.3.3.1
>> > Operations that Invalidate Cached Mappings
>> > The following operations invalidate cached mappings as indicated:
>> > ● Operations that architecturally invalidate entries in the TLBs or
>> > paging-structure caches independent of VMX
>> > operation (e.g., the INVLPG and INVPCID instructions) invalidate
>> > linear mappings and combined mappings. 1
>> > They are required to do so only for the current VPID (but, for
>> > combined mappings, all EP4TAs). Linear
>> > mappings for the current VPID are invalidated even if EPT is in use. 2
>> > Combined mappings for the current
>> > VPID are invalidated even if EPT is not in use.
>> >
>> > To me this reads that the CPU will automatically handle the TLB
>> > flushing for all operations that would normally do so when running
>> > without a hypervisor, but only within the context of the VPID. While
>> > it doesn't list MOV-TO-CR3 specifically, I'm sure it falls into this
>> > same category regarding non-global TLB entries that would be flushed
>> > by it. Thus, there is no need for the VMM to step in do anything in
>> > this regard if my interpretation is correct.
>>
>> Well, that would be true if a CR3 write intercept meant the CPU
>> first does its job, and only then invokes the hypervisor. Such
>> intercepts, however, get invoked before the CPU starts doing
>> anything the instruction would require to be done (except for
>> a few exception checks, like CPL). Hence the hypervisor has to
>> do everything the CPU would normally do on its own.
> 
> Has that been verified though? The SDM doesn't mention that cpu-based load
> exiting would alter the TLB operations the CPU would otherwise perform. So
> while I could see this actually being the case I can't find anything
> officially saying this.

Well, it is the purpose of all VM exits to let the VMM customize
behavior instead of letting the CPU do its default operations.
See AMD's PM Vol 2 "Instruction Intercepts" section and Intel's
SDM Vol 3 "Instructions that cause VM exits" section.

Jan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Question about VPID during MOV-TO-CR3
  2016-09-22 11:27                   ` Jan Beulich
@ 2016-09-22 11:37                     ` Tamas K Lengyel
  2016-09-22 17:18                       ` Tamas K Lengyel
  0 siblings, 1 reply; 34+ messages in thread
From: Tamas K Lengyel @ 2016-09-22 11:37 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel


On Sep 22, 2016 05:27, "Jan Beulich" <JBeulich@suse.com> wrote:
>
> >>> On 22.09.16 at 12:35, <tamas.lengyel@zentific.com> wrote:
> > On Sep 22, 2016 02:56, "Jan Beulich" <JBeulich@suse.com> wrote:
> >>
> >> >>> On 21.09.16 at 17:30, <tamas.lengyel@zentific.com> wrote:
> >> > What I'm saying is that the guest OS should be in charge of managing
> >> > its own TLB when VPID is in use. Whether it does flush the TLB or not
> >> > is not of our concern. If it's a sane OS it will likely flush when it
> >> > needs to, but we should not be jumping in and doing it as we do right
> >> > now. We are actually breaking the architectural behavior by forcing a
> >> > flush, MOV-TO-CR3 doesn't by itself flush the TLB on real hardware.
> >>
> >> I continue to not understand where you take this from. Writes to
> >> CR3 have always been doing TLB flushes - full ones prior to the
> >> introduction of global pages, and flushes of only non-global entries
> >> nowadays. In fact prior to the introduction of INVLPG and CR4
> >> there was no other way to flush TLBs.
> >
> > Yes, I meant it doesn't completely flush the TLB as we do right now when
> > invalidating the whole VPID.
>
> But then what architectural behavior do you see broken? Flushing
> more than is required is always permitted. (And again - I'm all for
> improvements here, we just need to be careful to not remove
> flushing that is architecturally required.)
>

Global pages and PCID are both effectively disabled by this flush. And yes,
flushing more than the minimum necessary is permitted, but this seems
rather excessive. It won't break (sane) applications but would slow things
down for ones that optimize TLB usage. I'll do an experiment to check your
hypothesis that the CPU performs no TLB flush when CR3-load exiting is
enabled. If that is the case and we don't flush in Xen, it should be rather
easy to break applications that use the same virtual address.
Will report back on the results.

Tamas

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Question about VPID during MOV-TO-CR3
  2016-09-22 11:37                     ` Tamas K Lengyel
@ 2016-09-22 17:18                       ` Tamas K Lengyel
  2016-09-23  8:24                         ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Tamas K Lengyel @ 2016-09-22 17:18 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Thu, Sep 22, 2016 at 5:37 AM, Tamas K Lengyel
<tamas.lengyel@zentific.com> wrote:
> On Sep 22, 2016 05:27, "Jan Beulich" <JBeulich@suse.com> wrote:
>>
>> >>> On 22.09.16 at 12:35, <tamas.lengyel@zentific.com> wrote:
>> > On Sep 22, 2016 02:56, "Jan Beulich" <JBeulich@suse.com> wrote:
>> >>
>> >> >>> On 21.09.16 at 17:30, <tamas.lengyel@zentific.com> wrote:
>> >> > What I'm saying is that the guest OS should be in charge of managing
>> >> > its own TLB when VPID is in use. Whether it does flush the TLB or not
>> >> > is not of our concern. If it's a sane OS it will likely flush when it
>> >> > needs to, but we should not be jumping in and doing it as we do right
>> >> > now. We are actually breaking the architectural behavior by forcing a
>> >> > flush, MOV-TO-CR3 doesn't by itself flush the TLB on real hardware.
>> >>
>> >> I continue to not understand where you take this from. Writes to
>> >> CR3 have always been doing TLB flushes - full ones prior to the
>> >> introduction of global pages, and flushes of only non-global entries
>> >> nowadays. In fact prior to the introduction of INVLPG and CR4
>> >> there was no other way to flush TLBs.
>> >
>> > Yes, I meant it doesn't completely flush the TLB as we do right now when
>> > invalidating the whole VPID.
>>
>> But then what architectural behavior do you see broken? Flushing
>> more than is required is always permitted. (And again - I'm all for
>> improvements here, we just need to be careful to not remove
>> flushing that is architecturally required.)
>>
>
> Global pages and PCID both are effectively disabled by this flush. And yes
> flushing more then the minimum necessary is permitted, but this seems rather
> excessive. It won't break (sane) applications but would slow things down for
> ones that optimize TLB usage. I'll do an experiment to check your hypothesis
> about no TLB flush being performed by the CPU if cpu-based load exiting is
> enabled. Should be rather easy to break applications that use the same
> virtual address if this is the case and we don't flush in Xen. Will report
> back on the results.
>

So I verified that when CPU-based load exiting is enabled, the TLB
flush here is critical. Without it the guest kernel crashes at random
points during boot. OTOH why does Xen trap every guest CR3 update
unconditionally? While we have features such as the vm_event/monitor
that may choose to subscribe to that event, Xen traps it even when
that is not in use. Is that trapping necessary for something else?

Tamas

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Question about VPID during MOV-TO-CR3
  2016-09-22 17:18                       ` Tamas K Lengyel
@ 2016-09-23  8:24                         ` Jan Beulich
  2016-09-23  8:35                           ` Razvan Cojocaru
  2016-09-23 15:26                           ` Tamas K Lengyel
  0 siblings, 2 replies; 34+ messages in thread
From: Jan Beulich @ 2016-09-23  8:24 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: xen-devel

>>> On 22.09.16 at 19:18, <tamas.lengyel@zentific.com> wrote:
> So I verified that when CPU-based load exiting is enabled, the TLB
> flush here is critical. Without it the guest kernel crashes at random
> points during boot. OTOH why does Xen trap every guest CR3 update
> unconditionally? While we have features such as the vm_event/monitor
> that may choose to subscribe to that event, Xen traps it even when
> that is not in use. Is that trapping necessary for something else?

Where do you see this being unconditional? construct_vmcs()
clearly avoids setting these intercepts when using EPT. Are you
perhaps suffering from

            /* Trap CR3 updates if CR3 memory events are enabled. */
            if ( v->domain->arch.monitor.write_ctrlreg_enabled &
                 monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) )
                v->arch.hvm_vmx.exec_control |= CPU_BASED_CR3_LOAD_EXITING;

in vmx_update_guest_cr()? That'll be rather something for you
or Razvan to explain. Outside of nested VMX I don't see any
other enabling of that intercept (didn't check AMD code on the
assumption that you're working on Intel hardware).
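
For readers without the tree at hand, the check quoted above boils down to
a single bit test. The sketch below reuses the names from the snippet, but
the numeric values and the helper apply_cr3_monitor() are assumptions made
for illustration; the Xen headers are authoritative.

```c
#include <stdint.h>

/* Assumed values, for illustration only. */
#define VM_EVENT_X86_CR3             1u
#define CPU_BASED_CR3_LOAD_EXITING   0x00008000u  /* bit 15, primary controls */

/* One bit per monitored control register. */
#define monitor_ctrlreg_bitmask(idx) (1u << (idx))

/*
 * Hypothetical helper: returns exec_control with the CR3-load intercept
 * turned on if, and only if, a monitor subscribed to CR3 writes.
 */
uint32_t apply_cr3_monitor(uint32_t exec_control,
                           uint32_t write_ctrlreg_enabled)
{
    if (write_ctrlreg_enabled & monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3))
        exec_control |= CPU_BASED_CR3_LOAD_EXITING;

    return exec_control;
}
```

With no subscriber the control word is returned unchanged, which is the
sense in which this particular intercept is conditional.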

Jan


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Question about VPID during MOV-TO-CR3
  2016-09-23  8:24                         ` Jan Beulich
@ 2016-09-23  8:35                           ` Razvan Cojocaru
  2016-09-23 15:26                           ` Tamas K Lengyel
  1 sibling, 0 replies; 34+ messages in thread
From: Razvan Cojocaru @ 2016-09-23  8:35 UTC (permalink / raw)
  To: Jan Beulich, Tamas K Lengyel; +Cc: xen-devel

On 09/23/16 11:24, Jan Beulich wrote:
>>>> On 22.09.16 at 19:18, <tamas.lengyel@zentific.com> wrote:
>> So I verified that when CPU-based load exiting is enabled, the TLB
>> flush here is critical. Without it the guest kernel crashes at random
>> points during boot. OTOH why does Xen trap every guest CR3 update
>> unconditionally? While we have features such as the vm_event/monitor
>> that may choose to subscribe to that event, Xen traps it even when
>> that is not in use. Is that trapping necessary for something else?
> 
> Where do you see this being unconditional? construct_vmcs()
> clearly avoids setting these intercepts when using EPT. Are you
> perhaps suffering from
> 
>             /* Trap CR3 updates if CR3 memory events are enabled. */
>             if ( v->domain->arch.monitor.write_ctrlreg_enabled &
>                  monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) )
>                 v->arch.hvm_vmx.exec_control |= CPU_BASED_CR3_LOAD_EXITING;
> 
> in vmx_update_guest_cr()? That'll be rather something for you
> or Razvan to explain. Outside of nested VMX I don't see any
> other enabling of that intercept (didn't check AMD code on the
> assumption that you're working on Intel hardware).

I did touch that line, but that was mostly a mechanical change in commit
712bdd01:

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 74f563f..af257db 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -57,6 +57,7 @@
 #include <asm/apic.h>
 #include <asm/hvm/nestedhvm.h>
 #include <asm/event.h>
+#include <asm/monitor.h>
 #include <public/arch-x86/cpuid.h>

 static bool_t __initdata opt_force_ept;
@@ -1262,7 +1263,8 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
                 v->arch.hvm_vmx.exec_control |= cr3_ctls;

             /* Trap CR3 updates if CR3 memory events are enabled. */
-            if ( v->domain->arch.monitor.mov_to_cr3_enabled )
+            if ( v->domain->arch.monitor.write_ctrlreg_enabled &
+                 monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) )
                 v->arch.hvm_vmx.exec_control |= CPU_BASED_CR3_LOAD_EXITING;

             vmx_update_cpu_exec_control(v);
@@ -2010,7 +2012,7 @@ static int vmx_cr_access(unsigned long exit_qualification)
         unsigned long old = curr->arch.hvm_vcpu.guest_cr[0];
         curr->arch.hvm_vcpu.guest_cr[0] &= ~X86_CR0_TS;
         vmx_update_guest_cr(curr, 0);
-        hvm_event_cr0(curr->arch.hvm_vcpu.guest_cr[0], old);
+        hvm_event_crX(CR0, curr->arch.hvm_vcpu.guest_cr[0], old);
         HVMTRACE_0D(CLTS);
         break;
     }

The basic logic has remained untouched. It was originally added in
commit df402bb9, by Joe Epstein. It's of course open to debate.


Thanks,
Razvan

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: Question about VPID during MOV-TO-CR3
  2016-09-23  8:24                         ` Jan Beulich
  2016-09-23  8:35                           ` Razvan Cojocaru
@ 2016-09-23 15:26                           ` Tamas K Lengyel
  2016-09-23 15:39                             ` Jan Beulich
  1 sibling, 1 reply; 34+ messages in thread
From: Tamas K Lengyel @ 2016-09-23 15:26 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Fri, Sep 23, 2016 at 2:24 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 22.09.16 at 19:18, <tamas.lengyel@zentific.com> wrote:
>> So I verified that when CPU-based load exiting is enabled, the TLB
>> flush here is critical. Without it the guest kernel crashes at random
>> points during boot. OTOH why does Xen trap every guest CR3 update
>> unconditionally? While we have features such as the vm_event/monitor
>> that may choose to subscribe to that event, Xen traps it even when
>> that is not in use. Is that trapping necessary for something else?
>
> Where do you see this being unconditional? construct_vmcs()
> clearly avoids setting these intercepts when using EPT. Are you
> perhaps suffering from
>
>             /* Trap CR3 updates if CR3 memory events are enabled. */
>             if ( v->domain->arch.monitor.write_ctrlreg_enabled &
>                  monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) )
>                 v->arch.hvm_vmx.exec_control |= CPU_BASED_CR3_LOAD_EXITING;
>
> in vmx_update_guest_cr()? That'll be rather something for you
> or Razvan to explain. Outside of nested VMX I don't see any
> other enabling of that intercept (didn't check AMD code on the
> assumption that you're working on Intel hardware).

So there seem to be two separate paths that lead to the TLB flushing.
One is indeed the above case you cited when we enable CR3 monitoring
through the monitor interface. However, during domain boot I also see
this path being called that is not related to the
CPU_BASED_CR3_LOAD_EXITING:

(XEN) hap.c:739:d1v0 hap_update_paging_modes is calling hap_update_cr3
(XEN) hap.c:701:d1v0 HAP update cr3 called
(XEN) /src/xen/xen/include/asm/hvm/hvm.h:344:d1v0 HVM update guest cr3 called
(XEN) vmx.c:1549:d1v0 Update guest CR3 value=0x7a7c4000

This path seems to de-activate once the domain is fully booted. So at
this point I'm still not sure whether CPU-based CR3 load exiting needs the
flush or not: I couldn't get the domain to boot with the flush simply
removed, since this other path does seem to require it. I'll do an
experiment with the TLB flush happening only if the monitor interface
for this is not enabled and see what happens.

Tamas

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Question about VPID during MOV-TO-CR3
  2016-09-23 15:26                           ` Tamas K Lengyel
@ 2016-09-23 15:39                             ` Jan Beulich
  2016-09-23 15:50                               ` Tamas K Lengyel
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2016-09-23 15:39 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: xen-devel

>>> On 23.09.16 at 17:26, <tamas.lengyel@zentific.com> wrote:
> On Fri, Sep 23, 2016 at 2:24 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 22.09.16 at 19:18, <tamas.lengyel@zentific.com> wrote:
>>> So I verified that when CPU-based load exiting is enabled, the TLB
>>> flush here is critical. Without it the guest kernel crashes at random
>>> points during boot. OTOH why does Xen trap every guest CR3 update
>>> unconditionally? While we have features such as the vm_event/monitor
>>> that may choose to subscribe to that event, Xen traps it even when
>>> that is not in use. Is that trapping necessary for something else?
>>
>> Where do you see this being unconditional? construct_vmcs()
>> clearly avoids setting these intercepts when using EPT. Are you
>> perhaps suffering from
>>
>>             /* Trap CR3 updates if CR3 memory events are enabled. */
>>             if ( v->domain->arch.monitor.write_ctrlreg_enabled &
>>                  monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) )
>>                 v->arch.hvm_vmx.exec_control |= CPU_BASED_CR3_LOAD_EXITING;
>>
>> in vmx_update_guest_cr()? That'll be rather something for you
>> or Razvan to explain. Outside of nested VMX I don't see any
>> other enabling of that intercept (didn't check AMD code on the
>> assumption that you're working on Intel hardware).
> 
> So there seems to be two separate paths that lead to the TLB flushing.
> One is indeed the above case you cited when we enable CR3 monitoring
> through the monitor interface. However, during domain boot I also see
> this path being called that is not related to the
> CPU_BASED_CR3_LOAD_EXITING:
> 
> (XEN) hap.c:739:d1v0 hap_update_paging_modes is calling hap_update_cr3
> (XEN) hap.c:701:d1v0 HAP update cr3 called
> (XEN) /src/xen/xen/include/asm/hvm/hvm.h:344:d1v0 HVM update guest cr3 called
> (XEN) vmx.c:1549:d1v0 Update guest CR3 value=0x7a7c4000
> 
> This path seems to de-activate once the domain is fully booted.

This late? According to the CR0 handling in
vmx_update_guest_cr() I would understand it to be enabled only
while the guest is still in real mode (and even then only on old
hardware, i.e. without the Unrestricted Guest functionality).

Jan


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Question about VPID during MOV-TO-CR3
  2016-09-23 15:39                             ` Jan Beulich
@ 2016-09-23 15:50                               ` Tamas K Lengyel
  2016-09-23 20:45                                 ` Tamas K Lengyel
  0 siblings, 1 reply; 34+ messages in thread
From: Tamas K Lengyel @ 2016-09-23 15:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Fri, Sep 23, 2016 at 9:39 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 23.09.16 at 17:26, <tamas.lengyel@zentific.com> wrote:
>> On Fri, Sep 23, 2016 at 2:24 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 22.09.16 at 19:18, <tamas.lengyel@zentific.com> wrote:
>>>> So I verified that when CPU-based load exiting is enabled, the TLB
>>>> flush here is critical. Without it the guest kernel crashes at random
>>>> points during boot. OTOH why does Xen trap every guest CR3 update
>>>> unconditionally? While we have features such as the vm_event/monitor
>>>> that may choose to subscribe to that event, Xen traps it even when
>>>> that is not in use. Is that trapping necessary for something else?
>>>
>>> Where do you see this being unconditional? construct_vmcs()
>>> clearly avoids setting these intercepts when using EPT. Are you
>>> perhaps suffering from
>>>
>>>             /* Trap CR3 updates if CR3 memory events are enabled. */
>>>             if ( v->domain->arch.monitor.write_ctrlreg_enabled &
>>>                  monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) )
>>>                 v->arch.hvm_vmx.exec_control |= CPU_BASED_CR3_LOAD_EXITING;
>>>
>>> in vmx_update_guest_cr()? That'll be rather something for you
>>> or Razvan to explain. Outside of nested VMX I don't see any
>>> other enabling of that intercept (didn't check AMD code on the
>>> assumption that you're working on Intel hardware).
>>
>> So there seems to be two separate paths that lead to the TLB flushing.
>> One is indeed the above case you cited when we enable CR3 monitoring
>> through the monitor interface. However, during domain boot I also see
>> this path being called that is not related to the
>> CPU_BASED_CR3_LOAD_EXITING:
>>
>> (XEN) hap.c:739:d1v0 hap_update_paging_modes is calling hap_update_cr3
>> (XEN) hap.c:701:d1v0 HAP update cr3 called
>> (XEN) /src/xen/xen/include/asm/hvm/hvm.h:344:d1v0 HVM update guest cr3 called
>> (XEN) vmx.c:1549:d1v0 Update guest CR3 value=0x7a7c4000
>>
>> This path seems to de-activate once the domain is fully booted.
>
> This late? According to the CR0 handling in
> vmx_update_guest_cr() I would understand it to be enabled only
> while the guest is still in real mode (and even then only on old
> hardware, i.e. without the Unrestricted Guest functionality).
>

Right, with unrestricted guest support I would assume none of this
would get called - but it does, and quite frequently during domain
boot. The CPU is an Intel(R) Xeon(R) CPU E5-2430.

Tamas

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Question about VPID during MOV-TO-CR3
  2016-09-23 15:50                               ` Tamas K Lengyel
@ 2016-09-23 20:45                                 ` Tamas K Lengyel
  2016-09-26  6:24                                   ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Tamas K Lengyel @ 2016-09-23 20:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Fri, Sep 23, 2016 at 9:50 AM, Tamas K Lengyel
<tamas.lengyel@zentific.com> wrote:
> On Fri, Sep 23, 2016 at 9:39 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 23.09.16 at 17:26, <tamas.lengyel@zentific.com> wrote:
>>> On Fri, Sep 23, 2016 at 2:24 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>> On 22.09.16 at 19:18, <tamas.lengyel@zentific.com> wrote:
>>>>> So I verified that when CPU-based load exiting is enabled, the TLB
>>>>> flush here is critical. Without it the guest kernel crashes at random
>>>>> points during boot. OTOH why does Xen trap every guest CR3 update
>>>>> unconditionally? While we have features such as the vm_event/monitor
>>>>> that may choose to subscribe to that event, Xen traps it even when
>>>>> that is not in use. Is that trapping necessary for something else?
>>>>
>>>> Where do you see this being unconditional? construct_vmcs()
>>>> clearly avoids setting these intercepts when using EPT. Are you
>>>> perhaps suffering from
>>>>
>>>>             /* Trap CR3 updates if CR3 memory events are enabled. */
>>>>             if ( v->domain->arch.monitor.write_ctrlreg_enabled &
>>>>                  monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) )
>>>>                 v->arch.hvm_vmx.exec_control |= CPU_BASED_CR3_LOAD_EXITING;
>>>>
>>>> in vmx_update_guest_cr()? That'll be rather something for you
>>>> or Razvan to explain. Outside of nested VMX I don't see any
>>>> other enabling of that intercept (didn't check AMD code on the
>>>> assumption that you're working on Intel hardware).
>>>
>>> So there seems to be two separate paths that lead to the TLB flushing.
>>> One is indeed the above case you cited when we enable CR3 monitoring
>>> through the monitor interface. However, during domain boot I also see
>>> this path being called that is not related to the
>>> CPU_BASED_CR3_LOAD_EXITING:
>>>
>>> (XEN) hap.c:739:d1v0 hap_update_paging_modes is calling hap_update_cr3
>>> (XEN) hap.c:701:d1v0 HAP update cr3 called
>>> (XEN) /src/xen/xen/include/asm/hvm/hvm.h:344:d1v0 HVM update guest cr3 called
>>> (XEN) vmx.c:1549:d1v0 Update guest CR3 value=0x7a7c4000
>>>
>>> This path seems to de-activate once the domain is fully booted.
>>
>> This late? According to the CR0 handling in
>> vmx_update_guest_cr() I would understand it to be enabled only
>> while the guest is still in real mode (and even then only on old
>> hardware, i.e. without the Unrestricted Guest functionality).
>>
>
> Right, with unrestricted guest support I would assume none of this
> would get called - but it does, and quite frequently during domain
> boot. The CPU is a Intel(R) Xeon(R) CPU E5-2430.
>

So I experimented with selectively disabling the flushing such that
it's done only when coming from a path other than CPU-based CR3 load
exiting. I've added a bool to struct vcpu that gets set to 0 every
time vmx_vmexit_handler is called, and only gets set to 1 when
vmx_cr_access reports a MOV-TO-CR3. Then in vmx_update_guest_cr
the flush only happens as follows:

        if ( !v->movtocr3 )
            hvm_asid_flush_vcpu(v);
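
The flag's lifetime is easier to see in isolation. What follows is a
stand-alone model of the experiment, not the actual patch: struct
vcpu_model and both functions are hypothetical stand-ins, and the counter
merely records where hvm_asid_flush_vcpu() would have been reached.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of struct vcpu. */
struct vcpu_model {
    bool movtocr3;      /* true only while handling a MOV-TO-CR3 exit */
    unsigned flushes;   /* times the ASID/VPID flush would have run */
};

/* Models the CR3 branch of vmx_update_guest_cr() with the extra check. */
void update_guest_cr3_model(struct vcpu_model *v)
{
    if (!v->movtocr3)
        v->flushes++;
}

/*
 * Models one VM exit. The flag is cleared on entry to the handler and set
 * only for a MOV-TO-CR3 exit, so any other route into the CR3 update
 * (for example a paging mode change) still flushes.
 */
void vmexit_model(struct vcpu_model *v, bool is_mov_to_cr3, bool updates_cr3)
{
    v->movtocr3 = false;

    if (is_mov_to_cr3)
        v->movtocr3 = true;

    if (updates_cr3)
        update_guest_cr3_model(v);
}
```

Clearing the flag at the top of every exit is what keeps it from leaking
into a later, unrelated CR3 update.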

In the guest I run a test application that allocates a page at a fixed
VA, writes a magic value to it, and then keeps spinning on reading the
magic value back from the page, checking if it's the same as
originally supplied. I launch this application twice with different
magic values, so that if the missing TLB invalidation is an issue, one
of the test applications would read back the wrong magic value from the
VA using a stale TLB entry. I've verified that the same VA in the two
applications points to different pages, that those PTEs are not
marked global, and that no PCID is used.

[  724] test (struct addr:ffff88003730f330). PGD: 0x3731f000
VADDR 0x5000000 -> PADDR 0x73e35000. Global page: 0
[  727] test (struct addr:ffff88003681ea20). PGD: 0x777a6000
VADDR 0x5000000 -> PADDR 0x75043000. Global page: 0
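
A guest-side probe of the kind described above could look roughly like
this. It is a sketch under assumptions, not the program that was actually
run: probe_fixed_va() is a hypothetical name, the loop is bounded rather
than endless, and it relies on Linux mmap() with MAP_FIXED, which replaces
any mapping already present at that address.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Maps one anonymous page at a fixed virtual address, stores a magic
 * value and reads it back repeatedly. Returns the number of reads that
 * did not match, or -1 if the page could not be mapped.
 */
long probe_fixed_va(uintptr_t va, uint64_t magic, unsigned long iterations)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    void *p = mmap((void *)va, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* volatile forces a real load from the page on every iteration */
    volatile uint64_t *slot = p;
    *slot = magic;

    long mismatches = 0;
    for (unsigned long i = 0; i < iterations; i++) {
        if (*slot != magic)
            mismatches++;
    }

    munmap(p, (size_t)page);
    return mismatches;
}
```

Two processes would each call it with the same address (0x5000000 in the
output above) but different magic values; a nonzero result in either one
would point at a stale translation being used.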

Both applications work as expected without the VPID flushing taking
place. So at least for CPU-based CR3 load exiting it seems that this
flush is not necessary. As for why this path gets called during domain
boot when the CPU supports Unrestricted Guest mode and Xen properly
detects it at boot, I'm not sure. However, as we use CPU-based
CR3 load exiting quite often when doing VMI, I would prefer to disable
this flushing at least for this case. Any thoughts?

Tamas

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Question about VPID during MOV-TO-CR3
  2016-09-23 20:45                                 ` Tamas K Lengyel
@ 2016-09-26  6:24                                   ` Jan Beulich
  2016-09-26 16:12                                     ` Tamas K Lengyel
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2016-09-26  6:24 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: xen-devel

>>> On 23.09.16 at 22:45, <tamas.lengyel@zentific.com> wrote:
> On Fri, Sep 23, 2016 at 9:50 AM, Tamas K Lengyel
> <tamas.lengyel@zentific.com> wrote:
>> On Fri, Sep 23, 2016 at 9:39 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 23.09.16 at 17:26, <tamas.lengyel@zentific.com> wrote:
>>>> On Fri, Sep 23, 2016 at 2:24 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>> On 22.09.16 at 19:18, <tamas.lengyel@zentific.com> wrote:
>>>>>> So I verified that when CPU-based load exiting is enabled, the TLB
>>>>>> flush here is critical. Without it the guest kernel crashes at random
>>>>>> points during boot. OTOH why does Xen trap every guest CR3 update
>>>>>> unconditionally? While we have features such as the vm_event/monitor
>>>>>> that may choose to subscribe to that event, Xen traps it even when
>>>>>> that is not in use. Is that trapping necessary for something else?
>>>>>
>>>>> Where do you see this being unconditional? construct_vmcs()
>>>>> clearly avoids setting these intercepts when using EPT. Are you
>>>>> perhaps suffering from
>>>>>
>>>>>             /* Trap CR3 updates if CR3 memory events are enabled. */
>>>>>             if ( v->domain->arch.monitor.write_ctrlreg_enabled &
>>>>>                  monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) )
>>>>>                 v->arch.hvm_vmx.exec_control |= CPU_BASED_CR3_LOAD_EXITING;
>>>>>
>>>>> in vmx_update_guest_cr()? That'll be rather something for you
>>>>> or Razvan to explain. Outside of nested VMX I don't see any
>>>>> other enabling of that intercept (didn't check AMD code on the
>>>>> assumption that you're working on Intel hardware).
>>>>
>>>> So there seems to be two separate paths that lead to the TLB flushing.
>>>> One is indeed the above case you cited when we enable CR3 monitoring
>>>> through the monitor interface. However, during domain boot I also see
>>>> this path being called that is not related to the
>>>> CPU_BASED_CR3_LOAD_EXITING:
>>>>
>>>> (XEN) hap.c:739:d1v0 hap_update_paging_modes is calling hap_update_cr3
>>>> (XEN) hap.c:701:d1v0 HAP update cr3 called
>>>> (XEN) /src/xen/xen/include/asm/hvm/hvm.h:344:d1v0 HVM update guest cr3 called
>>>> (XEN) vmx.c:1549:d1v0 Update guest CR3 value=0x7a7c4000
>>>>
>>>> This path seems to de-activate once the domain is fully booted.
>>>
>>> This late? According to the CR0 handling in
>>> vmx_update_guest_cr() I would understand it to be enabled only
>>> while the guest is still in real mode (and even then only on old
>>> hardware, i.e. without the Unrestricted Guest functionality).
>>>
>>
>> Right, with unrestricted guest support I would assume none of this
>> would get called - but it does, and quite frequently during domain
>> boot. The CPU is an Intel(R) Xeon(R) CPU E5-2430.
>>
> 
> So I experimented with selectively disabling the flushing such that
> it's done only when coming from a path other than CPU-based CR3 load
> exiting. I've added a bool to struct vcpu that gets set to 0 every
> time vmx_vmexit_handler is called, and only gets set to 1 when
> vmx_cr_access reports a MOV-TO-CR3. Then in the vmx_update_guest_cr
> the flush only happens as such:
> 
>         if ( !v->movtocr3 )
>             hvm_asid_flush_vcpu(v);
> 
> In the guest I run a test application that allocates a page at a fixed
> VA, writes a magic value to it, and then keeps spinning on reading the
> magic value back from the page, checking if it's the same as
> originally supplied. I launch this application twice with different
> magic values, so that if the TLB invalidation is an issue one of the
> test applications would read back the wrong magic value from the VA
> using a stale TLB entry. I've verified that same VA in the two
> applications point to different pages and that those PTEs are not
> marked global and no PCID is used.
> 
> [  724] test (struct addr:ffff88003730f330). PGD: 0x3731f000
> VADDR 0x5000000 -> PADDR 0x73e35000. Global page: 0
> [  727] test (struct addr:ffff88003681ea20). PGD: 0x777a6000
> VADDR 0x5000000 -> PADDR 0x75043000. Global page: 0

I'm surprised. As said before - a mov-to-CR3 cannot be emulated
without a minimal amount of flushing. No experiments whatsoever
are suitable to prove the contrary.

> Both applications work as expected without the VPID flushing taking
> place. So at least for CPU-based CR3 load exiting it seems that this
> flush is not necessary. As for why this path gets called during domain
> boot when the CPU supports Unrestricted Guest mode and it is properly
> detected when Xen boots, I'm not sure. However, as we use CPU-based
> CR3 load exiting quite often when doing VMI, I would prefer to disable
> this flushing at least for this case. Any thoughts?

As said before - you'd better direct this question to the VMX
maintainers, and even better would be to first understand why
the intercept remains enabled in the first place. After all it's
quite obvious that most improvement can be expected from not
enabling it at all, whenever possible. Only if it needs to stay
enabled over extended periods of a guest's lifetime would it then
become interesting to see whether the emulation path can be
improved.

Jan

* Re: Question about VPID during MOV-TO-CR3
  2016-09-26  6:24                                   ` Jan Beulich
@ 2016-09-26 16:12                                     ` Tamas K Lengyel
  2016-09-27 13:49                                       ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Tamas K Lengyel @ 2016-09-26 16:12 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Tian, Kevin, Jun Nakajima

On Mon, Sep 26, 2016 at 12:24 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 23.09.16 at 22:45, <tamas.lengyel@zentific.com> wrote:
>> On Fri, Sep 23, 2016 at 9:50 AM, Tamas K Lengyel
>> <tamas.lengyel@zentific.com> wrote:
>>> On Fri, Sep 23, 2016 at 9:39 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>> On 23.09.16 at 17:26, <tamas.lengyel@zentific.com> wrote:
>>>>> On Fri, Sep 23, 2016 at 2:24 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>>> On 22.09.16 at 19:18, <tamas.lengyel@zentific.com> wrote:
>>>>>>> So I verified that when CPU-based load exiting is enabled, the TLB
>>>>>>> flush here is critical. Without it the guest kernel crashes at random
>>>>>>> points during boot. OTOH why does Xen trap every guest CR3 update
>>>>>>> unconditionally? While we have features such as the vm_event/monitor
>>>>>>> that may choose to subscribe to that event, Xen traps it even when
>>>>>>> that is not in use. Is that trapping necessary for something else?
>>>>>>
>>>>>> Where do you see this being unconditional? construct_vmcs()
>>>>>> clearly avoids setting these intercepts when using EPT. Are you
>>>>>> perhaps suffering from
>>>>>>
>>>>>>             /* Trap CR3 updates if CR3 memory events are enabled. */
>>>>>>             if ( v->domain->arch.monitor.write_ctrlreg_enabled &
>>>>>>                  monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) )
>>>>>>                 v->arch.hvm_vmx.exec_control |= CPU_BASED_CR3_LOAD_EXITING;
>>>>>>
>>>>>> in vmx_update_guest_cr()? That'll be rather something for you
>>>>>> or Razvan to explain. Outside of nested VMX I don't see any
>>>>>> other enabling of that intercept (didn't check AMD code on the
>>>>>> assumption that you're working on Intel hardware).
>>>>>
>>>>> So there seems to be two separate paths that lead to the TLB flushing.
>>>>> One is indeed the above case you cited when we enable CR3 monitoring
>>>>> through the monitor interface. However, during domain boot I also see
>>>>> this path being called that is not related to the
>>>>> CPU_BASED_CR3_LOAD_EXITING:
>>>>>
>>>>> (XEN) hap.c:739:d1v0 hap_update_paging_modes is calling hap_update_cr3
>>>>> (XEN) hap.c:701:d1v0 HAP update cr3 called
>>>>> (XEN) /src/xen/xen/include/asm/hvm/hvm.h:344:d1v0 HVM update guest cr3 called
>>>>> (XEN) vmx.c:1549:d1v0 Update guest CR3 value=0x7a7c4000
>>>>>
>>>>> This path seems to de-activate once the domain is fully booted.
>>>>
>>>> This late? According to the CR0 handling in
>>>> vmx_update_guest_cr() I would understand it to be enabled only
>>>> while the guest is still in real mode (and even then only on old
>>>> hardware, i.e. without the Unrestricted Guest functionality).
>>>>
>>>
>>> Right, with unrestricted guest support I would assume none of this
>>> would get called - but it does, and quite frequently during domain
>>> boot. The CPU is an Intel(R) Xeon(R) CPU E5-2430.
>>>
>>
>> So I experimented with selectively disabling the flushing such that
>> it's done only when coming from a path other than CPU-based CR3 load
>> exiting. I've added a bool to struct vcpu that gets set to 0 every
>> time vmx_vmexit_handler is called, and only gets set to 1 when
>> vmx_cr_access reports a MOV-TO-CR3. Then in the vmx_update_guest_cr
>> the flush only happens as such:
>>
>>         if ( !v->movtocr3 )
>>             hvm_asid_flush_vcpu(v);
>>
>> In the guest I run a test application that allocates a page at a fixed
>> VA, writes a magic value to it, and then keeps spinning on reading the
>> magic value back from the page, checking if it's the same as
>> originally supplied. I launch this application twice with different
>> magic values, so that if the TLB invalidation is an issue one of the
>> test applications would read back the wrong magic value from the VA
>> using a stale TLB entry. I've verified that same VA in the two
>> applications point to different pages and that those PTEs are not
>> marked global and no PCID is used.
>>
>> [  724] test (struct addr:ffff88003730f330). PGD: 0x3731f000
>> VADDR 0x5000000 -> PADDR 0x73e35000. Global page: 0
>> [  727] test (struct addr:ffff88003681ea20). PGD: 0x777a6000
>> VADDR 0x5000000 -> PADDR 0x75043000. Global page: 0
>
> I'm surprised. As said before - a mov-to-CR3 cannot be emulated
> without a minimal amount of flushing. No experiments whatsoever
> are suitable to prove the contrary.

That's a pretty strong statement - can you tell me where exactly in the
SDM it says that? I've gone through it a couple of times already
and I can't find anything that explicitly says that the flushing has
to be performed by the VMM when mov-to-CR3 trapping is enabled. The
closest thing I found was indicating the contrary. Furthermore, if the
flushing is necessary, then how would you explain that there were no
TLB mixups in the above experiment?

>
>> Both applications work as expected without the VPID flushing taking
>> place. So at least for CPU-based CR3 load exiting it seems that this
>> flush is not necessary. As for why this path gets called during domain
>> boot when the CPU supports Unrestricted Guest mode and it is properly
>> detected when Xen boots, I'm not sure. However, as we use CPU-based
>> CR3 load exiting quite often when doing VMI, I would prefer to disable
>> this flushing at least for this case. Any thoughts?
>
> As said before - you'd better direct this question to the VMX
> maintainers, and even better would be to first understand why
> the intercept remains enabled in the first place. After all it's
> quite obvious that most improvement can be expected from not
> enabling it at all, whenever possible. Only if it needs to stay
> enabled over extended periods of a guest's lifetime it would then
> become interesting to see whether the emulation path can be
> improved.
>

To clarify - mov-to-CR3 trapping is _not_ enabled by default on a
domain. I assumed it was the only path to vmx_update_guest_cr, but I
have now verified that vmx_cr_access does not get called for a
mov-to-CR3 when the domain boots; it only gets called when we enable
it through the monitor system. There is another path that leads to a
call to vmx_update_guest_cr for updating CR3 when the domain boots,
which seems to require this flushing to happen. That other path I
don't care about - although it's rather odd in itself as well. Now when the
mov-to-CR3 path gets activated the flushing does not seem to be
necessary as my experiment shows and it actually actively breaks
architectural features (global pages and PCID). When we do
introspection this trapping does get enabled and stays on for the
lifetime of the domain. So adding such a big and unnecessary
performance hit is very much undesirable.

I've CC-d the VMX maintainers to see what their perspective is on this.

Thanks,
Tamas

* Re: Question about VPID during MOV-TO-CR3
  2016-09-26 16:12                                     ` Tamas K Lengyel
@ 2016-09-27 13:49                                       ` Jan Beulich
  2016-10-01 19:05                                         ` Tamas K Lengyel
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2016-09-27 13:49 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: xen-devel, Kevin Tian, Jun Nakajima

>>> On 26.09.16 at 18:12, <tamas.lengyel@zentific.com> wrote:
> On Mon, Sep 26, 2016 at 12:24 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 23.09.16 at 22:45, <tamas.lengyel@zentific.com> wrote:
>>> On Fri, Sep 23, 2016 at 9:50 AM, Tamas K Lengyel
>>> <tamas.lengyel@zentific.com> wrote:
>>>> On Fri, Sep 23, 2016 at 9:39 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>> On 23.09.16 at 17:26, <tamas.lengyel@zentific.com> wrote:
>>>>>> On Fri, Sep 23, 2016 at 2:24 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>>>> On 22.09.16 at 19:18, <tamas.lengyel@zentific.com> wrote:
>>>>>>>> So I verified that when CPU-based load exiting is enabled, the TLB
>>>>>>>> flush here is critical. Without it the guest kernel crashes at random
>>>>>>>> points during boot. OTOH why does Xen trap every guest CR3 update
>>>>>>>> unconditionally? While we have features such as the vm_event/monitor
>>>>>>>> that may choose to subscribe to that event, Xen traps it even when
>>>>>>>> that is not in use. Is that trapping necessary for something else?
>>>>>>>
>>>>>>> Where do you see this being unconditional? construct_vmcs()
>>>>>>> clearly avoids setting these intercepts when using EPT. Are you
>>>>>>> perhaps suffering from
>>>>>>>
>>>>>>>             /* Trap CR3 updates if CR3 memory events are enabled. */
>>>>>>>             if ( v->domain->arch.monitor.write_ctrlreg_enabled &
>>>>>>>                  monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) )
>>>>>>>                 v->arch.hvm_vmx.exec_control |= CPU_BASED_CR3_LOAD_EXITING;
>>>>>>>
>>>>>>> in vmx_update_guest_cr()? That'll be rather something for you
>>>>>>> or Razvan to explain. Outside of nested VMX I don't see any
>>>>>>> other enabling of that intercept (didn't check AMD code on the
>>>>>>> assumption that you're working on Intel hardware).
>>>>>>
>>>>>> So there seems to be two separate paths that lead to the TLB flushing.
>>>>>> One is indeed the above case you cited when we enable CR3 monitoring
>>>>>> through the monitor interface. However, during domain boot I also see
>>>>>> this path being called that is not related to the
>>>>>> CPU_BASED_CR3_LOAD_EXITING:
>>>>>>
>>>>>> (XEN) hap.c:739:d1v0 hap_update_paging_modes is calling hap_update_cr3
>>>>>> (XEN) hap.c:701:d1v0 HAP update cr3 called
>>>>>> (XEN) /src/xen/xen/include/asm/hvm/hvm.h:344:d1v0 HVM update guest cr3 called
>>>>>> (XEN) vmx.c:1549:d1v0 Update guest CR3 value=0x7a7c4000
>>>>>>
>>>>>> This path seems to de-activate once the domain is fully booted.
>>>>>
>>>>> This late? According to the CR0 handling in
>>>>> vmx_update_guest_cr() I would understand it to be enabled only
>>>>> while the guest is still in real mode (and even then only on old
>>>>> hardware, i.e. without the Unrestricted Guest functionality).
>>>>>
>>>>
>>>> Right, with unrestricted guest support I would assume none of this
>>>> would get called - but it does, and quite frequently during domain
>>>> boot. The CPU is an Intel(R) Xeon(R) CPU E5-2430.
>>>>
>>>
>>> So I experimented with selectively disabling the flushing such that
>>> it's done only when coming from a path other than CPU-based CR3 load
>>> exiting. I've added a bool to struct vcpu that gets set to 0 every
>>> time vmx_vmexit_handler is called, and only gets set to 1 when
>>> vmx_cr_access reports a MOV-TO-CR3. Then in the vmx_update_guest_cr
>>> the flush only happens as such:
>>>
>>>         if ( !v->movtocr3 )
>>>             hvm_asid_flush_vcpu(v);
>>>
>>> In the guest I run a test application that allocates a page at a fixed
>>> VA, writes a magic value to it, and then keeps spinning on reading the
>>> magic value back from the page, checking if it's the same as
>>> originally supplied. I launch this application twice with different
>>> magic values, so that if the TLB invalidation is an issue one of the
>>> test applications would read back the wrong magic value from the VA
>>> using a stale TLB entry. I've verified that same VA in the two
>>> applications point to different pages and that those PTEs are not
>>> marked global and no PCID is used.
>>>
>>> [  724] test (struct addr:ffff88003730f330). PGD: 0x3731f000
>>> VADDR 0x5000000 -> PADDR 0x73e35000. Global page: 0
>>> [  727] test (struct addr:ffff88003681ea20). PGD: 0x777a6000
>>> VADDR 0x5000000 -> PADDR 0x75043000. Global page: 0
>>
>> I'm surprised. As said before - a mov-to-CR3 cannot be emulated
>> without a minimal amount of flushing. No experiments whatsoever
>> are suitable to prove the contrary.
> 
> That's a pretty strong statement - can you tell me where in the SDM
> does it say that exactly? I've gone through it a couple of times already
> and I can't find anything that explicitly says that the flushing has
> to be performed by the VMM when mov-to-CR3 trapping is enabled.

I thought I had pointed you there already: Section "Instructions
that cause VM exits". There's nothing said about flushes, but that's
also not necessary: "... the instruction causing the VM exit does not
execute and no processor state is updated by the instruction." Plus
everything the sub-section "Relative Priority of Faults and VM Exits"
says.
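
To illustrate what an emulation has to reproduce, here is a toy model
of the architectural effect of MOV to CR3, following my reading of the
SDM's TLB invalidation rules. Everything in it (struct toy_cpu, the
8-slot TLB, mov_to_cr3()) is hypothetical, and reserved-bit checks are
left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_SLOTS 8

struct tlb_entry {
    bool     valid;
    bool     global;   /* PTE.G was set when the translation was cached */
    uint16_t pcid;     /* tag; always 0 while CR4.PCIDE is clear */
    uint64_t va, pa;
};

struct toy_cpu {
    uint64_t cr3;
    bool     pcide;    /* CR4.PCIDE */
    struct tlb_entry tlb[TLB_SLOTS];
};

/*
 * Effect of a non-intercepted MOV to CR3: load the register and drop
 * the non-global entries of the target PCID, unless PCIDE is set and
 * bit 63 of the source operand requests no invalidation.
 */
void mov_to_cr3(struct toy_cpu *c, uint64_t val)
{
    bool noflush = c->pcide && (val >> 63);
    uint16_t pcid = c->pcide ? (uint16_t)(val & 0xfff) : 0;

    c->cr3 = val & ~(1ULL << 63);

    if ( noflush )
        return;

    for ( size_t i = 0; i < TLB_SLOTS; i++ )
        if ( c->tlb[i].valid && !c->tlb[i].global && c->tlb[i].pcid == pcid )
            c->tlb[i].valid = false;
}
```

When the instruction is intercepted, neither the register load nor the
invalidation happens in hardware, so both are left to the emulation.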

> The
> closest thing I found was indicating the contrary. Furthermore, if the
> flushing is necessary, then how would you explain that there were no
> TLB mixups in the above experiment?

No idea. Perhaps there is some further flushing going on due to
other reasons?

>>> Both applications work as expected without the VPID flushing taking
>>> place. So at least for CPU-based CR3 load exiting it seems that this
>>> flush is not necessary. As for why this path gets called during domain
>>> boot when the CPU supports Unrestricted Guest mode and it is properly
>>> detected when Xen boots, I'm not sure. However, as we use CPU-based
>>> CR3 load exiting quite often when doing VMI, I would prefer to disable
>>> this flushing at least for this case. Any thoughts?
>>
>> As said before - you'd better direct this question to the VMX
>> maintainers, and even better would be to first understand why
>> the intercept remains enabled in the first place. After all it's
>> quite obvious that most improvement can be expected from not
>> enabling it at all, whenever possible. Only if it needs to stay
>> enabled over extended periods of a guest's lifetime it would then
>> become interesting to see whether the emulation path can be
>> improved.
>>
> 
> To clarify - mov-to-CR3 trapping is _not_ enabled by default on a
> domain. I assumed it is the only path to vmx_update_guest_cr, but I
> now further verified that vmx_cr_access does not get called for a
> mov-to-CR3 when the domain boots, it only gets called when we enable
> it through the monitor system. There is another path that leads to a call
> to vmx_update_guest_cr for updating CR3 when the domain boots which
> seems to require this flushing to happen. That other path I don't care
> about - although it's rather odd in itself as well. Now when the
> mov-to-CR3 path gets activated the flushing does not seem to be
> necessary as my experiment shows and it actually actively breaks
> architectural features (global pages and PCID).

Once again - it does not break anything. Performance aspects are
not architectural features. All you can say is that it makes these
extended features useless.

Jan

* Re: Question about VPID during MOV-TO-CR3
  2016-09-27 13:49                                       ` Jan Beulich
@ 2016-10-01 19:05                                         ` Tamas K Lengyel
  2016-10-04  7:41                                           ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Tamas K Lengyel @ 2016-10-01 19:05 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper, Tim Deegan
  Cc: xen-devel, Kevin Tian, Jun Nakajima

On Tue, Sep 27, 2016 at 7:49 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 26.09.16 at 18:12, <tamas.lengyel@zentific.com> wrote:
>> On Mon, Sep 26, 2016 at 12:24 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 23.09.16 at 22:45, <tamas.lengyel@zentific.com> wrote:
>>>> On Fri, Sep 23, 2016 at 9:50 AM, Tamas K Lengyel
>>>> <tamas.lengyel@zentific.com> wrote:
>>>>> On Fri, Sep 23, 2016 at 9:39 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>>> On 23.09.16 at 17:26, <tamas.lengyel@zentific.com> wrote:
>>>>>>> On Fri, Sep 23, 2016 at 2:24 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>>>>> On 22.09.16 at 19:18, <tamas.lengyel@zentific.com> wrote:
>>>>>>>>> So I verified that when CPU-based load exiting is enabled, the TLB
>>>>>>>>> flush here is critical. Without it the guest kernel crashes at random
>>>>>>>>> points during boot. OTOH why does Xen trap every guest CR3 update
>>>>>>>>> unconditionally? While we have features such as the vm_event/monitor
>>>>>>>>> that may choose to subscribe to that event, Xen traps it even when
>>>>>>>>> that is not in use. Is that trapping necessary for something else?
>>>>>>>>
>>>>>>>> Where do you see this being unconditional? construct_vmcs()
>>>>>>>> clearly avoids setting these intercepts when using EPT. Are you
>>>>>>>> perhaps suffering from
>>>>>>>>
>>>>>>>>             /* Trap CR3 updates if CR3 memory events are enabled. */
>>>>>>>>             if ( v->domain->arch.monitor.write_ctrlreg_enabled &
>>>>>>>>                  monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) )
>>>>>>>>                 v->arch.hvm_vmx.exec_control |= CPU_BASED_CR3_LOAD_EXITING;
>>>>>>>>
>>>>>>>> in vmx_update_guest_cr()? That'll be rather something for you
>>>>>>>> or Razvan to explain. Outside of nested VMX I don't see any
>>>>>>>> other enabling of that intercept (didn't check AMD code on the
>>>>>>>> assumption that you're working on Intel hardware).
>>>>>>>
>>>>>>> So there seems to be two separate paths that lead to the TLB flushing.
>>>>>>> One is indeed the above case you cited when we enable CR3 monitoring
>>>>>>> through the monitor interface. However, during domain boot I also see
>>>>>>> this path being called that is not related to the
>>>>>>> CPU_BASED_CR3_LOAD_EXITING:
>>>>>>>
>>>>>>> (XEN) hap.c:739:d1v0 hap_update_paging_modes is calling hap_update_cr3
>>>>>>> (XEN) hap.c:701:d1v0 HAP update cr3 called
>>>>>>> (XEN) /src/xen/xen/include/asm/hvm/hvm.h:344:d1v0 HVM update guest cr3 called
>>>>>>> (XEN) vmx.c:1549:d1v0 Update guest CR3 value=0x7a7c4000
>>>>>>>
>>>>>>> This path seems to de-activate once the domain is fully booted.
>>>>>>
>>>>>> This late? According to the CR0 handling in
>>>>>> vmx_update_guest_cr() I would understand it to be enabled only
>>>>>> while the guest is still in real mode (and even then only on old
>>>>>> hardware, i.e. without the Unrestricted Guest functionality).
>>>>>>
>>>>>
>>>>> Right, with unrestricted guest support I would assume none of this
>>>>> would get called - but it does, and quite frequently during domain
>>>>> boot. The CPU is an Intel(R) Xeon(R) CPU E5-2430.
>>>>>
>>>>
>>>> So I experimented with selectively disabling the flushing such that
>>>> it's done only when coming from a path other than CPU-based CR3 load
>>>> exiting. I've added a bool to struct vcpu that gets set to 0 every
>>>> time vmx_vmexit_handler is called, and only gets set to 1 when
>>>> vmx_cr_access reports a MOV-TO-CR3. Then in the vmx_update_guest_cr
>>>> the flush only happens as such:
>>>>
>>>>         if ( !v->movtocr3 )
>>>>             hvm_asid_flush_vcpu(v);
>>>>
>>>> In the guest I run a test application that allocates a page at a fixed
>>>> VA, writes a magic value to it, and then keeps spinning on reading the
>>>> magic value back from the page, checking if it's the same as
>>>> originally supplied. I launch this application twice with different
>>>> magic values, so that if the TLB invalidation is an issue one of the
>>>> test applications would read back the wrong magic value from the VA
>>>> using a stale TLB entry. I've verified that same VA in the two
>>>> applications point to different pages and that those PTEs are not
>>>> marked global and no PCID is used.
>>>>
>>>> [  724] test (struct addr:ffff88003730f330). PGD: 0x3731f000
>>>> VADDR 0x5000000 -> PADDR 0x73e35000. Global page: 0
>>>> [  727] test (struct addr:ffff88003681ea20). PGD: 0x777a6000
>>>> VADDR 0x5000000 -> PADDR 0x75043000. Global page: 0
>>>
>>> I'm surprised. As said before - a mov-to-CR3 cannot be emulated
>>> without a minimal amount of flushing. No experiments whatsoever
>>> are suitable to prove the contrary.
>>
>> That's a pretty strong statement - can you tell me where in the SDM
>> does it say that exactly? I've gone through it a couple of times already
>> and I can't find anything that explicitly says that the flushing has
>> to be performed by the VMM when mov-to-CR3 trapping is enabled.
>
> I thought I had pointed you there already: Section "Instructions
> that cause VM exits". There's nothing said about flushes, but that's
> also not necessary: "... the instruction causing the VM exit does not
> execute and no processor state is updated by the instruction." Plus
> everything the sub-section "Relative Priority of Faults and VM Exits"
> says.
>
>> The
>> closest thing I found was indicating the contrary. Furthermore, if the
>> flushing is necessary, then how would you explain that there were no
>> TLB mixups in the above experiment?
>
> No idea. Perhaps there is some further flushing going on due to
> other reasons?

I've been digging more into this issue and indeed there are many
unrelated VPID flushes happening. One source of such VPID flushes has
been SMP migration, which is an obvious case that must use new tags.
Pinning the vCPU to a pCPU makes these flushes go away, as expected.
However, I've found two other sources that need more attention:

In x86/flushtlb.c the function flush_area_local invalidates all guest
TLBs as such:

    if ( flags & (FLUSH_TLB|FLUSH_TLB_GLOBAL) )
    {
        if ( order == 0 )
        {
...
        }
        else
        {
            u32 t = pre_flush();
            unsigned long cr4 = read_cr4();

            hvm_flush_guest_tlbs();

To me this flush seems warranted only when FLUSH_TLB_GLOBAL is
requested. However, since it is also issued for simple TLB flush
requests, it results in guest TLB flushes very often. When I change
the behavior to only issue this flush when the FLUSH_TLB_GLOBAL flag
is set, the number of flushes issued to all guests is significantly
reduced. So the question is, is this just an oversight that should be
fixed?
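
The difference between the two behaviors can be sketched as a
standalone model of the full-flush branch only. The flag values and the
two function names below are stand-ins for illustration, not Xen's
definitions, and this is the experiment described above rather than a
proposed patch.

```c
#include <stdbool.h>

/* Stand-in flag values for illustration only. */
#define FLUSH_TLB        0x100u
#define FLUSH_TLB_GLOBAL 0x200u

/* Existing behavior: any TLB flush request also flushes guest TLBs. */
bool flushes_guests_current(unsigned int flags)
{
    return (flags & (FLUSH_TLB | FLUSH_TLB_GLOBAL)) != 0;
}

/* Experiment: flush guest TLBs only for global flush requests. */
bool flushes_guests_experiment(unsigned int flags)
{
    return (flags & FLUSH_TLB_GLOBAL) != 0;
}
```

A plain FLUSH_TLB request triggers the guest flush in the first variant
but not in the second; requests carrying FLUSH_TLB_GLOBAL trigger it in
both.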

The other flush comes from the function write_cr3 also in
x86/flushtlb.c, which was introduced in the patch "[HVM][SVM] flush
all entries from guest ASIDs when xen writes CR3." commit id
eed63189dabd90abe422b0e94ab8854783329bed. From the commit message
however it is not entirely clear to me what exactly warrants having to
flush HVM guest TLBs and how that relates to shadow code. Commenting
this flush out made no difference to the guest or dom0, everything
works as expected. Of course, without understanding the real reason
why this flush is here it is hard to judge whether this change
(re-)introduces some corner-case issue. It is worth noting this was
added even before VPID was introduced, so we might want to check
whether it is still required. AFAICT flushing the VPID in this case is
fine.

The fourth source for VPID flushing originates from MOV-TO-CR4 when
the guest changes certain flags related to paging. For example, during
boot Linux flips the PGE bit to force a complete TLB flush. This is
the source of the update_paging_mode calls I observed earlier, which
also perform a VPID flush. After Linux boots, these flushes go away.
Windows continues to use this method after boot as well.

Now, with these modifications performed and having Linux booted
completely there are no more spurious guest TLB flushes with VPID -
both the asid generation and the core asid generation are stable. I
repeated the experiment above with the two processes using the same VA
and enabled mov-to-cr3 trapping. There are no VPID flushes happening
as expected. The processes continue to run normally and no stale TLB
entries are observed.

>>>> Both applications work as expected without the VPID flushing taking
>>>> place. So at least for CPU-based CR3 load exiting it seems that this
>>>> flush is not necessary. As for why this path gets called during domain
>>>> boot when the CPU supports Unrestricted Guest mode and it is properly
>>>> detected when Xen boots, I'm not sure. However, as we use CPU-based
>>>> CR3 load exiting quite often when doing VMI, I would prefer to disable
>>>> this flushing at least for this case. Any thoughts?
>>>
>>> As said before - you'd better direct this question to the VMX
>>> maintainers, and even better would be to first understand why
>>> the intercept remains enabled in the first place. After all it's
>>> quite obvious that most improvement can be expected from not
>>> enabling it at all, whenever possible. Only if it needs to stay
>>> enabled over extended periods of a guest's lifetime it would then
>>> become interesting to see whether the emulation path can be
>>> improved.
>>>
>>
>> To clarify - mov-to-CR3 trapping is _not_ enabled by default on a
>> domain. I assumed it is the only path to vmx_update_guest_cr, but I
>> now further verified that vmx_cr_access does not get called for a
>> mov-to-CR3 when the domain boots, it only gets called when we enable
>> it through the monitor system. There is another path that leads to a call
>> to vmx_update_guest_cr for updating CR3 when the domain boots which
>> seems to require this flushing to happen. That other path I don't care
>> about - although it's rather odd in itself as well. Now when the
>> mov-to-CR3 path gets activated the flushing does not seem to be
>> necessary as my experiment shows and it actually actively breaks
>> architectural features (global pages and PCID).
>
> Once again - it does not break anything. Performance aspects are
> not architectural features. All you can say is that it makes these
> extended features useless.
>

Sure, that's fair to say. Still, it's worth exploring this in a bit more
detail, as the performance benefits gained from tagged TLB features
seem to be fairly limited right now.

Tamas

* Re: Question about VPID during MOV-TO-CR3
  2016-10-01 19:05                                         ` Tamas K Lengyel
@ 2016-10-04  7:41                                           ` Jan Beulich
  2016-10-04 14:12                                             ` Tamas K Lengyel
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2016-10-04  7:41 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Andrew Cooper, Kevin Tian, Tim Deegan, Jun Nakajima, xen-devel

>>> On 01.10.16 at 21:05, <tamas.lengyel@zentific.com> wrote:
> However, I've found two other sources that need more attention:
> 
> In x86/flushtlb.c the function flush_area_local invalidates all guest
> TLBs as such:
> 
>     if ( flags & (FLUSH_TLB|FLUSH_TLB_GLOBAL) )
>     {
>         if ( order == 0 )
>         {
>             ...
>         }
>         else
>         {
>             u32 t = pre_flush();
>             unsigned long cr4 = read_cr4();
> 
>             hvm_flush_guest_tlbs();
> 
> This flush here to me seems to be only warranted when FLUSH_TLB_GLOBAL
> is requested.

Why? The problem is that hvm_asid_flush_core() can't flush just
non-global ones.

> The other flush comes from the function write_cr3 also in
> x86/flushtlb.c, which was introduced in the patch "[HVM][SVM] flush
> all entries from guest ASIDs when xen writes CR3." commit id
> eed63189dabd90abe422b0e94ab8854783329bed. From the commit message
> however it is not entirely clear to me what exactly warrants having to
> flush HVM guest TLBs and how that relates to shadow code. Commenting
> this flush out made no difference to the guest or dom0, everything
> works as expected. Of course, without understanding the real reason
> for why this flush is here it is hard to judge whether this change
> (re-)introduces some corner-case issue. It is worth noting this was
> added even before VPID was introduced, so we might want to check
> whether it is still required. AFAICT flushing the VPID in this case is
> fine.

Same problem here it seems - there's no way to leave global TLB
entries unaffected, but we can't avoid the flush completely since
non-global entries need to go away.

Jan



* Re: Question about VPID during MOV-TO-CR3
  2016-10-04  7:41                                           ` Jan Beulich
@ 2016-10-04 14:12                                             ` Tamas K Lengyel
  2016-10-04 14:29                                               ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Tamas K Lengyel @ 2016-10-04 14:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Kevin Tian, Tim Deegan, Jun Nakajima, xen-devel

On Tue, Oct 4, 2016 at 1:41 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 01.10.16 at 21:05, <tamas.lengyel@zentific.com> wrote:
>> However, I've found two other sources that need more attention:
>>
>> In x86/flushtlb.c the function flush_area_local invalidates all guest
>> TLBs as such:
>>
>>     if ( flags & (FLUSH_TLB|FLUSH_TLB_GLOBAL) )
>>     {
>>         if ( order == 0 )
>>         {
>>             ...
>>         }
>>         else
>>         {
>>             u32 t = pre_flush();
>>             unsigned long cr4 = read_cr4();
>>
>>             hvm_flush_guest_tlbs();
>>
>> This flush here to me seems to be only warranted when FLUSH_TLB_GLOBAL
>> is requested.
>
> Why? The problem is that hvm_asid_flush_core() can't flush just
> non-global ones.
>
>> The other flush comes from the function write_cr3 also in
>> x86/flushtlb.c, which was introduced in the patch "[HVM][SVM] flush
>> all entries from guest ASIDs when xen writes CR3." commit id
>> eed63189dabd90abe422b0e94ab8854783329bed. From the commit message
>> however it is not entirely clear to me what exactly warrants having to
>> flush HVM guest TLBs and how that relates to shadow code. Commenting
>> this flush out made no difference to the guest or dom0, everything
>> works as expected. Of course, without understanding the real reason
>> for why this flush is here it is hard to judge whether this change
>> (re-)introduces some corner-case issue. It is worth noting this was
>> added even before VPID was introduced, so we might want to check
>> whether it is still required. AFAICT flushing the VPID in this case is
>> fine.
>
> Same problem here it seems - there's no way to leave global TLB
> entries unaffected, but we can't avoid the flush completely since
> non-global entries need to go away.
>

Hi Jan,
yes, I understand that is the case when you do need to flush a guest.
And yes, there seem to be paths that require bumping the tag of a
specific guest for certain events (mov-to-cr4 with paging mode changes
for example). What I'm poking at here is that we invalidate the
guest TLBs for _all_ guests very frequently. I can't find an
explanation for why _that_ is required. AFAIK having the TLB tag
guarantees that no other guest or Xen will have a chance to bump into
stale entries given no guests or Xen share a TLB tag with each other.
So the only time I see that we would have to flush all guest TLBs is
when the tag overflows and we start from 1 again. What am I missing
here?
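
The difference between bumping one guest's tag and invalidating all of
them can be sketched with a toy model of generation-based tagging on a
single physical CPU. This is only an illustration under stated
assumptions: the names core_tags, vcpu_tag, flush_vcpu, flush_core and
handle_entry are hypothetical and do not claim to match the Xen
sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-physical-CPU tag allocator state. */
struct core_tags {
    uint64_t generation;  /* bumped to make every handed-out tag stale */
    uint32_t next_tag;    /* next unused tag in the current generation */
    uint32_t max_tag;     /* highest tag the hardware is assumed to support */
};

/* Hypothetical per-vCPU tag state. */
struct vcpu_tag {
    uint64_t generation;  /* 0 means "no valid tag" */
    uint32_t tag;
};

static void core_init(struct core_tags *c, uint32_t max_tag)
{
    assert(max_tag >= 1);
    c->generation = 1;
    c->next_tag = 1;      /* tag 0 is assumed reserved for the hypervisor */
    c->max_tag = max_tag;
}

/* Targeted flush: only this vCPU takes a fresh tag at its next entry. */
static void flush_vcpu(struct vcpu_tag *v)
{
    v->generation = 0;
}

/* Global flush: every vCPU on this CPU takes a fresh tag at next entry. */
static void flush_core(struct core_tags *c)
{
    c->generation++;
}

/*
 * Called before entering the guest. Returns true when the tag space
 * wrapped, i.e. the hardware must drop all tagged entries.
 */
static bool handle_entry(struct core_tags *c, struct vcpu_tag *v)
{
    bool need_hw_flush = false;

    if (v->generation == c->generation)
        return false;     /* tag still valid, cached translations kept */

    if (c->next_tag > c->max_tag) {
        c->generation++;  /* out of tags: start a new generation */
        c->next_tag = 1;
        need_hw_flush = true;
    }

    v->tag = c->next_tag++;
    v->generation = c->generation;
    return need_hw_flush;
}
```

In this model flush_vcpu() costs only that one vCPU its cached
translations, while flush_core() makes every vCPU burn a new tag, so
the tag space wraps (and the hardware flush happens) much sooner.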

Tamas


* Re: Question about VPID during MOV-TO-CR3
  2016-10-04 14:12                                             ` Tamas K Lengyel
@ 2016-10-04 14:29                                               ` Jan Beulich
  2016-10-04 15:06                                                 ` Tim Deegan
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2016-10-04 14:29 UTC (permalink / raw)
  To: Tim Deegan, Tamas K Lengyel
  Cc: Andrew Cooper, Kevin Tian, Jun Nakajima, xen-devel

>>> On 04.10.16 at 16:12, <tamas.lengyel@zentific.com> wrote:
> yes, I understand that is the case when you do need to flush a guest.
> And yes, there seem to be paths that require bumping the tag of a
> specific guest for certain events (mov-to-cr4 with paging mode changes
> for example). What I'm poking at here is that we invalidate the
> guest TLBs for _all_ guests very frequently. I can't find an
> explanation for why _that_ is required. AFAIK having the TLB tag
> guarantees that no other guest or Xen will have a chance to bump into
> stale entries given no guests or Xen share a TLB tag with each other.
> So the only time I see that we would have to flush all guest TLBs is
> when the tag overflows and we start from 1 again. What am I missing
> here?

Oh, I see - this indeed looks to be quite a bit more flushing than is
desirable. So the question, as you did put it already, is why it got
done that way in the first place. At the very least it would look like
more control would need to be given to the callers of both
write_cr3() and flush_area_local(). Tim?

Jan



* Re: Question about VPID during MOV-TO-CR3
  2016-10-04 14:29                                               ` Jan Beulich
@ 2016-10-04 15:06                                                 ` Tim Deegan
  2016-10-07 15:32                                                   ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Tim Deegan @ 2016-10-04 15:06 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tamas K Lengyel, Kevin Tian, xen-devel, Jun Nakajima, Andrew Cooper

At 08:29 -0600 on 04 Oct (1475569774), Jan Beulich wrote:
> >>> On 04.10.16 at 16:12, <tamas.lengyel@zentific.com> wrote:
> > yes, I understand that is the case when you do need to flush a guest.
> > And yes, there seem to be paths that require bumping the tag of a
> > specific guest for certain events (mov-to-cr4 with paging mode changes
> > for example). What I'm poking at here is that we invalidate the
> > guest TLBs for _all_ guests very frequently. I can't find an
> > explanation for why _that_ is required. AFAIK having the TLB tag
> > guarantees that no other guest or Xen will have a chance to bump into
> > stale entries given no guests or Xen share a TLB tag with each other.
> > So the only time I see that we would have to flush all guest TLBs is
> > when the tag overflows and we start from 1 again. What am I missing
> > here?
> 
> Oh, I see - this indeed looks to be quite a bit more flushing than is
> desirable. So the question, as you did put it already, is why it got
> done that way in the first place. At the very least it would look like
> more control would need to be given to the callers of both
> write_cr3() and flush_area_local(). Tim?

IIRC:
 - Remote TLB flushes are used for safety, e.g. to be sure that no
   guest has a mapping of a page before its type or owner changes.
   The callers rely on _all_ mappings of the page being gone after
   the remote flush.  The simplest way to do that is to flush all tags.
 - We believed that on the then-current hardware, and with the
   scheduling timeslice we had, there wasn't an awful lot of
   benefit to keeping the tags of descheduled VMs around.
 - Although it might sometimes be safe to leave some tags unflushed,
   it wasn't clear exactly when that would be.  E.g. I don't think
   that whether the tag is 'current' is a very useful test -- either
   the tag might contain dangerous mappings or it might not.

Since there are cases where we already mask TLB flushes by domain
(using the dirty-cpumask) I can see that we might pass that domain ID
to the remote CPU and drop only that domain's tags.

And for HAP guests it may be possible to distinguish between "guest"
flushes (e.g. emulating guest CR3 writes) and "hypervisor" flushes
(e.g. after grant/p2m ops), and target "guest" flushes at particular
VCPUs.

Both of those will want careful unpicking from existing safety
mechanisms that assume that a flush is a flush.  E.g. the
tlbflush_timestamp used on page allocation skips a shootdown if _any_
TLB flush has happened on the remote PCPU since the page was freed.
Partial flushes can't count towards that.  And there might be other
gotchas that I can't think of right now.
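
The timestamp argument can be illustrated with a small sketch. It is a
simplified model, not the real implementation: flush_clock,
cpu_last_full_flush, note_flush, page_freed_stamp and need_shootdown
are hypothetical names, and counter wrap-around is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4

/* Global logical clock, advanced by every full TLB flush. */
static uint32_t flush_clock = 1;

/* Per-CPU stamp of the most recent full flush (0 = never flushed). */
static uint32_t cpu_last_full_flush[NR_CPUS];

/* Record a flush on a CPU. Only a full flush may advance the stamp. */
static void note_flush(unsigned int cpu, bool full)
{
    assert(cpu < NR_CPUS);
    if (full)
        cpu_last_full_flush[cpu] = ++flush_clock;
    /* A partial flush may have left mappings behind: stamp unchanged. */
}

/* Stamp stored in a page when it is freed. */
static uint32_t page_freed_stamp(void)
{
    return flush_clock;
}

/* A shootdown is needed unless the CPU fully flushed after the free. */
static bool need_shootdown(unsigned int cpu, uint32_t freed_stamp)
{
    assert(cpu < NR_CPUS);
    return cpu_last_full_flush[cpu] <= freed_stamp;
}
```

If a partial flush were allowed to advance the per-CPU stamp in this
model, need_shootdown() would return false even though a stale mapping
of the freed page could still be cached under some other tag.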

Cheers,

Tim.


* Re: Question about VPID during MOV-TO-CR3
  2016-10-04 15:06                                                 ` Tim Deegan
@ 2016-10-07 15:32                                                   ` Jan Beulich
  2016-10-07 15:41                                                     ` Tamas K Lengyel
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2016-10-07 15:32 UTC (permalink / raw)
  To: Tim Deegan, Tamas K Lengyel
  Cc: Andrew Cooper, Kevin Tian, Jun Nakajima, xen-devel

>>> On 04.10.16 at 17:06, <tim@xen.org> wrote:
> At 08:29 -0600 on 04 Oct (1475569774), Jan Beulich wrote:
>> >>> On 04.10.16 at 16:12, <tamas.lengyel@zentific.com> wrote:
>> > yes, I understand that is the case when you do need to flush a guest.
>> > And yes, there seem to be paths that require bumping the tag of a
>> > specific guest for certain events (mov-to-cr4 with paging mode changes
>> > for example). What I'm poking at here is that we invalidate the
>> > guest TLBs for _all_ guests very frequently. I can't find an
>> > explanation for why _that_ is required. AFAIK having the TLB tag
>> > guarantees that no other guest or Xen will have a chance to bump into
>> > stale entries given no guests or Xen share a TLB tag with each other.
>> > So the only time I see that we would have to flush all guest TLBs is
>> > when the tag overflows and we start from 1 again. What am I missing
>> > here?
>> 
>> Oh, I see - this indeed looks to be quite a bit more flushing than is
>> desirable. So the question, as you did put it already, is why it got
>> done that way in the first place. At the very least it would look like
>> more control would need to be given to the callers of both
>> write_cr3() and flush_area_local(). Tim?
> 
> IIRC:
>  - Remote TLB flushes are used for safety, e.g. to be sure that no
>    guest has a mapping of a page before its type or owner changes.
>    The callers rely on _all_ mappings of the page being gone after
>    the remote flush.  The simplest way to do that is to flush all tags.

Ah, of course. And that means that even though Tamas observed
no breakage with some of the flushing removed, it can't be dropped
altogether.

>  - We believed that on the then-current hardware, and with the
>    scheduling timeslice we had, there wasn't an awful lot of
>    benefit to keeping the tags of descheduled VMs around.
>  - Although it might sometimes be safe to leave some tags unflushed,
>    it wasn't clear exactly when that would be.  E.g. I don't think
>    that whether the tag is 'current' is a very useful test -- either
>    the tag might contain dangerous mappings or it might not.
> 
> Since there are cases where we already mask TLB flushes by domain
> (using the dirty-cpumask) I can see that we might pass that domain ID
> to the remote CPU and drop only that domain's tags.
> 
> And for HAP guests it may be possible to distinguish between "guest"
> flushes (e.g. emulating guest CR3 writes) and "hypervisor" flushes
> (e.g. after grant/p2m ops), and target "guest" flushes at particular
> VCPUs.

Right. Question is whether there are any such operations
occurring frequently enough that optimizing this would make
sense. I don't see HVM code paths leading to write_cr3(), and
I don't think there are a whole lot leading to flush_area_local().
Did you gain any insight in this regard, Tamas?

The thing that would really help us would be some INVLPG
equivalent allowing a size/mask to be provided along with the
address (as that other path in flush_area_local() doesn't have
all these problems). Otoh, Tim - if INVLPG was sufficient for order
zero, how come ASID based full invalidation is required on the
other path? Wouldn't this need to be accompanied by a suitable
INVVPID/INVLPGA?

Jan

> Both of those will want careful unpicking from existing safety
> mechanisms that assume that a flush is a flush.  E.g. the
> tlbflush_timestamp used on page allocation skips a shootdown if _any_
> TLB flush has happened on the remote PCPU since the page was freed.
> Partial flushes can't count towards that.  And there might be other
> gotchas that I can't think of right now.
> 
> Cheers,
> 
> Tim.





* Re: Question about VPID during MOV-TO-CR3
  2016-10-07 15:32                                                   ` Jan Beulich
@ 2016-10-07 15:41                                                     ` Tamas K Lengyel
  2016-10-07 16:00                                                       ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Tamas K Lengyel @ 2016-10-07 15:41 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Kevin Tian, Tim Deegan, Jun Nakajima, xen-devel

On Fri, Oct 7, 2016 at 9:32 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 04.10.16 at 17:06, <tim@xen.org> wrote:
>> At 08:29 -0600 on 04 Oct (1475569774), Jan Beulich wrote:
>>> >>> On 04.10.16 at 16:12, <tamas.lengyel@zentific.com> wrote:
>>> > yes, I understand that is the case when you do need to flush a guest.
>>> > And yes, there seem to be paths that require bumping the tag of a
>>> > specific guest for certain events (mov-to-cr4 with paging mode changes
>>> > for example). What I'm poking at here is that we invalidate the
>>> > guest TLBs for _all_ guests very frequently. I can't find an
>>> > explanation for why _that_ is required. AFAIK having the TLB tag
>>> > guarantees that no other guest or Xen will have a chance to bump into
>>> > stale entries given no guests or Xen share a TLB tag with each other.
>>> > So the only time I see that we would have to flush all guest TLBs is
>>> > when the tag overflows and we start from 1 again. What am I missing
>>> > here?
>>>
>>> Oh, I see - this indeed looks to be quite a bit more flushing than is
>>> desirable. So the question, as you did put it already, is why it got
>>> done that way in the first place. At the very least it would look like
>>> more control would need to be given to the callers of both
>>> write_cr3() and flush_area_local(). Tim?
>>
>> IIRC:
>>  - Remote TLB flushes are used for safety, e.g. to be sure that no
>>    guest has a mapping of a page before its type or owner changes.
>>    The callers rely on _all_ mappings of the page being gone after
>>    the remote flush.  The simplest way to do that is to flush all tags.
>
> Ah, of course. And that means that even though Tamas observed
> no breakage with some of the flushing removed, it can't be dropped
> altogether.
>
>>  - We believed that on the then-current hardware, and with the
>>    scheduling timeslice we had, there wasn't an awful lot of
>>    benefit to keeping the tags of descheduled VMs around.
>>  - Although it might sometimes be safe to leave some tags unflushed,
>>    it wasn't clear exactly when that would be.  E.g. I don't think
>>    that whether the tag is 'current' is a very useful test -- either
>>    the tag might contain dangerous mappings or it might not.
>>
>> Since there are cases where we already mask TLB flushes by domain
>> (using the dirty-cpumask) I can see that we might pass that domain ID
>> to the remote CPU and drop only that domain's tags.
>>
>> And for HAP guests it may be possible to distinguish between "guest"
>> flushes (e.g. emulating guest CR3 writes) and "hypervisor" flushes
>> (e.g. after grant/p2m ops), and target "guest" flushes at particular
>> VCPUs.
>
> Right. Question is whether there are any such operations
> occurring frequently enough that optimizing this would make
> sense. I don't see HVM code paths leading to write_cr3(), and
> I don't think there are a whole lot leading to flush_area_local().
> Did you gain any insight in this regard, Tamas?

There are a ton of calls to flush_area_local, a good chunk of them
made while the idle vCPU is the active one. As for write_cr3, there
are also a lot of calls there. When I added some debug output to
observe just how many, dom0 would take almost an hour to boot and the
serial line would just be spammed with that printk. So even if there
are no HVM paths leading there, other paths definitely do, and they
affect HVM guests by making all of them take on a new tag the next
time they are scheduled.

Tamas


* Re: Question about VPID during MOV-TO-CR3
  2016-10-07 15:41                                                     ` Tamas K Lengyel
@ 2016-10-07 16:00                                                       ` Jan Beulich
  0 siblings, 0 replies; 34+ messages in thread
From: Jan Beulich @ 2016-10-07 16:00 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Andrew Cooper, Kevin Tian, Tim Deegan, Jun Nakajima, xen-devel

>>> On 07.10.16 at 17:41, <tamas.lengyel@zentific.com> wrote:
> There are a ton of calls to flush_area_local, a good chunk of them
> made while the idle vCPU is the active one. As for write_cr3, there
> are also a lot of calls there. When I added some debug output to
> observe just how many, dom0 would take almost an hour to boot and the
> serial line would just be spammed with that printk. So even if there
> are no HVM paths leading there, other paths definitely do, and they
> affect HVM guests by making all of them take on a new tag the next
> time they are scheduled.

Well, that's all fine, but - considering what Tim explained in great
detail - not really relevant. We just can't blindly eliminate those
safety flushes. What we can eliminate are just those flushes that we
know are not safety ones, i.e. those initiated by guest CR
updates (or the like), and I'm afraid there aren't that many.

For the safety flushes the best we may be able to do would appear
to be to limit their scope: If we knew which domains can possibly
have active mappings, we could avoid flushing unrelated ASIDs. But
even then we'd have to flush full address spaces, as we don't know
at which _virtual_ address(es) such mappings may have lived (and
there are no mechanisms to flush based on guest or host physical
address).
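
A rough sketch of what such a scope-limited safety flush might look
like, purely as an illustration: vcpu_tag_state and flush_domains are
hypothetical names, and a generation of 0 is assumed to mean "take a
new tag on next entry". Whole address spaces are still dropped, but
only for vCPUs of the listed domains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-vCPU tag state as seen by one physical CPU. */
struct vcpu_tag_state {
    uint16_t domain_id;
    uint64_t generation;  /* 0 = tag invalid, new tag on next entry */
};

/*
 * Invalidate the tags of all vCPUs belonging to any domain in the
 * given list, leaving unrelated domains' tags alone. Returns the
 * number of vCPUs whose tag was invalidated.
 */
static size_t flush_domains(struct vcpu_tag_state *vcpus, size_t nr_vcpus,
                            const uint16_t *domains, size_t nr_domains)
{
    size_t flushed = 0;

    assert(vcpus != NULL && domains != NULL);

    for (size_t i = 0; i < nr_vcpus; i++) {
        for (size_t j = 0; j < nr_domains; j++) {
            if (vcpus[i].domain_id == domains[j]) {
                vcpus[i].generation = 0;  /* full address space goes */
                flushed++;
                break;
            }
        }
    }
    return flushed;
}
```

The hard part is not this loop but knowing, for a given page, which
domains belong in the list in the first place.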

Jan



Thread overview: 34+ messages
2016-09-20 17:29 Question about VPID during MOV-TO-CR3 Tamas K Lengyel
2016-09-21 10:23 ` Jan Beulich
2016-09-21 14:18   ` Tamas K Lengyel
2016-09-21 14:44     ` Jan Beulich
2016-09-21 15:09       ` Tamas K Lengyel
2016-09-21 15:16         ` Tamas K Lengyel
2016-09-21 15:23           ` Jan Beulich
2016-09-21 15:30             ` Tamas K Lengyel
2016-09-21 18:26               ` Tamas K Lengyel
2016-09-22  9:00                 ` Jan Beulich
2016-09-22 10:39                   ` Tamas K Lengyel
2016-09-22 11:35                     ` Jan Beulich
2016-09-22  8:56               ` Jan Beulich
2016-09-22 10:35                 ` Tamas K Lengyel
2016-09-22 11:27                   ` Jan Beulich
2016-09-22 11:37                     ` Tamas K Lengyel
2016-09-22 17:18                       ` Tamas K Lengyel
2016-09-23  8:24                         ` Jan Beulich
2016-09-23  8:35                           ` Razvan Cojocaru
2016-09-23 15:26                           ` Tamas K Lengyel
2016-09-23 15:39                             ` Jan Beulich
2016-09-23 15:50                               ` Tamas K Lengyel
2016-09-23 20:45                                 ` Tamas K Lengyel
2016-09-26  6:24                                   ` Jan Beulich
2016-09-26 16:12                                     ` Tamas K Lengyel
2016-09-27 13:49                                       ` Jan Beulich
2016-10-01 19:05                                         ` Tamas K Lengyel
2016-10-04  7:41                                           ` Jan Beulich
2016-10-04 14:12                                             ` Tamas K Lengyel
2016-10-04 14:29                                               ` Jan Beulich
2016-10-04 15:06                                                 ` Tim Deegan
2016-10-07 15:32                                                   ` Jan Beulich
2016-10-07 15:41                                                     ` Tamas K Lengyel
2016-10-07 16:00                                                       ` Jan Beulich
