All of lore.kernel.org
 help / color / mirror / Atom feed
* Weird altp2m behaviour when switching early to a new view
@ 2018-04-08 20:38 Razvan Cojocaru
  2018-04-09 14:12 ` George Dunlap
  2018-04-09 15:40 ` Alexey G
  0 siblings, 2 replies; 31+ messages in thread
From: Razvan Cojocaru @ 2018-04-08 20:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Tian, Kevin, Tamas K Lengyel, Andrew Cooper, Tim Deegan,
	George Dunlap, Jun Nakajima

Hello,

I've noticed altp2m behaviour I can't explain yet - I'm not all that
familiar with all the ways the new views corellate with the previous
EPT-based "view 0".

In short, if we create a new view and simply switch to it early in the
boot process of the guest, something goes wrong and the guest either
freezes, becomes unresponsive to input, or has something wrong with the
display (most often the latter, with a black band on top of the image):

https://ibb.co/eUPJ6c
https://ibb.co/etCXXH

That guest is a 64-bit Windows 7 system with nothing special about it.
It's easy to reproduce this:

1. Start the guest paused (I've used "xl create -p myguest.conf").
2. Patch xen-access like this: https://pastebin.com/67PpQ9fu (just
remove the part of the code that modifies the new view before switching
to it).
3. Hook xen-access to the guest ("./xen-access <domid> altp2m_write").
4. Unpause the guest ("xl unpause <domid>").

I think that's a valid scenario and supposed to work.

I've also noticed that if I wait to do this until the OS is
up-and-running (e.g. after logging into Windows), there seems to be no
problem. I don't know if this is just coincidence (as is bound to happen
with race-condition situations), or means something, but I can get the
problems every time when switching views early, and never when switching
the views late.

Suggestions on what the problem could be are, as always, greatly
appreciated.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-08 20:38 Weird altp2m behaviour when switching early to a new view Razvan Cojocaru
@ 2018-04-09 14:12 ` George Dunlap
  2018-04-11  6:39   ` Razvan Cojocaru
  2018-04-09 15:40 ` Alexey G
  1 sibling, 1 reply; 31+ messages in thread
From: George Dunlap @ 2018-04-09 14:12 UTC (permalink / raw)
  To: Razvan Cojocaru, xen-devel
  Cc: Andrew Cooper, Tian, Kevin, Tamas K Lengyel, Tim Deegan, Jun Nakajima

On 04/08/2018 09:38 PM, Razvan Cojocaru wrote:
> Hello,
> 
> I've noticed altp2m behaviour I can't explain yet - I'm not all that
> familiar with all the ways the new views corellate with the previous
> EPT-based "view 0".
> 
> In short, if we create a new view and simply switch to it early in the
> boot process of the guest, something goes wrong and the guest either
> freezes, becomes unresponsive to input, or has something wrong with the
> display (most often the latter, with a black band on top of the image):
> 
> https://ibb.co/eUPJ6c
> https://ibb.co/etCXXH
> 
> That guest is a 64-bit Windows 7 system with nothing special about it.
> It's easy to reproduce this:
> 
> 1. Start the guest paused (I've used "xl create -p myguest.conf").
> 2. Patch xen-access like this: https://pastebin.com/67PpQ9fu (just
> remove the part of the code that modifies the new view before switching
> to it).
> 3. Hook xen-access to the guest ("./xen-access <domid> altp2m_write").
> 4. Unpause the guest ("xl unpause <domid>").
> 
> I think that's a valid scenario and supposed to work.
> 
> I've also noticed that if I wait to do this until the OS is
> up-and-running (e.g. after logging into Windows), there seems to be no
> problem. I don't know if this is just coincidence (as is bound to happen
> with race-condition situations), or means something, but I can get the
> problems every time when switching views early, and never when switching
> the views late.
> 
> Suggestions on what the problem could be are, as always, greatly
> appreciated.

The obvious place to look is the logdirtyvram functionality, which is
used to make it easier for QEMU to figure out which bits of the display
buffer have been modified.  One of the big reasons the altp2m
functionality is still considered "experimental" is that these sorts of
interactions were never carefully thought out.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-08 20:38 Weird altp2m behaviour when switching early to a new view Razvan Cojocaru
  2018-04-09 14:12 ` George Dunlap
@ 2018-04-09 15:40 ` Alexey G
  1 sibling, 0 replies; 31+ messages in thread
From: Alexey G @ 2018-04-09 15:40 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: Tian, Kevin, Tamas K Lengyel, Andrew Cooper, Tim Deegan,
	George Dunlap, Jun Nakajima, xen-devel

On Sun, 8 Apr 2018 23:38:31 +0300
Razvan Cojocaru <rcojocaru@bitdefender.com> wrote:
>I've noticed altp2m behaviour I can't explain yet - I'm not all that
>familiar with all the ways the new views corellate with the previous
>EPT-based "view 0".
>
>In short, if we create a new view and simply switch to it early in the
>boot process of the guest, something goes wrong and the guest either
>freezes, becomes unresponsive to input, or has something wrong with the
>display (most often the latter, with a black band on top of the image):
>
>https://ibb.co/eUPJ6c
>https://ibb.co/etCXXH
>
>That guest is a 64-bit Windows 7 system with nothing special about it.
>It's easy to reproduce this:
>
>1. Start the guest paused (I've used "xl create -p myguest.conf").
>2. Patch xen-access like this: https://pastebin.com/67PpQ9fu (just
>remove the part of the code that modifies the new view before switching
>to it).
>3. Hook xen-access to the guest ("./xen-access <domid> altp2m_write").
>4. Unpause the guest ("xl unpause <domid>").
>
>I think that's a valid scenario and supposed to work.
>
>I've also noticed that if I wait to do this until the OS is
>up-and-running (e.g. after logging into Windows), there seems to be no
>problem. I don't know if this is just coincidence (as is bound to
>happen with race-condition situations), or means something, but I can
>get the problems every time when switching views early, and never when
>switching the views late.
>
>Suggestions on what the problem could be are, as always, greatly
>appreciated.

The difference between starting the domain paused and switching altp2m
views later (in this particular case) is likely the fact whether
the emulated LFB backing storage (aka VRAM) was "deployed" to the guest
domain or not.

QEMU uses special ranges to maintain the state of some emulated
devices with memories. These are VRAM and Option ROMs, but latter are
emulated as MMIO currently for some reason (either a bug or
intentionally). VRAM area is used to hold the LFB content for the
emulated videocard (stdvga/cirrus/etc) and its usage depends on whether
the emulated LFB is currently mapped or not.

When you start the domain paused, VRAM is not mapped to the guest
address space yet -- it is allocated to the location outside the guest
memory map, known to QEMU. In order to map VRAM to the guest's address
space, the emulated QEMU videocard needs to be initialized, which
happens when hvmloader or guest OS allocates MMIO BARs for the emulated
videocard.
Upon writing the corresponding BAR register, the VRAM area is relocated
to the guest's visible address space (via add_to_physmap hypercall).
Disabling MMIO decoding may cause to relocate the VRAM range back
to its hidden backing storage (again via add_to_physmap).

So, it's basically the same appearance of the problem of different
Xen/QEMU views on guest's memory layout.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-09 14:12 ` George Dunlap
@ 2018-04-11  6:39   ` Razvan Cojocaru
  2018-04-11  8:04     ` Razvan Cojocaru
  2018-04-11 20:17     ` Tamas K Lengyel
  0 siblings, 2 replies; 31+ messages in thread
From: Razvan Cojocaru @ 2018-04-11  6:39 UTC (permalink / raw)
  To: George Dunlap, xen-devel
  Cc: Andrew Cooper, Tian, Kevin, Tamas K Lengyel, Tim Deegan, Jun Nakajima

On 04/09/2018 05:12 PM, George Dunlap wrote:
> The obvious place to look is the logdirtyvram functionality, which is
> used to make it easier for QEMU to figure out which bits of the display
> buffer have been modified.  One of the big reasons the altp2m
> functionality is still considered "experimental" is that these sorts of
> interactions were never carefully thought out.

George (and Alexey), thanks for the help!

I've managed to find out why this is happening, though I don't yet have
a solution for it. After much debugging, it turns out that the
"p2m_is_ram(p2mt)" test in hvm_hap_nested_page_fault() fails if I switch
to the new altp2m view fast enough, and that in turn disables the
logdirty processing gated on it:

/* Spurious fault? PoD and log-dirty also take this path. */
if ( p2m_is_ram(p2mt) )
{
    rc = 1;
    /*
     * Page log dirty is always done with order 0. If this mfn resides
     * in a large page, we do not change other pages type within that
     * large page.
     */
    if ( npfec.write_access )
    {
        paging_mark_pfn_dirty(currd, _pfn(gfn));
        /*
         * If p2m is really an altp2m, unlock here to avoid lock
         * ordering violation when the change below is propagated
         * from host p2m.
         */
        if ( ap2m_active )
            __put_gfn(p2m, gfn);
        p2m_change_type_one(currd, gfn, p2m_ram_logdirty, p2m_ram_rw);
        __put_gfn(ap2m_active ? hostp2m : p2m, gfn);

        goto out;
    }
    goto out_put_gfn;
}

I think this is likely because switching to the new EPT view loses
information stored when previously working with the default one, in this
case while that work has not finished (if that's the case, creating a
new altp2m view should also somehow inherit the history of view 0 - when
doing set_access() on view 0 the changes propagate to all existent
altp2m views, but I don't think that happens when a new view is created:
that starts with a clean slate).

Also related to this part of the altp2m design, I've also noticed that
creating a new EPT view with libxc involves this function:

int xc_altp2m_create_view(xc_interface *handle, uint32_t domid,
                          xenmem_access_t default_access,
                          uint16_t *view_id);

However, default_access is completely ignored. If this was meant to
reset all pages to that access it's trivial to wire it in into the
hypervisor (we could just iterate through all guest pages from 0 to
max_gpfn and set the access) - but in light of this issue that's not a
good thing to do. Or, rather, it was meant to specify the default_access
for new pages where it is not explicitly specified?

Either way, the argument is not used:

case HVMOP_altp2m_create_p2m:
    if ( !(rc = p2m_init_next_altp2m(d, &a.u.view.view)) )
        rc = __copy_to_guest(arg, &a, 1) ? -EFAULT : 0;
    break;

That's the progress so far.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-11  6:39   ` Razvan Cojocaru
@ 2018-04-11  8:04     ` Razvan Cojocaru
  2018-04-12 16:15       ` Razvan Cojocaru
  2018-04-13 14:44       ` Razvan Cojocaru
  2018-04-11 20:17     ` Tamas K Lengyel
  1 sibling, 2 replies; 31+ messages in thread
From: Razvan Cojocaru @ 2018-04-11  8:04 UTC (permalink / raw)
  To: George Dunlap, xen-devel
  Cc: Andrew Cooper, Tian, Kevin, Tamas K Lengyel, Tim Deegan, Jun Nakajima

> After much debugging, it turns out that the
> "p2m_is_ram(p2mt)" test in hvm_hap_nested_page_fault() fails if I switch
> to the new altp2m view fast enough, and that in turn disables the
> logdirty processing gated on it

Actually as it turns out the exit doesn't happen at all anymore so
hvm_hap_nested_page_fault() doesn't get called (I've added a printk() in
hvm_hap_nested_page_fault() just before "/* Check access permissions
first, then handle faults */" and it doesn't appear).

Debugging continues.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-11  6:39   ` Razvan Cojocaru
  2018-04-11  8:04     ` Razvan Cojocaru
@ 2018-04-11 20:17     ` Tamas K Lengyel
  2018-04-12  7:19       ` Razvan Cojocaru
  1 sibling, 1 reply; 31+ messages in thread
From: Tamas K Lengyel @ 2018-04-11 20:17 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: Tian, Kevin, Andrew Cooper, Tim Deegan, George Dunlap,
	Jun Nakajima, Xen-devel

On Wed, Apr 11, 2018 at 12:39 AM, Razvan Cojocaru
<rcojocaru@bitdefender.com> wrote:
> On 04/09/2018 05:12 PM, George Dunlap wrote:
>> The obvious place to look is the logdirtyvram functionality, which is
>> used to make it easier for QEMU to figure out which bits of the display
>> buffer have been modified.  One of the big reasons the altp2m
>> functionality is still considered "experimental" is that these sorts of
>> interactions were never carefully thought out.
>
> George (and Alexey), thanks for the help!
>
> I've managed to find out why this is happening, though I don't yet have
> a solution for it. After much debugging, it turns out that the
> "p2m_is_ram(p2mt)" test in hvm_hap_nested_page_fault() fails if I switch
> to the new altp2m view fast enough, and that in turn disables the
> logdirty processing gated on it:
>
> /* Spurious fault? PoD and log-dirty also take this path. */
> if ( p2m_is_ram(p2mt) )
> {
>     rc = 1;
>     /*
>      * Page log dirty is always done with order 0. If this mfn resides
>      * in a large page, we do not change other pages type within that
>      * large page.
>      */
>     if ( npfec.write_access )
>     {
>         paging_mark_pfn_dirty(currd, _pfn(gfn));
>         /*
>          * If p2m is really an altp2m, unlock here to avoid lock
>          * ordering violation when the change below is propagated
>          * from host p2m.
>          */
>         if ( ap2m_active )
>             __put_gfn(p2m, gfn);
>         p2m_change_type_one(currd, gfn, p2m_ram_logdirty, p2m_ram_rw);
>         __put_gfn(ap2m_active ? hostp2m : p2m, gfn);
>
>         goto out;
>     }
>     goto out_put_gfn;
> }
>
> I think this is likely because switching to the new EPT view loses
> information stored when previously working with the default one, in this
> case while that work has not finished (if that's the case, creating a
> new altp2m view should also somehow inherit the history of view 0 - when
> doing set_access() on view 0 the changes propagate to all existent
> altp2m views, but I don't think that happens when a new view is created:
> that starts with a clean slate).
>
> Also related to this part of the altp2m design, I've also noticed that
> creating a new EPT view with libxc involves this function:
>
> int xc_altp2m_create_view(xc_interface *handle, uint32_t domid,
>                           xenmem_access_t default_access,
>                           uint16_t *view_id);
>
> However, default_access is completely ignored. If this was meant to
> reset all pages to that access it's trivial to wire it in into the
> hypervisor (we could just iterate through all guest pages from 0 to
> max_gpfn and set the access) - but in light of this issue that's not a
> good thing to do. Or, rather, it was meant to specify the default_access
> for new pages where it is not explicitly specified?

I don't think the function of default_access for an altp2m was
settled, so both of these would sound to me like a valid setup. I
personally would prefer that if the default_access is specified and
it's not rwx it would result in the entire altp2m view getting
populated right there instead of it being done lazily. The whole
lazy-population of the altp2m views has been nothing but a headache.
Add to that the fact that views get completely reset under certain
situations without any indication being given to the application using
it.. might be time we rethink these parts of altp2m to make it more
robust.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-11 20:17     ` Tamas K Lengyel
@ 2018-04-12  7:19       ` Razvan Cojocaru
  0 siblings, 0 replies; 31+ messages in thread
From: Razvan Cojocaru @ 2018-04-12  7:19 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Tian, Kevin, Andrew Cooper, Tim Deegan, George Dunlap,
	Jun Nakajima, Xen-devel

On 04/11/2018 11:17 PM, Tamas K Lengyel wrote:
> On Wed, Apr 11, 2018 at 12:39 AM, Razvan Cojocaru
> <rcojocaru@bitdefender.com> wrote:
>> Also related to this part of the altp2m design, I've also noticed that
>> creating a new EPT view with libxc involves this function:
>>
>> int xc_altp2m_create_view(xc_interface *handle, uint32_t domid,
>>                           xenmem_access_t default_access,
>>                           uint16_t *view_id);
>>
>> However, default_access is completely ignored. If this was meant to
>> reset all pages to that access it's trivial to wire it in into the
>> hypervisor (we could just iterate through all guest pages from 0 to
>> max_gpfn and set the access) - but in light of this issue that's not a
>> good thing to do. Or, rather, it was meant to specify the default_access
>> for new pages where it is not explicitly specified?
> 
> I don't think the function of default_access for an altp2m was
> settled, so both of these would sound to me like a valid setup. I
> personally would prefer that if the default_access is specified and
> it's not rwx it would result in the entire altp2m view getting
> populated right there instead of it being done lazily. The whole
> lazy-population of the altp2m views has been nothing but a headache.
> Add to that the fact that views get completely reset under certain
> situations without any indication being given to the application using
> it.. might be time we rethink these parts of altp2m to make it more
> robust.

Do you mean the p2m_altp2m_lazy_copy() logic in
hvm_hap_nested_page_fault()? That took a while to wrap my head around.

Ideally a new view would immediately upon creation copy the default
view. Later on, it would be updated (kept in sync) by
p2m_altp2m_propagate_change(). That's the theory anyway.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-11  8:04     ` Razvan Cojocaru
@ 2018-04-12 16:15       ` Razvan Cojocaru
  2018-04-13 14:44       ` Razvan Cojocaru
  1 sibling, 0 replies; 31+ messages in thread
From: Razvan Cojocaru @ 2018-04-12 16:15 UTC (permalink / raw)
  To: George Dunlap, xen-devel
  Cc: Andrew Cooper, Tian, Kevin, Tamas K Lengyel, Tim Deegan, Jun Nakajima

On 04/11/2018 11:04 AM, Razvan Cojocaru wrote:
>> After much debugging, it turns out that the
>> "p2m_is_ram(p2mt)" test in hvm_hap_nested_page_fault() fails if I switch
>> to the new altp2m view fast enough, and that in turn disables the
>> logdirty processing gated on it
> 
> Actually as it turns out the exit doesn't happen at all anymore so
> hvm_hap_nested_page_fault() doesn't get called (I've added a printk() in
> hvm_hap_nested_page_fault() just before "/* Check access permissions
> first, then handle faults */" and it doesn't appear).

This is what seems to be happening, the following call traces end up
calling p2m_altp2m_propagate_change() after switching to the new altp2m
view early:

(XEN) Xen call trace:
(XEN)    [<ffff82d08032ee1a>] p2m_altp2m_propagate_change+0x4e/0x508
(XEN)    [<ffff82d0803341b7>] p2m-ept.c#ept_set_entry+0x7e4/0x8c4
(XEN)    [<ffff82d08032801b>] p2m_set_entry+0xe2/0x124
(XEN)    [<ffff82d080328250>] p2m.c#p2m_remove_page+0x1f3/0x209
(XEN)    [<ffff82d0803292df>] guest_physmap_remove_page+0x18c/0x214
(XEN)    [<ffff82d080221bc5>] guest_remove_page+0x27b/0x2d3
(XEN)    [<ffff82d08022232a>] do_memory_op+0x502/0x22f8
(XEN)    [<ffff82d08036d876>] pv_hypercall+0x1f4/0x440
(XEN)    [<ffff82d080374495>] lstar_enter+0x115/0x120

and

(XEN) Xen call trace:
(XEN)    [<ffff82d08032ee1a>] p2m_altp2m_propagate_change+0x4e/0x508
(XEN)    [<ffff82d0803341b7>] p2m-ept.c#ept_set_entry+0x7e4/0x8c4
(XEN)    [<ffff82d08032801b>] p2m_set_entry+0xe2/0x124
(XEN)    [<ffff82d080329b2a>] guest_physmap_add_entry+0x7c3/0xacd
(XEN)    [<ffff82d080222720>] do_memory_op+0x8f8/0x22f8
(XEN)    [<ffff82d08036d876>] pv_hypercall+0x1f4/0x440
(XEN)    [<ffff82d080374495>] lstar_enter+0x115/0x120

So clearly it's the external pages described earlier by George and
Alexey landing.

p2m_altp2m_propagate_change() then proceeds to do nothing (because gfn >
p2m->max_remapped_gfn, _and_ m == INVALID_MFN).

Next, in hvm_hap_nested_page_fault() p2m_altp2m_lazy_copy() returns 1,
which leads to the function just exiting with no further logdirty checks
(there's a goto out; that jumps them), and then that's it, I don't see
any more page faults for those gfns.

I've tried an assorted array of strategies: change
p2m_altp2m_propagate_change() to always call set_entry() on the altp2m
view, doing set_entry() for every gfn in the hostp2m into the new altp2m
view in p2m_init_altp2m_ept() (a combination of these), trying to run
the logdirty code in hvm_hap_nested_page_fault() before the goto out
corresponding to p2m_altp2m_lazy_copy(). None of these things has worked.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-11  8:04     ` Razvan Cojocaru
  2018-04-12 16:15       ` Razvan Cojocaru
@ 2018-04-13 14:44       ` Razvan Cojocaru
  2018-04-13 16:38         ` Tamas K Lengyel
  2018-04-16 17:47         ` George Dunlap
  1 sibling, 2 replies; 31+ messages in thread
From: Razvan Cojocaru @ 2018-04-13 14:44 UTC (permalink / raw)
  To: George Dunlap, xen-devel
  Cc: Andrew Cooper, Tian, Kevin, Tamas K Lengyel, Tim Deegan, Jun Nakajima

[-- Attachment #1: Type: text/plain, Size: 1316 bytes --]

On 04/11/2018 11:04 AM, Razvan Cojocaru wrote:
> Debugging continues.

Finally, the attached patch seems to get the display unstuck in my
scenario, although for one guest I get:

(XEN) d2v0 Unexpected vmexit: reason 49
(XEN) domain_crash called from vmx.c:4120
(XEN) Domain 2 (vcpu#0) crashed on cpu#1:
(XEN) ----[ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    1
(XEN) RIP:    0010:[<fffff96000842354>]
(XEN) RFLAGS: 0000000000010246   CONTEXT: hvm guest (d2v0)
(XEN) rax: fffff88003000000   rbx: fffff900c0083db0   rcx: 00000000aa55aa55
(XEN) rdx: fffffa80041bdc41   rsi: fffff900c00c69a0   rdi: 0000000000000001
(XEN) rbp: 0000000000000000   rsp: fffff88002ee9ef0   r8:  fffffa80041bdc40
(XEN) r9:  fffff80001810e80   r10: fffffa800342aa70   r11: fffff88002ee9e80
(XEN) r12: 0000000000000005   r13: 0000000000000001   r14: fffff900c00c08b0
(XEN) r15: 0000000000000001   cr0: 0000000080050031   cr4: 00000000000406f8
(XEN) cr3: 00000000ef771000   cr2: fffff900c00c8000
(XEN) fsb: 00000000fffde000   gsb: fffff80001810d00   gss: 000007fffffdc000
(XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010

i.e. EXIT_REASON_EPT_MISCONFIG - so not of the woods yet. I am hoping
somebody more familiar with the code can point to a more elegant
solution if one exists.


Thanks,
Razvan

[-- Attachment #2: altp2m_vga.patch --]
[-- Type: text/x-patch, Size: 2873 bytes --]

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 14b5939..3be02ca 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -1375,8 +1375,15 @@ void setup_ept_dump(void)
 void p2m_init_altp2m_ept(struct domain *d, unsigned int i)
 {
     struct p2m_domain *p2m = d->arch.altp2m_p2m[i];
+    struct p2m_domain *hostp2m = p2m_get_hostp2m(d);
     struct ept_data *ept;
 
+    p2m->max_mapped_pfn = hostp2m->max_mapped_pfn;
+    p2m->default_access = hostp2m->default_access;
+    p2m->domain = hostp2m->domain;
+    p2m->logdirty_ranges = hostp2m->logdirty_ranges;
+    p2m->global_logdirty = hostp2m->global_logdirty;
+
     p2m->min_remapped_gfn = gfn_x(INVALID_GFN);
     p2m->max_remapped_gfn = 0;
     ept = &p2m->ept;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index c53cab4..00f85e1 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -28,6 +28,7 @@
 #include <xen/vm_event.h>
 #include <xen/event.h>
 #include <public/vm_event.h>
+#include <asm/altp2m.h>
 #include <asm/domain.h>
 #include <asm/page.h>
 #include <asm/paging.h>
@@ -248,7 +249,6 @@ int p2m_init(struct domain *d)
 int p2m_is_logdirty_range(struct p2m_domain *p2m, unsigned long start,
                           unsigned long end)
 {
-    ASSERT(p2m_is_hostp2m(p2m));
     if ( p2m->global_logdirty ||
          rangeset_contains_range(p2m->logdirty_ranges, start, end) )
         return 1;
@@ -964,12 +964,12 @@ int p2m_change_type_one(struct domain *d, unsigned long gfn_l,
 }
 
 /* Modify the p2m type of a range of gfns from ot to nt. */
-void p2m_change_type_range(struct domain *d, 
-                           unsigned long start, unsigned long end,
-                           p2m_type_t ot, p2m_type_t nt)
+static void _p2m_change_type_range(struct p2m_domain *p2m,
+                                   unsigned long start, unsigned long end,
+                                   p2m_type_t ot, p2m_type_t nt)
 {
+    struct domain *d = p2m->domain;
     unsigned long gfn = start;
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
     int rc = 0;
 
     ASSERT(ot != nt);
@@ -1022,6 +1022,23 @@ void p2m_change_type_range(struct domain *d,
     p2m_unlock(p2m);
 }
 
+void p2m_change_type_range(struct domain *d,
+                           unsigned long start, unsigned long end,
+                           p2m_type_t ot, p2m_type_t nt)
+{
+    unsigned int i;
+
+    if ( !altp2m_active(d) )
+    {
+        _p2m_change_type_range(p2m_get_hostp2m(d), start, end, ot, nt);
+        return;
+    }
+
+    for ( i = 0; i < MAX_ALTP2M; i++ )
+        if ( d->arch.altp2m_eptp[i] != mfn_x(INVALID_MFN) )
+            _p2m_change_type_range(d->arch.altp2m_p2m[i], start, end, ot, nt);
+}
+
 /*
  * Finish p2m type change for gfns which are marked as need_recalc in a range.
  * Returns: 0/1 for success, negative for failure

[-- Attachment #3: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-13 14:44       ` Razvan Cojocaru
@ 2018-04-13 16:38         ` Tamas K Lengyel
  2018-04-13 17:04           ` Razvan Cojocaru
  2018-04-16 17:47         ` George Dunlap
  1 sibling, 1 reply; 31+ messages in thread
From: Tamas K Lengyel @ 2018-04-13 16:38 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: Tian, Kevin, Andrew Cooper, Tim Deegan, George Dunlap,
	Jun Nakajima, Xen-devel

On Fri, Apr 13, 2018 at 8:44 AM, Razvan Cojocaru
<rcojocaru@bitdefender.com> wrote:
> On 04/11/2018 11:04 AM, Razvan Cojocaru wrote:
>> Debugging continues.
>
> Finally, the attached patch seems to get the display unstuck in my
> scenario, although for one guest I get:
>
> (XEN) d2v0 Unexpected vmexit: reason 49
> (XEN) domain_crash called from vmx.c:4120
> (XEN) Domain 2 (vcpu#0) crashed on cpu#1:
> (XEN) ----[ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]----
> (XEN) CPU:    1
> (XEN) RIP:    0010:[<fffff96000842354>]
> (XEN) RFLAGS: 0000000000010246   CONTEXT: hvm guest (d2v0)
> (XEN) rax: fffff88003000000   rbx: fffff900c0083db0   rcx: 00000000aa55aa55
> (XEN) rdx: fffffa80041bdc41   rsi: fffff900c00c69a0   rdi: 0000000000000001
> (XEN) rbp: 0000000000000000   rsp: fffff88002ee9ef0   r8:  fffffa80041bdc40
> (XEN) r9:  fffff80001810e80   r10: fffffa800342aa70   r11: fffff88002ee9e80
> (XEN) r12: 0000000000000005   r13: 0000000000000001   r14: fffff900c00c08b0
> (XEN) r15: 0000000000000001   cr0: 0000000080050031   cr4: 00000000000406f8
> (XEN) cr3: 00000000ef771000   cr2: fffff900c00c8000
> (XEN) fsb: 00000000fffde000   gsb: fffff80001810d00   gss: 000007fffffdc000
> (XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010
>
> i.e. EXIT_REASON_EPT_MISCONFIG - so not of the woods yet. I am hoping
> somebody more familiar with the code can point to a more elegant
> solution if one exists.

I dare say at this point you might just be that person :) I looked at
the patch, one issue with p2m_change_type_range in its current form is
that it's unclear how it would behave when the range contains remapped
gfns in an altp2m (either a page that remaps to this range, or a page
that remaps from the range). According to the current altp2m approach
I would say the altp2m views should probably just get completely reset
if there is such a remapping.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-13 16:38         ` Tamas K Lengyel
@ 2018-04-13 17:04           ` Razvan Cojocaru
  0 siblings, 0 replies; 31+ messages in thread
From: Razvan Cojocaru @ 2018-04-13 17:04 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Tian, Kevin, Andrew Cooper, Tim Deegan, George Dunlap,
	Jun Nakajima, Xen-devel

On 04/13/2018 07:38 PM, Tamas K Lengyel wrote:
> On Fri, Apr 13, 2018 at 8:44 AM, Razvan Cojocaru
> <rcojocaru@bitdefender.com> wrote:
>> On 04/11/2018 11:04 AM, Razvan Cojocaru wrote:
>>> Debugging continues.
>>
>> Finally, the attached patch seems to get the display unstuck in my
>> scenario, although for one guest I get:
>>
>> (XEN) d2v0 Unexpected vmexit: reason 49
>> (XEN) domain_crash called from vmx.c:4120
>> (XEN) Domain 2 (vcpu#0) crashed on cpu#1:
>> (XEN) ----[ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]----
>> (XEN) CPU:    1
>> (XEN) RIP:    0010:[<fffff96000842354>]
>> (XEN) RFLAGS: 0000000000010246   CONTEXT: hvm guest (d2v0)
>> (XEN) rax: fffff88003000000   rbx: fffff900c0083db0   rcx: 00000000aa55aa55
>> (XEN) rdx: fffffa80041bdc41   rsi: fffff900c00c69a0   rdi: 0000000000000001
>> (XEN) rbp: 0000000000000000   rsp: fffff88002ee9ef0   r8:  fffffa80041bdc40
>> (XEN) r9:  fffff80001810e80   r10: fffffa800342aa70   r11: fffff88002ee9e80
>> (XEN) r12: 0000000000000005   r13: 0000000000000001   r14: fffff900c00c08b0
>> (XEN) r15: 0000000000000001   cr0: 0000000080050031   cr4: 00000000000406f8
>> (XEN) cr3: 00000000ef771000   cr2: fffff900c00c8000
>> (XEN) fsb: 00000000fffde000   gsb: fffff80001810d00   gss: 000007fffffdc000
>> (XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010
>>
>> i.e. EXIT_REASON_EPT_MISCONFIG - so not of the woods yet. I am hoping
>> somebody more familiar with the code can point to a more elegant
>> solution if one exists.
> 
> I dare say at this point you might just be that person :) I looked at
> the patch, one issue with p2m_change_type_range in its current form is
> that it's unclear how it would behave when the range contains remapped
> gfns in an altp2m (either a page that remaps to this range, or a page
> that remaps from the range). According to the current altp2m approach
> I would say the altp2m views should probably just get completely reset
> if there is such a remapping.

Yes, it's all quite complex.

Actually, for one I think I should set p2m->max_mapped_pfn =
hostp2m->max_mapped_pfn; at switch-time rather than at
p2m_init_altp2m_ept() time, for obvious reasons. I wonder if the sane
behaviour here isn't to always copy everything from the active view to
the view we're switching to (whatever "copy everything" means, I'm fuzzy
on this myself).

Also, looking at the code I think that the default_access parameter to
the libxc function we were discussing earlier should simply reach
p2m_init_altp2m_ept() and be assigned to p2m->default_access - that
probably clears the mystery.

You're saying that p2m_change_type_range() should act more like
p2m_altp2m_propagate_change() for p2ms that are not the hostp2m, right?


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-13 14:44       ` Razvan Cojocaru
  2018-04-13 16:38         ` Tamas K Lengyel
@ 2018-04-16 17:47         ` George Dunlap
  2018-04-16 18:46           ` Razvan Cojocaru
  1 sibling, 1 reply; 31+ messages in thread
From: George Dunlap @ 2018-04-16 17:47 UTC (permalink / raw)
  To: Razvan Cojocaru, xen-devel
  Cc: Andrew Cooper, Tian, Kevin, Tamas K Lengyel, Tim Deegan, Jun Nakajima

On 04/13/2018 03:44 PM, Razvan Cojocaru wrote:
> On 04/11/2018 11:04 AM, Razvan Cojocaru wrote:
>> Debugging continues.
> 
> Finally, the attached patch seems to get the display unstuck in my
> scenario, although for one guest I get:
> 
> (XEN) d2v0 Unexpected vmexit: reason 49
> (XEN) domain_crash called from vmx.c:4120
> (XEN) Domain 2 (vcpu#0) crashed on cpu#1:
> (XEN) ----[ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]----
> (XEN) CPU:    1
> (XEN) RIP:    0010:[<fffff96000842354>]
> (XEN) RFLAGS: 0000000000010246   CONTEXT: hvm guest (d2v0)
> (XEN) rax: fffff88003000000   rbx: fffff900c0083db0   rcx: 00000000aa55aa55
> (XEN) rdx: fffffa80041bdc41   rsi: fffff900c00c69a0   rdi: 0000000000000001
> (XEN) rbp: 0000000000000000   rsp: fffff88002ee9ef0   r8:  fffffa80041bdc40
> (XEN) r9:  fffff80001810e80   r10: fffffa800342aa70   r11: fffff88002ee9e80
> (XEN) r12: 0000000000000005   r13: 0000000000000001   r14: fffff900c00c08b0
> (XEN) r15: 0000000000000001   cr0: 0000000080050031   cr4: 00000000000406f8
> (XEN) cr3: 00000000ef771000   cr2: fffff900c00c8000
> (XEN) fsb: 00000000fffde000   gsb: fffff80001810d00   gss: 000007fffffdc000
> (XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010
> 
> i.e. EXIT_REASON_EPT_MISCONFIG - so not of the woods yet. I am hoping
> somebody more familiar with the code can point to a more elegant
> solution if one exists.

I think I have an idea what's going on, but it's complicated. :-)

Basically, the logdirty functionality isn't simple, and needs careful
thought on how to integrate it.  I'll write some more tomorrow, and see
if I can come up with a solution.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-16 17:47         ` George Dunlap
@ 2018-04-16 18:46           ` Razvan Cojocaru
  2018-04-16 20:21             ` George Dunlap
  0 siblings, 1 reply; 31+ messages in thread
From: Razvan Cojocaru @ 2018-04-16 18:46 UTC (permalink / raw)
  To: George Dunlap, xen-devel
  Cc: Andrew Cooper, Tian, Kevin, Tamas K Lengyel, Tim Deegan, Jun Nakajima

On 04/16/2018 08:47 PM, George Dunlap wrote:
> On 04/13/2018 03:44 PM, Razvan Cojocaru wrote:
>> On 04/11/2018 11:04 AM, Razvan Cojocaru wrote:
>>> Debugging continues.
>>
>> Finally, the attached patch seems to get the display unstuck in my
>> scenario, although for one guest I get:
>>
>> (XEN) d2v0 Unexpected vmexit: reason 49
>> (XEN) domain_crash called from vmx.c:4120
>> (XEN) Domain 2 (vcpu#0) crashed on cpu#1:
>> (XEN) ----[ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]----
>> (XEN) CPU:    1
>> (XEN) RIP:    0010:[<fffff96000842354>]
>> (XEN) RFLAGS: 0000000000010246   CONTEXT: hvm guest (d2v0)
>> (XEN) rax: fffff88003000000   rbx: fffff900c0083db0   rcx: 00000000aa55aa55
>> (XEN) rdx: fffffa80041bdc41   rsi: fffff900c00c69a0   rdi: 0000000000000001
>> (XEN) rbp: 0000000000000000   rsp: fffff88002ee9ef0   r8:  fffffa80041bdc40
>> (XEN) r9:  fffff80001810e80   r10: fffffa800342aa70   r11: fffff88002ee9e80
>> (XEN) r12: 0000000000000005   r13: 0000000000000001   r14: fffff900c00c08b0
>> (XEN) r15: 0000000000000001   cr0: 0000000080050031   cr4: 00000000000406f8
>> (XEN) cr3: 00000000ef771000   cr2: fffff900c00c8000
>> (XEN) fsb: 00000000fffde000   gsb: fffff80001810d00   gss: 000007fffffdc000
>> (XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010
>>
>> i.e. EXIT_REASON_EPT_MISCONFIG - so not of the woods yet. I am hoping
>> somebody more familiar with the code can point to a more elegant
>> solution if one exists.
> 
> I think I have an idea what's going on, but it's complicated. :-)
> 
> Basically, the logdirty functionality isn't simple, and needs careful
> thought on how to integrate it.  I'll write some more tomorrow, and see
> if I can come up with a solution.

I think I know why this happens for the one guest - the other guests
start at a certain resolution display-wise and stay that way until shutdown.

This particular guest starts with a larger screen, then goes to roughly
2/3rds of it, then tries to go back to the initial larger one - at which
point the above happens. I assume this corresponds to some pages being
removed and/or added. I'll test this theory more tomorrow - if it's
correct I should be able to reproduce the crash (with the patch) by
simply resetting the screen resolution (increasing it).


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-16 18:46           ` Razvan Cojocaru
@ 2018-04-16 20:21             ` George Dunlap
  2018-04-17  7:24               ` Razvan Cojocaru
  2018-04-17  8:24               ` Razvan Cojocaru
  0 siblings, 2 replies; 31+ messages in thread
From: George Dunlap @ 2018-04-16 20:21 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: Tian, Kevin, Tamas K Lengyel, Andrew Cooper, Tim Deegan,
	George Dunlap, Jun Nakajima, xen-devel

On Mon, Apr 16, 2018 at 7:46 PM, Razvan Cojocaru
<rcojocaru@bitdefender.com> wrote:
> On 04/16/2018 08:47 PM, George Dunlap wrote:
>> On 04/13/2018 03:44 PM, Razvan Cojocaru wrote:
>>> On 04/11/2018 11:04 AM, Razvan Cojocaru wrote:
>>>> Debugging continues.
>>>
>>> Finally, the attached patch seems to get the display unstuck in my
>>> scenario, although for one guest I get:
>>>
>>> (XEN) d2v0 Unexpected vmexit: reason 49
>>> (XEN) domain_crash called from vmx.c:4120
>>> (XEN) Domain 2 (vcpu#0) crashed on cpu#1:
>>> (XEN) ----[ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]----
>>> (XEN) CPU:    1
>>> (XEN) RIP:    0010:[<fffff96000842354>]
>>> (XEN) RFLAGS: 0000000000010246   CONTEXT: hvm guest (d2v0)
>>> (XEN) rax: fffff88003000000   rbx: fffff900c0083db0   rcx: 00000000aa55aa55
>>> (XEN) rdx: fffffa80041bdc41   rsi: fffff900c00c69a0   rdi: 0000000000000001
>>> (XEN) rbp: 0000000000000000   rsp: fffff88002ee9ef0   r8:  fffffa80041bdc40
>>> (XEN) r9:  fffff80001810e80   r10: fffffa800342aa70   r11: fffff88002ee9e80
>>> (XEN) r12: 0000000000000005   r13: 0000000000000001   r14: fffff900c00c08b0
>>> (XEN) r15: 0000000000000001   cr0: 0000000080050031   cr4: 00000000000406f8
>>> (XEN) cr3: 00000000ef771000   cr2: fffff900c00c8000
>>> (XEN) fsb: 00000000fffde000   gsb: fffff80001810d00   gss: 000007fffffdc000
>>> (XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010
>>>
>>> i.e. EXIT_REASON_EPT_MISCONFIG - so not of the woods yet. I am hoping
>>> somebody more familiar with the code can point to a more elegant
>>> solution if one exists.
>>
>> I think I have an idea what's going on, but it's complicated. :-)
>>
>> Basically, the logdirty functionality isn't simple, and needs careful
>> thought on how to integrate it.  I'll write some more tomorrow, and see
>> if I can come up with a solution.
>
> I think I know why this happens for the one guest - the other guests
> start at a certain resolution display-wise and stay that way until shutdown.
>
> This particular guest starts with a larger screen, then goes to roughly
> 2/3rds of it, then tries to go back to the initial larger one - at which
> point the above happens. I assume this corresponds to some pages being
> removed and/or added. I'll test this theory more tomorrow - if it's
> correct I should be able to reproduce the crash (with the patch) by
> simply resetting the screen resolution (increasing it).

The trick is that p2m_change_type doesn't actually iterate over the
entire p2m range, individually changing entries as it goes.  Instead
it misconfigures the entries at the top-level, which causes the kinds
of faults shown above.  As it gets faults for each entry, it checks
the current type, the logdirty ranges, and the global logdirty bit to
determine what the new types should be.

Your patch makes it so that all the altp2ms now get the
misconfiguration when the logdirty range is changed; but clearly
handling the misconfiguration isn't integrated properly with the
altp2m system yet.  Doing it right may take some thought.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-16 20:21             ` George Dunlap
@ 2018-04-17  7:24               ` Razvan Cojocaru
  2018-04-17  8:24               ` Razvan Cojocaru
  1 sibling, 0 replies; 31+ messages in thread
From: Razvan Cojocaru @ 2018-04-17  7:24 UTC (permalink / raw)
  To: George Dunlap
  Cc: Tian, Kevin, Tamas K Lengyel, Andrew Cooper, Tim Deegan,
	George Dunlap, Jun Nakajima, xen-devel



On 04/16/2018 11:21 PM, George Dunlap wrote:
> On Mon, Apr 16, 2018 at 7:46 PM, Razvan Cojocaru
> <rcojocaru@bitdefender.com> wrote:
>> On 04/16/2018 08:47 PM, George Dunlap wrote:
>>> On 04/13/2018 03:44 PM, Razvan Cojocaru wrote:
>>>> On 04/11/2018 11:04 AM, Razvan Cojocaru wrote:
>>>>> Debugging continues.
>>>>
>>>> Finally, the attached patch seems to get the display unstuck in my
>>>> scenario, although for one guest I get:
>>>>
>>>> (XEN) d2v0 Unexpected vmexit: reason 49
>>>> (XEN) domain_crash called from vmx.c:4120
>>>> (XEN) Domain 2 (vcpu#0) crashed on cpu#1:
>>>> (XEN) ----[ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]----
>>>> (XEN) CPU:    1
>>>> (XEN) RIP:    0010:[<fffff96000842354>]
>>>> (XEN) RFLAGS: 0000000000010246   CONTEXT: hvm guest (d2v0)
>>>> (XEN) rax: fffff88003000000   rbx: fffff900c0083db0   rcx: 00000000aa55aa55
>>>> (XEN) rdx: fffffa80041bdc41   rsi: fffff900c00c69a0   rdi: 0000000000000001
>>>> (XEN) rbp: 0000000000000000   rsp: fffff88002ee9ef0   r8:  fffffa80041bdc40
>>>> (XEN) r9:  fffff80001810e80   r10: fffffa800342aa70   r11: fffff88002ee9e80
>>>> (XEN) r12: 0000000000000005   r13: 0000000000000001   r14: fffff900c00c08b0
>>>> (XEN) r15: 0000000000000001   cr0: 0000000080050031   cr4: 00000000000406f8
>>>> (XEN) cr3: 00000000ef771000   cr2: fffff900c00c8000
>>>> (XEN) fsb: 00000000fffde000   gsb: fffff80001810d00   gss: 000007fffffdc000
>>>> (XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010
>>>>
>>>> i.e. EXIT_REASON_EPT_MISCONFIG - so not of the woods yet. I am hoping
>>>> somebody more familiar with the code can point to a more elegant
>>>> solution if one exists.
>>>
>>> I think I have an idea what's going on, but it's complicated. :-)
>>>
>>> Basically, the logdirty functionality isn't simple, and needs careful
>>> thought on how to integrate it.  I'll write some more tomorrow, and see
>>> if I can come up with a solution.
>>
>> I think I know why this happens for the one guest - the other guests
>> start at a certain resolution display-wise and stay that way until shutdown.
>>
>> This particular guest starts with a larger screen, then goes to roughly
>> 2/3rds of it, then tries to go back to the initial larger one - at which
>> point the above happens. I assume this corresponds to some pages being
>> removed and/or added. I'll test this theory more tomorrow - if it's
>> correct I should be able to reproduce the crash (with the patch) by
>> simply resetting the screen resolution (increasing it).
> 
> The trick is that p2m_change_type doesn't actually iterate over the
> entire p2m range, individually changing entries as it goes.  Instead
> it misconfigures the entries at the top-level, which causes the kinds
> of faults shown above.  As it gets faults for each entry, it checks
> the current type, the logdirty ranges, and the global logdirty bit to
> determine what the new types should be.
> 
> Your patch makes it so that all the altp2ms now get the
> misconfiguration when the logdirty range is changed; but clearly
> handling the misconfiguration isn't integrated properly with the
> altp2m system yet.  Doing it right may take some thought.

Thanks for investigating! Sure enough, my suspicion was correct - all it
takes to get that domain crash is to change the VM screen resolution.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-16 20:21             ` George Dunlap
  2018-04-17  7:24               ` Razvan Cojocaru
@ 2018-04-17  8:24               ` Razvan Cojocaru
  2018-04-17 10:49                 ` Razvan Cojocaru
  1 sibling, 1 reply; 31+ messages in thread
From: Razvan Cojocaru @ 2018-04-17  8:24 UTC (permalink / raw)
  To: George Dunlap
  Cc: Tian, Kevin, Tamas K Lengyel, Andrew Cooper, Tim Deegan,
	George Dunlap, Jun Nakajima, xen-devel

[-- Attachment #1: Type: text/plain, Size: 3578 bytes --]

On 04/16/2018 11:21 PM, George Dunlap wrote:
> On Mon, Apr 16, 2018 at 7:46 PM, Razvan Cojocaru
> <rcojocaru@bitdefender.com> wrote:
>> On 04/16/2018 08:47 PM, George Dunlap wrote:
>>> On 04/13/2018 03:44 PM, Razvan Cojocaru wrote:
>>>> On 04/11/2018 11:04 AM, Razvan Cojocaru wrote:
>>>>> Debugging continues.
>>>>
>>>> Finally, the attached patch seems to get the display unstuck in my
>>>> scenario, although for one guest I get:
>>>>
>>>> (XEN) d2v0 Unexpected vmexit: reason 49
>>>> (XEN) domain_crash called from vmx.c:4120
>>>> (XEN) Domain 2 (vcpu#0) crashed on cpu#1:
>>>> (XEN) ----[ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]----
>>>> (XEN) CPU:    1
>>>> (XEN) RIP:    0010:[<fffff96000842354>]
>>>> (XEN) RFLAGS: 0000000000010246   CONTEXT: hvm guest (d2v0)
>>>> (XEN) rax: fffff88003000000   rbx: fffff900c0083db0   rcx: 00000000aa55aa55
>>>> (XEN) rdx: fffffa80041bdc41   rsi: fffff900c00c69a0   rdi: 0000000000000001
>>>> (XEN) rbp: 0000000000000000   rsp: fffff88002ee9ef0   r8:  fffffa80041bdc40
>>>> (XEN) r9:  fffff80001810e80   r10: fffffa800342aa70   r11: fffff88002ee9e80
>>>> (XEN) r12: 0000000000000005   r13: 0000000000000001   r14: fffff900c00c08b0
>>>> (XEN) r15: 0000000000000001   cr0: 0000000080050031   cr4: 00000000000406f8
>>>> (XEN) cr3: 00000000ef771000   cr2: fffff900c00c8000
>>>> (XEN) fsb: 00000000fffde000   gsb: fffff80001810d00   gss: 000007fffffdc000
>>>> (XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010
>>>>
>>>> i.e. EXIT_REASON_EPT_MISCONFIG - so not of the woods yet. I am hoping
>>>> somebody more familiar with the code can point to a more elegant
>>>> solution if one exists.
>>>
>>> I think I have an idea what's going on, but it's complicated. :-)
>>>
>>> Basically, the logdirty functionality isn't simple, and needs careful
>>> thought on how to integrate it.  I'll write some more tomorrow, and see
>>> if I can come up with a solution.
>>
>> I think I know why this happens for the one guest - the other guests
>> start at a certain resolution display-wise and stay that way until shutdown.
>>
>> This particular guest starts with a larger screen, then goes to roughly
>> 2/3rds of it, then tries to go back to the initial larger one - at which
>> point the above happens. I assume this corresponds to some pages being
>> removed and/or added. I'll test this theory more tomorrow - if it's
>> correct I should be able to reproduce the crash (with the patch) by
>> simply resetting the screen resolution (increasing it).
> 
> The trick is that p2m_change_type doesn't actually iterate over the
> entire p2m range, individually changing entries as it goes.  Instead
> it misconfigures the entries at the top-level, which causes the kinds
> of faults shown above.  As it gets faults for each entry, it checks
> the current type, the logdirty ranges, and the global logdirty bit to
> determine what the new types should be.
> 
> Your patch makes it so that all the altp2ms now get the
> misconfiguration when the logdirty range is changed; but clearly
> handling the misconfiguration isn't integrated properly with the
> altp2m system yet.  Doing it right may take some thought.

FWIW, the attached patch has solved the misconfig-related domain crash
for me (though I'm very likely missing some subtleties). It all seems to
work as expected when enabling altp2m and switching early to a new view.
However, now I have domUs with a frozen display when I disconnect the
introspection application (that is, after I switch back to the default
view and disable altp2m on the domain).


Thanks,
Razvan

[-- Attachment #2: altp2m_logdirty2.patch --]
[-- Type: text/x-patch, Size: 4312 bytes --]

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 14b5939..4530689 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -17,6 +17,7 @@
 
 #include <xen/domain_page.h>
 #include <xen/sched.h>
+#include <asm/altp2m.h>
 #include <asm/current.h>
 #include <asm/paging.h>
 #include <asm/types.h>
@@ -652,17 +653,38 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
 bool_t ept_handle_misconfig(uint64_t gpa)
 {
     struct vcpu *curr = current;
-    struct p2m_domain *p2m = p2m_get_hostp2m(curr->domain);
+    struct domain *d = curr->domain;
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
     bool_t spurious;
-    int rc;
-
-    p2m_lock(p2m);
+    int rc = 0;
+    unsigned int i;
 
     spurious = curr->arch.hvm_vmx.ept_spurious_misconfig;
-    rc = resolve_misconfig(p2m, PFN_DOWN(gpa));
-    curr->arch.hvm_vmx.ept_spurious_misconfig = 0;
 
-    p2m_unlock(p2m);
+    if ( altp2m_active(d) )
+    {
+       for ( i = 0; i < MAX_ALTP2M; i++ )
+            if ( d->arch.altp2m_eptp[i] != mfn_x(INVALID_MFN) )
+            {
+                p2m = d->arch.altp2m_p2m[i];
+
+                p2m_lock(p2m);
+
+                rc = resolve_misconfig(p2m, PFN_DOWN(gpa));
+                curr->arch.hvm_vmx.ept_spurious_misconfig = 0;
+
+                p2m_unlock(p2m);
+            }
+    }
+    else
+    {
+        p2m_lock(p2m);
+
+        rc = resolve_misconfig(p2m, PFN_DOWN(gpa));
+        curr->arch.hvm_vmx.ept_spurious_misconfig = 0;
+
+        p2m_unlock(p2m);
+    }
 
     return spurious ? (rc >= 0) : (rc > 0);
 }
@@ -1375,8 +1397,15 @@ void setup_ept_dump(void)
 void p2m_init_altp2m_ept(struct domain *d, unsigned int i)
 {
     struct p2m_domain *p2m = d->arch.altp2m_p2m[i];
+    struct p2m_domain *hostp2m = p2m_get_hostp2m(d);
     struct ept_data *ept;
 
+    p2m->max_mapped_pfn = hostp2m->max_mapped_pfn;
+    p2m->default_access = hostp2m->default_access;
+    p2m->domain = hostp2m->domain;
+    p2m->logdirty_ranges = hostp2m->logdirty_ranges;
+    p2m->global_logdirty = hostp2m->global_logdirty;
+
     p2m->min_remapped_gfn = gfn_x(INVALID_GFN);
     p2m->max_remapped_gfn = 0;
     ept = &p2m->ept;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index c53cab4..00f85e1 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -28,6 +28,7 @@
 #include <xen/vm_event.h>
 #include <xen/event.h>
 #include <public/vm_event.h>
+#include <asm/altp2m.h>
 #include <asm/domain.h>
 #include <asm/page.h>
 #include <asm/paging.h>
@@ -248,7 +249,6 @@ int p2m_init(struct domain *d)
 int p2m_is_logdirty_range(struct p2m_domain *p2m, unsigned long start,
                           unsigned long end)
 {
-    ASSERT(p2m_is_hostp2m(p2m));
     if ( p2m->global_logdirty ||
          rangeset_contains_range(p2m->logdirty_ranges, start, end) )
         return 1;
@@ -964,12 +964,12 @@ int p2m_change_type_one(struct domain *d, unsigned long gfn_l,
 }
 
 /* Modify the p2m type of a range of gfns from ot to nt. */
-void p2m_change_type_range(struct domain *d, 
-                           unsigned long start, unsigned long end,
-                           p2m_type_t ot, p2m_type_t nt)
+static void _p2m_change_type_range(struct p2m_domain *p2m,
+                                   unsigned long start, unsigned long end,
+                                   p2m_type_t ot, p2m_type_t nt)
 {
+    struct domain *d = p2m->domain;
     unsigned long gfn = start;
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
     int rc = 0;
 
     ASSERT(ot != nt);
@@ -1022,6 +1022,23 @@ void p2m_change_type_range(struct domain *d,
     p2m_unlock(p2m);
 }
 
+void p2m_change_type_range(struct domain *d,
+                           unsigned long start, unsigned long end,
+                           p2m_type_t ot, p2m_type_t nt)
+{
+    unsigned int i;
+
+    if ( !altp2m_active(d) )
+    {
+        _p2m_change_type_range(p2m_get_hostp2m(d), start, end, ot, nt);
+        return;
+    }
+
+    for ( i = 0; i < MAX_ALTP2M; i++ )
+        if ( d->arch.altp2m_eptp[i] != mfn_x(INVALID_MFN) )
+            _p2m_change_type_range(d->arch.altp2m_p2m[i], start, end, ot, nt);
+}
+
 /*
  * Finish p2m type change for gfns which are marked as need_recalc in a range.
  * Returns: 0/1 for success, negative for failure

[-- Attachment #3: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-17  8:24               ` Razvan Cojocaru
@ 2018-04-17 10:49                 ` Razvan Cojocaru
  2018-04-17 10:50                   ` Razvan Cojocaru
  0 siblings, 1 reply; 31+ messages in thread
From: Razvan Cojocaru @ 2018-04-17 10:49 UTC (permalink / raw)
  To: George Dunlap
  Cc: Tian, Kevin, Tamas K Lengyel, Andrew Cooper, Tim Deegan,
	George Dunlap, Jun Nakajima, xen-devel

On 04/17/2018 11:24 AM, Razvan Cojocaru wrote:
> On 04/16/2018 11:21 PM, George Dunlap wrote:
>> On Mon, Apr 16, 2018 at 7:46 PM, Razvan Cojocaru
>> <rcojocaru@bitdefender.com> wrote:
>>> On 04/16/2018 08:47 PM, George Dunlap wrote:
>>>> On 04/13/2018 03:44 PM, Razvan Cojocaru wrote:
>>>>> On 04/11/2018 11:04 AM, Razvan Cojocaru wrote:
>>>>>> Debugging continues.
>>>>>
>>>>> Finally, the attached patch seems to get the display unstuck in my
>>>>> scenario, although for one guest I get:
>>>>>
>>>>> (XEN) d2v0 Unexpected vmexit: reason 49
>>>>> (XEN) domain_crash called from vmx.c:4120
>>>>> (XEN) Domain 2 (vcpu#0) crashed on cpu#1:
>>>>> (XEN) ----[ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]----
>>>>> (XEN) CPU:    1
>>>>> (XEN) RIP:    0010:[<fffff96000842354>]
>>>>> (XEN) RFLAGS: 0000000000010246   CONTEXT: hvm guest (d2v0)
>>>>> (XEN) rax: fffff88003000000   rbx: fffff900c0083db0   rcx: 00000000aa55aa55
>>>>> (XEN) rdx: fffffa80041bdc41   rsi: fffff900c00c69a0   rdi: 0000000000000001
>>>>> (XEN) rbp: 0000000000000000   rsp: fffff88002ee9ef0   r8:  fffffa80041bdc40
>>>>> (XEN) r9:  fffff80001810e80   r10: fffffa800342aa70   r11: fffff88002ee9e80
>>>>> (XEN) r12: 0000000000000005   r13: 0000000000000001   r14: fffff900c00c08b0
>>>>> (XEN) r15: 0000000000000001   cr0: 0000000080050031   cr4: 00000000000406f8
>>>>> (XEN) cr3: 00000000ef771000   cr2: fffff900c00c8000
>>>>> (XEN) fsb: 00000000fffde000   gsb: fffff80001810d00   gss: 000007fffffdc000
>>>>> (XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010
>>>>>
>>>>> i.e. EXIT_REASON_EPT_MISCONFIG - so not of the woods yet. I am hoping
>>>>> somebody more familiar with the code can point to a more elegant
>>>>> solution if one exists.
>>>>
>>>> I think I have an idea what's going on, but it's complicated. :-)
>>>>
>>>> Basically, the logdirty functionality isn't simple, and needs careful
>>>> thought on how to integrate it.  I'll write some more tomorrow, and see
>>>> if I can come up with a solution.
>>>
>>> I think I know why this happens for the one guest - the other guests
>>> start at a certain resolution display-wise and stay that way until shutdown.
>>>
>>> This particular guest starts with a larger screen, then goes to roughly
>>> 2/3rds of it, then tries to go back to the initial larger one - at which
>>> point the above happens. I assume this corresponds to some pages being
>>> removed and/or added. I'll test this theory more tomorrow - if it's
>>> correct I should be able to reproduce the crash (with the patch) by
>>> simply resetting the screen resolution (increasing it).
>>
>> The trick is that p2m_change_type doesn't actually iterate over the
>> entire p2m range, individually changing entries as it goes.  Instead
>> it misconfigures the entries at the top-level, which causes the kinds
>> of faults shown above.  As it gets faults for each entry, it checks
>> the current type, the logdirty ranges, and the global logdirty bit to
>> determine what the new types should be.
>>
>> Your patch makes it so that all the altp2ms now get the
>> misconfiguration when the logdirty range is changed; but clearly
>> handling the misconfiguration isn't integrated properly with the
>> altp2m system yet.  Doing it right may take some thought.
> 
> FWIW, the attached patch has solved the misconfig-related domain crash
> for me (though I'm very likely missing some subtleties). It all seems to
> work as expected when enabling altp2m and switching early to a new view.
> However, now I have domUs with a frozen display when I disconnect the
> introspection application (that is, after I switch back to the default
> view and disable altp2m on the domain).

The for() loop in the previous patch is unnecessary, so here's a new
(cleaner) patch. I can't get the guest to freeze the display when
detaching anymore - unrelated to the for() - (so it might have been
something else in my setup), but I'll watch for it in the following days.

Hopefully this is either a reasonable fix or a basis for one.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-17 10:49                 ` Razvan Cojocaru
@ 2018-04-17 10:50                   ` Razvan Cojocaru
  2018-04-17 13:53                     ` George Dunlap
  0 siblings, 1 reply; 31+ messages in thread
From: Razvan Cojocaru @ 2018-04-17 10:50 UTC (permalink / raw)
  To: George Dunlap
  Cc: Tian, Kevin, Tamas K Lengyel, Andrew Cooper, Tim Deegan,
	George Dunlap, Jun Nakajima, xen-devel

[-- Attachment #1: Type: text/plain, Size: 4210 bytes --]

On 04/17/2018 01:49 PM, Razvan Cojocaru wrote:
> On 04/17/2018 11:24 AM, Razvan Cojocaru wrote:
>> On 04/16/2018 11:21 PM, George Dunlap wrote:
>>> On Mon, Apr 16, 2018 at 7:46 PM, Razvan Cojocaru
>>> <rcojocaru@bitdefender.com> wrote:
>>>> On 04/16/2018 08:47 PM, George Dunlap wrote:
>>>>> On 04/13/2018 03:44 PM, Razvan Cojocaru wrote:
>>>>>> On 04/11/2018 11:04 AM, Razvan Cojocaru wrote:
>>>>>>> Debugging continues.
>>>>>>
>>>>>> Finally, the attached patch seems to get the display unstuck in my
>>>>>> scenario, although for one guest I get:
>>>>>>
>>>>>> (XEN) d2v0 Unexpected vmexit: reason 49
>>>>>> (XEN) domain_crash called from vmx.c:4120
>>>>>> (XEN) Domain 2 (vcpu#0) crashed on cpu#1:
>>>>>> (XEN) ----[ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]----
>>>>>> (XEN) CPU:    1
>>>>>> (XEN) RIP:    0010:[<fffff96000842354>]
>>>>>> (XEN) RFLAGS: 0000000000010246   CONTEXT: hvm guest (d2v0)
>>>>>> (XEN) rax: fffff88003000000   rbx: fffff900c0083db0   rcx: 00000000aa55aa55
>>>>>> (XEN) rdx: fffffa80041bdc41   rsi: fffff900c00c69a0   rdi: 0000000000000001
>>>>>> (XEN) rbp: 0000000000000000   rsp: fffff88002ee9ef0   r8:  fffffa80041bdc40
>>>>>> (XEN) r9:  fffff80001810e80   r10: fffffa800342aa70   r11: fffff88002ee9e80
>>>>>> (XEN) r12: 0000000000000005   r13: 0000000000000001   r14: fffff900c00c08b0
>>>>>> (XEN) r15: 0000000000000001   cr0: 0000000080050031   cr4: 00000000000406f8
>>>>>> (XEN) cr3: 00000000ef771000   cr2: fffff900c00c8000
>>>>>> (XEN) fsb: 00000000fffde000   gsb: fffff80001810d00   gss: 000007fffffdc000
>>>>>> (XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010
>>>>>>
>>>>>> i.e. EXIT_REASON_EPT_MISCONFIG - so not of the woods yet. I am hoping
>>>>>> somebody more familiar with the code can point to a more elegant
>>>>>> solution if one exists.
>>>>>
>>>>> I think I have an idea what's going on, but it's complicated. :-)
>>>>>
>>>>> Basically, the logdirty functionality isn't simple, and needs careful
>>>>> thought on how to integrate it.  I'll write some more tomorrow, and see
>>>>> if I can come up with a solution.
>>>>
>>>> I think I know why this happens for the one guest - the other guests
>>>> start at a certain resolution display-wise and stay that way until shutdown.
>>>>
>>>> This particular guest starts with a larger screen, then goes to roughly
>>>> 2/3rds of it, then tries to go back to the initial larger one - at which
>>>> point the above happens. I assume this corresponds to some pages being
>>>> removed and/or added. I'll test this theory more tomorrow - if it's
>>>> correct I should be able to reproduce the crash (with the patch) by
>>>> simply resetting the screen resolution (increasing it).
>>>
>>> The trick is that p2m_change_type doesn't actually iterate over the
>>> entire p2m range, individually changing entries as it goes.  Instead
>>> it misconfigures the entries at the top-level, which causes the kinds
>>> of faults shown above.  As it gets faults for each entry, it checks
>>> the current type, the logdirty ranges, and the global logdirty bit to
>>> determine what the new types should be.
>>>
>>> Your patch makes it so that all the altp2ms now get the
>>> misconfiguration when the logdirty range is changed; but clearly
>>> handling the misconfiguration isn't integrated properly with the
>>> altp2m system yet.  Doing it right may take some thought.
>>
>> FWIW, the attached patch has solved the misconfig-related domain crash
>> for me (though I'm very likely missing some subtleties). It all seems to
>> work as expected when enabling altp2m and switching early to a new view.
>> However, now I have domUs with a frozen display when I disconnect the
>> introspection application (that is, after I switch back to the default
>> view and disable altp2m on the domain).
> 
> The for() loop in the previous patch is unnecessary, so here's a new
> (cleaner) patch. I can't get the guest to freeze the display when
> detaching anymore - unrelated to the for() - (so it might have been
> something else in my setup), but I'll watch for it in the following days.
> 
> Hopefully this is either a reasonable fix or a basis for one.

Apologies, forgot to actually attach the patch. Here it is.

[-- Attachment #2: altp2m_logdirty3.patch --]
[-- Type: text/x-patch, Size: 3306 bytes --]

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 14b5939..8495039 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -17,6 +17,7 @@
 
 #include <xen/domain_page.h>
 #include <xen/sched.h>
+#include <asm/altp2m.h>
 #include <asm/current.h>
 #include <asm/paging.h>
 #include <asm/types.h>
@@ -656,6 +657,9 @@ bool_t ept_handle_misconfig(uint64_t gpa)
     bool_t spurious;
     int rc;
 
+    if ( altp2m_active(curr->domain) )
+        p2m = p2m_get_altp2m(curr);
+
     p2m_lock(p2m);
 
     spurious = curr->arch.hvm_vmx.ept_spurious_misconfig;
@@ -1375,8 +1379,15 @@ void setup_ept_dump(void)
 void p2m_init_altp2m_ept(struct domain *d, unsigned int i)
 {
     struct p2m_domain *p2m = d->arch.altp2m_p2m[i];
+    struct p2m_domain *hostp2m = p2m_get_hostp2m(d);
     struct ept_data *ept;
 
+    p2m->max_mapped_pfn = hostp2m->max_mapped_pfn;
+    p2m->default_access = hostp2m->default_access;
+    p2m->domain = hostp2m->domain;
+    p2m->logdirty_ranges = hostp2m->logdirty_ranges;
+    p2m->global_logdirty = hostp2m->global_logdirty;
+
     p2m->min_remapped_gfn = gfn_x(INVALID_GFN);
     p2m->max_remapped_gfn = 0;
     ept = &p2m->ept;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index c53cab4..00f85e1 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -28,6 +28,7 @@
 #include <xen/vm_event.h>
 #include <xen/event.h>
 #include <public/vm_event.h>
+#include <asm/altp2m.h>
 #include <asm/domain.h>
 #include <asm/page.h>
 #include <asm/paging.h>
@@ -248,7 +249,6 @@ int p2m_init(struct domain *d)
 int p2m_is_logdirty_range(struct p2m_domain *p2m, unsigned long start,
                           unsigned long end)
 {
-    ASSERT(p2m_is_hostp2m(p2m));
     if ( p2m->global_logdirty ||
          rangeset_contains_range(p2m->logdirty_ranges, start, end) )
         return 1;
@@ -964,12 +964,12 @@ int p2m_change_type_one(struct domain *d, unsigned long gfn_l,
 }
 
 /* Modify the p2m type of a range of gfns from ot to nt. */
-void p2m_change_type_range(struct domain *d, 
-                           unsigned long start, unsigned long end,
-                           p2m_type_t ot, p2m_type_t nt)
+static void _p2m_change_type_range(struct p2m_domain *p2m,
+                                   unsigned long start, unsigned long end,
+                                   p2m_type_t ot, p2m_type_t nt)
 {
+    struct domain *d = p2m->domain;
     unsigned long gfn = start;
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
     int rc = 0;
 
     ASSERT(ot != nt);
@@ -1022,6 +1022,23 @@ void p2m_change_type_range(struct domain *d,
     p2m_unlock(p2m);
 }
 
+void p2m_change_type_range(struct domain *d,
+                           unsigned long start, unsigned long end,
+                           p2m_type_t ot, p2m_type_t nt)
+{
+    unsigned int i;
+
+    if ( !altp2m_active(d) )
+    {
+        _p2m_change_type_range(p2m_get_hostp2m(d), start, end, ot, nt);
+        return;
+    }
+
+    for ( i = 0; i < MAX_ALTP2M; i++ )
+        if ( d->arch.altp2m_eptp[i] != mfn_x(INVALID_MFN) )
+            _p2m_change_type_range(d->arch.altp2m_p2m[i], start, end, ot, nt);
+}
+
 /*
  * Finish p2m type change for gfns which are marked as need_recalc in a range.
  * Returns: 0/1 for success, negative for failure

[-- Attachment #3: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-17 10:50                   ` Razvan Cojocaru
@ 2018-04-17 13:53                     ` George Dunlap
  2018-04-17 14:21                       ` Razvan Cojocaru
  2018-04-18  8:20                       ` Razvan Cojocaru
  0 siblings, 2 replies; 31+ messages in thread
From: George Dunlap @ 2018-04-17 13:53 UTC (permalink / raw)
  To: Razvan Cojocaru, George Dunlap
  Cc: Tian, Kevin, Tamas K Lengyel, Andrew Cooper, Tim Deegan,
	Jun Nakajima, xen-devel

On 04/17/2018 11:50 AM, Razvan Cojocaru wrote:
>  void p2m_init_altp2m_ept(struct domain *d, unsigned int i)
>  {
>      struct p2m_domain *p2m = d->arch.altp2m_p2m[i];
> +    struct p2m_domain *hostp2m = p2m_get_hostp2m(d);
>      struct ept_data *ept;
>  
> +    p2m->max_mapped_pfn = hostp2m->max_mapped_pfn;
> +    p2m->default_access = hostp2m->default_access;
> +    p2m->domain = hostp2m->domain;
> +    p2m->logdirty_ranges = hostp2m->logdirty_ranges;
> +    p2m->global_logdirty = hostp2m->global_logdirty;

This would certainly be one approach.  But then we'd need to keep a lot
more of these things in sync -- for instance, we'd have to have similar
code to enable and disable global_logdirty on all active altp2m entries.

I also don't think the max_mapped_pfn should be copied here; the fact
that updates got filtered out before was a red herring I think.

Another approach would be to maintain the logdirty_ranges and
global_logdirty only for the host p2m, but to misconfigure entries for
all the p2ms; and then on a misconfiguration, handle the
misconfiguration for the hostp2m and then do a lazy propagate for the
altp2m.  On the whole that's probably more error-prone than just doing a
for() loop, though, and not that much faster. :-)

The other thing that seems to be missing from synchronization is that in
p2m-ept.c:ept_enable_pml() sets the p2m->ept.ad bit (which ends up being
part of the eptp).  The code seems to indicate that this is required for
PML (hardware-assisted logdirty), but I don't see anywhere this is set
or copied from the host p2m.

It might be nice to have a more structured way of keeping all these
changes in sync, rather than relying on this open-coding everywhere.

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-17 13:53                     ` George Dunlap
@ 2018-04-17 14:21                       ` Razvan Cojocaru
  2018-04-17 14:58                         ` George Dunlap
  2018-04-18  8:20                       ` Razvan Cojocaru
  1 sibling, 1 reply; 31+ messages in thread
From: Razvan Cojocaru @ 2018-04-17 14:21 UTC (permalink / raw)
  To: George Dunlap, George Dunlap
  Cc: Tian, Kevin, Tamas K Lengyel, Andrew Cooper, Tim Deegan,
	Jun Nakajima, xen-devel

On 04/17/2018 04:53 PM, George Dunlap wrote:
> On 04/17/2018 11:50 AM, Razvan Cojocaru wrote:
>>  void p2m_init_altp2m_ept(struct domain *d, unsigned int i)
>>  {
>>      struct p2m_domain *p2m = d->arch.altp2m_p2m[i];
>> +    struct p2m_domain *hostp2m = p2m_get_hostp2m(d);
>>      struct ept_data *ept;
>>  
>> +    p2m->max_mapped_pfn = hostp2m->max_mapped_pfn;
>> +    p2m->default_access = hostp2m->default_access;
>> +    p2m->domain = hostp2m->domain;
>> +    p2m->logdirty_ranges = hostp2m->logdirty_ranges;
>> +    p2m->global_logdirty = hostp2m->global_logdirty;
> 
> This would certainly be one approach.  But then we'd need to keep a lot
> more of these things in sync -- for instance, we'd have to have similar
> code to enable and disable global_logdirty on all active altp2m entries.
> 
> I also don't think the max_mapped_pfn should be copied here; the fact
> that updates got filtered out before was a red herring I think.

I initially thought so too, and now I've commented out just that one
line to remember why I couldn't remove, and the reason is this:

(XEN) Assertion 's <= e' failed at rangeset.c:121
(XEN) ----[ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d080228826>] rangeset_add_range+0x5/0x1e6
(XEN) RFLAGS: 0000000000010206   CONTEXT: hypervisor (d0v0)
(XEN) rax: 0000000000000000   rbx: ffff830b577f92e0   rcx: 00000000000f0000
(XEN) rdx: 0000000000000000   rsi: 00000000000f0000   rdi: ffff830ad6a1ce50
(XEN) rbp: ffff83007ce87c78   rsp: ffff83007ce87c20   r8:  0000000000000000
(XEN) r9:  0000000000000000   r10: 000000000000006f   r11: 0000000000000028
(XEN) r12: 0000000000000002   r13: 0000000000000000   r14: 0000000000000001
(XEN) r15: ffff830ad6ddd000   cr0: 0000000080050033   cr4: 00000000003526e0
(XEN) cr3: 0000000ad714f000   cr2: 0000000000c12000
(XEN) fsb: 00007f794c7b2700   gsb: ffff880276c00000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d080228826> (rangeset_add_range+0x5/0x1e6):
(XEN)  00 5d c3 48 39 d6 76 02 <0f> 0b 55 48 89 e5 41 56 41 55 41 54 53
49 89 d5
(XEN) Xen stack trace from rsp=ffff83007ce87c20:
(XEN)    ffff82d08032c644 0000000000000206 0000000000000010 0000000000000028
(XEN)    00000000000f0000 ffff82c000217000 0000000000000000 ffff830ad6ddd000
(XEN)    00000000000f0240 0000000000000000 0000000000000002 ffff83007ce87cc8
(XEN)    ffff82d08032c7ac 0000000000000240 00000000000f0000 ffff82c000217000
(XEN)    ffff830ad6ddd000 00000000000f0240 0000000000000048 ffff82c000217000
(XEN)    00000000000f0000 ffff83007ce87d38 ffff82d080362ade ffff830c5bb20000
(XEN)    ffff830ad6ddd650 0000000000000000 0000000000000000 00007f794c7bd004
(XEN)    0000000000000048 ffff830c5bb20000 0000000000000000 ffff83007ce87e00
(XEN)    ffffffffffffffea ffff82d0802e9c64 deadbeefdeadf00d ffff83007ce87de8
(XEN)    ffff82d0802e92c1 ffff830c5bb20000 ffff83007d616000 0000000000000000
(XEN)    ffff83007ce87dd8 0000000000000000 ffff83007ce87df8 ffff83007ce87df4
(XEN)    ffff82e016a5de40 ffff83007d616000 0000000000000007 0000000000000240
(XEN)    00000000000f0000 ffff83007ce87dc8 ffff830ad6ddd000 ffff83007ce87dc8
(XEN)    0000000000000002 0000000000000001 00007f794c7bf004 ffff82d0802e9c64
(XEN)    deadbeefdeadf00d ffff83007ce87e48 ffff82d0802e9ce1 ffff82d080374434
(XEN)    0000000280370001 00007f794c7be004 0000000000000020 00007f794c7bd004
(XEN)    0000000000000048 ffff82d080374434 ffff83007ce87ef8 ffff83007d616000
(XEN)    0000000000000029 ffff83007ce87ee8 ffff82d08036d8a6 03ff82d080374434
(XEN)    0000000000000001 0000000000000002 00007f794c7bf004 deadbeefdeadf00d
(XEN)    deadbeefdeadf00d ffff82d080374434 ffff82d080374428 ffff82d080374434
(XEN) Xen call trace:
(XEN)    [<ffff82d080228826>] rangeset_add_range+0x5/0x1e6
(XEN)    [<ffff82d08032c7ac>] p2m_change_type_range+0x66/0x7f
(XEN)    [<ffff82d080362ade>] hap_track_dirty_vram+0x240/0x491
(XEN)    [<ffff82d0802e92c1>] dm.c#dm_op+0x45c/0xd06
(XEN)    [<ffff82d0802e9ce1>] do_dm_op+0x7d/0xb3
(XEN)    [<ffff82d08036d8a6>] pv_hypercall+0x1f4/0x440
(XEN)    [<ffff82d080374495>] lstar_enter+0x115/0x120
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 's <= e' failed at rangeset.c:121
(XEN) ****************************************

> Another approach would be to maintain the logdirty_ranges and
> global_logdirty only for the host p2m, but to misconfigure entries for
> all the p2ms; and then on a misconfiguration, handle the
> misconfiguration for the hostp2m and then do a lazy propagate for the
> altp2m.  On the whole that's probably more error-prone than just doing a
> for() loop, though, and not that much faster. :-)
We can try that too.

> The other thing that seems to be missing from synchronization is that in
> p2m-ept.c:ept_enable_pml() sets the p2m->ept.ad bit (which ends up being
> part of the eptp).  The code seems to indicate that this is required for
> PML (hardware-assisted logdirty), but I don't see anywhere this is set
> or copied from the host p2m.
> 
> It might be nice to have a more structured way of keeping all these
> changes in sync, rather than relying on this open-coding everywhere.

Very true. It has also occured to me that some of these issues would be
at least partially mitigated if altp2m was always on, but of course I
can also see why that would be frowned upon at this time.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-17 14:21                       ` Razvan Cojocaru
@ 2018-04-17 14:58                         ` George Dunlap
  2018-04-17 15:13                           ` Razvan Cojocaru
  0 siblings, 1 reply; 31+ messages in thread
From: George Dunlap @ 2018-04-17 14:58 UTC (permalink / raw)
  To: Razvan Cojocaru, George Dunlap
  Cc: Tian, Kevin, Tamas K Lengyel, Andrew Cooper, Tim Deegan,
	Jun Nakajima, xen-devel

On 04/17/2018 03:21 PM, Razvan Cojocaru wrote:
> On 04/17/2018 04:53 PM, George Dunlap wrote:
>> On 04/17/2018 11:50 AM, Razvan Cojocaru wrote:
>>>  void p2m_init_altp2m_ept(struct domain *d, unsigned int i)
>>>  {
>>>      struct p2m_domain *p2m = d->arch.altp2m_p2m[i];
>>> +    struct p2m_domain *hostp2m = p2m_get_hostp2m(d);
>>>      struct ept_data *ept;
>>>  
>>> +    p2m->max_mapped_pfn = hostp2m->max_mapped_pfn;
>>> +    p2m->default_access = hostp2m->default_access;
>>> +    p2m->domain = hostp2m->domain;
>>> +    p2m->logdirty_ranges = hostp2m->logdirty_ranges;
>>> +    p2m->global_logdirty = hostp2m->global_logdirty;
>>
>> This would certainly be one approach.  But then we'd need to keep a lot
>> more of these things in sync -- for instance, we'd have to have similar
>> code to enable and disable global_logdirty on all active altp2m entries.
>>
>> I also don't think the max_mapped_pfn should be copied here; the fact
>> that updates got filtered out before was a red herring I think.
> 
> I initially thought so too, and now I've commented out just that one
> line to remember why I couldn't remove, and the reason is this:
> 
> (XEN) Assertion 's <= e' failed at rangeset.c:121
> (XEN) ----[ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82d080228826>] rangeset_add_range+0x5/0x1e6
> (XEN) RFLAGS: 0000000000010206   CONTEXT: hypervisor (d0v0)
> (XEN) rax: 0000000000000000   rbx: ffff830b577f92e0   rcx: 00000000000f0000
> (XEN) rdx: 0000000000000000   rsi: 00000000000f0000   rdi: ffff830ad6a1ce50
> (XEN) rbp: ffff83007ce87c78   rsp: ffff83007ce87c20   r8:  0000000000000000
> (XEN) r9:  0000000000000000   r10: 000000000000006f   r11: 0000000000000028
> (XEN) r12: 0000000000000002   r13: 0000000000000000   r14: 0000000000000001
> (XEN) r15: ffff830ad6ddd000   cr0: 0000000080050033   cr4: 00000000003526e0
> (XEN) cr3: 0000000ad714f000   cr2: 0000000000c12000
> (XEN) fsb: 00007f794c7b2700   gsb: ffff880276c00000   gss: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen code around <ffff82d080228826> (rangeset_add_range+0x5/0x1e6):
> (XEN)  00 5d c3 48 39 d6 76 02 <0f> 0b 55 48 89 e5 41 56 41 55 41 54 53
> 49 89 d5
> (XEN) Xen stack trace from rsp=ffff83007ce87c20:
> (XEN)    ffff82d08032c644 0000000000000206 0000000000000010 0000000000000028
> (XEN)    00000000000f0000 ffff82c000217000 0000000000000000 ffff830ad6ddd000
> (XEN)    00000000000f0240 0000000000000000 0000000000000002 ffff83007ce87cc8
> (XEN)    ffff82d08032c7ac 0000000000000240 00000000000f0000 ffff82c000217000
> (XEN)    ffff830ad6ddd000 00000000000f0240 0000000000000048 ffff82c000217000
> (XEN)    00000000000f0000 ffff83007ce87d38 ffff82d080362ade ffff830c5bb20000
> (XEN)    ffff830ad6ddd650 0000000000000000 0000000000000000 00007f794c7bd004
> (XEN)    0000000000000048 ffff830c5bb20000 0000000000000000 ffff83007ce87e00
> (XEN)    ffffffffffffffea ffff82d0802e9c64 deadbeefdeadf00d ffff83007ce87de8
> (XEN)    ffff82d0802e92c1 ffff830c5bb20000 ffff83007d616000 0000000000000000
> (XEN)    ffff83007ce87dd8 0000000000000000 ffff83007ce87df8 ffff83007ce87df4
> (XEN)    ffff82e016a5de40 ffff83007d616000 0000000000000007 0000000000000240
> (XEN)    00000000000f0000 ffff83007ce87dc8 ffff830ad6ddd000 ffff83007ce87dc8
> (XEN)    0000000000000002 0000000000000001 00007f794c7bf004 ffff82d0802e9c64
> (XEN)    deadbeefdeadf00d ffff83007ce87e48 ffff82d0802e9ce1 ffff82d080374434
> (XEN)    0000000280370001 00007f794c7be004 0000000000000020 00007f794c7bd004
> (XEN)    0000000000000048 ffff82d080374434 ffff83007ce87ef8 ffff83007d616000
> (XEN)    0000000000000029 ffff83007ce87ee8 ffff82d08036d8a6 03ff82d080374434
> (XEN)    0000000000000001 0000000000000002 00007f794c7bf004 deadbeefdeadf00d
> (XEN)    deadbeefdeadf00d ffff82d080374434 ffff82d080374428 ffff82d080374434
> (XEN) Xen call trace:
> (XEN)    [<ffff82d080228826>] rangeset_add_range+0x5/0x1e6
> (XEN)    [<ffff82d08032c7ac>] p2m_change_type_range+0x66/0x7f
> (XEN)    [<ffff82d080362ade>] hap_track_dirty_vram+0x240/0x491
> (XEN)    [<ffff82d0802e92c1>] dm.c#dm_op+0x45c/0xd06
> (XEN)    [<ffff82d0802e9ce1>] do_dm_op+0x7d/0xb3
> (XEN)    [<ffff82d08036d8a6>] pv_hypercall+0x1f4/0x440
> (XEN)    [<ffff82d080374495>] lstar_enter+0x115/0x120
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Assertion 's <= e' failed at rangeset.c:121
> (XEN) ****************************************

Right, but this is basically a bug in p2m_change_type_range(), where it
handles end > max_mapped_pfn, but not start > max_mapped_pfn.  It should
check the latter just after grabbing the lock and bail if true.  (This
should probably be in a separate patch, as it's really a generic bug in
p2m_change_type_range().)

>> Another approach would be to maintain the logdirty_ranges and
>> global_logdirty only for the host p2m, but to misconfigure entries for
>> all the p2ms; and then on a misconfiguration, handle the
>> misconfiguration for the hostp2m and then do a lazy propagate for the
>> altp2m.  On the whole that's probably more error-prone than just doing a
>> for() loop, though, and not that much faster. :-)
> We can try that too.

If you want -- but as I said, arguably your approach is more robust;
just incomplete.

>> The other thing that seems to be missing from synchronization is that in
>> p2m-ept.c:ept_enable_pml() sets the p2m->ept.ad bit (which ends up being
>> part of the eptp).  The code seems to indicate that this is required for
>> PML (hardware-assisted logdirty), but I don't see anywhere this is set
>> or copied from the host p2m.
>>
>> It might be nice to have a more structured way of keeping all these
>> changes in sync, rather than relying on this open-coding everywhere.
> 
> Very true. It has also occured to me that some of these issues would be
> at least partially mitigated if altp2m was always on, but of course I
> can also see why that would be frowned upon at this time.

How would having it enabled all the time have helped in this situation?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-17 14:58                         ` George Dunlap
@ 2018-04-17 15:13                           ` Razvan Cojocaru
  2018-04-17 17:07                             ` Tamas K Lengyel
  0 siblings, 1 reply; 31+ messages in thread
From: Razvan Cojocaru @ 2018-04-17 15:13 UTC (permalink / raw)
  To: George Dunlap, George Dunlap
  Cc: Tian, Kevin, Tamas K Lengyel, Andrew Cooper, Tim Deegan,
	Jun Nakajima, xen-devel

On 04/17/2018 05:58 PM, George Dunlap wrote:
>>> It might be nice to have a more structured way of keeping all these
>>> changes in sync, rather than relying on this open-coding everywhere.
>>
>> Very true. It has also occured to me that some of these issues would be
>> at least partially mitigated if altp2m was always on, but of course I
>> can also see why that would be frowned upon at this time.
> 
> How would having it enabled all the time have helped in this situation?

Well, the default p2m would just be
d->arch.altp2m_p2m[0], all altp2m_active(d) checks would be gone, the
active p2m for a given VCPU would always be p2m_get_altp2m(v), and so
on. If I understand the design correctly, that is. :)


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-17 15:13                           ` Razvan Cojocaru
@ 2018-04-17 17:07                             ` Tamas K Lengyel
  0 siblings, 0 replies; 31+ messages in thread
From: Tamas K Lengyel @ 2018-04-17 17:07 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: Tian, Kevin, Andrew Cooper, George Dunlap, Tim Deegan,
	George Dunlap, Jun Nakajima, xen-devel

On Tue, Apr 17, 2018 at 9:13 AM, Razvan Cojocaru
<rcojocaru@bitdefender.com> wrote:
> On 04/17/2018 05:58 PM, George Dunlap wrote:
>>>> It might be nice to have a more structured way of keeping all these
>>>> changes in sync, rather than relying on this open-coding everywhere.
>>>
>>> Very true. It has also occured to me that some of these issues would be
>>> at least partially mitigated if altp2m was always on, but of course I
>>> can also see why that would be frowned upon at this time.
>>
>> How would having it enabled all the time have helped in this situation?
>
> Well, the default p2m would just be
> d->arch.altp2m_p2m[0], all altp2m_active(d) checks would be gone, the
> active p2m for a given VCPU would always be p2m_get_altp2m(v), and so
> on. If I understand the design correctly, that is. :)

I would also like this setup, it would simplify things a lot.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-17 13:53                     ` George Dunlap
  2018-04-17 14:21                       ` Razvan Cojocaru
@ 2018-04-18  8:20                       ` Razvan Cojocaru
  2018-04-18 10:45                         ` George Dunlap
  1 sibling, 1 reply; 31+ messages in thread
From: Razvan Cojocaru @ 2018-04-18  8:20 UTC (permalink / raw)
  To: George Dunlap, George Dunlap
  Cc: Tian, Kevin, Tamas K Lengyel, Andrew Cooper, Tim Deegan,
	Jun Nakajima, xen-devel

On 04/17/2018 04:53 PM, George Dunlap wrote:
> On 04/17/2018 11:50 AM, Razvan Cojocaru wrote:
>>  void p2m_init_altp2m_ept(struct domain *d, unsigned int i)
>>  {
>>      struct p2m_domain *p2m = d->arch.altp2m_p2m[i];
>> +    struct p2m_domain *hostp2m = p2m_get_hostp2m(d);
>>      struct ept_data *ept;
>>  
>> +    p2m->max_mapped_pfn = hostp2m->max_mapped_pfn;
>> +    p2m->default_access = hostp2m->default_access;
>> +    p2m->domain = hostp2m->domain;
>> +    p2m->logdirty_ranges = hostp2m->logdirty_ranges;
>> +    p2m->global_logdirty = hostp2m->global_logdirty;
> 
> This would certainly be one approach.  But then we'd need to keep a lot
> more of these things in sync -- for instance, we'd have to have similar
> code to enable and disable global_logdirty on all active altp2m entries.
>
> [...]
> 
> The other thing that seems to be missing from synchronization is that in
> p2m-ept.c:ept_enable_pml() sets the p2m->ept.ad bit (which ends up being
> part of the eptp).  The code seems to indicate that this is required for
> PML (hardware-assisted logdirty), but I don't see anywhere this is set
> or copied from the host p2m.
> 
> It might be nice to have a more structured way of keeping all these
> changes in sync, rather than relying on this open-coding everywhere.

For logdirty_ranges and global_logdirty, I propose that we put them both
in a dynamically allocated struct, and have all p2ms share a pointer to
them. That way, all that's required is for the pointer to be set up in
p2m_init_altp2m_ept() and the actual data will be automatically shared
between p2ms. If I've read the code correctly, the hostp2m is the last
to be destroyed and the first to be initialized, so it should work well
as long as all p2ms synchronize access to logdirty_ranges and
global_logdirty (which I assume they already do).

ept.ad probably needs to be propagated on each change and copied from
the hostp2m on altp2m init.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-18  8:20                       ` Razvan Cojocaru
@ 2018-04-18 10:45                         ` George Dunlap
  2018-04-18 10:49                           ` Razvan Cojocaru
  0 siblings, 1 reply; 31+ messages in thread
From: George Dunlap @ 2018-04-18 10:45 UTC (permalink / raw)
  To: Razvan Cojocaru, George Dunlap
  Cc: Tian, Kevin, Tamas K Lengyel, Andrew Cooper, Tim Deegan,
	Jun Nakajima, xen-devel

On 04/18/2018 09:20 AM, Razvan Cojocaru wrote:
> On 04/17/2018 04:53 PM, George Dunlap wrote:
>> On 04/17/2018 11:50 AM, Razvan Cojocaru wrote:
>>>  void p2m_init_altp2m_ept(struct domain *d, unsigned int i)
>>>  {
>>>      struct p2m_domain *p2m = d->arch.altp2m_p2m[i];
>>> +    struct p2m_domain *hostp2m = p2m_get_hostp2m(d);
>>>      struct ept_data *ept;
>>>  
>>> +    p2m->max_mapped_pfn = hostp2m->max_mapped_pfn;
>>> +    p2m->default_access = hostp2m->default_access;
>>> +    p2m->domain = hostp2m->domain;
>>> +    p2m->logdirty_ranges = hostp2m->logdirty_ranges;
>>> +    p2m->global_logdirty = hostp2m->global_logdirty;
>>
>> This would certainly be one approach.  But then we'd need to keep a lot
>> more of these things in sync -- for instance, we'd have to have similar
>> code to enable and disable global_logdirty on all active altp2m entries.
>>
>> [...]
>>
>> The other thing that seems to be missing from synchronization is that in
>> p2m-ept.c:ept_enable_pml() sets the p2m->ept.ad bit (which ends up being
>> part of the eptp).  The code seems to indicate that this is required for
>> PML (hardware-assisted logdirty), but I don't see anywhere this is set
>> or copied from the host p2m.
>>
>> It might be nice to have a more structured way of keeping all these
>> changes in sync, rather than relying on this open-coding everywhere.
> 
> For logdirty_ranges and global_logdirty, I propose that we put them both
> in a dynamically allocated struct, and have all p2ms share a pointer to
> them. That way, all that's required is for the pointer to be set up in
> p2m_init_altp2m_ept() and the actual data will be automatically shared
> between p2ms. If I've read the code correctly, the hostp2m is the last
> to be destroyed and the first to be initialized, so it should work well
> as long as all p2ms synchronize access to logdirty_ranges and
> global_logdirty (which I assume they already do).

That's an interesting idea; one potential disadvantage is that it would
make locking even more complex than it already is.  Enabling / disabling
logdirty isn't a hot path, so looping through the altp2ms shouldn't have
much of a performance impact; *reading* is much more common, so having
to grab an extra set of locks is more likely to have a performance
impact.  And it's not clear to me that the complexity of keeping several
copies in sync is that much higher than the complexity of adding Yet
Another MM Lock.

Those are just initial reactions though -- feel free to explore the
solution space and/or argue otherwise. :-)

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-04-18 10:45                         ` George Dunlap
@ 2018-04-18 10:49                           ` Razvan Cojocaru
  0 siblings, 0 replies; 31+ messages in thread
From: Razvan Cojocaru @ 2018-04-18 10:49 UTC (permalink / raw)
  To: George Dunlap, George Dunlap
  Cc: Tian, Kevin, Tamas K Lengyel, Andrew Cooper, Tim Deegan,
	Jun Nakajima, xen-devel

On 04/18/2018 01:45 PM, George Dunlap wrote:
> On 04/18/2018 09:20 AM, Razvan Cojocaru wrote:
>> On 04/17/2018 04:53 PM, George Dunlap wrote:
>>> On 04/17/2018 11:50 AM, Razvan Cojocaru wrote:
>>>>  void p2m_init_altp2m_ept(struct domain *d, unsigned int i)
>>>>  {
>>>>      struct p2m_domain *p2m = d->arch.altp2m_p2m[i];
>>>> +    struct p2m_domain *hostp2m = p2m_get_hostp2m(d);
>>>>      struct ept_data *ept;
>>>>  
>>>> +    p2m->max_mapped_pfn = hostp2m->max_mapped_pfn;
>>>> +    p2m->default_access = hostp2m->default_access;
>>>> +    p2m->domain = hostp2m->domain;
>>>> +    p2m->logdirty_ranges = hostp2m->logdirty_ranges;
>>>> +    p2m->global_logdirty = hostp2m->global_logdirty;
>>>
>>> This would certainly be one approach.  But then we'd need to keep a lot
>>> more of these things in sync -- for instance, we'd have to have similar
>>> code to enable and disable global_logdirty on all active altp2m entries.
>>>
>>> [...]
>>>
>>> The other thing that seems to be missing from synchronization is that in
>>> p2m-ept.c:ept_enable_pml() sets the p2m->ept.ad bit (which ends up being
>>> part of the eptp).  The code seems to indicate that this is required for
>>> PML (hardware-assisted logdirty), but I don't see anywhere this is set
>>> or copied from the host p2m.
>>>
>>> It might be nice to have a more structured way of keeping all these
>>> changes in sync, rather than relying on this open-coding everywhere.
>>
>> For logdirty_ranges and global_logdirty, I propose that we put them both
>> in a dynamically allocated struct, and have all p2ms share a pointer to
>> them. That way, all that's required is for the pointer to be set up in
>> p2m_init_altp2m_ept() and the actual data will be automatically shared
>> between p2ms. If I've read the code correctly, the hostp2m is the last
>> to be destroyed and the first to be initialized, so it should work well
>> as long as all p2ms synchronize access to logdirty_ranges and
>> global_logdirty (which I assume they already do).
> 
> That's an interesting idea; one potential disadvantage is that it would
> make locking even more complex than it already is.  Enabling / disabling
> logdirty isn't a hot path, so looping through the altp2ms shouldn't have
> much of a performance impact; *reading* is much more common, so having
> to grab an extra set of locks is more likely to have a performance
> impact.  And it's not clear to me that the complexity of keeping several
> copies in sync is that much higher than the complexity of adding Yet
> Another MM Lock.
> 
> Those are just initial reactions though -- feel free to explore the
> solution space and/or argue otherwise. :-)

No, you're right, there's no point in complicating things, I'll just
loop over the p2ms.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-10-03 10:56   ` Сергей
  2018-10-03 11:02     ` Razvan Cojocaru
@ 2018-10-04  9:17     ` Razvan Cojocaru
  1 sibling, 0 replies; 31+ messages in thread
From: Razvan Cojocaru @ 2018-10-04  9:17 UTC (permalink / raw)
  To: Сергей; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 494 bytes --]

On 10/3/18 1:56 PM, Сергей wrote:
>> No yet, we're working on it.
> Could You point me to the branch with Your patches please? I Could not find it in https://xenbits.xen.org/gitweb/?p=xen.git

I've attached the current version of the patch, which needs George's
comments addressed (so we already know it needs more work) - but it does
fix the frozen display issue 99% of the time.

You need to apply "[PATCH V3] x86/altp2m: propagate ept.ad changes to
all active altp2ms" first.


Razvan

[-- Attachment #2: initial_hack.patch --]
[-- Type: text/x-patch, Size: 5038 bytes --]

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 3d228c2..7317e50 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -657,6 +657,9 @@ bool_t ept_handle_misconfig(uint64_t gpa)
     bool_t spurious;
     int rc;
 
+    if ( altp2m_active(curr->domain) )
+        p2m = p2m_get_altp2m(curr);
+
     p2m_lock(p2m);
 
     spurious = curr->arch.hvm.vmx.ept_spurious_misconfig;
@@ -1436,6 +1439,11 @@ void p2m_init_altp2m_ept(struct domain *d, unsigned int i)
     struct ept_data *ept;
 
     p2m->ept.ad = hostp2m->ept.ad;
+    p2m->max_mapped_pfn = hostp2m->max_mapped_pfn;
+    p2m->default_access = hostp2m->default_access;
+    p2m->domain = hostp2m->domain;
+    p2m->logdirty_ranges = hostp2m->logdirty_ranges;
+    p2m->global_logdirty = hostp2m->global_logdirty;
     p2m->min_remapped_gfn = gfn_x(INVALID_GFN);
     p2m->max_remapped_gfn = 0;
     ept = &p2m->ept;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index d90c624..2d59295 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -28,6 +28,7 @@
 #include <xen/vm_event.h>
 #include <xen/event.h>
 #include <public/vm_event.h>
+#include <asm/altp2m.h>
 #include <asm/domain.h>
 #include <asm/page.h>
 #include <asm/paging.h>
@@ -255,7 +256,6 @@ int p2m_init(struct domain *d)
 int p2m_is_logdirty_range(struct p2m_domain *p2m, unsigned long start,
                           unsigned long end)
 {
-    ASSERT(p2m_is_hostp2m(p2m));
     if ( p2m->global_logdirty ||
          rangeset_contains_range(p2m->logdirty_ranges, start, end) )
         return 1;
@@ -264,11 +264,9 @@ int p2m_is_logdirty_range(struct p2m_domain *p2m, unsigned long start,
     return 0;
 }
 
-void p2m_change_entry_type_global(struct domain *d,
-                                  p2m_type_t ot, p2m_type_t nt)
+static void _p2m_change_entry_type_global(struct p2m_domain *p2m,
+                                          p2m_type_t ot, p2m_type_t nt)
 {
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
-
     ASSERT(ot != nt);
     ASSERT(p2m_is_changeable(ot) && p2m_is_changeable(nt));
 
@@ -278,10 +276,25 @@ void p2m_change_entry_type_global(struct domain *d,
     p2m_unlock(p2m);
 }
 
-void p2m_memory_type_changed(struct domain *d)
+void p2m_change_entry_type_global(struct domain *d,
+                                  p2m_type_t ot, p2m_type_t nt)
 {
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+#ifdef CONFIG_HVM
+    if ( unlikely(altp2m_active(d)) )
+    {
+        unsigned int i;
 
+        for ( i = 0; i < MAX_ALTP2M; i++ )
+            if ( d->arch.altp2m_eptp[i] != mfn_x(INVALID_MFN) )
+                _p2m_change_entry_type_global(d->arch.altp2m_p2m[i], ot, nt);
+    }
+#endif
+
+    _p2m_change_entry_type_global(p2m_get_hostp2m(d), ot, nt);
+}
+
+void _p2m_memory_type_changed(struct p2m_domain *p2m)
+{
     if ( p2m->memory_type_changed )
     {
         p2m_lock(p2m);
@@ -290,6 +303,22 @@ void p2m_memory_type_changed(struct domain *d)
     }
 }
 
+void p2m_memory_type_changed(struct domain *d)
+{
+#ifdef CONFIG_HVM
+    if ( unlikely(altp2m_active(d)) )
+    {
+        unsigned int i;
+
+        for ( i = 0; i < MAX_ALTP2M; i++ )
+            if ( d->arch.altp2m_eptp[i] != mfn_x(INVALID_MFN) )
+                _p2m_memory_type_changed(d->arch.altp2m_p2m[i]);
+    }
+#endif
+
+    _p2m_memory_type_changed(p2m_get_hostp2m(d));
+}
+
 int p2m_set_ioreq_server(struct domain *d,
                          unsigned int flags,
                          struct hvm_ioreq_server *s)
@@ -967,12 +996,12 @@ int p2m_change_type_one(struct domain *d, unsigned long gfn_l,
 }
 
 /* Modify the p2m type of a range of gfns from ot to nt. */
-void p2m_change_type_range(struct domain *d, 
-                           unsigned long start, unsigned long end,
-                           p2m_type_t ot, p2m_type_t nt)
+static void _p2m_change_type_range(struct p2m_domain *p2m,
+                                   unsigned long start, unsigned long end,
+                                   p2m_type_t ot, p2m_type_t nt)
 {
+    struct domain *d = p2m->domain;
     unsigned long gfn = start;
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
     int rc = 0;
 
     ASSERT(ot != nt);
@@ -1025,6 +1054,25 @@ void p2m_change_type_range(struct domain *d,
     p2m_unlock(p2m);
 }
 
+void p2m_change_type_range(struct domain *d,
+                           unsigned long start, unsigned long end,
+                           p2m_type_t ot, p2m_type_t nt)
+{
+#ifdef CONFIG_HVM
+    if ( unlikely(altp2m_active(d)) )
+    {
+        unsigned int i;
+
+        for ( i = 0; i < MAX_ALTP2M; i++ )
+            if ( d->arch.altp2m_eptp[i] != mfn_x(INVALID_MFN) )
+                _p2m_change_type_range(d->arch.altp2m_p2m[i], start, end, ot,
+                                       nt);
+    }
+#endif
+
+    _p2m_change_type_range(p2m_get_hostp2m(d), start, end, ot, nt);
+}
+
 /*
  * Finish p2m type change for gfns which are marked as need_recalc in a range.
  * Returns: 0/1 for success, negative for failure

[-- Attachment #3: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-10-03 10:56   ` Сергей
@ 2018-10-03 11:02     ` Razvan Cojocaru
  2018-10-04  9:17     ` Razvan Cojocaru
  1 sibling, 0 replies; 31+ messages in thread
From: Razvan Cojocaru @ 2018-10-03 11:02 UTC (permalink / raw)
  To: Сергей; +Cc: xen-devel

On 10/3/18 1:56 PM, Сергей wrote:
>> No yet, we're working on it.
> Could You point me to the branch with Your patches please? I Could not find it in https://xenbits.xen.org/gitweb/?p=xen.git

There's no public branch with my patches, I'm working locally. The
original patch has now split into two patches, the first one of which is
currently "[PATCH V3] x86/altp2m: propagate ept.ad changes to all active
altp2ms" (pending review on xen-devel), and the second one depends on
the final form of the first one and is not yet completely written.


Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
  2018-10-02 17:59 ` Razvan Cojocaru
@ 2018-10-03 10:56   ` Сергей
  2018-10-03 11:02     ` Razvan Cojocaru
  2018-10-04  9:17     ` Razvan Cojocaru
  0 siblings, 2 replies; 31+ messages in thread
From: Сергей @ 2018-10-03 10:56 UTC (permalink / raw)
  To: Razvan Cojocaru; +Cc: xen-devel

> No yet, we're working on it.
Could You point me to the branch with Your patches please? I Could not find it in https://xenbits.xen.org/gitweb/?p=xen.git

With best regards
Sergey Kovalev.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Weird altp2m behaviour when switching early to a new view
  2018-10-02 16:29 Сергей
@ 2018-10-02 17:59 ` Razvan Cojocaru
  2018-10-03 10:56   ` Сергей
  0 siblings, 1 reply; 31+ messages in thread
From: Razvan Cojocaru @ 2018-10-02 17:59 UTC (permalink / raw)
  To: Сергей; +Cc: xen-devel

On 10/2/18 7:29 PM, Сергей wrote:
> Hello Razvan.
> 
> Have Your patch been accepted in Xen hypervisor?
> 
> Searching through git I have found commit "61bdddb82151fbf51c58f6ebc1b4a687942c45a8" on "Thu Jun 28 10:54:01 2018 +0300". Is that commit deals with the error?

No yet, we're working on it. The commit you mention deals with something
else.


Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Weird altp2m behaviour when switching early to a new view
@ 2018-10-02 16:29 Сергей
  2018-10-02 17:59 ` Razvan Cojocaru
  0 siblings, 1 reply; 31+ messages in thread
From: Сергей @ 2018-10-02 16:29 UTC (permalink / raw)
  To: rcojocaru; +Cc: xen-devel

Hello Razvan.

Have Your patch been accepted in Xen hypervisor?

Searching through git I have found commit "61bdddb82151fbf51c58f6ebc1b4a687942c45a8" on "Thu Jun 28 10:54:01 2018 +0300". Is that commit deals with the error?

With best regards
Sergey Kovalev.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2018-10-04  9:17 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-08 20:38 Weird altp2m behaviour when switching early to a new view Razvan Cojocaru
2018-04-09 14:12 ` George Dunlap
2018-04-11  6:39   ` Razvan Cojocaru
2018-04-11  8:04     ` Razvan Cojocaru
2018-04-12 16:15       ` Razvan Cojocaru
2018-04-13 14:44       ` Razvan Cojocaru
2018-04-13 16:38         ` Tamas K Lengyel
2018-04-13 17:04           ` Razvan Cojocaru
2018-04-16 17:47         ` George Dunlap
2018-04-16 18:46           ` Razvan Cojocaru
2018-04-16 20:21             ` George Dunlap
2018-04-17  7:24               ` Razvan Cojocaru
2018-04-17  8:24               ` Razvan Cojocaru
2018-04-17 10:49                 ` Razvan Cojocaru
2018-04-17 10:50                   ` Razvan Cojocaru
2018-04-17 13:53                     ` George Dunlap
2018-04-17 14:21                       ` Razvan Cojocaru
2018-04-17 14:58                         ` George Dunlap
2018-04-17 15:13                           ` Razvan Cojocaru
2018-04-17 17:07                             ` Tamas K Lengyel
2018-04-18  8:20                       ` Razvan Cojocaru
2018-04-18 10:45                         ` George Dunlap
2018-04-18 10:49                           ` Razvan Cojocaru
2018-04-11 20:17     ` Tamas K Lengyel
2018-04-12  7:19       ` Razvan Cojocaru
2018-04-09 15:40 ` Alexey G
2018-10-02 16:29 Сергей
2018-10-02 17:59 ` Razvan Cojocaru
2018-10-03 10:56   ` Сергей
2018-10-03 11:02     ` Razvan Cojocaru
2018-10-04  9:17     ` Razvan Cojocaru

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.