Thanks for the clarification, Andrew.

On Tue, Feb 20, 2018 at 5:20 PM, Andrew Cooper wrote:

> On 21/02/2018 00:42, Andres Lagar Cavilla wrote:
> > Hello everyone,
> >
> > I was thinking of the traditional Xen PV mode in which page table
> > pages are write protected from guest meddling and PTE
> > modifications are audited by the hypervisor (ptwr_emulated_update()
> > these days, still?).
>
> Something like that, yes. Alternatively, via explicit hypercall which
> is faster than the trap&emulate path.
>
> > Without software shadows or paging to e.g. an EPT, native PV loads the
> > actual CR3 pointing to a write protected page table tree.
>
> Unfortunately, I've lost you here. There is no such thing as a
> write-protected pagetable tree in the traditional PV sense.
>
> > When the cr3 is loaded, the hardware walker will want to set A and D
> > bits in PTEs -- is this action immune to the write protection in the
> > page table pages themselves? Or do we take emulation faults on these
> > updates as well?
>
> The protection that Xen enforces on PV guests is that an L1 PTE mapping
> a pagetable frame must never be writeable. This protection happens at
> the linear address level. When the CPU pagewalker tries to set A/D
> bits, it issues an atomic update to the physical address of the
> pagetable entry which needs updating.
>
> As with everything, there are complicating factors. With EPT/NPT for
> HVM guests these days, the hypervisor can also apply permissions to
> guest physical addresses, as part of their translation to host physical
> addresses. The hardware pagewalker, when attempting to set an A/D bit
> of the HVM guests regular pagetables, issues an EPT/NPT write (well -
> RMW strictly) to set the bits.
>
> Therefore, if the hypervisor marks an HVM guest's pagetable as
> read-only, then the hardware pagewalker trying to set A/D bits will
> vmexit with an EPT/NPT permissions violation. This is one major
> performance limiting factor of introspection technology at the moment.
>

Indeed, this is what I was coming at. In my experience guests will be
very adversely affected if we just latch the D bits to 1 unilaterally
(it's legal to do so by the "hardware"), as they will be led to believe
file cache pages are in constant need of writeback. (and A bits latched
to 1 turn e.g. Linux's vmscan.c into a crapshoot)

So this is currently not too hopeful.

Thanks again
Andres

>
> ~Andrew
>
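
For anyone following along, here is a rough toy model in C of the two checks discussed above. It is only a sketch under stated assumptions, not Xen or hardware code: the PTE flag bit positions are the architectural x86 ones, but pv_l1e_allowed(), walker_set_ad(), enum walk_result and every parameter name are made up for illustration, and the model ignores details such as EPT's own A/D feature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 PTE flag bits. */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_RW       (1ULL << 1)
#define PTE_ACCESSED (1ULL << 5)
#define PTE_DIRTY    (1ULL << 6)

/* Hypothetical outcome of one hardware-walker A/D update. */
enum walk_result { WALK_OK, WALK_EPT_VIOLATION };

/*
 * PV rule from the thread: an L1 PTE mapping a pagetable frame must
 * never be writeable. frame_is_pagetable is a toy stand-in for
 * whatever frame type tracking the hypervisor does.
 */
static inline bool pv_l1e_allowed(uint64_t l1e, bool frame_is_pagetable)
{
    if (!(l1e & PTE_PRESENT))
        return true;
    return !(frame_is_pagetable && (l1e & PTE_RW));
}

/*
 * Toy walker: set A (and D for a write access) in *pte.
 * Without a second-stage translation (PV), the update goes to the
 * physical address of the entry, so no linear-address permission is
 * consulted. With EPT/NPT (HVM), the update is a write to a guest
 * physical address and needs write permission there.
 */
static inline enum walk_result walker_set_ad(uint64_t *pte, bool is_write,
                                             bool has_ept, bool ept_writable)
{
    uint64_t want = PTE_ACCESSED | (is_write ? PTE_DIRTY : 0);

    if ((*pte & want) == want)
        return WALK_OK;              /* bits already set, no write needed */
    if (has_ept && !ept_writable)
        return WALK_EPT_VIOLATION;   /* would vmexit in the HVM case */
    *pte |= want;
    return WALK_OK;
}

/* Small usage example covering the three cases in the thread. */
static inline void ad_model_demo(void)
{
    uint64_t pv_pte = PTE_PRESENT;
    uint64_t hvm_pte = PTE_PRESENT;
    uint64_t latched = PTE_PRESENT | PTE_ACCESSED | PTE_DIRTY;

    /* PV: walker update succeeds although no writeable mapping exists. */
    assert(walker_set_ad(&pv_pte, true, false, false) == WALK_OK);
    /* HVM with read-only pagetable frame: A/D update faults. */
    assert(walker_set_ad(&hvm_pte, false, true, false) == WALK_EPT_VIOLATION);
    /* A/D pre-latched to 1: nothing to write, so no fault in this model. */
    assert(walker_set_ad(&latched, true, true, false) == WALK_OK);
}
```

The last case is the "latch A/D to 1" idea: in this simplified model it avoids the violation, at the cost to the guest described above.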