* PML (Page Modification Logging) design for Xen
@ 2015-02-11  8:28 Kai Huang
  2015-02-11 11:52 ` Andrew Cooper
                   ` (2 more replies)
  0 siblings, 3 replies; 54+ messages in thread
From: Kai Huang @ 2015-02-11  8:28 UTC
  To: jbeulich, tim, andrew.cooper3, kevin.tian, keir, xen-devel

Hi all,

PML (Page Modification Logging) is a new feature on Intel's Broadwell server platform, targeted at reducing the overhead of the dirty logging mechanism. Below is the design for Xen. Would you help to review and give comments?

Background
==========

Currently, dirty logging is done via write protection: guest memory we want to log is set read-only, so when the guest writes to that memory, a write fault (an EPT violation, when EPT is in use) occurs, in which we are able to log the dirty GFN. This mechanism works, but at the cost of one write fault for each write from the guest.

PML Introduction
================

PML is a hardware-assisted, efficient way to do dirty logging, based on the EPT mechanism. Briefly, PML automatically logs a dirty GPA to a 4K PML buffer when the CPU changes an EPT entry's D-bit from 0 to 1. To accomplish this, a new PML buffer base address (machine address), a PML index, and a new PML buffer full VMEXIT were added to the VMCS. Initially the PML index is set to 511 (8 bytes for each GPA) to indicate the buffer is empty, and the CPU decreases the PML index by 1 after logging a GPA. Before logging a GPA, the CPU checks the PML index to see whether the PML buffer is already full, in which case a PML buffer full VMEXIT happens, and the VMM should flush the logged GPAs (to the data structure that keeps dirty GPAs) and reset the PML index so that further GPAs can be logged.
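
In software terms, the new per-vcpu state and the hardware's logging step look roughly like the sketch below (names are illustrative, not from the SDM):

/*
 * Sketch of the VMM-side view of PML (all names illustrative).
 * The buffer is one 4K page holding up to 512 logged GPAs.
 */
#define NR_PML_ENTRIES   512
#define PML_INDEX_EMPTY  (NR_PML_ENTRIES - 1)    /* 511 == buffer empty */

struct pml_state {
    uint64_t *pml_buffer;  /* machine address of this page lives in the VMCS */
    uint16_t  pml_index;   /* decremented by the CPU on each logged GPA */
};

extern void vmexit_pml_buffer_full(void);  /* hypothetical: VMM flushes, resets index */

/* Approximation of what the CPU does on a D-bit 0 -> 1 transition: */
static void hw_log_gpa(struct pml_state *pml, uint64_t gpa)
{
    if ( pml->pml_index >= NR_PML_ENTRIES )  /* index no longer valid: full */
        vmexit_pml_buffer_full();
    else
        pml->pml_buffer[pml->pml_index--] = gpa & ~0xfffULL;  /* 4K aligned */
}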

The specification of PML can be found at:
http://www.intel.com/content/www/us/en/processors/page-modification-logging-vmm-white-paper.html

With PML, we don't have to use write protection but can just clear the D-bit of the EPT entry of guest memory to do dirty logging, at the cost of one additional PML buffer full VMEXIT per 512 dirty GPAs. Theoretically, this reduces hypervisor overhead when the guest is in dirty logging mode, so more CPU cycles can be allocated to the guest, and benchmarks in the guest are expected to perform better compared to non-PML.

Design
======

- PML feature is used globally

A new Xen boot parameter, say 'opt_enable_pml', will be introduced to control PML feature detection: the PML feature will only be detected if opt_enable_pml = 1. Once the PML feature is detected, it will be used for dirty logging for all domains globally. Currently we don't support using PML on a per-domain basis, as that would require additional control from the xl tool.
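
A minimal sketch of the wiring, using Xen's existing boolean_param() machinery (the option name, and the SECONDARY_EXEC_ENABLE_PML capability-bit name below, are illustrative):

/* Sketch: global opt-in switch for PML feature detection; default off. */
static bool_t __initdata opt_pml_enabled = 0;
boolean_param("pml", opt_pml_enabled);

/* In vmx_init_vmcs_config()-style detection code (illustrative names): */
static bool_t pml_detect(uint32_t secondary_exec_caps)
{
    return opt_pml_enabled &&
           (secondary_exec_caps & SECONDARY_EXEC_ENABLE_PML);
}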

- PML enable/disable for particular Domain

PML needs to be enabled (allocate the PML buffer, initialize the PML index and PML base address, turn PML on in the VMCS, etc.) for all vcpus of the domain, as the PML buffer and PML index are per-vcpu but the EPT table may be shared by vcpus; enabling PML on only some of the domain's vcpus won't work. Also, PML will only be enabled for the domain when it is switched to dirty logging mode, and it will be disabled when the domain is switched back to normal mode. As it looks like the number of vcpus won't change dynamically while the guest is running (correct me if I am wrong here), we don't have to consider enabling PML for a newly created vcpu while the guest is in dirty logging mode.
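
So enabling is all-or-nothing across the domain's vcpus, roughly as below (vmx_vcpu_enable_pml()/vmx_vcpu_disable_pml() are illustrative helpers that allocate/free the 4K buffer, set PML index = 511 and flip the VMCS control):

/* Sketch: enable PML for every vcpu of the domain, or none at all,
 * since the vcpus share the EPT table; roll back on any failure. */
static int vmx_domain_enable_pml(struct domain *d)
{
    struct vcpu *v;
    int rc = 0;

    for_each_vcpu ( d, v )
        if ( (rc = vmx_vcpu_enable_pml(v)) != 0 )
            break;

    if ( rc )
        for_each_vcpu ( d, v )
            vmx_vcpu_disable_pml(v);  /* no-op for vcpus not yet enabled */

    return rc;
}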

After PML is enabled for the domain, we only need to clear the EPT entry's D-bit for guest memory in dirty logging mode. We achieve this by checking whether PML is enabled for the domain when p2m_ram_rw is changed to p2m_ram_logdirty, and updating the EPT entry accordingly. However, for super pages we still write-protect them even in the PML case, as we still need to split super pages into 4K pages in dirty logging mode.
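
Concretely, only the 4K logdirty case changes; a sketch against ept_p2m_type_to_flags()-style code (vmx_domain_pml_enabled() is an illustrative helper; the bit fields are as in ept_entry_t):

/* Sketch: for a 4K p2m_ram_logdirty page under PML, keep the entry
 * writable with D clear; otherwise (superpage, or no PML) fall back
 * to write protection as today. */
static void ept_logdirty_flags(const struct domain *d, ept_entry_t *entry)
{
    entry->r = entry->x = 1;
    if ( vmx_domain_pml_enabled(d) && !is_epte_superpage(entry) )
    {
        entry->w = 1;  /* stay writable ...                         */
        entry->d = 0;  /* ... and let PML log the 0 -> 1 D-bit flip */
    }
    else
        entry->w = 0;  /* write-protect; the fault path splits superpages */
}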

- PML buffer flush

There are two places where we need to flush the PML buffer. The first place is the PML buffer full VMEXIT handler (apparently), and the second place is in paging_log_dirty_op (either peek or clean): vcpus run asynchronously while paging_log_dirty_op is called from userspace via hypercall, so it's possible there are dirty GPAs logged in vcpus' PML buffers that are not yet full. Therefore we'd better flush all vcpus' PML buffers before reporting dirty GPAs to userspace.

We handle the above two cases by flushing the PML buffer at the beginning of all VMEXITs. This solves the first case, and it also solves the second, as prior to paging_log_dirty_op, domain_pause is called, which kicks vcpus (that are in guest mode) out of guest mode by sending them an IPI, which causes a VMEXIT.
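
The flush itself is a short loop; a sketch of the per-vcpu drain (GUEST_PML_INDEX, the pml_buffer field and paging_mark_gfn_dirty() are assumed names for the VMCS field, the mapped buffer and the radix-tree update helper):

/* Sketch: drain one vcpu's PML buffer into the log-dirty radix tree,
 * called at the top of the VMEXIT handler.  NR_PML_ENTRIES is 512 as
 * in the earlier sketch. */
static void vmx_vcpu_flush_pml_buffer(struct vcpu *v)
{
    unsigned long index;
    unsigned int i;

    __vmread(GUEST_PML_INDEX, &index);

    /* The CPU logs downwards from 511, so entries [index + 1, 511] are
     * valid; a completely full buffer leaves index == 0xffff, i.e. the
     * loop below starts at 0. */
    for ( i = ((uint16_t)index + 1) & 0xffff; i < NR_PML_ENTRIES; i++ )
        paging_mark_gfn_dirty(v->domain,
                              v->arch.hvm_vmx.pml_buffer[i] >> PAGE_SHIFT);

    __vmwrite(GUEST_PML_INDEX, NR_PML_ENTRIES - 1);  /* mark empty again */
}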

This also keeps the log-dirty radix tree more up to date, as the PML buffer is flushed on all VMEXITs rather than only on the PML buffer full VMEXIT.

- Video RAM tracking (and partial dirty logging for guest memory range)

Video RAM is in dirty logging mode unconditionally during the guest's run-time, and it is a partial memory range of the guest. However, PML operates on the whole of guest memory (the whole valid EPT table, more precisely), so we need to choose whether to use PML when only partial guest memory ranges are in dirty logging mode.

Currently, PML will be used as long as there's guest memory in dirty logging mode, whether globally or partially. And in the case of partial dirty logging, we need to check whether a logged GPA in the PML buffer is in a dirty logging range.
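
A sketch of that check during the flush (the dirty-vram fields below illustrate how a partial range could be tested; exact structures may differ):

/* Sketch: only propagate logged GPAs that fall in a range currently
 * being tracked; global log-dirty covers everything. */
static bool_t gfn_in_logdirty_range(struct domain *d, unsigned long gfn)
{
    const struct sh_dirty_vram *vram = d->arch.hvm_domain.dirty_vram;

    if ( paging_mode_log_dirty(d) )  /* global dirty logging */
        return 1;

    return vram != NULL && gfn >= vram->begin_pfn && gfn < vram->end_pfn;
}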

Thanks,
-Kai


* Re: PML (Page Modification Logging) design for Xen
  2015-02-11  8:28 PML (Page Modification Logging) design for Xen Kai Huang
@ 2015-02-11 11:52 ` Andrew Cooper
  2015-02-11 13:13   ` Jan Beulich
  2015-02-12  2:39   ` Kai Huang
  2015-02-11 13:06 ` Jan Beulich
  2015-02-12 12:34 ` Tim Deegan
  2 siblings, 2 replies; 54+ messages in thread
From: Andrew Cooper @ 2015-02-11 11:52 UTC
  To: Kai Huang, jbeulich, tim, kevin.tian, keir, xen-devel

On 11/02/15 08:28, Kai Huang wrote:
> Hi all,
>
> PML (Page Modification Logging) is a new feature on Intel's Broadwell
> server platform targeted to reduce overhead of dirty logging
> mechanism. Below is the design for Xen. Would you help to review and
> give comments?

Thank you for this design.  It is a very good starting point!

>
> Background
> ==========
>
> Currently, dirty logging is done via write protection, which basically
> sets guest memory we want to log to be read-only, then when guest
> performs write to that memory, write fault (EPT violation in case of
> EPT is used) happens, in which we are able to log the dirty GFN. This
> mechanism works but at cost of one write fault for each write from the
> guest.

Strictly speaking, repeated writes to the same gfn after the first fault
are amortised until the logdirty is next queried, which makes typical
access patterns far less costly than a fault for every single write.

>
> PML Introduction
> ================
>
> PML is a hardware-assisted efficient way, based on EPT mechanism, for
> dirty logging. Briefly, PML logs dirty GPA automatically to a 4K PML
> buffer when CPU changes EPT table's D-bit from 0 to 1. To accomplish
> this, A new PML buffer base address (machine address), a PML index,
> and a new PML buffer full VMEXIT were added to VMCS. Initially PML
> index can be set to 511 (8 bytes for each GPA) to indicate the buffer
> is empty, and CPU decreases PML index by 1 after logging GPA. Before
> performing GPA logging, PML checks PML index to see if PML buffer has
> been fully logged, in which case a PML buffer full VMEXIT happens, and
> VMM should flush logged GPAs (to data structure keeps dirty GPAs) and
> reset PML index so that further GPAs can be logged again.
>
> The specification of PML can be found at:
> http://www.intel.com/content/www/us/en/processors/page-modification-logging-vmm-white-paper.html
>
>
> With PML, we don't have to use write protection but just clear D-bit
> of EPT entry of guest memory to do dirty logging, with an additional
> PML buffer full VMEXIT for 512 dirty GPAs. Theoretically, this can
> reduce hypervisor overhead when guest is in dirty logging mode, and
> therefore more CPU cycles can be allocated to guest, so it's expected
> benchmarks in guest will have better performance comparing to non-PML.

One issue with basic EPT A/D tracking was the scan of the EPT tables. 
Here, hardware will give us a list of affected gfns, but how is Xen
supposed to efficiently clear the dirty bits again?  Using EPT
misconfiguration is no better than the existing fault path.

>
> Design
> ======
>
> - PML feature is used globally
>
> A new Xen boot parameter, say 'opt_enable_pml', will be introduced to
> control PML feature detection, and PML feature will only be detected
> if opt_enable_pml = 1. Once PML feature is detected, it will be used
> for dirty logging for all domains globally. Currently we don't support
> to use PML on basis of per-domain as it will require additional
> control from XL tool.

Rather than adding in a new top level command line option for an ept
subfeature, it would be preferable to add an "ept=" option which has
"pml" as a sub boolean.

>
> - PML enable/disable for particular Domain

I do not believe that this is an interesting use case at the moment. 
Currently, PML would be an implementation detail of how Xen manages to
provide the logdirty bitmap to the toolstack or device model, and need
not be exposed at all.

If in the future, a toolstack component wishes to use the pml for other
purposes, there is more infrastructure which needs adjusting than just
per-domain PML.

>
> PML needs to be enabled (allocate PML buffer, initialize PML index,
> PML base address, turn PML on VMCS, etc) for all vcpus of the domain,
> as PML buffer and PML index are per-vcpu, but EPT table may be shared
> by vcpus. Enabling PML on partial vcpus of the domain won't work. Also
> PML will only be enabled for the domain when it is switched to dirty
> logging mode, and it will be disabled when domain is switched back to
> normal mode. As looks vcpu number won't be changed dynamically during
> guest is running (correct me if I am wrong here), so we don't have to
> consider enabling PML for new created vcpu when guest is in dirty
> logging mode.

There are exactly d->max_vcpus worth of struct vcpus (and therefore
VMCSes) for a domain after creation, and will exist for the lifetime of
the domain.  There is no dynamic adjustment of numbers of vcpus during
runtime.

>
> After PML is enabled for the domain, we only need to clear EPT entry's
> D-bit for guest memory in dirty logging mode. We achieve this by
> checking if PML is enabled for the domain when p2m_ram_rw changed to
> p2m_ram_logdirty, and updating EPT entry accordingly. However, for
> super pages, we still write protect them in case of PML as we still
> need to split super page to 4K page in dirty logging mode.

How is a superpage write reflected in the PML?

According to the whitepaper, transitioning the D bit from 0 to 1 results
in an entry being written into the log.  I presume that in the case of a
superpage, 512 entries are not written to the log, which presumably
means that the PML buffer flush needs to be aware of which gfns are
mapped by superpages to be able to correctly set a block of bits in the
logdirty bitmap.

>
> - PML buffer flush
>
> There are two places we need to flush PML buffer. The first place is
> PML buffer full VMEXIT handler (apparently), and the second place is
> in paging_log_dirty_op (either peek or clean), as vcpus are running
> asynchronously along with paging_log_dirty_op is called from userspace
> via hypercall, and it's possible there are dirty GPAs logged in vcpus'
> PML buffers but not full. Therefore we'd better to flush all vcpus'
> PML buffers before reporting dirty GPAs to userspace.

Why apparently?  It would be quite easy for a guest to dirty 512 frames
without otherwise taking a vmexit.

>
> We handle above two cases by flushing PML buffer at the beginning of
> all VMEXITs. This solves the first case above, and it also solves the
> second case, as prior to paging_log_dirty_op, domain_pause is called,
> which kicks vcpus (that are in guest mode) out of guest mode via
> sending IPI, which cause VMEXIT, to them.
>
> This also makes log-dirty radix tree more updated as PML buffer is
> flushed on basis of all VMEXITs but not only PML buffer full VMEXIT.

My gut feeling is that this is substantial overhead on a common path,
but this largely depends on how the dirty bits can be cleared efficiently.

>
> - Video RAM tracking (and partial dirty logging for guest memory range)
>
> Video RAM is in dirty logging mode unconditionally during guest's
> run-time, and it is partial memory range of the guest. However, PML
> operates on the whole guest memory (the whole valid EPT table, more
> precisely), so we need to choose whether to use PML if only partial
> guest memory ranges are in dirty logging mode.
>
> Currently, PML will be used as long as there's guest memory in dirty
> logging mode, no matter globally or partially. And in case of partial
> dirty logging, we need to check if the logged GPA in PML buffer is in
> dirty logging range.

I am not sure this is a problem.  HAP vram tracking already leaks
non-vram frames into the dirty bitmap, caused by calls to
paging_mark_dirty() from paths which are not caused by a p2m_logdirty fault.

~Andrew


* Re: PML (Page Modification Logging) design for Xen
  2015-02-11  8:28 PML (Page Modification Logging) design for Xen Kai Huang
  2015-02-11 11:52 ` Andrew Cooper
@ 2015-02-11 13:06 ` Jan Beulich
  2015-02-12  2:49   ` Kai Huang
  2015-02-12 12:34 ` Tim Deegan
  2 siblings, 1 reply; 54+ messages in thread
From: Jan Beulich @ 2015-02-11 13:06 UTC
  To: Kai Huang; +Cc: andrew.cooper3, kevin.tian, tim, keir, xen-devel

>>> On 11.02.15 at 09:28, <kai.huang@linux.intel.com> wrote:
> - PML enable/disable for particular Domain
> 
> PML needs to be enabled (allocate PML buffer, initialize PML index, PML base 
> address, turn PML on VMCS, etc) for all vcpus of the domain, as PML buffer 
> and PML index are per-vcpu, but EPT table may be shared by vcpus. Enabling 
> PML on partial vcpus of the domain won't work. Also PML will only be enabled 
> for the domain when it is switched to dirty logging mode, and it will be 
> disabled when domain is switched back to normal mode. As looks vcpu number 
> won't be changed dynamically during guest is running (correct me if I am 
> wrong here), so we don't have to consider enabling PML for new created vcpu 
> when guest is in dirty logging mode.
> 
> After PML is enabled for the domain, we only need to clear EPT entry's D-bit 
> for guest memory in dirty logging mode. We achieve this by checking if PML is 
> enabled for the domain when p2m_ram_rw changed to p2m_ram_logdirty, and
> updating EPT entry accordingly. However, for super pages, we still write 
> protect them in case of PML as we still need to split super page to 4K page 
> in dirty logging mode.

While it doesn't matter much for our immediate needs, the
documentation isn't really clear about the behavior when a 2M or
1G page gets its D bit set: Wouldn't it be rather useful to the
consumer to know of that fact (e.g. by setting some of the lower
bits of the PML entry to indicate so)?

> - PML buffer flush
> 
> There are two places we need to flush PML buffer. The first place is PML 
> buffer full VMEXIT handler (apparently), and the second place is in 
> paging_log_dirty_op (either peek or clean), as vcpus are running 
> asynchronously along with paging_log_dirty_op is called from userspace via 
> hypercall, and it's possible there are dirty GPAs logged in vcpus' PML 
> buffers but not full. Therefore we'd better to flush all vcpus' PML buffers 
> before reporting dirty GPAs to userspace.
> 
> We handle above two cases by flushing PML buffer at the beginning of all 
> VMEXITs. This solves the first case above, and it also solves the second 
> case, as prior to paging_log_dirty_op, domain_pause is called, which kicks 
> vcpus (that are in guest mode) out of guest mode via sending IPI, which cause 
> VMEXIT, to them.
> 
> This also makes log-dirty radix tree more updated as PML buffer is flushed 
> on basis of all VMEXITs but not only PML buffer full VMEXIT.

Is that really efficient? Flushing the buffer only as needed doesn't
seem to be a major problem (apart from the usual preemption issue
when dealing with guests with very many vCPU-s, but you certainly
recall that at this point HVM is still limited to 128).

Apart from these two remarks, the design looks okay to me.

Jan


* Re: PML (Page Modification Logging) design for Xen
  2015-02-11 11:52 ` Andrew Cooper
@ 2015-02-11 13:13   ` Jan Beulich
  2015-02-11 16:33     ` Andrew Cooper
  2015-02-12  2:35     ` Kai Huang
  2015-02-12  2:39   ` Kai Huang
  1 sibling, 2 replies; 54+ messages in thread
From: Jan Beulich @ 2015-02-11 13:13 UTC
  To: Andrew Cooper; +Cc: Kai Huang, tim, kevin.tian, keir, xen-devel

>>> On 11.02.15 at 12:52, <andrew.cooper3@citrix.com> wrote:
> On 11/02/15 08:28, Kai Huang wrote:
>> With PML, we don't have to use write protection but just clear D-bit
>> of EPT entry of guest memory to do dirty logging, with an additional
>> PML buffer full VMEXIT for 512 dirty GPAs. Theoretically, this can
>> reduce hypervisor overhead when guest is in dirty logging mode, and
>> therefore more CPU cycles can be allocated to guest, so it's expected
>> benchmarks in guest will have better performance comparing to non-PML.
> 
> One issue with basic EPT A/D tracking was the scan of the EPT tables. 
> Here, hardware will give us a list of affected gfns, but how is Xen
> supposed to efficiently clear the dirty bits again?  Using EPT
> misconfiguration is no better than the existing fault path.

Why not? The misconfiguration exit ought to clear the D bit for all
511 entries in the L1 table (and set it for the one entry that is
currently serving the access). All further D bit handling will then
be PML based.
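
Roughly (sketch; EPT_PAGETABLE_ENTRIES is 512):

/* Sketch of the above: when the misconfig exit revalidates an L1 EPT
 * table, clear D on all entries and set it only on the entry serving
 * the current access; further D-bit 0 -> 1 flips then go via PML. */
static void reset_l1_dbits(ept_entry_t *l1, unsigned int serving_idx)
{
    unsigned int i;

    for ( i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
        l1[i].d = (i == serving_idx);
}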

>> - PML buffer flush
>>
>> There are two places we need to flush PML buffer. The first place is
>> PML buffer full VMEXIT handler (apparently), and the second place is
>> in paging_log_dirty_op (either peek or clean), as vcpus are running
>> asynchronously along with paging_log_dirty_op is called from userspace
>> via hypercall, and it's possible there are dirty GPAs logged in vcpus'
>> PML buffers but not full. Therefore we'd better to flush all vcpus'
>> PML buffers before reporting dirty GPAs to userspace.
> 
> Why apparently?  It would be quite easy for a guest to dirty 512 frames
> without otherwise taking a vmexit.

I silently replaced apparently with obviously while reading...

>> We handle above two cases by flushing PML buffer at the beginning of
>> all VMEXITs. This solves the first case above, and it also solves the
>> second case, as prior to paging_log_dirty_op, domain_pause is called,
>> which kicks vcpus (that are in guest mode) out of guest mode via
>> sending IPI, which cause VMEXIT, to them.
>>
>> This also makes log-dirty radix tree more updated as PML buffer is
>> flushed on basis of all VMEXITs but not only PML buffer full VMEXIT.
> 
> My gut feeling is that this is substantial overhead on a common path,
> but this largely depends on how the dirty bits can be cleared efficiently.

I agree on the overhead part, but I don't see what relation this has
to the dirty bit clearing - a PML buffer flush doesn't involve any
alterations of D bits.

Jan


* Re: PML (Page Modification Logging) design for Xen
  2015-02-11 13:13   ` Jan Beulich
@ 2015-02-11 16:33     ` Andrew Cooper
  2015-02-11 16:55       ` Jan Beulich
  2015-02-12  2:35     ` Kai Huang
  1 sibling, 1 reply; 54+ messages in thread
From: Andrew Cooper @ 2015-02-11 16:33 UTC
  To: Jan Beulich; +Cc: Kai Huang, tim, kevin.tian, keir, xen-devel

On 11/02/15 13:13, Jan Beulich wrote:
>>>> On 11.02.15 at 12:52, <andrew.cooper3@citrix.com> wrote:
>> On 11/02/15 08:28, Kai Huang wrote:
>>> With PML, we don't have to use write protection but just clear D-bit
>>> of EPT entry of guest memory to do dirty logging, with an additional
>>> PML buffer full VMEXIT for 512 dirty GPAs. Theoretically, this can
>>> reduce hypervisor overhead when guest is in dirty logging mode, and
>>> therefore more CPU cycles can be allocated to guest, so it's expected
>>> benchmarks in guest will have better performance comparing to non-PML.
>> One issue with basic EPT A/D tracking was the scan of the EPT tables. 
>> Here, hardware will give us a list of affected gfns, but how is Xen
>> supposed to efficiently clear the dirty bits again?  Using EPT
>> misconfiguration is no better than the existing fault path.
> Why not? The misconfiguration exit ought to clear the D bit for all
> 511 entries in the L1 table (and set it for the one entry that is
> currently serving the access). All further D bit handling will then
> be PML based.
>
>>> - PML buffer flush
>>>
>>> There are two places we need to flush PML buffer. The first place is
>>> PML buffer full VMEXIT handler (apparently), and the second place is
>>> in paging_log_dirty_op (either peek or clean), as vcpus are running
>>> asynchronously along with paging_log_dirty_op is called from userspace
>>> via hypercall, and it's possible there are dirty GPAs logged in vcpus'
>>> PML buffers but not full. Therefore we'd better to flush all vcpus'
>>> PML buffers before reporting dirty GPAs to userspace.
>> Why apparently?  It would be quite easy for a guest to dirty 512 frames
>> without otherwise taking a vmexit.
> I silently replaced apparently with obviously while reading...
>
>>> We handle above two cases by flushing PML buffer at the beginning of
>>> all VMEXITs. This solves the first case above, and it also solves the
>>> second case, as prior to paging_log_dirty_op, domain_pause is called,
>>> which kicks vcpus (that are in guest mode) out of guest mode via
>>> sending IPI, which cause VMEXIT, to them.
>>>
>>> This also makes log-dirty radix tree more updated as PML buffer is
>>> flushed on basis of all VMEXITs but not only PML buffer full VMEXIT.
>> My gut feeling is that this is substantial overhead on a common path,
>> but this largely depends on how the dirty bits can be cleared efficiently.
> I agree on the overhead part, but I don't see what relation this has
> to the dirty bit clearing - a PML buffer flush doesn't involve any
> alterations of D bits.

I admit that I was off-by-one level when considering the
misconfiguration overhead.  It would be inefficient (but not unsafe as
far as I can tell) to clear all D bits at once; the PML could end up
with repeated gfns in it, or different vcpus could end up with the same
gfn, depending on the exact access pattern, which will add to the flush
overhead.

~Andrew


* Re: PML (Page Modification Logging) design for Xen
  2015-02-11 16:33     ` Andrew Cooper
@ 2015-02-11 16:55       ` Jan Beulich
  0 siblings, 0 replies; 54+ messages in thread
From: Jan Beulich @ 2015-02-11 16:55 UTC
  To: Andrew Cooper; +Cc: Kai Huang, tim, kevin.tian, keir, xen-devel

>>> On 11.02.15 at 17:33, <andrew.cooper3@citrix.com> wrote:
> On 11/02/15 13:13, Jan Beulich wrote:
>>>>> On 11.02.15 at 12:52, <andrew.cooper3@citrix.com> wrote:
>>> On 11/02/15 08:28, Kai Huang wrote:
>>>> We handle above two cases by flushing PML buffer at the beginning of
>>>> all VMEXITs. This solves the first case above, and it also solves the
>>>> second case, as prior to paging_log_dirty_op, domain_pause is called,
>>>> which kicks vcpus (that are in guest mode) out of guest mode via
>>>> sending IPI, which cause VMEXIT, to them.
>>>>
>>>> This also makes log-dirty radix tree more updated as PML buffer is
>>>> flushed on basis of all VMEXITs but not only PML buffer full VMEXIT.
>>> My gut feeling is that this is substantial overhead on a common path,
>>> but this largely depends on how the dirty bits can be cleared efficiently.
>> I agree on the overhead part, but I don't see what relation this has
>> to the dirty bit clearing - a PML buffer flush doesn't involve any
>> alterations of D bits.
> 
> I admit that I was off-by-one level when considering the
> misconfiguration overhead.  It would be inefficient (but not unsafe as
> far as I can tell) to clear all D bits at once; the PML could end up
> with repeated gfns in it, or different vcpus could end up with the same
> gfn, depending on the exact access pattern, which will add to the flush
> overhead.

Why would that be? A misconfiguration exit means no access to
a given range was possible at all before, i.e. all subordinate pages
would have the D bit clear if they were reachable. What you
describe would - afaict - be a problem only if we didn't go over the
whole guest address space at once.

Jan


* Re: PML (Page Modification Logging) design for Xen
  2015-02-11 13:13   ` Jan Beulich
  2015-02-11 16:33     ` Andrew Cooper
@ 2015-02-12  2:35     ` Kai Huang
  2015-02-12  6:25       ` Tian, Kevin
  1 sibling, 1 reply; 54+ messages in thread
From: Kai Huang @ 2015-02-12  2:35 UTC
  To: Jan Beulich, Andrew Cooper; +Cc: keir, kevin.tian, tim, xen-devel


On 02/11/2015 09:13 PM, Jan Beulich wrote:
>>>> On 11.02.15 at 12:52, <andrew.cooper3@citrix.com> wrote:
>> On 11/02/15 08:28, Kai Huang wrote:
>>> With PML, we don't have to use write protection but just clear D-bit
>>> of EPT entry of guest memory to do dirty logging, with an additional
>>> PML buffer full VMEXIT for 512 dirty GPAs. Theoretically, this can
>>> reduce hypervisor overhead when guest is in dirty logging mode, and
>>> therefore more CPU cycles can be allocated to guest, so it's expected
>>> benchmarks in guest will have better performance comparing to non-PML.
>> One issue with basic EPT A/D tracking was the scan of the EPT tables.
>> Here, hardware will give us a list of affected gfns, but how is Xen
>> supposed to efficiently clear the dirty bits again?  Using EPT
>> misconfiguration is no better than the existing fault path.
> Why not? The misconfiguration exit ought to clear the D bit for all
> 511 entries in the L1 table (and set it for the one entry that is
> currently serving the access). All further D bit handling will then
> be PML based.
Indeed, we clear the D-bit in EPT misconfiguration handling. In my
understanding, the sequence is as follows:

1) PML is enabled for the domain.
2) ept_invalidate_emt (or ept_invalidate_emt_range) is called.
3) The guest accesses a specific GPA (which has been invalidated by step
2), and an EPT misconfig is triggered.
4) Then resolve_misconfig is called, which fixes up the GFN (above GPA >>
12) to p2m_ram_logdirty and calls ept_p2m_type_to_flags, in which we
clear the D-bit of the EPT entry (instead of clearing the W-bit) if the
p2m type is p2m_ram_logdirty. Dirty logging of this GFN will then be
handled by PML.

Steps 2) ~ 4) above will be repeated when the log-dirty radix tree is
cleared.

>
>>> - PML buffer flush
>>>
>>> There are two places we need to flush PML buffer. The first place is
>>> PML buffer full VMEXIT handler (apparently), and the second place is
>>> in paging_log_dirty_op (either peek or clean), as vcpus are running
>>> asynchronously along with paging_log_dirty_op is called from userspace
>>> via hypercall, and it's possible there are dirty GPAs logged in vcpus'
>>> PML buffers but not full. Therefore we'd better to flush all vcpus'
>>> PML buffers before reporting dirty GPAs to userspace.
>> Why apparently?  It would be quite easy for a guest to dirty 512 frames
>> without otherwise taking a vmexit.
> I silently replaced apparently with obviously while reading...
>
>>> We handle above two cases by flushing PML buffer at the beginning of
>>> all VMEXITs. This solves the first case above, and it also solves the
>>> second case, as prior to paging_log_dirty_op, domain_pause is called,
>>> which kicks vcpus (that are in guest mode) out of guest mode via
>>> sending IPI, which cause VMEXIT, to them.
>>>
>>> This also makes log-dirty radix tree more updated as PML buffer is
>>> flushed on basis of all VMEXITs but not only PML buffer full VMEXIT.
>> My gut feeling is that this is substantial overhead on a common path,
>> but this largely depends on how the dirty bits can be cleared efficiently.
> I agree on the overhead part, but I don't see what relation this has
> to the dirty bit clearing - a PML buffer flush doesn't involve any
> alterations of D bits.
No, the flush is not related to the dirty bit clearing. The PML buffer
flush just does the following (which I should have clarified in my
design, sorry):
1) read out the PML index
2) loop over all GPAs logged in the PML buffer according to the PML
index, and update them in the log-dirty radix tree.

I agree there's overhead on the VMEXIT common path, but the overhead
should not be substantial compared to the overhead of the VMEXIT itself.

Thanks,
-Kai
>
> Jan
>


* Re: PML (Page Modification Logging) design for Xen
  2015-02-11 11:52 ` Andrew Cooper
  2015-02-11 13:13   ` Jan Beulich
@ 2015-02-12  2:39   ` Kai Huang
  2015-02-12  6:54     ` Tian, Kevin
  2015-02-17 10:19     ` Jan Beulich
  1 sibling, 2 replies; 54+ messages in thread
From: Kai Huang @ 2015-02-12  2:39 UTC
  To: Andrew Cooper, jbeulich, tim, kevin.tian, keir, xen-devel


On 02/11/2015 07:52 PM, Andrew Cooper wrote:
> On 11/02/15 08:28, Kai Huang wrote:
>> Hi all,
>>
>> PML (Page Modification Logging) is a new feature on Intel's Broadwell
>> server platform targeted to reduce overhead of dirty logging
>> mechanism. Below is the design for Xen. Would you help to review and
>> give comments?
> Thank you for this design.  It is a very good starting point!
Thanks!

>
>> Background
>> ==========
>>
>> Currently, dirty logging is done via write protection, which basically
>> sets guest memory we want to log to be read-only, then when guest
>> performs write to that memory, write fault (EPT violation in case of
>> EPT is used) happens, in which we are able to log the dirty GFN. This
>> mechanism works but at cost of one write fault for each write from the
>> guest.
> Strictly speaking, repeated writes to the same gfn after the first fault
> are amortised until the logdirty is next queried, which makes typical
> access patterns far less costly than a fault for every single write.
Indeed. I do mean the first fault here.

>
>> PML Introduction
>> ================
>>
>> PML is a hardware-assisted efficient way, based on EPT mechanism, for
>> dirty logging. Briefly, PML logs dirty GPA automatically to a 4K PML
>> buffer when CPU changes EPT table's D-bit from 0 to 1. To accomplish
>> this, A new PML buffer base address (machine address), a PML index,
>> and a new PML buffer full VMEXIT were added to VMCS. Initially PML
>> index can be set to 511 (8 bytes for each GPA) to indicate the buffer
>> is empty, and CPU decreases PML index by 1 after logging GPA. Before
>> performing GPA logging, PML checks PML index to see if PML buffer has
>> been fully logged, in which case a PML buffer full VMEXIT happens, and
>> VMM should flush logged GPAs (to data structure keeps dirty GPAs) and
>> reset PML index so that further GPAs can be logged again.
>>
>> The specification of PML can be found at:
>> http://www.intel.com/content/www/us/en/processors/page-modification-logging-vmm-white-paper.html
>>
>>
>> With PML, we don't have to use write protection but just clear D-bit
>> of EPT entry of guest memory to do dirty logging, with an additional
>> PML buffer full VMEXIT for 512 dirty GPAs. Theoretically, this can
>> reduce hypervisor overhead when guest is in dirty logging mode, and
>> therefore more CPU cycles can be allocated to guest, so it's expected
>> benchmarks in guest will have better performance comparing to non-PML.
> One issue with basic EPT A/D tracking was the scan of the EPT tables.
> Here, hardware will give us a list of affected gfns, but how is Xen
> supposed to efficiently clear the dirty bits again?  Using EPT
> misconfiguration is no better than the existing fault path.
See my reply to Jan's email.

>> Design
>> ======
>>
>> - PML feature is used globally
>>
>> A new Xen boot parameter, say 'opt_enable_pml', will be introduced to
>> control PML feature detection, and PML feature will only be detected
>> if opt_enable_pml = 1. Once PML feature is detected, it will be used
>> for dirty logging for all domains globally. Currently we don't support
>> to use PML on basis of per-domain as it will require additional
>> control from XL tool.
> Rather than adding in a new top level command line option for an ept
> subfeature, it would be preferable to add an "ept=" option which has
> "pml" as a sub boolean.
That works for me, if Jan agrees.

Jan, which do you prefer here?

>> - PML enable/disable for particular Domain
> I do not believe that this is an interesting use case at the moment.
> Currently, PML would be an implementation detail of how Xen manages to
> provide the logdirty bitmap to the toolstack or device model, and need
> not be exposed at all.
>
> If in the future, a toolstack component wishes to use the pml for other
> purposes, there is more infrastructure which needs adjusting than just
> per-domain PML.
I didn't mean to expose PML to the toolstack here. In fact, that is what
I want to avoid: PML should be hidden in the Xen hypervisor completely,
as you said, just another mechanism to provide the logdirty bitmap to
userspace.
Here I mean we need to enable PML for the domain (which means allocating
the PML buffer, initializing the PML index, and turning PML on in the
VMCS) manually, as it's not turned on automatically after PML feature
detection.
Sorry for the confusion.

>
>> PML needs to be enabled (allocate PML buffer, initialize PML index,
>> PML base address, turn PML on VMCS, etc) for all vcpus of the domain,
>> as PML buffer and PML index are per-vcpu, but EPT table may be shared
>> by vcpus. Enabling PML on partial vcpus of the domain won't work. Also
>> PML will only be enabled for the domain when it is switched to dirty
>> logging mode, and it will be disabled when domain is switched back to
>> normal mode. As looks vcpu number won't be changed dynamically during
>> guest is running (correct me if I am wrong here), so we don't have to
>> consider enabling PML for new created vcpu when guest is in dirty
>> logging mode.
> There are exactly d->max_vcpus worth of struct vcpus (and therefore
> VMCSes) for a domain after creation, and will exist for the lifetime of
> the domain.  There is no dynamic adjustment of numbers of vcpus during
> runtime.
Good to know.

>> After PML is enabled for the domain, we only need to clear EPT entry's
>> D-bit for guest memory in dirty logging mode. We achieve this by
>> checking if PML is enabled for the domain when p2m_ram_rw changed to
>> p2m_ram_logdirty, and updating EPT entry accordingly. However, for
>> super pages, we still write protect them in case of PML as we still
>> need to split super page to 4K page in dirty logging mode.
> How is a superpage write reflected in the PML?
>
> According to the whitepaper, transitioning the D bit from 0 to 1 results
> in an entry being written into the log.  I presume that in the case of a
> superpage, 512 entries are not written to the log,
No, only the GPA being written will be logged, with the last 12 bits
cleared. Whether the hardware just clears the last 12 bits or does a 2M
alignment is not certain, as the specification doesn't say. I'd probably
better confirm this with the hardware guys. But it doesn't impact the
design anyway, as explained below.

> which presumably
> means that the PML buffer flush needs to be aware of which gfns are
> mapped by superpages to be able to correctly set a block of bits in the
> logdirty bitmap.
>
Unfortunately PML itself can't tell us whether a logged GPA comes from a
superpage or not, but even with PML we still need to split superpages
into 4K pages, just like the traditional write protection approach does.
I think this is because live migration should be based on 4K page
granularity. Marking all 512 bits of a 2M page dirty on a single write
doesn't make sense in either the write protection or the PML case.

>> - PML buffer flush
>>
>> There are two places we need to flush PML buffer. The first place is
>> PML buffer full VMEXIT handler (apparently), and the second place is
>> in paging_log_dirty_op (either peek or clean), as vcpus are running
>> asynchronously along with paging_log_dirty_op is called from userspace
>> via hypercall, and it's possible there are dirty GPAs logged in vcpus'
>> PML buffers but not full. Therefore we'd better to flush all vcpus'
>> PML buffers before reporting dirty GPAs to userspace.
> Why apparently?  It would be quite easy for a guest to dirty 512 frames
> without otherwise taking a vmexit.
The PML buffer full VMEXIT indicates the buffer is fully logged, so
clearly we need to flush it and make it empty to be able to log GPAs
again. See my reply to Jan's email.

>> We handle above two cases by flushing PML buffer at the beginning of
>> all VMEXITs. This solves the first case above, and it also solves the
>> second case, as prior to paging_log_dirty_op, domain_pause is called,
>> which kicks vcpus (that are in guest mode) out of guest mode via
>> sending IPI, which cause VMEXIT, to them.
>>
>> This also makes log-dirty radix tree more updated as PML buffer is
>> flushed on basis of all VMEXITs but not only PML buffer full VMEXIT.
> My gut feeling is that this is substantial overhead on a common path,
> but this largely depends on how the dirty bits can be cleared efficiently.
Yes, but I don't think the overhead will be substantial. See my reply to
Jan's email.

>
>> - Video RAM tracking (and partial dirty logging for guest memory range)
>>
>> Video RAM is in dirty logging mode unconditionally during guest's
>> run-time, and it is partial memory range of the guest. However, PML
>> operates on the whole guest memory (the whole valid EPT table, more
>> precisely), so we need to choose whether to use PML if only partial
>> guest memory ranges are in dirty logging mode.
>>
>> Currently, PML will be used as long as there's guest memory in dirty
>> logging mode, no matter globally or partially. And in case of partial
>> dirty logging, we need to check if the logged GPA in PML buffer is in
>> dirty logging range.
> I am not sure this is a problem.  HAP vram tracking already leaks
> non-vram frames into the dirty bitmap, caused by calls to
> paging_mark_dirty() from paths which are not caused by a p2m_logdirty fault.
Hmm. Seems right. Probably this also depends on how userspace uses the
dirty bitmap.

If this is not a problem, we can avoid checking whether logged GPAs are
in logdirty ranges and unconditionally update them in the log-dirty
radix tree.

Jan, what's your comments here?

Thanks,
-Kai
>
> ~Andrew
>


* Re: PML (Page Modification Logging) design for Xen
  2015-02-11 13:06 ` Jan Beulich
@ 2015-02-12  2:49   ` Kai Huang
  2015-02-12  5:16     ` Kai Huang
                       ` (2 more replies)
  0 siblings, 3 replies; 54+ messages in thread
From: Kai Huang @ 2015-02-12  2:49 UTC
  To: Jan Beulich; +Cc: andrew.cooper3, kevin.tian, keir, tim, xen-devel


On 02/11/2015 09:06 PM, Jan Beulich wrote:
>>>> On 11.02.15 at 09:28, <kai.huang@linux.intel.com> wrote:
>> - PML enable/disable for particular Domain
>>
>> PML needs to be enabled (allocate PML buffer, initialize PML index, PML base
>> address, turn PML on VMCS, etc) for all vcpus of the domain, as PML buffer
>> and PML index are per-vcpu, but EPT table may be shared by vcpus. Enabling
>> PML on partial vcpus of the domain won't work. Also PML will only be enabled
>> for the domain when it is switched to dirty logging mode, and it will be
>> disabled when domain is switched back to normal mode. As looks vcpu number
>> won't be changed dynamically during guest is running (correct me if I am
>> wrong here), so we don't have to consider enabling PML for new created vcpu
>> when guest is in dirty logging mode.
>>
>> After PML is enabled for the domain, we only need to clear EPT entry's D-bit
>> for guest memory in dirty logging mode. We achieve this by checking if PML is
>> enabled for the domain when p2m_ram_rw changed to p2m_ram_logdirty, and
>> updating EPT entry accordingly. However, for super pages, we still write
>> protect them in case of PML as we still need to split super page to 4K page
>> in dirty logging mode.
> While it doesn't matter much for our immediate needs, the
> documentation isn't really clear about the behavior when a 2M or
> 1G page gets its D bit set: Wouldn't it be rather useful to the
> consumer to know of that fact (e.g. by setting some of the lower
> bits of the PML entry to indicate so)?
This is a good point. The documentation only tells us the GPA will be
logged with the last 12 bits cleared. Whether the hardware just clears
the last 12 bits or performs 2M alignment (in case of a 2M page) is not
certain. I will confirm this with the hardware guys. But as you said,
it's not related to our immediate needs.
>
>> - PML buffer flush
>>
>> There are two places we need to flush PML buffer. The first place is PML
>> buffer full VMEXIT handler (apparently), and the second place is in
>> paging_log_dirty_op (either peek or clean), as vcpus are running
>> asynchronously along with paging_log_dirty_op is called from userspace via
>> hypercall, and it's possible there are dirty GPAs logged in vcpus' PML
>> buffers but not full. Therefore we'd better to flush all vcpus' PML buffers
>> before reporting dirty GPAs to userspace.
>>
>> We handle above two cases by flushing PML buffer at the beginning of all
>> VMEXITs. This solves the first case above, and it also solves the second
>> case, as prior to paging_log_dirty_op, domain_pause is called, which kicks
>> vcpus (that are in guest mode) out of guest mode via sending IPI, which cause
>> VMEXIT, to them.
>>
>> This also makes log-dirty radix tree more updated as PML buffer is flushed
>> on basis of all VMEXITs but not only PML buffer full VMEXIT.
> Is that really efficient? Flushing the buffer only as needed doesn't
> seem to be a major problem (apart from the usual preemption issue
> when dealing with guests with very many vCPU-s, but you certainly
> recall that at this point HVM is still limited to 128).
>
> Apart from these two remarks, the design looks okay to me.
While keeping the log-dirty radix tree more up to date is probably
irrelevant, I do think we'd better flush the PML buffers in
paging_log_dirty_op (both peek and clear) before reporting dirty pages
to userspace, in which case I think flushing the PML buffer at the
beginning of VMEXIT is a good idea, as domain_pause does the job
automatically. I am not sure how many cycles flushing the PML buffer
will contribute, but I think it should be relatively small compared to
the VMEXIT itself, and therefore it can be ignored.

An optimization would probably be to flush the PML buffer only for the
external interrupt VMEXIT, which domain_pause really triggers, rather
than at the beginning of all VMEXITs. But as long as the overhead of
flushing the PML buffer is negligible, this optimization is also
unnecessary.

Thanks,
-Kai
>
> Jan
>


* Re: PML (Page Modification Logging) design for Xen
  2015-02-12  2:49   ` Kai Huang
@ 2015-02-12  5:16     ` Kai Huang
  2015-02-12  7:02     ` Tian, Kevin
  2015-02-17 10:23     ` Jan Beulich
  2 siblings, 0 replies; 54+ messages in thread
From: Kai Huang @ 2015-02-12  5:16 UTC
  To: Jan Beulich; +Cc: andrew.cooper3, kevin.tian, xen-devel, keir, tim


On 02/12/2015 10:49 AM, Kai Huang wrote:
>
> On 02/11/2015 09:06 PM, Jan Beulich wrote:
>>>>> On 11.02.15 at 09:28, <kai.huang@linux.intel.com> wrote:
>>> - PML enable/disable for particular Domain
>>>
>>> PML needs to be enabled (allocate PML buffer, initialize PML index, 
>>> PML base
>>> address, turn PML on VMCS, etc) for all vcpus of the domain, as PML 
>>> buffer
>>> and PML index are per-vcpu, but EPT table may be shared by vcpus. 
>>> Enabling
>>> PML on partial vcpus of the domain won't work. Also PML will only be 
>>> enabled
>>> for the domain when it is switched to dirty logging mode, and it 
>>> will be
>>> disabled when domain is switched back to normal mode. As looks vcpu 
>>> number
>>> won't be changed dynamically during guest is running (correct me if 
>>> I am
>>> wrong here), so we don't have to consider enabling PML for new 
>>> created vcpu
>>> when guest is in dirty logging mode.
>>>
>>> After PML is enabled for the domain, we only need to clear EPT 
>>> entry's D-bit
>>> for guest memory in dirty logging mode. We achieve this by checking 
>>> if PML is
>>> enabled for the domain when p2m_ram_rw changed to p2m_ram_logdirty, and
>>> updating EPT entry accordingly. However, for super pages, we still 
>>> write
>>> protect them in case of PML as we still need to split super page to 
>>> 4K page
>>> in dirty logging mode.
>> While it doesn't matter much for our immediate needs, the
>> documentation isn't really clear about the behavior when a 2M or
>> 1G page gets its D bit set: Wouldn't it be rather useful to the
>> consumer to know of that fact (e.g. by setting some of the lower
>> bits of the PML entry to indicate so)?
> This is good point. The documentation only tells us the GPA will be 
> logged with last 12 bits cleared. Whether hardware just clears last 12 
> bits or performs 2M alignment (in case of 2M page) is not certain. I 
> will confirm this with hardware guys. But as you said, it's not 
> related to our immediate needs.
Forgot to say: to me it is currently certain that the lower 12 bits are
cleared, as the specification says the GPA is written to the log
4K-aligned. But it should be possible to push the hardware guys to
modify this if necessary, though I am not 100% sure.

Thanks,
-Kai
>>
>>> - PML buffer flush
>>>
>>> There are two places we need to flush PML buffer. The first place is 
>>> PML
>>> buffer full VMEXIT handler (apparently), and the second place is in
>>> paging_log_dirty_op (either peek or clean), as vcpus are running
>>> asynchronously along with paging_log_dirty_op is called from 
>>> userspace via
>>> hypercall, and it's possible there are dirty GPAs logged in vcpus' PML
>>> buffers but not full. Therefore we'd better to flush all vcpus' PML 
>>> buffers
>>> before reporting dirty GPAs to userspace.
>>>
>>> We handle above two cases by flushing PML buffer at the beginning of 
>>> all
>>> VMEXITs. This solves the first case above, and it also solves the 
>>> second
>>> case, as prior to paging_log_dirty_op, domain_pause is called, which 
>>> kicks
>>> vcpus (that are in guest mode) out of guest mode via sending IPI, 
>>> which cause
>>> VMEXIT, to them.
>>>
>>> This also makes log-dirty radix tree more updated as PML buffer is 
>>> flushed
>>> on basis of all VMEXITs but not only PML buffer full VMEXIT.
>> Is that really efficient? Flushing the buffer only as needed doesn't
>> seem to be a major problem (apart from the usual preemption issue
>> when dealing with guests with very many vCPU-s, but you certainly
>> recall that at this point HVM is still limited to 128).
>>
>> Apart from these two remarks, the design looks okay to me.
> While keeping log-dirty radix tree more updated is probably 
> irrelevant, I do think we'd better to flush PML buffers in 
> paging_log_dirty_op (both peek and clear) before reporting dirty pages 
> to userspace, in which case I think flushing PML buffer at beginning 
> of VMEXIT is a good idea, as domain_pause does the job automatically. 
> I am not sure how much cycles will flushing PML buffer contribute but 
> I think it should be relatively small comparing to VMEXIT itself, 
> therefore it can be ignored.
>
> An optimized way probably is we only flush PML buffer for external 
> interrupt VMEXIT, which domain_pause really triggers, but not at 
> beginning of all VMEXITs. But as log as the overhead of flush PML 
> buffer is negligible, this optimization is also unnecessary.
>
> Thanks,
> -Kai
>>
>> Jan
>>


* Re: PML (Page Modification Logging) design for Xen
  2015-02-12  2:35     ` Kai Huang
@ 2015-02-12  6:25       ` Tian, Kevin
  2015-02-12  6:45         ` Kai Huang
  0 siblings, 1 reply; 54+ messages in thread
From: Tian, Kevin @ 2015-02-12  6:25 UTC
  To: Kai Huang, Jan Beulich, Andrew Cooper; +Cc: keir, tim, xen-devel

> From: Kai Huang [mailto:kai.huang@linux.intel.com]
> Sent: Thursday, February 12, 2015 10:35 AM
> 
> On 02/11/2015 09:13 PM, Jan Beulich wrote:
> >>>> On 11.02.15 at 12:52, <andrew.cooper3@citrix.com> wrote:
> >> On 11/02/15 08:28, Kai Huang wrote:
> >>> With PML, we don't have to use write protection but just clear D-bit
> >>> of EPT entry of guest memory to do dirty logging, with an additional
> >>> PML buffer full VMEXIT for 512 dirty GPAs. Theoretically, this can
> >>> reduce hypervisor overhead when guest is in dirty logging mode, and
> >>> therefore more CPU cycles can be allocated to guest, so it's expected
> >>> benchmarks in guest will have better performance comparing to
> non-PML.
> >> One issue with basic EPT A/D tracking was the scan of the EPT tables.
> >> Here, hardware will give us a list of affected gfns, but how is Xen
> >> supposed to efficiently clear the dirty bits again?  Using EPT
> >> misconfiguration is no better than the existing fault path.
> > Why not? The misconfiguration exit ought to clear the D bit for all
> > 511 entries in the L1 table (and set it for the one entry that is
> > currently serving the access). All further D bit handling will then
> > be PML based.
> Indeed, we clear D-bit in EPT misconfiguration. In my understanding, the
> sequences are as follows:
> 
> 1) PML enabled for the domain.
> 2) ept_invalidate_emt (or ept_invalidate_emt_range) is called.
> 3) Guest accesses specific GPA (which has been invalidated by step 2),
> and EPT misconfig is triggered.
> 4) Then resolve_misconfig is called, which fixes up GFN (above GPA >>
> 12) to p2m_ram_logdirty, and calls ept_p2m_type_to_flags, in which we
> clear D-bit of EPT entry (instead of clear W-bit) if p2m type is
> p2m_ram_logdirty. Then dirty logging of this GFN will be handled by PML.
> 
> The above 2) ~ 4) will be repeated when log-dirty radix tree is cleared.

Is ept_invalidate_emt required by the existing logdirty mode, or by PML
enabling? Can we clear D bits directly when the log-dirty radix tree is
cleared, to reduce EPT misconfig exits for repeatedly dirtied pages?

Thanks
Kevin


* Re: PML (Page Modification Logging) design for Xen
  2015-02-12  6:25       ` Tian, Kevin
@ 2015-02-12  6:45         ` Kai Huang
  2015-02-12  7:08           ` Tian, Kevin
  0 siblings, 1 reply; 54+ messages in thread
From: Kai Huang @ 2015-02-12  6:45 UTC
  To: Tian, Kevin, Jan Beulich, Andrew Cooper; +Cc: keir, tim, xen-devel


On 02/12/2015 02:25 PM, Tian, Kevin wrote:
>> From: Kai Huang [mailto:kai.huang@linux.intel.com]
>> Sent: Thursday, February 12, 2015 10:35 AM
>>
>> On 02/11/2015 09:13 PM, Jan Beulich wrote:
>>>>>> On 11.02.15 at 12:52, <andrew.cooper3@citrix.com> wrote:
>>>> On 11/02/15 08:28, Kai Huang wrote:
>>>>> With PML, we don't have to use write protection but just clear D-bit
>>>>> of EPT entry of guest memory to do dirty logging, with an additional
>>>>> PML buffer full VMEXIT for 512 dirty GPAs. Theoretically, this can
>>>>> reduce hypervisor overhead when guest is in dirty logging mode, and
>>>>> therefore more CPU cycles can be allocated to guest, so it's expected
>>>>> benchmarks in guest will have better performance comparing to
>> non-PML.
>>>> One issue with basic EPT A/D tracking was the scan of the EPT tables.
>>>> Here, hardware will give us a list of affected gfns, but how is Xen
>>>> supposed to efficiently clear the dirty bits again?  Using EPT
>>>> misconfiguration is no better than the existing fault path.
>>> Why not? The misconfiguration exit ought to clear the D bit for all
>>> 511 entries in the L1 table (and set it for the one entry that is
>>> currently serving the access). All further D bit handling will then
>>> be PML based.
>> Indeed, we clear D-bit in EPT misconfiguration. In my understanding, the
>> sequences are as follows:
>>
>> 1) PML enabled for the domain.
>> 2) ept_invalidate_emt (or ept_invalidate_emt_range) is called.
>> 3) Guest accesses specific GPA (which has been invalidated by step 2),
>> and EPT misconfig is triggered.
>> 4) Then resolve_misconfig is called, which fixes up GFN (above GPA >>
>> 12) to p2m_ram_logdirty, and calls ept_p2m_type_to_flags, in which we
>> clear D-bit of EPT entry (instead of clear W-bit) if p2m type is
>> p2m_ram_logdirty. Then dirty logging of this GFN will be handled by PML.
>>
>> The above 2) ~ 4) will be repeated when log-dirty radix tree is cleared.
> is ept_invalidate_emt required by existing logdirty mode or by PML enable?
It's in the existing logdirty code.
> can we clear D bits directly when log-dirty radix tree is cleared to reduce
> EPT misconfig exits for repeatedly dirtied pages?
Theoretically we can, and it looks like logdirty for video RAM is done
this way (logdirty for the page is re-enabled as it is reported to the
dirty bitmap). One thing to note is that video RAM logdirty appears to
exist only for HAP mode.
But in the current log-dirty implementation for global logdirty, at the
common paging layer, the log-dirty radix tree is cleaned in a single
step after reporting all dirty pages to userspace, and that essentially
just calls ept_invalidate_emt. Therefore we would need to modify the
common logdirty code at the paging layer to achieve this, which is more
of a logdirty enhancement and not directly related to PML enabling. And
any interface change in the paging layer requires corresponding
modification in shadow mode, so for now I choose not to do it.

Thanks,
-Kai
>
> Thanks
> Kevin
>


* Re: PML (Page Modification Logging) design for Xen
  2015-02-12  2:39   ` Kai Huang
@ 2015-02-12  6:54     ` Tian, Kevin
  2015-02-12  6:56       ` Kai Huang
  2015-02-12 14:10       ` Andrew Cooper
  2015-02-17 10:19     ` Jan Beulich
  1 sibling, 2 replies; 54+ messages in thread
From: Tian, Kevin @ 2015-02-12  6:54 UTC
  To: Kai Huang, Andrew Cooper, jbeulich, tim, keir, xen-devel

> From: Kai Huang [mailto:kai.huang@linux.intel.com]
> Sent: Thursday, February 12, 2015 10:39 AM
> >
> >> PML needs to be enabled (allocate PML buffer, initialize PML index,
> >> PML base address, turn PML on VMCS, etc) for all vcpus of the domain,
> >> as PML buffer and PML index are per-vcpu, but EPT table may be shared
> >> by vcpus. Enabling PML on partial vcpus of the domain won't work. Also
> >> PML will only be enabled for the domain when it is switched to dirty
> >> logging mode, and it will be disabled when domain is switched back to
> >> normal mode. As looks vcpu number won't be changed dynamically during
> >> guest is running (correct me if I am wrong here), so we don't have to
> >> consider enabling PML for new created vcpu when guest is in dirty
> >> logging mode.
> > There are exactly d->max_vcpus worth of struct vcpus (and therefore
> > VMCSes) for a domain after creation, and will exist for the lifetime of
> > the domain.  There is no dynamic adjustment of numbers of vcpus during
> > runtime.
> Good to know.

Could we at least detect and warn about vcpu changes when PML is
enabled? Dirty logging happens without the guest's knowledge, and there
could be a case where the user onlines/offlines a vcpu right within that
window.

> > which presumably
> > means that the PML buffer flush needs to be aware of which gfns are
> > mapped by superpages to be able to correctly set a block of bits in the
> > logdirty bitmap.
> >
> Unfortunately PML itself can't tell us if the logged GPA comes from
> superpage or not, but even in PML we still need to split superpages to
> 4K page, just like traditional write protection approach does. I think
> this is because live migration should be based on 4K page granularity.
> Marking all 512 bits of a 2M page to be dirty by a single write doesn't
> make sense in both write protection and PML cases.
> 

agree. extending one write to the whole superpage enlarges the dirty set
unnecessarily. since the spec doesn't say superpage logging is unsupported,
I'd expect a 4k-aligned entry to be logged for a write within a superpage.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-12  6:54     ` Tian, Kevin
@ 2015-02-12  6:56       ` Kai Huang
  2015-02-12  7:09         ` Tian, Kevin
  2015-02-12 14:10       ` Andrew Cooper
  1 sibling, 1 reply; 54+ messages in thread
From: Kai Huang @ 2015-02-12  6:56 UTC (permalink / raw)
  To: Tian, Kevin, Andrew Cooper, jbeulich, tim, keir, xen-devel


On 02/12/2015 02:54 PM, Tian, Kevin wrote:
>> From: Kai Huang [mailto:kai.huang@linux.intel.com]
>> Sent: Thursday, February 12, 2015 10:39 AM
>>>> PML needs to be enabled (allocate PML buffer, initialize PML index,
>>>> PML base address, turn PML on VMCS, etc) for all vcpus of the domain,
>>>> as PML buffer and PML index are per-vcpu, but EPT table may be shared
>>>> by vcpus. Enabling PML on partial vcpus of the domain won't work. Also
>>>> PML will only be enabled for the domain when it is switched to dirty
>>>> logging mode, and it will be disabled when domain is switched back to
>>>> normal mode. As looks vcpu number won't be changed dynamically during
>>>> guest is running (correct me if I am wrong here), so we don't have to
>>>> consider enabling PML for new created vcpu when guest is in dirty
>>>> logging mode.
>>> There are exactly d->max_vcpus worth of struct vcpus (and therefore
>>> VMCSes) for a domain after creation, and will exist for the lifetime of
>>> the domain.  There is no dynamic adjustment of numbers of vcpus during
>>> runtime.
>> Good to know.
> could we at least detect and warn vcpu changes when PML is enabled?
> dirty logging happens out of guest's knowledge and there could be the
> case where user right online/offline a vcpu within that window.
Why is the warning necessary? There's no harm in leaving PML enabled when 
a vcpu goes offline.

Also, we will not disable PML for that vcpu when it goes offline, so we 
don't need to re-enable PML (which can fail) when the vcpu comes back 
online. That simplifies the logic.

Thanks,
-Kai
>
>>> which presumably
>>> means that the PML buffer flush needs to be aware of which gfns are
>>> mapped by superpages to be able to correctly set a block of bits in the
>>> logdirty bitmap.
>>>
>> Unfortunately PML itself can't tell us if the logged GPA comes from
>> superpage or not, but even in PML we still need to split superpages to
>> 4K page, just like traditional write protection approach does. I think
>> this is because live migration should be based on 4K page granularity.
>> Marking all 512 bits of a 2M page to be dirty by a single write doesn't
>> make sense in both write protection and PML cases.
>>
> agree. extending one write to superpage enlarges dirty set unnecessary.
> since spec doesn't say superpage logging is not supported, I'd think a
> 4k-aligned entry being logged if within superpage.
>
> Thanks
> Kevin
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-12  2:49   ` Kai Huang
  2015-02-12  5:16     ` Kai Huang
@ 2015-02-12  7:02     ` Tian, Kevin
  2015-02-12  7:04       ` Kai Huang
  2015-02-17 10:23     ` Jan Beulich
  2 siblings, 1 reply; 54+ messages in thread
From: Tian, Kevin @ 2015-02-12  7:02 UTC (permalink / raw)
  To: Kai Huang, Jan Beulich; +Cc: andrew.cooper3, keir, tim, xen-devel

> From: Kai Huang [mailto:kai.huang@linux.intel.com]
> Sent: Thursday, February 12, 2015 10:50 AM
> 
> >> - PML buffer flush
> >>
> >> There are two places we need to flush PML buffer. The first place is PML
> >> buffer full VMEXIT handler (apparently), and the second place is in
> >> paging_log_dirty_op (either peek or clean), as vcpus are running
> >> asynchronously along with paging_log_dirty_op is called from userspace
> via
> >> hypercall, and it's possible there are dirty GPAs logged in vcpus' PML
> >> buffers but not full. Therefore we'd better to flush all vcpus' PML buffers
> >> before reporting dirty GPAs to userspace.
> >>
> >> We handle above two cases by flushing PML buffer at the beginning of all
> >> VMEXITs. This solves the first case above, and it also solves the second
> >> case, as prior to paging_log_dirty_op, domain_pause is called, which kicks
> >> vcpus (that are in guest mode) out of guest mode via sending IPI, which
> cause
> >> VMEXIT, to them.
> >>
> >> This also makes log-dirty radix tree more updated as PML buffer is flushed
> >> on basis of all VMEXITs but not only PML buffer full VMEXIT.
> > Is that really efficient? Flushing the buffer only as needed doesn't
> > seem to be a major problem (apart from the usual preemption issue
> > when dealing with guests with very many vCPU-s, but you certainly
> > recall that at this point HVM is still limited to 128).
> >
> > Apart from these two remarks, the design looks okay to me.
> While keeping log-dirty radix tree more updated is probably irrelevant,
> I do think we'd better to flush PML buffers in paging_log_dirty_op (both
> peek and clear) before reporting dirty pages to userspace, in which case
> I think flushing PML buffer at beginning of VMEXIT is a good idea, as
> domain_pause does the job automatically. I am not sure how much cycles
> will flushing PML buffer contribute but I think it should be relatively
> small comparing to VMEXIT itself, therefore it can be ignored.

it's not intuitive to add overhead (one extra vmread) to every vmexit
just to exploit the side effect of one specific exit caused by domain_pause.

> 
> An optimized way probably is we only flush PML buffer for external
> interrupt VMEXIT, which domain_pause really triggers, but not at
> beginning of all VMEXITs. But as log as the overhead of flush PML buffer
> is negligible, this optimization is also unnecessary.
> 

this optimization is not a real optimization, as you would still be relying
on an implementation detail of other operations. If you really want to make
use of domain_pause, piggybacking the PML flush explicitly on that path
makes things clearer.
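
For reference, a minimal sketch of such an explicit flush (for_each_vcpu
is Xen's real iterator; vmx_vcpu_flush_pml_buffer is a hypothetical name
for the per-vcpu drain):

  /* Illustrative only: drain every vcpu's PML buffer; meant to be
   * called with the domain paused so the buffers are stable. */
  static void pml_flush_domain_sketch(struct domain *d)
  {
      struct vcpu *v;

      for_each_vcpu ( d, v )
          vmx_vcpu_flush_pml_buffer(v);  /* hypothetical per-vcpu drain */
  }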

Thanks
Kevin

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-12  7:02     ` Tian, Kevin
@ 2015-02-12  7:04       ` Kai Huang
  0 siblings, 0 replies; 54+ messages in thread
From: Kai Huang @ 2015-02-12  7:04 UTC (permalink / raw)
  To: Tian, Kevin, Jan Beulich; +Cc: andrew.cooper3, xen-devel, keir, tim


On 02/12/2015 03:02 PM, Tian, Kevin wrote:
>> From: Kai Huang [mailto:kai.huang@linux.intel.com]
>> Sent: Thursday, February 12, 2015 10:50 AM
>>
>>>> - PML buffer flush
>>>>
>>>> There are two places we need to flush PML buffer. The first place is PML
>>>> buffer full VMEXIT handler (apparently), and the second place is in
>>>> paging_log_dirty_op (either peek or clean), as vcpus are running
>>>> asynchronously along with paging_log_dirty_op is called from userspace
>> via
>>>> hypercall, and it's possible there are dirty GPAs logged in vcpus' PML
>>>> buffers but not full. Therefore we'd better to flush all vcpus' PML buffers
>>>> before reporting dirty GPAs to userspace.
>>>>
>>>> We handle above two cases by flushing PML buffer at the beginning of all
>>>> VMEXITs. This solves the first case above, and it also solves the second
>>>> case, as prior to paging_log_dirty_op, domain_pause is called, which kicks
>>>> vcpus (that are in guest mode) out of guest mode via sending IPI, which
>> cause
>>>> VMEXIT, to them.
>>>>
>>>> This also makes log-dirty radix tree more updated as PML buffer is flushed
>>>> on basis of all VMEXITs but not only PML buffer full VMEXIT.
>>> Is that really efficient? Flushing the buffer only as needed doesn't
>>> seem to be a major problem (apart from the usual preemption issue
>>> when dealing with guests with very many vCPU-s, but you certainly
>>> recall that at this point HVM is still limited to 128).
>>>
>>> Apart from these two remarks, the design looks okay to me.
>> While keeping log-dirty radix tree more updated is probably irrelevant,
>> I do think we'd better to flush PML buffers in paging_log_dirty_op (both
>> peek and clear) before reporting dirty pages to userspace, in which case
>> I think flushing PML buffer at beginning of VMEXIT is a good idea, as
>> domain_pause does the job automatically. I am not sure how much cycles
>> will flushing PML buffer contribute but I think it should be relatively
>> small comparing to VMEXIT itself, therefore it can be ignored.
> it's not intuitive to add overhead (one extra vmread) to every vmexit
> just for utilizing the side-effect of one specific exit due to domain_pause.
What's the cost of one vmread? It's reasonable to avoid it if it's heavy.

>
>> An optimized way probably is we only flush PML buffer for external
>> interrupt VMEXIT, which domain_pause really triggers, but not at
>> beginning of all VMEXITs. But as log as the overhead of flush PML buffer
>> is negligible, this optimization is also unnecessary.
>>
> this optimization is not real optimization as you still stick to implementation
> detail of other operations.
Could you give me some hints? The above is the most optimized way I can 
come up with :)
> If you really want to take use of domain_pause,
> piggyback PML flush explicitly in that path make things clearer.
domain_pause is called in many code paths, so it doesn't look as optimized 
as my approach above.

Thanks,
-Kai
>
> Thanks
> Keivn
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-12  6:45         ` Kai Huang
@ 2015-02-12  7:08           ` Tian, Kevin
  2015-02-12  7:34             ` Kai Huang
  2015-02-12 12:42             ` Tim Deegan
  0 siblings, 2 replies; 54+ messages in thread
From: Tian, Kevin @ 2015-02-12  7:08 UTC (permalink / raw)
  To: Kai Huang, Jan Beulich, Andrew Cooper; +Cc: keir, tim, xen-devel

> From: Kai Huang [mailto:kai.huang@linux.intel.com]
> Sent: Thursday, February 12, 2015 2:46 PM
> 
> On 02/12/2015 02:25 PM, Tian, Kevin wrote:
> >> From: Kai Huang [mailto:kai.huang@linux.intel.com]
> >> Sent: Thursday, February 12, 2015 10:35 AM
> >>
> >> On 02/11/2015 09:13 PM, Jan Beulich wrote:
> >>>>>> On 11.02.15 at 12:52, <andrew.cooper3@citrix.com> wrote:
> >>>> On 11/02/15 08:28, Kai Huang wrote:
> >>>>> With PML, we don't have to use write protection but just clear D-bit
> >>>>> of EPT entry of guest memory to do dirty logging, with an additional
> >>>>> PML buffer full VMEXIT for 512 dirty GPAs. Theoretically, this can
> >>>>> reduce hypervisor overhead when guest is in dirty logging mode, and
> >>>>> therefore more CPU cycles can be allocated to guest, so it's expected
> >>>>> benchmarks in guest will have better performance comparing to
> >> non-PML.
> >>>> One issue with basic EPT A/D tracking was the scan of the EPT tables.
> >>>> Here, hardware will give us a list of affected gfns, but how is Xen
> >>>> supposed to efficiently clear the dirty bits again?  Using EPT
> >>>> misconfiguration is no better than the existing fault path.
> >>> Why not? The misconfiguration exit ought to clear the D bit for all
> >>> 511 entries in the L1 table (and set it for the one entry that is
> >>> currently serving the access). All further D bit handling will then
> >>> be PML based.
> >> Indeed, we clear D-bit in EPT misconfiguration. In my understanding, the
> >> sequences are as follows:
> >>
> >> 1) PML enabled for the domain.
> >> 2) ept_invalidate_emt (or ept_invalidate_emt_range) is called.
> >> 3) Guest accesses specific GPA (which has been invalidated by step 2),
> >> and EPT misconfig is triggered.
> >> 4) Then resolve_misconfig is called, which fixes up GFN (above GPA >>
> >> 12) to p2m_ram_logdirty, and calls ept_p2m_type_to_flags, in which we
> >> clear D-bit of EPT entry (instead of clear W-bit) if p2m type is
> >> p2m_ram_logdirty. Then dirty logging of this GFN will be handled by PML.
> >>
> >> The above 2) ~ 4) will be repeated when log-dirty radix tree is cleared.
> > is ept_invalidate_emt required by existing logdirty mode or by PML enable?
> It's in existing logdirty code.
> > can we clear D bits directly when log-dirty radix tree is cleared to reduce
> > EPT misconfig exits for repeatedly dirtied pages?
> Theoretically we can, and looks logdirty for video ram is done in this
> way (logdirty for the page is re-enabled while it is reported to
> dirty_bitmap). One thing is looks video ram logdirty only exists for HAP
> mode.
> But in current log dirty implementation for global logdirty, at common
> paging layer, the log-dirty radix tree is cleaned in single step after
> reporting all dirty pages to userspace. And it just calls
> ept_invalidate_emt essentially. Therefore we need to modify logdirty
> common code at paging layer to achieve this, which is more like logdirty
> enhancement but not related to PML enabling directly. And any change of
> interface in paging layer requires modification in shadow mode
> accordingly, so currently I just choose not to do it.
> 

for general log dirty, ept_invalidate_emt is required because there is an 
access permission change (a dirtied page becomes rw after the 1st fault,
so it needs to be changed back to ro again for the new dirty-tracking
round). But for PML, there's no permission change at all (always rw),
so the general logdirty layer should be made aware of that behavior for
better optimization. I'm OK with not doing so in the initial enabling
patch, but it's something you can think about later. :-)

Thanks
Kevin

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-12  6:56       ` Kai Huang
@ 2015-02-12  7:09         ` Tian, Kevin
  2015-02-12  7:15           ` Kai Huang
  0 siblings, 1 reply; 54+ messages in thread
From: Tian, Kevin @ 2015-02-12  7:09 UTC (permalink / raw)
  To: Kai Huang, Andrew Cooper, jbeulich, tim, keir, xen-devel

> From: Kai Huang [mailto:kai.huang@linux.intel.com]
> Sent: Thursday, February 12, 2015 2:57 PM
> 
> On 02/12/2015 02:54 PM, Tian, Kevin wrote:
> >> From: Kai Huang [mailto:kai.huang@linux.intel.com]
> >> Sent: Thursday, February 12, 2015 10:39 AM
> >>>> PML needs to be enabled (allocate PML buffer, initialize PML index,
> >>>> PML base address, turn PML on VMCS, etc) for all vcpus of the domain,
> >>>> as PML buffer and PML index are per-vcpu, but EPT table may be shared
> >>>> by vcpus. Enabling PML on partial vcpus of the domain won't work. Also
> >>>> PML will only be enabled for the domain when it is switched to dirty
> >>>> logging mode, and it will be disabled when domain is switched back to
> >>>> normal mode. As looks vcpu number won't be changed dynamically
> during
> >>>> guest is running (correct me if I am wrong here), so we don't have to
> >>>> consider enabling PML for new created vcpu when guest is in dirty
> >>>> logging mode.
> >>> There are exactly d->max_vcpus worth of struct vcpus (and therefore
> >>> VMCSes) for a domain after creation, and will exist for the lifetime of
> >>> the domain.  There is no dynamic adjustment of numbers of vcpus during
> >>> runtime.
> >> Good to know.
> > could we at least detect and warn vcpu changes when PML is enabled?
> > dirty logging happens out of guest's knowledge and there could be the
> > case where user right online/offline a vcpu within that window.
> Why is the warning necessary? There's no harm leaving PML enabled when
> vcpu becomes offline.

what about online? you need to enable PML for a newly-onlined vcpu, since
meaningful work may be scheduled on it within the logdirty window.

> 
> Also we will not disable PML for that vcpu when it becomes offline, in
> which case we don't need to re-enable PML, which can fail, when vcpu
> becomes online again. It simplifies the logic.

offline is not a problem.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-12  7:09         ` Tian, Kevin
@ 2015-02-12  7:15           ` Kai Huang
  0 siblings, 0 replies; 54+ messages in thread
From: Kai Huang @ 2015-02-12  7:15 UTC (permalink / raw)
  To: Tian, Kevin, Andrew Cooper, jbeulich, tim, keir, xen-devel


On 02/12/2015 03:09 PM, Tian, Kevin wrote:
>> From: Kai Huang [mailto:kai.huang@linux.intel.com]
>> Sent: Thursday, February 12, 2015 2:57 PM
>>
>> On 02/12/2015 02:54 PM, Tian, Kevin wrote:
>>>> From: Kai Huang [mailto:kai.huang@linux.intel.com]
>>>> Sent: Thursday, February 12, 2015 10:39 AM
>>>>>> PML needs to be enabled (allocate PML buffer, initialize PML index,
>>>>>> PML base address, turn PML on VMCS, etc) for all vcpus of the domain,
>>>>>> as PML buffer and PML index are per-vcpu, but EPT table may be shared
>>>>>> by vcpus. Enabling PML on partial vcpus of the domain won't work. Also
>>>>>> PML will only be enabled for the domain when it is switched to dirty
>>>>>> logging mode, and it will be disabled when domain is switched back to
>>>>>> normal mode. As looks vcpu number won't be changed dynamically
>> during
>>>>>> guest is running (correct me if I am wrong here), so we don't have to
>>>>>> consider enabling PML for new created vcpu when guest is in dirty
>>>>>> logging mode.
>>>>> There are exactly d->max_vcpus worth of struct vcpus (and therefore
>>>>> VMCSes) for a domain after creation, and will exist for the lifetime of
>>>>> the domain.  There is no dynamic adjustment of numbers of vcpus during
>>>>> runtime.
>>>> Good to know.
>>> could we at least detect and warn vcpu changes when PML is enabled?
>>> dirty logging happens out of guest's knowledge and there could be the
>>> case where user right online/offline a vcpu within that window.
>> Why is the warning necessary? There's no harm leaving PML enabled when
>> vcpu becomes offline.
> what about online? you need enable PML for newly-online vcpu since
> meaningful work may be scheduled to it within logdirty window.
Do you mean the vcpu number (offline + online) can change during the 
guest's runtime, e.g. a new vcpu is created and brought online after PML 
has been enabled for the domain? Otherwise, I don't see a problem.

As long as the total number of vcpus remains constant, it's not a 
problem: we only enable PML after all vcpus are created (and the number 
then stays constant), and the vcpu's online/offline status is irrelevant.

Thanks,
-Kai
>
>> Also we will not disable PML for that vcpu when it becomes offline, in
>> which case we don't need to re-enable PML, which can fail, when vcpu
>> becomes online again. It simplifies the logic.
> offline is not a problem
>
> Thanks
> Kevin
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-12  7:08           ` Tian, Kevin
@ 2015-02-12  7:34             ` Kai Huang
  2015-02-12 12:42             ` Tim Deegan
  1 sibling, 0 replies; 54+ messages in thread
From: Kai Huang @ 2015-02-12  7:34 UTC (permalink / raw)
  To: Tian, Kevin, Jan Beulich, Andrew Cooper; +Cc: tim, keir, xen-devel


On 02/12/2015 03:08 PM, Tian, Kevin wrote:
>> From: Kai Huang [mailto:kai.huang@linux.intel.com]
>> Sent: Thursday, February 12, 2015 2:46 PM
>>
>> On 02/12/2015 02:25 PM, Tian, Kevin wrote:
>>>> From: Kai Huang [mailto:kai.huang@linux.intel.com]
>>>> Sent: Thursday, February 12, 2015 10:35 AM
>>>>
>>>> On 02/11/2015 09:13 PM, Jan Beulich wrote:
>>>>>>>> On 11.02.15 at 12:52, <andrew.cooper3@citrix.com> wrote:
>>>>>> On 11/02/15 08:28, Kai Huang wrote:
>>>>>>> With PML, we don't have to use write protection but just clear D-bit
>>>>>>> of EPT entry of guest memory to do dirty logging, with an additional
>>>>>>> PML buffer full VMEXIT for 512 dirty GPAs. Theoretically, this can
>>>>>>> reduce hypervisor overhead when guest is in dirty logging mode, and
>>>>>>> therefore more CPU cycles can be allocated to guest, so it's expected
>>>>>>> benchmarks in guest will have better performance comparing to
>>>> non-PML.
>>>>>> One issue with basic EPT A/D tracking was the scan of the EPT tables.
>>>>>> Here, hardware will give us a list of affected gfns, but how is Xen
>>>>>> supposed to efficiently clear the dirty bits again?  Using EPT
>>>>>> misconfiguration is no better than the existing fault path.
>>>>> Why not? The misconfiguration exit ought to clear the D bit for all
>>>>> 511 entries in the L1 table (and set it for the one entry that is
>>>>> currently serving the access). All further D bit handling will then
>>>>> be PML based.
>>>> Indeed, we clear D-bit in EPT misconfiguration. In my understanding, the
>>>> sequences are as follows:
>>>>
>>>> 1) PML enabled for the domain.
>>>> 2) ept_invalidate_emt (or ept_invalidate_emt_range) is called.
>>>> 3) Guest accesses specific GPA (which has been invalidated by step 2),
>>>> and EPT misconfig is triggered.
>>>> 4) Then resolve_misconfig is called, which fixes up GFN (above GPA >>
>>>> 12) to p2m_ram_logdirty, and calls ept_p2m_type_to_flags, in which we
>>>> clear D-bit of EPT entry (instead of clear W-bit) if p2m type is
>>>> p2m_ram_logdirty. Then dirty logging of this GFN will be handled by PML.
>>>>
>>>> The above 2) ~ 4) will be repeated when log-dirty radix tree is cleared.
>>> is ept_invalidate_emt required by existing logdirty mode or by PML enable?
>> It's in existing logdirty code.
>>> can we clear D bits directly when log-dirty radix tree is cleared to reduce
>>> EPT misconfig exits for repeatedly dirtied pages?
>> Theoretically we can, and looks logdirty for video ram is done in this
>> way (logdirty for the page is re-enabled while it is reported to
>> dirty_bitmap). One thing is looks video ram logdirty only exists for HAP
>> mode.
>> But in current log dirty implementation for global logdirty, at common
>> paging layer, the log-dirty radix tree is cleaned in single step after
>> reporting all dirty pages to userspace. And it just calls
>> ept_invalidate_emt essentially. Therefore we need to modify logdirty
>> common code at paging layer to achieve this, which is more like logdirty
>> enhancement but not related to PML enabling directly. And any change of
>> interface in paging layer requires modification in shadow mode
>> accordingly, so currently I just choose not to do it.
>>
> for general log dirty, ept_invalidate_emt is required because there is
> access permission change (dirtied page becomes rw after 1st fault,
> so need to change them back to ro again for the new dirty tracking
> round). But for PML, there's no permission change at all (always rw),
> so such behavior should be noted by general logdirty layer for better
> optimization. I'm OK not doing so for initial enabling patch, but it's
> something you can think about later. :-)
Yes, thanks for the point :)

Thanks,
-Kai
>
> Thanks
> Kevin
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-11  8:28 PML (Page Modification Logging) design for Xen Kai Huang
  2015-02-11 11:52 ` Andrew Cooper
  2015-02-11 13:06 ` Jan Beulich
@ 2015-02-12 12:34 ` Tim Deegan
  2015-02-13  2:50   ` Kai Huang
  2 siblings, 1 reply; 54+ messages in thread
From: Tim Deegan @ 2015-02-12 12:34 UTC (permalink / raw)
  To: Kai Huang; +Cc: andrew.cooper3, kevin.tian, keir, jbeulich, xen-devel

Hi,

Thanks for posting this design!

At 16:28 +0800 on 11 Feb (1423668493), Kai Huang wrote:
> Design
> ======
> 
> - PML feature is used globally
> 
> A new Xen boot parameter, say 'opt_enable_pml', will be introduced to control PML feature detection, and PML feature will only be detected if opt_enable_pml = 1. Once PML feature is detected, it will be used for dirty logging for all domains globally. Currently we don't support to use PML on basis of per-domain as it will require additional control from XL tool.

Sounds good.  I agree that there's no point in making this a per-VM
feature. 

> - PML enable/disable for particular Domain
> 
> PML needs to be enabled (allocate PML buffer, initialize PML index, PML base address, turn PML on VMCS, etc) for all vcpus of the domain, as PML buffer and PML index are per-vcpu, but EPT table may be shared by vcpus. Enabling PML on partial vcpus of the domain won't work. Also PML will only be enabled for the domain when it is switched to dirty logging mode, and it will be disabled when domain is switched back to normal mode. As looks vcpu number won't be changed dynamically during guest is running (correct me if I am wrong here), so we don't have to consider enabling PML for new created vcpu when guest is in dirty logging mode.
>

No - you really ought to handle enabling this for new VCPUs.  There
have been cases in the past where VMs are put into log-dirty mode
before their VCPUs are assigned, and there might be again.  

It ought to be easy to handle, though - just one more check and
function call on the vcpu setup path.
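
A minimal sketch of that hook, assuming a hypothetical vmx_vcpu_enable_pml
helper (paging_mode_log_dirty is Xen's existing predicate):

  /* Illustrative only: enable PML for a vcpu created while the domain
   * is already in log-dirty mode. */
  static int vcpu_setup_pml_sketch(struct vcpu *v)
  {
      if ( !paging_mode_log_dirty(v->domain) )
          return 0;                     /* nothing to do */
      return vmx_vcpu_enable_pml(v);    /* hypothetical enable helper */
  }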

> After PML is enabled for the domain, we only need to clear EPT entry's D-bit for guest memory in dirty logging mode. We achieve this by checking if PML is enabled for the domain when p2m_ram_rx changed to p2m_ram_logdirty, and updating EPT entry accordingly. However, for super pages, we still write protect them in case of PML as we still need to split super page to 4K page in dirty logging mode.
>

IIUC, you are suggesting leaving superpages handled as they are now,
with read-only EPTEs, and only using PML for single-page mappings.
That seems good. :)

> - PML buffer flush
> 
> There are two places we need to flush PML buffer. The first place is PML buffer full VMEXIT handler (apparently), and the second place is in paging_log_dirty_op (either peek or clean), as vcpus are running asynchronously along with paging_log_dirty_op is called from userspace via hypercall, and it's possible there are dirty GPAs logged in vcpus' PML buffers but not full. Therefore we'd better to flush all vcpus' PML buffers before reporting dirty GPAs to userspace.
> 
> We handle above two cases by flushing PML buffer at the beginning of all VMEXITs. This solves the first case above, and it also solves the second case, as prior to paging_log_dirty_op, domain_pause is called, which kicks vcpus (that are in guest mode) out of guest mode via sending IPI, which cause VMEXIT, to them.
>

I would prefer to flush only on buffer-full VMEXITs and handle the
peek/clear path by explicitly reading all VCPUs' buffers.  That avoids
putting more code on the fast paths for other VMEXIT types.

> This also makes log-dirty radix tree more updated as PML buffer is flushed on basis of all VMEXITs but not only PML buffer full VMEXIT.
> 
> - Video RAM tracking (and partial dirty logging for guest memory range)
> 
> Video RAM is in dirty logging mode unconditionally during guest's run-time, and it is partial memory range of the guest. However, PML operates on the whole guest memory (the whole valid EPT table, more precisely), so we need to choose whether to use PML if only partial guest memory ranges are in dirty logging mode.
> 
> Currently, PML will be used as long as there's guest memory in dirty logging mode, no matter globally or partially. And in case of partial dirty logging, we need to check if the logged GPA in PML buffer is in dirty logging range.
> 

I think, as other people have said, that you can just use PML for this
case without any other restrictions.  After all, mappings for non-VRAM
areas ought not to have their D-bits clear anyway.

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-12  7:08           ` Tian, Kevin
  2015-02-12  7:34             ` Kai Huang
@ 2015-02-12 12:42             ` Tim Deegan
  2015-02-13  2:15               ` Kai Huang
  2015-02-13  2:28               ` Tian, Kevin
  1 sibling, 2 replies; 54+ messages in thread
From: Tim Deegan @ 2015-02-12 12:42 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: Kai Huang, Andrew Cooper, keir, Jan Beulich, xen-devel

At 07:08 +0000 on 12 Feb (1423721283), Tian, Kevin wrote:
> for general log dirty, ept_invalidate_emt is required because there is 
> access permission change (dirtied page becomes rw after 1st fault,
> so need to change them back to ro again for the new dirty tracking
> round). But for PML, there's no permission change at all (always rw),
> so such behavior should be noted by general logdirty layer for better
> optimization.

AIUI the reason for calling ept_invalidate_emt() is to avoid having to
update a large number of EPTEs at once.  If you still need to update a
large number of EPTEs (to clear the Dirty bits), that has to me
preemptable, or else use ept_invalidate_emt().

Or have I misunderstood?

Tim.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-12  6:54     ` Tian, Kevin
  2015-02-12  6:56       ` Kai Huang
@ 2015-02-12 14:10       ` Andrew Cooper
  2015-02-13  0:58         ` Bing
  2015-02-13  2:11         ` Kai Huang
  1 sibling, 2 replies; 54+ messages in thread
From: Andrew Cooper @ 2015-02-12 14:10 UTC (permalink / raw)
  To: Tian, Kevin, Kai Huang, jbeulich, tim, keir, xen-devel

On 12/02/15 06:54, Tian, Kevin wrote:
>
>>> which presumably
>>> means that the PML buffer flush needs to be aware of which gfns are
>>> mapped by superpages to be able to correctly set a block of bits in the
>>> logdirty bitmap.
>>>
>> Unfortunately PML itself can't tell us if the logged GPA comes from
>> superpage or not, but even in PML we still need to split superpages to
>> 4K page, just like traditional write protection approach does. I think
>> this is because live migration should be based on 4K page granularity.
>> Marking all 512 bits of a 2M page to be dirty by a single write doesn't
>> make sense in both write protection and PML cases.
>>
> agree. extending one write to superpage enlarges dirty set unnecessary.
> since spec doesn't say superpage logging is not supported, I'd think a
> 4k-aligned entry being logged if within superpage.

The spec states that a gfn is appended to the log strictly on the
transition of the D bit from 0 to 1.

In the case of a 2M superpage, there is a single D bit for the entire 2M
range.


The plausible (working) scenarios I can see are:

1) superpages are not supported (not indicated by the whitepaper).
2) a single entry will be written which must be taken to cover the
entire 2M range.
3) an individual entry is written for every access.

Have I missed anything?

~Andrew

^ permalink raw reply	[flat|nested] 54+ messages in thread

* PML (Page Modification Logging) design for Xen
  2015-02-12 14:10       ` Andrew Cooper
@ 2015-02-13  0:58         ` Bing
  2015-02-13  2:11         ` Kai Huang
  1 sibling, 0 replies; 54+ messages in thread
From: Bing @ 2015-02-13  0:58 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Tian, Kevin, keir, tim, xen-devel, Kai Huang, jbeulich


>On 12/02/15 06:54, Tian, Kevin wrote:
>>
>>>> which presumably
>>>> means that the PML buffer flush needs to be aware of which gfns are
>>>> mapped by superpages to be able to correctly set a block of bits in the
>>>> logdirty bitmap.
>>>>
>>> Unfortunately PML itself can't tell us if the logged GPA comes from
>>> superpage or not, but even in PML we still need to split superpages to
>>> 4K page, just like traditional write protection approach does. I think
>>> this is because live migration should be based on 4K page granularity.
>>> Marking all 512 bits of a 2M page to be dirty by a single write doesn't
>>> make sense in both write protection and PML cases.
>>>
>> agree. extending one write to superpage enlarges dirty set unnecessary.
>> since spec doesn't say superpage logging is not supported, I'd think a
>> 4k-aligned entry being logged if within superpage.
>
>The spec states that an gfn is appended to the log strictly on the
>transition of the D bit from 0 to 1.
>
>In the case of a 2M superpage, there is a single D bit for the entire 2M
>range.
>
>
>The plausible (working) scenarios I can see are:
>
>1) superpages are not supported (not indicated by the whitepaper).

It seems the whitepaper doesn't say it is not supported either. From my
understanding, it should be supported: whenever a dirty flag in a leaf EPT
entry that points to the final machine frame (4KB, 2MB, ...) is set, the
corresponding written address will be logged.

>2) a single entry will be written which must be taken to cover the
>entire 2M range.

I think yes, and any subsequent write to the same 2M range won't be
appended to the log.

>3) an individual entry is written for every access.

If I understand correctly, only the first write access to a given 2M page
(or 4K page, if 4KB mappings are used in the EPT) is logged.

>Have I missed anything?
>
>~Andrew

Bing


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-12 14:10       ` Andrew Cooper
  2015-02-13  0:58         ` Bing
@ 2015-02-13  2:11         ` Kai Huang
  2015-02-13 10:57           ` Andrew Cooper
  1 sibling, 1 reply; 54+ messages in thread
From: Kai Huang @ 2015-02-13  2:11 UTC (permalink / raw)
  To: Andrew Cooper, Tian, Kevin, jbeulich, tim, keir, xen-devel




On 02/12/2015 10:10 PM, Andrew Cooper wrote:
> On 12/02/15 06:54, Tian, Kevin wrote:
>>>> which presumably
>>>> means that the PML buffer flush needs to be aware of which gfns are
>>>> mapped by superpages to be able to correctly set a block of bits in the
>>>> logdirty bitmap.
>>>>
>>> Unfortunately PML itself can't tell us if the logged GPA comes from
>>> superpage or not, but even in PML we still need to split superpages to
>>> 4K page, just like traditional write protection approach does. I think
>>> this is because live migration should be based on 4K page granularity.
>>> Marking all 512 bits of a 2M page to be dirty by a single write doesn't
>>> make sense in both write protection and PML cases.
>>>
>> agree. extending one write to superpage enlarges dirty set unnecessary.
>> since spec doesn't say superpage logging is not supported, I'd think a
>> 4k-aligned entry being logged if within superpage.
> The spec states that an gfn is appended to the log strictly on the
> transition of the D bit from 0 to 1.
>
> In the case of a 2M superpage, there is a single D bit for the entire 2M
> range.
>
>
> The plausible (working) scenarios I can see are:
>
> 1) superpages are not supported (not indicated by the whitepaper).
A better description would be -- PML doesn't check whether it's a 
superpage; it just operates on the D-bit, regardless of page size.
> 2) a single entry will be written which must be taken to cover the
> entire 2M range.
> 3) an individual entry is written for every access.
Below is the reply from our hardware architect regarding PML on 
superpages. It should answer this accurately.

"As noted in Section 1.3, logging occurs whenever the CPU would set an 
EPT D bit.

It does not matter whether the D bit is in an EPT PTE (4KB page), EPT 
PDE (2MB page), or EPT PDPTE (1GB page).

In all cases, the GPA written to the PML log will be the address of the 
write that causes the D bit in question to be updated, with bits 11:0 
cleared.

This means that, in the case in which the D bit is in an EPT PDE or an 
EPT PDPTE, the log entry will communicate which 4KB region within the 
larger page was being written.

Once the D bit is set in one of these entries, a subsequent write to the 
larger page will not generate a log entry, even if that write is to a 
different 4KB region within the larger page.  This is because log 
entries are created only when a D bit is being set and a write will not 
cause a D bit to be set if the page's D bit is already set.

The log entries do not communicate the level of the EPT paging-structure 
entry in which the D bit was set (i.e., it does not communicate the page 
size). "
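
To make the quoted rule concrete, a worked example (illustrative 
arithmetic only):

  /* A write to GPA 0x40201234 that sets the D bit of a 2M PDE logs the
   * 4K-aligned write address, not the 2M-aligned page base. */
  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      uint64_t gpa    = 0x40201234ULL;    /* write inside a 2M superpage */
      uint64_t logged = gpa & ~0xfffULL;  /* bits 11:0 cleared */

      printf("logged GPA: 0x%" PRIx64 "\n", logged);  /* 0x40201000 */
      return 0;
  }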

Thanks,
-Kai


>
> Have I missed anything?
>
> ~Andrew
>
>



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-12 12:42             ` Tim Deegan
@ 2015-02-13  2:15               ` Kai Huang
  2015-02-13  2:28               ` Tian, Kevin
  1 sibling, 0 replies; 54+ messages in thread
From: Kai Huang @ 2015-02-13  2:15 UTC (permalink / raw)
  To: Tim Deegan, Tian, Kevin; +Cc: Andrew Cooper, keir, Jan Beulich, xen-devel


On 02/12/2015 08:42 PM, Tim Deegan wrote:
> At 07:08 +0000 on 12 Feb (1423721283), Tian, Kevin wrote:
>> for general log dirty, ept_invalidate_emt is required because there is
>> access permission change (dirtied page becomes rw after 1st fault,
>> so need to change them back to ro again for the new dirty tracking
>> round). But for PML, there's no permission change at all (always rw),
>> so such behavior should be noted by general logdirty layer for better
>> optimization.
> AIUI the reason for calling ept_invalidate_emt() is to avoid having to
> update a large number of EPTEs at once.  If you still need to update a
> large number of EPTEs (to clear the Dirty bits), that has to me
> preemptable, or else use ept_invalidate_emt().
>
> Or have I misunderstood?
I think you are correct. We still need to use ept_invalidate_emt for 
clearing the D-bit, unless we invent a new paging-layer interface, say 
paging_enable_log_dirty_gfn, which explicitly enables log-dirty for a 
single GFN, either by write protection or, in the case of PML, by 
clearing the D-bit.
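
A hedged sketch of that hypothetical interface (the name comes from the
paragraph above; the two helpers are made-up illustrations):

  /* Illustrative only: per-GFN log-dirty enable that picks the PML or
   * the write-protection mechanism. */
  void paging_enable_log_dirty_gfn(struct domain *d, unsigned long gfn)
  {
      if ( domain_uses_pml(d) )        /* hypothetical predicate */
          p2m_clear_d_bit(d, gfn);     /* PML: keep rw, clear D-bit */
      else
          p2m_write_protect(d, gfn);   /* classic: make read-only */
  }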

Thanks,
-Kai
>
> Tim.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-12 12:42             ` Tim Deegan
  2015-02-13  2:15               ` Kai Huang
@ 2015-02-13  2:28               ` Tian, Kevin
  2015-02-17 10:40                 ` Jan Beulich
  1 sibling, 1 reply; 54+ messages in thread
From: Tian, Kevin @ 2015-02-13  2:28 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Kai Huang, Andrew Cooper, keir, Jan Beulich, xen-devel

> From: Tim Deegan [mailto:tim@xen.org]
> Sent: Thursday, February 12, 2015 8:42 PM
> 
> At 07:08 +0000 on 12 Feb (1423721283), Tian, Kevin wrote:
> > for general log dirty, ept_invalidate_emt is required because there is
> > access permission change (dirtied page becomes rw after 1st fault,
> > so need to change them back to ro again for the new dirty tracking
> > round). But for PML, there's no permission change at all (always rw),
> > so such behavior should be noted by general logdirty layer for better
> > optimization.
> 
> AIUI the reason for calling ept_invalidate_emt() is to avoid having to
> update a large number of EPTEs at once.  If you still need to update a
> large number of EPTEs (to clear the Dirty bits), that has to me
> preemptable, or else use ept_invalidate_emt().
> 
> Or have I misunderstood?
> 

preemptable is fine, and we can judge whether the dirty set is large or 
not. My feeling is that replacing the simple D-bit cleanup with EPT 
misconfig exits is not optimal. Jan explained that it's not strictly one 
misconfig exit per D bit, since a whole L1 table is handled in a batch, 
but we need some understanding of the actual impact under various 
workload patterns.
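
Jan's batching point, roughly, reusing the epte_sketch struct from the
earlier sketch (illustrative only):

  /* Illustrative only: one misconfig exit re-arms a whole L1 table --
   * clear D for all 512 entries, then set it for the entry currently
   * serving the access. */
  static void resolve_misconfig_sketch(struct epte_sketch *l1,
                                       unsigned int faulting_idx)
  {
      for ( unsigned int i = 0; i < 512; i++ )
          l1[i].d = 0;
      l1[faulting_idx].d = 1;
  }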

Thanks
Kevin

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-12 12:34 ` Tim Deegan
@ 2015-02-13  2:50   ` Kai Huang
  2015-02-16 14:01     ` Kai Huang
  0 siblings, 1 reply; 54+ messages in thread
From: Kai Huang @ 2015-02-13  2:50 UTC (permalink / raw)
  To: Tim Deegan; +Cc: andrew.cooper3, kevin.tian, keir, jbeulich, xen-devel


On 02/12/2015 08:34 PM, Tim Deegan wrote:
> Hi,
>
> Thanks for posting this design!
>
> At 16:28 +0800 on 11 Feb (1423668493), Kai Huang wrote:
>> Design
>> ======
>>
>> - PML feature is used globally
>>
>> A new Xen boot parameter, say 'opt_enable_pml', will be introduced to control PML feature detection, and PML feature will only be detected if opt_enable_pml = 1. Once PML feature is detected, it will be used for dirty logging for all domains globally. Currently we don't support to use PML on basis of per-domain as it will require additional control from XL tool.
> Sounds good.  I agree that there's no point in making this a per-VM
> feature.
>
>> - PML enable/disable for particular Domain
>>
>> PML needs to be enabled (allocate PML buffer, initialize PML index, PML base address, turn PML on VMCS, etc) for all vcpus of the domain, as PML buffer and PML index are per-vcpu, but EPT table may be shared by vcpus. Enabling PML on partial vcpus of the domain won't work. Also PML will only be enabled for the domain when it is switched to dirty logging mode, and it will be disabled when domain is switched back to normal mode. As looks vcpu number won't be changed dynamically during guest is running (correct me if I am wrong here), so we don't have to consider enabling PML for new created vcpu when guest is in dirty logging mode.
>>
> No - you really ought to handle enabling this for new VCPUs.  There
> have been cases in the past where VMs are put into log-dirty mode
> before their VCPUs are assigned, and there might be again.
"Assigned" here means created?

>
> It ought to be easy to handle, though - just one more check and
> function call on the vcpu setup path.
I think "check and function call" means check function call to enable 
PML on this vcpu? Then what if enabling PML for vcpu fails (possible as 
it needs to allocate 4K PML buffer)? It's better to choose to roll back 
to use write protection instead of indicating failure of creating the 
vcpu. But in this case there will be problem if the domain has already 
been in log dirty mode as we might already have EPT table setup with 
D-bit clear for logdirty range, which means we need to re-check the 
logdirty ranges and re-set EPT table to be read-only.  Does this sound 
reasonable?
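
As a hedged sketch of that fallback (all helper names hypothetical):

  /* Illustrative only: if enabling PML for a late-created vcpu fails,
   * disable PML domain-wide and fall back to write protection. */
  static void pml_enable_fallback_sketch(struct vcpu *v)
  {
      struct domain *d = v->domain;

      if ( vmx_vcpu_enable_pml(v) == 0 )
          return;                      /* PML armed for this vcpu */

      vmx_domain_disable_pml(d);       /* roll back the other vcpus */
      /* Re-write-protect ranges whose EPT entries only had the D-bit
       * cleared, so the classic fault path logs them again. */
      p2m_rewrite_logdirty_readonly(d);
  }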

>> After PML is enabled for the domain, we only need to clear EPT entry's D-bit for guest memory in dirty logging mode. We achieve this by checking if PML is enabled for the domain when p2m_ram_rx changed to p2m_ram_logdirty, and updating EPT entry accordingly. However, for super pages, we still write protect them in case of PML as we still need to split super page to 4K page in dirty logging mode.
>>
> IIUC, you are suggesting leaving superpages handled as they are now,
> with read-only EPTEs, and only using PML for single-page mappings.
> That seems good. :)
>
>> - PML buffer flush
>>
>> There are two places we need to flush PML buffer. The first place is PML buffer full VMEXIT handler (apparently), and the second place is in paging_log_dirty_op (either peek or clean), as vcpus are running asynchronously along with paging_log_dirty_op is called from userspace via hypercall, and it's possible there are dirty GPAs logged in vcpus' PML buffers but not full. Therefore we'd better to flush all vcpus' PML buffers before reporting dirty GPAs to userspace.
>>
>> We handle above two cases by flushing PML buffer at the beginning of all VMEXITs. This solves the first case above, and it also solves the second case, as prior to paging_log_dirty_op, domain_pause is called, which kicks vcpus (that are in guest mode) out of guest mode via sending IPI, which cause VMEXIT, to them.
>>
> I would prefer to flush only on buffer-full VMEXITs and handle the
> peek/clear path by explicitly reading all VCPUs' buffers.  That avoids
> putting more code on the fast paths for other VMEXIT types.
OK. But it looks like this requires a new interface, say 
paging_flush_log_dirty, called at the beginning of paging_log_dirty_op? 
That is actually what I wanted to avoid originally.

>
>> This also makes log-dirty radix tree more updated as PML buffer is flushed on basis of all VMEXITs but not only PML buffer full VMEXIT.
>>
>> - Video RAM tracking (and partial dirty logging for guest memory range)
>>
>> Video RAM is in dirty logging mode unconditionally during guest's run-time, and it is partial memory range of the guest. However, PML operates on the whole guest memory (the whole valid EPT table, more precisely), so we need to choose whether to use PML if only partial guest memory ranges are in dirty logging mode.
>>
>> Currently, PML will be used as long as there's guest memory in dirty logging mode, no matter globally or partially. And in case of partial dirty logging, we need to check if the logged GPA in PML buffer is in dirty logging range.
>>
> I think, as other people have said, that you can just use PML for this
> case without any other restrictions.  After all, mappings for non-VRAM
> areas ought not to have their D-bits clear anyway.
Agreed.

Thanks,
-Kai
>
> Cheers,
>
> Tim.
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-13  2:11         ` Kai Huang
@ 2015-02-13 10:57           ` Andrew Cooper
  2015-02-13 14:32             ` Kai Huang
  0 siblings, 1 reply; 54+ messages in thread
From: Andrew Cooper @ 2015-02-13 10:57 UTC (permalink / raw)
  To: Kai Huang, Tian, Kevin, jbeulich, tim, keir, xen-devel



On 13/02/15 02:11, Kai Huang wrote:
>
> On 02/12/2015 10:10 PM, Andrew Cooper wrote:
>> On 12/02/15 06:54, Tian, Kevin wrote:
>>>>> which presumably
>>>>> means that the PML buffer flush needs to be aware of which gfns are
>>>>> mapped by superpages to be able to correctly set a block of bits in the
>>>>> logdirty bitmap.
>>>>>
>>>> Unfortunately PML itself can't tell us if the logged GPA comes from
>>>> superpage or not, but even in PML we still need to split superpages to
>>>> 4K page, just like traditional write protection approach does. I think
>>>> this is because live migration should be based on 4K page granularity.
>>>> Marking all 512 bits of a 2M page to be dirty by a single write doesn't
>>>> make sense in both write protection and PML cases.
>>>>
>>> agree. extending one write to superpage enlarges dirty set unnecessary.
>>> since spec doesn't say superpage logging is not supported, I'd think a
>>> 4k-aligned entry being logged if within superpage.
>> The spec states that an gfn is appended to the log strictly on the
>> transition of the D bit from 0 to 1.
>>
>> In the case of a 2M superpage, there is a single D bit for the entire 2M
>> range.
>>
>>
>> The plausible (working) scenarios I can see are:
>>
>> 1) superpages are not supported (not indicated by the whitepaper).
> A better description would be -- PML doesn't check if it's superpage,
> it just operates with D-bit, no matter what page size.
>> 2) a single entry will be written which must be taken to cover the
>> entire 2M range.
>> 3) an individual entry is written for every access.
> Below is the reply from our hardware guy related to PML on superpage.
> It should have answered accurately.
>
> "As noted in Section 1.3, logging occurs whenever the CPU would set an
> EPT D bit.
>
> It does not matter whether the D bit is in an EPT PTE (4KB page), EPT
> PDE (2MB page), or EPT PDPTE (1GB page).
>
> In all cases, the GPA written to the PML log will be the address of
> the write that causes the D bit in question to be updated, with bits
> 11:0 cleared.
>
> This means that, in the case in which the D bit is in an EPT PDE or an
> EPT PDPTE, the log entry will communicate which 4KB region within the
> larger page was being written.
>
> Once the D bit is set in one of these entries, a subsequent write to
> the larger page will not generate a log entry, even if that write is
> to a different 4KB region within the larger page.  This is because log
> entries are created only when a D bit is being set and a write will
> not cause a D bit to be set if the page's D bit is already set.
>
> The log entries do not communicate the level of the EPT
> paging-structure entry in which the D bit was set (i.e., it does not
> communicate the page size). "

Thanks for the clarification.

The result of this behaviour is that the PML flush logic is going to
have to look up each gfn and check whether it is mapped by a superpage,
which will add a sizeable overhead.

It is also not conducive to minimising the data transmitted in the
migration stream.


One future option might be to shatter all the EPT superpages when
logdirty is enabled.  This would be ok for a domain which is being
migrated away, but would be suboptimal for snapshot operations; Xen
currently has no ability to coalesce pages back into superpages.  It
also interacts poorly with HAP vram tracking which enables logdirty mode
itself.

~Andrew


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-13 10:57           ` Andrew Cooper
@ 2015-02-13 14:32             ` Kai Huang
  2015-02-13 15:28               ` Andrew Cooper
  0 siblings, 1 reply; 54+ messages in thread
From: Kai Huang @ 2015-02-13 14:32 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Tian, Kevin, keir, tim, xen-devel, Kai Huang, jbeulich

On Fri, Feb 13, 2015 at 6:57 PM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 13/02/15 02:11, Kai Huang wrote:
>
>
> On 02/12/2015 10:10 PM, Andrew Cooper wrote:
>
> On 12/02/15 06:54, Tian, Kevin wrote:
>
> which presumably
> means that the PML buffer flush needs to be aware of which gfns are
> mapped by superpages to be able to correctly set a block of bits in the
> logdirty bitmap.
>
> Unfortunately PML itself can't tell us if the logged GPA comes from
> superpage or not, but even in PML we still need to split superpages to
> 4K page, just like traditional write protection approach does. I think
> this is because live migration should be based on 4K page granularity.
> Marking all 512 bits of a 2M page to be dirty by a single write doesn't
> make sense in both write protection and PML cases.
>
> agree. extending one write to superpage enlarges dirty set unnecessary.
> since spec doesn't say superpage logging is not supported, I'd think a
> 4k-aligned entry being logged if within superpage.
>
> The spec states that an gfn is appended to the log strictly on the
> transition of the D bit from 0 to 1.
>
> In the case of a 2M superpage, there is a single D bit for the entire 2M
> range.
>
>
> The plausible (working) scenarios I can see are:
>
> 1) superpages are not supported (not indicated by the whitepaper).
>
> A better description would be -- PML doesn't check if it's superpage, it
> just operates with D-bit, no matter what page size.
>
> 2) a single entry will be written which must be taken to cover the
> entire 2M range.
> 3) an individual entry is written for every access.
>
> Below is the reply from our hardware guy related to PML on superpage. It
> should have answered accurately.
>
> "As noted in Section 1.3, logging occurs whenever the CPU would set an EPT D
> bit.
>
> It does not matter whether the D bit is in an EPT PTE (4KB page), EPT PDE
> (2MB page), or EPT PDPTE (1GB page).
>
> In all cases, the GPA written to the PML log will be the address of the
> write that causes the D bit in question to be updated, with bits 11:0
> cleared.
>
> This means that, in the case in which the D bit is in an EPT PDE or an EPT
> PDPTE, the log entry will communicate which 4KB region within the larger
> page was being written.
>
> Once the D bit is set in one of these entries, a subsequent write to the
> larger page will not generate a log entry, even if that write is to a
> different 4KB region within the larger page.  This is because log entries
> are created only when a D bit is being set and a write will not cause a D
> bit to be set if the page's D bit is already set.
>
> The log entries do not communicate the level of the EPT paging-structure
> entry in which the D bit was set (i.e., it does not communicate the page
> size). "
>
>
> Thanks for the clarification.
>
> The result of this behaviour is that the PML flush logic is going to have to
> look up each gfn and check whether it is mapped by a superpage, which will
> add a sizeable overhead.

Sorry that I am replying from my personal email account, as I can't
access my company account.

I don't think we need to check whether the gfn is mapped by a superpage.
The PML flush does a very simple thing (see the sketch below):

1) read out the PML index
2) loop over all valid GPAs logged in the PML buffer according to the
PML index, and call paging_mark_dirty for them
3) reset the PML index to 511, which essentially resets the PML buffer
to empty again

The above process doesn't need to know whether a GFN is mapped by a
superpage or not. Actually, as you can see in my design, superpages will
still be set read-only in the PML case, as we still need to split
superpages to 4K pages even with PML. Therefore a superpage in logdirty
mode will first be split to 4K pages via an EPT violation, and then
those 4K pages will follow the PML path.
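
A minimal sketch of steps 1) - 3) (paging_mark_dirty is the call named
above, glossing over whether it wants a gfn or mfn; the buffer field and
VMCS accessors are hypothetical, and the buffer-full wrap case is
ignored for brevity):

  #define NR_PML_ENTRIES 512

  /* Illustrative only: drain one vcpu's PML buffer.  Hardware
   * decrements the index after each logged write, so the valid
   * entries sit at idx+1 .. 511. */
  static void pml_drain_vcpu_sketch(struct vcpu *v)
  {
      uint64_t *buf = v->pml_buffer;            /* hypothetical field */
      unsigned int idx = vmread_pml_index(v);   /* hypothetical accessor */

      for ( unsigned int i = idx + 1; i < NR_PML_ENTRIES; i++ )
          paging_mark_dirty(v->domain, buf[i] >> PAGE_SHIFT);

      vmwrite_pml_index(v, NR_PML_ENTRIES - 1); /* 511 == empty again */
  }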

>
> It is also not conducive to minimising the data transmitted in the migration
> stream.

Yes, PML itself is unlikely to minimize the data transmitted in the
migration stream, as how many dirty pages are transmitted is entirely up
to the guest. But it reduces the EPT violations from 4K-page write
protection, so theoretically PML reduces CPU cycles spent in hypervisor
context and leaves more cycles for guest mode; it's therefore reasonable
to expect the guest to perform better.

>
>
> One future option might be to shatter all the EPT superpages when logdirty
> is enabled.

This is what I designed originally.

> This would be ok for a domain which is being migrated away, but
> would be suboptiomal for snapshot operations; Xen currently has no ability
> to coalesce pages back into superpages.

Doesn't this issue exist in the current log-dirty implementation anyway?
So although PML doesn't solve this issue, it doesn't bring any regression
either. To me, coalescing pages back into superpages is a separate
optimization, not directly related to PML.

> It also interacts poorly with HAP
> vram tracking which enables logdirty mode itself.

Why would PML interact with HAP vram tracking poorly?

Thanks,
-Kai

>
> ~Andrew
>
>



-- 
Thanks,
-Kai

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-13 14:32             ` Kai Huang
@ 2015-02-13 15:28               ` Andrew Cooper
  2015-02-13 15:52                 ` Kai Huang
  0 siblings, 1 reply; 54+ messages in thread
From: Andrew Cooper @ 2015-02-13 15:28 UTC (permalink / raw)
  To: Kai Huang; +Cc: Tian, Kevin, keir, tim, xen-devel, Kai Huang, jbeulich

On 13/02/15 14:32, Kai Huang wrote:
> On Fri, Feb 13, 2015 at 6:57 PM, Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>> On 13/02/15 02:11, Kai Huang wrote:
>>
>>
>> On 02/12/2015 10:10 PM, Andrew Cooper wrote:
>>
>> On 12/02/15 06:54, Tian, Kevin wrote:
>>
>> which presumably
>> means that the PML buffer flush needs to be aware of which gfns are
>> mapped by superpages to be able to correctly set a block of bits in the
>> logdirty bitmap.
>>
>> Unfortunately PML itself can't tell us if the logged GPA comes from a
>> superpage or not, but even with PML we still need to split superpages
>> into 4K pages, just like the traditional write protection approach does.
>> I think this is because live migration should be based on 4K page
>> granularity. Marking all 512 bits of a 2M page dirty with a single write
>> doesn't make sense in either the write protection or the PML case.
>>
>> agree. extending one write to a superpage enlarges the dirty set
>> unnecessarily. since the spec doesn't say superpage logging is not
>> supported, I'd expect a 4k-aligned entry to be logged if within a
>> superpage.
>>
>> The spec states that a gfn is appended to the log strictly on the
>> transition of the D bit from 0 to 1.
>>
>> In the case of a 2M superpage, there is a single D bit for the entire 2M
>> range.
>>
>>
>> The plausible (working) scenarios I can see are:
>>
>> 1) superpages are not supported (not indicated by the whitepaper).
>>
>> A better description would be -- PML doesn't check if it's a superpage;
>> it just operates on the D-bit, no matter what the page size is.
>>
>> 2) a single entry will be written which must be taken to cover the
>> entire 2M range.
>> 3) an individual entry is written for every access.
>>
>> Below is the reply from our hardware guy related to PML on superpages.
>> It should answer the question accurately.
>>
>> "As noted in Section 1.3, logging occurs whenever the CPU would set an EPT D
>> bit.
>>
>> It does not matter whether the D bit is in an EPT PTE (4KB page), EPT PDE
>> (2MB page), or EPT PDPTE (1GB page).
>>
>> In all cases, the GPA written to the PML log will be the address of the
>> write that causes the D bit in question to be updated, with bits 11:0
>> cleared.
>>
>> This means that, in the case in which the D bit is in an EPT PDE or an EPT
>> PDPTE, the log entry will communicate which 4KB region within the larger
>> page was being written.
>>
>> Once the D bit is set in one of these entries, a subsequent write to the
>> larger page will not generate a log entry, even if that write is to a
>> different 4KB region within the larger page.  This is because log entries
>> are created only when a D bit is being set and a write will not cause a D
>> bit to be set if the page's D bit is already set.
>>
>> The log entries do not communicate the level of the EPT paging-structure
>> entry in which the D bit was set (i.e., it does not communicate the page
>> size). "
>>
>>
>> Thanks for the clarification.
>>
>> The result of this behaviour is that the PML flush logic is going to have to
>> look up each gfn and check whether it is mapped by a superpage, which will
>> add a sizeable overhead.
> Sorry that I am replying using my personal email account, as I can't
> access my company account.
>
> I don't think we need to check if the gfn is mapped by a superpage.
> The PML flush does a very simple thing:
>
> 1) read out the PML index
> 2) loop over all valid GPAs logged in the PML buffer according to the
> PML index, and call paging_mark_dirty for each of them.
> 3) reset the PML index to 511, which essentially resets the PML buffer
> to be empty again.
>
> The above process doesn't need to know whether the GFN is mapped by a
> superpage. Actually, for superpages, as you can see in my design, they
> will still be set read-only even in the PML case, as we still need to
> split superpages into 4K pages in dirty logging mode. Therefore a
> superpage in logdirty mode will first be split into 4K pages on EPT
> violation, and those 4K pages will then follow the PML path.

This will only function correctly if superpage shattering is used.

As soon as a superpage D bit transitions from 0 to 1, the gfn is logged
and the guest can make further updates in the same frame without further
log entries being recorded. The PML flush code *must* assume that every
other gfn mapped by the superpage is dirty, or memory corruption could
occur when resuming on the far side of the migration.
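To illustrate the point (gfn_mapped_by_2m_superpage() below is a
hypothetical lookup, not an existing helper): without shattering, the
flush path would have to widen every logged gfn to the superpage
covering it, something like:

    /* Illustration only: in the non-shattering case, one log entry
     * stands for the entire 2M range being dirty. */
    static void mark_logged_gfn_dirty(struct domain *d, unsigned long gfn)
    {
        if ( gfn_mapped_by_2m_superpage(d, gfn) )
        {
            unsigned long base = gfn & ~0x1FFUL, i;

            for ( i = 0; i < 512; i++ )
                paging_mark_dirty(d, base + i);
        }
        else
            paging_mark_dirty(d, gfn);
    }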

>
>> It is also not conducive to minimising the data transmitted in the migration
>> stream.
> Yes, PML itself is unlikely to minimize the data transmitted in the
> migration stream, as how many dirty pages will be transmitted is
> totally up to the guest. But it reduces the EPT violations from 4K page
> write protection, so theoretically PML can reduce CPU cycles spent in
> hypervisor context, leaving more cycles for guest mode; therefore it's
> reasonable to expect the guest to have better performance.

"performance" is a huge amorphous blob of niceness that wants to be
achieved.  You must be more specific than that when describing
"performance" as "better".

Without superpage shattering, the use of PML can trade off a reduction
in guest VMexits vs more data needing to be sent in the migration
stream.  This might be nice from the point of view of the guest
administrator, but is quite possibly disastrous for the host
administrator, if their cloud is network-bound.

With superpage shattering, the use of PML can trade off a reduction in
guest VMexits vs greater host ram usage and slower system runtime
performance because of increased TLB pressure.


Stating a change in performance must always consider the tradeoffs.  In
this PML example, it is not a simple case that a new hardware feature
strictly makes everything better, if used.

>
>>
>> One future option might be to shatter all the EPT superpages when logdirty
>> is enabled.
> This is what I designed originally.

This is acceptable as a design constraint, especially given the limits
of the hardware, but it is important to know as a restriction.

Now that I reread your original email I do spot that in there.  I admit
that it was not immediately clear to me the first time around.

This does highlight the usefulness of design review to get everyone's
understanding (i.e. mine) up to scratch before starting to argue over
the finer details of an implementation.

>
>> This would be ok for a domain which is being migrated away, but
>> would be suboptimal for snapshot operations; Xen currently has no ability
>> to coalesce pages back into superpages.
> Doesn't this issue exist in the current log-dirty implementation anyway?

I believe it is an issue.

> So although PML doesn't solve this issue, it doesn't bring
> any regression either. To me, coalescing pages back into superpages is a
> separate optimization, not directly related to PML.

Agreed.

>
>> It also interacts poorly with HAP
>> vram tracking, which enables logdirty mode itself.
> Why would PML interact with HAP vram tracking poorly?

I was referring to the shattering aspect, rather than PML itself. 
Shattering all superpages would be overkill to just track vram, which
only needs to cover a small region.

I have to admit that the current vram tracking infrastructure is a bit
of a mess.  It has different semantics depending on whether HAP or
shadow is in use (HAP VRAM tracking enables logdirty mode, shadow VRAM
tracking doesn't), and causes problems for the qemu/libxc interaction at
the beginning of live migration.  These problems are compounded by
XenServer's habit of constantly tweaking the shadow allocation, and have
been further compounded by XSA-97 introducing -EBUSY into the mix.

I have tried once to sort the interface out, but didn't get very far.  I
really need to see about trying again.

~Andrew

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-13 15:28               ` Andrew Cooper
@ 2015-02-13 15:52                 ` Kai Huang
  2015-02-14  3:01                   ` Kai Huang
  0 siblings, 1 reply; 54+ messages in thread
From: Kai Huang @ 2015-02-13 15:52 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Tian, Kevin, keir, tim, xen-devel, Kai Huang, jbeulich

On Fri, Feb 13, 2015 at 11:28 PM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 13/02/15 14:32, Kai Huang wrote:
>> On Fri, Feb 13, 2015 at 6:57 PM, Andrew Cooper
>> <andrew.cooper3@citrix.com> wrote:
>>> On 13/02/15 02:11, Kai Huang wrote:
>>>
>>>
>>> On 02/12/2015 10:10 PM, Andrew Cooper wrote:
>>>
>>> On 12/02/15 06:54, Tian, Kevin wrote:
>>>
>>> which presumably
>>> means that the PML buffer flush needs to be aware of which gfns are
>>> mapped by superpages to be able to correctly set a block of bits in the
>>> logdirty bitmap.
>>>
>>> Unfortunately PML itself can't tell us if the logged GPA comes from a
>>> superpage or not, but even with PML we still need to split superpages
>>> into 4K pages, just like the traditional write protection approach does.
>>> I think this is because live migration should be based on 4K page
>>> granularity. Marking all 512 bits of a 2M page dirty with a single write
>>> doesn't make sense in either the write protection or the PML case.
>>>
>>> agree. extending one write to a superpage enlarges the dirty set
>>> unnecessarily. since the spec doesn't say superpage logging is not
>>> supported, I'd expect a 4k-aligned entry to be logged if within a
>>> superpage.
>>>
>>> The spec states that a gfn is appended to the log strictly on the
>>> transition of the D bit from 0 to 1.
>>>
>>> In the case of a 2M superpage, there is a single D bit for the entire 2M
>>> range.
>>>
>>>
>>> The plausible (working) scenarios I can see are:
>>>
>>> 1) superpages are not supported (not indicated by the whitepaper).
>>>
>>> A better description would be -- PML doesn't check if it's a superpage;
>>> it just operates on the D-bit, no matter what the page size is.
>>>
>>> 2) a single entry will be written which must be taken to cover the
>>> entire 2M range.
>>> 3) an individual entry is written for every access.
>>>
>>> Below is the reply from our hardware guy related to PML on superpages.
>>> It should answer the question accurately.
>>>
>>> "As noted in Section 1.3, logging occurs whenever the CPU would set an EPT D
>>> bit.
>>>
>>> It does not matter whether the D bit is in an EPT PTE (4KB page), EPT PDE
>>> (2MB page), or EPT PDPTE (1GB page).
>>>
>>> In all cases, the GPA written to the PML log will be the address of the
>>> write that causes the D bit in question to be updated, with bits 11:0
>>> cleared.
>>>
>>> This means that, in the case in which the D bit is in an EPT PDE or an EPT
>>> PDPTE, the log entry will communicate which 4KB region within the larger
>>> page was being written.
>>>
>>> Once the D bit is set in one of these entries, a subsequent write to the
>>> larger page will not generate a log entry, even if that write is to a
>>> different 4KB region within the larger page.  This is because log entries
>>> are created only when a D bit is being set and a write will not cause a D
>>> bit to be set if the page's D bit is already set.
>>>
>>> The log entries do not communicate the level of the EPT paging-structure
>>> entry in which the D bit was set (i.e., it does not communicate the page
>>> size). "
>>>
>>>
>>> Thanks for the clarification.
>>>
>>> The result of this behaviour is that the PML flush logic is going to have to
>>> look up each gfn and check whether it is mapped by a superpage, which will
>>> add a sizeable overhead.
>> Sorry that I am replying using my personal email account, as I can't
>> access my company account.
>>
>> I don't think we need to check if the gfn is mapped by a superpage.
>> The PML flush does a very simple thing:
>>
>> 1) read out the PML index
>> 2) loop over all valid GPAs logged in the PML buffer according to the
>> PML index, and call paging_mark_dirty for each of them.
>> 3) reset the PML index to 511, which essentially resets the PML buffer
>> to be empty again.
>>
>> The above process doesn't need to know whether the GFN is mapped by a
>> superpage. Actually, for superpages, as you can see in my design, they
>> will still be set read-only even in the PML case, as we still need to
>> split superpages into 4K pages in dirty logging mode. Therefore a
>> superpage in logdirty mode will first be split into 4K pages on EPT
>> violation, and those 4K pages will then follow the PML path.
>
> This will only function correctly if superpage shattering is used.
>
> As soon as a superpage D bit transitions from 0 to 1, the gfn is logged
> and the guest can make further updates in the same frame without further
> log entries being recorded. The PML flush code *must* assume that every
> other gfn mapped by the superpage is dirty, or memory corruption could
> occur when resuming on the far side of the migration.

To me the superpage has been split before its D bit changes from 0 to
1, as in my understanding the EPT violation happens before the D-bit is
set, so it's not possible to log a gfn before the superpage is split.
Therefore PML doesn't need to assume every other gfn in the superpage
range is dirty, as by then they are already 4K pages with the D-bit
clear and can be logged by PML. Does this sound reasonable?

>
>>
>>> It is also not conducive to minimising the data transmitted in the migration
>>> stream.
>> Yes, PML itself is unlikely to minimize the data transmitted in the
>> migration stream, as how many dirty pages will be transmitted is
>> totally up to the guest. But it reduces the EPT violations from 4K page
>> write protection, so theoretically PML can reduce CPU cycles spent in
>> hypervisor context, leaving more cycles for guest mode; therefore it's
>> reasonable to expect the guest to have better performance.
>
> "performance" is a huge amorphous blob of niceness that wants to be
> achieved.  You must be more specific than that when describing
> "performance" as "better".

Yes, I will gather some benchmark results prior to sending out the
patch for review. Actually, it would be helpful if you or others could
provide some suggestions on how to measure the performance, such as
which benchmarks should be run.

I have to read the rest of your reply tomorrow morning as it's
midnight in my time zone :)

Thanks,
-Kai

>
> Without superpage shattering, the use of PML can trade off a reduction
> in guest VMexits vs more data needing to be sent in the migration
> stream.  This might be nice from the point of view of the guest
> administrator, but is quite possibly disastrous for the host
> administrator, if their cloud is network-bound.
>
> With superpage shattering, the use of PML can trade off a reduction in
> guest VMexits vs greater host ram usage and slower system runtime
> performance because of increased TLB pressure.
>
>
> Stating a change in performance must always consider the tradeoffs.  In
> this PML example, it is not a simple case that a new hardware feature
> strictly makes everything better, if used.
>
>>
>>>
>>> One future option might be to shatter all the EPT superpages when logdirty
>>> is enabled.
>> This is what I designed originally.
>
> This is acceptable as a design constraint, especially given the limits
> of the hardware, but it is important to know as a restriction.
>
> Now that I reread your original email I do spot that in there.  I admit
> that it was not immediately clear to me the first time around.
>
> This does highlight the usefulness of design review to get everyone's
> understanding (i.e. mine) up to scratch before starting to argue over
> the finer details of an implementation.
>
>>
>>> This would be ok for a domain which is being migrated away, but
>>> would be suboptimal for snapshot operations; Xen currently has no ability
>>> to coalesce pages back into superpages.
>> Doesn't this issue exist in the current log-dirty implementation anyway?
>
> I believe it is an issue.
>
>> So although PML doesn't solve this issue, it doesn't bring
>> any regression either. To me, coalescing pages back into superpages is a
>> separate optimization, not directly related to PML.
>
> Agreed.
>
>>
>>> It also interacts poorly with HAP
>>> vram tracking, which enables logdirty mode itself.
>> Why would PML interact with HAP vram tracking poorly?
>
> I was referring to the shattering aspect, rather than PML itself.
> Shattering all superpages would be overkill to just track vram, which
> only needs to cover a small region.
>
> I have to admit that the current vram tracking infrastructure is a bit
> of a mess.  It has different semantics depending on whether HAP or
> shadow is in use (HAP VRAM tracking enables logdirty mode, shadow VRAM
> tracking doesn't), and causes problems for the qemu/libxc interaction at
> the beginning of live migration.  These problems are compounded by
> XenServer's habit of constantly tweaking the shadow allocation, and have
> been further compounded by XSA-97 introducing -EBUSY into the mix.
>
> I have tried once to sort the interface out, but didn't get very far.  I
> really need to see about trying again.
>
> ~Andrew
>



-- 
Thanks,
-Kai

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-13 15:52                 ` Kai Huang
@ 2015-02-14  3:01                   ` Kai Huang
  2015-02-16 11:44                     ` Andrew Cooper
  0 siblings, 1 reply; 54+ messages in thread
From: Kai Huang @ 2015-02-14  3:01 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Tian, Kevin, keir, tim, xen-devel, Kai Huang, jbeulich

On Fri, Feb 13, 2015 at 11:52 PM, Kai Huang <kaih.linux@gmail.com> wrote:
> On Fri, Feb 13, 2015 at 11:28 PM, Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>> On 13/02/15 14:32, Kai Huang wrote:
>>> On Fri, Feb 13, 2015 at 6:57 PM, Andrew Cooper
>>> <andrew.cooper3@citrix.com> wrote:
>>>> On 13/02/15 02:11, Kai Huang wrote:
>>>>
>>>>
>>>> On 02/12/2015 10:10 PM, Andrew Cooper wrote:
>>>>
>>>> On 12/02/15 06:54, Tian, Kevin wrote:
>>>>
>>>> which presumably
>>>> means that the PML buffer flush needs to be aware of which gfns are
>>>> mapped by superpages to be able to correctly set a block of bits in the
>>>> logdirty bitmap.
>>>>
>>>> Unfortunately PML itself can't tell us if the logged GPA comes from a
>>>> superpage or not, but even with PML we still need to split superpages
>>>> into 4K pages, just like the traditional write protection approach does.
>>>> I think this is because live migration should be based on 4K page
>>>> granularity. Marking all 512 bits of a 2M page dirty with a single write
>>>> doesn't make sense in either the write protection or the PML case.
>>>>
>>>> agree. extending one write to a superpage enlarges the dirty set
>>>> unnecessarily. since the spec doesn't say superpage logging is not
>>>> supported, I'd expect a 4k-aligned entry to be logged if within a
>>>> superpage.
>>>>
>>>> The spec states that a gfn is appended to the log strictly on the
>>>> transition of the D bit from 0 to 1.
>>>>
>>>> In the case of a 2M superpage, there is a single D bit for the entire 2M
>>>> range.
>>>>
>>>>
>>>> The plausible (working) scenarios I can see are:
>>>>
>>>> 1) superpages are not supported (not indicated by the whitepaper).
>>>>
>>>> A better description would be -- PML doesn't check if it's a superpage;
>>>> it just operates on the D-bit, no matter what the page size is.
>>>>
>>>> 2) a single entry will be written which must be taken to cover the
>>>> entire 2M range.
>>>> 3) an individual entry is written for every access.
>>>>
>>>> Below is the reply from our hardware guy related to PML on superpages.
>>>> It should answer the question accurately.
>>>>
>>>> "As noted in Section 1.3, logging occurs whenever the CPU would set an EPT D
>>>> bit.
>>>>
>>>> It does not matter whether the D bit is in an EPT PTE (4KB page), EPT PDE
>>>> (2MB page), or EPT PDPTE (1GB page).
>>>>
>>>> In all cases, the GPA written to the PML log will be the address of the
>>>> write that causes the D bit in question to be updated, with bits 11:0
>>>> cleared.
>>>>
>>>> This means that, in the case in which the D bit is in an EPT PDE or an EPT
>>>> PDPTE, the log entry will communicate which 4KB region within the larger
>>>> page was being written.
>>>>
>>>> Once the D bit is set in one of these entries, a subsequent write to the
>>>> larger page will not generate a log entry, even if that write is to a
>>>> different 4KB region within the larger page.  This is because log entries
>>>> are created only when a D bit is being set and a write will not cause a D
>>>> bit to be set if the page's D bit is already set.
>>>>
>>>> The log entries do not communicate the level of the EPT paging-structure
>>>> entry in which the D bit was set (i.e., it does not communicate the page
>>>> size). "
>>>>
>>>>
>>>> Thanks for the clarification.
>>>>
>>>> The result of this behaviour is that the PML flush logic is going to have to
>>>> look up each gfn and check whether it is mapped by a superpage, which will
>>>> add a sizeable overhead.
>>> Sorry that I am replying using my personal email account, as I can't
>>> access my company account.
>>>
>>> I don't think we need to check if the gfn is mapped by a superpage.
>>> The PML flush does a very simple thing:
>>>
>>> 1) read out the PML index
>>> 2) loop over all valid GPAs logged in the PML buffer according to the
>>> PML index, and call paging_mark_dirty for each of them.
>>> 3) reset the PML index to 511, which essentially resets the PML buffer
>>> to be empty again.
>>>
>>> The above process doesn't need to know whether the GFN is mapped by a
>>> superpage. Actually, for superpages, as you can see in my design, they
>>> will still be set read-only even in the PML case, as we still need to
>>> split superpages into 4K pages in dirty logging mode. Therefore a
>>> superpage in logdirty mode will first be split into 4K pages on EPT
>>> violation, and those 4K pages will then follow the PML path.
>>
>> This will only function correctly if superpage shattering is used.
>>
>> As soon as a superpage D bit transitions from 0 to 1, the gfn is logged
>> and the guest can make further updates in the same frame without further
>> log entries being recorded. The PML flush code *must* assume that every
>> other gfn mapped by the superpage is dirty, or memory corruption could
>> occur when resuming on the far side of the migration.
>
> To me the superpage has been split before its D bit changes from 0 to
> 1, as in my understanding the EPT violation happens before the D-bit is
> set, so it's not possible to log a gfn before the superpage is split.
> Therefore PML doesn't need to assume every other gfn in the superpage
> range is dirty, as by then they are already 4K pages with the D-bit
> clear and can be logged by PML. Does this sound reasonable?
>
>>
>>>
>>>> It is also not conducive to minimising the data transmitted in the migration
>>>> stream.
>>> Yes, PML itself is unlikely to minimize the data transmitted in the
>>> migration stream, as how many dirty pages will be transmitted is
>>> totally up to the guest. But it reduces the EPT violations from 4K page
>>> write protection, so theoretically PML can reduce CPU cycles spent in
>>> hypervisor context, leaving more cycles for guest mode; therefore it's
>>> reasonable to expect the guest to have better performance.
>>
>> "performance" is a huge amorphous blob of niceness that wants to be
>> achieved.  You must be more specific than that when describing
>> "performance" as "better".
>
> Yes, I will gather some benchmark results prior to sending out the
> patch for review. Actually, it would be helpful if you or others could
> provide some suggestions on how to measure the performance, such as
> which benchmarks should be run.
>
> I have to read the rest of your reply tomorrow morning as it's
> midnight in my time zone :)
>
> Thanks,
> -Kai
>
>>
>> Without superpage shattering, the use of PML can trade off a reduction
>> in guest VMexits vs more data needing to be sent in the migration
>> stream.  This might be nice from the point of view of the guest
>> administrator, but is quite possibly disastrous for the host
>> administrator, if their cloud is network-bound.
>>
>> With superpage shattering, the use of PML can trade off a reduction in
>> guest VMexits vs greater host ram usage and slower system runtime
>> performance because of increased TLB pressure.
>>
>>
>> Stating a change in performance must always consider the tradeoffs.  In
>> this PML example, it is not a simple case that a new hardware feature
>> strictly makes everything better, if used.

Continuing to reply to your comments.

It looks like our assumptions about the purpose of PML are not aligned.
To me, whether or not to shatter superpages is not directly related to
PML but is something to be considered in the live migration layer.
Currently, with write protection, superpages are shattered, but we
could certainly choose either way -- shattering or not -- even with
write protection, couldn't we?

My principle in designing PML is that it changes as little common logic
as possible (at least at the current stage), and in the best case the
PML logic can be completely hidden in the VMX/EPT layer, so that we
don't have to consider the impact of changing the common log-dirty
logic, e.g., your above concerns. It's better to keep common log-dirty
mechanism optimizations separate from PML support so they don't impact
each other. Superpage shattering is already done in the current
log-dirty implementation, and PML just follows this design. Actually,
even if the PML hardware supported GPA logging with superpage
information, I would still choose to split superpages, as superpage
shattering is logic that belongs in the live migration layer. Therefore,
based on this design, the benefit of PML is straightforward --- it
reduces EPT violations for 4K pages while leaving other things largely
unchanged, in which case we have minimized the impact of PML changes on
the log-dirty mechanism, so the benchmarks needed to measure PML's
performance change can be minimized too.


>>
>>>
>>>>
>>>> One future option might be to shatter all the EPT superpages when logdirty
>>>> is enabled.
>>> This is what I designed originally.
>>
>> This is acceptable as a design constraint, especially given the limits
>> of the hardware, but it is important to know as a restriction.
>>
>> Now that I reread your original email I do spot that in there.  I admit
>> that it was not immediately clear to me the first time around.
>>
>> This does highlight the usefulness of design review to get everyone's
>> understanding (i.e. mine) up to scratch before starting to argue over
>> the finer details of an implementation.
>>
>>>
>>>> This would be ok for a domain which is being migrated away, but
>>>> would be suboptimal for snapshot operations; Xen currently has no ability
>>>> to coalesce pages back into superpages.
>>> Doesn't this issue exist in the current log-dirty implementation anyway?
>>
>> I believe it is an issue.
>>
>>> So although PML doesn't solve this issue, it doesn't bring
>>> any regression either. To me, coalescing pages back into superpages is a
>>> separate optimization, not directly related to PML.
>>
>> Agreed.
>>
>>>
>>>> It also interacts poorly with HAP
>>>> vram tracking, which enables logdirty mode itself.
>>> Why would PML interact with HAP vram tracking poorly?
>>
>> I was referring to the shattering aspect, rather than PML itself.
>> Shattering all superpages would be overkill to just track vram, which
>> only needs to cover a small region.

To me it looks like vram tracking (HAP) currently shatters all
superpages, instead of only the superpages in the vram range. Am I
misunderstanding here?

>>
>> I have to admit that the current vram tracking infrastructure is a bit
>> of a mess.  It has different semantics depending on whether HAP or
>> shadow is in use (HAP VRAM tracking enables logdirty mode, shadow VRAM
>> tracking doesn't), and causes problems for the qemu/libxc interaction at
>> the beginning of live migration.  These problems are compounded by
>> XenServer's habit of constantly tweaking the shadow allocation, and have
>> been further compounded by XSA-97 introducing -EBUSY into the mix.
>>

This is what I was unaware of.  Thanks for the info. But to me PML is
not supposed to solve this either.

Thanks,
-Kai

>> I have tried once to sort the interface out, but didn't get very far.  I
>> really need to see about trying again.
>>
>> ~Andrew
>>
>
>
>
> --
> Thanks,
> -Kai



-- 
Thanks,
-Kai

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-14  3:01                   ` Kai Huang
@ 2015-02-16 11:44                     ` Andrew Cooper
  2015-02-16 14:02                       ` Kai Huang
  2015-02-17 10:37                       ` Jan Beulich
  0 siblings, 2 replies; 54+ messages in thread
From: Andrew Cooper @ 2015-02-16 11:44 UTC (permalink / raw)
  To: Kai Huang; +Cc: Tian, Kevin, keir, tim, xen-devel, Kai Huang, jbeulich

On 14/02/15 03:01, Kai Huang wrote:
>>> This will only function correctly if superpage shattering is used.
>>>
>>> As soon as a superpage D bit transitions from 0 to 1, the gfn is logged
>>> and the guest can make further updates in the same frame without further
>>> log entries being recorded. The PML flush code *must* assume that every
>>> other gfn mapped by the superpage is dirty, or memory corruption could
>>> occur when resuming on the far side of the migration.
>> To me the superpage has been split before its D bit changes from 0 to
>> 1, as in my understanding the EPT violation happens before the D-bit is
>> set, so it's not possible to log a gfn before the superpage is split.
>> Therefore PML doesn't need to assume every other gfn in the superpage
>> range is dirty, as by then they are already 4K pages with the D-bit
>> clear and can be logged by PML. Does this sound reasonable?

Agreed - I was describing the non-shattering case.

>>
>>>>> It is also not conducive to minimising the data transmitted in the migration
>>>>> stream.
>>>> Yes, PML itself is unlikely to minimize the data transmitted in the
>>>> migration stream, as how many dirty pages will be transmitted is
>>>> totally up to the guest. But it reduces the EPT violations from 4K page
>>>> write protection, so theoretically PML can reduce CPU cycles spent in
>>>> hypervisor context, leaving more cycles for guest mode; therefore it's
>>>> reasonable to expect the guest to have better performance.
>>> "performance" is a huge amorphous blob of niceness that wants to be
>>> achieved.  You must be more specific than that when describing
>>> "performance" as "better".
>> Yes, I will gather some benchmark results prior to sending out the
>> patch for review. Actually, it would be helpful if you or others could
>> provide some suggestions on how to measure the performance, such as
>> which benchmarks should be run.

As a start, a simple count of vmexits using xentrace would be
interesting to see.

Can I highly recommend testing live migration using a memtest VM?  It
was highly useful to me when developing migration v2, and it complains
very loudly if some of its memory gets left behind.

>>>> Why would PML interact with HAP vram tracking poorly?
>>> I was referring to the shattering aspect, rather than PML itself.
>>> Shattering all superpages would be overkill to just track vram, which
>>> only needs to cover a small region.
> To me it looks like vram tracking (HAP) currently shatters all
> superpages, instead of only the superpages in the vram range. Am I
> misunderstanding here?

You are completely correct.

Having just re-reviewed the HAP code, superpages are fully shattered as
soon as logdirty mode is touched, which realistically means
unconditionally, given that Qemu will always track guest VRAM.  (So much
for the toolstack trying to optimise the guest by building memory using
superpages; Qemu goes and causes Xen extra work by shattering them all.)

This means that PML needing superpage shattering is no different to the
existing code, which means that there are no extra overheads incurred as
a direct result of PML.

~Andrew

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-13  2:50   ` Kai Huang
@ 2015-02-16 14:01     ` Kai Huang
  2015-02-16 18:19       ` Tim Deegan
  0 siblings, 1 reply; 54+ messages in thread
From: Kai Huang @ 2015-02-16 14:01 UTC (permalink / raw)
  To: Kai Huang
  Cc: Tian, Kevin, keir, Andrew Cooper, Tim Deegan, xen-devel, Jan Beulich

On Fri, Feb 13, 2015 at 10:50 AM, Kai Huang <kai.huang@linux.intel.com> wrote:
>
> On 02/12/2015 08:34 PM, Tim Deegan wrote:
>>
>> Hi,
>>
>> Thanks for posting this design!
>>
>> At 16:28 +0800 on 11 Feb (1423668493), Kai Huang wrote:
>>>
>>> Design
>>> ======
>>>
>>> - PML feature is used globally
>>>
>>> A new Xen boot parameter, say 'opt_enable_pml', will be introduced to
>>> control PML feature detection, and PML feature will only be detected if
>>> opt_enable_pml = 1. Once PML feature is detected, it will be used for dirty
>>> logging for all domains globally. Currently we don't support to use PML on
>>> basis of per-domain as it will require additional control from XL tool.
>>
>> Sounds good.  I agree that there's no point in making this a per-VM
>> feature.
>>
>>> - PML enable/disable for particular Domain
>>>
>>> PML needs to be enabled (allocate PML buffer, initialize PML index, PML
>>> base address, turn PML on VMCS, etc) for all vcpus of the domain, as PML
>>> buffer and PML index are per-vcpu, but EPT table may be shared by vcpus.
>>> Enabling PML on partial vcpus of the domain won't work. Also PML will only
>>> be enabled for the domain when it is switched to dirty logging mode, and it
>>> will be disabled when domain is switched back to normal mode. As looks vcpu
>>> number won't be changed dynamically during guest is running (correct me if I
>>> am wrong here), so we don't have to consider enabling PML for new created
>>> vcpu when guest is in dirty logging mode.
>>>
>> No - you really ought to handle enabling this for new VCPUs.  There
>> have been cases in the past where VMs are put into log-dirty mode
>> before their VCPUs are assigned, and there might be again.
>
> "Assigned" here means created?
>
>>
>> It ought to be easy to handle, though - just one more check and
>> function call on the vcpu setup path.
>
> I think "check and function call" means check function call to enable PML on
> this vcpu? Then what if enabling PML for vcpu fails (possible as it needs to
> allocate 4K PML buffer)? It's better to choose to roll back to use write
> protection instead of indicating failure of creating the vcpu. But in this
> case there will be problem if the domain has already been in log dirty mode
> as we might already have EPT table setup with D-bit clear for logdirty
> range, which means we need to re-check the logdirty ranges and re-set EPT
> table to be read-only.  Does this sound reasonable?

Hi Tim, all,

Do you have comments on this?

If my above understanding is true, to me it's a little bit complicated
to enable PML for the domain on demand when it switches to log-dirty
mode. Another approach is to enable PML for the vcpu unconditionally
(if the PML feature is detected, of course) when the vcpu is created,
and if enabling PML fails, the vcpu will simply not be created. This
approach simplifies the logic for handling failure to enable PML for a
vcpu, as there is no need to roll back to write protection for the
other vcpus when enabling PML fails. The disadvantage is that PML will
be enabled for the guest during its entire run-time, and an additional
4K buffer will be allocated for each vcpu even when the guest is not in
log-dirty mode. We also need to manually set the D-bit to 1 for guest
memory not in log-dirty mode, to avoid unnecessary GPA logging (e.g.,
when guest memory is just populated).  Btw, this is the approach we
already took for KVM.
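For illustration, a rough sketch of this alternative (the struct field
and VMCS field names here are illustrative, not a final implementation):

    /* Enable PML unconditionally at vcpu creation; a failure simply
     * fails vcpu creation, so no rollback logic is needed. */
    static int vmx_vcpu_enable_pml(struct vcpu *v)
    {
        struct page_info *pg = alloc_domheap_page(v->domain, 0);

        if ( !pg )
            return -ENOMEM;         /* vcpu creation fails */

        v->arch.pml_pg = pg;        /* illustrative field name */

        /* With the vcpu's VMCS loaded: */
        __vmwrite(PML_ADDRESS, page_to_maddr(pg));
        __vmwrite(GUEST_PML_INDEX, 511);
        /* ... plus setting the "enable PML" secondary execution control. */

        return 0;
    }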

Do you have any suggestion here?

Thanks,
-Kai

>
>>> After PML is enabled for the domain, we only need to clear EPT entry's
>>> D-bit for guest memory in dirty logging mode. We achieve this by checking if
>>> PML is enabled for the domain when p2m_ram_rx changed to p2m_ram_logdirty,
>>> and updating EPT entry accordingly. However, for super pages, we still write
>>> protect them in case of PML as we still need to split super page to 4K page
>>> in dirty logging mode.
>>>
>> IIUC, you are suggesting leaving superpages handled as they are now,
>> with read-only EPTEs, and only using PML for single-page mappings.
>> That seems good. :)
>>
>>> - PML buffer flush
>>>
>>> There are two places we need to flush PML buffer. The first place is PML
>>> buffer full VMEXIT handler (apparently), and the second place is in
>>> paging_log_dirty_op (either peek or clean), as vcpus are running
>>> asynchronously along with paging_log_dirty_op is called from userspace via
>>> hypercall, and it's possible there are dirty GPAs logged in vcpus' PML
>>> buffers but not full. Therefore we'd better to flush all vcpus' PML buffers
>>> before reporting dirty GPAs to userspace.
>>>
>>> We handle above two cases by flushing PML buffer at the beginning of all
>>> VMEXITs. This solves the first case above, and it also solves the second
>>> case, as prior to paging_log_dirty_op, domain_pause is called, which kicks
>>> vcpus (that are in guest mode) out of guest mode via sending IPI, which
>>> cause VMEXIT, to them.
>>>
>> I would prefer to flush only on buffer-full VMEXITs and handle the
>> peek/clear path by explicitly reading all VCPUs' buffers.  That avoids
>> putting more code on the fast paths for other VMEXIT types.
>
> OK. But it looks like this requires a new interface like
> paging_flush_log_dirty, called at the beginning of paging_log_dirty_op?
> This is actually what I wanted to avoid originally.
>
>>
>>> This also makes log-dirty radix tree more updated as PML buffer is
>>> flushed on basis of all VMEXITs but not only PML buffer full VMEXIT.
>>>
>>> - Video RAM tracking (and partial dirty logging for guest memory range)
>>>
>>> Video RAM is in dirty logging mode unconditionally during guest's
>>> run-time, and it is partial memory range of the guest. However, PML operates
>>> on the whole guest memory (the whole valid EPT table, more precisely), so we
>>> need to choose whether to use PML if only partial guest memory ranges are in
>>> dirty logging mode.
>>>
>>> Currently, PML will be used as long as there's guest memory in dirty
>>> logging mode, no matter globally or partially. And in case of partial dirty
>>> logging, we need to check if the logged GPA in PML buffer is in dirty
>>> logging range.
>>>
>> I think, as other people have said, that you can just use PML for this
>> case without any other restrictions.  After all, mappings for non-VRAM
>> areas ought not to have their D-bits clear anyway.
>
> Agreed.
>
> Thanks,
> -Kai
>
>>
>> Cheers,
>>
>> Tim.
>>



-- 
Thanks,
-Kai

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-16 11:44                     ` Andrew Cooper
@ 2015-02-16 14:02                       ` Kai Huang
  2015-02-17 10:37                       ` Jan Beulich
  1 sibling, 0 replies; 54+ messages in thread
From: Kai Huang @ 2015-02-16 14:02 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Tian, Kevin, keir, tim, xen-devel, Kai Huang, jbeulich

On Mon, Feb 16, 2015 at 7:44 PM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 14/02/15 03:01, Kai Huang wrote:
>>>> This will only function correctly if superpage shattering is used.
>>>>
>>>> As soon as a superpage D bit transitions from 0 to 1, the gfn is logged
>>>> and the guest can make further updates in the same frame without further
>>>> log entries being recorded. The PML flush code *must* assume that every
>>>> other gfn mapped by the superpage is dirty, or memory corruption could
>>>> occur when resuming on the far side of the migration.
>>> To me the superpage has been split before its D bit changes from 0 to
>>> 1, as in my understanding the EPT violation happens before the D-bit is
>>> set, so it's not possible to log a gfn before the superpage is split.
>>> Therefore PML doesn't need to assume every other gfn in the superpage
>>> range is dirty, as by then they are already 4K pages with the D-bit
>>> clear and can be logged by PML. Does this sound reasonable?
>
> Agreed - I was describing the non-shattering case.
>
>>>
>>>>>> It is also not conducive to minimising the data transmitted in the migration
>>>>>> stream.
>>>>> Yes, PML itself is unlikely to minimize the data transmitted in the
>>>>> migration stream, as how many dirty pages will be transmitted is
>>>>> totally up to the guest. But it reduces the EPT violations from 4K page
>>>>> write protection, so theoretically PML can reduce CPU cycles spent in
>>>>> hypervisor context, leaving more cycles for guest mode; therefore it's
>>>>> reasonable to expect the guest to have better performance.
>>>> "performance" is a huge amorphous blob of niceness that wants to be
>>>> achieved.  You must be more specific than that when describing
>>>> "performance" as "better".
>>> Yes, I will gather some benchmark results prior to sending out the
>>> patch for review. Actually, it would be helpful if you or others could
>>> provide some suggestions on how to measure the performance, such as
>>> which benchmarks should be run.
>
> As a start, a simple count of vmexits using xentrace would be
> interesting to see.

Will do.

>
> Can I highly recommend testing live migration using a memtest VM?  It
> was highly useful to me when developing migration v2, and it complains
> very loudly if some of its memory gets left behind.

Sure. Thanks for the suggestion.

>
>>>>> Why would PML interact with HAP vram tracking poorly?
>>>> I was referring to the shattering aspect, rather than PML itself.
>>>> Shattering all superpages would be overkill to just track vram, which
>>>> only needs to cover a small region.
>> To me it looks like vram tracking (HAP) currently shatters all
>> superpages, instead of only the superpages in the vram range. Am I
>> misunderstanding here?
>
> You are completely correct.
>
> Having just re-reviewed the HAP code, superpages are fully shattered as
> soon as logdirty mode is touched, which realistically means
> unconditionally, given that Qemu will always track guest VRAM.  (So much
> for the toolstack trying to optimise the guest by building memory using
> superpages; Qemu goes and causes Xen extra work by shattering them all.)
>
> This means that PML needing superpage shattering is no different to the
> existing code, which means that there are no extra overheads incurred as
> a direct result of PML.

Agreed.

Thanks,
-Kai
>
> ~Andrew
>



-- 
Thanks,
-Kai

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-16 14:01     ` Kai Huang
@ 2015-02-16 18:19       ` Tim Deegan
  0 siblings, 0 replies; 54+ messages in thread
From: Tim Deegan @ 2015-02-16 18:19 UTC (permalink / raw)
  To: Kai Huang
  Cc: Tian, Kevin, keir, Andrew Cooper, xen-devel, Kai Huang, Jan Beulich

Hi,

At 22:01 +0800 on 16 Feb (1424120474), Kai Huang wrote:
> On Fri, Feb 13, 2015 at 10:50 AM, Kai Huang <kai.huang@linux.intel.com> wrote:
> >
> > On 02/12/2015 08:34 PM, Tim Deegan wrote:
> >>
> >> Hi,
> >>
> >> Thanks for posting this design!
> >>
> >> At 16:28 +0800 on 11 Feb (1423668493), Kai Huang wrote:
> >>>
> >>> Design
> >>> ======
> >>>
> >>> - PML feature is used globally
> >>>
> >>> A new Xen boot parameter, say 'opt_enable_pml', will be introduced to
> >>> control PML feature detection, and PML feature will only be detected if
> >>> opt_enable_pml = 1. Once PML feature is detected, it will be used for dirty
> >>> logging for all domains globally. Currently we don't support to use PML on
> >>> basis of per-domain as it will require additional control from XL tool.
> >>
> >> Sounds good.  I agree that there's no point in making this a per-VM
> >> feature.
> >>
> >>> - PML enable/disable for particular Domain
> >>>
> >>> PML needs to be enabled (allocate PML buffer, initialize PML index, PML
> >>> base address, turn PML on VMCS, etc) for all vcpus of the domain, as PML
> >>> buffer and PML index are per-vcpu, but EPT table may be shared by vcpus.
> >>> Enabling PML on partial vcpus of the domain won't work. Also PML will only
> >>> be enabled for the domain when it is switched to dirty logging mode, and it
> >>> will be disabled when domain is switched back to normal mode. As looks vcpu
> >>> number won't be changed dynamically during guest is running (correct me if I
> >>> am wrong here), so we don't have to consider enabling PML for new created
> >>> vcpu when guest is in dirty logging mode.
> >>>
> >> No - you really ought to handle enabling this for new VCPUs.  There
> >> have been cases in the past where VMs are put into log-dirty mode
> >> before their VCPUs are assigned, and there might be again.
> >
> > "Assigned" here means created?

Yes.

> >> It ought to be easy to handle, though - just one more check and
> >> function call on the vcpu setup path.
> >
> > I think "check and function call" means a check and a function call to
> > enable PML on this vcpu? Then what if enabling PML for the vcpu fails
> > (possible, as it needs to allocate the 4K PML buffer)? It's better to
> > roll back to write protection instead of failing to create the vcpu. But
> > in this case there will be a problem if the domain is already in
> > log-dirty mode, as we might already have the EPT tables set up with the
> > D-bit clear for the logdirty range, which means we need to re-check the
> > logdirty ranges and re-set the EPT tables to read-only.  Does this sound
> > reasonable?

If PML init fails for some reason, you can just fail the vcpu
init.  That's what we do for other things, and it avoids having to
deal with a half-PML domain. 

> If my above understanding is true, to me it's a little bit complicated
> to enable PML for the domain on demand when it switches to log-dirty
> mode. Another approach is to enable PML for the vcpu unconditionally
> (if the PML feature is detected, of course) when the vcpu is created,
> and if enabling PML fails, the vcpu will simply not be created.

Nearly.  You should enable PML on vcpu creation _if_ the VM is
already in log-dirty mode.  If it is not, then do nothing and the
code that enables log-dirty later can set up PML for all existing VCPUs. 
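Something like this on the vcpu setup path (a sketch only; the flag and
function names are illustrative):

    /* At vcpu creation, set up PML only if the domain is already in
     * log-dirty mode; the path that enables log-dirty mode later takes
     * care of all vcpus that exist by then. */
    static int pml_vcpu_initialise(struct vcpu *v)
    {
        if ( !pml_feature_available )            /* global detection flag */
            return 0;

        if ( !paging_mode_log_dirty(v->domain) )
            return 0;                            /* nothing to do yet */

        /* May fail (e.g. buffer allocation); then just fail vcpu init. */
        return vmx_vcpu_enable_pml(v);
    }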

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-12  2:39   ` Kai Huang
  2015-02-12  6:54     ` Tian, Kevin
@ 2015-02-17 10:19     ` Jan Beulich
  2015-02-17 11:57       ` Tim Deegan
                         ` (2 more replies)
  1 sibling, 3 replies; 54+ messages in thread
From: Jan Beulich @ 2015-02-17 10:19 UTC (permalink / raw)
  To: Kai Huang; +Cc: Andrew Cooper, kevin.tian, tim, keir, xen-devel

>>> On 12.02.15 at 03:39, <kai.huang@linux.intel.com> wrote:
> On 02/11/2015 07:52 PM, Andrew Cooper wrote:
>> On 11/02/15 08:28, Kai Huang wrote:
>>> Design
>>> ======
>>>
>>> - PML feature is used globally
>>>
>>> A new Xen boot parameter, say 'opt_enable_pml', will be introduced to
>>> control PML feature detection, and PML feature will only be detected
>>> if opt_enable_pml = 1. Once PML feature is detected, it will be used
>>> for dirty logging for all domains globally. Currently we don't support
>>> to use PML on basis of per-domain as it will require additional
>>> control from XL tool.
>> Rather than adding in a new top level command line option for an ept
>> subfeature, it would be preferable to add an "ept=" option which has
>> "pml" as a sub boolean.
> Which is good to me, if Jan agrees.
> 
> Jan, which do you prefer here?

A single "ept=" option as Andrew suggested.

>>> Currently, PML will be used as long as there's guest memory in dirty
>>> logging mode, no matter globally or partially. And in case of partial
>>> dirty logging, we need to check if the logged GPA in PML buffer is in
>>> dirty logging range.
>> I am not sure this is a problem.  HAP vram tracking already leaks
>> non-vram frames into the dirty bitmap, caused by calls to
>> paging_mark_dirty() from paths which are not caused by a p2m_logdirty fault.
> Hmm. Seems right. Probably this also depends on how userspace uses the 
> dirty bitmap.
> 
> If this is not a problem, we can avoid checking whether logged GPAs
> are in logdirty ranges and unconditionally update them in the log-dirty
> radix tree.
> 
> Jan, what's your comments here?

I agree with Andrew, but Tim's confirmation would be nice to have.

Jan

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-12  2:49   ` Kai Huang
  2015-02-12  5:16     ` Kai Huang
  2015-02-12  7:02     ` Tian, Kevin
@ 2015-02-17 10:23     ` Jan Beulich
  2015-03-01 23:13       ` Kai Huang
  2 siblings, 1 reply; 54+ messages in thread
From: Jan Beulich @ 2015-02-17 10:23 UTC (permalink / raw)
  To: Kai Huang; +Cc: andrew.cooper3, kevin.tian, tim, keir, xen-devel

>>> On 12.02.15 at 03:49, <kai.huang@linux.intel.com> wrote:
> On 02/11/2015 09:06 PM, Jan Beulich wrote:
>>>>> On 11.02.15 at 09:28, <kai.huang@linux.intel.com> wrote:
>>> - PML buffer flush
>>>
>>> There are two places we need to flush PML buffer. The first place is PML
>>> buffer full VMEXIT handler (apparently), and the second place is in
>>> paging_log_dirty_op (either peek or clean), as vcpus are running
>>> asynchronously along with paging_log_dirty_op is called from userspace via
>>> hypercall, and it's possible there are dirty GPAs logged in vcpus' PML
>>> buffers but not full. Therefore we'd better to flush all vcpus' PML buffers
>>> before reporting dirty GPAs to userspace.
>>>
>>> We handle above two cases by flushing PML buffer at the beginning of all
>>> VMEXITs. This solves the first case above, and it also solves the second
>>> case, as prior to paging_log_dirty_op, domain_pause is called, which kicks
>>> vcpus (that are in guest mode) out of guest mode via sending IPI, which cause
>>> VMEXIT, to them.
>>>
>>> This also makes log-dirty radix tree more updated as PML buffer is flushed
>>> on basis of all VMEXITs but not only PML buffer full VMEXIT.
>> Is that really efficient? Flushing the buffer only as needed doesn't
>> seem to be a major problem (apart from the usual preemption issue
>> when dealing with guests with very many vCPU-s, but you certainly
>> recall that at this point HVM is still limited to 128).
>>
>> Apart from these two remarks, the design looks okay to me.
> While keeping the log-dirty radix tree more up to date is probably
> irrelevant, I do think we'd better flush the PML buffers in
> paging_log_dirty_op (both peek and clear) before reporting dirty pages
> to userspace, in which case I think flushing the PML buffer at the
> beginning of VMEXIT is a good idea, as domain_pause does the job
> automatically. I am not sure how many cycles flushing the PML buffer
> will contribute, but I think it should be relatively small compared to
> the VMEXIT itself, so it can be ignored.

As far as my general thinking goes, this is the wrong attitude:
_Anything_ added to a hot path like VMEXIT processing should be
considered performance relevant. I.e. if everyone took the same
position as you do, we'd easily get many "negligible" additions, all
of which would add up to something no longer negligible.

Jan

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-16 11:44                     ` Andrew Cooper
  2015-02-16 14:02                       ` Kai Huang
@ 2015-02-17 10:37                       ` Jan Beulich
  1 sibling, 0 replies; 54+ messages in thread
From: Jan Beulich @ 2015-02-17 10:37 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Kevin Tian, keir, tim, xen-devel, Kai Huang, Kai Huang

>>> On 16.02.15 at 12:44, <andrew.cooper3@citrix.com> wrote:
> On 14/02/15 03:01, Kai Huang wrote:
>> To me looks currently tracking vram (HAP) shatters all superpages,
>> instead only superpages in vram range would be. Am I misunderstanding
>> here?
> 
> You are completely correct.
> 
> Having just re-reviewed the HAP code, superpages are fully shattered as
> soon as logdirty mode is touched, which realistically means
> unconditionally, given that Qemu will always track guest VRAM.  (So much
> for the toolstack trying to optimise the guest by building memory using
> superpages; Qemu goes and causes Xen extra work by shattering them all.)

So which code is this due to? I ask because I can't seem to spot it
(p2m-ept.c uses ept_split_super_page() only when needed, not
unconditionally everywhere)...

Jan

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-13  2:28               ` Tian, Kevin
@ 2015-02-17 10:40                 ` Jan Beulich
  0 siblings, 0 replies; 54+ messages in thread
From: Jan Beulich @ 2015-02-17 10:40 UTC (permalink / raw)
  To: Kevin Tian; +Cc: Kai Huang, Andrew Cooper, Tim Deegan, keir, xen-devel

>>> On 13.02.15 at 03:28, <kevin.tian@intel.com> wrote:
>>  From: Tim Deegan [mailto:tim@xen.org]
>> Sent: Thursday, February 12, 2015 8:42 PM
>> 
>> At 07:08 +0000 on 12 Feb (1423721283), Tian, Kevin wrote:
>> > for general log dirty, ept_invalidate_emt is required because there is
>> > access permission change (dirtied page becomes rw after 1st fault,
>> > so need to change them back to ro again for the new dirty tracking
>> > round). But for PML, there's no permission change at all (always rw),
>> > so such behavior should be noted by general logdirty layer for better
>> > optimization.
>> 
>> AIUI the reason for calling ept_invalidate_emt() is to avoid having to
>> update a large number of EPTEs at once.  If you still need to update a
>> large number of EPTEs (to clear the Dirty bits), that has to me
>> preemptable, or else use ept_invalidate_emt().
>> 
>> Or have I misunderstood?
> 
> preemptable is fine, and we can judge whether the dirty set is large or
> not. My feeling is that replacing the simple D-bit cleanup with ept
> misconfig exits is not optimal. Jan explained that it's not strictly one
> misconfig exit for every D bit, since a whole L1 will be handled in a
> batch, but we need to have some understanding of the actual impact based
> on various workload patterns.

What alternatives do you see when dealing with a global run over
all entries?

Jan

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-17 10:19     ` Jan Beulich
@ 2015-02-17 11:57       ` Tim Deegan
  2015-03-11 10:59       ` George Dunlap
  2015-03-24  6:42       ` Kai Huang
  2 siblings, 0 replies; 54+ messages in thread
From: Tim Deegan @ 2015-02-17 11:57 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Kai Huang, Andrew Cooper, kevin.tian, keir, xen-devel

At 10:19 +0000 on 17 Feb (1424164775), Jan Beulich wrote:
> >>> On 12.02.15 at 03:39, <kai.huang@linux.intel.com> wrote:
> > On 02/11/2015 07:52 PM, Andrew Cooper wrote:
> >> On 11/02/15 08:28, Kai Huang wrote:
> >>> Currently, PML will be used as long as there's guest memory in dirty
> >>> logging mode, no matter globally or partially. And in case of partial
> >>> dirty logging, we need to check if the logged GPA in PML buffer is in
> >>> dirty logging range.
> >> I am not sure this is a problem.  HAP vram tracking already leaks
> >> non-vram frames into the dirty bitmap, caused by calls to
> >> paging_mark_dirty() from paths which are not caused by a p2m_logdirty fault.
> > Hmm. Seems right. Probably this also depends on how userspace uses the 
> > dirty bitmap.
> > 
> > If this is not a problem, we can avoid checking whether logged GPAs
> > are in logdirty ranges and unconditionally update them in the log-dirty
> > radix tree.
> > 
> > Jan, what's your comments here?
> 
> I agree with Andrew, but Tim's confirmation would be nice to have.

Yes, I agree (as I said in an earlier reply, but I understand that this
is a rambling thread).

Tim.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-17 10:23     ` Jan Beulich
@ 2015-03-01 23:13       ` Kai Huang
  0 siblings, 0 replies; 54+ messages in thread
From: Kai Huang @ 2015-03-01 23:13 UTC (permalink / raw)
  To: Jan Beulich; +Cc: andrew.cooper3, kevin.tian, keir, tim, xen-devel


On 02/17/2015 06:23 PM, Jan Beulich wrote:
>>>> On 12.02.15 at 03:49, <kai.huang@linux.intel.com> wrote:
>> On 02/11/2015 09:06 PM, Jan Beulich wrote:
>>>>>> On 11.02.15 at 09:28, <kai.huang@linux.intel.com> wrote:
>>>> - PML buffer flush
>>>>
>>>> There are two places where we need to flush the PML buffer. The first is
>>>> the PML buffer full VMEXIT handler (obviously), and the second is in
>>>> paging_log_dirty_op (either peek or clean): vcpus run asynchronously
>>>> while paging_log_dirty_op is called from userspace via hypercall, so
>>>> there may be dirty GPAs logged in vcpus' PML buffers without those
>>>> buffers being full. Therefore we had better flush all vcpus' PML
>>>> buffers before reporting dirty GPAs to userspace.
>>>>
>>>> We handle both cases by flushing the PML buffer at the beginning of all
>>>> VMEXITs. This solves the first case, and it also solves the second: prior
>>>> to paging_log_dirty_op, domain_pause is called, which kicks vcpus that
>>>> are in guest mode out of guest mode by sending them an IPI, which in
>>>> turn causes a VMEXIT.
>>>>
>>>> This also keeps the log-dirty radix tree more up to date, as the PML
>>>> buffer is flushed on all VMEXITs, not only on the PML buffer full VMEXIT.
>>> Is that really efficient? Flushing the buffer only as needed doesn't
>>> seem to be a major problem (apart from the usual preemption issue
>>> when dealing with guests with very many vCPU-s, but you certainly
>>> recall that at this point HVM is still limited to 128).
>>>
>>> Apart from these two remarks, the design looks okay to me.
>> While keeping the log-dirty radix tree more up to date is probably
>> irrelevant, I do think we had better flush PML buffers in
>> paging_log_dirty_op (both peek and clean) before reporting dirty pages to
>> userspace, in which case I think flushing the PML buffer at the beginning
>> of VMEXIT is a good idea, as domain_pause does the job automatically. I
>> am not sure how many cycles flushing the PML buffer will contribute, but
>> it should be small compared to the VMEXIT itself, and can therefore be
>> ignored.
> As far as my general thinking goes, this is the wrong attitude:
> _Anything_ added to a hot path like VMEXIT processing should be
> considered performance relevant. I.e. if everyone took the same
> position as you do, we'd easily get many "negligible" additions, all
> of which would add up to something no longer negligible.
Agreed. I'll do as Tim suggested: flush only on buffer-full VMEXITs, and 
handle the peek/clean path by explicitly reading all vcpus' PML buffers.
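
To make that concrete, a rough sketch of what reading back one vcpu's PML
buffer could look like; NR_PML_ENTRIES, GUEST_PML_INDEX, the pml_pg field
and paging_mark_gfn_dirty() are assumptions for illustration here, not
settled interfaces:

#define NR_PML_ENTRIES   512

static void vmx_vcpu_flush_pml_buffer(struct vcpu *v)
{
    uint64_t *pml_buf;
    unsigned long pml_idx;

    /* The index is only stable if the vcpu is paused or is current. */
    ASSERT( v == current || atomic_read(&v->pause_count) );

    vmx_vmcs_enter(v);

    __vmread(GUEST_PML_INDEX, &pml_idx);

    /* Index still at its initial value means nothing was logged. */
    if ( pml_idx == NR_PML_ENTRIES - 1 )
        goto out;

    pml_buf = __map_domain_page(v->arch.hvm_vmx.pml_pg);

    /*
     * The index decrements as entries are logged: either it wrapped below
     * zero (buffer full) or it points at the next free slot.
     */
    if ( pml_idx >= NR_PML_ENTRIES )
        pml_idx = 0;
    else
        pml_idx++;

    for ( ; pml_idx < NR_PML_ENTRIES; pml_idx++ )
    {
        unsigned long gfn = pml_buf[pml_idx] >> PAGE_SHIFT;

        /* Feed the logged GPA into the log-dirty radix tree. */
        paging_mark_gfn_dirty(v->domain, gfn);
    }

    unmap_domain_page(pml_buf);

    /* Reset the index so the buffer can be reused. */
    __vmwrite(GUEST_PML_INDEX, NR_PML_ENTRIES - 1);

 out:
    vmx_vmcs_exit(v);
}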

Thanks,
-Kai
>
> Jan
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-17 10:19     ` Jan Beulich
  2015-02-17 11:57       ` Tim Deegan
@ 2015-03-11 10:59       ` George Dunlap
  2015-03-11 11:11         ` Andrew Cooper
  2015-03-24  6:42       ` Kai Huang
  2 siblings, 1 reply; 54+ messages in thread
From: George Dunlap @ 2015-03-11 10:59 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Keir Fraser, Andrew Cooper, Tim Deegan, xen-devel,
	Kai Huang

On Tue, Feb 17, 2015 at 10:19 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 12.02.15 at 03:39, <kai.huang@linux.intel.com> wrote:
>> On 02/11/2015 07:52 PM, Andrew Cooper wrote:
>>> On 11/02/15 08:28, Kai Huang wrote:
>>>> Design
>>>> ======
>>>>
>>>> - PML feature is used globally
>>>>
>>>> A new Xen boot parameter, say 'opt_enable_pml', will be introduced to
>>>> control PML feature detection, and PML feature will only be detected
>>>> if opt_enable_pml = 1. Once PML feature is detected, it will be used
>>>> for dirty logging for all domains globally. Currently we don't support
>>>> to use PML on basis of per-domain as it will require additional
>>>> control from XL tool.
>>> Rather than adding in a new top level command line option for an ept
>>> subfeature, it would be preferable to add an "ept=" option which has
>>> "pml" as a sub boolean.
>> Which is good to me, if Jan agrees.
>>
>> Jan, which do you prefer here?
>
> A single "ept=" option as Andrew suggested.

Sorry to be coming late to this party -- what's the logic behind
having this enabled with "ept="? You're not changing anything about
how EPT itself works; you're adding a secondary feature which happens
to depend on ept.  Is there another hypervisor command-line option you
had in mind that works this way?

It might also be nice to be able to enable or disable this feature
with a sysctl call; but that's just a nice-to-have.

 -George

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-03-11 10:59       ` George Dunlap
@ 2015-03-11 11:11         ` Andrew Cooper
  2015-03-11 15:53           ` George Dunlap
  0 siblings, 1 reply; 54+ messages in thread
From: Andrew Cooper @ 2015-03-11 11:11 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich
  Cc: Kai Huang, Keir Fraser, Tian, Kevin, Tim Deegan, xen-devel

On 11/03/15 10:59, George Dunlap wrote:
> On Tue, Feb 17, 2015 at 10:19 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 12.02.15 at 03:39, <kai.huang@linux.intel.com> wrote:
>>> On 02/11/2015 07:52 PM, Andrew Cooper wrote:
>>>> On 11/02/15 08:28, Kai Huang wrote:
>>>>> Design
>>>>> ======
>>>>>
>>>>> - PML feature is used globally
>>>>>
>>>>> A new Xen boot parameter, say 'opt_enable_pml', will be introduced to
>>>>> control PML feature detection, and PML feature will only be detected
>>>>> if opt_enable_pml = 1. Once PML feature is detected, it will be used
>>>>> for dirty logging for all domains globally. Currently we don't support
>>>>> to use PML on basis of per-domain as it will require additional
>>>>> control from XL tool.
>>>> Rather than adding in a new top level command line option for an ept
>>>> subfeature, it would be preferable to add an "ept=" option which has
>>>> "pml" as a sub boolean.
>>> Which is good to me, if Jan agrees.
>>>
>>> Jan, which do you prefer here?
>> A single "ept=" option as Andrew suggested.
> Sorry to be coming late to this party -- what's the logic behind
> having this enabled with "ept="? You're not changing anything about
> how EPT itself works; you're adding a secondary feature which happens
> to depend on ept.  Is there another hypervisor command-line option you
> had in mind that works this way?

iommu=

>
> It might also be nice to be able to enable or disable this feature
> with a sysctl call; but that's just a nice-to-have.

This feature should either be used or not.  It is impractical to
enable/disable at runtime.

However, it absolutely wants a knob for tweaking.  The likely
consequence of a bug in the implementation is VM memory corruption on
migrate, and you can get away with missing a large amount of a domain's
memory before it blows up noticeably.

~Andrew

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-03-11 11:11         ` Andrew Cooper
@ 2015-03-11 15:53           ` George Dunlap
  2015-03-12  7:36             ` Kai Huang
  0 siblings, 1 reply; 54+ messages in thread
From: George Dunlap @ 2015-03-11 15:53 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Tian, Kevin, Keir Fraser, Tim Deegan, xen-devel, Kai Huang, Jan Beulich

On Wed, Mar 11, 2015 at 11:11 AM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 11/03/15 10:59, George Dunlap wrote:
>> On Tue, Feb 17, 2015 at 10:19 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 12.02.15 at 03:39, <kai.huang@linux.intel.com> wrote:
>>>> On 02/11/2015 07:52 PM, Andrew Cooper wrote:
>>>>> On 11/02/15 08:28, Kai Huang wrote:
>>>>>> Design
>>>>>> ======
>>>>>>
>>>>>> - PML feature is used globally
>>>>>>
>>>>>> A new Xen boot parameter, say 'opt_enable_pml', will be introduced to
>>>>>> control PML feature detection, and PML feature will only be detected
>>>>>> if opt_enable_pml = 1. Once PML feature is detected, it will be used
>>>>>> for dirty logging for all domains globally. Currently we don't support
>>>>>> to use PML on basis of per-domain as it will require additional
>>>>>> control from XL tool.
>>>>> Rather than adding in a new top level command line option for an ept
>>>>> subfeature, it would be preferable to add an "ept=" option which has
>>>>> "pml" as a sub boolean.
>>>> Which is good to me, if Jan agrees.
>>>>
>>>> Jan, which do you prefer here?
>>> A single "ept=" option as Andrew suggested.
>> Sorry to be coming late to this party -- what's the logic behind
>> having this enabled with "ept="? You're not changing anything about
>> how EPT itself works; you're adding a secondary feature which happens
>> to depend on ept.  Is there another hypervisor command-line option you
>> had in mind that works this way?
>
> iommu=

Every option in iommu actually changes something about the way the
IOMMU actually works.  Analogous options for ept might be enabling /
disabling, setting a maximum entry size (1G, 2M, 4k), enabling the
no-execute bit, &c.

AFAICT PML is a completely separate functionality.  Enabling it as an
option behind ept would be like saying you should enable BTS
(branch-trace-store) behind an option called "mmu=", since the
addresses in the BTS go through the MMU.

That's my $0.02 anyway...

>
>>
>> It might also be nice to be able to enable or disable this feature
>> with a sysctl call; but that's just a nice-to-have.
>
> This feature should either be used or not.  It is impractical to
> enable/disable at runtime.
>
> However, it absolutely wants a knob for tweaking.  The likely
> consequence of a bug in the implementation is VM memory corruption on
> migrate, and you can get away with missing a large amount of a domain's
> memory before it blows up noticeably.

Those paragraphs sound to me like they say the opposite things.

And in any case, it's being enabled and disabled for particular
domains when they enable or disable logdirty mode, right?  It
shouldn't be hard at all to just fall back to the non-PML case if it's
been disabled.

Handling the case of enabling or disabling PML on domains that are
already in logdirty mode is, I agree, probably more trouble than it's
worth.  We can just document it to say that it will only have an
effect on domains that start logdirty in the future.

 -George

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-03-11 15:53           ` George Dunlap
@ 2015-03-12  7:36             ` Kai Huang
  2015-03-12 11:19               ` Andrew Cooper
  0 siblings, 1 reply; 54+ messages in thread
From: Kai Huang @ 2015-03-12  7:36 UTC (permalink / raw)
  To: George Dunlap, Andrew Cooper
  Cc: Keir Fraser, Tian, Kevin, Tim Deegan, Jan Beulich, xen-devel



On 03/11/2015 11:53 PM, George Dunlap wrote:
> On Wed, Mar 11, 2015 at 11:11 AM, Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>> On 11/03/15 10:59, George Dunlap wrote:
>>> On Tue, Feb 17, 2015 at 10:19 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>> On 12.02.15 at 03:39, <kai.huang@linux.intel.com> wrote:
>>>>> On 02/11/2015 07:52 PM, Andrew Cooper wrote:
>>>>>> On 11/02/15 08:28, Kai Huang wrote:
>>>>>>> Design
>>>>>>> ======
>>>>>>>
>>>>>>> - PML feature is used globally
>>>>>>>
>>>>>>> A new Xen boot parameter, say 'opt_enable_pml', will be introduced to
>>>>>>> control PML feature detection, and PML feature will only be detected
>>>>>>> if opt_enable_pml = 1. Once PML feature is detected, it will be used
>>>>>>> for dirty logging for all domains globally. Currently we don't support
>>>>>>> to use PML on basis of per-domain as it will require additional
>>>>>>> control from XL tool.
>>>>>> Rather than adding in a new top level command line option for an ept
>>>>>> subfeature, it would be preferable to add an "ept=" option which has
>>>>>> "pml" as a sub boolean.
>>>>> Which is good to me, if Jan agrees.
>>>>>
>>>>> Jan, which do you prefer here?
>>>> A single "ept=" option as Andrew suggested.
>>> Sorry to be coming late to this party -- what's the logic behind
>>> having this enabled with "ept="? You're not changing anything about
>>> how EPT itself works; you're adding a secondary feature which happens
>>> to depend on ept.  Is there another hypervisor command-line option you
>>> had in mind that works this way?
>> iommu=
> Every option in iommu actually changes something about the way the
> IOMMU actually works.  Analogous options for ept might be enabling /
> disabling, setting a maximum entry size (1G, 2M, 4k), enabling the
> no-execute bit, &c.
>
> AFAICT PML is a completely separate functionality.

Indeed, it doesn't impact the functionality of the existing EPT mechanism, 
though it depends on the EPT mechanism to work (it logs on D-bit changes 
from 0 to 1).
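
(For illustration, the kind of EPT entry treatment this implies, with
made-up field/helper names -- a page entering log-dirty mode stays
writable and only has its D-bit cleared, so that the hardware's 0 -> 1
D-bit change is what gets logged:)

    /* Hypothetical fragment of EPT entry setup for p2m_ram_logdirty. */
    if ( vmx_domain_pml_enabled(d) )
    {
        entry->r = entry->w = entry->x = 1;  /* keep the page writable */
        entry->d = 0;                        /* log on the first write */
    }
    else
        entry->w = 0;                        /* classic write protection */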

> Enabling it as an
> option behind ept would be like saying you should enable BTS
> (branch-trace-store) behind an option called "mmu=", since the
> addresses in the BTS go through the MMU.
>
> That's my $0.02 anyway...
To me a single opt_pml_enabled boolean parameter is OK, but I will keep 
using "ept=pml" as Jan/Andrew/Tim agreed, unless any of them opposes it.

>
>>> It might also be nice to be able to enable or disable this feature
>>> with a sysctl call; but that's just a nice-to-have.
>> This feature should either be used or not.  It is impractical to
>> enable/disable at runtime.
>>
>> However, it absolutely wants a knob for tweaking.  The likely
>> consequence of a bug in the implementation is VM memory corruption on
>> migrate, and you can get away with missing a large amount of a domain's
>> memory before it blows up noticeably.
> Those paragraphs sound to me like they say the opposite things.
>
> And in any case, it's being enabled and disabled for particular
> domains when they enable or disable logdirty mode, right?  It
> shouldn't be hard at all to just fall back to the non-PML case if it's
> been disabled.
>
> Handling the case of enabling or disabling PML on domains that are
> already in logdirty mode is, I agree, probably more trouble than it's
> worth.  We can just document it to say that it will only have an
> effect on domains that start logdirty in the future.
Currently I only plan to support controlling PML via the boot parameter, 
but I can certainly add a sysctl call to enable/disable PML dynamically 
if you think it becomes necessary in the future; that would be a separate 
nice-to-have feature and won't impact existing PML functionality.

Does this sound good to all of you?

Thanks,
-Kai
>
>   -George

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-03-12  7:36             ` Kai Huang
@ 2015-03-12 11:19               ` Andrew Cooper
  2015-03-14  3:04                 ` Kai Huang
  0 siblings, 1 reply; 54+ messages in thread
From: Andrew Cooper @ 2015-03-12 11:19 UTC (permalink / raw)
  To: Kai Huang, George Dunlap
  Cc: Keir Fraser, Tian, Kevin, Tim Deegan, Jan Beulich, xen-devel

On 12/03/15 07:36, Kai Huang wrote:
>
>>>> It might also be nice to be able to enable or disable this feature
>>>> with a sysctl call; but that's just a nice-to-have.
>>> This feature should either be used or not.  It is impractical to
>>> enable/disable at runtime.
>>>
>>> However, it absolutely wants a knob for tweaking.  The likely
>>> consequence of a bug in the implementation is VM memory corruption on
>>> migrate, and you can get away with missing a large amount of a domain's
>>> memory before it blows up noticeably.
>> Those paragraphs sound to me like they say the opposite things.
>>
>> And in any case, it's being enabled and disabled for particular
>> domains when they enable or disable logdirty mode, right?  It
>> shouldn't be hard at all to just fall back to the non-PML case if it's
>> been disabled.
>>
>> Handling the case of enabling or disabling PML on domains that are
>> already in logdirty mode is, I agree, probably more trouble than it's
>> worth.  We can just document it to say that it will only have an
>> effect on domains that start logdirty in the future.
> Currently I only plan to support controlling PML via the boot parameter,
> but I can certainly add a sysctl call to enable/disable PML dynamically
> if you think it becomes necessary in the future; that would be a separate
> nice-to-have feature and won't impact existing PML functionality.
>
> Does this sound good to all of you?

I do not think a runtime switch will be useful.  The boot parameter is
useful for development, and for debugging in the case that something goes
wrong, but as soon as the feature is stable I expect no one to ever tweak
the parameter again.

I certainly wouldn't focus on implementing something like this for v1. 
If a usecase develops in the future then we can certainly reconsider.

~Andrew

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-03-12 11:19               ` Andrew Cooper
@ 2015-03-14  3:04                 ` Kai Huang
  0 siblings, 0 replies; 54+ messages in thread
From: Kai Huang @ 2015-03-14  3:04 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Tian, Kevin, Keir Fraser, George Dunlap, Tim Deegan, xen-devel,
	Kai Huang, Jan Beulich

On Thu, Mar 12, 2015 at 7:19 PM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 12/03/15 07:36, Kai Huang wrote:
>>
>>>>> It might also be nice to be able to enable or disable this feature
>>>>> with a sysctl call; but that's just a nice-to-have.
>>>> This feature should either be used or not.  It is impractical to
>>>> enable/disable at runtime.
>>>>
>>>> However, it absolutely wants a knob for tweaking.  The likely
>>>> consequence of a bug in the implementation is VM memory corruption on
>>>> migrate, and you can get away with missing a large amount of a domain's
>>>> memory before it blows up noticeably.
>>> Those paragraphs sound to me like they say the opposite things.
>>>
>>> And in any case, it's being enabled and disabled for particular
>>> domains when they enable or disable logdirty mode, right?  It
>>> shouldn't be hard at all to just fall back to the non-PML case if it's
>>> been disabled.
>>>
>>> Handling the case of enabling or disabling PML on domains that are
>>> already in logdirty mode is, I agree, probably more trouble than it's
>>> worth.  We can just document it to say that it will only have an
>>> effect on domains that start logdirty in the future.
>> Currently I only plan to support controlling PML via the boot parameter,
>> but I can certainly add a sysctl call to enable/disable PML dynamically
>> if you think it becomes necessary in the future; that would be a separate
>> nice-to-have feature and won't impact existing PML functionality.
>>
>> Does this sound good to all of you?
>
> I do not think a runtime switch will be useful.  The boot parameter is
> useful for development, and for debugging in the case that something goes
> wrong, but as soon as the feature is stable I expect no one to ever tweak
> the parameter again.
>
> I certainly wouldn't focus on implementing something like this for v1.
> If a usecase develops in the future then we can certainly reconsider.

Agreed.

Thanks,
-Kai

>
> ~Andrew
>



-- 
Thanks,
-Kai

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-02-17 10:19     ` Jan Beulich
  2015-02-17 11:57       ` Tim Deegan
  2015-03-11 10:59       ` George Dunlap
@ 2015-03-24  6:42       ` Kai Huang
  2015-03-24  7:53         ` Jan Beulich
  2 siblings, 1 reply; 54+ messages in thread
From: Kai Huang @ 2015-03-24  6:42 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, kevin.tian, keir, tim, xen-devel



On 02/17/2015 06:19 PM, Jan Beulich wrote:
>>>> On 12.02.15 at 03:39, <kai.huang@linux.intel.com> wrote:
>> On 02/11/2015 07:52 PM, Andrew Cooper wrote:
>>> On 11/02/15 08:28, Kai Huang wrote:
>>>> Design
>>>> ======
>>>>
>>>> - PML feature is used globally
>>>>
>>>> A new Xen boot parameter, say 'opt_enable_pml', will be introduced to
>>>> control PML feature detection, and PML feature will only be detected
>>>> if opt_enable_pml = 1. Once PML feature is detected, it will be used
>>>> for dirty logging for all domains globally. Currently we don't support
>>>> to use PML on basis of per-domain as it will require additional
>>>> control from XL tool.
>>> Rather than adding in a new top level command line option for an ept
>>> subfeature, it would be preferable to add an "ept=" option which has
>>> "pml" as a sub boolean.
>> Which is good to me, if Jan agrees.
>>
>> Jan, which do you prefer here?
> A single "ept=" option as Andrew suggested.
Hi Andrew, Jan, Tim,

Sorry to bring this thread back.

Regarding the parameter to control PML, I plan to enable PML by default, 
in which case would an "ept=no-pml" be more reasonable for disabling it 
manually?

Actually, following the "iommu=" parameter, I would like to make the 
changes below. Do they look good to you?

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index e895e6b..091335f 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -53,6 +53,17 @@ boolean_param("apicv", opt_apicv_enabled);

+static bool_t __read_mostly pml_enable = 1;
+
+/*
+ * The 'ept' parameter controls functionalities that depend on, or impact,
+ * the EPT mechanism. Optional comma separated value may contain:
+ *
+ *  no-pml              Disable PML
+ */
+/* Copied from parse_iommu_param */
+static void __init parse_ept_param(char *s)
+{
+    char *ss;
+    int val;
+
+    do {
+        val = !!strncmp(s, "no-", 3);
+        if ( !val )
+            s += 3;
+
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        if ( !strcmp(s, "pml") )
+            pml_enable = val;
+
+        s = ss + 1;
+    } while ( ss );
+}
+custom_param("ept", parse_ept_param);
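
With pml_enable defaulting to 1 as above, disabling PML at boot would then
just be a matter of appending the option to the Xen command line, e.g.
(the image path is only an example):

    multiboot /boot/xen.gz ... ept=no-pml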

Thanks,
-Kai
>
>>>> Currently, PML will be used as long as there's guest memory in dirty
>>>> logging mode, whether globally or partially. And in the case of partial
>>>> dirty logging, we need to check whether the logged GPA in the PML
>>>> buffer is in a dirty logging range.
>>> I am not sure this is a problem.  HAP vram tracking already leaks
>>> non-vram frames into the dirty bitmap, caused by calls to
>>> paging_mark_dirty() from paths which are not caused by a p2m_logdirty fault.
>> Hmm. Seems right. Probably this also depends on how userspace uses the
>> dirty bitmap.
>>
>> If this is not a problem, we can avoid checking whether logged GPAs are
>> in logdirty ranges and unconditionally add them to the log-dirty radix
>> tree.
>>
>> Jan, what are your comments here?
> I agree with Andrew, but Tim's confirmation would be nice to have.
>
> Jan
>

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-03-24  6:42       ` Kai Huang
@ 2015-03-24  7:53         ` Jan Beulich
  2015-03-24  8:06           ` Kai Huang
  0 siblings, 1 reply; 54+ messages in thread
From: Jan Beulich @ 2015-03-24  7:53 UTC (permalink / raw)
  To: Kai Huang; +Cc: Andrew Cooper, kevin.tian, tim, keir, xen-devel

>>> On 24.03.15 at 07:42, <kai.huang@linux.intel.com> wrote:

> 
> On 02/17/2015 06:19 PM, Jan Beulich wrote:
>>>>> On 12.02.15 at 03:39, <kai.huang@linux.intel.com> wrote:
>>> On 02/11/2015 07:52 PM, Andrew Cooper wrote:
>>>> On 11/02/15 08:28, Kai Huang wrote:
>>>>> Design
>>>>> ======
>>>>>
>>>>> - PML feature is used globally
>>>>>
>>>>> A new Xen boot parameter, say 'opt_enable_pml', will be introduced to
>>>>> control PML feature detection, and PML feature will only be detected
>>>>> if opt_enable_pml = 1. Once PML feature is detected, it will be used
>>>>> for dirty logging for all domains globally. Currently we don't support
>>>>> to use PML on basis of per-domain as it will require additional
>>>>> control from XL tool.
>>>> Rather than adding in a new top level command line option for an ept
>>>> subfeature, it would be preferable to add an "ept=" option which has
>>>> "pml" as a sub boolean.
>>> Which is good to me, if Jan agrees.
>>>
>>> Jan, which do you prefer here?
>> A single "ept=" option as Andrew suggested.
> Hi Andrew, Jan, Tim,
> 
> Sorry to bring this thread back.
> 
> Regarding the parameter to control PML, I plan to enable PML by default,
> in which case would an "ept=no-pml" be more reasonable for disabling it
> manually?

Imo the default should be off at least initially. The command line
option parsing is (and should be) independent of the chosen
default anyway, i.e. overrides in either direction should be
possible.

> Actually, following the "iommu=" parameter, I would like to make the
> changes below. Do they look good to you?

Looks okay.

Jan

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-03-24  7:53         ` Jan Beulich
@ 2015-03-24  8:06           ` Kai Huang
  2015-03-24  8:14             ` Jan Beulich
  0 siblings, 1 reply; 54+ messages in thread
From: Kai Huang @ 2015-03-24  8:06 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, kevin.tian, keir, tim, xen-devel



On 03/24/2015 03:53 PM, Jan Beulich wrote:
>>>> On 24.03.15 at 07:42, <kai.huang@linux.intel.com> wrote:
>> On 02/17/2015 06:19 PM, Jan Beulich wrote:
>>>>>> On 12.02.15 at 03:39, <kai.huang@linux.intel.com> wrote:
>>>> On 02/11/2015 07:52 PM, Andrew Cooper wrote:
>>>>> On 11/02/15 08:28, Kai Huang wrote:
>>>>>> Design
>>>>>> ======
>>>>>>
>>>>>> - PML feature is used globally
>>>>>>
>>>>>> A new Xen boot parameter, say 'opt_enable_pml', will be introduced to
>>>>>> control PML feature detection, and PML feature will only be detected
>>>>>> if opt_enable_pml = 1. Once PML feature is detected, it will be used
>>>>>> for dirty logging for all domains globally. Currently we don't support
>>>>>> to use PML on basis of per-domain as it will require additional
>>>>>> control from XL tool.
>>>>> Rather than adding in a new top level command line option for an ept
>>>>> subfeature, it would be preferable to add an "ept=" option which has
>>>>> "pml" as a sub boolean.
>>>> Which is good to me, if Jan agrees.
>>>>
>>>> Jan, which do you prefer here?
>>> A single "ept=" option as Andrew suggested.
>> Hi Andrew, Jan, Tim,
>>
>> Sorry to bring this thread back.
>>
>> Regarding the parameter to control PML, I plan to enable PML by default,
>> in which case would an "ept=no-pml" be more reasonable for disabling it
>> manually?
> Imo the default should be off at least initially.
OK.
>   The command line
> option parsing is (and should be) independent of the chosen
> default anyway, i.e. overrides in either direction should be
> possible.
While the parse_ept_param function supports both "ept=pml" and 
"ept=no-pml", I think the function's comment should explicitly say whether 
to use "ept=pml" (in case PML is off by default) or "ept=no-pml" (in case 
PML is on by default); otherwise "ept=pml,no-pml" is legal but obviously 
doesn't make any sense (and it looks like this issue also exists in 
parse_iommu_param?).

Thanks,
-Kai
>
>> Actually, following the "iommu=" parameter, I would like to make the
>> changes below. Do they look good to you?
> Looks okay.
>
> Jan
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-03-24  8:06           ` Kai Huang
@ 2015-03-24  8:14             ` Jan Beulich
  2015-03-24  8:17               ` Kai Huang
  0 siblings, 1 reply; 54+ messages in thread
From: Jan Beulich @ 2015-03-24  8:14 UTC (permalink / raw)
  To: Kai Huang; +Cc: Andrew Cooper, kevin.tian, tim, keir, xen-devel

>>> On 24.03.15 at 09:06, <kai.huang@linux.intel.com> wrote:
> On 03/24/2015 03:53 PM, Jan Beulich wrote:
>>   The command line
>> option parsing is (and should be) independent of the chosen
>> default anyway, i.e. overrides in either direction should be
>> possible.
> While the parse_ept_param function supports both "ept=pml" and
> "ept=no-pml", I think the function's comment should explicitly say whether
> to use "ept=pml" (in case PML is off by default) or "ept=no-pml" (in case
> PML is on by default); otherwise "ept=pml,no-pml" is legal but obviously
> doesn't make any sense (and it looks like this issue also exists in
> parse_iommu_param?).

While "ept=pml,no-pml" makes little sense, there's nothing wrong
with allowing it. "ept=pml ept=no-pml" may in fact make sense,
when wanting to override a setting e.g. in an EFI config file on
the command (or grub) line. IOW, don't waste time on preventing nonsense
option combinations if the resulting settings are nevertheless valid /
meaningful.
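
Concretely, with the sketched parse_ept_param (and pml_enable defaulting
to 1):

    ept=pml,no-pml       "pml" sets pml_enable = 1, then "no-pml" resets
                         it to 0 -- the last token wins;
    ept=pml ept=no-pml   the parser runs once per occurrence, so the later
                         occurrence likewise leaves pml_enable = 0.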

Jan

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: PML (Page Modification Logging) design for Xen
  2015-03-24  8:14             ` Jan Beulich
@ 2015-03-24  8:17               ` Kai Huang
  0 siblings, 0 replies; 54+ messages in thread
From: Kai Huang @ 2015-03-24  8:17 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, kevin.tian, tim, keir, xen-devel



On 03/24/2015 04:14 PM, Jan Beulich wrote:
>>>> On 24.03.15 at 09:06, <kai.huang@linux.intel.com> wrote:
>> On 03/24/2015 03:53 PM, Jan Beulich wrote:
>>>    The command line
>>> option parsing is (and should be) independent of the chosen
>>> default anyway, i.e. overrides in either direction should be
>>> possible.
>> While the parse_ept_param function supports both "ept=pml" and
>> "ept=no-pml", I think the function's comment should explicitly say whether
>> to use "ept=pml" (in case PML is off by default) or "ept=no-pml" (in case
>> PML is on by default); otherwise "ept=pml,no-pml" is legal but obviously
>> doesn't make any sense (and it looks like this issue also exists in
>> parse_iommu_param?).
> While "ept=pml,no-pml" makes little sense, there's nothing wrong
> with allowing it. "ept=pml ept=no-pml" may in fact make sense,
> when wanting to override a setting e.g. in an EFI config file on
> the command (or grub) line. IOW, don't waste time on preventing nonsense
> option combinations if the resulting settings are nevertheless valid /
> meaningful.
Hmm. Reasonable indeed. Thanks.

Thanks,
-Kai
>
> Jan
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2015-03-24  8:17 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-11  8:28 PML (Page Modification Logging) design for Xen Kai Huang
2015-02-11 11:52 ` Andrew Cooper
2015-02-11 13:13   ` Jan Beulich
2015-02-11 16:33     ` Andrew Cooper
2015-02-11 16:55       ` Jan Beulich
2015-02-12  2:35     ` Kai Huang
2015-02-12  6:25       ` Tian, Kevin
2015-02-12  6:45         ` Kai Huang
2015-02-12  7:08           ` Tian, Kevin
2015-02-12  7:34             ` Kai Huang
2015-02-12 12:42             ` Tim Deegan
2015-02-13  2:15               ` Kai Huang
2015-02-13  2:28               ` Tian, Kevin
2015-02-17 10:40                 ` Jan Beulich
2015-02-12  2:39   ` Kai Huang
2015-02-12  6:54     ` Tian, Kevin
2015-02-12  6:56       ` Kai Huang
2015-02-12  7:09         ` Tian, Kevin
2015-02-12  7:15           ` Kai Huang
2015-02-12 14:10       ` Andrew Cooper
2015-02-13  0:58         ` Bing
2015-02-13  2:11         ` Kai Huang
2015-02-13 10:57           ` Andrew Cooper
2015-02-13 14:32             ` Kai Huang
2015-02-13 15:28               ` Andrew Cooper
2015-02-13 15:52                 ` Kai Huang
2015-02-14  3:01                   ` Kai Huang
2015-02-16 11:44                     ` Andrew Cooper
2015-02-16 14:02                       ` Kai Huang
2015-02-17 10:37                       ` Jan Beulich
2015-02-17 10:19     ` Jan Beulich
2015-02-17 11:57       ` Tim Deegan
2015-03-11 10:59       ` George Dunlap
2015-03-11 11:11         ` Andrew Cooper
2015-03-11 15:53           ` George Dunlap
2015-03-12  7:36             ` Kai Huang
2015-03-12 11:19               ` Andrew Cooper
2015-03-14  3:04                 ` Kai Huang
2015-03-24  6:42       ` Kai Huang
2015-03-24  7:53         ` Jan Beulich
2015-03-24  8:06           ` Kai Huang
2015-03-24  8:14             ` Jan Beulich
2015-03-24  8:17               ` Kai Huang
2015-02-11 13:06 ` Jan Beulich
2015-02-12  2:49   ` Kai Huang
2015-02-12  5:16     ` Kai Huang
2015-02-12  7:02     ` Tian, Kevin
2015-02-12  7:04       ` Kai Huang
2015-02-17 10:23     ` Jan Beulich
2015-03-01 23:13       ` Kai Huang
2015-02-12 12:34 ` Tim Deegan
2015-02-13  2:50   ` Kai Huang
2015-02-16 14:01     ` Kai Huang
2015-02-16 18:19       ` Tim Deegan
