* [PATCH 00/10] PML (Paging Modification Logging) support
@ 2015-03-27  2:35 Kai Huang
  2015-03-27  2:35 ` [PATCH 01/10] VMX: Enable EPT A/D bit support Kai Huang
                   ` (10 more replies)
  0 siblings, 11 replies; 65+ messages in thread
From: Kai Huang @ 2015-03-27  2:35 UTC (permalink / raw)
  To: jbeulich, andrew.cooper3, tim, kevin.tian, yang.z.zhang, xen-devel
  Cc: Kai Huang

Hi all,

This patch series adds PML support to Xen. Please kindly help to review it.

The patches are organized as follows:

patch 1:
    enable EPT A/D bit support, which PML depends on
patches 2 to 8:
    changes in VMX to enable PML
patch 9:
    change log-dirty common code to support PML
patch 10:
    change p2m-ept.c to support PML

The design has already been discussed in the thread below:

http://lists.xen.org/archives/html/xen-devel/2015-02/msg01305.html

For quick reference, the changes from the previous design (made according to
your comments) are briefly listed here; the complete updated design follows.

- Instead of flushing the PML buffer at the beginning of every VMEXIT, we now
explicitly flush all vcpus' PML buffers in the peek/clear log-dirty ops,
before reporting dirty pages to userspace.

- Handle vcpu creation when the domain is already in log-dirty mode, as
suggested by Tim.

- When flushing the PML buffer, instead of calling paging_mark_dirty
unconditionally for all logged dirty pages, we call p2m_change_type_one to
change the guest page from log-dirty mode back to normal mode, and mark the
page dirty only when that type change succeeds. This is required by video RAM
tracking, and is the reasonable thing to do anyway.

I also did some performance measurements, which are included after the design
for your reference.

====================== The Design ======================================

PML (Page Modification Logging) is a new feature on Intel's Broadwell server
platform, targeted at reducing the overhead of the dirty-logging mechanism.
This patch series adds PML support to Xen.

Background
==========

Currently, dirty logging is done via write protection: guest memory we want to
log is set read-only, so the guest's first write to that memory triggers a
write fault (an EPT violation when EPT is used), in which we log the dirty GFN
and set the W permission bit again. This mechanism works, but at the cost of
one write fault for the first write to each guest page.

PML Introduction
================

PML is a hardware-assisted, efficient way to do dirty logging, built on the
EPT mechanism. Briefly, PML automatically logs dirty GPAs to a 4K PML buffer
when the CPU changes an EPT entry's D-bit from 0 to 1. To support this, a PML
buffer base address (a machine address), a PML index, and a new PML-buffer-full
VMEXIT were added to the VMCS. Initially the PML index is set to 511 (the
buffer holds 512 entries of 8 bytes each) to indicate the buffer is empty, and
the CPU decreases the PML index by 1 after logging each GPA. Before logging a
GPA, the CPU checks the PML index to see whether the buffer is full; if so, a
PML-buffer-full VMEXIT happens, and the VMM should flush the logged GPAs (into
the data structure that keeps dirty pages) and reset the PML index so that
further GPAs can be logged.

The specification of PML can be found at:
http://www.intel.com/content/www/us/en/processors/page-modification-logging-vmm-white-paper.html
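
As a rough sketch, the VMM-side flush therefore looks like the following (this
mirrors vmx_vcpu_flush_pml_buffer added in patch 5; process_dirty_gpa is a
hypothetical placeholder for the per-GPA handling described later in this
design):

    /* Sketch only: run with the vcpu's VMCS loaded (vmx_vmcs_enter/exit) */
    unsigned long pml_idx;
    uint64_t *pml_buf;

    __vmread(GUEST_PML_INDEX, &pml_idx);
    if ( pml_idx != PML_ENTITY_NUM - 1 )        /* 511 means buffer is empty */
    {
        pml_buf = map_domain_page(page_to_mfn(v->arch.hvm_vmx.pml_pg));
        /* 2^16-1 means full; otherwise the index points at the next free slot */
        pml_idx = (pml_idx >= PML_ENTITY_NUM) ? 0 : pml_idx + 1;
        for ( ; pml_idx < PML_ENTITY_NUM; pml_idx++ )
            process_dirty_gpa(v->domain, pml_buf[pml_idx]);
        unmap_domain_page(pml_buf);
        __vmwrite(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);   /* reset to empty */
    }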

With PML, we don't have to use write protection but just clear D-bit of EPT
entry of guest memory to do dirty logging, with an additional PML buffer full
VMEXIT for 512 dirty GPAs. Theoretically, this can reduce hypervisor overhead
when guest is in dirty logging mode, and more CPU cycles can be allocated to
guest, therefore it's expected benchmarks in guest will have better performance
comparing to non-PML.

Design
======

- PML feature is used globally

A new Xen boot parameter is added to control PML feature detection. Once the
PML feature is detected, it is used for dirty logging for all domains
globally. We currently don't support enabling PML on a per-domain basis, as
that would require additional control from the xl tool.

The new parameter is a top-level parameter 'ept=<options>' with a sub-boolean
'pml_enable'. PML is disabled by default, and 'ept=pml' enables it.
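
For example, PML detection can be enabled on the Xen command line (a
hypothetical GRUB entry; the exact layout depends on the boot loader and on
any other Xen options in use):

    multiboot /boot/xen.gz ept=pml

while 'ept=no-pml' explicitly selects the default (disabled).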

- PML enable/disable for a particular domain

Enabling PML means (see the sketch below this list):
    - allocate PML buffer, set PML base address
    - initialize PML index to 511
    - turn on PML in VMCS
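
In VMCS terms, enabling PML for one vcpu boils down to roughly the following
(condensed from vmx_vcpu_enable_pml in patch 5, with d == v->domain; error
handling omitted):

    v->arch.hvm_vmx.pml_pg = d->arch.paging.alloc_page(d);  /* the 4K buffer */
    vmx_vmcs_enter(v);
    __vmwrite(PML_ADDRESS, page_to_mfn(v->arch.hvm_vmx.pml_pg) << PAGE_SHIFT);
    __vmwrite(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);         /* 511: empty */
    v->arch.hvm_vmx.secondary_exec_control |= SECONDARY_EXEC_ENABLE_PML;
    __vmwrite(SECONDARY_VM_EXEC_CONTROL,
              v->arch.hvm_vmx.secondary_exec_control);
    vmx_vmcs_exit(v);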

PML must be enabled for all vcpus of a domain at once: the PML buffer and PML
index are per-vcpu, but the EPT table is shared by all vcpus, so enabling PML
on only some vcpus of a domain won't work.

PML is enabled for the domain when it is switched to log-dirty mode, and
conversely is disabled when the domain is switched back to normal mode.

As Tim Deegan commented, there have been cases where VMs are put into
log-dirty mode before their vcpus are assigned, so we also need to handle
enabling PML for new vcpus: when creating a vcpu, we check whether the domain
is already in log-dirty mode, and if so we enable PML for the new vcpu.
Failure to enable PML results in failure of vcpu creation.

After PML is enabled for the domain, we only need to clear the D-bit of the
EPT entries of guest memory in order to log dirty pages, instead of setting
the EPT entries read-only. For partial log-dirty, when PML is used we also
manually set the D-bit for guest memory that is in normal mode, to avoid
unnecessary GPA logging, as PML operates globally on the entire EPT table.

However, for superpages we still use write protection even with PML, as we
still need to split a superpage into 4K pages in log-dirty mode.
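
In EPT-entry terms, the log-dirty case then becomes roughly the following
(condensed from the ept_p2m_type_to_flags changes in patch 10):

    case p2m_ram_logdirty:
        entry->r = entry->x = 1;
        if ( vmx_domain_pml_enabled(p2m->domain) && !is_epte_superpage(entry) )
        {
            entry->w = 1;   /* keep the 4K page writable ... */
            entry->d = 0;   /* ... and log its first write via the D-bit */
        }
        else
            entry->w = 0;   /* write-protect: non-PML path, or superpage split */
        break;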

- PML buffer flush

PML buffer flush means:
    - read out the PML index
    - go through all logged dirty GPAs, and update them in the log-dirty radix tree
    - reset the PML index to 511

There are two places where we need to flush the PML buffer. The first is,
obviously, the PML-buffer-full VMEXIT handler. The second is
paging_log_dirty_op (both peek and clean): vcpus run asynchronously while
paging_log_dirty_op is called from userspace via hypercall, so there may be
dirty GPAs logged in vcpus' PML buffers that are not yet full, and we had
better flush all vcpus' PML buffers before reporting dirty GPAs to userspace.

As suggested by Jan and Tim, we flush a vcpu's PML buffer on the
PML-buffer-full VMEXIT, and in the peek/clean path we explicitly flush the
PML buffers of all vcpus of the domain.

- Video RAM tracking (and partial dirty logging for guest memory range)

Video RAM is in dirty-logging mode unconditionally during the guest's
run-time, and it covers only a partial memory range of the guest. However,
PML operates on the whole of guest memory (more precisely, on the whole valid
EPT table), so we need to decide whether to use PML when only partial guest
memory ranges are in dirty-logging mode.

Currently, PML is used as long as any guest memory is in dirty-logging mode,
whether globally or partially.

When flushing the PML buffer, we call p2m_change_type_one to change guest
memory from log-dirty back to normal mode, and only guest memory that has been
successfully changed from log-dirty to normal mode is updated in the log-dirty
radix tree. This is required by the current video RAM tracking implementation.
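
Concretely, for each logged GPA, the flush path does the following (this fills
in the process_dirty_gpa placeholder from the earlier sketch; names are as in
vmx_vcpu_flush_pml_buffer in patch 5, with d == v->domain):

    gfn = pml_buf[pml_idx] >> PAGE_SHIFT;
    mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL);
    if ( mfn_x(mfn) != INVALID_MFN &&
         !p2m_change_type_one(d, gfn, p2m_ram_logdirty, p2m_ram_rw) )
        paging_mark_dirty(d, mfn_x(mfn));  /* mark dirty only on success */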


======================== specjbb performance ===========================

I measured SPECjbb performance in the guest when the guest is in video RAM
tracking mode (the most common case, I think), and when it is in global
log-dirty mode (I changed the xl tool to keep the guest in global log-dirty
mode indefinitely; see below). The results show that PML does improve SPECjbb
performance in the guest while it is in log-dirty mode, and the more
frequently dirty pages are queried, the larger the performance gain. So while
PML probably can't speed up the live migration process directly, it will be
beneficial for use cases such as monitoring a guest's memory-dirtying rate.

- video ram tracking:

    WP              PML
    122805          123887
    120792          123249
    118577          123348
    121856          125195
    121286          122056
    120139          123037

avg 120909          123462

    100%            102.11%

performance gain:   2.11%

- global log-dirty:

    WP              PML
    72862           79511
    73466           81173
    72989           81177
    73138           81777
    72811           80257
    72486           80413

avg 72959           80718
    100%            110.63%

performance gain: 10.63%

Test machine: Broadwell server with 16 CPUs (1.6GHz) + 4G memory.
Xen hypervisor: latest upstream Xen
dom0 kernel: 3.16.0
guest: 4 vcpus + 1G memory.
guest OS: Ubuntu 14.04 with the 3.13.0-24-generic kernel.

Note: the global log-dirty data was measured with the change below, while
running 'xl migrate <dom> localhost'.

diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index 254fdb3..88a10f1 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -335,7 +335,12 @@ static int analysis_phase(xc_interface *xch, uint32_t domid, struct save_ctx *ct
         
    start = llgettimeofday();
              
+#define PML_TEST
+#ifdef  PML_TEST
+    for ( j = 0; true; j++ )
+#else
    for ( j = 0; j < runs; j++ )
+#endif
    {
        int i;




Kai Huang (10):
  VMX: Enable EPT A/D bit support
  VMX: New parameter to control PML enabling
  VMX: Add PML definition and feature detection.
  VMX: New data structure member to support PML
  VMX: add helper functions to support PML
  VMX: handle PML buffer full VMEXIT
  VMX: handle PML enabling in vmx_vcpu_initialise
  VMX: disable PML in vmx_vcpu_destroy
  log-dirty: Refine common code to support PML
  p2m/ept: Enable PML in p2m-ept for log-dirty.

 xen/arch/x86/hvm/vmx/vmcs.c        | 241 +++++++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/vmx/vmx.c         |  37 ++++++
 xen/arch/x86/mm/hap/hap.c          |  31 ++++-
 xen/arch/x86/mm/p2m-ept.c          |  81 ++++++++++++-
 xen/arch/x86/mm/p2m.c              |  36 ++++++
 xen/arch/x86/mm/paging.c           |  15 ++-
 xen/arch/x86/mm/shadow/common.c    |   2 +-
 xen/include/asm-x86/domain.h       |   1 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |  25 +++-
 xen/include/asm-x86/hvm/vmx/vmx.h  |   6 +-
 xen/include/asm-x86/p2m.h          |  11 ++
 xen/include/asm-x86/paging.h       |   3 +-
 12 files changed, 474 insertions(+), 15 deletions(-)

-- 
2.1.0


* [PATCH 01/10] VMX: Enable EPT A/D bit support
  2015-03-27  2:35 [PATCH 00/10] PML (Paging Modification Logging) support Kai Huang
@ 2015-03-27  2:35 ` Kai Huang
  2015-03-27 20:38   ` Andrew Cooper
  2015-04-09 11:21   ` Tim Deegan
  2015-03-27  2:35 ` [PATCH 02/10] VMX: New parameter to control PML enabling Kai Huang
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 65+ messages in thread
From: Kai Huang @ 2015-03-27  2:35 UTC (permalink / raw)
  To: jbeulich, andrew.cooper3, tim, kevin.tian, yang.z.zhang, xen-devel
  Cc: Kai Huang

PML requires A/D bit support so enable it for further use.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c        | 1 +
 xen/arch/x86/mm/p2m-ept.c          | 8 +++++++-
 xen/include/asm-x86/hvm/vmx/vmcs.h | 4 +++-
 xen/include/asm-x86/hvm/vmx/vmx.h  | 5 ++++-
 4 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index d614638..2f645fe 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -103,6 +103,7 @@ static void __init vmx_display_features(void)
     P(cpu_has_vmx_tpr_shadow, "APIC TPR shadow");
     P(cpu_has_vmx_ept, "Extended Page Tables (EPT)");
     P(cpu_has_vmx_vpid, "Virtual-Processor Identifiers (VPID)");
+    P(cpu_has_vmx_ept_ad_bit, "EPT A/D bit");
     P(cpu_has_vmx_vnmi, "Virtual NMI");
     P(cpu_has_vmx_msr_bitmap, "MSR direct-access bitmap");
     P(cpu_has_vmx_unrestricted_guest, "Unrestricted Guest");
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index c2d7720..8650092 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -233,6 +233,9 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
     if ( !ept_set_middle_entry(p2m, &new_ept) )
         return 0;
 
+    /* It's better to copy A bit of Middle entry from original entry */
+    new_ept.a = ept_entry->a;
+
     table = map_domain_page(new_ept.mfn);
     trunk = 1UL << ((level - 1) * EPT_TABLE_ORDER);
 
@@ -244,7 +247,7 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
         epte->sp = (level > 1);
         epte->mfn += i * trunk;
         epte->snp = (iommu_enabled && iommu_snoop);
-        ASSERT(!epte->rsvd1);
+        /* A/D bits are inherited from superpage */
         ASSERT(!epte->avail3);
 
         ept_p2m_type_to_flags(epte, epte->sa_p2mt, epte->access);
@@ -1071,6 +1074,9 @@ int ept_p2m_init(struct p2m_domain *p2m)
     /* set EPT page-walk length, now it's actual walk length - 1, i.e. 3 */
     ept->ept_wl = 3;
 
+    /* Enable EPT A/D bit if it's supported by hardware */
+    ept->ept_ad = cpu_has_vmx_ept_ad_bit ? 1 : 0;
+
     if ( !zalloc_cpumask_var(&ept->synced_mask) )
         return -ENOMEM;
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 6fce6aa..4528346 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -62,7 +62,8 @@ struct ept_data {
     struct {
             u64 ept_mt :3,
                 ept_wl :3,
-                rsvd   :6,
+                ept_ad :1,
+                rsvd   :5,
                 asr    :52;
         };
         u64 eptp;
@@ -226,6 +227,7 @@ extern u32 vmx_secondary_exec_control;
 #define VMX_EPT_INVEPT_INSTRUCTION              0x00100000
 #define VMX_EPT_INVEPT_SINGLE_CONTEXT           0x02000000
 #define VMX_EPT_INVEPT_ALL_CONTEXT              0x04000000
+#define VMX_EPT_AD_BIT_SUPPORT                  0x00200000
 
 #define VMX_MISC_VMWRITE_ALL                    0x20000000
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 91c5e18..9afd351 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -37,7 +37,8 @@ typedef union {
         emt         :   3,  /* bits 5:3 - EPT Memory type */
         ipat        :   1,  /* bit 6 - Ignore PAT memory type */
         sp          :   1,  /* bit 7 - Is this a superpage? */
-        rsvd1       :   2,  /* bits 9:8 - Reserved for future use */
+        a           :   1,  /* bit 8 - Access bit */
+        d           :   1,  /* bit 9 - Dirty bit */
         recalc      :   1,  /* bit 10 - Software available 1 */
         snp         :   1,  /* bit 11 - VT-d snoop control in shared
                                EPT/VT-d usage */
@@ -261,6 +262,8 @@ extern uint8_t posted_intr_vector;
     (vmx_ept_vpid_cap & VMX_EPT_SUPERPAGE_2MB)
 #define cpu_has_vmx_ept_invept_single_context   \
     (vmx_ept_vpid_cap & VMX_EPT_INVEPT_SINGLE_CONTEXT)
+#define cpu_has_vmx_ept_ad_bit                  \
+    (vmx_ept_vpid_cap & VMX_EPT_AD_BIT_SUPPORT)
 
 #define EPT_2MB_SHIFT     16
 #define EPT_1GB_SHIFT     17
-- 
2.1.0


* [PATCH 02/10] VMX: New parameter to control PML enabling
  2015-03-27  2:35 [PATCH 00/10] PML (Paging Modification Logging) support Kai Huang
  2015-03-27  2:35 ` [PATCH 01/10] VMX: Enable EPT A/D bit support Kai Huang
@ 2015-03-27  2:35 ` Kai Huang
  2015-03-27 20:42   ` Andrew Cooper
  2015-03-27  2:35 ` [PATCH 03/10] VMX: Add PML definition and feature detection Kai Huang
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 65+ messages in thread
From: Kai Huang @ 2015-03-27  2:35 UTC (permalink / raw)
  To: jbeulich, andrew.cooper3, tim, kevin.tian, yang.z.zhang, xen-devel
  Cc: Kai Huang

A top-level EPT parameter "ept=<options>" and a sub-boolean "pml_enable" are
added to control PML. Further booleans can be added later for other
EPT-related features.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 2f645fe..9b20a4b 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -50,6 +50,16 @@ boolean_param("unrestricted_guest", opt_unrestricted_guest_enabled);
 static bool_t __read_mostly opt_apicv_enabled = 1;
 boolean_param("apicv", opt_apicv_enabled);
 
+static void parse_ept_param(char *s);
+/*
+ * The 'ept' parameter controls functionality that depends on, or impacts,
+ * the EPT mechanism. The optional comma-separated value may contain:
+ *
+ *  pml                 Enable PML (prefix with 'no-' to disable)
+ */
+custom_param("ept", parse_ept_param);
+static bool_t __read_mostly pml_enable = 0;
+
 /*
  * These two parameters are used to config the controls for Pause-Loop Exiting:
  * ple_gap:    upper bound on the amount of time between two successive
@@ -92,6 +102,28 @@ DEFINE_PER_CPU(bool_t, vmxon);
 static u32 vmcs_revision_id __read_mostly;
 u64 __read_mostly vmx_basic_msr;
 
+/* Copied from parse_iommu_param */
+static void parse_ept_param(char *s)
+{
+    char *ss;
+    int val;
+
+    do {
+        val = !!strncmp(s, "no-", 3);
+        if ( !val )
+            s += 3;
+
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        if ( !strcmp(s, "pml") )
+            pml_enable = val;
+
+        s = ss + 1;
+    } while ( ss );
+}
+
 static void __init vmx_display_features(void)
 {
     int printed = 0;
-- 
2.1.0


* [PATCH 03/10] VMX: Add PML definition and feature detection.
  2015-03-27  2:35 [PATCH 00/10] PML (Paging Modification Logging) support Kai Huang
  2015-03-27  2:35 ` [PATCH 01/10] VMX: Enable EPT A/D bit support Kai Huang
  2015-03-27  2:35 ` [PATCH 02/10] VMX: New parameter to control PML enabling Kai Huang
@ 2015-03-27  2:35 ` Kai Huang
  2015-03-27 20:46   ` Andrew Cooper
  2015-03-27  2:35 ` [PATCH 04/10] VMX: New data structure member to support PML Kai Huang
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 65+ messages in thread
From: Kai Huang @ 2015-03-27  2:35 UTC (permalink / raw)
  To: jbeulich, andrew.cooper3, tim, kevin.tian, yang.z.zhang, xen-devel
  Cc: Kai Huang

This patch adds the PML definitions and feature detection. Note that PML won't
be detected if it has been disabled via the boot parameter. PML is also
disabled in construct_vmcs, as it will only be enabled when the domain is
switched to log-dirty mode.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c        | 18 ++++++++++++++++++
 xen/include/asm-x86/hvm/vmx/vmcs.h |  5 +++++
 xen/include/asm-x86/hvm/vmx/vmx.h  |  1 +
 3 files changed, 24 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 9b20a4b..2798b0b 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -143,6 +143,7 @@ static void __init vmx_display_features(void)
     P(cpu_has_vmx_virtual_intr_delivery, "Virtual Interrupt Delivery");
     P(cpu_has_vmx_posted_intr_processing, "Posted Interrupt Processing");
     P(cpu_has_vmx_vmcs_shadowing, "VMCS shadowing");
+    P(cpu_has_vmx_pml, "Page Modification Logging");
 #undef P
 
     if ( !printed )
@@ -240,6 +241,8 @@ static int vmx_init_vmcs_config(void)
             opt |= SECONDARY_EXEC_ENABLE_VPID;
         if ( opt_unrestricted_guest_enabled )
             opt |= SECONDARY_EXEC_UNRESTRICTED_GUEST;
+        if ( pml_enable )
+            opt |= SECONDARY_EXEC_ENABLE_PML;
 
         /*
          * "APIC Register Virtualization" and "Virtual Interrupt Delivery"
@@ -286,6 +289,14 @@ static int vmx_init_vmcs_config(void)
          */
         if ( !(_vmx_ept_vpid_cap & VMX_VPID_INVVPID_ALL_CONTEXT) )
             _vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VPID;
+
+        /*
+         * PML cannot be supported if EPT A/D bits are not supported.
+         * Hardware should not advertise PML without EPT A/D bits, but
+         * clear it anyway to be sure.
+         */
+        if ( !(_vmx_ept_vpid_cap & VMX_EPT_AD_BIT_SUPPORT) )
+            _vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
     }
 
     if ( _vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT )
@@ -306,6 +317,10 @@ static int vmx_init_vmcs_config(void)
                   SECONDARY_EXEC_UNRESTRICTED_GUEST);
     }
 
+    /* PML cannot be supported if we don't use EPT */
+    if ( !(_vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT) )
+        _vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
+
     if ( (_vmx_secondary_exec_control & SECONDARY_EXEC_PAUSE_LOOP_EXITING) &&
           ple_gap == 0 )
     {
@@ -1041,6 +1056,9 @@ static int construct_vmcs(struct vcpu *v)
         __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR, posted_intr_vector);
     }
 
+    /* Disable PML here anyway, as it will only be enabled in log-dirty mode */
+    v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
+
     /* Host data selectors. */
     __vmwrite(HOST_SS_SELECTOR, __HYPERVISOR_DS);
     __vmwrite(HOST_DS_SELECTOR, __HYPERVISOR_DS);
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 4528346..47b4df2 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -216,6 +216,7 @@ extern u32 vmx_vmentry_control;
 #define SECONDARY_EXEC_ENABLE_INVPCID           0x00001000
 #define SECONDARY_EXEC_ENABLE_VMFUNC            0x00002000
 #define SECONDARY_EXEC_ENABLE_VMCS_SHADOWING    0x00004000
+#define SECONDARY_EXEC_ENABLE_PML               0x00020000
 extern u32 vmx_secondary_exec_control;
 
 #define VMX_EPT_EXEC_ONLY_SUPPORTED             0x00000001
@@ -276,6 +277,8 @@ extern u32 vmx_secondary_exec_control;
     (vmx_pin_based_exec_control & PIN_BASED_POSTED_INTERRUPT)
 #define cpu_has_vmx_vmcs_shadowing \
     (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VMCS_SHADOWING)
+#define cpu_has_vmx_pml \
+    (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_PML)
 
 #define VMCS_RID_TYPE_MASK              0x80000000
 
@@ -320,6 +323,7 @@ enum vmcs_field {
     GUEST_LDTR_SELECTOR             = 0x0000080c,
     GUEST_TR_SELECTOR               = 0x0000080e,
     GUEST_INTR_STATUS               = 0x00000810,
+    GUEST_PML_INDEX                 = 0x00000812,
     HOST_ES_SELECTOR                = 0x00000c00,
     HOST_CS_SELECTOR                = 0x00000c02,
     HOST_SS_SELECTOR                = 0x00000c04,
@@ -333,6 +337,7 @@ enum vmcs_field {
     VM_EXIT_MSR_STORE_ADDR          = 0x00002006,
     VM_EXIT_MSR_LOAD_ADDR           = 0x00002008,
     VM_ENTRY_MSR_LOAD_ADDR          = 0x0000200a,
+    PML_ADDRESS                     = 0x0000200e,
     TSC_OFFSET                      = 0x00002010,
     VIRTUAL_APIC_PAGE_ADDR          = 0x00002012,
     APIC_ACCESS_ADDR                = 0x00002014,
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 9afd351..c0e352d 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -186,6 +186,7 @@ static inline unsigned long pi_get_pir(struct pi_desc *pi_desc, int group)
 #define EXIT_REASON_XSETBV              55
 #define EXIT_REASON_APIC_WRITE          56
 #define EXIT_REASON_INVPCID             58
+#define EXIT_REASON_PML_FULL            62
 
 /*
  * Interruption-information format
-- 
2.1.0


* [PATCH 04/10] VMX: New data structure member to support PML
  2015-03-27  2:35 [PATCH 00/10] PML (Paging Modification Logging) support Kai Huang
                   ` (2 preceding siblings ...)
  2015-03-27  2:35 ` [PATCH 03/10] VMX: Add PML definition and feature detection Kai Huang
@ 2015-03-27  2:35 ` Kai Huang
  2015-03-27 20:48   ` Andrew Cooper
  2015-03-27  2:35 ` [PATCH 05/10] VMX: add helper functions " Kai Huang
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 65+ messages in thread
From: Kai Huang @ 2015-03-27  2:35 UTC (permalink / raw)
  To: jbeulich, andrew.cooper3, tim, kevin.tian, yang.z.zhang, xen-devel
  Cc: Kai Huang

A new 4K page pointer is added to arch_vmx_struct as the vcpu's PML buffer (a
4K page holds 512 8-byte GPA entries, hence PML_ENTITY_NUM below). A new
'status' field is added to vmx_domain to indicate whether PML is enabled for
the domain; the 'status' field can also be used for further similar purposes.

Note that neither new member has to be initialized to zero explicitly, as both
the vcpu and domain structures are zeroed when they are created.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 xen/include/asm-x86/hvm/vmx/vmcs.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 47b4df2..8cc1122 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -71,8 +71,12 @@ struct ept_data {
     cpumask_var_t synced_mask;
 };
 
+#define _VMX_DOMAIN_PML_ENABLED    0
+#define VMX_DOMAIN_PML_ENABLED     (1ul << _VMX_DOMAIN_PML_ENABLED)
 struct vmx_domain {
     unsigned long apic_access_mfn;
+    /* VMX_DOMAIN_* */
+    unsigned long status;
 };
 
 struct pi_desc {
@@ -143,6 +147,9 @@ struct arch_vmx_struct {
     /* Bitmap to control vmexit policy for Non-root VMREAD/VMWRITE */
     struct page_info     *vmread_bitmap;
     struct page_info     *vmwrite_bitmap;
+
+#define PML_ENTITY_NUM      512
+    struct page_info     *pml_pg;
 };
 
 int vmx_create_vmcs(struct vcpu *v);
-- 
2.1.0


* [PATCH 05/10] VMX: add helper functions to support PML
  2015-03-27  2:35 [PATCH 00/10] PML (Paging Modification Logging) support Kai Huang
                   ` (3 preceding siblings ...)
  2015-03-27  2:35 ` [PATCH 04/10] VMX: New data structure member to support PML Kai Huang
@ 2015-03-27  2:35 ` Kai Huang
  2015-03-27 21:09   ` Andrew Cooper
                     ` (2 more replies)
  2015-03-27  2:35 ` [PATCH 06/10] VMX: handle PML buffer full VMEXIT Kai Huang
                   ` (5 subsequent siblings)
  10 siblings, 3 replies; 65+ messages in thread
From: Kai Huang @ 2015-03-27  2:35 UTC (permalink / raw)
  To: jbeulich, andrew.cooper3, tim, kevin.tian, yang.z.zhang, xen-devel
  Cc: Kai Huang

This patch adds helper functions to enable/disable PML and to flush the PML
buffer, for a single vcpu and for a whole domain, for later use.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c        | 190 +++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/vmx/vmcs.h |   9 ++
 2 files changed, 199 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 2798b0b..17cbef4 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1326,6 +1326,196 @@ void vmx_clear_eoi_exit_bitmap(struct vcpu *v, u8 vector)
                 &v->arch.hvm_vmx.eoi_exitmap_changed);
 }
 
+int vmx_vcpu_pml_enabled(struct vcpu *v)
+{
+    return (v->arch.hvm_vmx.secondary_exec_control &
+            SECONDARY_EXEC_ENABLE_PML) ? 1 : 0;
+}
+
+int vmx_vcpu_enable_pml(struct vcpu *v)
+{
+    struct domain *d = v->domain;
+
+    ASSERT(!vmx_vcpu_pml_enabled(v));
+
+    v->arch.hvm_vmx.pml_pg = d->arch.paging.alloc_page(d);
+    if ( !v->arch.hvm_vmx.pml_pg )
+        return -ENOMEM;
+
+    vmx_vmcs_enter(v);
+
+    __vmwrite(PML_ADDRESS, page_to_mfn(v->arch.hvm_vmx.pml_pg) << PAGE_SHIFT);
+    __vmwrite(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
+
+    v->arch.hvm_vmx.secondary_exec_control |= SECONDARY_EXEC_ENABLE_PML;
+
+    __vmwrite(SECONDARY_VM_EXEC_CONTROL,
+            v->arch.hvm_vmx.secondary_exec_control);
+
+    vmx_vmcs_exit(v);
+
+    return 0;
+}
+
+void vmx_vcpu_disable_pml(struct vcpu *v)
+{
+    ASSERT(vmx_vcpu_pml_enabled(v));
+
+    vmx_vmcs_enter(v);
+
+    v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
+    __vmwrite(SECONDARY_VM_EXEC_CONTROL,
+            v->arch.hvm_vmx.secondary_exec_control);
+
+    vmx_vmcs_exit(v);
+
+    v->domain->arch.paging.free_page(v->domain, v->arch.hvm_vmx.pml_pg);
+    v->arch.hvm_vmx.pml_pg = NULL;
+}
+
+void vmx_vcpu_flush_pml_buffer(struct vcpu *v)
+{
+    uint64_t *pml_buf;
+    unsigned long pml_idx;
+
+    ASSERT(vmx_vcpu_pml_enabled(v));
+
+    vmx_vmcs_enter(v);
+
+    __vmread(GUEST_PML_INDEX, &pml_idx);
+
+    /* Do nothing if PML buffer is empty */
+    if ( pml_idx == (PML_ENTITY_NUM - 1) )
+        goto out;
+
+    pml_buf = map_domain_page(page_to_mfn(v->arch.hvm_vmx.pml_pg));
+
+    /*
+     * The PML index is either 2^16-1 (the buffer is full) or 0~511 (the
+     * buffer is not full); in the latter case the index always points to the
+     * next available entry.
+     */
+    if ( pml_idx >= PML_ENTITY_NUM )
+        pml_idx = 0;
+    else
+        pml_idx++;
+
+    for ( ; pml_idx < PML_ENTITY_NUM; pml_idx++ )
+    {
+        struct p2m_domain *p2m = p2m_get_hostp2m(v->domain);
+        unsigned long gfn;
+        mfn_t mfn;
+        p2m_type_t t;
+        p2m_access_t a;
+
+        gfn = pml_buf[pml_idx] >> PAGE_SHIFT;
+        mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL);
+        if ( mfn_x(mfn) == INVALID_MFN )
+        {
+            /*
+             * Either the EPT entry mapping the GFN has been destroyed, or
+             * there is something wrong with the hardware behaviour; in both
+             * cases we should report a warning.
+             */
+            dprintk(XENLOG_WARNING, "PML: vcpu %d: invalid GPA 0x%lx logged\n",
+                    v->vcpu_id, pml_buf[pml_idx]);
+            continue;
+        }
+
+        /*
+         * The logged GFN needs its type changed from log-dirty back to
+         * normal memory; hap_track_dirty_vram depends on this to work. And we
+         * really only need to mark as dirty those GFNs that have successfully
+         * been changed from log-dirty to normal memory.
+         */
+        if ( !p2m_change_type_one(v->domain, gfn, p2m_ram_logdirty,
+                    p2m_ram_rw) )
+            paging_mark_dirty(v->domain, mfn_x(mfn));
+    }
+
+    unmap_domain_page(pml_buf);
+
+    /* Reset PML index */
+    __vmwrite(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
+
+out:
+    vmx_vmcs_exit(v);
+}
+
+int vmx_domain_pml_enabled(struct domain *d)
+{
+    return (d->arch.hvm_domain.vmx.status & VMX_DOMAIN_PML_ENABLED) ? 1 : 0;
+}
+
+/*
+ * This function enables PML for a particular domain. It should be called
+ * while the domain is paused.
+ *
+ * PML needs to be enabled for all vcpus of the domain at once, as the PML
+ * buffer and PML index are per-vcpu, but the EPT table is shared by vcpus,
+ * so enabling PML on only some vcpus won't work.
+ */
+int vmx_domain_enable_pml(struct domain *d)
+{
+    struct vcpu *v;
+
+    ASSERT(!vmx_domain_pml_enabled(d));
+
+    for_each_vcpu( d, v )
+    {
+        if ( vmx_vcpu_enable_pml(v) )
+            goto error;
+    }
+
+    d->arch.hvm_domain.vmx.status |= VMX_DOMAIN_PML_ENABLED;
+
+    return 0;
+
+error:
+    for_each_vcpu( d, v )
+    {
+        if ( vmx_vcpu_pml_enabled(v) )
+            vmx_vcpu_disable_pml(v);
+    }
+    return -EINVAL;
+}
+
+/*
+ * Disable PML for a particular domain. Called while the domain is paused.
+ *
+ * As with enabling PML for a domain, disabling PML should be done for all
+ * vcpus at once.
+ */
+void vmx_domain_disable_pml(struct domain *d)
+{
+    struct vcpu *v;
+
+    ASSERT(vmx_domain_pml_enabled(d));
+
+    for_each_vcpu( d, v )
+    {
+        vmx_vcpu_disable_pml(v);
+    }
+
+    d->arch.hvm_domain.vmx.status &= ~VMX_DOMAIN_PML_ENABLED;
+}
+
+/*
+ * Flush the PML buffers of all vcpus, and propagate the logged dirty pages
+ * to the log-dirty radix tree. Called while the domain is paused.
+ */
+void vmx_domain_flush_pml_buffers(struct domain *d)
+{
+    struct vcpu *v;
+
+    ASSERT(vmx_domain_pml_enabled(d));
+
+    for_each_vcpu( d, v )
+    {
+        vmx_vcpu_flush_pml_buffer(v);
+    }
+}
+
 int vmx_create_vmcs(struct vcpu *v)
 {
     struct arch_vmx_struct *arch_vmx = &v->arch.hvm_vmx;
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 8cc1122..939d097 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -499,6 +499,15 @@ static inline int vmx_add_host_load_msr(u32 msr)
 
 DECLARE_PER_CPU(bool_t, vmxon);
 
+int vmx_vcpu_pml_enabled(struct vcpu *v);
+int vmx_vcpu_enable_pml(struct vcpu *v);
+void vmx_vcpu_disable_pml(struct vcpu *v);
+void vmx_vcpu_flush_pml_buffer(struct vcpu *v);
+int vmx_domain_pml_enabled(struct domain *d);
+int vmx_domain_enable_pml(struct domain *d);
+void vmx_domain_disable_pml(struct domain *d);
+void vmx_domain_flush_pml_buffers(struct domain *d);
+
 #endif /* ASM_X86_HVM_VMX_VMCS_H__ */
 
 /*
-- 
2.1.0


* [PATCH 06/10] VMX: handle PML buffer full VMEXIT
  2015-03-27  2:35 [PATCH 00/10] PML (Paging Modification Logging) support Kai Huang
                   ` (4 preceding siblings ...)
  2015-03-27  2:35 ` [PATCH 05/10] VMX: add helper functions " Kai Huang
@ 2015-03-27  2:35 ` Kai Huang
  2015-03-27  2:35 ` [PATCH 07/10] VMX: handle PML enabling in vmx_vcpu_initialise Kai Huang
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 65+ messages in thread
From: Kai Huang @ 2015-03-27  2:35 UTC (permalink / raw)
  To: jbeulich, andrew.cooper3, tim, kevin.tian, yang.z.zhang, xen-devel
  Cc: Kai Huang

We need to flush the PML buffer when it's full.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index e1c55ce..453bcc5 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3171,6 +3171,10 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         vmx_handle_apic_write();
         break;
 
+    case EXIT_REASON_PML_FULL:
+        vmx_vcpu_flush_pml_buffer(v);
+        break;
+
     case EXIT_REASON_ACCESS_GDTR_OR_IDTR:
     case EXIT_REASON_ACCESS_LDTR_OR_TR:
     case EXIT_REASON_VMX_PREEMPTION_TIMER_EXPIRED:
-- 
2.1.0


* [PATCH 07/10] VMX: handle PML enabling in vmx_vcpu_initialise
  2015-03-27  2:35 [PATCH 00/10] PML (Paging Modification Logging) support Kai Huang
                   ` (5 preceding siblings ...)
  2015-03-27  2:35 ` [PATCH 06/10] VMX: handle PML buffer full VMEXIT Kai Huang
@ 2015-03-27  2:35 ` Kai Huang
  2015-03-27 21:12   ` Andrew Cooper
  2015-03-27  2:35 ` [PATCH 08/10] VMX: disable PML in vmx_vcpu_destroy Kai Huang
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 65+ messages in thread
From: Kai Huang @ 2015-03-27  2:35 UTC (permalink / raw)
  To: jbeulich, andrew.cooper3, tim, kevin.tian, yang.z.zhang, xen-devel
  Cc: Kai Huang

It's possible the domain is already in log-dirty mode when a vcpu is created,
in which case we should enable PML for this vcpu if PML has been enabled for
the domain.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 453bcc5..fce3aa2 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -116,6 +116,30 @@ static int vmx_vcpu_initialise(struct vcpu *v)
         return rc;
     }
 
+    /*
+     * It's rare but still possible that the domain is already in log-dirty
+     * mode when the vcpu is being created (as noted by Tim), in which case we
+     * should enable PML for this vcpu if PML has been enabled for the domain,
+     * and failure to enable it results in failure of vcpu creation.
+     *
+     * Note that even if no vcpu has been created for the domain yet,
+     * vmx_domain_enable_pml will return success, in which case
+     * vmx_domain_pml_enabled will also return true. And even if this is the
+     * first vcpu created with vmx_domain_pml_enabled true, failure to enable
+     * PML still fails vcpu creation, to avoid the complicated logic needed to
+     * revert a PML-style EPT table to a non-PML-style one.
+     */
+    if ( vmx_domain_pml_enabled(v->domain) )
+    {
+        if ( (rc = vmx_vcpu_enable_pml(v)) != 0 )
+        {
+            dprintk(XENLOG_ERR, "Failed to enable PML for vcpu %d\n",
+                    v->vcpu_id);
+            vmx_destroy_vmcs(v);
+            return rc;
+        }
+    }
+
     vpmu_initialise(v);
 
     vmx_install_vlapic_mapping(v);
-- 
2.1.0


* [PATCH 08/10] VMX: disable PML in vmx_vcpu_destroy
  2015-03-27  2:35 [PATCH 00/10] PML (Paging Modification Logging) support Kai Huang
                   ` (6 preceding siblings ...)
  2015-03-27  2:35 ` [PATCH 07/10] VMX: handle PML enabling in vmx_vcpu_initialise Kai Huang
@ 2015-03-27  2:35 ` Kai Huang
  2015-04-09 12:04   ` Tim Deegan
  2015-03-27  2:35 ` [PATCH 09/10] log-dirty: Refine common code to support PML Kai Huang
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 65+ messages in thread
From: Kai Huang @ 2015-03-27  2:35 UTC (permalink / raw)
  To: jbeulich, andrew.cooper3, tim, kevin.tian, yang.z.zhang, xen-devel
  Cc: Kai Huang

It's possible the domain still remains in log-dirty mode when it is about to
be destroyed, in which case we should manually disable PML for it.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index fce3aa2..75ac44b 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -153,6 +153,15 @@ static int vmx_vcpu_initialise(struct vcpu *v)
 
 static void vmx_vcpu_destroy(struct vcpu *v)
 {
+    /*
+     * There are cases where the domain is still in log-dirty mode when it is
+     * about to be destroyed (e.g. the user types 'xl destroy <dom>'), in which
+     * case we should disable PML manually here. Note that vmx_vcpu_destroy is
+     * called prior to vmx_domain_destroy, so we need to disable PML for each
+     * vcpu separately here.
+     */
+    if ( vmx_vcpu_pml_enabled(v) )
+        vmx_vcpu_disable_pml(v);
     vmx_destroy_vmcs(v);
     vpmu_destroy(v);
     passive_domain_destroy(v);
-- 
2.1.0


* [PATCH 09/10] log-dirty: Refine common code to support PML
  2015-03-27  2:35 [PATCH 00/10] PML (Paging Modification Logging) support Kai Huang
                   ` (7 preceding siblings ...)
  2015-03-27  2:35 ` [PATCH 08/10] VMX: disable PML in vmx_vcpu_destroy Kai Huang
@ 2015-03-27  2:35 ` Kai Huang
  2015-04-09 12:27   ` Tim Deegan
  2015-03-27  2:35 ` [PATCH 10/10] p2m/ept: Enable PML in p2m-ept for log-dirty Kai Huang
  2015-03-27 21:26 ` [PATCH 00/10] PML (Paging Modification Logging) support Andrew Cooper
  10 siblings, 1 reply; 65+ messages in thread
From: Kai Huang @ 2015-03-27  2:35 UTC (permalink / raw)
  To: jbeulich, andrew.cooper3, tim, kevin.tian, yang.z.zhang, xen-devel
  Cc: Kai Huang

This patch adds several new callbacks at the paging/hap/p2m layers to support
PML.

At the paging layer, a new callback is added to log_dirty_domain to flush
hardware-cached dirty pages into the log-dirty radix tree. With PML, dirty
GPAs may still be sitting in vcpus' PML buffers when userspace peeks at or
clears dirty pages, so we need to flush them before reporting dirty pages to
userspace.

At the p2m layer, three new callbacks are added to p2m_domain to
enable/disable PML and to flush the PML buffers. The PML enable/disable
callbacks are called when switching to log-dirty mode and when switching back
to normal mode, respectively. The flush callback is called from the paging
layer when the PML buffers are flushed manually.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 xen/arch/x86/mm/hap/hap.c       | 16 +++++++++++++++-
 xen/arch/x86/mm/p2m.c           | 36 ++++++++++++++++++++++++++++++++++++
 xen/arch/x86/mm/paging.c        | 15 ++++++++++++++-
 xen/arch/x86/mm/shadow/common.c |  2 +-
 xen/include/asm-x86/domain.h    |  1 +
 xen/include/asm-x86/p2m.h       | 11 +++++++++++
 xen/include/asm-x86/paging.h    |  3 ++-
 7 files changed, 80 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 4ecb2e2..25f2f58 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -135,6 +135,10 @@ int hap_track_dirty_vram(struct domain *d,
 
             domain_pause(d);
 
+            /* flush dirty GFNs potentially cached by hardware */
+            if ( d->arch.paging.log_dirty.flush_cached_dirty )
+                d->arch.paging.log_dirty.flush_cached_dirty(d);
+
             /* get the bitmap */
             paging_log_dirty_range(d, begin_pfn, nr, dirty_bitmap);
 
@@ -190,6 +194,8 @@ static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
     d->arch.paging.mode |= PG_log_dirty;
     paging_unlock(d);
 
+    p2m_enable_hardware_log_dirty(d);
+
     if ( log_global )
     {
         /* set l1e entries of P2M table to be read-only. */
@@ -205,6 +211,8 @@ static int hap_disable_log_dirty(struct domain *d)
     d->arch.paging.mode &= ~PG_log_dirty;
     paging_unlock(d);
 
+    p2m_disable_hardware_log_dirty(d);
+
     /* set l1e entries of P2M table with normal mode */
     p2m_change_entry_type_global(d, p2m_ram_logdirty, p2m_ram_rw);
     return 0;
@@ -217,6 +225,11 @@ static void hap_clean_dirty_bitmap(struct domain *d)
     flush_tlb_mask(d->domain_dirty_cpumask);
 }
 
+static void hap_flush_cached_dirty(struct domain *d)
+{
+    p2m_flush_hardware_cached_dirty(d);
+}
+
 /************************************************/
 /*             HAP SUPPORT FUNCTIONS            */
 /************************************************/
@@ -431,7 +444,8 @@ void hap_domain_init(struct domain *d)
     /* Use HAP logdirty mechanism. */
     paging_log_dirty_init(d, hap_enable_log_dirty,
                           hap_disable_log_dirty,
-                          hap_clean_dirty_bitmap);
+                          hap_clean_dirty_bitmap,
+                          hap_flush_cached_dirty);
 }
 
 /* return 0 for success, -errno for failure */
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 6a06e9f..291a275 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -239,6 +239,42 @@ void p2m_memory_type_changed(struct domain *d)
     }
 }
 
+void p2m_enable_hardware_log_dirty(struct domain *d)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    if ( p2m->enable_hardware_log_dirty )
+    {
+        p2m_lock(p2m);
+        p2m->enable_hardware_log_dirty(p2m);
+        p2m_unlock(p2m);
+    }
+}
+
+void p2m_disable_hardware_log_dirty(struct domain *d)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    if ( p2m->disable_hardware_log_dirty )
+    {
+        p2m_lock(p2m);
+        p2m->disable_hardware_log_dirty(p2m);
+        p2m_unlock(p2m);
+    }
+}
+
+void p2m_flush_hardware_cached_dirty(struct domain *d)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    if ( p2m->flush_hardware_cached_dirty )
+    {
+        p2m_lock(p2m);
+        p2m->flush_hardware_cached_dirty(p2m);
+        p2m_unlock(p2m);
+    }
+}
+
 mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn,
                     p2m_type_t *t, p2m_access_t *a, p2m_query_t q,
                     unsigned int *page_order, bool_t locked)
diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c
index b54d76a..c2d336a 100644
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -411,7 +411,18 @@ static int paging_log_dirty_op(struct domain *d,
     int i4, i3, i2;
 
     if ( !resuming )
+    {
         domain_pause(d);
+
+        /*
+         * We only need to flush when not resuming: in the resuming case the
+         * domain was already paused, so no new dirty pages can have been
+         * logged.
+         */
+        if ( d->arch.paging.log_dirty.flush_cached_dirty )
+            d->arch.paging.log_dirty.flush_cached_dirty(d);
+    }
+
     paging_lock(d);
 
     if ( !d->arch.paging.preempt.dom )
@@ -610,11 +621,13 @@ void paging_log_dirty_init(struct domain *d,
                            int    (*enable_log_dirty)(struct domain *d,
                                                       bool_t log_global),
                            int    (*disable_log_dirty)(struct domain *d),
-                           void   (*clean_dirty_bitmap)(struct domain *d))
+                           void   (*clean_dirty_bitmap)(struct domain *d),
+                           void   (*flush_cached_dirty)(struct domain *d))
 {
     d->arch.paging.log_dirty.enable_log_dirty = enable_log_dirty;
     d->arch.paging.log_dirty.disable_log_dirty = disable_log_dirty;
     d->arch.paging.log_dirty.clean_dirty_bitmap = clean_dirty_bitmap;
+    d->arch.paging.log_dirty.flush_cached_dirty = flush_cached_dirty;
 }
 
 /************************************************/
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index 2e43d6d..f8451e8 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -54,7 +54,7 @@ int shadow_domain_init(struct domain *d, unsigned int domcr_flags)
 
     /* Use shadow pagetables for log-dirty support */
     paging_log_dirty_init(d, sh_enable_log_dirty,
-                          sh_disable_log_dirty, sh_clean_dirty_bitmap);
+                          sh_disable_log_dirty, sh_clean_dirty_bitmap, NULL);
 
 #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
     d->arch.paging.shadow.oos_active = 0;
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 9cdffa8..0dc90d2 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -178,6 +178,7 @@ struct log_dirty_domain {
     int            (*enable_log_dirty   )(struct domain *d, bool_t log_global);
     int            (*disable_log_dirty  )(struct domain *d);
     void           (*clean_dirty_bitmap )(struct domain *d);
+    void           (*flush_cached_dirty )(struct domain *d);
 };
 
 struct paging_domain {
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index e93c551..91c17a5 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -233,6 +233,9 @@ struct p2m_domain {
                                        p2m_access_t *p2ma,
                                        p2m_query_t q,
                                        unsigned int *page_order);
+    void               (*enable_hardware_log_dirty)(struct p2m_domain *p2m);
+    void               (*disable_hardware_log_dirty)(struct p2m_domain *p2m);
+    void               (*flush_hardware_cached_dirty)(struct p2m_domain *p2m);
     void               (*change_entry_type_global)(struct p2m_domain *p2m,
                                                    p2m_type_t ot,
                                                    p2m_type_t nt);
@@ -507,6 +510,14 @@ void guest_physmap_remove_page(struct domain *d,
 /* Set a p2m range as populate-on-demand */
 int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn,
                                           unsigned int order);
+/* Enable hardware-assisted log-dirty. */
+void p2m_enable_hardware_log_dirty(struct domain *d);
+
+/* Disable hardware-assisted log-dirty */
+void p2m_disable_hardware_log_dirty(struct domain *d);
+
+/* Flush hardware cached dirty GFNs */
+void p2m_flush_hardware_cached_dirty(struct domain *d);
 
 /* Change types across all p2m entries in a domain */
 void p2m_change_entry_type_global(struct domain *d, 
diff --git a/xen/include/asm-x86/paging.h b/xen/include/asm-x86/paging.h
index 53de715..9fa8d9d 100644
--- a/xen/include/asm-x86/paging.h
+++ b/xen/include/asm-x86/paging.h
@@ -152,7 +152,8 @@ void paging_log_dirty_init(struct domain *d,
                            int  (*enable_log_dirty)(struct domain *d,
                                                     bool_t log_global),
                            int  (*disable_log_dirty)(struct domain *d),
-                           void (*clean_dirty_bitmap)(struct domain *d));
+                           void (*clean_dirty_bitmap)(struct domain *d),
+                           void (*flush_cached_dirty)(struct domain *d));
 
 /* mark a page as dirty */
 void paging_mark_dirty(struct domain *d, unsigned long guest_mfn);
-- 
2.1.0


* [PATCH 10/10] p2m/ept: Enable PML in p2m-ept for log-dirty.
  2015-03-27  2:35 [PATCH 00/10] PML (Paging Modification Logging) support Kai Huang
                   ` (8 preceding siblings ...)
  2015-03-27  2:35 ` [PATCH 09/10] log-dirty: Refine common code to support PML Kai Huang
@ 2015-03-27  2:35 ` Kai Huang
  2015-04-09 12:20   ` Tim Deegan
  2015-03-27 21:26 ` [PATCH 00/10] PML (Paging Modification Logging) support Andrew Cooper
  10 siblings, 1 reply; 65+ messages in thread
From: Kai Huang @ 2015-03-27  2:35 UTC (permalink / raw)
  To: jbeulich, andrew.cooper3, tim, kevin.tian, yang.z.zhang, xen-devel
  Cc: Kai Huang

This patch adds PML support to p2m-ept for log-dirty.

When PML is used, we only need to clear an EPT entry's D-bit in order to log
that GFN, instead of setting the EPT entry read-only. For partial log-dirty,
we also set the D-bit for guest memory in normal mode to avoid unnecessary GPA
logging, as PML operates globally on the entire valid EPT table.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 xen/arch/x86/mm/hap/hap.c | 15 ++++++++--
 xen/arch/x86/mm/p2m-ept.c | 73 +++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 80 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 25f2f58..15fb5de 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -198,7 +198,10 @@ static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
 
     if ( log_global )
     {
-        /* set l1e entries of P2M table to be read-only. */
+        /*
+         * Switch to log-dirty mode: either set l1e entries of the P2M table
+         * read-only, or enable log-dirty in a hardware-assisted way.
+         */
         p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
         flush_tlb_mask(d->domain_dirty_cpumask);
     }
@@ -213,14 +216,20 @@ static int hap_disable_log_dirty(struct domain *d)
 
     p2m_disable_hardware_log_dirty(d);
 
-    /* set l1e entries of P2M table with normal mode */
+    /*
+     * Switch back to normal mode: either set l1e entries of the P2M table
+     * back to normal mode, or disable log-dirty in a hardware-assisted way.
+     */
     p2m_change_entry_type_global(d, p2m_ram_logdirty, p2m_ram_rw);
     return 0;
 }
 
 static void hap_clean_dirty_bitmap(struct domain *d)
 {
-    /* set l1e entries of P2M table to be read-only. */
+    /*
+     * Switch to log-dirty mode: either set l1e entries of the P2M table
+     * read-only, or enable log-dirty in a hardware-assisted way.
+     */
     p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
     flush_tlb_mask(d->domain_dirty_cpumask);
 }
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 8650092..9a719f6 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -102,7 +102,8 @@ static int atomic_write_ept_entry(ept_entry_t *entryptr, ept_entry_t new,
     return rc;
 }
 
-static void ept_p2m_type_to_flags(ept_entry_t *entry, p2m_type_t type, p2m_access_t access)
+static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
+                                  p2m_type_t type, p2m_access_t access)
 {
     /* First apply type permissions */
     switch(type)
@@ -118,6 +119,12 @@ static void ept_p2m_type_to_flags(ept_entry_t *entry, p2m_type_t type, p2m_acces
             break;
         case p2m_ram_rw:
             entry->r = entry->w = entry->x = 1;
+            /*
+             * Set the D-bit here to avoid unnecessary GPA logging in the PML
+             * buffer, e.g. for normal memory during partial log-dirty.
+             */
+            if ( vmx_domain_pml_enabled(p2m->domain) )
+                entry->d = 1;
             break;
         case p2m_mmio_direct:
             entry->r = entry->x = 1;
@@ -125,6 +132,26 @@ static void ept_p2m_type_to_flags(ept_entry_t *entry, p2m_type_t type, p2m_acces
                                                     entry->mfn);
             break;
         case p2m_ram_logdirty:
+            entry->r = entry->x = 1;
+            if ( vmx_domain_pml_enabled(p2m->domain) )
+            {
+                /*
+                 * With PML we don't have to write-protect a 4K page; we only
+                 * need to clear its D-bit. Note we still need to write-protect
+                 * a superpage so that it gets split into 4K pages on the EPT
+                 * violation.
+                 */
+                if ( !is_epte_superpage(entry) )
+                {
+                    entry->w = 1;
+                    entry->d = 0;
+                }
+                else
+                    entry->w = 0;
+            }
+            else
+                entry->w = 0;
+            break;
         case p2m_ram_ro:
         case p2m_ram_shared:
             entry->r = entry->x = 1;
@@ -250,7 +277,7 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
         /* A/D bits are inherited from superpage */
         ASSERT(!epte->avail3);
 
-        ept_p2m_type_to_flags(epte, epte->sa_p2mt, epte->access);
+        ept_p2m_type_to_flags(p2m, epte, epte->sa_p2mt, epte->access);
 
         if ( (level - 1) == target )
             continue;
@@ -492,7 +519,7 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
                     {
                          e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
                                      ? p2m_ram_logdirty : p2m_ram_rw;
-                         ept_p2m_type_to_flags(&e, e.sa_p2mt, e.access);
+                         ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
                     }
                     e.recalc = 0;
                     wrc = atomic_write_ept_entry(&epte[i], e, level);
@@ -544,7 +571,7 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
                 e.ipat = ipat;
                 e.recalc = 0;
                 if ( recalc && p2m_is_changeable(e.sa_p2mt) )
-                    ept_p2m_type_to_flags(&e, e.sa_p2mt, e.access);
+                    ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
                 wrc = atomic_write_ept_entry(&epte[i], e, level);
                 ASSERT(wrc == 0);
             }
@@ -755,7 +782,7 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         if ( ept_entry->mfn == new_entry.mfn )
              need_modify_vtd_table = 0;
 
-        ept_p2m_type_to_flags(&new_entry, p2mt, p2ma);
+        ept_p2m_type_to_flags(p2m, &new_entry, p2mt, p2ma);
     }
 
     rc = atomic_write_ept_entry(ept_entry, new_entry, target);
@@ -1057,6 +1084,35 @@ void ept_sync_domain(struct p2m_domain *p2m)
                      __ept_sync_domain, p2m, 1);
 }
 
+static void ept_enable_pml(struct p2m_domain *p2m)
+{
+    /*
+     * No need to check whether vmx_domain_enable_pml succeeded:
+     * ept_p2m_type_to_flags checks whether PML has been enabled, and falls
+     * back to traditional write protection if it has not.
+     */
+    if ( vmx_domain_pml_enabled(p2m->domain) )
+        return;
+
+    vmx_domain_enable_pml(p2m->domain);
+}
+
+static void ept_disable_pml(struct p2m_domain *p2m)
+{
+    if ( !vmx_domain_pml_enabled(p2m->domain) )
+        return;
+
+    vmx_domain_disable_pml(p2m->domain);
+}
+
+static void ept_flush_pml_buffers(struct p2m_domain *p2m)
+{
+    if ( !vmx_domain_pml_enabled(p2m->domain) )
+        return;
+
+    vmx_domain_flush_pml_buffers(p2m->domain);
+}
+
 int ept_p2m_init(struct p2m_domain *p2m)
 {
     struct ept_data *ept = &p2m->ept;
@@ -1077,6 +1133,13 @@ int ept_p2m_init(struct p2m_domain *p2m)
     /* Enable EPT A/D bit if it's supported by hardware */
     ept->ept_ad = cpu_has_vmx_ept_ad_bit ? 1 : 0;
 
+    if ( cpu_has_vmx_pml )
+    {
+        p2m->enable_hardware_log_dirty = ept_enable_pml;
+        p2m->disable_hardware_log_dirty = ept_disable_pml;
+        p2m->flush_hardware_cached_dirty = ept_flush_pml_buffers;
+    }
+
     if ( !zalloc_cpumask_var(&ept->synced_mask) )
         return -ENOMEM;
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [PATCH 01/10] VMX: Enable EPT A/D bit support
  2015-03-27  2:35 ` [PATCH 01/10] VMX: Enable EPT A/D bit support Kai Huang
@ 2015-03-27 20:38   ` Andrew Cooper
  2015-03-30  6:11     ` Kai Huang
  2015-04-02  6:32     ` Kai Huang
  2015-04-09 11:21   ` Tim Deegan
  1 sibling, 2 replies; 65+ messages in thread
From: Andrew Cooper @ 2015-03-27 20:38 UTC (permalink / raw)
  To: Kai Huang, jbeulich, tim, kevin.tian, yang.z.zhang, xen-devel

On 27/03/15 02:35, Kai Huang wrote:
> PML requires A/D bit support so enable it for further use.
>
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmcs.c        | 1 +
>  xen/arch/x86/mm/p2m-ept.c          | 8 +++++++-
>  xen/include/asm-x86/hvm/vmx/vmcs.h | 4 +++-
>  xen/include/asm-x86/hvm/vmx/vmx.h  | 5 ++++-
>  4 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index d614638..2f645fe 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -103,6 +103,7 @@ static void __init vmx_display_features(void)
>      P(cpu_has_vmx_tpr_shadow, "APIC TPR shadow");
>      P(cpu_has_vmx_ept, "Extended Page Tables (EPT)");
>      P(cpu_has_vmx_vpid, "Virtual-Processor Identifiers (VPID)");
> +    P(cpu_has_vmx_ept_ad_bit, "EPT A/D bit");
>      P(cpu_has_vmx_vnmi, "Virtual NMI");
>      P(cpu_has_vmx_msr_bitmap, "MSR direct-access bitmap");
>      P(cpu_has_vmx_unrestricted_guest, "Unrestricted Guest");
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index c2d7720..8650092 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -233,6 +233,9 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
>      if ( !ept_set_middle_entry(p2m, &new_ept) )
>          return 0;
>  
> +    /* It's better to copy the A bit of the middle entry from the original entry */
> +    new_ept.a = ept_entry->a;

Surely d needs to be propagated as well?  Would it make sense to extend
ept_set_middle_entry() to do all of new_ept setup in one location?

> +
>      table = map_domain_page(new_ept.mfn);
>      trunk = 1UL << ((level - 1) * EPT_TABLE_ORDER);
>  
> @@ -244,7 +247,7 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
>          epte->sp = (level > 1);
>          epte->mfn += i * trunk;
>          epte->snp = (iommu_enabled && iommu_snoop);
> -        ASSERT(!epte->rsvd1);
> +        /* A/D bits are inherited from superpage */
>          ASSERT(!epte->avail3);
>  
>          ept_p2m_type_to_flags(epte, epte->sa_p2mt, epte->access);
> @@ -1071,6 +1074,9 @@ int ept_p2m_init(struct p2m_domain *p2m)
>      /* set EPT page-walk length, now it's actual walk length - 1, i.e. 3 */
>      ept->ept_wl = 3;
>  
> +    /* Enable EPT A/D bit if it's supported by hardware */
> +    ept->ept_ad = cpu_has_vmx_ept_ad_bit ? 1 : 0;

This will incur overhead on all EPT operations.  It should only be
enabled if pml is going to be in use.  (I think you need to reverse patches
1 and 2 in the series, and gate on pml_enable here.)
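
For illustration, the gating might look something like this (a sketch only;
it assumes pml_enable has already been cleared when hardware lacks PML
support, as suggested for patch 3):

    /* Only incur the A/D-tracking overhead when PML may actually be used */
    ept->ept_ad = pml_enable ? 1 : 0;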

> +
>      if ( !zalloc_cpumask_var(&ept->synced_mask) )
>          return -ENOMEM;
>  
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index 6fce6aa..4528346 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -62,7 +62,8 @@ struct ept_data {
>      struct {
>              u64 ept_mt :3,
>                  ept_wl :3,
> -                rsvd   :6,
> +                ept_ad :1,
> +                rsvd   :5,
>                  asr    :52;

While you are making this change, can you add comments similar to
ept_entry_t describing the bits?
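
For illustration, the commented layout might look like this (a sketch; bit
positions per the EPTP format in the Intel SDM):

    u64 ept_mt :3,   /* bits 2:0 - EPT memory type */
        ept_wl :3,   /* bits 5:3 - EPT page-walk length minus 1 */
        ept_ad :1,   /* bit 6 - Enable EPT A/D bits */
        rsvd   :5,   /* bits 11:7 - Reserved */
        asr    :52;  /* bits 63:12 - Address space root */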

>          };
>          u64 eptp;
> @@ -226,6 +227,7 @@ extern u32 vmx_secondary_exec_control;
>  #define VMX_EPT_INVEPT_INSTRUCTION              0x00100000
>  #define VMX_EPT_INVEPT_SINGLE_CONTEXT           0x02000000
>  #define VMX_EPT_INVEPT_ALL_CONTEXT              0x04000000
> +#define VMX_EPT_AD_BIT_SUPPORT                  0x00200000
>  
>  #define VMX_MISC_VMWRITE_ALL                    0x20000000
>  
> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
> index 91c5e18..9afd351 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> @@ -37,7 +37,8 @@ typedef union {
>          emt         :   3,  /* bits 5:3 - EPT Memory type */
>          ipat        :   1,  /* bit 6 - Ignore PAT memory type */
>          sp          :   1,  /* bit 7 - Is this a superpage? */
> -        rsvd1       :   2,  /* bits 9:8 - Reserved for future use */
> +        a           :   1,  /* bit 8 - Access bit */
> +        d           :   1,  /* bit 9 - Dirty bit */
>          recalc      :   1,  /* bit 10 - Software available 1 */
>          snp         :   1,  /* bit 11 - VT-d snoop control in shared
>                                 EPT/VT-d usage */
> @@ -261,6 +262,8 @@ extern uint8_t posted_intr_vector;
>      (vmx_ept_vpid_cap & VMX_EPT_SUPERPAGE_2MB)
>  #define cpu_has_vmx_ept_invept_single_context   \
>      (vmx_ept_vpid_cap & VMX_EPT_INVEPT_SINGLE_CONTEXT)
> +#define cpu_has_vmx_ept_ad_bit                  \
> +    (vmx_ept_vpid_cap & VMX_EPT_AD_BIT_SUPPORT)

I think cpu_has_vmx_ept_ad is sufficient, without the _bit suffix making
it longer.

~Andrew

>  
>  #define EPT_2MB_SHIFT     16
>  #define EPT_1GB_SHIFT     17

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 02/10] VMX: New parameter to control PML enabling
  2015-03-27  2:35 ` [PATCH 02/10] VMX: New parameter to control PML enabling Kai Huang
@ 2015-03-27 20:42   ` Andrew Cooper
  2015-03-30  6:16     ` Kai Huang
  2015-04-02  5:46     ` Kai Huang
  0 siblings, 2 replies; 65+ messages in thread
From: Andrew Cooper @ 2015-03-27 20:42 UTC (permalink / raw)
  To: Kai Huang, jbeulich, tim, kevin.tian, yang.z.zhang, xen-devel

On 27/03/15 02:35, Kai Huang wrote:
> A top-level EPT parameter "ept=<options>" and a sub-boolean "pml_enable" are
> added to control PML. Further booleans can be added for other EPT-related
> features.
>
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>

Please patch docs/misc/xen-command-line.markdown as well.  See the
existing "psr" option as a similar example.

Also, as indicated in patch 1, I think patches 1 and 2 need swapping in
the series.

> ---
>  xen/arch/x86/hvm/vmx/vmcs.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 2f645fe..9b20a4b 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -50,6 +50,16 @@ boolean_param("unrestricted_guest", opt_unrestricted_guest_enabled);
>  static bool_t __read_mostly opt_apicv_enabled = 1;
>  boolean_param("apicv", opt_apicv_enabled);
>  
> +static void parse_ept_param(char *s);
> +/*
> + * The 'ept' parameter controls functionality that depends on, or impacts,
> + * the EPT mechanism. An optional comma-separated value may contain:
> + *
> + *  pml                 Enable PML
> + */
> +custom_param("ept", parse_ept_param);

It is common to put the custom_param() call below parse_ept_param() so
you don't need to forward-declare the function.  The comment can happily
live at the top of parse_ept_param().
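
A minimal sketch of that conventional layout (the parse body is unchanged from
the quoted patch and assumes pml_enable is declared above it; the __init
annotation reflects the comment on this function further down):

    /*
     * The 'ept' parameter controls functionality that depends on, or
     * impacts, the EPT mechanism. An optional comma-separated value may
     * contain:
     *
     *  pml                 Enable PML
     */
    static void __init parse_ept_param(char *s)
    {
        char *ss;
        int val;

        do {
            val = !!strncmp(s, "no-", 3);
            if ( !val )
                s += 3;

            ss = strchr(s, ',');
            if ( ss )
                *ss = '\0';

            if ( !strcmp(s, "pml") )
                pml_enable = val;

            s = ss + 1;
        } while ( ss );
    }
    custom_param("ept", parse_ept_param);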

> +static bool_t __read_mostly pml_enable = 0;
> +
>  /*
>   * These two parameters are used to config the controls for Pause-Loop Exiting:
>   * ple_gap:    upper bound on the amount of time between two successive
> @@ -92,6 +102,28 @@ DEFINE_PER_CPU(bool_t, vmxon);
>  static u32 vmcs_revision_id __read_mostly;
>  u64 __read_mostly vmx_basic_msr;
>  
> +/* Copied from parse_iommu_param */

Not a useful comment, as it is likely to diverge in the future.

> +static void parse_ept_param(char *s)

__init

~Andrew

> +{
> +    char *ss;
> +    int val;
> +
> +    do {
> +        val = !!strncmp(s, "no-", 3);
> +        if ( !val )
> +            s += 3;
> +
> +        ss = strchr(s, ',');
> +        if ( ss )
> +            *ss = '\0';
> +
> +        if ( !strcmp(s, "pml") )
> +            pml_enable = val;
> +
> +        s = ss + 1;
> +    } while ( ss );
> +}
> +
>  static void __init vmx_display_features(void)
>  {
>      int printed = 0;

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 03/10] VMX: Add PML definition and feature detection.
  2015-03-27  2:35 ` [PATCH 03/10] VMX: Add PML definition and feature detection Kai Huang
@ 2015-03-27 20:46   ` Andrew Cooper
  2015-03-30  6:18     ` Kai Huang
  0 siblings, 1 reply; 65+ messages in thread
From: Andrew Cooper @ 2015-03-27 20:46 UTC (permalink / raw)
  To: Kai Huang, jbeulich, tim, kevin.tian, yang.z.zhang, xen-devel

On 27/03/15 02:35, Kai Huang wrote:
> The patch adds the PML definitions and feature detection. Note PML won't be
> detected if it has been disabled via boot parameter. PML is also disabled in
> construct_vmcs, as it will only be enabled when the domain is switched to
> log-dirty mode.
>
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmcs.c        | 18 ++++++++++++++++++
>  xen/include/asm-x86/hvm/vmx/vmcs.h |  5 +++++
>  xen/include/asm-x86/hvm/vmx/vmx.h  |  1 +
>  3 files changed, 24 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 9b20a4b..2798b0b 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -143,6 +143,7 @@ static void __init vmx_display_features(void)
>      P(cpu_has_vmx_virtual_intr_delivery, "Virtual Interrupt Delivery");
>      P(cpu_has_vmx_posted_intr_processing, "Posted Interrupt Processing");
>      P(cpu_has_vmx_vmcs_shadowing, "VMCS shadowing");
> +    P(cpu_has_vmx_pml, "Page Modification Logging");
>  #undef P
>  
>      if ( !printed )
> @@ -240,6 +241,8 @@ static int vmx_init_vmcs_config(void)
>              opt |= SECONDARY_EXEC_ENABLE_VPID;
>          if ( opt_unrestricted_guest_enabled )
>              opt |= SECONDARY_EXEC_UNRESTRICTED_GUEST;
> +        if ( pml_enable )

This should be named opt_pml_enable in patch 1 or 2 to identify that it
is a command line option.

> +            opt |= SECONDARY_EXEC_ENABLE_PML;
>  
>          /*
>           * "APIC Register Virtualization" and "Virtual Interrupt Delivery"
> @@ -286,6 +289,14 @@ static int vmx_init_vmcs_config(void)
>           */
>          if ( !(_vmx_ept_vpid_cap & VMX_VPID_INVVPID_ALL_CONTEXT) )
>              _vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VPID;
> +
> +	/*
> +	 * PML cannot be supported if EPT A/D bits is not supported. Actually,
> +	 * PML cannot be supported if EPT A/D bits are not supported. In fact,
> +	 * PML should not even be advertised if EPT A/D bits are unavailable,
> +	 * but we check anyway to be safe.
> +	if ( !(_vmx_ept_vpid_cap & VMX_EPT_AD_BIT_SUPPORT) )
> +		_vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
>      }
>  
>      if ( _vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT )
> @@ -306,6 +317,10 @@ static int vmx_init_vmcs_config(void)
>                    SECONDARY_EXEC_UNRESTRICTED_GUEST);
>      }
>  
> +    /* PML cannot be supported if we don't use EPT */
> +    if ( !(_vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT) )
> +        _vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
> +

Somewhere in here you should clear pml_enable if hardware doesn't
support it.
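
Perhaps something like (a sketch only):

    /* Keep the command-line option in sync with what hardware provides */
    if ( !(_vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_PML) )
        pml_enable = 0;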

~Andrew

>      if ( (_vmx_secondary_exec_control & SECONDARY_EXEC_PAUSE_LOOP_EXITING) &&
>            ple_gap == 0 )
>      {
> @@ -1041,6 +1056,9 @@ static int construct_vmcs(struct vcpu *v)
>          __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR, posted_intr_vector);
>      }
>  
> +    /* Disable PML anyway here as it will only be enabled in log dirty mode */
> +    v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
> +
>      /* Host data selectors. */
>      __vmwrite(HOST_SS_SELECTOR, __HYPERVISOR_DS);
>      __vmwrite(HOST_DS_SELECTOR, __HYPERVISOR_DS);
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index 4528346..47b4df2 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -216,6 +216,7 @@ extern u32 vmx_vmentry_control;
>  #define SECONDARY_EXEC_ENABLE_INVPCID           0x00001000
>  #define SECONDARY_EXEC_ENABLE_VMFUNC            0x00002000
>  #define SECONDARY_EXEC_ENABLE_VMCS_SHADOWING    0x00004000
> +#define SECONDARY_EXEC_ENABLE_PML               0x00020000
>  extern u32 vmx_secondary_exec_control;
>  
>  #define VMX_EPT_EXEC_ONLY_SUPPORTED             0x00000001
> @@ -276,6 +277,8 @@ extern u32 vmx_secondary_exec_control;
>      (vmx_pin_based_exec_control & PIN_BASED_POSTED_INTERRUPT)
>  #define cpu_has_vmx_vmcs_shadowing \
>      (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VMCS_SHADOWING)
> +#define cpu_has_vmx_pml \
> +    (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_PML)
>  
>  #define VMCS_RID_TYPE_MASK              0x80000000
>  
> @@ -320,6 +323,7 @@ enum vmcs_field {
>      GUEST_LDTR_SELECTOR             = 0x0000080c,
>      GUEST_TR_SELECTOR               = 0x0000080e,
>      GUEST_INTR_STATUS               = 0x00000810,
> +    GUEST_PML_INDEX                 = 0x00000812,
>      HOST_ES_SELECTOR                = 0x00000c00,
>      HOST_CS_SELECTOR                = 0x00000c02,
>      HOST_SS_SELECTOR                = 0x00000c04,
> @@ -333,6 +337,7 @@ enum vmcs_field {
>      VM_EXIT_MSR_STORE_ADDR          = 0x00002006,
>      VM_EXIT_MSR_LOAD_ADDR           = 0x00002008,
>      VM_ENTRY_MSR_LOAD_ADDR          = 0x0000200a,
> +    PML_ADDRESS                     = 0x0000200e,
>      TSC_OFFSET                      = 0x00002010,
>      VIRTUAL_APIC_PAGE_ADDR          = 0x00002012,
>      APIC_ACCESS_ADDR                = 0x00002014,
> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
> index 9afd351..c0e352d 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> @@ -186,6 +186,7 @@ static inline unsigned long pi_get_pir(struct pi_desc *pi_desc, int group)
>  #define EXIT_REASON_XSETBV              55
>  #define EXIT_REASON_APIC_WRITE          56
>  #define EXIT_REASON_INVPCID             58
> +#define EXIT_REASON_PML_FULL            62
>  
>  /*
>   * Interruption-information format

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 04/10] VMX: New data structure member to support PML
  2015-03-27  2:35 ` [PATCH 04/10] VMX: New data structure member to support PML Kai Huang
@ 2015-03-27 20:48   ` Andrew Cooper
  2015-03-30  6:19     ` Kai Huang
  0 siblings, 1 reply; 65+ messages in thread
From: Andrew Cooper @ 2015-03-27 20:48 UTC (permalink / raw)
  To: Kai Huang, jbeulich, tim, kevin.tian, yang.z.zhang, xen-devel

On 27/03/15 02:35, Kai Huang wrote:
> A new 4K page pointer is added to arch_vmx_struct as the PML buffer for the
> vcpu. And a new 'status' field is added to vmx_domain to indicate whether PML
> is enabled for the domain or not. The 'status' field can also be used for
> further similar purposes.
>
> Note that neither new member needs to be explicitly zero-initialized, as both
> the vcpu and domain structures are zeroed when they are created.
>
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> ---
>  xen/include/asm-x86/hvm/vmx/vmcs.h | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index 47b4df2..8cc1122 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -71,8 +71,12 @@ struct ept_data {
>      cpumask_var_t synced_mask;
>  };
>  
> +#define _VMX_DOMAIN_PML_ENABLED    0
> +#define VMX_DOMAIN_PML_ENABLED     (1ul << _VMX_DOMAIN_PML_ENABLED)
>  struct vmx_domain {
>      unsigned long apic_access_mfn;
> +    /* VMX_DOMAIN_* */
> +    unsigned long status;
>  };
>  
>  struct pi_desc {
> @@ -143,6 +147,9 @@ struct arch_vmx_struct {
>      /* Bitmap to control vmexit policy for Non-root VMREAD/VMWRITE */
>      struct page_info     *vmread_bitmap;
>      struct page_info     *vmwrite_bitmap;
> +
> +#define PML_ENTITY_NUM      512

This is the number of pml entries, not entities.  NR_PML_ENTRIES perhaps?

> +    struct page_info	*pml_pg;

Please align the fields vertically like vmwrite_bitmap.

~Andrew

>  };
>  
>  int vmx_create_vmcs(struct vcpu *v);

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 05/10] VMX: add helper functions to support PML
  2015-03-27  2:35 ` [PATCH 05/10] VMX: add helper functions " Kai Huang
@ 2015-03-27 21:09   ` Andrew Cooper
  2015-03-30  6:43     ` Kai Huang
  2015-04-09 12:00   ` Tim Deegan
  2015-04-09 12:31   ` Tim Deegan
  2 siblings, 1 reply; 65+ messages in thread
From: Andrew Cooper @ 2015-03-27 21:09 UTC (permalink / raw)
  To: Kai Huang, jbeulich, tim, kevin.tian, yang.z.zhang, xen-devel

On 27/03/15 02:35, Kai Huang wrote:
> This patch adds helper functions to enable/disable PML, and to flush the PML
> buffer, for a single vcpu and for a particular domain, for further use.
>
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmcs.c        | 190 +++++++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/hvm/vmx/vmcs.h |   9 ++
>  2 files changed, 199 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 2798b0b..17cbef4 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -1326,6 +1326,196 @@ void vmx_clear_eoi_exit_bitmap(struct vcpu *v, u8 vector)
>                  &v->arch.hvm_vmx.eoi_exitmap_changed);
>  }
>  
> +int vmx_vcpu_pml_enabled(struct vcpu *v)

bool_t vmx_vcpu_pml_enabled(const struct vcpu *v)

> +{
> +    return (v->arch.hvm_vmx.secondary_exec_control &
> +            SECONDARY_EXEC_ENABLE_PML) ? 1 : 0;

This would be slightly shorter as
!!(v->arch.hvm_vmx.secondary_exec_control & SECONDARY_EXEC_ENABLE_PML)

> +}
> +
> +int vmx_vcpu_enable_pml(struct vcpu *v)
> +{
> +    struct domain *d = v->domain;
> +
> +    ASSERT(!vmx_vcpu_pml_enabled(v));
> +
> +    v->arch.hvm_vmx.pml_pg = d->arch.paging.alloc_page(d);
> +    if ( !v->arch.hvm_vmx.pml_pg )
> +        return -ENOMEM;
> +
> +    vmx_vmcs_enter(v);
> +
> +    __vmwrite(PML_ADDRESS, page_to_mfn(v->arch.hvm_vmx.pml_pg) << PAGE_SHIFT);
> +    __vmwrite(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
> +
> +    v->arch.hvm_vmx.secondary_exec_control |= SECONDARY_EXEC_ENABLE_PML;
> +
> +    __vmwrite(SECONDARY_VM_EXEC_CONTROL,
> +            v->arch.hvm_vmx.secondary_exec_control);

Alignment.

> +
> +    vmx_vmcs_exit(v);
> +
> +    return 0;
> +}
> +
> +void vmx_vcpu_disable_pml(struct vcpu *v)
> +{
> +    ASSERT(vmx_vcpu_pml_enabled(v));
> +
> +    vmx_vmcs_enter(v);
> +
> +    v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
> +    __vmwrite(SECONDARY_VM_EXEC_CONTROL,
> +            v->arch.hvm_vmx.secondary_exec_control);
> +
> +    vmx_vmcs_exit(v);
> +
> +    v->domain->arch.paging.free_page(v->domain, v->arch.hvm_vmx.pml_pg);
> +    v->arch.hvm_vmx.pml_pg = NULL;
> +}
> +
> +void vmx_vcpu_flush_pml_buffer(struct vcpu *v)
> +{
> +    uint64_t *pml_buf;
> +    unsigned long pml_idx;
> +
> +    ASSERT(vmx_vcpu_pml_enabled(v));
> +
> +    vmx_vmcs_enter(v);
> +
> +    __vmread(GUEST_PML_INDEX, &pml_idx);
> +
> +    /* Do nothing if PML buffer is empty */
> +    if ( pml_idx == (PML_ENTITY_NUM - 1) )
> +        goto out;
> +
> +    pml_buf = map_domain_page(page_to_mfn(v->arch.hvm_vmx.pml_pg));

__map_domain_page() is a wrapper which takes a struct page_info
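
i.e. (a sketch):

    pml_buf = __map_domain_page(v->arch.hvm_vmx.pml_pg);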

> +
> +    /*
> +     * PML index can be either 2^16-1 (buffer is full), or 0~511 (buffer is
> +     * not full), and in the latter case the PML index always points to the
> +     * next available entry.
> +     */
> +    if (pml_idx >= PML_ENTITY_NUM)
> +        pml_idx = 0;
> +    else
> +        pml_idx++;
> +
> +    for ( ; pml_idx < PML_ENTITY_NUM; pml_idx++ )
> +    {
> +        struct p2m_domain *p2m = p2m_get_hostp2m(v->domain);

This p2m_get_hostp2m() call should be hoisted out of the loop.
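
i.e. declared once before the loop (a sketch):

    struct p2m_domain *p2m = p2m_get_hostp2m(v->domain);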

> +        unsigned long gfn;
> +        mfn_t mfn;
> +        p2m_type_t t;
> +        p2m_access_t a;
> +
> +        gfn = pml_buf[pml_idx] >> PAGE_SHIFT;
> +        mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL);
> +        if ( mfn_x(mfn) == INVALID_MFN )
> +        {
> +            /*
> +             * Either the EPT entry mapping the GFN has been destroyed, or
> +             * there's something wrong with hardware behavior; in either
> +             * case we should report a warning.
> +             */
> +            dprintk(XENLOG_WARNING, "PML: vcpu %d: invalid GPA 0x%lx logged\n",
> +                    v->vcpu_id, pml_buf[pml_idx]);

It would be shorter to log gfn rather than gpa.

> +            continue;
> +        }
> +
> +        /*
> +         * Need to change type from log-dirty to normal memory for logged GFN.
> +         * hap_track_dirty_vram depends on it to work. And we really only need
> +         * to mark GFNs which have been successfully changed from log-dirty to
> +         * normal memory to be dirty.
> +         */
> +        if ( !p2m_change_type_one(v->domain, gfn, p2m_ram_logdirty,
> +                    p2m_ram_rw) )
> +            paging_mark_dirty(v->domain, mfn_x(mfn));
> +    }
> +
> +    unmap_domain_page(pml_buf);
> +
> +    /* Reset PML index */
> +    __vmwrite(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
> +
> +out:
> +    vmx_vmcs_exit(v);
> +}
> +
> +int vmx_domain_pml_enabled(struct domain *d)

bool_t and const as per vcpu variant.

> +{
> +    return (d->arch.hvm_domain.vmx.status & VMX_DOMAIN_PML_ENABLED) ? 1 : 0;
> +}
> +
> +/*
> + * This function enables PML for a particular domain. It should be called
> + * when the domain is paused.

In which case assert that the domain is paused, or call domain_pause()
yourself to take an extra pause refcount.
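
For example, from a hypothetical call site (a sketch of the second option;
rc is illustrative):

    domain_pause(d);
    rc = vmx_domain_enable_pml(d);
    domain_unpause(d);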

> + *
> + * PML needs to be enabled globally for all vcpus of the domain, as the PML
> + * buffer and PML index are per-vcpu, but the EPT table is shared by all
> + * vcpus; therefore enabling PML on only some vcpus won't work.
> + */
> +int vmx_domain_enable_pml(struct domain *d)
> +{
> +    struct vcpu *v;
> +
> +    ASSERT(!vmx_domain_pml_enabled(d));
> +
> +    for_each_vcpu( d, v )
> +    {
> +        if ( vmx_vcpu_enable_pml(v) )
> +            goto error;

Please catch the actual rc from vmx_vcpu_enable_pml() and propagate out
of this function, rather than clobbering -ENOMEM with -EINVAL.

Also, per Xen style, you can drop the braces.
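
Combining both suggestions, the enable path might become (a sketch):

    int rc = 0;

    for_each_vcpu( d, v )
        if ( (rc = vmx_vcpu_enable_pml(v)) != 0 )
            goto error;

    d->arch.hvm_domain.vmx.status |= VMX_DOMAIN_PML_ENABLED;

    return 0;

 error:
    for_each_vcpu( d, v )
        if ( vmx_vcpu_pml_enabled(v) )
            vmx_vcpu_disable_pml(v);

    return rc;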

~Andrew

> +    }
> +
> +    d->arch.hvm_domain.vmx.status |= VMX_DOMAIN_PML_ENABLED;
> +
> +    return 0;
> +
> +error:
> +    for_each_vcpu( d, v )
> +    {
> +        if ( vmx_vcpu_pml_enabled(v) )
> +            vmx_vcpu_disable_pml(v);
> +    }
> +    return -EINVAL;
> +}
> +
> +/*
> + * Disable PML for a particular domain. Called when the domain is paused.
> + *
> + * As with enabling PML for a domain, disabling PML should be done for all
> + * vcpus at once.
> + */
> +void vmx_domain_disable_pml(struct domain *d)
> +{
> +    struct vcpu *v;
> +
> +    ASSERT(vmx_domain_pml_enabled(d));
> +
> +    for_each_vcpu( d, v )
> +    {
> +        vmx_vcpu_disable_pml(v);
> +    }
> +
> +    d->arch.hvm_domain.vmx.status &= ~VMX_DOMAIN_PML_ENABLED;
> +}
> +
> +/*
> + * Flush the PML buffers of all vcpus, and update the logged dirty pages in
> + * the log-dirty radix tree. Called when the domain is paused.
> + */
> +void vmx_domain_flush_pml_buffers(struct domain *d)
> +{
> +    struct vcpu *v;
> +
> +    ASSERT(vmx_domain_pml_enabled(d));
> +
> +    for_each_vcpu( d, v )
> +    {
> +        vmx_vcpu_flush_pml_buffer(v);
> +    }
> +}
> +
>  int vmx_create_vmcs(struct vcpu *v)
>  {
>      struct arch_vmx_struct *arch_vmx = &v->arch.hvm_vmx;
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index 8cc1122..939d097 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -499,6 +499,15 @@ static inline int vmx_add_host_load_msr(u32 msr)
>  
>  DECLARE_PER_CPU(bool_t, vmxon);
>  
> +int vmx_vcpu_pml_enabled(struct vcpu *v);
> +int vmx_vcpu_enable_pml(struct vcpu *v);
> +void vmx_vcpu_disable_pml(struct vcpu *v);
> +void vmx_vcpu_flush_pml_buffer(struct vcpu *v);
> +int vmx_domain_pml_enabled(struct domain *d);
> +int vmx_domain_enable_pml(struct domain *d);
> +void vmx_domain_disable_pml(struct domain *d);
> +void vmx_domain_flush_pml_buffers(struct domain *d);
> +
>  #endif /* ASM_X86_HVM_VMX_VMCS_H__ */
>  
>  /*

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 07/10] VMX: handle PML enabling in vmx_vcpu_initialise
  2015-03-27  2:35 ` [PATCH 07/10] VMX: handle PML enabling in vmx_vcpu_initialise Kai Huang
@ 2015-03-27 21:12   ` Andrew Cooper
  2015-03-30  7:03     ` Kai Huang
  0 siblings, 1 reply; 65+ messages in thread
From: Andrew Cooper @ 2015-03-27 21:12 UTC (permalink / raw)
  To: Kai Huang, jbeulich, tim, kevin.tian, yang.z.zhang, xen-devel

On 27/03/15 02:35, Kai Huang wrote:
> It's possible the domain is already in log-dirty mode when a vcpu is created,
> in which case we should enable PML for this vcpu if PML has been enabled for
> the domain.
>
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 453bcc5..fce3aa2 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -116,6 +116,30 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>          return rc;
>      }
>  
> +    /*
> +     * It's rare but still possible that the domain is already in log-dirty
> +     * mode when the vcpu is being created (as noted by Tim), in which case
> +     * we should enable PML for this vcpu if PML has been enabled for the
> +     * domain, and failure to enable results in failure to create this vcpu.
> +     *
> +     * Note that even if no vcpu has been created for the domain,
> +     * vmx_domain_enable_pml will return success, in which case
> +     * vmx_domain_pml_enabled will also return true. And even if this is the
> +     * first vcpu to be created with vmx_domain_pml_enabled being true,
> +     * failure to enable PML still results in failure to create the vcpu, to
> +     * avoid the complicated logic needed to revert a PML-style EPT table to
> +     * a non-PML-style one.
> +     */
> +    if ( vmx_domain_pml_enabled(v->domain) )
> +    {
> +        if ( (rc = vmx_vcpu_enable_pml(v)) != 0 )

Given the comment here, is the assertion in the top of
vmx_vcpu_enable_pml() liable to trip?

> +        {
> +            dprintk(XENLOG_ERR, "Failed to enable PML for vcpu %d\n",
> +                    v->vcpu_id);

Please use %pv to identify the domain as well as vcpu.
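
e.g. (a sketch):

    dprintk(XENLOG_ERR, "%pv: Failed to enable PML\n", v);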

~Andrew

> +            vmx_destroy_vmcs(v);
> +            return rc;
> +        }
> +    }
> +
>      vpmu_initialise(v);
>  
>      vmx_install_vlapic_mapping(v);

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 00/10] PML (Paging Modification Logging) support
  2015-03-27  2:35 [PATCH 00/10] PML (Paging Modification Logging) support Kai Huang
                   ` (9 preceding siblings ...)
  2015-03-27  2:35 ` [PATCH 10/10] p2m/ept: Enable PML in p2m-ept for log-dirty Kai Huang
@ 2015-03-27 21:26 ` Andrew Cooper
  2015-03-30  5:50   ` Kai Huang
  10 siblings, 1 reply; 65+ messages in thread
From: Andrew Cooper @ 2015-03-27 21:26 UTC (permalink / raw)
  To: Kai Huang, jbeulich, tim, kevin.tian, yang.z.zhang, xen-devel

On 27/03/15 02:35, Kai Huang wrote:
> Hi all,
>
> This patch series adds PML support to Xen. Please kindly help to review it.

Overall this looks like a very good series, and it is particularly
helpful given the level of commenting.

Which platforms is/will PML be available for?

~Andrew

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 00/10] PML (Paging Modification Logging) support
  2015-03-27 21:26 ` [PATCH 00/10] PML (Paging Modification Logging) support Andrew Cooper
@ 2015-03-30  5:50   ` Kai Huang
  2015-04-07  8:30     ` Kai Huang
  0 siblings, 1 reply; 65+ messages in thread
From: Kai Huang @ 2015-03-30  5:50 UTC (permalink / raw)
  To: Andrew Cooper, jbeulich, tim, kevin.tian, yang.z.zhang, xen-devel



On 03/28/2015 05:26 AM, Andrew Cooper wrote:
> On 27/03/15 02:35, Kai Huang wrote:
>> Hi all,
>>
>> This patch series adds PML support to Xen. Please kindly help to review it.
> Overall this looks like a very good series, and it is particularly
> helpful given the level of commenting.
>
> Which platforms is/will PML be available for?
Hi Andrew,

Thanks for your quick review. PML will be available from Intel's 
"Broadwell server" platform.

Thanks,
-Kai
>
> ~Andrew

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 01/10] VMX: Enable EPT A/D bit support
  2015-03-27 20:38   ` Andrew Cooper
@ 2015-03-30  6:11     ` Kai Huang
  2015-03-30  9:36       ` Andrew Cooper
  2015-04-02  6:32     ` Kai Huang
  1 sibling, 1 reply; 65+ messages in thread
From: Kai Huang @ 2015-03-30  6:11 UTC (permalink / raw)
  To: Andrew Cooper, jbeulich, tim, kevin.tian, yang.z.zhang, xen-devel



On 03/28/2015 04:38 AM, Andrew Cooper wrote:
> On 27/03/15 02:35, Kai Huang wrote:
>> PML requires A/D bit support so enable it for further use.
>>
>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>> ---
>>   xen/arch/x86/hvm/vmx/vmcs.c        | 1 +
>>   xen/arch/x86/mm/p2m-ept.c          | 8 +++++++-
>>   xen/include/asm-x86/hvm/vmx/vmcs.h | 4 +++-
>>   xen/include/asm-x86/hvm/vmx/vmx.h  | 5 ++++-
>>   4 files changed, 15 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>> index d614638..2f645fe 100644
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -103,6 +103,7 @@ static void __init vmx_display_features(void)
>>       P(cpu_has_vmx_tpr_shadow, "APIC TPR shadow");
>>       P(cpu_has_vmx_ept, "Extended Page Tables (EPT)");
>>       P(cpu_has_vmx_vpid, "Virtual-Processor Identifiers (VPID)");
>> +    P(cpu_has_vmx_ept_ad_bit, "EPT A/D bit");
>>       P(cpu_has_vmx_vnmi, "Virtual NMI");
>>       P(cpu_has_vmx_msr_bitmap, "MSR direct-access bitmap");
>>       P(cpu_has_vmx_unrestricted_guest, "Unrestricted Guest");
>> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
>> index c2d7720..8650092 100644
>> --- a/xen/arch/x86/mm/p2m-ept.c
>> +++ b/xen/arch/x86/mm/p2m-ept.c
>> @@ -233,6 +233,9 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
>>       if ( !ept_set_middle_entry(p2m, &new_ept) )
>>           return 0;
>>   
>> +    /* It's better to copy the A bit of the middle entry from the original entry */
>> +    new_ept.a = ept_entry->a;
> Surely d needs to be propagated as well?
No, it's not necessary. The D bit is not defined in middle-level EPT
entries; only leaf entries have a D-bit definition.
> Would it make sense to extend
> ept_set_middle_entry() to do all of new_ept setup in one location?
Yes, it certainly makes sense to move the A-bit propagation into
ept_set_middle_entry, but this also requires passing the original EPT entry
pointer to ept_set_middle_entry as an additional parameter. And
ept_set_middle_entry is also called by ept_next_level, so changing it
requires a larger code change, something like below. I am fine with either;
which solution do you prefer?

+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -208,7 +208,8 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
  #define GUEST_TABLE_POD_PAGE    3

  /* Fill in middle levels of ept table */
-static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry)
+static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t *new_entry,
+        ept_entry_t *ori_entry)
  {
      struct page_info *pg;

@@ -216,11 +217,13 @@ static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry)
      if ( pg == NULL )
          return 0;

-    ept_entry->epte = 0;
-    ept_entry->mfn = page_to_mfn(pg);
-    ept_entry->access = p2m->default_access;
+    new_entry->epte = 0;
+    new_entry->mfn = page_to_mfn(pg);
+    new_entry->access = p2m->default_access;

-    ept_entry->r = ept_entry->w = ept_entry->x = 1;
+    new_entry->r = new_entry->w = new_entry->x = 1;
+
+    new_entry->a = ori_entry->a;

      return 1;
  }
@@ -257,7 +260,7 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,

      ASSERT(is_epte_superpage(ept_entry));

-    if ( !ept_set_middle_entry(p2m, &new_ept) )
+    if ( !ept_set_middle_entry(p2m, &new_ept, ept_entry) )
          return 0;

      table = map_domain_page(new_ept.mfn);
@@ -337,7 +340,7 @@ static int ept_next_level(struct p2m_domain *p2m, bool_t read_only,
          if ( read_only )
              return GUEST_TABLE_MAP_FAILED;

-        if ( !ept_set_middle_entry(p2m, ept_entry) )
+        if ( !ept_set_middle_entry(p2m, ept_entry, &e) )
              return GUEST_TABLE_MAP_FAILED;
          else
              e = atomic_read_ept_entry(ept_entry); /* Refresh */
>
>> +
>>       table = map_domain_page(new_ept.mfn);
>>       trunk = 1UL << ((level - 1) * EPT_TABLE_ORDER);
>>   
>> @@ -244,7 +247,7 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
>>           epte->sp = (level > 1);
>>           epte->mfn += i * trunk;
>>           epte->snp = (iommu_enabled && iommu_snoop);
>> -        ASSERT(!epte->rsvd1);
>> +        /* A/D bits are inherited from superpage */
>>           ASSERT(!epte->avail3);
>>   
>>           ept_p2m_type_to_flags(epte, epte->sa_p2mt, epte->access);
>> @@ -1071,6 +1074,9 @@ int ept_p2m_init(struct p2m_domain *p2m)
>>       /* set EPT page-walk length, now it's actual walk length - 1, i.e. 3 */
>>       ept->ept_wl = 3;
>>   
>> +    /* Enable EPT A/D bit if it's supported by hardware */
>> +    ept->ept_ad = cpu_has_vmx_ept_ad_bit ? 1 : 0;
> This will incur overhead on all EPT operations.  It should only be
> enabled if pml is going to be in use.  (I think you need to reverse patches
> 1 and 2 in the series, and gate on pml_enable here.)
Sure. Will do.

>
>> +
>>       if ( !zalloc_cpumask_var(&ept->synced_mask) )
>>           return -ENOMEM;
>>   
>> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> index 6fce6aa..4528346 100644
>> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
>> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> @@ -62,7 +62,8 @@ struct ept_data {
>>       struct {
>>               u64 ept_mt :3,
>>                   ept_wl :3,
>> -                rsvd   :6,
>> +                ept_ad :1,
>> +                rsvd   :5,
>>                   asr    :52;
> While you are making this change, can you add comments similar to
> ept_entry_t describing the bits?
Sure. I can add a comment for the 'ept_ad' bit here. I didn't do it as
there are no comments for the other bits either.

>
>>           };
>>           u64 eptp;
>> @@ -226,6 +227,7 @@ extern u32 vmx_secondary_exec_control;
>>   #define VMX_EPT_INVEPT_INSTRUCTION              0x00100000
>>   #define VMX_EPT_INVEPT_SINGLE_CONTEXT           0x02000000
>>   #define VMX_EPT_INVEPT_ALL_CONTEXT              0x04000000
>> +#define VMX_EPT_AD_BIT_SUPPORT                  0x00200000
>>   
>>   #define VMX_MISC_VMWRITE_ALL                    0x20000000
>>   
>> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
>> index 91c5e18..9afd351 100644
>> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
>> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
>> @@ -37,7 +37,8 @@ typedef union {
>>           emt         :   3,  /* bits 5:3 - EPT Memory type */
>>           ipat        :   1,  /* bit 6 - Ignore PAT memory type */
>>           sp          :   1,  /* bit 7 - Is this a superpage? */
>> -        rsvd1       :   2,  /* bits 9:8 - Reserved for future use */
>> +        a           :   1,  /* bit 8 - Access bit */
>> +        d           :   1,  /* bit 9 - Dirty bit */
>>           recalc      :   1,  /* bit 10 - Software available 1 */
>>           snp         :   1,  /* bit 11 - VT-d snoop control in shared
>>                                  EPT/VT-d usage */
>> @@ -261,6 +262,8 @@ extern uint8_t posted_intr_vector;
>>       (vmx_ept_vpid_cap & VMX_EPT_SUPERPAGE_2MB)
>>   #define cpu_has_vmx_ept_invept_single_context   \
>>       (vmx_ept_vpid_cap & VMX_EPT_INVEPT_SINGLE_CONTEXT)
>> +#define cpu_has_vmx_ept_ad_bit                  \
>> +    (vmx_ept_vpid_cap & VMX_EPT_AD_BIT_SUPPORT)
> I think cpu_has_vmx_ept_ad is sufficient, without the _bit suffix making
> it longer.
Sure. Will do.

Thanks,
-Kai
>
> ~Andrew
>
>>   
>>   #define EPT_2MB_SHIFT     16
>>   #define EPT_1GB_SHIFT     17

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 02/10] VMX: New parameter to control PML enabling
  2015-03-27 20:42   ` Andrew Cooper
@ 2015-03-30  6:16     ` Kai Huang
  2015-04-02  5:46     ` Kai Huang
  1 sibling, 0 replies; 65+ messages in thread
From: Kai Huang @ 2015-03-30  6:16 UTC (permalink / raw)
  To: Andrew Cooper, jbeulich, tim, kevin.tian, yang.z.zhang, xen-devel



On 03/28/2015 04:42 AM, Andrew Cooper wrote:
> On 27/03/15 02:35, Kai Huang wrote:
>> A top-level EPT parameter "ept=<options>" and a sub-boolean "pml_enable" are
>> added to control PML. Further booleans can be added for other EPT-related
>> features.
>>
>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> Please patch docs/misc/xen-command-line.markdown as well.  See the
> existing "psr" option as a similar example.
Will do. Thanks for pointing that out.

>
> Also, as indicated in patch 1, I think patches 1 and 2 need swapping in
> the series.
Sure.

>
>> ---
>>   xen/arch/x86/hvm/vmx/vmcs.c | 32 ++++++++++++++++++++++++++++++++
>>   1 file changed, 32 insertions(+)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>> index 2f645fe..9b20a4b 100644
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -50,6 +50,16 @@ boolean_param("unrestricted_guest", opt_unrestricted_guest_enabled);
>>   static bool_t __read_mostly opt_apicv_enabled = 1;
>>   boolean_param("apicv", opt_apicv_enabled);
>>   
>> +static void parse_ept_param(char *s);
>> +/*
>> + * The 'ept' parameter controls functionalities that depend on, or impact the
>> + * EPT mechanism. Optional comma separated value may contain:
>> + *
>> + *  pml                 Enable PML
>> + */
>> +custom_param("ept", parse_ept_param);
> It is common to put the custom_param() call below parse_ept_param() so
> you don't need to forward-declare the function.  The comment can happily
> live at the top of parse_ept_param().
Will do.

>
>> +static bool_t __read_mostly pml_enable = 0;
>> +
>>   /*
>>    * These two parameters are used to config the controls for Pause-Loop Exiting:
>>    * ple_gap:    upper bound on the amount of time between two successive
>> @@ -92,6 +102,28 @@ DEFINE_PER_CPU(bool_t, vmxon);
>>   static u32 vmcs_revision_id __read_mostly;
>>   u64 __read_mostly vmx_basic_msr;
>>   
>> +/* Copied from parse_iommu_param */
> Not a useful comment, as it is likely to diverge in the future.
I will move the comment for 'custom_param("ept", parse_ept_param)' here, as
you suggested above, and this unhelpful comment can then be dropped.

>
>> +static void parse_ept_param(char *s)
> __init
Will do.

Thanks,
-Kai
>
> ~Andrew
>
>> +{
>> +    char *ss;
>> +    int val;
>> +
>> +    do {
>> +        val = !!strncmp(s, "no-", 3);
>> +        if ( !val )
>> +            s += 3;
>> +
>> +        ss = strchr(s, ',');
>> +        if ( ss )
>> +            *ss = '\0';
>> +
>> +        if ( !strcmp(s, "pml") )
>> +            pml_enable = val;
>> +
>> +        s = ss + 1;
>> +    } while ( ss );
>> +}
>> +
>>   static void __init vmx_display_features(void)
>>   {
>>       int printed = 0;

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 03/10] VMX: Add PML definition and feature detection.
  2015-03-27 20:46   ` Andrew Cooper
@ 2015-03-30  6:18     ` Kai Huang
  0 siblings, 0 replies; 65+ messages in thread
From: Kai Huang @ 2015-03-30  6:18 UTC (permalink / raw)
  To: Andrew Cooper, jbeulich, tim, kevin.tian, yang.z.zhang, xen-devel



On 03/28/2015 04:46 AM, Andrew Cooper wrote:
> On 27/03/15 02:35, Kai Huang wrote:
>> The patch adds the PML definitions and feature detection. Note PML won't be
>> detected if it has been disabled via boot parameter. PML is also disabled in
>> construct_vmcs, as it will only be enabled when the domain is switched to
>> log-dirty mode.
>>
>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>> ---
>>   xen/arch/x86/hvm/vmx/vmcs.c        | 18 ++++++++++++++++++
>>   xen/include/asm-x86/hvm/vmx/vmcs.h |  5 +++++
>>   xen/include/asm-x86/hvm/vmx/vmx.h  |  1 +
>>   3 files changed, 24 insertions(+)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>> index 9b20a4b..2798b0b 100644
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -143,6 +143,7 @@ static void __init vmx_display_features(void)
>>       P(cpu_has_vmx_virtual_intr_delivery, "Virtual Interrupt Delivery");
>>       P(cpu_has_vmx_posted_intr_processing, "Posted Interrupt Processing");
>>       P(cpu_has_vmx_vmcs_shadowing, "VMCS shadowing");
>> +    P(cpu_has_vmx_pml, "Page Modification Logging");
>>   #undef P
>>   
>>       if ( !printed )
>> @@ -240,6 +241,8 @@ static int vmx_init_vmcs_config(void)
>>               opt |= SECONDARY_EXEC_ENABLE_VPID;
>>           if ( opt_unrestricted_guest_enabled )
>>               opt |= SECONDARY_EXEC_UNRESTRICTED_GUEST;
>> +        if ( pml_enable )
> This should be named opt_pml_enable in patch 1 or 2 to identify that it
> is a command line option.
Sure.

>
>> +            opt |= SECONDARY_EXEC_ENABLE_PML;
>>   
>>           /*
>>            * "APIC Register Virtualization" and "Virtual Interrupt Delivery"
>> @@ -286,6 +289,14 @@ static int vmx_init_vmcs_config(void)
>>            */
>>           if ( !(_vmx_ept_vpid_cap & VMX_VPID_INVVPID_ALL_CONTEXT) )
>>               _vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VPID;
>> +
>> +	/*
>> +	 * PML cannot be supported if EPT A/D bits are not supported. In fact,
>> +	 * PML should not even be advertised if EPT A/D bits are unavailable,
>> +	 * but we check anyway to be safe.
>> +	 */
>> +	if ( !(_vmx_ept_vpid_cap & VMX_EPT_AD_BIT_SUPPORT) )
>> +		_vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
>>       }
>>   
>>       if ( _vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT )
>> @@ -306,6 +317,10 @@ static int vmx_init_vmcs_config(void)
>>                     SECONDARY_EXEC_UNRESTRICTED_GUEST);
>>       }
>>   
>> +    /* PML cannot be supported if we don't use EPT */
>> +    if ( !(_vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT) )
>> +        _vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
>> +
> Somewhere in here you should clear pml_enable if hardware doesn't
> support it.
Will do. Thanks for catching that.

Thanks,
-Kai
>
> ~Andrew
>
>>       if ( (_vmx_secondary_exec_control & SECONDARY_EXEC_PAUSE_LOOP_EXITING) &&
>>             ple_gap == 0 )
>>       {
>> @@ -1041,6 +1056,9 @@ static int construct_vmcs(struct vcpu *v)
>>           __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR, posted_intr_vector);
>>       }
>>   
>> +    /* Disable PML anyway here as it will only be enabled in log dirty mode */
>> +    v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
>> +
>>       /* Host data selectors. */
>>       __vmwrite(HOST_SS_SELECTOR, __HYPERVISOR_DS);
>>       __vmwrite(HOST_DS_SELECTOR, __HYPERVISOR_DS);
>> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> index 4528346..47b4df2 100644
>> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
>> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> @@ -216,6 +216,7 @@ extern u32 vmx_vmentry_control;
>>   #define SECONDARY_EXEC_ENABLE_INVPCID           0x00001000
>>   #define SECONDARY_EXEC_ENABLE_VMFUNC            0x00002000
>>   #define SECONDARY_EXEC_ENABLE_VMCS_SHADOWING    0x00004000
>> +#define SECONDARY_EXEC_ENABLE_PML               0x00020000
>>   extern u32 vmx_secondary_exec_control;
>>   
>>   #define VMX_EPT_EXEC_ONLY_SUPPORTED             0x00000001
>> @@ -276,6 +277,8 @@ extern u32 vmx_secondary_exec_control;
>>       (vmx_pin_based_exec_control & PIN_BASED_POSTED_INTERRUPT)
>>   #define cpu_has_vmx_vmcs_shadowing \
>>       (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VMCS_SHADOWING)
>> +#define cpu_has_vmx_pml \
>> +    (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_PML)
>>   
>>   #define VMCS_RID_TYPE_MASK              0x80000000
>>   
>> @@ -320,6 +323,7 @@ enum vmcs_field {
>>       GUEST_LDTR_SELECTOR             = 0x0000080c,
>>       GUEST_TR_SELECTOR               = 0x0000080e,
>>       GUEST_INTR_STATUS               = 0x00000810,
>> +    GUEST_PML_INDEX                 = 0x00000812,
>>       HOST_ES_SELECTOR                = 0x00000c00,
>>       HOST_CS_SELECTOR                = 0x00000c02,
>>       HOST_SS_SELECTOR                = 0x00000c04,
>> @@ -333,6 +337,7 @@ enum vmcs_field {
>>       VM_EXIT_MSR_STORE_ADDR          = 0x00002006,
>>       VM_EXIT_MSR_LOAD_ADDR           = 0x00002008,
>>       VM_ENTRY_MSR_LOAD_ADDR          = 0x0000200a,
>> +    PML_ADDRESS                     = 0x0000200e,
>>       TSC_OFFSET                      = 0x00002010,
>>       VIRTUAL_APIC_PAGE_ADDR          = 0x00002012,
>>       APIC_ACCESS_ADDR                = 0x00002014,
>> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
>> index 9afd351..c0e352d 100644
>> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
>> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
>> @@ -186,6 +186,7 @@ static inline unsigned long pi_get_pir(struct pi_desc *pi_desc, int group)
>>   #define EXIT_REASON_XSETBV              55
>>   #define EXIT_REASON_APIC_WRITE          56
>>   #define EXIT_REASON_INVPCID             58
>> +#define EXIT_REASON_PML_FULL            62
>>   
>>   /*
>>    * Interruption-information format

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 04/10] VMX: New data structure member to support PML
  2015-03-27 20:48   ` Andrew Cooper
@ 2015-03-30  6:19     ` Kai Huang
  0 siblings, 0 replies; 65+ messages in thread
From: Kai Huang @ 2015-03-30  6:19 UTC (permalink / raw)
  To: Andrew Cooper, jbeulich, tim, kevin.tian, yang.z.zhang, xen-devel



On 03/28/2015 04:48 AM, Andrew Cooper wrote:
> On 27/03/15 02:35, Kai Huang wrote:
>> A new 4K page pointer is added to arch_vmx_struct as the PML buffer for the
>> vcpu. And a new 'status' field is added to vmx_domain to indicate whether PML
>> is enabled for the domain or not. The 'status' field can also be used for
>> further similar purposes.
>>
>> Note that neither new member needs to be explicitly zero-initialized, as both
>> the vcpu and domain structures are zeroed when they are created.
>>
>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>> ---
>>   xen/include/asm-x86/hvm/vmx/vmcs.h | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> index 47b4df2..8cc1122 100644
>> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
>> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> @@ -71,8 +71,12 @@ struct ept_data {
>>       cpumask_var_t synced_mask;
>>   };
>>   
>> +#define _VMX_DOMAIN_PML_ENABLED    0
>> +#define VMX_DOMAIN_PML_ENABLED     (1ul << _VMX_DOMAIN_PML_ENABLED)
>>   struct vmx_domain {
>>       unsigned long apic_access_mfn;
>> +    /* VMX_DOMAIN_* */
>> +    unsigned long status;
>>   };
>>   
>>   struct pi_desc {
>> @@ -143,6 +147,9 @@ struct arch_vmx_struct {
>>       /* Bitmap to control vmexit policy for Non-root VMREAD/VMWRITE */
>>       struct page_info     *vmread_bitmap;
>>       struct page_info     *vmwrite_bitmap;
>> +
>> +#define PML_ENTITY_NUM      512
> This is the number of pml entries, not entities.  NR_PML_ENTRIES perhaps?
Yours is better indeed. Will do.

>
>> +    struct page_info	*pml_pg;
> Please align the fields vertically like vmwrite_bitmap.
Sure.

Thanks,
-Kai
>
> ~Andrew
>
>>   };
>>   
>>   int vmx_create_vmcs(struct vcpu *v);

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 05/10] VMX: add helper functions to support PML
  2015-03-27 21:09   ` Andrew Cooper
@ 2015-03-30  6:43     ` Kai Huang
  2015-03-30  9:54       ` Andrew Cooper
  0 siblings, 1 reply; 65+ messages in thread
From: Kai Huang @ 2015-03-30  6:43 UTC (permalink / raw)
  To: Andrew Cooper, jbeulich, tim, kevin.tian, yang.z.zhang, xen-devel



On 03/28/2015 05:09 AM, Andrew Cooper wrote:
> On 27/03/15 02:35, Kai Huang wrote:
>> This patch adds helper functions to enable/disable PML, and to flush the PML
>> buffer, for a single vcpu and for a particular domain, for further use.
>>
>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>> ---
>>   xen/arch/x86/hvm/vmx/vmcs.c        | 190 +++++++++++++++++++++++++++++++++++++
>>   xen/include/asm-x86/hvm/vmx/vmcs.h |   9 ++
>>   2 files changed, 199 insertions(+)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>> index 2798b0b..17cbef4 100644
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -1326,6 +1326,196 @@ void vmx_clear_eoi_exit_bitmap(struct vcpu *v, u8 vector)
>>                   &v->arch.hvm_vmx.eoi_exitmap_changed);
>>   }
>>   
>> +int vmx_vcpu_pml_enabled(struct vcpu *v)
> bool_t vmx_vcpu_pml_enabled(const struct vcpu *v)
Will do.

>
>> +{
>> +    return (v->arch.hvm_vmx.secondary_exec_control &
>> +            SECONDARY_EXEC_ENABLE_PML) ? 1 : 0;
> This would be slightly shorter as
> !!(v->arch.hvm_vmx.secondary_exec_control & SECONDARY_EXEC_ENABLE_PML)
Will do.

>
>> +}
>> +
>> +int vmx_vcpu_enable_pml(struct vcpu *v)
>> +{
>> +    struct domain *d = v->domain;
>> +
>> +    ASSERT(!vmx_vcpu_pml_enabled(v));
>> +
>> +    v->arch.hvm_vmx.pml_pg = d->arch.paging.alloc_page(d);
>> +    if ( !v->arch.hvm_vmx.pml_pg )
>> +        return -ENOMEM;
>> +
>> +    vmx_vmcs_enter(v);
>> +
>> +    __vmwrite(PML_ADDRESS, page_to_mfn(v->arch.hvm_vmx.pml_pg) << PAGE_SHIFT);
>> +    __vmwrite(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
>> +
>> +    v->arch.hvm_vmx.secondary_exec_control |= SECONDARY_EXEC_ENABLE_PML;
>> +
>> +    __vmwrite(SECONDARY_VM_EXEC_CONTROL,
>> +            v->arch.hvm_vmx.secondary_exec_control);
> Alignment.
Do you mean to put 'v->arch.hvm_vmx.secondary_exec_control' on the same
line as '__vmwrite(SECONDARY_VM_EXEC_CONTROL,'? In that case the line
would be 81 characters long.

>
>> +
>> +    vmx_vmcs_exit(v);
>> +
>> +    return 0;
>> +}
>> +
>> +void vmx_vcpu_disable_pml(struct vcpu *v)
>> +{
>> +    ASSERT(vmx_vcpu_pml_enabled(v));
>> +
>> +    vmx_vmcs_enter(v);
>> +
>> +    v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
>> +    __vmwrite(SECONDARY_VM_EXEC_CONTROL,
>> +            v->arch.hvm_vmx.secondary_exec_control);
>> +
>> +    vmx_vmcs_exit(v);
>> +
>> +    v->domain->arch.paging.free_page(v->domain, v->arch.hvm_vmx.pml_pg);
>> +    v->arch.hvm_vmx.pml_pg = NULL;
>> +}
>> +
>> +void vmx_vcpu_flush_pml_buffer(struct vcpu *v)
>> +{
>> +    uint64_t *pml_buf;
>> +    unsigned long pml_idx;
>> +
>> +    ASSERT(vmx_vcpu_pml_enabled(v));
>> +
>> +    vmx_vmcs_enter(v);
>> +
>> +    __vmread(GUEST_PML_INDEX, &pml_idx);
>> +
>> +    /* Do nothing if PML buffer is empty */
>> +    if ( pml_idx == (PML_ENTITY_NUM - 1) )
>> +        goto out;
>> +
>> +    pml_buf = map_domain_page(page_to_mfn(v->arch.hvm_vmx.pml_pg));
> __map_domain_page() is a wrapper which takes a struct page_info
Will do.

>
>> +
>> +    /*
>> +     * PML index can be either 2^16-1 (buffer is full), or 0~511 (buffer is
>> +     * not full), and in the latter case the PML index always points to the
>> +     * next available entry.
>> +     */
>> +    if (pml_idx >= PML_ENTITY_NUM)
>> +        pml_idx = 0;
>> +    else
>> +        pml_idx++;
>> +
>> +    for ( ; pml_idx < PML_ENTITY_NUM; pml_idx++ )
>> +    {
>> +        struct p2m_domain *p2m = p2m_get_hostp2m(v->domain);
> This p2m_get_hostp2m() call should be hoisted out of the loop.
Will do.

>
>> +        unsigned long gfn;
>> +        mfn_t mfn;
>> +        p2m_type_t t;
>> +        p2m_access_t a;
>> +
>> +        gfn = pml_buf[pml_idx] >> PAGE_SHIFT;
>> +        mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL);
>> +        if ( mfn_x(mfn) == INVALID_MFN )
>> +        {
>> +            /*
>> +             * Either the EPT entry mapping the GFN has been destroyed, or
>> +             * there's something wrong with hardware behavior; in either
>> +             * case we should report a warning.
>> +             */
>> +            dprintk(XENLOG_WARNING, "PML: vcpu %d: invalid GPA 0x%lx logged\n",
>> +                    v->vcpu_id, pml_buf[pml_idx]);
> It would be shorter to log gfn rather than gpa.
Will do. And I'd also like to add the domain ID to the warning message.

>
>> +            continue;
>> +        }
>> +
>> +        /*
>> +         * Need to change type from log-dirty to normal memory for logged GFN.
>> +         * hap_track_dirty_vram depends on it to work. And we really only need
>> +         * to mark GFNs which have been successfully changed from log-dirty to
>> +         * normal memory to be dirty.
>> +         */
>> +        if ( !p2m_change_type_one(v->domain, gfn, p2m_ram_logdirty,
>> +                    p2m_ram_rw) )
>> +            paging_mark_dirty(v->domain, mfn_x(mfn));
>> +    }
>> +
>> +    unmap_domain_page(pml_buf);
>> +
>> +    /* Reset PML index */
>> +    __vmwrite(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
>> +
>> +out:
>> +    vmx_vmcs_exit(v);
>> +}
>> +
>> +int vmx_domain_pml_enabled(struct domain *d)
> bool_t and const as per vcpu variant.
Will do.

>
>> +{
>> +    return (d->arch.hvm_domain.vmx.status & VMX_DOMAIN_PML_ENABLED) ? 1 : 0;
>> +}
>> +
>> +/*
>> + * This function enables PML for a particular domain. It should be called
>> + * when the domain is paused.
> In which case assert that the domain is paused, or call domain_pause()
> yourself to take an extra pause refcount.
Which function should I use to assert that the domain is paused? I didn't
find a function like "domain_paused". Is the below good enough?

ASSERT(atomic_read(&d->pause_count));

>
>> + *
>> + * PML needs to be enabled globally for all vcpus of the domain, as the PML
>> + * buffer and PML index are per-vcpu, but the EPT table is shared by all
>> + * vcpus; therefore enabling PML on only some vcpus won't work.
>> + */
>> +int vmx_domain_enable_pml(struct domain *d)
>> +{
>> +    struct vcpu *v;
>> +
>> +    ASSERT(!vmx_domain_pml_enabled(d));
>> +
>> +    for_each_vcpu( d, v )
>> +    {
>> +        if ( vmx_vcpu_enable_pml(v) )
>> +            goto error;
> Please catch the actual rc from vmx_vcpu_enable_pml() and propagate out
> of this function, rather than clobbering -ENOMEM with -EINVAL.
>
> Also, per Xen style, you can drop the braces.
Will do. And I'll drop the braces in other functions I added in this 
patch as well.

>
> ~Andrew
>
>> +    }
>> +
>> +    d->arch.hvm_domain.vmx.status |= VMX_DOMAIN_PML_ENABLED;
>> +
>> +    return 0;
>> +
>> +error:
>> +    for_each_vcpu( d, v )
>> +    {
>> +        if ( vmx_vcpu_pml_enabled(v) )
>> +            vmx_vcpu_disable_pml(v);
>> +    }
>> +    return -EINVAL;
>> +}
>> +
>> +/*
>> + * Disable PML for a particular domain. Called when the domain is paused.
>> + *
>> + * As with enabling PML for a domain, disabling PML should be done for all
>> + * vcpus at once.
>> + */
>> +void vmx_domain_disable_pml(struct domain *d)
>> +{
>> +    struct vcpu *v;
>> +
>> +    ASSERT(vmx_domain_pml_enabled(d));
>> +
>> +    for_each_vcpu( d, v )
>> +    {
>> +        vmx_vcpu_disable_pml(v);
>> +    }
>> +
>> +    d->arch.hvm_domain.vmx.status &= ~VMX_DOMAIN_PML_ENABLED;
>> +}
>> +
>> +/*
>> + * Flush the PML buffers of all vcpus, and propagate the logged dirty pages
>> + * to the log-dirty radix tree. Called when the domain is paused.
>> + */
>> +void vmx_domain_flush_pml_buffers(struct domain *d)
>> +{
>> +    struct vcpu *v;
>> +
>> +    ASSERT(vmx_domain_pml_enabled(d));
>> +
>> +    for_each_vcpu( d, v )
>> +    {
>> +        vmx_vcpu_flush_pml_buffer(v);
>> +    }
>> +}
>> +
>>   int vmx_create_vmcs(struct vcpu *v)
>>   {
>>       struct arch_vmx_struct *arch_vmx = &v->arch.hvm_vmx;
>> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> index 8cc1122..939d097 100644
>> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
>> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> @@ -499,6 +499,15 @@ static inline int vmx_add_host_load_msr(u32 msr)
>>   
>>   DECLARE_PER_CPU(bool_t, vmxon);
>>   
>> +int vmx_vcpu_pml_enabled(struct vcpu *v);
>> +int vmx_vcpu_enable_pml(struct vcpu *v);
>> +void vmx_vcpu_disable_pml(struct vcpu *v);
>> +void vmx_vcpu_flush_pml_buffer(struct vcpu *v);
>> +int vmx_domain_pml_enabled(struct domain *d);
>> +int vmx_domain_enable_pml(struct domain *d);
>> +void vmx_domain_disable_pml(struct domain *d);
>> +void vmx_domain_flush_pml_buffers(struct domain *d);
>> +
>>   #endif /* ASM_X86_HVM_VMX_VMCS_H__ */
>>   
>>   /*
>


* Re: [PATCH 07/10] VMX: handle PML enabling in vmx_vcpu_initialise
  2015-03-27 21:12   ` Andrew Cooper
@ 2015-03-30  7:03     ` Kai Huang
  2015-03-30 10:00       ` Andrew Cooper
  0 siblings, 1 reply; 65+ messages in thread
From: Kai Huang @ 2015-03-30  7:03 UTC (permalink / raw)
  To: Andrew Cooper, jbeulich, tim, kevin.tian, yang.z.zhang, xen-devel



On 03/28/2015 05:12 AM, Andrew Cooper wrote:
> On 27/03/15 02:35, Kai Huang wrote:
>> It's possible domain has already been in log-dirty mode when creating vcpu, in
>> which case we should enable PML for this vcpu if PML has been enabled for the
>> domain.
>>
>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>> ---
>>   xen/arch/x86/hvm/vmx/vmx.c | 24 ++++++++++++++++++++++++
>>   1 file changed, 24 insertions(+)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>> index 453bcc5..fce3aa2 100644
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -116,6 +116,30 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>>           return rc;
>>       }
>>   
>> +    /*
>> +     * It's rare but still possible that the domain is already in log-dirty
>> +     * mode when a vcpu is being created (noted by Tim), in which case we
>> +     * should enable PML for this vcpu if PML has been enabled for the
>> +     * domain, and failure to enable it results in failure to create the
>> +     * vcpu.
>> +     *
>> +     * Note that even if no vcpu has been created for the domain,
>> +     * vmx_domain_enable_pml will return success, in which case
>> +     * vmx_domain_pml_enabled will also return true. And even if this is the
>> +     * first vcpu to be created with vmx_domain_pml_enabled true, failure to
>> +     * enable PML still results in failure to create the vcpu, to avoid
>> +     * complicated logic for reverting a PML-style EPT table to a
>> +     * non-PML-style one.
>> +     */
>> +    if ( vmx_domain_pml_enabled(v->domain) )
>> +    {
>> +        if ( (rc = vmx_vcpu_enable_pml(v)) != 0 )
> Given the comment here, is the assertion in the top of
> vmx_vcpu_enable_pml() liable to trip?
Do you mean below assertion at beginning of vmx_vcpu_enable_pml might 
not work here?

     ASSERT(!vmx_vcpu_pml_enabled(v));

To me it asserts for this particular vcpu, not the domain, so even in 
this case the assertion is reasonable and should work fine, shouldn't it?

>
>> +        {
>> +            dprintk(XENLOG_ERR, "Failed to enable PML for vcpu %d\n",
>> +                    v->vcpu_id);
> Please use %pv to identify the domain as well as vcpu.
Sure.

Thanks,
-Kai
>
> ~Andrew
>
>> +            vmx_destroy_vmcs(v);
>> +            return rc;
>> +        }
>> +    }
>> +
>>       vpmu_initialise(v);
>>   
>>       vmx_install_vlapic_mapping(v);
>


* Re: [PATCH 01/10] VMX: Enable EPT A/D bit support
  2015-03-30  6:11     ` Kai Huang
@ 2015-03-30  9:36       ` Andrew Cooper
  2015-03-30 13:35         ` Kai Huang
  0 siblings, 1 reply; 65+ messages in thread
From: Andrew Cooper @ 2015-03-30  9:36 UTC (permalink / raw)
  To: Kai Huang, jbeulich, tim, kevin.tian, yang.z.zhang, xen-devel

On 30/03/15 07:11, Kai Huang wrote:
>
>
> On 03/28/2015 04:38 AM, Andrew Cooper wrote:
>> On 27/03/15 02:35, Kai Huang wrote:
>>> PML requires A/D bit support so enable it for further use.
>>>
>>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>>> ---
>>>   xen/arch/x86/hvm/vmx/vmcs.c        | 1 +
>>>   xen/arch/x86/mm/p2m-ept.c          | 8 +++++++-
>>>   xen/include/asm-x86/hvm/vmx/vmcs.h | 4 +++-
>>>   xen/include/asm-x86/hvm/vmx/vmx.h  | 5 ++++-
>>>   4 files changed, 15 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>>> index d614638..2f645fe 100644
>>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>>> @@ -103,6 +103,7 @@ static void __init vmx_display_features(void)
>>>       P(cpu_has_vmx_tpr_shadow, "APIC TPR shadow");
>>>       P(cpu_has_vmx_ept, "Extended Page Tables (EPT)");
>>>       P(cpu_has_vmx_vpid, "Virtual-Processor Identifiers (VPID)");
>>> +    P(cpu_has_vmx_ept_ad_bit, "EPT A/D bit");
>>>       P(cpu_has_vmx_vnmi, "Virtual NMI");
>>>       P(cpu_has_vmx_msr_bitmap, "MSR direct-access bitmap");
>>>       P(cpu_has_vmx_unrestricted_guest, "Unrestricted Guest");
>>> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
>>> index c2d7720..8650092 100644
>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>> @@ -233,6 +233,9 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
>>>       if ( !ept_set_middle_entry(p2m, &new_ept) )
>>>           return 0;
>>>
>>> +    /* It's better to copy A bit of Middle entry from original entry */
>>> +    new_ept.a = ept_entry->a;
>> Surely d needs to be propagated as well?
> No it's not necessary. D-bit is not defined in middle level EPT table.
> Only leaf table entry has D-bit definition.

Ok - so the middle doesn't have a D.

What about the superpage having D set? Surely that needs propagated down
to the new shattered leaves?

>> Would it make sense to extend
>> ept_set_middle_entry() to do all of new_ept setup in one location?
> Yes it certainly makes sense to move A-bit propagation into
> ept_set_middle_entry, but this also requires adding additional
> original EPT entry pointer to ept_set_middle_entry as parameter. And
> ept_set_middle_entry is also called by ept_next_level, therefore
> changing it requires more code change, something like below. While I
> am fine with both, which solution do you prefer?
>
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -208,7 +208,8 @@ static void ept_p2m_type_to_flags(struct
> p2m_domain *p2m, ept_entry_t *entry,
>  #define GUEST_TABLE_POD_PAGE    3
>
>  /* Fill in middle levels of ept table */
> -static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t
> *ept_entry)
> +static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t
> *new_entry,
> +        ept_entry_t *ori_entry)

const ept_entry_t *old_entry (for consistency with other similar
functions, or even just 'new' and 'old' as you are already changing the
names)

This looks fine.  Being a static function with only two callsites, it is
very likely to be inlined by the compiler.
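I.e. a sketch along those lines (the elided parts stay as in the current
function):

    /* Fill in middle levels of ept table */
    static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t *new,
                                    const ept_entry_t *old)
    {
        /* ... allocate and set up the middle page as before ... */

        /* Copy the A bit from the original entry, when one is given. */
        if ( old )
            new->a = old->a;

        /* ... return value as before ... */
    }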

~Andrew


* Re: [PATCH 05/10] VMX: add help functions to support PML
  2015-03-30  6:43     ` Kai Huang
@ 2015-03-30  9:54       ` Andrew Cooper
  2015-03-30 13:40         ` Kai Huang
  0 siblings, 1 reply; 65+ messages in thread
From: Andrew Cooper @ 2015-03-30  9:54 UTC (permalink / raw)
  To: Kai Huang, jbeulich, tim, kevin.tian, yang.z.zhang, xen-devel

On 30/03/15 07:43, Kai Huang wrote:
>
>
> On 03/28/2015 05:09 AM, Andrew Cooper wrote:
>> On 27/03/15 02:35, Kai Huang wrote:
>>
>>> +}
>>> +
>>> +int vmx_vcpu_enable_pml(struct vcpu *v)
>>> +{
>>> +    struct domain *d = v->domain;
>>> +
>>> +    ASSERT(!vmx_vcpu_pml_enabled(v));
>>> +
>>> +    v->arch.hvm_vmx.pml_pg = d->arch.paging.alloc_page(d);
>>> +    if ( !v->arch.hvm_vmx.pml_pg )
>>> +        return -ENOMEM;
>>> +
>>> +    vmx_vmcs_enter(v);
>>> +
>>> +    __vmwrite(PML_ADDRESS, page_to_mfn(v->arch.hvm_vmx.pml_pg) << PAGE_SHIFT);
>>> +    __vmwrite(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
>>> +
>>> +    v->arch.hvm_vmx.secondary_exec_control |= SECONDARY_EXEC_ENABLE_PML;
>>> +
>>> +    __vmwrite(SECONDARY_VM_EXEC_CONTROL,
>>> +            v->arch.hvm_vmx.secondary_exec_control);
>> Alignment.
> Do you mean to put 'v->arch.hvm_vmx.secondary_exec_control' to the
> same line with '__vmwrite(SECONDARY_VM_EXEC_CONTROL,'? In this case
> the number of characters will be 81.

Splitting the line is fine.  The v should be vertically in line with S
from SECONDARY
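I.e.:

    __vmwrite(SECONDARY_VM_EXEC_CONTROL,
              v->arch.hvm_vmx.secondary_exec_control);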

>
>>
>>> +        unsigned long gfn;
>>> +        mfn_t mfn;
>>> +        p2m_type_t t;
>>> +        p2m_access_t a;
>>> +
>>> +        gfn = pml_buf[pml_idx] >> PAGE_SHIFT;
>>> +        mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL);
>>> +        if ( mfn_x(mfn) == INVALID_MFN )
>>> +        {
>>> +            /*
>>> +             * Either EPT table entry for mapping the GFN has been destroyed, or
>>> +             * there's something wrong with hardware behavior, in both cases we
>>> +             * should report a warning.
>>> +             */
>>> +            dprintk(XENLOG_WARNING, "PML: vcpu %d: invalid GPA 0x%lx logged\n",
>>> +                    v->vcpu_id, pml_buf[pml_idx]);
>> It would be shorter to log gfn rather than gpa.
> Will do. And I'd also like to add the domain ID in the warning info.

Ah of course - dprintk() doesn't identify current().  Use %pv with v.
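So the warning would become something like (sketch):

    dprintk(XENLOG_WARNING, "PML: %pv: invalid GFN 0x%lx logged\n",
            v, gfn);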

>
>>
>>> +{
>>> +    return (d->arch.hvm_domain.vmx.status & VMX_DOMAIN_PML_ENABLED)
>>> ? 1 : 0;
>>> +}
>>> +
>>> +/*
>>> + * This function enables PML for particular domain. It should be
>>> called when
>>> + * domain is paused.
>> In which case assert that the domain is paused, or call domain_pause()
>> yourself to take an extra pause refcount.
> Which function should I use to assert domain is paused? I didn't find
> a function like "domain_paused". Is below good enough?
>
> ASSERT(atomic_read(&d->pause_count));

Hmm - we indeed don't have an appropriate helper.  That ASSERT() will do
for now.
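(For illustration, the check could later be wrapped into a helper; this is
hypothetical, as no such helper exists at this point:

    static inline bool_t domain_is_paused(const struct domain *d)
    {
        return atomic_read(&d->pause_count) > 0;
    }

so callers could then ASSERT(domain_is_paused(d)).)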

~Andrew


* Re: [PATCH 07/10] VMX: handle PML enabling in vmx_vcpu_initialise
  2015-03-30  7:03     ` Kai Huang
@ 2015-03-30 10:00       ` Andrew Cooper
  0 siblings, 0 replies; 65+ messages in thread
From: Andrew Cooper @ 2015-03-30 10:00 UTC (permalink / raw)
  To: Kai Huang, jbeulich, tim, kevin.tian, yang.z.zhang, xen-devel

On 30/03/15 08:03, Kai Huang wrote:
>
>
> On 03/28/2015 05:12 AM, Andrew Cooper wrote:
>> On 27/03/15 02:35, Kai Huang wrote:
>>> It's possible domain has already been in log-dirty mode when creating vcpu,
>>> in which case we should enable PML for this vcpu if PML has been enabled
>>> for the domain.
>>>
>>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>>> ---
>>>   xen/arch/x86/hvm/vmx/vmx.c | 24 ++++++++++++++++++++++++
>>>   1 file changed, 24 insertions(+)
>>>
>>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>>> index 453bcc5..fce3aa2 100644
>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>> @@ -116,6 +116,30 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>>>           return rc;
>>>       }
>>> +    /*
>>> +     * It's rare but still possible that the domain is already in log-dirty
>>> +     * mode when a vcpu is being created (noted by Tim), in which case we
>>> +     * should enable PML for this vcpu if PML has been enabled for the
>>> +     * domain, and failure to enable it results in failure to create the
>>> +     * vcpu.
>>> +     *
>>> +     * Note that even if no vcpu has been created for the domain,
>>> +     * vmx_domain_enable_pml will return success, in which case
>>> +     * vmx_domain_pml_enabled will also return true. And even if this is the
>>> +     * first vcpu to be created with vmx_domain_pml_enabled true, failure to
>>> +     * enable PML still results in failure to create the vcpu, to avoid
>>> +     * complicated logic for reverting a PML-style EPT table to a
>>> +     * non-PML-style one.
>>> +     */
>>> +    if ( vmx_domain_pml_enabled(v->domain) )
>>> +    {
>>> +        if ( (rc = vmx_vcpu_enable_pml(v)) != 0 )
>> Given the comment here, is the assertion in the top of
>> vmx_vcpu_enable_pml() liable to trip?
> Do you mean below assertion at beginning of vmx_vcpu_enable_pml might
> not work here?
>
>     ASSERT(!vmx_vcpu_pml_enabled(v));
>
> To me it asserts for this particular vcpu, not the domain, so even in
> this case the assertion is reasonable and should work fine, shouldn't it?

You are correct.  I was getting vcpus and domains mixed up.  Sorry for
the noise.

~Andrew


* Re: [PATCH 01/10] VMX: Enable EPT A/D bit support
  2015-03-30  9:36       ` Andrew Cooper
@ 2015-03-30 13:35         ` Kai Huang
  2015-03-30 13:39           ` Andrew Cooper
  0 siblings, 1 reply; 65+ messages in thread
From: Kai Huang @ 2015-03-30 13:35 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Tian, Kevin, tim, xen-devel, Kai Huang, Jan Beulich, yang.z.zhang

On Mon, Mar 30, 2015 at 5:36 PM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 30/03/15 07:11, Kai Huang wrote:
>>
>>
>> On 03/28/2015 04:38 AM, Andrew Cooper wrote:
>>> On 27/03/15 02:35, Kai Huang wrote:
>>>> PML requires A/D bit support so enable it for further use.
>>>>
>>>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>>>> ---
>>>>   xen/arch/x86/hvm/vmx/vmcs.c        | 1 +
>>>>   xen/arch/x86/mm/p2m-ept.c          | 8 +++++++-
>>>>   xen/include/asm-x86/hvm/vmx/vmcs.h | 4 +++-
>>>>   xen/include/asm-x86/hvm/vmx/vmx.h  | 5 ++++-
>>>>   4 files changed, 15 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>>>> index d614638..2f645fe 100644
>>>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>>>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>>>> @@ -103,6 +103,7 @@ static void __init vmx_display_features(void)
>>>>       P(cpu_has_vmx_tpr_shadow, "APIC TPR shadow");
>>>>       P(cpu_has_vmx_ept, "Extended Page Tables (EPT)");
>>>>       P(cpu_has_vmx_vpid, "Virtual-Processor Identifiers (VPID)");
>>>> +    P(cpu_has_vmx_ept_ad_bit, "EPT A/D bit");
>>>>       P(cpu_has_vmx_vnmi, "Virtual NMI");
>>>>       P(cpu_has_vmx_msr_bitmap, "MSR direct-access bitmap");
>>>>       P(cpu_has_vmx_unrestricted_guest, "Unrestricted Guest");
>>>> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
>>>> index c2d7720..8650092 100644
>>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>>> @@ -233,6 +233,9 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
>>>>       if ( !ept_set_middle_entry(p2m, &new_ept) )
>>>>           return 0;
>>>>
>>>> +    /* It's better to copy A bit of Middle entry from original entry */
>>>> +    new_ept.a = ept_entry->a;
>>> Surely d needs to be propagated as well?
>> No it's not necessary. D-bit is not defined in middle level EPT table.
>> Only leaf table entry has D-bit definition.
>
> Ok - so the middle doesn't have a D.
>
> What about the superpage having D set? Surely that needs propagated down
> to the new shattered leaves?

Yes, shattered leaves will inherit both A and D bits from the original
superpage entry via the "*epte = *ept_entry;" statement in the code below
in ept_split_super_page.

    for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
    {
        ept_entry_t *epte = table + i;

        *epte = *ept_entry;
        ......

>
>>> Would it make sense to extend
>>> ept_set_middle_entry() to do all of new_ept setup in one location?
>> Yes it certainly makes sense to move A-bit propagation into
>> ept_set_middle_entry, but this also requires adding additional
>> original EPT entry pointer to ept_set_middle_entry as parameter. And
>> ept_set_middle_entry is also called by ept_next_level, therefore
>> changing it requires more code change, something like below. While I
>> am fine with both, which solution do you prefer?
>>
>> +++ b/xen/arch/x86/mm/p2m-ept.c
>> @@ -208,7 +208,8 @@ static void ept_p2m_type_to_flags(struct
>> p2m_domain *p2m, ept_entry_t *entry,
>>  #define GUEST_TABLE_POD_PAGE    3
>>
>>  /* Fill in middle levels of ept table */
>> -static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t
>> *ept_entry)
>> +static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t
>> *new_entry,
>> +        ept_entry_t *ori_entry)
>
> const ept_entry_t *old_entry (for consistency with other similar
> functions, or even just 'new' and 'old' as you are already changing the
> names)
>
> This looks fine.  Being a static function with only two callsites, it is
> very likely to be inlined by the compiler.

Sure I will  do in this way.

>
> ~Andrew
>



-- 
Thanks,
-Kai


* Re: [PATCH 01/10] VMX: Enable EPT A/D bit support
  2015-03-30 13:35         ` Kai Huang
@ 2015-03-30 13:39           ` Andrew Cooper
  0 siblings, 0 replies; 65+ messages in thread
From: Andrew Cooper @ 2015-03-30 13:39 UTC (permalink / raw)
  To: Kai Huang
  Cc: Tian, Kevin, tim, xen-devel, Kai Huang, Jan Beulich, yang.z.zhang

On 30/03/15 14:35, Kai Huang wrote:
> On Mon, Mar 30, 2015 at 5:36 PM, Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>> On 30/03/15 07:11, Kai Huang wrote:
>>>
>>> On 03/28/2015 04:38 AM, Andrew Cooper wrote:
>>>> On 27/03/15 02:35, Kai Huang wrote:
>>>>> PML requires A/D bit support so enable it for further use.
>>>>>
>>>>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>>>>> ---
>>>>>   xen/arch/x86/hvm/vmx/vmcs.c        | 1 +
>>>>>   xen/arch/x86/mm/p2m-ept.c          | 8 +++++++-
>>>>>   xen/include/asm-x86/hvm/vmx/vmcs.h | 4 +++-
>>>>>   xen/include/asm-x86/hvm/vmx/vmx.h  | 5 ++++-
>>>>>   4 files changed, 15 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>>>>> index d614638..2f645fe 100644
>>>>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>>>>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>>>>> @@ -103,6 +103,7 @@ static void __init vmx_display_features(void)
>>>>>       P(cpu_has_vmx_tpr_shadow, "APIC TPR shadow");
>>>>>       P(cpu_has_vmx_ept, "Extended Page Tables (EPT)");
>>>>>       P(cpu_has_vmx_vpid, "Virtual-Processor Identifiers (VPID)");
>>>>> +    P(cpu_has_vmx_ept_ad_bit, "EPT A/D bit");
>>>>>       P(cpu_has_vmx_vnmi, "Virtual NMI");
>>>>>       P(cpu_has_vmx_msr_bitmap, "MSR direct-access bitmap");
>>>>>       P(cpu_has_vmx_unrestricted_guest, "Unrestricted Guest");
>>>>> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
>>>>> index c2d7720..8650092 100644
>>>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>>>> @@ -233,6 +233,9 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
>>>>>       if ( !ept_set_middle_entry(p2m, &new_ept) )
>>>>>           return 0;
>>>>>
>>>>> +    /* It's better to copy A bit of Middle entry from original entry */
>>>>> +    new_ept.a = ept_entry->a;
>>>> Surely d needs to be propagated as well?
>>> No it's not necessary. D-bit is not defined in middle level EPT table.
>>> Only leaf table entry has D-bit definition.
>> Ok - so the middle doesn't have a D.
>>
>> What about the superpage having D set? Surely that needs propagated down
>> to the new shattered leaves?
> Yes, shattered leaves will inherit both A and D bits from the original
> superpage entry via the "*epte = *ept_entry;" statement in the code below
> in ept_split_super_page.
>
>     for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
>     {
>         ept_entry_t *epte = table + i;
>
>         *epte = *ept_entry;
>         ......

So you are.  Apologies for the confusion.

~Andrew


* Re: [PATCH 05/10] VMX: add help functions to support PML
  2015-03-30  9:54       ` Andrew Cooper
@ 2015-03-30 13:40         ` Kai Huang
  0 siblings, 0 replies; 65+ messages in thread
From: Kai Huang @ 2015-03-30 13:40 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Tian, Kevin, tim, xen-devel, Kai Huang, Jan Beulich, yang.z.zhang

On Mon, Mar 30, 2015 at 5:54 PM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 30/03/15 07:43, Kai Huang wrote:
>>
>>
>> On 03/28/2015 05:09 AM, Andrew Cooper wrote:
>>> On 27/03/15 02:35, Kai Huang wrote:
>>>
>>>> +}
>>>> +
>>>> +int vmx_vcpu_enable_pml(struct vcpu *v)
>>>> +{
>>>> +    struct domain *d = v->domain;
>>>> +
>>>> +    ASSERT(!vmx_vcpu_pml_enabled(v));
>>>> +
>>>> +    v->arch.hvm_vmx.pml_pg = d->arch.paging.alloc_page(d);
>>>> +    if ( !v->arch.hvm_vmx.pml_pg )
>>>> +        return -ENOMEM;
>>>> +
>>>> +    vmx_vmcs_enter(v);
>>>> +
>>>> +    __vmwrite(PML_ADDRESS, page_to_mfn(v->arch.hvm_vmx.pml_pg) << PAGE_SHIFT);
>>>> +    __vmwrite(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
>>>> +
>>>> +    v->arch.hvm_vmx.secondary_exec_control |= SECONDARY_EXEC_ENABLE_PML;
>>>> +
>>>> +    __vmwrite(SECONDARY_VM_EXEC_CONTROL,
>>>> +            v->arch.hvm_vmx.secondary_exec_control);
>>> Alignment.
>> Do you mean to put 'v->arch.hvm_vmx.secondary_exec_control' to the
>> same line with '__vmwrite(SECONDARY_VM_EXEC_CONTROL,'? In this case
>> the number of characters will be 81.
>
> Splitting the line is fine.  The v should be vertically in line with S
> from SECONDARY

Oh I got your point. Thanks. Will do.

>
>>
>>>
>>>> +        unsigned long gfn;
>>>> +        mfn_t mfn;
>>>> +        p2m_type_t t;
>>>> +        p2m_access_t a;
>>>> +
>>>> +        gfn = pml_buf[pml_idx] >> PAGE_SHIFT;
>>>> +        mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL);
>>>> +        if ( mfn_x(mfn) == INVALID_MFN )
>>>> +        {
>>>> +            /*
>>>> +             * Either EPT table entry for mapping the GFN has been destroyed, or
>>>> +             * there's something wrong with hardware behavior, in both cases we
>>>> +             * should report a warning.
>>>> +             */
>>>> +            dprintk(XENLOG_WARNING, "PML: vcpu %d: invalid GPA 0x%lx logged\n",
>>>> +                    v->vcpu_id, pml_buf[pml_idx]);
>>> It would be shorter to log gfn rather than gpa.
>> Will do. And I'd also like to add the domain ID in the warning info.
>
> Ah of course - dprintk() doesn't identify current().  Use %pv with v.
>
>>
>>>
>>>> +{
>>>> +    return (d->arch.hvm_domain.vmx.status & VMX_DOMAIN_PML_ENABLED)
>>>> ? 1 : 0;
>>>> +}
>>>> +
>>>> +/*
>>>> + * This function enables PML for particular domain. It should be
>>>> called when
>>>> + * domain is paused.
>>> In which case assert that the domain is paused, or call domain_pause()
>>> yourself to take an extra pause refcount.
>> Which function should I use to assert domain is paused? I didn't find
>> a function like "domain_paused". Is below good enough?
>>
>> ASSERT(atomic_read(&d->pause_count));
>
> Hmm - we indeed don't have an appropriate helper.  That ASSERT() will do
> for now.
>
> ~Andrew
>



-- 
Thanks,
-Kai


* Re: [PATCH 02/10] VMX: New parameter to control PML enabling
  2015-03-27 20:42   ` Andrew Cooper
  2015-03-30  6:16     ` Kai Huang
@ 2015-04-02  5:46     ` Kai Huang
  2015-04-02  9:58       ` Andrew Cooper
  1 sibling, 1 reply; 65+ messages in thread
From: Kai Huang @ 2015-04-02  5:46 UTC (permalink / raw)
  To: Andrew Cooper, jbeulich, tim, kevin.tian, yang.z.zhang, xen-devel



On 03/28/2015 04:42 AM, Andrew Cooper wrote:
> On 27/03/15 02:35, Kai Huang wrote:
>> A top level EPT parameter "ept=<options>" and a sub boolean "pml_enable" are
>> added to control PML. Other booleans can be further added for any other EPT
>> related features.
>>
>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> Please patch docs/misc/xen-command-line.markdown as well.  See the
> existing "psr" option as a similar example.
>
> Also, as indicated in patch 1, I think patches 1 and 2 need swapping in
> the series.
>
>> ---
>>   xen/arch/x86/hvm/vmx/vmcs.c | 32 ++++++++++++++++++++++++++++++++
>>   1 file changed, 32 insertions(+)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>> index 2f645fe..9b20a4b 100644
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -50,6 +50,16 @@ boolean_param("unrestricted_guest", opt_unrestricted_guest_enabled);
>>   static bool_t __read_mostly opt_apicv_enabled = 1;
>>   boolean_param("apicv", opt_apicv_enabled);
>>   
>> +static void parse_ept_param(char *s);
>> +/*
>> + * The 'ept' parameter controls functionalities that depend on, or impact the
>> + * EPT mechanism. Optional comma separated value may contain:
>> + *
>> + *  pml                 Enable PML
>> + */
>> +custom_param("ept", parse_ept_param);
> It is common to put the custom_param() call below parse_ept_param() so
> you don't need to forward-declare the function.  The comment can happily
> live at the top of parse_ept_param().
Hi Andrew,

It looks like it's better to keep parse_ept_param() below custom_param():
simply moving parse_ept_param() above custom_param() results in the error
below (I also renamed pml_enable to opt_pml_enabled), as it references the
opt_pml_enabled variable, which is defined below custom_param(). For the
"iommu=<options>" parameter, parse_iommu_param() was likewise placed below
custom_param().

What do you think?

vmcs.c: In function ‘parse_ept_param’:
vmcs.c:74:13: error: ‘opt_pml_enabled’ undeclared (first use in this 
function)
              opt_pml_enabled = val;
              ^
vmcs.c:74:13: note: each undeclared identifier is reported only once for 
each function it appears in
vmcs.c: At top level:
vmcs.c:81:29: error: ‘opt_pml_enabled’ defined but not used 
[-Werror=unused-variable]
  static bool_t __read_mostly opt_pml_enabled = 0;

Thanks,
-Kai

>
>> +static bool_t __read_mostly pml_enable = 0;
>> +
>>   /*
>>    * These two parameters are used to config the controls for Pause-Loop Exiting:
>>    * ple_gap:    upper bound on the amount of time between two successive
>> @@ -92,6 +102,28 @@ DEFINE_PER_CPU(bool_t, vmxon);
>>   static u32 vmcs_revision_id __read_mostly;
>>   u64 __read_mostly vmx_basic_msr;
>>   
>> +/* Copied from parse_iommu_param */
> Not a useful comment, as it is likely to diverge in the future.
>
>> +static void parse_ept_param(char *s)
> __init
>
> ~Andrew
>
>> +{
>> +    char *ss;
>> +    int val;
>> +
>> +    do {
>> +        val = !!strncmp(s, "no-", 3);
>> +        if ( !val )
>> +            s += 3;
>> +
>> +        ss = strchr(s, ',');
>> +        if ( ss )
>> +            *ss = '\0';
>> +
>> +        if ( !strcmp(s, "pml") )
>> +            pml_enable = val;
>> +
>> +        s = ss + 1;
>> +    } while ( ss );
>> +}
>> +
>>   static void __init vmx_display_features(void)
>>   {
>>       int printed = 0;
>


* Re: [PATCH 01/10] VMX: Enable EPT A/D bit support
  2015-03-27 20:38   ` Andrew Cooper
  2015-03-30  6:11     ` Kai Huang
@ 2015-04-02  6:32     ` Kai Huang
  2015-04-02  9:55       ` Andrew Cooper
  1 sibling, 1 reply; 65+ messages in thread
From: Kai Huang @ 2015-04-02  6:32 UTC (permalink / raw)
  To: Andrew Cooper, jbeulich, tim, kevin.tian, yang.z.zhang, xen-devel



On 03/28/2015 04:38 AM, Andrew Cooper wrote:
> On 27/03/15 02:35, Kai Huang wrote:
>> PML requires A/D bit support so enable it for further use.
>>
>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>> ---
>>   xen/arch/x86/hvm/vmx/vmcs.c        | 1 +
>>   xen/arch/x86/mm/p2m-ept.c          | 8 +++++++-
>>   xen/include/asm-x86/hvm/vmx/vmcs.h | 4 +++-
>>   xen/include/asm-x86/hvm/vmx/vmx.h  | 5 ++++-
>>   4 files changed, 15 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>> index d614638..2f645fe 100644
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -103,6 +103,7 @@ static void __init vmx_display_features(void)
>>       P(cpu_has_vmx_tpr_shadow, "APIC TPR shadow");
>>       P(cpu_has_vmx_ept, "Extended Page Tables (EPT)");
>>       P(cpu_has_vmx_vpid, "Virtual-Processor Identifiers (VPID)");
>> +    P(cpu_has_vmx_ept_ad_bit, "EPT A/D bit");
>>       P(cpu_has_vmx_vnmi, "Virtual NMI");
>>       P(cpu_has_vmx_msr_bitmap, "MSR direct-access bitmap");
>>       P(cpu_has_vmx_unrestricted_guest, "Unrestricted Guest");
>> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
>> index c2d7720..8650092 100644
>> --- a/xen/arch/x86/mm/p2m-ept.c
>> +++ b/xen/arch/x86/mm/p2m-ept.c
>> @@ -233,6 +233,9 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
>>       if ( !ept_set_middle_entry(p2m, &new_ept) )
>>           return 0;
>>   
>> +    /* It's better to copy A bit of Middle entry from original entry */
>> +    new_ept.a = ept_entry->a;
> Surely d needs to be propagated as well?  Would it make sense to extend
> ept_set_middle_entry() to do all of new_ept setup in one location?
>
>> +
>>       table = map_domain_page(new_ept.mfn);
>>       trunk = 1UL << ((level - 1) * EPT_TABLE_ORDER);
>>   
>> @@ -244,7 +247,7 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
>>           epte->sp = (level > 1);
>>           epte->mfn += i * trunk;
>>           epte->snp = (iommu_enabled && iommu_snoop);
>> -        ASSERT(!epte->rsvd1);
>> +        /* A/D bits are inherited from superpage */
>>           ASSERT(!epte->avail3);
>>   
>>           ept_p2m_type_to_flags(epte, epte->sa_p2mt, epte->access);
>> @@ -1071,6 +1074,9 @@ int ept_p2m_init(struct p2m_domain *p2m)
>>       /* set EPT page-walk length, now it's actual walk length - 1, i.e. 3 */
>>       ept->ept_wl = 3;
>>   
>> +    /* Enable EPT A/D bit if it's supported by hardware */
>> +    ept->ept_ad = cpu_has_vmx_ept_ad_bit ? 1 : 0;
> This will incur overhead on all EPT operations.  It should only be
> enabled if pml is going to be in use.  (I think you need reverse patches
> 1 and 2 in the series, and gate on pml_enable here)
Hi Andrew,

I'd like to also put patch 3 (PML feature detection) before this patch, 
as it's better to use cpu_has_vmx_pml to gate A/D bit enabling here. 
Theoretically simple "pml_enable = 1" here doesn't guarantee 
cpu_has_vmx_pml to be true, as PML may be turned off during 
vmx_init_vmcs_config.

And in this case I also want to delete the code below, as if PML is not
enabled, it will still print even though the EPT A/D bits are not actually
enabled in hardware.

    P(cpu_has_vmx_ept_ad, "EPT A/D bit");

Thanks,
-Kai
>> +
>>       if ( !zalloc_cpumask_var(&ept->synced_mask) )
>>           return -ENOMEM;
>>   
>> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> index 6fce6aa..4528346 100644
>> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
>> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> @@ -62,7 +62,8 @@ struct ept_data {
>>       struct {
>>               u64 ept_mt :3,
>>                   ept_wl :3,
>> -                rsvd   :6,
>> +                ept_ad :1,
>> +                rsvd   :5,
>>                   asr    :52;
> While you are making this change, can you add comments similar to
> ept_entry_t describing the bits?
>
>>           };
>>           u64 eptp;
>> @@ -226,6 +227,7 @@ extern u32 vmx_secondary_exec_control;
>>   #define VMX_EPT_INVEPT_INSTRUCTION              0x00100000
>>   #define VMX_EPT_INVEPT_SINGLE_CONTEXT           0x02000000
>>   #define VMX_EPT_INVEPT_ALL_CONTEXT              0x04000000
>> +#define VMX_EPT_AD_BIT_SUPPORT                  0x00200000
>>   
>>   #define VMX_MISC_VMWRITE_ALL                    0x20000000
>>   
>> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
>> index 91c5e18..9afd351 100644
>> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
>> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
>> @@ -37,7 +37,8 @@ typedef union {
>>           emt         :   3,  /* bits 5:3 - EPT Memory type */
>>           ipat        :   1,  /* bit 6 - Ignore PAT memory type */
>>           sp          :   1,  /* bit 7 - Is this a superpage? */
>> -        rsvd1       :   2,  /* bits 9:8 - Reserved for future use */
>> +        a           :   1,  /* bit 8 - Access bit */
>> +        d           :   1,  /* bit 9 - Dirty bit */
>>           recalc      :   1,  /* bit 10 - Software available 1 */
>>           snp         :   1,  /* bit 11 - VT-d snoop control in shared
>>                                  EPT/VT-d usage */
>> @@ -261,6 +262,8 @@ extern uint8_t posted_intr_vector;
>>       (vmx_ept_vpid_cap & VMX_EPT_SUPERPAGE_2MB)
>>   #define cpu_has_vmx_ept_invept_single_context   \
>>       (vmx_ept_vpid_cap & VMX_EPT_INVEPT_SINGLE_CONTEXT)
>> +#define cpu_has_vmx_ept_ad_bit                  \
>> +    (vmx_ept_vpid_cap & VMX_EPT_AD_BIT_SUPPORT)
> I think cpu_has_vmx_ept_ad is sufficient, without the _bit suffix making
> it longer.
>
> ~Andrew
>
>>   
>>   #define EPT_2MB_SHIFT     16
>>   #define EPT_1GB_SHIFT     17
>


* Re: [PATCH 01/10] VMX: Enable EPT A/D bit support
  2015-04-02  6:32     ` Kai Huang
@ 2015-04-02  9:55       ` Andrew Cooper
  0 siblings, 0 replies; 65+ messages in thread
From: Andrew Cooper @ 2015-04-02  9:55 UTC (permalink / raw)
  To: Kai Huang, jbeulich, tim, kevin.tian, yang.z.zhang, xen-devel

On 02/04/15 07:32, Kai Huang wrote:
>
>>> +
>>>       table = map_domain_page(new_ept.mfn);
>>>       trunk = 1UL << ((level - 1) * EPT_TABLE_ORDER);
>>> @@ -244,7 +247,7 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
>>>           epte->sp = (level > 1);
>>>           epte->mfn += i * trunk;
>>>           epte->snp = (iommu_enabled && iommu_snoop);
>>> -        ASSERT(!epte->rsvd1);
>>> +        /* A/D bits are inherited from superpage */
>>>           ASSERT(!epte->avail3);
>>>
>>>           ept_p2m_type_to_flags(epte, epte->sa_p2mt, epte->access);
>>> @@ -1071,6 +1074,9 @@ int ept_p2m_init(struct p2m_domain *p2m)
>>>       /* set EPT page-walk length, now it's actual walk length - 1, i.e. 3 */
>>>       ept->ept_wl = 3;
>>>
>>> +    /* Enable EPT A/D bit if it's supported by hardware */
>>> +    ept->ept_ad = cpu_has_vmx_ept_ad_bit ? 1 : 0;
>> This will incur overhead on all EPT operations.  It should only be
>> enabled if pml is going to be in use.  (I think you need reverse patches
>> 1 and 2 in the series, and gate on pml_enable here)
> Hi Andrew,
>
> I'd like to also put patch 3 (PML feature detection) before this
> patch, as it's better to use cpu_has_vmx_pml to gate A/D bit enabling
> here. Theoretically simple "pml_enable = 1" here doesn't guarantee
> cpu_has_vmx_pml to be true, as PML may be turned off during
> vmx_init_vmcs_config.
>
> And in this case I also want to delete the code below, as if PML is not
> enabled, it will still print even though the EPT A/D bits are not
> actually enabled in hardware.
>
>    P(cpu_has_vmx_ept_ad, "EPT A/D bit");
>

Sounds sensible.


* Re: [PATCH 02/10] VMX: New parameter to control PML enabling
  2015-04-02  5:46     ` Kai Huang
@ 2015-04-02  9:58       ` Andrew Cooper
  2015-04-02 13:34         ` Kai Huang
  0 siblings, 1 reply; 65+ messages in thread
From: Andrew Cooper @ 2015-04-02  9:58 UTC (permalink / raw)
  To: Kai Huang, jbeulich, tim, kevin.tian, yang.z.zhang, xen-devel

On 02/04/15 06:46, Kai Huang wrote:
>
>
> On 03/28/2015 04:42 AM, Andrew Cooper wrote:
>> On 27/03/15 02:35, Kai Huang wrote:
>>> A top level EPT parameter "ept=<options>" and a sub boolean
>>> "pml_enable" are
>>> added to control PML. Other booleans can be further added for any
>>> other EPT
>>> related features.
>>>
>>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>> Please patch docs/misc/xen-command-line.markdown as well.  See the
>> existing "psr" option as a similar example.
>>
>> Also, as indicated in patch 1, I think patches 1 and 2 need swapping in
>> the series.
>>
>>> ---
>>>   xen/arch/x86/hvm/vmx/vmcs.c | 32 ++++++++++++++++++++++++++++++++
>>>   1 file changed, 32 insertions(+)
>>>
>>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>>> index 2f645fe..9b20a4b 100644
>>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>>> @@ -50,6 +50,16 @@ boolean_param("unrestricted_guest", opt_unrestricted_guest_enabled);
>>>   static bool_t __read_mostly opt_apicv_enabled = 1;
>>>   boolean_param("apicv", opt_apicv_enabled);
>>>
>>> +static void parse_ept_param(char *s);
>>> +/*
>>> + * The 'ept' parameter controls functionalities that depend on, or impact the
>>> + * EPT mechanism. Optional comma separated value may contain:
>>> + *
>>> + *  pml                 Enable PML
>>> + */
>>> +custom_param("ept", parse_ept_param);
>> It is common to put the custom_param() call below parse_ept_param() so
>> you don't need to forward-declare the function.  The comment can happily
>> live at the top of parse_ept_param().
> Hi Andrew,
>
> It looks like it's better to keep parse_ept_param() below custom_param():
> simply moving parse_ept_param() above custom_param() results in the error
> below (I also renamed pml_enable to opt_pml_enabled), as it references the
> opt_pml_enabled variable, which is defined below custom_param(). For the
> "iommu=<options>" parameter, parse_iommu_param() was likewise placed below
> custom_param().
>
> What do you think?
>
> vmcs.c: In function ‘parse_ept_param’:
> vmcs.c:74:13: error: ‘opt_pml_enabled’ undeclared (first use in this
> function)
>              opt_pml_enabled = val;
>              ^
> vmcs.c:74:13: note: each undeclared identifier is reported only once
> for each function it appears in
> vmcs.c: At top level:
> vmcs.c:81:29: error: ‘opt_pml_enabled’ defined but not used
> [-Werror=unused-variable]
>  static bool_t __read_mostly opt_pml_enabled = 0;

The most concise way of doing this is:

static bool_t __read_mostly opt_pml_enabled = 0;

static void parse_ept_param(char *s)
{
  ...
}
custom_param("ept", parse_ept_param);

~Andrew


* Re: [PATCH 02/10] VMX: New parameter to control PML enabling
  2015-04-02  9:58       ` Andrew Cooper
@ 2015-04-02 13:34         ` Kai Huang
  0 siblings, 0 replies; 65+ messages in thread
From: Kai Huang @ 2015-04-02 13:34 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Tian, Kevin, tim, xen-devel, Kai Huang, Jan Beulich, yang.z.zhang

On Thu, Apr 2, 2015 at 5:58 PM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 02/04/15 06:46, Kai Huang wrote:
>>
>>
>> On 03/28/2015 04:42 AM, Andrew Cooper wrote:
>>> On 27/03/15 02:35, Kai Huang wrote:
>>>> A top level EPT parameter "ept=<options>" and a sub boolean
>>>> "pml_enable" are
>>>> added to control PML. Other booleans can be further added for any
>>>> other EPT
>>>> related features.
>>>>
>>>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>>> Please patch docs/misc/xen-command-line.markdown as well.  See the
>>> existing "psr" option as a similar example.
>>>
>>> Also, as indicated in patch 1, I think patches 1 and 2 need swapping in
>>> the series.
>>>
>>>> ---
>>>>   xen/arch/x86/hvm/vmx/vmcs.c | 32 ++++++++++++++++++++++++++++++++
>>>>   1 file changed, 32 insertions(+)
>>>>
>>>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>>>> index 2f645fe..9b20a4b 100644
>>>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>>>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>>>> @@ -50,6 +50,16 @@ boolean_param("unrestricted_guest", opt_unrestricted_guest_enabled);
>>>>   static bool_t __read_mostly opt_apicv_enabled = 1;
>>>>   boolean_param("apicv", opt_apicv_enabled);
>>>>
>>>> +static void parse_ept_param(char *s);
>>>> +/*
>>>> + * The 'ept' parameter controls functionalities that depend on, or impact the
>>>> + * EPT mechanism. Optional comma separated value may contain:
>>>> + *
>>>> + *  pml                 Enable PML
>>>> + */
>>>> +custom_param("ept", parse_ept_param);
>>> It is common to put the custom_param() call below parse_ept_param() so
>>> you don't need to forward-declare the function.  The comment can happily
>>> live at the top of parse_ept_param().
>> Hi Andrew,
>>
>> Looks it's better to keep parse_ept_param() below custom_param(), as
>> simply moving parse_ept_param() above custom_param() results in below
>> error (I also changed pml_enable to opt_pml_enabled), as it references
>> opt_pml_enabled variable, which is defined below custom_param().
>> Actually for "iommu=<options>" parameter, parse_iommu_param() was also
>> placed below custom_param().
>>
>> What do you think?
>>
>> vmcs.c: In function ‘parse_ept_param’:
>> vmcs.c:74:13: error: ‘opt_pml_enabled’ undeclared (first use in this
>> function)
>>              opt_pml_enabled = val;
>>              ^
>> vmcs.c:74:13: note: each undeclared identifier is reported only once
>> for each function it appears in
>> vmcs.c: At top level:
>> vmcs.c:81:29: error: ‘opt_pml_enabled’ defined but not used
>> [-Werror=unused-variable]
>>  static bool_t __read_mostly opt_pml_enabled = 0;
>
> The most concise way of doing this is:
>
> static bool_t __read_mostly opt_pml_enabled = 0;
>
> static void parse_ept_param(char *s)
> {
>   ...
> }
> custom_param("ept", parse_ept_param);
>
Sure. Thanks.


> ~Andrew
>



-- 
Thanks,
-Kai



* Re: [PATCH 00/10] PML (Paging Modification Logging) support
  2015-03-30  5:50   ` Kai Huang
@ 2015-04-07  8:30     ` Kai Huang
  2015-04-07  9:24       ` Tim Deegan
  0 siblings, 1 reply; 65+ messages in thread
From: Kai Huang @ 2015-04-07  8:30 UTC (permalink / raw)
  To: Andrew Cooper, jbeulich, tim, kevin.tian, yang.z.zhang, xen-devel

Hi Jan, Tim, other maintainers,

Do you have comments? Or should I send out v2 addressing Andrew's
comments, as it's been more than a week since this patch series was
sent out?

Thanks,
-Kai

On 03/30/2015 01:50 PM, Kai Huang wrote:
>
>
> On 03/28/2015 05:26 AM, Andrew Cooper wrote:
>> On 27/03/15 02:35, Kai Huang wrote:
>>> Hi all,
>>>
>>> This patch series adds PML support to Xen. Please kindly help to 
>>> review it.
>> Overall this looks like a very good series, and it is particularly
>> helpful given the level of commenting.
>>
>> Which platforms is/will PML be available for?
> Hi Andrew,
>
> Thanks for your quick review. PML will be available from Intel's 
> "Broadwell server" platform.
>
> Thanks,
> -Kai
>>
>> ~Andrew
>>
>
>


* Re: [PATCH 00/10] PML (Paging Modification Logging) support
  2015-04-07  8:30     ` Kai Huang
@ 2015-04-07  9:24       ` Tim Deegan
  2015-04-08  2:23         ` Kai Huang
  2015-04-09 12:32         ` Tim Deegan
  0 siblings, 2 replies; 65+ messages in thread
From: Tim Deegan @ 2015-04-07  9:24 UTC (permalink / raw)
  To: Kai Huang; +Cc: yang.z.zhang, Andrew Cooper, kevin.tian, jbeulich, xen-devel

Hi,

At 16:30 +0800 on 07 Apr (1428424218), Kai Huang wrote:
> Hi Jan, Tim, other maintainers,
> 
> Do you have comments? Or should I send out v2 addressing Andrew's
> comments, as it's been more than a week since this patch series was
> sent out?

I'm sorry, I was away last week so I haven't had a chance to review
these patches.  I'll probably be able to look at them on Thursday.

Cheers,

Tim.

> On 03/30/2015 01:50 PM, Kai Huang wrote:
> >
> >
> > On 03/28/2015 05:26 AM, Andrew Cooper wrote:
> >> On 27/03/15 02:35, Kai Huang wrote:
> >>> Hi all,
> >>>
> >>> This patch series adds PML support to Xen. Please kindly help to 
> >>> review it.
> >> Overall this looks like a very good series, and it is particularly
> >> helpful given the level of commenting.
> >>
> >> Which platforms is/will PML be available for?
> > Hi Andrew,
> >
> > Thanks for your quick review. PML will be available from Intel's 
> > "Broadwell server" platform.
> >
> > Thanks,
> > -Kai
> >>
> >> ~Andrew
> >>
> >
> >
> 


* Re: [PATCH 00/10] PML (Paging Modification Logging) support
  2015-04-07  9:24       ` Tim Deegan
@ 2015-04-08  2:23         ` Kai Huang
  2015-04-09 12:32         ` Tim Deegan
  1 sibling, 0 replies; 65+ messages in thread
From: Kai Huang @ 2015-04-08  2:23 UTC (permalink / raw)
  To: Tim Deegan; +Cc: yang.z.zhang, Andrew Cooper, kevin.tian, jbeulich, xen-devel



On 04/07/2015 05:24 PM, Tim Deegan wrote:
> Hi,
>
> At 16:30 +0800 on 07 Apr (1428424218), Kai Huang wrote:
>> Hi Jan, Tim, other maintainers,
>>
>> Do you have comments? Or should I send out v2 addressing Andrew's
>> comments, as it's been more than a week since this patch series was
>> sent out?
> I'm sorry, I was away last week so I haven't had a chance to review
> these patches.  I'll probably be able to look at them on Thursday.
Yeah sure. Thanks Tim!

Thanks,
-Kai
>
> Cheers,
>
> Tim.
>
>> On 03/30/2015 01:50 PM, Kai Huang wrote:
>>>
>>> On 03/28/2015 05:26 AM, Andrew Cooper wrote:
>>>> On 27/03/15 02:35, Kai Huang wrote:
>>>>> Hi all,
>>>>>
>>>>> This patch series adds PML support to Xen. Please kindly help to
>>>>> review it.
>>>> Overall this looks like a very good series, and it is particularly
>>>> helpful given the level of commenting.
>>>>
>>>> Which platforms is/will PML be available for?
>>> Hi Andrew,
>>>
>>> Thanks for your quick review. PML will be available from Intel's
>>> "Broadwell server" platform.
>>>
>>> Thanks,
>>> -Kai
>>>> ~Andrew
>>>>
>>>


* Re: [PATCH 01/10] VMX: Enable EPT A/D bit support
  2015-03-27  2:35 ` [PATCH 01/10] VMX: Enable EPT A/D bit support Kai Huang
  2015-03-27 20:38   ` Andrew Cooper
@ 2015-04-09 11:21   ` Tim Deegan
  2015-04-10  6:40     ` Kai Huang
  1 sibling, 1 reply; 65+ messages in thread
From: Tim Deegan @ 2015-04-09 11:21 UTC (permalink / raw)
  To: Kai Huang; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, jbeulich, xen-devel

At 10:35 +0800 on 27 Mar (1427452545), Kai Huang wrote:
> @@ -233,6 +233,9 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
>      if ( !ept_set_middle_entry(p2m, &new_ept) )
>          return 0;
>  
> +    /* It's better to copy A bit of Middle entry from original entry */
> +    new_ept.a = ept_entry->a;

Shouldn't this A bit always be set to 1?  After all we're not using it
for anything, and it doesn't mean the same thing as the A bit in the
superpage entry did.

In fact, thinking about that, we should probably be setting _all_ the
A bits to 1, even in leaf entries, to avoid the overhead of the MMU
having to write them later.
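Concretely, the middle-entry copy above could then simply become (sketch,
assuming the cpu_has_vmx_ept_ad rename suggested earlier):

    new_ept.a = !!cpu_has_vmx_ept_ad;   /* pre-set A to spare the MMU a write */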

Cheers,

Tim.


* Re: [PATCH 05/10] VMX: add help functions to support PML
  2015-03-27  2:35 ` [PATCH 05/10] VMX: add help functions " Kai Huang
  2015-03-27 21:09   ` Andrew Cooper
@ 2015-04-09 12:00   ` Tim Deegan
  2015-04-10  7:05     ` Kai Huang
  2015-04-09 12:31   ` Tim Deegan
  2 siblings, 1 reply; 65+ messages in thread
From: Tim Deegan @ 2015-04-09 12:00 UTC (permalink / raw)
  To: Kai Huang; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, jbeulich, xen-devel

Hi,

At 10:35 +0800 on 27 Mar (1427452549), Kai Huang wrote:
> +void vmx_vcpu_flush_pml_buffer(struct vcpu *v)
> +{
> +    uint64_t *pml_buf;
> +    unsigned long pml_idx;
> +
> +    ASSERT(vmx_vcpu_pml_enabled(v));
> +
> +    vmx_vmcs_enter(v);
> +
> +    __vmread(GUEST_PML_INDEX, &pml_idx);
> +
> +    /* Do nothing if PML buffer is empty */
> +    if ( pml_idx == (PML_ENTITY_NUM - 1) )
> +        goto out;
> +
> +    pml_buf = map_domain_page(page_to_mfn(v->arch.hvm_vmx.pml_pg));
> +
> +    /*
> +     * PML index can be either 2^16-1 (buffer is full), or 0~511 (buffer is not
> +     * full), and in latter case PML index always points to next available
> +     * entity.
> +     */
> +    if (pml_idx >= PML_ENTITY_NUM)
> +        pml_idx = 0;
> +    else
> +        pml_idx++;
> +
> +    for ( ; pml_idx < PML_ENTITY_NUM; pml_idx++ )
> +    {
> +        struct p2m_domain *p2m = p2m_get_hostp2m(v->domain);
> +        unsigned long gfn;
> +        mfn_t mfn;
> +        p2m_type_t t;
> +        p2m_access_t a;
> +
> +        gfn = pml_buf[pml_idx] >> PAGE_SHIFT;
> +        mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL);

Please don't call p2m->get_entry() directly -- that interface should
only be used inside the p2m code.  As it happens, I don't think this
lookup is correct anyway: the logging only sees races (which are not
interesting) or buggy hardware (which is not worth the extra lookup to
detect).

So you only need this to get 'mfn' to pass to paging_mark_dirty().
That's also buggy, because there's no locking here to make sure
gfn->mfn->gfn ends up in the right place. :(

I think the right thing to do is:

 - split paging_mark_dirty() into paging_mark_gfn_dirty() (the bulk of
   the current function) and a paging_mark_dirty() wrapper that does
   get_gpfn_from_mfn(mfn_x(gmfn)) and calls paging_mark_gfn_dirty().

 - call paging_mark_gfn_dirty() from vmx_vcpu_flush_pml_buffer().

That will avoid _two_ p2m lookups in this function. :)
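A rough sketch of that split (signatures assumed from the current paging
code):

    /* Mark a page as dirty, given its pfn (i.e. gfn in the guest's space). */
    void paging_mark_gfn_dirty(struct domain *d, unsigned long pfn)
    {
        /* ... bulk of the current paging_mark_dirty() body, keyed on pfn ... */
    }

    /* Mark a page as dirty, given its mfn. */
    void paging_mark_dirty(struct domain *d, unsigned long guest_mfn)
    {
        paging_mark_gfn_dirty(d, get_gpfn_from_mfn(guest_mfn));
    }

vmx_vcpu_flush_pml_buffer() would then call paging_mark_gfn_dirty() directly
with the logged gfn.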

Cheers,

Tim.


* Re: [PATCH 08/10] VMX: disable PML in vmx_vcpu_destroy
  2015-03-27  2:35 ` [PATCH 08/10] VMX: disable PML in vmx_vcpu_destroy Kai Huang
@ 2015-04-09 12:04   ` Tim Deegan
  2015-04-10  7:25     ` Kai Huang
  0 siblings, 1 reply; 65+ messages in thread
From: Tim Deegan @ 2015-04-09 12:04 UTC (permalink / raw)
  To: Kai Huang; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, jbeulich, xen-devel

At 10:35 +0800 on 27 Mar (1427452552), Kai Huang wrote:
> It's possible the domain still remains in log-dirty mode when it is about to
> be destroyed, in which case we should manually disable PML for it.
> 
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index fce3aa2..75ac44b 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -153,6 +153,15 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>  
>  static void vmx_vcpu_destroy(struct vcpu *v)
>  {
> +    /*
> +     * There are cases that domain still remains in log-dirty mode when it is
> +     * about to be destroyed (ex, user types 'xl destroy <dom>'), in which case
> +     * we should disable PML manually here. Note that vmx_vcpu_destroy is called
> +     * prior to vmx_domain_destroy so we need to disable PML for each vcpu
> +     * separately here.
> +     */
> +    if ( vmx_vcpu_pml_enabled(v) )
> +        vmx_vcpu_disable_pml(v);

Looking at this and other callers of these enable/disable functions, I
think it would be better to make those functions idempotent (i.e.
*_{en,dis}able_pml() should just return success if PML is already
enabled/disabled).  Then you don't need to check in every caller, and
there's no risk of a crash if one caller is missing the check.
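E.g. a sketch of the idempotent variant:

    void vmx_vcpu_disable_pml(struct vcpu *v)
    {
        if ( !vmx_vcpu_pml_enabled(v) )
            return;

        /* ... existing teardown ... */
    }

with the ASSERT() replaced by the early return, and similarly for the enable
and domain-level variants.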

Cheers,

Tim.


* Re: [PATCH 10/10] p2m/ept: Enable PML in p2m-ept for log-dirty.
  2015-03-27  2:35 ` [PATCH 10/10] p2m/ept: Enable PML in p2m-ept for log-dirty Kai Huang
@ 2015-04-09 12:20   ` Tim Deegan
  2015-04-10  8:44     ` Kai Huang
  0 siblings, 1 reply; 65+ messages in thread
From: Tim Deegan @ 2015-04-09 12:20 UTC (permalink / raw)
  To: Kai Huang; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, jbeulich, xen-devel

At 10:35 +0800 on 27 Mar (1427452554), Kai Huang wrote:
> @@ -118,6 +119,12 @@ static void ept_p2m_type_to_flags(ept_entry_t *entry, p2m_type_t type, p2m_acces
>              break;
>          case p2m_ram_rw:
>              entry->r = entry->w = entry->x = 1;
> +            /*
> +             * This is to avoid unnecessary GPA logging in the PML buffer,
> +             * such as for normal memory when the domain is in partial
> +             * log-dirty mode (e.g. video RAM tracking).
> +             */
> +            if ( vmx_domain_pml_enabled(p2m->domain) )
> +                entry->d = 1;

I think that both A and D should be set here, even when PML is not
supported, to avoid the MMU having to write them later.  And indeed
not just for ram_rw, but for every present leaf type.

You will still need to check whether the CPU supports the A/D bits,
since AIUI those are reserved bits otherwise.

>              break;
>          case p2m_mmio_direct:
>              entry->r = entry->x = 1;
> @@ -125,6 +132,26 @@ static void ept_p2m_type_to_flags(ept_entry_t *entry, p2m_type_t type, p2m_acces
>                                                      entry->mfn);
>              break;
>          case p2m_ram_logdirty:
> +            entry->r = entry->x = 1;
> +            if ( vmx_domain_pml_enabled(p2m->domain) )
> +            {
> +                /*
> +                 * In case of PML, we don't have to write protect 4K page, but
> +                 * only need to clear D-bit for it. Note we still need to write
> +                 * protect super page in order to split it to 4K pages in EPT
> +                 * violation.
> +                 */
> +                if ( !is_epte_superpage(entry) )
> +                {
> +                    entry->w = 1;
> +                    entry->d = 0;
> +                }
> +                else
> +                    entry->w = 0;
> +            }
> +            else
> +                entry->w = 0;
> +            break;

This can be folded into a neater form:

            if ( vmx_domain_pml_enabled(p2m->domain)
	         && !is_epte_superpage(entry) )
            {
                /*
                 * In case of PML, we don't have to write protect 4K page, but
                 * only need to clear D-bit for it. Note we still need to write
                 * protect super page in order to split it to 4K pages in EPT
                 * violation.
                 */
                entry->w = 1;
                entry->d = 0;
            }
            else
                entry->w = 0;

Cheers,

Tim.


* Re: [PATCH 09/10] log-dirty: Refine common code to support PML
  2015-03-27  2:35 ` [PATCH 09/10] log-dirty: Refine common code to support PML Kai Huang
@ 2015-04-09 12:27   ` Tim Deegan
  2015-04-10  7:38     ` Kai Huang
  0 siblings, 1 reply; 65+ messages in thread
From: Tim Deegan @ 2015-04-09 12:27 UTC (permalink / raw)
  To: Kai Huang; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, jbeulich, xen-devel

At 10:35 +0800 on 27 Mar (1427452553), Kai Huang wrote:
> --- a/xen/arch/x86/mm/paging.c
> +++ b/xen/arch/x86/mm/paging.c
> @@ -411,7 +411,18 @@ static int paging_log_dirty_op(struct domain *d,
>      int i4, i3, i2;
>  
>      if ( !resuming )
> +    {
>          domain_pause(d);
> +
> +        /*
> +         * Only need to flush when not resuming: in the resuming case the
> +         * domain was already paused, so it's not possible for any new
> +         * page to have been dirtied.
> +         */
> +        if ( d->arch.paging.log_dirty.flush_cached_dirty )
> +            d->arch.paging.log_dirty.flush_cached_dirty(d);

I think there are too many layers of indirection here. :)  How about:
 - don't add a flush_cached_dirty() function to the log_dirty ops.
 - just call p2m_flush_hardware_cached_dirty(d) here.

Would that work OK?
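
I.e., roughly (a sketch):

    if ( !resuming )
    {
        domain_pause(d);

        /*
         * Flush hardware-cached dirty GFNs (e.g. the vcpus' PML buffers)
         * into the log-dirty bitmap before reporting pages to userspace.
         */
        p2m_flush_hardware_cached_dirty(d);
    }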

Cheers,

Tim.


* Re: [PATCH 05/10] VMX: add help functions to support PML
  2015-03-27  2:35 ` [PATCH 05/10] VMX: add help functions " Kai Huang
  2015-03-27 21:09   ` Andrew Cooper
  2015-04-09 12:00   ` Tim Deegan
@ 2015-04-09 12:31   ` Tim Deegan
  2015-04-10  7:07     ` Kai Huang
  2 siblings, 1 reply; 65+ messages in thread
From: Tim Deegan @ 2015-04-09 12:31 UTC (permalink / raw)
  To: Kai Huang; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, jbeulich, xen-devel

At 10:35 +0800 on 27 Mar (1427452549), Kai Huang wrote:
> +void vmx_vcpu_disable_pml(struct vcpu *v)
> +{
> +    ASSERT(vmx_vcpu_pml_enabled(v));
> +

I think this function ought to call vmx_vcpu_flush_pml_buffer() before
disabling PML.  That way we don't need to worry about losing any
information if a guest vcpu is reset or offlined during migration. 
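
I.e., something like this (a sketch):

    void vmx_vcpu_disable_pml(struct vcpu *v)
    {
        ASSERT(vmx_vcpu_pml_enabled(v));

        /* Push any GPAs still in the hardware buffer into the dirty log
         * before tearing PML down, so nothing is lost on vcpu reset. */
        vmx_vcpu_flush_pml_buffer(v);

        /* ... existing teardown ... */
    }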

Cheers,

Tim.


* Re: [PATCH 00/10] PML (Paging Modification Logging) support
  2015-04-07  9:24       ` Tim Deegan
  2015-04-08  2:23         ` Kai Huang
@ 2015-04-09 12:32         ` Tim Deegan
  2015-04-10  6:40           ` Kai Huang
  1 sibling, 1 reply; 65+ messages in thread
From: Tim Deegan @ 2015-04-09 12:32 UTC (permalink / raw)
  To: Kai Huang; +Cc: yang.z.zhang, Andrew Cooper, kevin.tian, jbeulich, xen-devel

At 10:24 +0100 on 07 Apr (1428402277), Tim Deegan wrote:
> Hi,
> 
> At 16:30 +0800 on 07 Apr (1428424218), Kai Huang wrote:
> > Hi Jan, Tim, other maintainers,
> > 
> > Do you have comments? Or should I send out the v2 addressing Andrew's 
> > comments, as it's been more than a week since this patch series was
> > sent out?
> 
> I'm sorry, I was away last week so I haven't had a chance to review
> these patches.  I'll probably be able to look at them on Thursday.

Done.  They seem to be in good shape for a first cut!  I've commented
on the patches where there was anything I think needs improvement.

Cheers,

Tim.


* Re: [PATCH 00/10] PML (Paging Modification Logging) support
  2015-04-09 12:32         ` Tim Deegan
@ 2015-04-10  6:40           ` Kai Huang
  0 siblings, 0 replies; 65+ messages in thread
From: Kai Huang @ 2015-04-10  6:40 UTC (permalink / raw)
  To: Tim Deegan; +Cc: yang.z.zhang, Andrew Cooper, kevin.tian, jbeulich, xen-devel



On 04/09/2015 08:32 PM, Tim Deegan wrote:
> At 10:24 +0100 on 07 Apr (1428402277), Tim Deegan wrote:
>> Hi,
>>
>> At 16:30 +0800 on 07 Apr (1428424218), Kai Huang wrote:
>>> Hi Jan, Tim, other maintainers,
>>>
>>> Do you have comments? Or should I send out the v2 addressing Andrew's
>>> comments, as it's been more than a week since this patch series was
>>> sent out?
>> I'm sorry, I was away last week so I haven't had a chance to review
>> these patches.  I'll probably be able to look at them on Thursday.
> Done.  They seem to be in good shape for a first cut!  I've commented
> on the patches where there was anything I think needs improvement.
Thanks, Tim, for the review! Please see my replies.

Thanks,
-Kai
>
> Cheers,
>
> Tim.


* Re: [PATCH 01/10] VMX: Enable EPT A/D bit support
  2015-04-09 11:21   ` Tim Deegan
@ 2015-04-10  6:40     ` Kai Huang
  2015-04-10  8:54       ` Tim Deegan
  0 siblings, 1 reply; 65+ messages in thread
From: Kai Huang @ 2015-04-10  6:40 UTC (permalink / raw)
  To: Tim Deegan; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, jbeulich, xen-devel



On 04/09/2015 07:21 PM, Tim Deegan wrote:
> At 10:35 +0800 on 27 Mar (1427452545), Kai Huang wrote:
>> @@ -233,6 +233,9 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
>>       if ( !ept_set_middle_entry(p2m, &new_ept) )
>>           return 0;
>>   
>> +    /* Better to copy the A bit of the middle entry from the original one */
>> +    new_ept.a = ept_entry->a;
> Shouldn't this A bit always be set to 1?  After all we're not using it
> for anything, and it doesn't mean the same thing as the A bit in the
> superpage entry did.
This is the statement in Intel's SDM:

"Whenever the processor uses an EPT paging-structure entry as part of 
guest-physical-address translation, it sets
the accessed flag in that entry (if it is not already set)".

In my understanding, once a guest-physical address is used by the
processor, all EPT entries used for that translation will have their A
bits set, from the top-level entry down to the leaf entry. The A bits of
the EPT entries for one translation should therefore be consistent, in
which case it seems reasonable to inherit the A bit from the superpage's
leaf entry into the middle entry for the new shattered 4K pages.

Am I misunderstanding this?

But this is really not an important issue, and we can safely set the A
bit of the middle entry to 1. It was Yang's idea that it's better to
inherit it.

Yang, would you comment here?

>
> In fact, thinking about that, we should probably be setting _all_ the
> A bits to 1, even in leaf entries, to avoid the overhead of the MMU
> having to write them later.
I suppose you mean the A bits of the leaf entries of the new shattered 4K
pages? Yes, we can, but it seems more reasonable to inherit the original
A bit of the superpage. But as I said, I don't think the A bit matters
much, so I am OK with either. Maybe others can provide some comments here?

Thanks,
-Kai
>
> Cheers,
>
> Tim.


* Re: [PATCH 05/10] VMX: add help functions to support PML
  2015-04-09 12:00   ` Tim Deegan
@ 2015-04-10  7:05     ` Kai Huang
  2015-04-10  9:03       ` Tim Deegan
  0 siblings, 1 reply; 65+ messages in thread
From: Kai Huang @ 2015-04-10  7:05 UTC (permalink / raw)
  To: Tim Deegan; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, jbeulich, xen-devel



On 04/09/2015 08:00 PM, Tim Deegan wrote:
> Hi,
>
> At 10:35 +0800 on 27 Mar (1427452549), Kai Huang wrote:
>> +void vmx_vcpu_flush_pml_buffer(struct vcpu *v)
>> +{
>> +    uint64_t *pml_buf;
>> +    unsigned long pml_idx;
>> +
>> +    ASSERT(vmx_vcpu_pml_enabled(v));
>> +
>> +    vmx_vmcs_enter(v);
>> +
>> +    __vmread(GUEST_PML_INDEX, &pml_idx);
>> +
>> +    /* Do nothing if PML buffer is empty */
>> +    if ( pml_idx == (PML_ENTITY_NUM - 1) )
>> +        goto out;
>> +
>> +    pml_buf = map_domain_page(page_to_mfn(v->arch.hvm_vmx.pml_pg));
>> +
>> +    /*
>> +     * The PML index can be either 2^16-1 (buffer is full) or 0~511
>> +     * (buffer is not full); in the latter case the PML index always
>> +     * points to the next available entity.
>> +     */
>> +    if ( pml_idx >= PML_ENTITY_NUM )
>> +        pml_idx = 0;
>> +    else
>> +        pml_idx++;
>> +
>> +    for ( ; pml_idx < PML_ENTITY_NUM; pml_idx++ )
>> +    {
>> +        struct p2m_domain *p2m = p2m_get_hostp2m(v->domain);
>> +        unsigned long gfn;
>> +        mfn_t mfn;
>> +        p2m_type_t t;
>> +        p2m_access_t a;
>> +
>> +        gfn = pml_buf[pml_idx] >> PAGE_SHIFT;
>> +        mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL);
> Please don't call p2m->get_entry() directly -- that interface should
> only be used inside the p2m code.  As it happens, I don't think this
> lookup is correct anyway: the logging only sees races (which are not
> interesting) or buggy hardware (which is not worth the extra lookup to
> detect).
>
> So you only need this to get 'mfn' to pass to paging_mark_dirty().
True.
> That's also buggy, because there's no locking here to make sure
> gfn->mfn->gfn ends up in the right place. :(
Yes you are right. Thanks for pointing out.

Just curious: if I use get_gfn_type_access and put_gfn here, the
gfn->mfn->gfn consistency can be guaranteed, and it's safe to use them
here, right?
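
I.e., per logged GFN, something like this (a rough sketch only):

    /* Locked per-gfn lookup, so the mapping can't change under our feet
     * while we mark the page dirty. */
    mfn = get_gfn_type_access(p2m, gfn, &t, &a, 0, NULL);
    if ( mfn_valid(mfn_x(mfn)) )
        paging_mark_dirty(v->domain, mfn_x(mfn));
    put_gfn(p2m->domain, gfn);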

>
> I think the right thing to do is:
>
>   - split paging_mark_dirty() into paging_mark_gfn_dirty() (the bulk of
>     the current function) and a paging_mark_dirty() wrapper that does
>     get_gpfn_from_mfn(mfn_x(gmfn)) and calls paging_mark_gfn_dirty().
>
>   - call paging_mark_gfn_dirty() from vmx_vcpu_flush_pml_buffer().
>
> That will avoid _two_ p2m lookups in this function. :)
Yours is indeed much better! I will do it this way.
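
For reference, a minimal sketch of that split (signatures illustrative):

    /* The bulk of the current paging_mark_dirty(), keyed by gfn. */
    void paging_mark_gfn_dirty(struct domain *d, unsigned long pfn);

    void paging_mark_dirty(struct domain *d, mfn_t gmfn)
    {
        /* Single gfn lookup, then the common marking path. */
        paging_mark_gfn_dirty(d, get_gpfn_from_mfn(mfn_x(gmfn)));
    }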

Thanks,
-Kai
>
> Cheers,
>
> Tim.


* Re: [PATCH 05/10] VMX: add help functions to support PML
  2015-04-09 12:31   ` Tim Deegan
@ 2015-04-10  7:07     ` Kai Huang
  0 siblings, 0 replies; 65+ messages in thread
From: Kai Huang @ 2015-04-10  7:07 UTC (permalink / raw)
  To: Tim Deegan; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, jbeulich, xen-devel



On 04/09/2015 08:31 PM, Tim Deegan wrote:
> At 10:35 +0800 on 27 Mar (1427452549), Kai Huang wrote:
>> +void vmx_vcpu_disable_pml(struct vcpu *v)
>> +{
>> +    ASSERT(vmx_vcpu_pml_enabled(v));
>> +
> I think this function ought to call vmx_vcpu_flush_pml_buffer() before
> disabling PML.  That way we don't need to worry about losing any
> information if a guest vcpu is reset or offlined during migration.
Good point. Thanks for catching that. Will do :)

Thanks,
-Kai
>
> Cheers,
>
> Tim.


* Re: [PATCH 08/10] VMX: disable PML in vmx_vcpu_destroy
  2015-04-09 12:04   ` Tim Deegan
@ 2015-04-10  7:25     ` Kai Huang
  2015-04-10  9:30       ` Tim Deegan
  0 siblings, 1 reply; 65+ messages in thread
From: Kai Huang @ 2015-04-10  7:25 UTC (permalink / raw)
  To: Tim Deegan; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, jbeulich, xen-devel



On 04/09/2015 08:04 PM, Tim Deegan wrote:
> At 10:35 +0800 on 27 Mar (1427452552), Kai Huang wrote:
>> It's possible the domain still remains in log-dirty mode when it is about
>> to be destroyed, in which case we should manually disable PML for it.
>>
>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>> ---
>>   xen/arch/x86/hvm/vmx/vmx.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>> index fce3aa2..75ac44b 100644
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -153,6 +153,15 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>>   
>>   static void vmx_vcpu_destroy(struct vcpu *v)
>>   {
>> +    /*
>> +     * There are cases where the domain still remains in log-dirty mode
>> +     * when it is about to be destroyed (e.g., the user types
>> +     * 'xl destroy <dom>'), in which case we should disable PML manually
>> +     * here. Note that vmx_vcpu_destroy is called prior to
>> +     * vmx_domain_destroy, so we need to disable PML for each vcpu
>> +     * separately here.
>> +     */
>> +    if ( vmx_vcpu_pml_enabled(v) )
>> +        vmx_vcpu_disable_pml(v);
> Looking at this and other callers of these enable/disable functions, I
> think it would be better to make those functions idempotent (i.e.
> *_{en,dis}able_pml() should just return success if PML is already
> enabled/disabled).  Then you don't need to check in every caller, and
> there's no risk of a crash if one caller is missing the check.
Do you mean something like the below?

bool_t vmx_vcpu_enable_pml(struct vcpu *v)
{
    if ( vmx_vcpu_pml_enabled(v) )
        return 1;

    /* ... code to enable PML, with "goto error" on failure ... */

    return 1;

 error:
    /* ... undo any partial setup ... */
    return 0;
}

This should also be fine; I will do it this way. But I think
vmx_{vcpu|domain}_disable_pml should keep a void return type.

Thanks,
-Kai
>
> Cheers,
>
> Tim.


* Re: [PATCH 09/10] log-dirty: Refine common code to support PML
  2015-04-09 12:27   ` Tim Deegan
@ 2015-04-10  7:38     ` Kai Huang
  2015-04-10  9:31       ` Tim Deegan
  0 siblings, 1 reply; 65+ messages in thread
From: Kai Huang @ 2015-04-10  7:38 UTC (permalink / raw)
  To: Tim Deegan; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, jbeulich, xen-devel



On 04/09/2015 08:27 PM, Tim Deegan wrote:
> At 10:35 +0800 on 27 Mar (1427452553), Kai Huang wrote:
>> --- a/xen/arch/x86/mm/paging.c
>> +++ b/xen/arch/x86/mm/paging.c
>> @@ -411,7 +411,18 @@ static int paging_log_dirty_op(struct domain *d,
>>       int i4, i3, i2;
>>   
>>       if ( !resuming )
>> +    {
>>           domain_pause(d);
>> +
>> +        /*
>> +         * Only need to flush when not resuming: in the resuming case the
>> +         * domain was already paused, so it's not possible for any new
>> +         * page to have been dirtied.
>> +         */
>> +        if ( d->arch.paging.log_dirty.flush_cached_dirty )
>> +            d->arch.paging.log_dirty.flush_cached_dirty(d);
> I think there are too many layers of indirection here. :)  How about:
>   - don't add a flush_cached_dirty() function to the log_dirty ops.
>   - just call p2m_flush_hardware_cached_dirty(d) here.
>
> Would that work OK?
Thanks for pointing out.

Is it natural to call p2m-layer functions in paging.c? If there's no
restriction on that, calling p2m_flush_hardware_cached_dirty directly is
clearer, and it should work.

Thanks,
-Kai
>
> Cheers,
>
> Tim.


* Re: [PATCH 10/10] p2m/ept: Enable PML in p2m-ept for log-dirty.
  2015-04-09 12:20   ` Tim Deegan
@ 2015-04-10  8:44     ` Kai Huang
  2015-04-10  9:46       ` Tim Deegan
  0 siblings, 1 reply; 65+ messages in thread
From: Kai Huang @ 2015-04-10  8:44 UTC (permalink / raw)
  To: Tim Deegan; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, jbeulich, xen-devel



On 04/09/2015 08:20 PM, Tim Deegan wrote:
> At 10:35 +0800 on 27 Mar (1427452554), Kai Huang wrote:
>> @@ -118,6 +119,12 @@ static void ept_p2m_type_to_flags(ept_entry_t *entry, p2m_type_t type, p2m_acces
>>               break;
>>           case p2m_ram_rw:
>>               entry->r = entry->w = entry->x = 1;
>> +            /*
>> +             * This is to avoid unnecessary GPA logging in the PML buffer,
>> +             * e.g. for normal memory in partial log-dirty mode
>> +             */
>> +            if ( vmx_domain_pml_enabled(p2m->domain) )
>> +                entry->d = 1;
> I think that both A and D should be set here, even when PML is not
> supported, to avoid the MMU having to write them later.  And indeed
> not just for ram_rw, but for every present leaf type.
I do agree we should also set A if PML is enabled for the domain, and
there seems to be no harm in setting the A bit for all present leaf
types. Actually, there should be no harm in setting the A bit even when
the leaf entry is not present.

But for the D bit I think we should be more specific. For p2m types that
are writable, we should set the D-bit to avoid unnecessary GPA logging,
but for those that are read-only, I think it's not reasonable to set the
D bit, as it's not possible for them to be dirty.
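
E.g. (a sketch only, following the existing ept_p2m_type_to_flags cases):

            case p2m_ram_rw:
                entry->r = entry->w = entry->x = 1;
                entry->a = entry->d = 1;   /* writable: can become dirty */
                break;
            case p2m_ram_ro:
                entry->r = entry->x = 1;
                entry->w = 0;
                entry->a = 1;              /* can be accessed...          */
                entry->d = 0;              /* ...but can never be dirty   */
                break;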

>
> You will still need to check whether the CPU supports the A/D bits,
> since AIUI those are reserved bits otherwise.
In my understanding, the A/D bits are actually ignored if the EPT A/D
feature is not enabled in EPTP. Citing one statement from the SDM as an
example:

"If bit 6 of EPTP is 1, accessed flag for EPT; indicates whether
software has accessed the 4-KByte page referenced by this entry (see
Section 28.2.4). Ignored if bit 6 of EPTP is 0".

Therefore I think it's unnecessary to check whether the CPU supports A/D
bits before setting them; the check is an overhead anyway. And it
reminds me that we can safely set the A bit in the middle-level entry in
the superpage-splitting case, which you pointed out in patch 1, which
enables A/D bits.

Of course it's safer to only touch the A/D bits after checking whether
the CPU supports them, and we may be better off doing that.

Is this reasonable?

Thanks,
-Kai
>
>>               break;
>>           case p2m_mmio_direct:
>>               entry->r = entry->x = 1;
>> @@ -125,6 +132,26 @@ static void ept_p2m_type_to_flags(ept_entry_t *entry, p2m_type_t type, p2m_acces
>>                                                       entry->mfn);
>>               break;
>>           case p2m_ram_logdirty:
>> +            entry->r = entry->x = 1;
>> +            if ( vmx_domain_pml_enabled(p2m->domain) )
>> +            {
>> +                /*
>> +                 * With PML, we don't have to write-protect a 4K page; we
>> +                 * only need to clear its D-bit. Note we still need to
>> +                 * write-protect a superpage in order to split it into 4K
>> +                 * pages on EPT violation.
>> +                 */
>> +                if ( !is_epte_superpage(entry) )
>> +                {
>> +                    entry->w = 1;
>> +                    entry->d = 0;
>> +                }
>> +                else
>> +                    entry->w = 0;
>> +            }
>> +            else
>> +                entry->w = 0;
>> +            break;
> This can be folded into a neater form:
>
>              if ( vmx_domain_pml_enabled(p2m->domain)
>                   && !is_epte_superpage(entry) )
>              {
>                  /*
>                   * With PML, we don't have to write-protect a 4K page; we
>                   * only need to clear its D-bit. Note we still need to
>                   * write-protect a superpage in order to split it into 4K
>                   * pages on EPT violation.
>                   */
>                  entry->w = 1;
>                  entry->d = 0;
>              }
>              else
>                  entry->w = 0;
Sure.

Thanks,
-Kai
>
> Cheers,
>
> Tim.


* Re: [PATCH 01/10] VMX: Enable EPT A/D bit support
  2015-04-10  6:40     ` Kai Huang
@ 2015-04-10  8:54       ` Tim Deegan
  2015-04-10  9:26         ` Kai Huang
  0 siblings, 1 reply; 65+ messages in thread
From: Tim Deegan @ 2015-04-10  8:54 UTC (permalink / raw)
  To: Kai Huang; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, jbeulich, xen-devel

At 14:40 +0800 on 10 Apr (1428676858), Kai Huang wrote:
> > In fact, thinking about that, we should probably be setting _all_ the
> > A bits to 1, even in leaf entries, to avoid the overhead of the MMU
> > having to write them later.
> I suppose you mean A bits of leaf entries of new shattered 4K pages? Yes 
> we can, but looks it's more reasonable to inherit according to original 
> A bit of super page.

Yes, inheriting is best, but we should have already set the A bit of the
superpage to 1 as well, for the same reasons.  IOW we should set every A
bit in the whole system to 1 all the time.

BTW, this is what we do with A/D bits in normal pagetables in Xen too
-- the default is to set them all the time, so the MMU doesn't have to
fix them up later.

Cheers,

Tim.


* Re: [PATCH 05/10] VMX: add help functions to support PML
  2015-04-10  7:05     ` Kai Huang
@ 2015-04-10  9:03       ` Tim Deegan
  2015-04-10  9:28         ` Kai Huang
  0 siblings, 1 reply; 65+ messages in thread
From: Tim Deegan @ 2015-04-10  9:03 UTC (permalink / raw)
  To: Kai Huang; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, jbeulich, xen-devel

At 15:05 +0800 on 10 Apr (1428678346), Kai Huang wrote:
> 
> 
> On 04/09/2015 08:00 PM, Tim Deegan wrote:
> > Hi,
> >
> > That's also buggy, because there's no locking here to make sure
> > gfn->mfn->gfn ends up in the right place. :(
> Yes you are right. Thanks for pointing out.
> 
> Just curious: if I use get_gfn_type_access and put_gfn here, the
> gfn->mfn->gfn consistency can be guaranteed, and it's safe to use them
> here, right?

Yes, that would be the right way to do it.

In this particular case, there's another race too: the gfn might have
been entirely deleted after it's logged by PML but before the entry is
processed here.  I don't think that's a problem here, because if the
gfn's not mapped any more it's not relevant that it's dirty.  Just
mentioning it out of interest.

Cheers,

Tim.


* Re: [PATCH 01/10] VMX: Enable EPT A/D bit support
  2015-04-10  8:54       ` Tim Deegan
@ 2015-04-10  9:26         ` Kai Huang
  2015-04-10  9:51           ` Tim Deegan
  0 siblings, 1 reply; 65+ messages in thread
From: Kai Huang @ 2015-04-10  9:26 UTC (permalink / raw)
  To: Tim Deegan; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, jbeulich, xen-devel



On 04/10/2015 04:54 PM, Tim Deegan wrote:
> At 14:40 +0800 on 10 Apr (1428676858), Kai Huang wrote:
>>> In fact, thinking about that, we should probably be setting _all_ the
>>> A bits to 1, even in leaf entries, to avoid the overhead of the MMU
>>> having to write them later.
>> I suppose you mean A bits of leaf entries of new shattered 4K pages? Yes
>> we can, but looks it's more reasonable to inherit according to original
>> A bit of super page.
> Yes, inheriting is best, but we should have already set the A bit of the
> superpage to 1 as well, for the same reasons.  IOW we should set every A
> bit in the whole system to 1 all the time.
>
> BTW, this is what we do with A/D bits in normal pagetables in Xen too
> -- the default is to set them all the time, so the MMU doesn't have to
> fix them up later.
Sure, then we can set all the A bits to 1, which I think can be done by
setting the A bit properly in ept_p2m_type_to_flags.

What does "normal pagetables" mean here? I don't think you mean shadow
page tables, as shadow page tables need to sync the A/D bits with the
guest tables, right?

Thanks,
-Kai
>
> Cheers,
>
> Tim.


* Re: [PATCH 05/10] VMX: add help functions to support PML
  2015-04-10  9:03       ` Tim Deegan
@ 2015-04-10  9:28         ` Kai Huang
  0 siblings, 0 replies; 65+ messages in thread
From: Kai Huang @ 2015-04-10  9:28 UTC (permalink / raw)
  To: Tim Deegan; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, jbeulich, xen-devel



On 04/10/2015 05:03 PM, Tim Deegan wrote:
> At 15:05 +0800 on 10 Apr (1428678346), Kai Huang wrote:
>>
>> On 04/09/2015 08:00 PM, Tim Deegan wrote:
>>> Hi,
>>>
>>> That's also buggy, because there's no locking here to make sure
>>> gfn->mfn->gfn ends up in the right place. :(
>> Yes you are right. Thanks for pointing out.
>>
>> Just curious: if I use get_gfn_type_access and put_gfn here, the
>> gfn->mfn->gfn consistency can be guaranteed, and it's safe to use them
>> here, right?
> Yes, that would be the right way to do it.
>
> In this particular case, there's another race too: the gfn might have
> been entirely deleted after it's logged by PML but before the entry is
> processed here.  I don't think that's a problem here, because if the
> gfn's not mapped any more it's not relevant that it's dirty.  Just
> mentioning it out of interest.
Agreed. I think a gfn that has gone away is irrelevant. Thanks.

Thanks,
-Kai
>
> Cheers,
>
> Tim.


* Re: [PATCH 08/10] VMX: disable PML in vmx_vcpu_destroy
  2015-04-10  7:25     ` Kai Huang
@ 2015-04-10  9:30       ` Tim Deegan
  0 siblings, 0 replies; 65+ messages in thread
From: Tim Deegan @ 2015-04-10  9:30 UTC (permalink / raw)
  To: Kai Huang; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, jbeulich, xen-devel

At 15:25 +0800 on 10 Apr (1428679508), Kai Huang wrote:
> 
> 
> On 04/09/2015 08:04 PM, Tim Deegan wrote:
> > Looking at this and other callers of these enable/disable functions, I
> > think it would be better to make those functions idempotent (i.e.
> > *_{en,dis}able_pml() should just return success if PML is already
> > enabled/disabled).  Then you don't need to check in every caller, and
> > there's no risk of a crash if one caller is missing the check.
> Do you mean something like the below?
> 
> bool_t vmx_vcpu_enable_pml(struct vcpu *v)
> {
>     if ( vmx_vcpu_pml_enabled(v) )
>         return 1;
> 
>     /* ... code to enable PML, with "goto error" on failure ... */
> 
>     return 1;
> 
>  error:
>     /* ... undo any partial setup ... */
>     return 0;
> }
> 
> This should also be fine; I will do it this way. But I think
> vmx_{vcpu|domain}_disable_pml should keep a void return type.

Yes, that sounds fine.

Tim.


* Re: [PATCH 09/10] log-dirty: Refine common code to support PML
  2015-04-10  7:38     ` Kai Huang
@ 2015-04-10  9:31       ` Tim Deegan
  2015-04-10  9:33         ` Kai Huang
  0 siblings, 1 reply; 65+ messages in thread
From: Tim Deegan @ 2015-04-10  9:31 UTC (permalink / raw)
  To: Kai Huang; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, jbeulich, xen-devel

At 15:38 +0800 on 10 Apr (1428680289), Kai Huang wrote:
> 
> 
> On 04/09/2015 08:27 PM, Tim Deegan wrote:
> > At 10:35 +0800 on 27 Mar (1427452553), Kai Huang wrote:
> >> --- a/xen/arch/x86/mm/paging.c
> >> +++ b/xen/arch/x86/mm/paging.c
> >> @@ -411,7 +411,18 @@ static int paging_log_dirty_op(struct domain *d,
> >>       int i4, i3, i2;
> >>   
> >>       if ( !resuming )
> >> +    {
> >>           domain_pause(d);
> >> +
> >> +        /*
> >> +         * Only need to flush when not resuming: in the resuming case
> >> +         * the domain was already paused, so it's not possible for any
> >> +         * new page to have been dirtied.
> >> +         */
> >> +        if ( d->arch.paging.log_dirty.flush_cached_dirty )
> >> +            d->arch.paging.log_dirty.flush_cached_dirty(d);
> > I think there are too many layers of indirection here. :)  How about:
> >   - don't add a flush_cached_dirty() function to the log_dirty ops.
> >   - just call p2m_flush_hardware_cached_dirty(d) here.
> >
> > Would that work OK?
> Thanks for pointing out.
> 
> Is it natural to call p2m-layer functions in paging.c? If there's no
> restriction on that, calling p2m_flush_hardware_cached_dirty directly is
> clearer, and it should work.

Yes, calling public p2m functions directly is OK -- it's the internal
function pointers like ->get_entry() that aren't supposed to be called
from outside p2m code.  I guess that's not terribly clear - maybe it
needs some better comments.

Tim.


* Re: [PATCH 09/10] log-dirty: Refine common code to support PML
  2015-04-10  9:31       ` Tim Deegan
@ 2015-04-10  9:33         ` Kai Huang
  0 siblings, 0 replies; 65+ messages in thread
From: Kai Huang @ 2015-04-10  9:33 UTC (permalink / raw)
  To: Tim Deegan; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, jbeulich, xen-devel



On 04/10/2015 05:31 PM, Tim Deegan wrote:
> At 15:38 +0800 on 10 Apr (1428680289), Kai Huang wrote:
>>
>> On 04/09/2015 08:27 PM, Tim Deegan wrote:
>>> At 10:35 +0800 on 27 Mar (1427452553), Kai Huang wrote:
>>>> --- a/xen/arch/x86/mm/paging.c
>>>> +++ b/xen/arch/x86/mm/paging.c
>>>> @@ -411,7 +411,18 @@ static int paging_log_dirty_op(struct domain *d,
>>>>        int i4, i3, i2;
>>>>    
>>>>        if ( !resuming )
>>>> +    {
>>>>            domain_pause(d);
>>>> +
>>>> +        /*
>>>> +         * Only need to flush when not resuming: in the resuming case
>>>> +         * the domain was already paused, so it's not possible for any
>>>> +         * new page to have been dirtied.
>>>> +         */
>>>> +        if ( d->arch.paging.log_dirty.flush_cached_dirty )
>>>> +            d->arch.paging.log_dirty.flush_cached_dirty(d);
>>> I think there are too many layers of indirection here. :)  How about:
>>>    - don't add a flush_cached_dirty() function to the log_dirty ops.
>>>    - just call p2m_flush_hardware_cached_dirty(d) here.
>>>
>>> Would that work OK?
>> Thanks for pointing out.
>>
>> Is it natural to call p2m-layer functions in paging.c? If there's no
>> restriction on that, calling p2m_flush_hardware_cached_dirty directly is
>> clearer, and it should work.
> Yes, calling public p2m functions directly is OK -- it's the internal
> function pointers like ->get_entry() that aren't supposed to be called
> from outside p2m code.  I guess that's not terribly clear - maybe it
> needs some better comments.
Pretty clear to me now. Thanks, will do. :)

Thanks,
-Kai
>
> Tim.


* Re: [PATCH 10/10] p2m/ept: Enable PML in p2m-ept for log-dirty.
  2015-04-10  8:44     ` Kai Huang
@ 2015-04-10  9:46       ` Tim Deegan
  2015-04-10 13:18         ` Kai Huang
  0 siblings, 1 reply; 65+ messages in thread
From: Tim Deegan @ 2015-04-10  9:46 UTC (permalink / raw)
  To: Kai Huang; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, jbeulich, xen-devel

At 16:44 +0800 on 10 Apr (1428684271), Kai Huang wrote:
> 
> 
> On 04/09/2015 08:20 PM, Tim Deegan wrote:
> > At 10:35 +0800 on 27 Mar (1427452554), Kai Huang wrote:
> >> @@ -118,6 +119,12 @@ static void ept_p2m_type_to_flags(ept_entry_t *entry, p2m_type_t type, p2m_acces
> >>               break;
> >>           case p2m_ram_rw:
> >>               entry->r = entry->w = entry->x = 1;
> >> +            /*
> >> +             * This is to avoid unnecessary GPA logging in the PML
> >> +             * buffer, e.g. for normal memory in partial log-dirty mode
> >> +             */
> >> +            if ( vmx_domain_pml_enabled(p2m->domain) )
> >> +                entry->d = 1;
> > I think that both A and D should be set here, even when PML is not
> > supported, to avoid the MMU having to write them later.  And indeed
> > not just for ram_rw, but for every present leaf type.
> I do agree we should also set A if PML is enabled for the domain, and
> there seems to be no harm in setting the A bit for all present leaf
> types. Actually, there should be no harm in setting the A bit even when
> the leaf entry is not present.
> 
> But for the D bit I think we should be more specific. For p2m types that
> are writable, we should set the D-bit to avoid unnecessary GPA logging,
> but for those that are read-only, I think it's not reasonable to set the
> D bit, as it's not possible for them to be dirty.

Fair enough.

> > You will still need to check whether the CPU supports the A/D bits,
> > since AIUI those are reserved bits otherwise.
> In my understanding, the A/D bits are actually ignored if the EPT A/D
> feature is not enabled in EPTP. Citing one statement from the SDM as an
> example:
> 
> "If bit 6 of EPTP is 1, accessed flag for EPT; indicates whether
> software has accessed the 4-KByte page referenced by this entry (see
> Section 28.2.4). Ignored if bit 6 of EPTP is 0".

I see.  I think I was confused by the field being called "reserved" in
the struct.  I've just checked an older copy of the SDM and those bits
used to be "Ignored", so we can just set them on all hardware.  Good!

Tim.


* Re: [PATCH 01/10] VMX: Enable EPT A/D bit support
  2015-04-10  9:26         ` Kai Huang
@ 2015-04-10  9:51           ` Tim Deegan
  2015-04-10 13:14             ` Kai Huang
  0 siblings, 1 reply; 65+ messages in thread
From: Tim Deegan @ 2015-04-10  9:51 UTC (permalink / raw)
  To: Kai Huang; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, jbeulich, xen-devel

At 17:26 +0800 on 10 Apr (1428686789), Kai Huang wrote:
> 
> 
> On 04/10/2015 04:54 PM, Tim Deegan wrote:
> > At 14:40 +0800 on 10 Apr (1428676858), Kai Huang wrote:
> >>> In fact, thinking about that, we should probably be setting _all_ the
> >>> A bits to 1, even in leaf entries, to avoid the overhead of the MMU
> >>> having to write them later.
> >> I suppose you mean A bits of leaf entries of new shattered 4K pages? Yes
> >> we can, but looks it's more reasonable to inherit according to original
> >> A bit of super page.
> > Yes, inheriting is best, but we should have already set the A bit of the
> > superpage to 1 as well, for the same reasons.  IOW we should set every A
> > bit in the whole system to 1 all the time.
> >
> > BTW, this is what we do with A/D bits in normal pagetables in Xen too
> > -- the default is to set them all the time, so the MMU doesn't have to
> > fix them up later.
> Sure, then we can set all the A bits to 1, which I think can be done by
> setting the A bit properly in ept_p2m_type_to_flags.
> 
> What does "normal pagetables" mean here? I don't think you mean shadow
> page tables, as shadow page tables need to sync the A/D bits with the
> guest tables, right?

I meant Xen's own pagetables, the ones it maps its own text and data with.
But IIRC we also set the Accessed bit in all shadow pagetable entries
-- fixing up the A/D bits in the guest PTEs happens in the pagefault
handler.
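
For reference, Xen's own mappings are created with the A/D bits already
set; a sketch of the flag definition (from memory, so illustrative):

    #define __PAGE_HYPERVISOR \
        (_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED)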

Tim.


* Re: [PATCH 01/10] VMX: Enable EPT A/D bit support
  2015-04-10  9:51           ` Tim Deegan
@ 2015-04-10 13:14             ` Kai Huang
  0 siblings, 0 replies; 65+ messages in thread
From: Kai Huang @ 2015-04-10 13:14 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Tian, Kevin, Andrew Cooper, xen-devel, Kai Huang, Jan Beulich,
	yang.z.zhang

On Fri, Apr 10, 2015 at 5:51 PM, Tim Deegan <tim@xen.org> wrote:
> At 17:26 +0800 on 10 Apr (1428686789), Kai Huang wrote:
>>
>>
>> On 04/10/2015 04:54 PM, Tim Deegan wrote:
>> > At 14:40 +0800 on 10 Apr (1428676858), Kai Huang wrote:
>> >>> In fact, thinking about that, we should probably be setting _all_ the
>> >>> A bits to 1, even in leaf entries, to avoid the overhead of the MMU
>> >>> having to write them later.
>> >> I suppose you mean A bits of leaf entries of new shattered 4K pages? Yes
>> >> we can, but looks it's more reasonable to inherit according to original
>> >> A bit of super page.
>> > Yes, inheriting is best, but we should have already set the A bit of the
>> > superpage to 1 as well, for the same reasons.  IOW we should set every A
>> > bit in the whole system to 1 all the time.
>> >
>> > BTW, this is what we do with A/D bits in normal pagetables in Xen too
>> > -- the default is to set them all the time, so the MMU doesn't have to
>> > fix them up later.
>> Sure, then we can set all the A bits to 1, which I think can be done by
>> setting the A bit properly in ept_p2m_type_to_flags.
>>
>> What does "normal pagetables" mean here? I don't think you mean shadow
>> page tables, as shadow page tables need to sync the A/D bits with the
>> guest tables, right?
>
> I meant Xen's own pagetables, the ones it maps its own text and data with.
> But IIRC we also set the Accessed bit in all shadow pagetable entries
> -- fixing up the A/D bits in the guest PTEs happens in the pagefault
> handler.
Interesting. Thanks for pointing that out. I should read that code when I
get some spare time :)

Thanks,
-Kai
>
> Tim.
>



-- 
Thanks,
-Kai


* Re: [PATCH 10/10] p2m/ept: Enable PML in p2m-ept for log-dirty.
  2015-04-10  9:46       ` Tim Deegan
@ 2015-04-10 13:18         ` Kai Huang
  2015-04-10 14:35           ` Tim Deegan
  0 siblings, 1 reply; 65+ messages in thread
From: Kai Huang @ 2015-04-10 13:18 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Tian, Kevin, Andrew Cooper, xen-devel, Kai Huang, Jan Beulich,
	yang.z.zhang

On Fri, Apr 10, 2015 at 5:46 PM, Tim Deegan <tim@xen.org> wrote:
> At 16:44 +0800 on 10 Apr (1428684271), Kai Huang wrote:
>>
>>
>> On 04/09/2015 08:20 PM, Tim Deegan wrote:
>> > At 10:35 +0800 on 27 Mar (1427452554), Kai Huang wrote:
>> >> @@ -118,6 +119,12 @@ static void ept_p2m_type_to_flags(ept_entry_t *entry, p2m_type_t type, p2m_acces
>> >>               break;
>> >>           case p2m_ram_rw:
>> >>               entry->r = entry->w = entry->x = 1;
>> >> +            /*
>> >> +             * This is to avoid unnecessary GPA logging in the PML
>> >> +             * buffer, e.g. for normal memory in partial log-dirty mode
>> >> +             */
>> >> +            if ( vmx_domain_pml_enabled(p2m->domain) )
>> >> +                entry->d = 1;
>> > I think that both A and D should be set here, even when PML is not
>> > supported, to avoid the MMU having to write them later.  And indeed
>> > not just for ram_rw, but for every present leaf type.
>> I do agree we should also set A if PML is enabled for the domain, and
>> there seems to be no harm in setting the A bit for all present leaf
>> types. Actually, there should be no harm in setting the A bit even when
>> the leaf entry is not present.
>>
>> But for the D bit I think we should be more specific. For p2m types that
>> are writable, we should set the D-bit to avoid unnecessary GPA logging,
>> but for those that are read-only, I think it's not reasonable to set the
>> D bit, as it's not possible for them to be dirty.
>
> Fair enough.
>
>> > You will still need to check whether the CPU supports the A/D bits,
>> > since AIUI those are reserved bits otherwise.
>> In my understanding, the A/D bits are actually ignored if the EPT A/D
>> feature is not enabled in EPTP. Citing one statement from the SDM as an
>> example:
>>
>> "If bit 6 of EPTP is 1, accessed flag for EPT; indicates whether
>> software has accessed the 4-KByte page referenced by this entry (see
>> Section 28.2.4). Ignored if bit 6 of EPTP is 0".
>
> I see.  I think I was confused by the field being called "reserved" in
> the struct.  I've just checked an older copy of the SDM and those bits
> used to be "Ignored", so we can just set them on all hardware.  Good!

So to confirm, you intend to update the EPT A/D bits directly, without
first checking whether the CPU supports A/D bits?

Thanks,
-Kai
>
> Tim.
>



-- 
Thanks,
-Kai


* Re: [PATCH 10/10] p2m/ept: Enable PML in p2m-ept for log-dirty.
  2015-04-10 13:18         ` Kai Huang
@ 2015-04-10 14:35           ` Tim Deegan
  0 siblings, 0 replies; 65+ messages in thread
From: Tim Deegan @ 2015-04-10 14:35 UTC (permalink / raw)
  To: Kai Huang
  Cc: Tian, Kevin, Andrew Cooper, xen-devel, Kai Huang, Jan Beulich,
	yang.z.zhang

At 21:18 +0800 on 10 Apr (1428700730), Kai Huang wrote:
> On Fri, Apr 10, 2015 at 5:46 PM, Tim Deegan <tim@xen.org> wrote:
> > I see.  I think I was confused by the field being called "reserved" in
> > the struct.  I've just checked an older copy of the SDM and those bits
> > used to be "Ignored", so we can just set them on all hardware.  Good!
> 
> So to confirm, you intend to update the EPT A/D bits directly, without
> first checking whether the CPU supports A/D bits?

Yes, that's what I meant.

Thanks,

Tim.

