[PATCH v1 0/7] Implement support for external IPT monitoring


* [PATCH v1 0/7] Implement support for external IPT monitoring
@ 2020-06-16 15:16 Michał Leszczyński
  2020-06-16 15:19 ` [PATCH v1 1/7] x86/vmx: add Intel PT MSR definitions Michał Leszczyński
                   ` (7 more replies)
  0 siblings, 8 replies; 59+ messages in thread
From: Michał Leszczyński @ 2020-06-16 15:16 UTC
  To: Xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jan Beulich,
	Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jun Nakajima,
	Roger Pau Monné

Intel Processor Trace (IPT) is an architectural extension available in modern Intel CPUs. It records a detailed trace of processor activity while code executes. The recorded trace can then be used to reconstruct the control flow, that is, to find out which code paths were executed, which branches were taken, and so forth.

This feature is described in the Intel(R) 64 and IA-32 Architectures Software Developer's Manual, Volume 3C: System Programming Guide, Part 3, Chapter 36: "Intel Processor Trace".

This patch series implements an interface that Dom0 can use to enable IPT for particular vCPUs in a DomU, allowing for external monitoring. Such a feature has numerous applications, such as malware monitoring, fuzzing, and performance testing.
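
As an illustration of the external monitoring side, a Dom0 consumer would typically sample the current write offset from time to time and copy out whatever was written since the previous sample. The sketch below shows only that ring-buffer arithmetic. It is not taken from this series: ring_drain() and its parameters are hypothetical names, and it assumes the buffer wraps back to offset 0 on overflow and that fewer than ring_size bytes are written between two samples.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the series): copy the bytes written
 * between two offset samples out of a ring buffer into 'out', which must
 * be able to hold ring_size bytes. Returns the number of bytes copied.
 * If the writer lapped the reader between samples, the lost data cannot
 * be detected here.
 */
static inline uint64_t ring_drain(const uint8_t *ring, uint64_t ring_size,
                                  uint64_t last, uint64_t cur, uint8_t *out)
{
    uint64_t tail;

    /* Both offsets must lie inside the ring. */
    assert(last < ring_size && cur < ring_size);

    if ( cur >= last )
    {
        /* No wrap: the new data is one contiguous range. */
        memcpy(out, ring + last, cur - last);
        return cur - last;
    }

    /* Wrapped: copy the tail of the ring, then the head up to 'cur'. */
    tail = ring_size - last;
    memcpy(out, ring + last, tail);
    memcpy(out + tail, ring, cur);
    return tail + cur;
}
```

In a real tool, 'ring' would be the read-only mapping of the trace buffer and 'cur' the value returned by the offset query; both are stand-ins here.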

Michal Leszczynski (7):
  x86/vmx: add Intel PT MSR definitions
  x86/vmx: add IPT cpu feature
  x86/vmx: add ipt_state as part of vCPU state
  x86/vmx: add do_vmtrace_op
  tools/libxc: add xc_ptbuf_* functions
  tools/proctrace: add proctrace tool
  x86/vmx: switch IPT MSRs on vmentry/vmexit

 tools/libxc/include/xenctrl.h               |  59 ++++
 tools/libxc/xc_tbuf.c                       | 108 +++++++
 tools/proctrace/COPYING                     | 339 ++++++++++++++++++++
 tools/proctrace/Makefile                    |  49 +++
 tools/proctrace/proctrace.c                 | 139 ++++++++
 xen/arch/x86/hvm/hvm.c                      | 170 ++++++++++
 xen/arch/x86/hvm/vmx/vmx.c                  |  52 +++
 xen/include/asm-x86/cpufeature.h            |   1 +
 xen/include/asm-x86/hvm/hvm.h               |   9 +
 xen/include/asm-x86/hvm/vmx/vmcs.h          |  11 +
 xen/include/asm-x86/msr-index.h             |  37 +++
 xen/include/public/arch-x86/cpufeatureset.h |   1 +
 xen/include/public/hvm/hvm_op.h             |  27 ++
 13 files changed, 1002 insertions(+)
 create mode 100644 tools/proctrace/COPYING
 create mode 100644 tools/proctrace/Makefile
 create mode 100644 tools/proctrace/proctrace.c

--
2.20.1



* [PATCH v1 1/7] x86/vmx: add Intel PT MSR definitions
  2020-06-16 15:16 [PATCH v1 0/7] Implement support for external IPT monitoring Michał Leszczyński
@ 2020-06-16 15:19 ` Michał Leszczyński
  2020-06-18 13:31   ` Jan Beulich
  2020-06-16 15:20 ` [PATCH v1 2/7] x86/vmx: add IPT cpu feature Michał Leszczyński
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 59+ messages in thread
From: Michał Leszczyński @ 2020-06-16 15:19 UTC
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

Define constants related to Intel Processor Trace features.

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 xen/include/asm-x86/msr-index.h | 37 +++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index b328a47ed8..ecf0dd8bab 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -621,4 +621,41 @@
 #define MSR_PKGC9_IRTL			0x00000634
 #define MSR_PKGC10_IRTL			0x00000635
 
+/* Intel PT MSRs */
+#define MSR_IA32_RTIT_CTL              0x00000570
+#define RTIT_CTL_TRACEEN               (1ULL << 0)
+#define RTIT_CTL_CYCEN                 (1ULL << 1)
+#define RTIT_CTL_OS                    (1ULL << 2)
+#define RTIT_CTL_USR                   (1ULL << 3)
+#define RTIT_CTL_PWR_EVT_EN            (1ULL << 4)
+#define RTIT_CTL_FUP_ON_PTW            (1ULL << 5)
+#define RTIT_CTL_FABRIC_EN             (1ULL << 6)
+#define RTIT_CTL_CR3_FILTER            (1ULL << 7)
+#define RTIT_CTL_TOPA                  (1ULL << 8)
+#define RTIT_CTL_MTC_EN                (1ULL << 9)
+#define RTIT_CTL_TSC_EN                (1ULL << 10)
+#define RTIT_CTL_DIS_RETC              (1ULL << 11)
+#define RTIT_CTL_PTW_EN                (1ULL << 12)
+#define RTIT_CTL_BRANCH_EN             (1ULL << 13)
+#define RTIT_CTL_MTC_FREQ_OFFSET       14
+#define RTIT_CTL_MTC_FREQ              (0x0fULL << RTIT_CTL_MTC_FREQ_OFFSET)
+#define RTIT_CTL_CYC_THRESH_OFFSET     19
+#define RTIT_CTL_CYC_THRESH            (0x0fULL << RTIT_CTL_CYC_THRESH_OFFSET)
+#define RTIT_CTL_PSB_FREQ_OFFSET       24
+#define RTIT_CTL_PSB_FREQ              (0x0fULL << RTIT_CTL_PSB_FREQ_OFFSET)
+#define RTIT_CTL_ADDR_OFFSET(n)        (32 + 4 * (n))
+#define RTIT_CTL_ADDR(n)               (0x0fULL << RTIT_CTL_ADDR_OFFSET(n))
+#define MSR_IA32_RTIT_STATUS           0x00000571
+#define RTIT_STATUS_FILTER_EN          (1ULL << 0)
+#define RTIT_STATUS_CONTEXT_EN         (1ULL << 1)
+#define RTIT_STATUS_TRIGGER_EN         (1ULL << 2)
+#define RTIT_STATUS_ERROR              (1ULL << 4)
+#define RTIT_STATUS_STOPPED            (1ULL << 5)
+#define RTIT_STATUS_BYTECNT            (0x1ffffULL << 32)
+#define MSR_IA32_RTIT_CR3_MATCH        0x00000572
+#define MSR_IA32_RTIT_OUTPUT_BASE      0x00000560
+#define MSR_IA32_RTIT_OUTPUT_MASK      0x00000561
+#define MSR_IA32_RTIT_ADDR_A(n)        (0x00000580 + (n) * 2)
+#define MSR_IA32_RTIT_ADDR_B(n)        (0x00000581 + (n) * 2)
+
 #endif /* __ASM_MSR_INDEX_H */
-- 
2.20.1




* [PATCH v1 2/7] x86/vmx: add IPT cpu feature
  2020-06-16 15:16 [PATCH v1 0/7] Implement support for external IPT monitoring Michał Leszczyński
  2020-06-16 15:19 ` [PATCH v1 1/7] x86/vmx: add Intel PT MSR definitions Michał Leszczyński
@ 2020-06-16 15:20 ` Michał Leszczyński
  2020-06-16 16:30   ` Roger Pau Monné
  2020-06-16 15:21 ` [PATCH v1 3/7] x86/vmx: add ipt_state as part of vCPU state Michał Leszczyński
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 59+ messages in thread
From: Michał Leszczyński @ 2020-06-16 15:20 UTC
  To: Xen-devel
  Cc: Kevin Tian, Jun Nakajima, Wei Liu, Andrew Cooper, Jan Beulich,
	Roger Pau Monné

Check whether the Intel Processor Trace feature is supported by the
current processor. Define the hvm_ipt_supported() function.
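
The check combines two conditions: the CPU must enumerate Intel PT at all, and it must allow Intel PT to be used in VMX operation. A minimal standalone sketch of that logic, operating on raw register values, could look as follows. The helper and macro names are illustrative only and do not appear in the patch; the bit positions follow the IPT feature bit (5*32+25) and VMX_MISC_PT_SUPPORTED (0x00004000) defined in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=07H,ECX=0):EBX bit 25 enumerates Intel PT. */
#define CPUID7_EBX_INTEL_PT  (1u << 25)
/* IA32_VMX_MISC bit 14: Intel PT may be used in VMX operation. */
#define VMX_MISC_PT_IN_VMX   (1ull << 14)

/* Same value as VMX_MISC_PT_SUPPORTED in the patch. */
static_assert(VMX_MISC_PT_IN_VMX == 0x4000, "bit 14 of IA32_VMX_MISC");

/*
 * Hypothetical helper: both conditions must hold before guest tracing
 * can be offered. The arguments are the raw CPUID leaf 7 EBX value and
 * the raw IA32_VMX_MISC MSR value.
 */
static inline bool ipt_usable_in_vmx(uint32_t cpuid7_ebx, uint64_t vmx_misc)
{
    return (cpuid7_ebx & CPUID7_EBX_INTEL_PT) &&
           (vmx_misc & VMX_MISC_PT_IN_VMX);
}
```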

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 xen/arch/x86/hvm/vmx/vmx.c                  | 24 +++++++++++++++++++++
 xen/include/asm-x86/cpufeature.h            |  1 +
 xen/include/asm-x86/hvm/hvm.h               |  9 ++++++++
 xen/include/asm-x86/hvm/vmx/vmcs.h          |  1 +
 xen/include/public/arch-x86/cpufeatureset.h |  1 +
 5 files changed, 36 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index ab19d9424e..a91bbdb798 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2484,6 +2484,7 @@ static bool __init has_if_pschange_mc(void)
 
 const struct hvm_function_table * __init start_vmx(void)
 {
+    u64 _vmx_misc_cap;
     set_in_cr4(X86_CR4_VMXE);
 
     if ( vmx_vmcs_init() )
@@ -2557,6 +2558,29 @@ const struct hvm_function_table * __init start_vmx(void)
         vmx_function_table.get_guest_bndcfgs = vmx_get_guest_bndcfgs;
     }
 
+    /* Check whether IPT is supported in VMX operation */
+    vmx_function_table.ipt_supported = 1;
+
+    if ( !cpu_has_ipt )
+    {
+        vmx_function_table.ipt_supported = 0;
+        printk("VMX: Missing support for Intel Processor Trace x86 feature.\n");
+    }
+
+    rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
+
+    if ( !( _vmx_misc_cap & VMX_MISC_PT_SUPPORTED ) )
+    {
+        vmx_function_table.ipt_supported = 0;
+        printk("VMX: Missing support for Intel Processor Trace in VMX operation, VMX_MISC caps: %llx\n",
+               (unsigned long long)_vmx_misc_cap);
+    }
+
+    if ( vmx_function_table.ipt_supported )
+    {
+        printk("VMX: Intel Processor Trace is SUPPORTED\n");
+    }
+
     lbr_tsx_fixup_check();
     ler_to_fixup_check();
 
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index f790d5c1f8..8d7955dd87 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -104,6 +104,7 @@
 #define cpu_has_clwb            boot_cpu_has(X86_FEATURE_CLWB)
 #define cpu_has_avx512er        boot_cpu_has(X86_FEATURE_AVX512ER)
 #define cpu_has_avx512cd        boot_cpu_has(X86_FEATURE_AVX512CD)
+#define cpu_has_ipt             boot_cpu_has(X86_FEATURE_IPT)
 #define cpu_has_sha             boot_cpu_has(X86_FEATURE_SHA)
 #define cpu_has_avx512bw        boot_cpu_has(X86_FEATURE_AVX512BW)
 #define cpu_has_avx512vl        boot_cpu_has(X86_FEATURE_AVX512VL)
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 1eb377dd82..48465b6067 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -96,6 +96,9 @@ struct hvm_function_table {
     /* Necessary hardware support for alternate p2m's? */
     bool altp2m_supported;
 
+    /* Hardware support for IPT? */
+    bool ipt_supported;
+
     /* Hardware virtual interrupt delivery enable? */
     bool virtual_intr_delivery_enabled;
 
@@ -630,6 +633,12 @@ static inline bool hvm_altp2m_supported(void)
     return hvm_funcs.altp2m_supported;
 }
 
+/* returns true if hardware supports Intel Processor Trace */
+static inline bool hvm_ipt_supported(void)
+{
+    return hvm_funcs.ipt_supported;
+}
+
 /* updates the current hardware p2m */
 static inline void altp2m_vcpu_update_p2m(struct vcpu *v)
 {
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 906810592f..4c81093aba 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -285,6 +285,7 @@ extern u64 vmx_ept_vpid_cap;
 
 #define VMX_MISC_CR3_TARGET                     0x01ff0000
 #define VMX_MISC_VMWRITE_ALL                    0x20000000
+#define VMX_MISC_PT_SUPPORTED                   0x00004000
 
 #define VMX_TSC_MULTIPLIER_MAX                  0xffffffffffffffffULL
 
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index 5ca35d9d97..7cfcac451d 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -217,6 +217,7 @@ XEN_CPUFEATURE(SMAP,          5*32+20) /*S  Supervisor Mode Access Prevention */
 XEN_CPUFEATURE(AVX512_IFMA,   5*32+21) /*A  AVX-512 Integer Fused Multiply Add */
 XEN_CPUFEATURE(CLFLUSHOPT,    5*32+23) /*A  CLFLUSHOPT instruction */
 XEN_CPUFEATURE(CLWB,          5*32+24) /*A  CLWB instruction */
+XEN_CPUFEATURE(IPT,           5*32+25) /*H  Intel Processor Trace */
 XEN_CPUFEATURE(AVX512PF,      5*32+26) /*A  AVX-512 Prefetch Instructions */
 XEN_CPUFEATURE(AVX512ER,      5*32+27) /*A  AVX-512 Exponent & Reciprocal Instrs */
 XEN_CPUFEATURE(AVX512CD,      5*32+28) /*A  AVX-512 Conflict Detection Instrs */
-- 
2.20.1




* [PATCH v1 3/7] x86/vmx: add ipt_state as part of vCPU state
  2020-06-16 15:16 [PATCH v1 0/7] Implement support for external IPT monitoring Michał Leszczyński
  2020-06-16 15:19 ` [PATCH v1 1/7] x86/vmx: add Intel PT MSR definitions Michał Leszczyński
  2020-06-16 15:20 ` [PATCH v1 2/7] x86/vmx: add IPT cpu feature Michał Leszczyński
@ 2020-06-16 15:21 ` Michał Leszczyński
  2020-06-16 16:33   ` Roger Pau Monné
  2020-06-16 15:22 ` [PATCH v1 4/7] x86/vmx: add do_vmtrace_op Michał Leszczyński
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 59+ messages in thread
From: Michał Leszczyński @ 2020-06-16 15:21 UTC
  To: Xen-devel
  Cc: Kevin Tian, Jun Nakajima, Wei Liu, Andrew Cooper, Jan Beulich,
	Roger Pau Monné

Guest IPT state will be preserved across vmentry/vmexit using
this structure.

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 xen/arch/x86/hvm/vmx/vmx.c         |  2 ++
 xen/include/asm-x86/hvm/vmx/vmcs.h | 10 ++++++++++
 2 files changed, 12 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index a91bbdb798..97104c319e 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -471,6 +471,8 @@ static int vmx_vcpu_initialise(struct vcpu *v)
 
     vmx_install_vlapic_mapping(v);
 
+    v->arch.hvm.vmx.ipt_state = NULL;
+
     return 0;
 }
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 4c81093aba..273ade975e 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -104,6 +104,13 @@ struct pi_blocking_vcpu {
     spinlock_t           *lock;
 };
 
+struct ipt_state {
+    uint64_t ctl;
+    uint64_t status;
+    uint64_t output_base;
+    uint64_t output_mask;
+};
+
 struct vmx_vcpu {
     /* Physical address of VMCS. */
     paddr_t              vmcs_pa;
@@ -186,6 +193,9 @@ struct vmx_vcpu {
      * pCPU and wakeup the related vCPU.
      */
     struct pi_blocking_vcpu pi_blocking;
+
+    /* State of Intel Processor Trace feature */
+    struct ipt_state     *ipt_state;
 };
 
 int vmx_create_vmcs(struct vcpu *v);
-- 
2.20.1




* [PATCH v1 4/7] x86/vmx: add do_vmtrace_op
  2020-06-16 15:16 [PATCH v1 0/7] Implement support for external IPT monitoring Michał Leszczyński
                   ` (2 preceding siblings ...)
  2020-06-16 15:21 ` [PATCH v1 3/7] x86/vmx: add ipt_state as part of vCPU state Michał Leszczyński
@ 2020-06-16 15:22 ` Michał Leszczyński
  2020-06-16 17:23   ` Roger Pau Monné
  2020-06-16 15:22 ` [PATCH v1 5/7] tools/libxc: add xc_ptbuf_* functions Michał Leszczyński
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 59+ messages in thread
From: Michał Leszczyński @ 2020-06-16 15:22 UTC
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Jan Beulich, Roger Pau Monné

Provide an interface for privileged domains to manage
external IPT monitoring.
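
For reference, the buffer bookkeeping behind the enable and get_offset commands can be sketched in isolation: the buffer size must be a power of two, the low 32 bits of the output mask hold size - 1, and the high 32 bits hold the current write offset. The snippet below is a standalone illustration under those assumptions, not the hypervisor code. All names are hypothetical, a 4 KiB page size is assumed, and the 4 GiB upper bound is taken from the libxc doc comment in this series rather than from the check in do_vmtrace_op().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PT_PAGE_SIZE 4096ull        /* assumed 4 KiB pages */
#define PT_MAX_SIZE  (1ull << 32)   /* assumed 4 GiB upper bound */

/* A size is acceptable if it is a power of two within [4 KiB, 4 GiB]. */
static inline bool pt_size_valid(uint64_t size)
{
    return size >= PT_PAGE_SIZE && size <= PT_MAX_SIZE &&
           (size & (size - 1)) == 0;
}

/* Low 32 bits: size - 1 (the mask); high 32 bits: current write offset. */
static inline uint64_t pt_output_mask(uint64_t size, uint32_t offset)
{
    assert(pt_size_valid(size));
    return ((uint64_t)offset << 32) | (size - 1);
}

/* Recover the buffer size from the low half of the mask value. */
static inline uint64_t pt_mask_to_size(uint64_t mask)
{
    return (mask & 0xffffffffull) + 1;
}

/* Recover the current write offset from the high half of the mask value. */
static inline uint32_t pt_mask_to_offset(uint64_t mask)
{
    return (uint32_t)(mask >> 32);
}
```

Masking the low half before adding 1 keeps the recovered size correct even when the offset in the high half is non-zero.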

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 xen/arch/x86/hvm/hvm.c          | 170 ++++++++++++++++++++++++++++++++
 xen/include/public/hvm/hvm_op.h |  27 +++++
 2 files changed, 197 insertions(+)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5bb47583b3..9292caebe0 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4949,6 +4949,172 @@ static int compat_altp2m_op(
     return rc;
 }
 
+static int do_vmtrace_op(
+    XEN_GUEST_HANDLE_PARAM(void) arg)
+{
+    struct xen_hvm_vmtrace_op a;
+    struct domain *d = NULL;
+    int rc = -EFAULT;
+    int i;
+    struct vcpu *v;
+    void* buf;
+    uint32_t buf_size;
+    uint32_t buf_order;
+    uint64_t buf_mfn;
+    struct page_info *pg;
+
+    if ( !hvm_ipt_supported() )
+        return -EOPNOTSUPP;
+
+    if ( copy_from_guest(&a, arg, 1) )
+        return -EFAULT;
+
+    if ( a.version != HVMOP_VMTRACE_INTERFACE_VERSION )
+        return -EINVAL;
+
+    switch ( a.cmd )
+    {
+    case HVMOP_vmtrace_ipt_enable:
+    case HVMOP_vmtrace_ipt_disable:
+    case HVMOP_vmtrace_ipt_get_buf:
+    case HVMOP_vmtrace_ipt_get_offset:
+        break;
+
+    default:
+        return -EOPNOTSUPP;
+    }
+
+    d = rcu_lock_domain_by_any_id(a.domain);
+
+    if ( d == NULL )
+        return -ESRCH;
+
+    domain_pause(d); /* paired with domain_unpause() at the out label */
+
+    if ( !is_hvm_domain(d) )
+    {
+        rc = -EOPNOTSUPP;
+        goto out;
+    }
+
+    if ( a.vcpu >= d->max_vcpus )
+    {
+        rc = -EINVAL;
+        goto out;
+    }
+
+    v = d->vcpu[a.vcpu];
+
+    if ( a.cmd == HVMOP_vmtrace_ipt_enable )
+    {
+        if ( v->arch.hvm.vmx.ipt_state ) {
+            // already enabled
+            rc = -EINVAL;
+            goto out;
+        }
+
+        if ( a.size < PAGE_SIZE || a.size > 1000000 * PAGE_SIZE ) {
+            // we don't accept trace buffer size smaller than single page
+            // and the upper bound is defined as 4GB in the specification
+            rc = -EINVAL;
+            goto out;
+        }
+
+        buf_order = get_order_from_bytes(a.size);
+
+        if ( (a.size >> PAGE_SHIFT) != (1 << buf_order) ) {
+            rc = -EINVAL;
+            goto out;
+        }
+
+        pg = alloc_domheap_pages(d, buf_order, MEMF_no_refcount);
+        if ( !pg ) {
+            rc = -ENOMEM;
+            goto out;
+        }
+        buf = page_to_virt(pg);
+        buf_size = a.size;
+
+        memset(buf, 0, buf_size);
+
+        for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ ) {
+            share_xen_page_with_privileged_guests(virt_to_page(buf) + i, SHARE_ro);
+        }
+
+        v->arch.hvm.vmx.ipt_state = xmalloc(struct ipt_state);
+        v->arch.hvm.vmx.ipt_state->output_base = virt_to_mfn(buf) << PAGE_SHIFT;
+        v->arch.hvm.vmx.ipt_state->output_mask = buf_size - 1;
+        v->arch.hvm.vmx.ipt_state->status = 0;
+        v->arch.hvm.vmx.ipt_state->ctl = RTIT_CTL_TRACEEN | RTIT_CTL_OS | RTIT_CTL_USR | RTIT_CTL_BRANCH_EN;
+    }
+    else if ( a.cmd == HVMOP_vmtrace_ipt_disable )
+    {
+        if ( !v->arch.hvm.vmx.ipt_state ) {
+            rc = -EINVAL;
+            goto out;
+        }
+
+        buf_mfn = v->arch.hvm.vmx.ipt_state->output_base >> PAGE_SHIFT;
+        buf_size = ( v->arch.hvm.vmx.ipt_state->output_mask + 1 ) & 0xFFFFFFFFUL;
+
+        for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ )
+        {
+            if ( (mfn_to_page(_mfn(buf_mfn + i))->count_info & PGC_count_mask) != 1 )
+            {
+                rc = -EBUSY;
+                goto out;
+            }
+        }
+
+        xfree(v->arch.hvm.vmx.ipt_state);
+        v->arch.hvm.vmx.ipt_state = NULL;
+
+        for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ )
+        {
+            pg = mfn_to_page(_mfn(buf_mfn + i));
+            put_page_alloc_ref(pg);
+            if ( !test_and_clear_bit(_PGC_xen_heap, &pg->count_info) )
+                ASSERT_UNREACHABLE();
+            pg->u.inuse.type_info = 0;
+            page_set_owner(pg, NULL);
+            free_domheap_page(pg);
+        }
+    }
+    else if ( a.cmd == HVMOP_vmtrace_ipt_get_buf )
+    {
+        if ( !v->arch.hvm.vmx.ipt_state ) {
+            rc = -EINVAL;
+            goto out;
+        }
+
+        a.mfn = v->arch.hvm.vmx.ipt_state->output_base >> PAGE_SHIFT;
+        a.size = (v->arch.hvm.vmx.ipt_state->output_mask + 1) & 0xFFFFFFFFUL;
+    }
+    else if ( a.cmd == HVMOP_vmtrace_ipt_get_offset )
+    {
+        if ( !v->arch.hvm.vmx.ipt_state ) {
+            rc = -EINVAL;
+            goto out;
+        }
+
+        a.offset = v->arch.hvm.vmx.ipt_state->output_mask >> 32;
+    }
+
+    rc = -EFAULT;
+    if ( __copy_to_guest(arg, &a, 1) )
+      goto out;
+    rc = 0;
+
+ out:
+    smp_wmb();
+    domain_unpause(d);
+    rcu_unlock_domain(d);
+
+    return rc;
+}
+
+DEFINE_XEN_GUEST_HANDLE(compat_hvm_vmtrace_op_t);
+
 static int hvmop_get_mem_type(
     XEN_GUEST_HANDLE_PARAM(xen_hvm_get_mem_type_t) arg)
 {
@@ -5101,6 +5267,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         rc = current->hcall_compat ? compat_altp2m_op(arg) : do_altp2m_op(arg);
         break;
 
+    case HVMOP_vmtrace:
+        rc = do_vmtrace_op(arg);
+        break;
+
     default:
     {
         gdprintk(XENLOG_DEBUG, "Bad HVM op %ld.\n", op);
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index 870ec52060..3bbcd54c96 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -382,6 +382,33 @@ struct xen_hvm_altp2m_op {
 typedef struct xen_hvm_altp2m_op xen_hvm_altp2m_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_op_t);
 
+/* HVMOP_vmtrace: Perform VM tracing related operation */
+#define HVMOP_vmtrace 26
+
+#define HVMOP_VMTRACE_INTERFACE_VERSION 0x00000001
+
+struct xen_hvm_vmtrace_op {
+    /* IN variable */
+    uint32_t version;   /* HVMOP_VMTRACE_INTERFACE_VERSION */
+    uint32_t cmd;
+/* Enable/disable external vmtrace for given domain */
+#define HVMOP_vmtrace_ipt_enable      1
+#define HVMOP_vmtrace_ipt_disable     2
+#define HVMOP_vmtrace_ipt_get_buf     3
+#define HVMOP_vmtrace_ipt_get_offset  4
+    domid_t domain;
+    uint32_t vcpu;
+
+    /* IN/OUT variable */
+    uint64_t size;
+
+    /* OUT variable */
+    uint64_t mfn;
+    uint64_t offset;
+};
+typedef struct xen_hvm_vmtrace_op xen_hvm_vmtrace_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_vmtrace_op_t);
+
 #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
 
 /*
-- 
2.20.1




* [PATCH v1 5/7] tools/libxc: add xc_ptbuf_* functions
  2020-06-16 15:16 [PATCH v1 0/7] Implement support for external IPT monitoring Michał Leszczyński
                   ` (3 preceding siblings ...)
  2020-06-16 15:22 ` [PATCH v1 4/7] x86/vmx: add do_vmtrace_op Michał Leszczyński
@ 2020-06-16 15:22 ` Michał Leszczyński
  2020-06-16 15:23 ` [PATCH v1 6/7] tools/proctrace: add proctrace tool Michał Leszczyński
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 59+ messages in thread
From: Michał Leszczyński @ 2020-06-16 15:22 UTC
  To: Xen-devel; +Cc: Ian Jackson, Wei Liu

Add functions in libxc that use the new HVMOP_vmtrace interface.

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 tools/libxc/include/xenctrl.h |  59 +++++++++++++++++++
 tools/libxc/xc_tbuf.c         | 108 ++++++++++++++++++++++++++++++++++
 2 files changed, 167 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 113ddd935d..0a972deb7d 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1585,6 +1585,65 @@ int xc_tbuf_set_cpu_mask(xc_interface *xch, xc_cpumap_t mask);
 
 int xc_tbuf_set_evt_mask(xc_interface *xch, uint32_t mask);
 
+/**
+ * Enable Intel Processor Trace for given vCPU in given DomU.
+ * Allocate the trace ringbuffer with given size.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @parm size trace buffer size in bytes, must be power of 2, between 4 kB and 4 GB
+ * @return 0 on success, -1 on failure
+ */
+int xc_ptbuf_enable(xc_interface *xch, uint32_t domid, uint32_t vcpu, uint64_t size);
+
+/**
+ * Disable Intel Processor Trace for given vCPU in given DomU.
+ * Deallocate the trace ringbuffer.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @return 0 on success, -1 on failure
+ */
+int xc_ptbuf_disable(xc_interface *xch, uint32_t domid, uint32_t vcpu);
+
+/**
+ * Map the trace buffer into Dom0.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @parm buf where the pointer to the mapped buffer will be written
+ * @parm size where the trace buffer size (in bytes) will be written
+ * @return 0 on success, -1 on failure
+ */
+int xc_ptbuf_map(xc_interface *xch, uint32_t domid, uint32_t vcpu, uint8_t **buf, uint64_t *size);
+
+/**
+ * Unmap the trace buffer from Dom0.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm buf pointer to the mapped buffer
+ * @parm size size of the trace buffer (in bytes)
+ * @return 0 on success, -1 on failure
+ */
+int xc_ptbuf_unmap(xc_interface *xch, uint8_t *buf, uint64_t size);
+
+/**
+ * Get the current offset inside the trace ringbuffer.
+ * This makes it possible to determine how much data was written into the
+ * buffer. Once the buffer overflows, the offset resets to 0 and the
+ * previous data is overwritten.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @parm offset current offset inside trace buffer will be written there
+ * @return 0 on success, -1 on failure
+ */
+int xc_ptbuf_get_offset(xc_interface *xch, uint32_t domid, uint32_t vcpu, uint64_t *offset);
+
 int xc_domctl(xc_interface *xch, struct xen_domctl *domctl);
 int xc_sysctl(xc_interface *xch, struct xen_sysctl *sysctl);
 
diff --git a/tools/libxc/xc_tbuf.c b/tools/libxc/xc_tbuf.c
index 283fbd1c8f..8fab7f7d79 100644
--- a/tools/libxc/xc_tbuf.c
+++ b/tools/libxc/xc_tbuf.c
@@ -79,6 +79,114 @@ int xc_tbuf_get_size(xc_interface *xch, unsigned long *size)
     return rc;
 }
 
+int xc_ptbuf_enable(xc_interface *xch, uint32_t domid, uint32_t vcpu, uint64_t size)
+{
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_vmtrace_op_t, arg);
+    int rc = -1;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    arg->version = HVMOP_VMTRACE_INTERFACE_VERSION;
+    arg->cmd = HVMOP_vmtrace_ipt_enable;
+    arg->domain = domid;
+    arg->vcpu = vcpu;
+    arg->size = size;
+
+    rc = xencall2(xch->xcall, __HYPERVISOR_hvm_op, HVMOP_vmtrace,
+                  HYPERCALL_BUFFER_AS_ARG(arg));
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_ptbuf_get_offset(xc_interface *xch, uint32_t domid, uint32_t vcpu, uint64_t *offset)
+{
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_vmtrace_op_t, arg);
+    int rc = -1;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    arg->version = HVMOP_VMTRACE_INTERFACE_VERSION;
+    arg->cmd = HVMOP_vmtrace_ipt_get_offset;
+    arg->domain = domid;
+    arg->vcpu = vcpu;
+
+    rc = xencall2(xch->xcall, __HYPERVISOR_hvm_op, HVMOP_vmtrace,
+                  HYPERCALL_BUFFER_AS_ARG(arg));
+
+    if ( rc == 0 )
+    {
+        *offset = arg->offset;
+    }
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_ptbuf_map(xc_interface *xch, uint32_t domid, uint32_t vcpu, uint8_t **buf, uint64_t *size)
+{
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_vmtrace_op_t, arg);
+    int rc = -1;
+    uint8_t *mapped_buf;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    arg->version = HVMOP_VMTRACE_INTERFACE_VERSION;
+    arg->cmd = HVMOP_vmtrace_ipt_get_buf;
+    arg->domain = domid;
+    arg->vcpu = vcpu;
+
+    rc = xencall2(xch->xcall, __HYPERVISOR_hvm_op, HVMOP_vmtrace,
+                  HYPERCALL_BUFFER_AS_ARG(arg));
+
+    if ( rc == 0 )
+    {
+        mapped_buf = (uint8_t *)xc_map_foreign_range(xch, DOMID_XEN, arg->size, PROT_READ, arg->mfn);
+
+        if ( mapped_buf == NULL )
+            return -1;
+
+        *buf = mapped_buf;
+        *size = arg->size;
+    }
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_ptbuf_unmap(xc_interface *xch, uint8_t *buf, uint64_t size)
+{
+    xenforeignmemory_unmap(xch->fmem, buf, size >> PAGE_SHIFT);
+    return 0;
+}
+
+int xc_ptbuf_disable(xc_interface *xch, uint32_t domid, uint32_t vcpu)
+{
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_vmtrace_op_t, arg);
+    int rc = -1;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    arg->version = HVMOP_VMTRACE_INTERFACE_VERSION;
+    arg->cmd = HVMOP_vmtrace_ipt_disable;
+    arg->domain = domid;
+    arg->vcpu = vcpu;
+
+    rc = xencall2(xch->xcall, __HYPERVISOR_hvm_op, HVMOP_vmtrace,
+                  HYPERCALL_BUFFER_AS_ARG(arg));
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
 int xc_tbuf_enable(xc_interface *xch, unsigned long pages, unsigned long *mfn,
                    unsigned long *size)
 {
-- 
2.20.1




* [PATCH v1 6/7] tools/proctrace: add proctrace tool
  2020-06-16 15:16 [PATCH v1 0/7] Implement support for external IPT monitoring Michał Leszczyński
                   ` (4 preceding siblings ...)
  2020-06-16 15:22 ` [PATCH v1 5/7] tools/libxc: add xc_ptbuf_* functions Michał Leszczyński
@ 2020-06-16 15:23 ` Michał Leszczyński
  2020-06-16 15:24 ` [PATCH v1 7/7] x86/vmx: switch IPT MSRs on vmentry/vmexit Michał Leszczyński
  2020-06-16 18:17 ` [PATCH v1 0/7] Implement support for external IPT monitoring Andrew Cooper
  7 siblings, 0 replies; 59+ messages in thread
From: Michał Leszczyński @ 2020-06-16 15:23 UTC
  To: Xen-devel; +Cc: Ian Jackson, Wei Liu

Add a demonstration tool that uses the xc_ptbuf_* calls to
manage external IPT monitoring for a DomU.

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 tools/proctrace/COPYING     | 339 ++++++++++++++++++++++++++++++++++++
 tools/proctrace/Makefile    |  49 ++++++
 tools/proctrace/proctrace.c | 139 +++++++++++++++
 3 files changed, 527 insertions(+)
 create mode 100644 tools/proctrace/COPYING
 create mode 100644 tools/proctrace/Makefile
 create mode 100644 tools/proctrace/proctrace.c

diff --git a/tools/proctrace/COPYING b/tools/proctrace/COPYING
new file mode 100644
index 0000000000..c0a841112c
--- /dev/null
+++ b/tools/proctrace/COPYING
@@ -0,0 +1,339 @@
+		    GNU GENERAL PUBLIC LICENSE
+		       Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.
+                       59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+			    Preamble
+
+  The licenses for most software are designed to take away your
+freedom to share and change it.  By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users.  This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it.  (Some other Free Software Foundation software is covered by
+the GNU Library General Public License instead.)  You can apply it to
+your programs, too.
+
+  When we speak of free software, we are referring to freedom, not
+price.  Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+  To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+  For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have.  You must make sure that they, too, receive or can get the
+source code.  And you must show them these terms so they know their
+rights.
+
+  We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+  Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software.  If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+  Finally, any free program is threatened constantly by software
+patents.  We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary.  To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+  The precise terms and conditions for copying, distribution and
+modification follow.
+
+		    GNU GENERAL PUBLIC LICENSE
+   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+  0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License.  The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language.  (Hereinafter, translation is included without limitation in
+the term "modification".)  Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope.  The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+  1. You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+  2. You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+    a) You must cause the modified files to carry prominent notices
+    stating that you changed the files and the date of any change.
+
+    b) You must cause any work that you distribute or publish, that in
+    whole or in part contains or is derived from the Program or any
+    part thereof, to be licensed as a whole at no charge to all third
+    parties under the terms of this License.
+
+    c) If the modified program normally reads commands interactively
+    when run, you must cause it, when started running for such
+    interactive use in the most ordinary way, to print or display an
+    announcement including an appropriate copyright notice and a
+    notice that there is no warranty (or else, saying that you provide
+    a warranty) and that users may redistribute the program under
+    these conditions, and telling the user how to view a copy of this
+    License.  (Exception: if the Program itself is interactive but
+    does not normally print such an announcement, your work based on
+    the Program is not required to print an announcement.)
+
+These requirements apply to the modified work as a whole.  If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works.  But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+  3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+    a) Accompany it with the complete corresponding machine-readable
+    source code, which must be distributed under the terms of Sections
+    1 and 2 above on a medium customarily used for software interchange; or,
+
+    b) Accompany it with a written offer, valid for at least three
+    years, to give any third party, for a charge no more than your
+    cost of physically performing source distribution, a complete
+    machine-readable copy of the corresponding source code, to be
+    distributed under the terms of Sections 1 and 2 above on a medium
+    customarily used for software interchange; or,
+
+    c) Accompany it with the information you received as to the offer
+    to distribute corresponding source code.  (This alternative is
+    allowed only for noncommercial distribution and only if you
+    received the program in object code or executable form with such
+    an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it.  For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable.  However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+  4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License.  Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+  5. You are not required to accept this License, since you have not
+signed it.  However, nothing else grants you permission to modify or
+distribute the Program or its derivative works.  These actions are
+prohibited by law if you do not accept this License.  Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+  6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions.  You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+  7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License.  If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all.  For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices.  Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+  8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded.  In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+  9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time.  Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number.  If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation.  If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+  10. If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission.  For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this.  Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+			    NO WARRANTY
+
+  11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+  12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+		     END OF TERMS AND CONDITIONS
+
+	    How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+    This program is free software; you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation; either version 2 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program; If not, see <http://www.gnu.org/licenses/>.
+
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+    Gnomovision version 69, Copyright (C) year name of author
+    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+    This is free software, and you are welcome to redistribute it
+    under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License.  Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary.  Here is a sample; alter the names:
+
+  Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+  `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+  <signature of Ty Coon>, 1 April 1989
+  Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs.  If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library.  If this is what you want to do, use the GNU Library General
+Public License instead of this License.
diff --git a/tools/proctrace/Makefile b/tools/proctrace/Makefile
new file mode 100644
index 0000000000..d9231dfa24
--- /dev/null
+++ b/tools/proctrace/Makefile
@@ -0,0 +1,49 @@
+# Copyright (C) CERT Polska - NASK PIB
+# Author: Michał Leszczyński <michal.leszczynski@cert.pl>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; under version 2 of the License.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+XEN_ROOT=$(CURDIR)/../..
+include $(XEN_ROOT)/tools/Rules.mk
+
+CFLAGS  += -Werror
+CFLAGS  += $(CFLAGS_libxenevtchn)
+CFLAGS  += $(CFLAGS_libxenctrl)
+LDLIBS  += $(LDLIBS_libxenctrl)
+LDLIBS  += $(LDLIBS_libxenevtchn)
+
+# SCRIPTS = xenmon.py
+
+.PHONY: all
+all: build
+
+.PHONY: build
+build: proctrace
+
+.PHONY: install
+install: build
+	$(INSTALL_DIR) $(DESTDIR)$(sbindir)
+	$(INSTALL_PROG) proctrace $(DESTDIR)$(sbindir)/proctrace
+
+.PHONY: uninstall
+uninstall:
+	rm -f $(DESTDIR)$(sbindir)/proctrace
+
+.PHONY: clean
+clean:
+	$(RM) -f $(DEPS_RM)
+
+.PHONY: distclean
+distclean: clean
+
+proctrace: proctrace.o Makefile
+	$(CC) $(LDFLAGS) $< -o $@ $(LDLIBS) $(APPEND_LDFLAGS)
+
+-include $(DEPS_INCLUDE)
diff --git a/tools/proctrace/proctrace.c b/tools/proctrace/proctrace.c
new file mode 100644
index 0000000000..74409428b4
--- /dev/null
+++ b/tools/proctrace/proctrace.c
@@ -0,0 +1,139 @@
+/******************************************************************************
+ * tools/proctrace/proctrace.c
+ *
+ * Demonstrative tool for collecting Intel Processor Trace data from Xen.
+ * It can be used to externally monitor a given vCPU in a given DomU.
+ *
+ * Copyright (C) 2020 by CERT Polska - NASK PIB
+ *
+ * Authors: Michał Leszczyński, michal.leszczynski@cert.pl
+ * Date:    June, 2020
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; under version 2 of the License.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <time.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <sys/mman.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <errno.h>
+#include <signal.h>
+#include <xenevtchn.h>
+#include <xenctrl.h>
+#include <xen/xen.h>
+#include <string.h>
+#include <sys/select.h>
+#include <getopt.h>
+
+
+static volatile sig_atomic_t interrupted = 0;
+
+void term_handler(int signum) {
+    interrupted = 1;
+}
+
+int main(int argc, char* argv[]) {
+    xc_interface *xc;
+    uint32_t domid;
+    uint32_t vcpu_id;
+
+    int rc = -1;
+    uint8_t *buf;
+    uint64_t size;
+    uint64_t last_offset = 0;
+
+    signal(SIGINT, term_handler);
+
+    if (argc != 3) {
+        fprintf(stderr, "Usage: %s <domid> <vcpu_id>\n", argv[0]);
+        fprintf(stderr, "It's recommended to redirect this program's output to a file\n");
+        fprintf(stderr, "or to pipe its output to xxd or another program.\n");
+        return 1;
+    }
+
+    domid = atoi(argv[1]);
+    vcpu_id = atoi(argv[2]);
+
+    xc = xc_interface_open(0, 0, 0);
+
+    if (!xc) {
+        fprintf(stderr, "Failed to open xc interface\n");
+        return 1;
+    }
+
+    rc = xc_ptbuf_enable(xc, domid, vcpu_id, 64 * 1024 * 1024);
+
+    if (rc) {
+        fprintf(stderr, "Failed to call xc_ptbuf_enable\n");
+        return 1;
+    }
+
+    rc = xc_ptbuf_map(xc, domid, vcpu_id, &buf, &size);
+
+    if (rc) {
+        fprintf(stderr, "Failed to call xc_ptbuf_map\n");
+        return 1;
+    }
+
+    while (!interrupted) {
+        uint64_t offset;
+        rc = xc_ptbuf_get_offset(xc, domid, vcpu_id, &offset);
+
+        if (rc) {
+            fprintf(stderr, "Failed to call xc_ptbuf_get_offset\n");
+            return 1;
+        }
+
+        if (offset > last_offset)
+        {
+            fwrite(buf + last_offset, offset - last_offset, 1, stdout);
+        }
+        else if (offset < last_offset)
+        {
+            // buffer wrapped
+            fwrite(buf + last_offset, size - last_offset, 1, stdout);
+            fwrite(buf, offset, 1, stdout);
+        }
+
+        last_offset = offset;
+        usleep(1000 * 100);
+    }
+
+    rc = xc_ptbuf_unmap(xc, buf, size);
+
+    if (rc) {
+        fprintf(stderr, "Failed to call xc_ptbuf_unmap\n");
+        return 1;
+    }
+
+    rc = xc_ptbuf_disable(xc, domid, vcpu_id);
+
+    if (rc) {
+        fprintf(stderr, "Failed to call xc_ptbuf_disable\n");
+        return 1;
+    }
+
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.20.1
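
The polling loop in proctrace.c reads whatever the tracer has written
between two offsets of a circular buffer. A minimal standalone sketch of
that drain step follows. The helper name drain_ring and its signature are
hypothetical and not part of libxc. It assumes the producer advances by
less than one full buffer between polls, so equal offsets mean "no new
data" rather than "exactly one lap".

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the bytes produced since the previous poll out of a circular buffer.
 * `last` and `cur` are byte offsets in [0, size). Returns the number of
 * bytes copied to `out`, which must have room for `size` bytes.
 */
static size_t drain_ring(const uint8_t *buf, uint64_t size,
                         uint64_t last, uint64_t cur, uint8_t *out)
{
    size_t n = 0;

    if ( cur > last )
    {
        /* Linear case: new data lies in [last, cur). */
        memcpy(out, buf + last, cur - last);
        n = cur - last;
    }
    else if ( cur < last )
    {
        /* Buffer wrapped: copy the tail first, then the head. */
        memcpy(out, buf + last, size - last);
        memcpy(out + (size - last), buf, cur);
        n = (size - last) + cur;
    }
    /* cur == last: nothing new, copy nothing. */

    return n;
}
```

Treating equal offsets as "nothing new" avoids re-emitting the whole
buffer on an idle poll. The cost is that a producer which laps the reader
exactly once goes unnoticed, so the poll interval has to be short relative
to the buffer size.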




* [PATCH v1 7/7] x86/vmx: switch IPT MSRs on vmentry/vmexit
  2020-06-16 15:16 [PATCH v1 0/7] Implement support for external IPT monitoring Michał Leszczyński
                   ` (5 preceding siblings ...)
  2020-06-16 15:23 ` [PATCH v1 6/7] tools/proctrace: add proctrace tool Michał Leszczyński
@ 2020-06-16 15:24 ` Michał Leszczyński
  2020-06-16 17:38   ` Roger Pau Monné
  2020-06-18 17:38   ` Andrew Cooper
  2020-06-16 18:17 ` [PATCH v1 0/7] Implement support for external IPT monitoring Andrew Cooper
  7 siblings, 2 replies; 59+ messages in thread
From: Michał Leszczyński @ 2020-06-16 15:24 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Jun Nakajima, Wei Liu, Andrew Cooper, Jan Beulich,
	Roger Pau Monné

Enable IPT when entering the VM and disable it on vmexit.
Register state is persisted using vCPU ipt_state structure.

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 xen/arch/x86/hvm/vmx/vmx.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 97104c319e..01d9a7b584 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3698,6 +3698,15 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
     __vmread(GUEST_RSP,    &regs->rsp);
     __vmread(GUEST_RFLAGS, &regs->rflags);
 
+    if ( unlikely(v->arch.hvm.vmx.ipt_state) )
+    {
+        wrmsrl(MSR_IA32_RTIT_CTL, 0);
+        smp_rmb();
+
+        rdmsrl(MSR_IA32_RTIT_STATUS, v->arch.hvm.vmx.ipt_state->status);
+        rdmsrl(MSR_IA32_RTIT_OUTPUT_MASK, v->arch.hvm.vmx.ipt_state->output_mask);
+    }
+
     hvm_invalidate_regs_fields(regs);
 
     if ( paging_mode_hap(v->domain) )
@@ -4497,6 +4506,23 @@ bool vmx_vmenter_helper(const struct cpu_user_regs *regs)
     }
 
  out:
+    if ( unlikely(curr->arch.hvm.vmx.ipt_state) )
+    {
+        wrmsrl(MSR_IA32_RTIT_CTL, 0);
+
+        if (curr->arch.hvm.vmx.ipt_state->ctl)
+        {
+            wrmsrl(MSR_IA32_RTIT_OUTPUT_BASE, curr->arch.hvm.vmx.ipt_state->output_base);
+            wrmsrl(MSR_IA32_RTIT_OUTPUT_MASK, curr->arch.hvm.vmx.ipt_state->output_mask);
+            wrmsrl(MSR_IA32_RTIT_STATUS, curr->arch.hvm.vmx.ipt_state->status);
+
+            // MSR_IA32_RTIT_CTL is context-switched manually instead of being
+            // stored inside VMCS, as of Q2'20 only the most recent processors
+            // support such field in VMCS
+            wrmsrl(MSR_IA32_RTIT_CTL, curr->arch.hvm.vmx.ipt_state->ctl);
+        }
+    }
+
     if ( unlikely(curr->arch.hvm.vmx.lbr_flags & LBR_FIXUP_MASK) )
         lbr_fixup();
 
-- 
2.20.1




* Re: [PATCH v1 2/7] x86/vmx: add IPT cpu feature
  2020-06-16 15:20 ` [PATCH v1 2/7] x86/vmx: add IPT cpu feature Michał Leszczyński
@ 2020-06-16 16:30   ` Roger Pau Monné
  2020-06-17 11:34     ` Jan Beulich
  0 siblings, 1 reply; 59+ messages in thread
From: Roger Pau Monné @ 2020-06-16 16:30 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Kevin Tian, Jun Nakajima, Wei Liu, Andrew Cooper, Jan Beulich, Xen-devel

On Tue, Jun 16, 2020 at 05:20:39PM +0200, Michał Leszczyński wrote:
> Check if Intel Processor Trace feature is supported by current
> processor. Define hvm_ipt_supported function.
> 
> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c                  | 24 +++++++++++++++++++++
>  xen/include/asm-x86/cpufeature.h            |  1 +
>  xen/include/asm-x86/hvm/hvm.h               |  9 ++++++++
>  xen/include/asm-x86/hvm/vmx/vmcs.h          |  1 +
>  xen/include/public/arch-x86/cpufeatureset.h |  1 +
>  5 files changed, 36 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index ab19d9424e..a91bbdb798 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -2484,6 +2484,7 @@ static bool __init has_if_pschange_mc(void)
>  
>  const struct hvm_function_table * __init start_vmx(void)
>  {
> +    u64 _vmx_misc_cap;

Please use uint64_t, and you can drop the leading _vmx prefix, since this
is already VMX-specific. Also add a newline between the variable
definition and the code.

>      set_in_cr4(X86_CR4_VMXE);
>  
>      if ( vmx_vmcs_init() )
> @@ -2557,6 +2558,29 @@ const struct hvm_function_table * __init start_vmx(void)
>          vmx_function_table.get_guest_bndcfgs = vmx_get_guest_bndcfgs;
>      }
>  
> +    /* Check whether IPT is supported in VMX operation */
> +    vmx_function_table.ipt_supported = 1;
> +
> +    if ( !cpu_has_ipt )
> +    {
> +        vmx_function_table.ipt_supported = 0;
> +        printk("VMX: Missing support for Intel Processor Trace x86 feature.\n");
> +    }
> +
> +    rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
> +
> +    if ( !( _vmx_misc_cap & VMX_MISC_PT_SUPPORTED ) )
> +    {
> +        vmx_function_table.ipt_supported = 0;
> +        printk("VMX: Missing support for Intel Processor Trace in VMX operation, VMX_MISC caps: %llx\n",
> +               (unsigned long long)_vmx_misc_cap);
> +    }
> +
> +    if (vmx_function_table.ipt_supported)
> +    {
> +        printk("VMX: Intel Processor Trace is SUPPORTED");
> +    }

I think you could simplify this as:

vmx_function_table.ipt_supported = cpu_has_ipt &&
                                   (misc_cap & VMX_MISC_PT_SUPPORTED);

Also the code is too chatty IMO.

Looking at how other VMX features are detected, I think you should
move the checks to vmx_init_vmcs_config and set the relevant bits in
the VM control registers that you can then evaluate in
vmx_display_features in order to print if the feature is supported?

> +
>      lbr_tsx_fixup_check();
>      ler_to_fixup_check();
>  
> diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
> index f790d5c1f8..8d7955dd87 100644
> --- a/xen/include/asm-x86/cpufeature.h
> +++ b/xen/include/asm-x86/cpufeature.h
> @@ -104,6 +104,7 @@
>  #define cpu_has_clwb            boot_cpu_has(X86_FEATURE_CLWB)
>  #define cpu_has_avx512er        boot_cpu_has(X86_FEATURE_AVX512ER)
>  #define cpu_has_avx512cd        boot_cpu_has(X86_FEATURE_AVX512CD)
> +#define cpu_has_ipt             boot_cpu_has(X86_FEATURE_IPT)
>  #define cpu_has_sha             boot_cpu_has(X86_FEATURE_SHA)
>  #define cpu_has_avx512bw        boot_cpu_has(X86_FEATURE_AVX512BW)
>  #define cpu_has_avx512vl        boot_cpu_has(X86_FEATURE_AVX512VL)
> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
> index 1eb377dd82..48465b6067 100644
> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -96,6 +96,9 @@ struct hvm_function_table {
>      /* Necessary hardware support for alternate p2m's? */
>      bool altp2m_supported;
>  
> +    /* Hardware support for IPT? */
> +    bool ipt_supported;

We might want to name this pt_supported, since it's possible for other
vendors to also introduce a processor tracing feature in the future?

Thanks, Roger.



* Re: [PATCH v1 3/7] x86/vmx: add ipt_state as part of vCPU state
  2020-06-16 15:21 ` [PATCH v1 3/7] x86/vmx: add ipt_state as part of vCPU state Michał Leszczyński
@ 2020-06-16 16:33   ` Roger Pau Monné
  0 siblings, 0 replies; 59+ messages in thread
From: Roger Pau Monné @ 2020-06-16 16:33 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Kevin Tian, Jun Nakajima, Wei Liu, Andrew Cooper, Jan Beulich, Xen-devel

On Tue, Jun 16, 2020 at 05:21:20PM +0200, Michał Leszczyński wrote:
> Guest IPT state will be preserved across vmentry/vmexit using
> this structure.

I think you should squash this patch with a patch where the structure
is actually used.

> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c         |  2 ++
>  xen/include/asm-x86/hvm/vmx/vmcs.h | 10 ++++++++++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index a91bbdb798..97104c319e 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -471,6 +471,8 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>  
>      vmx_install_vlapic_mapping(v);
>  
> +    v->arch.hvm.vmx.ipt_state = NULL;

Nit: there's no need to init this to NULL, since the structure is
zeroed on allocation.

Thanks, Roger.



* Re: [PATCH v1 4/7] x86/vmx: add do_vmtrace_op
  2020-06-16 15:22 ` [PATCH v1 4/7] x86/vmx: add do_vmtrace_op Michał Leszczyński
@ 2020-06-16 17:23   ` Roger Pau Monné
  2020-06-17 19:13     ` Michał Leszczyński
  2020-06-18 15:25     ` Michał Leszczyński
  0 siblings, 2 replies; 59+ messages in thread
From: Roger Pau Monné @ 2020-06-16 17:23 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Jan Beulich, Xen-devel

On Tue, Jun 16, 2020 at 05:22:06PM +0200, Michał Leszczyński wrote:
> Provide an interface for privileged domains to manage
> external IPT monitoring.
> 
> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>

Thanks for the patch! I have some questions below which require your
input.

> ---
>  xen/arch/x86/hvm/hvm.c          | 170 ++++++++++++++++++++++++++++++++
>  xen/include/public/hvm/hvm_op.h |  27 +++++
>  2 files changed, 197 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 5bb47583b3..9292caebe0 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -4949,6 +4949,172 @@ static int compat_altp2m_op(
>      return rc;
>  }
>  
> +static int do_vmtrace_op(
> +    XEN_GUEST_HANDLE_PARAM(void) arg)

No need for the newline, this can fit on a single line.

> +{
> +    struct xen_hvm_vmtrace_op a;
> +    struct domain *d = NULL;

I don't think you need to init d to NULL (at least by looking at the
current code below).

> +    int rc = -EFAULT;

No need to init rc.

> +    int i;

unsigned since it's used as a loop counter.

> +    struct vcpu *v;
> +    void* buf;

Nit: '*' should be prepended to the variable name.

> +    uint32_t buf_size;

size_t

> +    uint32_t buf_order;

Order is generally fine using unsigned int, no need to use a
specifically sized type.

> +    uint64_t buf_mfn;

Could this use the mfn type?

> +    struct page_info *pg;
> +
> +    if ( !hvm_ipt_supported() )
> +        return -EOPNOTSUPP;
> +
> +    if ( copy_from_guest(&a, arg, 1) )
> +        return -EFAULT;
> +
> +    if ( a.version != HVMOP_VMTRACE_INTERFACE_VERSION )
> +        return -EINVAL;
> +
> +    switch ( a.cmd )
> +    {
> +    case HVMOP_vmtrace_ipt_enable:
> +    case HVMOP_vmtrace_ipt_disable:
> +    case HVMOP_vmtrace_ipt_get_buf:
> +    case HVMOP_vmtrace_ipt_get_offset:
> +        break;
> +
> +    default:
> +        return -EOPNOTSUPP;
> +    }
> +
> +    d = rcu_lock_domain_by_any_id(a.domain);
> +
> +    if ( d == NULL )
> +        return -ESRCH;
> +
> +    if ( !is_hvm_domain(d) )
> +    {
> +        rc = -EOPNOTSUPP;
> +        goto out;
> +    }
> +
> +    domain_pause(d);
> +
> +    if ( a.vcpu >= d->max_vcpus )
> +    {
> +        rc = -EINVAL;
> +        goto out;
> +    }
> +
> +    v = d->vcpu[a.vcpu];
> +
> +    if ( a.cmd == HVMOP_vmtrace_ipt_enable )

Please use a switch here, you might even consider re-using the switch
from above and moving the domain checks before actually checking the
command field, so that you don't need to perform two switches against
a.cmd.
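
A minimal sketch of the single-switch shape, assuming the command numbers
from the hvm_op.h hunk of this patch. The ipt_* helpers and
vmtrace_dispatch() are hypothetical stand-ins, not functions from the
series; in the real handler the domain and vCPU checks would run before
this point.

```c
#include <errno.h>
#include <stdint.h>

/* Command numbers as defined in the patch's hvm_op.h hunk. */
#define HVMOP_vmtrace_ipt_enable      1
#define HVMOP_vmtrace_ipt_disable     2
#define HVMOP_vmtrace_ipt_get_buf     3
#define HVMOP_vmtrace_ipt_get_offset  4

/* Hypothetical per-command helpers; stubs standing in for the real work. */
static int ipt_enable(void)     { return 0; }
static int ipt_disable(void)    { return 0; }
static int ipt_get_buf(void)    { return 0; }
static int ipt_get_offset(void) { return 0; }

/* One switch both validates the command and dispatches it. */
int vmtrace_dispatch(uint32_t cmd)
{
    int rc;

    switch ( cmd )
    {
    case HVMOP_vmtrace_ipt_enable:
        rc = ipt_enable();
        break;

    case HVMOP_vmtrace_ipt_disable:
        rc = ipt_disable();
        break;

    case HVMOP_vmtrace_ipt_get_buf:
        rc = ipt_get_buf();
        break;

    case HVMOP_vmtrace_ipt_get_offset:
        rc = ipt_get_offset();
        break;

    default:
        /* Unknown commands are rejected here, so no pre-check is needed. */
        rc = -EOPNOTSUPP;
        break;
    }

    return rc;
}
```

With this layout an unknown command still costs a domain lookup before it
is rejected, which is the trade-off for having only one switch.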

> +    {
> +        if ( v->arch.hvm.vmx.ipt_state ) {

Coding style, brace should be on newline (there are more below which
I'm not going to comment on).

> +            // already enabled

Comments should use /* ... */; there are multiple instances of this below
which I'm not going to comment on, please check CODING_STYLE.

Also, the interface looks racy, I think you are missing a lock to
protect v->arch.hvm.vmx.ipt_state from being freed under your feet if
you issue concurrent calls to the interface.

> +            rc = -EINVAL;
> +            goto out;
> +        }
> +
> +        if ( a.size < PAGE_SIZE || a.size > 1000000 * PAGE_SIZE ) {

You can use GB(4), which is easier to read. Should the size also be a
multiple of PAGE_SIZE?

> +            // we don't accept trace buffer size smaller than single page
> +            // and the upper bound is defined as 4GB in the specification
> +            rc = -EINVAL;
> +            goto out;
> +	}

Stray tab.

> +
> +        buf_order = get_order_from_bytes(a.size);
> +
> +        if ( (a.size >> PAGE_SHIFT) != (1 << buf_order) ) {

Oh here is the check. I think you can move this with the checks above
by doing a.size & ~PAGE_MASK.
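
A minimal sketch of how the size checks could be grouped in one place.
The macro definitions below are local stand-ins written to mirror Xen's
conventions (PAGE_MASK clears the in-page offset bits), and
ipt_buf_size_valid() is a hypothetical helper name. The power-of-two test
is what the original order comparison enforces, and it is needed because
output_mask is later set to size - 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the Xen macros, assuming 4 KiB pages. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))
#define GB(x)      ((uint64_t)(x) << 30)

/* Hypothetical helper: true if size is usable as a trace buffer size. */
bool ipt_buf_size_valid(uint64_t size)
{
    /* At least one page, at most the 4GB upper bound. */
    if ( size < PAGE_SIZE || size > GB(4) )
        return false;

    /* Must be a whole number of pages. */
    if ( size & ~PAGE_MASK )
        return false;

    /* Must be a power of two so that (size - 1) is a valid mask. */
    return (size & (size - 1)) == 0;
}
```

The page-multiple test is implied by the power-of-two test once the lower
bound holds; it is kept separate here only to match the review comment.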

> +            rc = -EINVAL;
> +            goto out;
> +        }
> +
> +        buf = page_to_virt(alloc_domheap_pages(d, buf_order, MEMF_no_refcount));

What if alloc_domheap_pages returns NULL?

Since I think you only want the linear address of the page in order to
zero it, I would suggest using clear_domain_page.

> +        buf_size = a.size;
> +
> +        if ( !buf ) {
> +            rc = -EFAULT;
> +            goto out;
> +        }
> +
> +        memset(buf, 0, buf_size);
> +
> +        for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ ) {
> +            share_xen_page_with_privileged_guests(virt_to_page(buf) + i, SHARE_ro);

This line (and some more below) exceeds 80 characters, please split
it.

> +        }
> +
> +        v->arch.hvm.vmx.ipt_state = xmalloc(struct ipt_state);

You should check that xmalloc has succeeded before trying to access
ipt_state.

> +        v->arch.hvm.vmx.ipt_state->output_base = virt_to_mfn(buf) << PAGE_SHIFT;
> +        v->arch.hvm.vmx.ipt_state->output_mask = buf_size - 1;
> +        v->arch.hvm.vmx.ipt_state->status = 0;
> +        v->arch.hvm.vmx.ipt_state->ctl = RTIT_CTL_TRACEEN | RTIT_CTL_OS | RTIT_CTL_USR | RTIT_CTL_BRANCH_EN;

Shouldn't the user be able to select what tracing should be enabled?
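
One possible shape for that, as a sketch only: the caller passes the
tracing flags it wants, and the hypervisor accepts only an allow-listed
subset. The bit positions below follow the IA32_RTIT_CTL layout in the SDM
as I understand it (the series' own definitions live in patch 1/7, which is
not shown here), and both IPT_USER_FLAGS and ipt_build_ctl() are
hypothetical names.

```c
#include <errno.h>
#include <stdint.h>

/* IA32_RTIT_CTL bits (assumed positions, see the note above). */
#define RTIT_CTL_TRACEEN    (UINT64_C(1) << 0)
#define RTIT_CTL_OS         (UINT64_C(1) << 2)
#define RTIT_CTL_USR        (UINT64_C(1) << 3)
#define RTIT_CTL_BRANCH_EN  (UINT64_C(1) << 13)

/* Hypothetical allow-list of bits the caller may choose. */
#define IPT_USER_FLAGS (RTIT_CTL_OS | RTIT_CTL_USR | RTIT_CTL_BRANCH_EN)

/*
 * Build the control value from caller-requested flags.
 * Returns 0 on success, -EINVAL if a bit outside the allow-list is set.
 */
int ipt_build_ctl(uint64_t requested, uint64_t *ctl)
{
    if ( requested & ~IPT_USER_FLAGS )
        return -EINVAL;

    /* TraceEn stays under hypervisor control rather than the caller's. */
    *ctl = RTIT_CTL_TRACEEN | requested;

    return 0;
}
```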

> +    }
> +    else if ( a.cmd == HVMOP_vmtrace_ipt_disable )
> +    {
> +        if ( !v->arch.hvm.vmx.ipt_state ) {
> +            rc = -EINVAL;
> +            goto out;
> +        }
> +
> +        buf_mfn = v->arch.hvm.vmx.ipt_state->output_base >> PAGE_SHIFT;
> +        buf_size = ( v->arch.hvm.vmx.ipt_state->output_mask + 1 ) & 0xFFFFFFFFUL;
> +
> +        for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ )
> +        {
> +            if ( (mfn_to_page(_mfn(buf_mfn + i))->count_info & PGC_count_mask) != 1 )
> +            {
> +                rc = -EBUSY;
> +                goto out;
> +            }
> +        }
> +
> +        xfree(v->arch.hvm.vmx.ipt_state);
> +	v->arch.hvm.vmx.ipt_state = NULL;
> +
> +        for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ )
> +        {
> +            pg = mfn_to_page(_mfn(buf_mfn + i));
> +            put_page_alloc_ref(pg);
> +            if ( !test_and_clear_bit(_PGC_xen_heap, &pg->count_info) )
> +                ASSERT_UNREACHABLE();
> +            pg->u.inuse.type_info = 0;
> +            page_set_owner(pg, NULL);
> +            free_domheap_page(pg);

Hm, this seems fairly dangerous, what guarantees that the caller is
not going to map the buffer while you are trying to tear it down?

You perform a check before freeing ipt_state, but between the check
and the actual tearing down the domain might have setup mappings to
them.

I wonder, could you expand a bit on why trace buffers are allocated
from domheap memory by Xen?

There are a couple of options here, maybe the caller could provide
its own buffer, then Xen would take an extra reference to those pages
and set them up to be used as buffers.

Another alternative would be to use domheap memory but not let the
caller map it directly, and instead introduce a hypercall to copy
from the internal Xen buffer into a user-provided one.

How much memory is used on average by those buffers? That would help
decide a model that would best fit the usage.

> +        }
> +    }
> +    else if ( a.cmd == HVMOP_vmtrace_ipt_get_buf )
> +    {
> +        if ( !v->arch.hvm.vmx.ipt_state ) {
> +            rc = -EINVAL;
> +            goto out;
> +        }
> +
> +        a.mfn = v->arch.hvm.vmx.ipt_state->output_base >> PAGE_SHIFT;

This will not work for translated domains, ie: a PVH or HVM domain
won't be able to use this interface since it has no way to request the
mapping of a specific mfn into its physmap. I think we need to take
this into account when deciding how the interface should be, so that
we don't corner ourselves with a PV only interface.

> +        a.size = (v->arch.hvm.vmx.ipt_state->output_mask + 1) & 0xFFFFFFFFUL;

You can truncate it easier by casting to uint32_t I think.

Or even better, you could put output_mask in a union like:

union {
    uint64_t raw;
    struct {
        uint32_t size;
        uint32_t offset;
    };
};

Then you can avoid the shifting and the castings.
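
A self-contained sketch of that union, assuming a little-endian host (as
on x86) so that the first 32-bit member overlays the low half of raw. The
union tag and the two accessor functions are hypothetical. Note that the
low half holds the mask, i.e. the buffer size minus one, so the size is
recovered by adding one.

```c
#include <stdint.h>

/* Overlay of the 64-bit output_mask value; little-endian layout assumed. */
union ipt_output_mask {
    uint64_t raw;
    struct {
        uint32_t size;    /* low 32 bits: buffer size - 1 (the mask) */
        uint32_t offset;  /* high 32 bits: current output offset */
    };
};

/* Buffer size in bytes; widened so a 4GB buffer does not wrap to 0. */
uint64_t ipt_buf_size(union ipt_output_mask m)
{
    return (uint64_t)m.size + 1;
}

/* Current write offset into the buffer. */
uint32_t ipt_offset(union ipt_output_mask m)
{
    return m.offset;
}
```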

> +    }
> +    else if ( a.cmd == HVMOP_vmtrace_ipt_get_offset )
> +    {
> +        if ( !v->arch.hvm.vmx.ipt_state ) {
> +            rc = -EINVAL;
> +            goto out;
> +        }
> +
> +        a.offset = v->arch.hvm.vmx.ipt_state->output_mask >> 32;
> +    }
> +
> +    rc = -EFAULT;
> +    if ( __copy_to_guest(arg, &a, 1) )
> +      goto out;
> +    rc = 0;
> +
> + out:
> +    smp_wmb();

Why do you need a barrier here?

> +    domain_unpause(d);
> +    rcu_unlock_domain(d);
> +
> +    return rc;
> +}
> +
> +DEFINE_XEN_GUEST_HANDLE(compat_hvm_vmtrace_op_t);
> +
>  static int hvmop_get_mem_type(
>      XEN_GUEST_HANDLE_PARAM(xen_hvm_get_mem_type_t) arg)
>  {
> @@ -5101,6 +5267,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>          rc = current->hcall_compat ? compat_altp2m_op(arg) : do_altp2m_op(arg);
>          break;
>  
> +    case HVMOP_vmtrace:
> +        rc = do_vmtrace_op(arg);
> +        break;
> +
>      default:
>      {
>          gdprintk(XENLOG_DEBUG, "Bad HVM op %ld.\n", op);
> diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
> index 870ec52060..3bbcd54c96 100644
> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -382,6 +382,33 @@ struct xen_hvm_altp2m_op {
>  typedef struct xen_hvm_altp2m_op xen_hvm_altp2m_op_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_op_t);
>  
> +/* HVMOP_vmtrace: Perform VM tracing related operation */
> +#define HVMOP_vmtrace 26
> +
> +#define HVMOP_VMTRACE_INTERFACE_VERSION 0x00000001
> +
> +struct xen_hvm_vmtrace_op {
> +    /* IN variable */
> +    uint32_t version;   /* HVMOP_VMTRACE_INTERFACE_VERSION */
> +    uint32_t cmd;
> +/* Enable/disable external vmtrace for given domain */
> +#define HVMOP_vmtrace_ipt_enable      1
> +#define HVMOP_vmtrace_ipt_disable     2
> +#define HVMOP_vmtrace_ipt_get_buf     3
> +#define HVMOP_vmtrace_ipt_get_offset  4
> +    domid_t domain;

You are missing a padding field here AFAICT.
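
To illustrate the concern, here is a sketch of a layout with the padding
made explicit. Only version, cmd and domain are taken from the hunk above;
the remaining members and their types are guesses based on the fields the
handler touches (vcpu, size, mfn, offset), and the struct name is
hypothetical. domid_t is a 16-bit type in Xen's public headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t domid_t;   /* 16-bit, as in Xen's public headers */

/* Hypothetical layout; members after 'domain' are assumptions. */
struct vmtrace_op_sketch {
    uint32_t version;
    uint32_t cmd;
    domid_t  domain;
    uint16_t pad;       /* explicit padding instead of a compiler hole */
    uint32_t vcpu;
    uint64_t size;
    uint64_t mfn;
    uint64_t offset;
};

/* The layout is fixed by construction, not by compiler padding rules. */
static_assert(offsetof(struct vmtrace_op_sketch, vcpu) == 12,
              "vcpu must follow the explicit pad");
static_assert(sizeof(struct vmtrace_op_sketch) == 40,
              "no hidden tail padding expected");
```

With every hole spelled out, the structure has the same size and member
offsets regardless of which compiler or ABI builds the caller.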

Roger.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v1 7/7] x86/vmx: switch IPT MSRs on vmentry/vmexit
  2020-06-16 15:24 ` [PATCH v1 7/7] x86/vmx: switch IPT MSRs on vmentry/vmexit Michał Leszczyński
@ 2020-06-16 17:38   ` Roger Pau Monné
  2020-06-16 17:47     ` Michał Leszczyński
  2020-06-18 17:38   ` Andrew Cooper
  1 sibling, 1 reply; 59+ messages in thread
From: Roger Pau Monné @ 2020-06-16 17:38 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Kevin Tian, Jun Nakajima, Wei Liu, Andrew Cooper, Jan Beulich, Xen-devel

On Tue, Jun 16, 2020 at 05:24:11PM +0200, Michał Leszczyński wrote:
> Enable IPT when entering the VM and disable it on vmexit.
> Register state is persisted using vCPU ipt_state structure.

Shouldn't this be better done using Intel MSR load lists?

That seems to be what the SDM recommends for tracing VM events.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v1 7/7] x86/vmx: switch IPT MSRs on vmentry/vmexit
  2020-06-16 17:38   ` Roger Pau Monné
@ 2020-06-16 17:47     ` Michał Leszczyński
  2020-06-17  9:09       ` Roger Pau Monné
  0 siblings, 1 reply; 59+ messages in thread
From: Michał Leszczyński @ 2020-06-16 17:47 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Kevin Tian, Jun Nakajima, Wei Liu, Andrew Cooper, Jan Beulich, Xen-devel

----- On 16 Jun 2020 at 19:38, Roger Pau Monné roger.pau@citrix.com wrote:

> On Tue, Jun 16, 2020 at 05:24:11PM +0200, Michał Leszczyński wrote:
>> Enable IPT when entering the VM and disable it on vmexit.
>> Register state is persisted using vCPU ipt_state structure.
> 
> Shouldn't this be better done using Intel MSR load lists?
> 
> That seems to be what the SDM recommends for tracing VM events.
> 
> Thanks, Roger.

This is intentional, additionally described by the comment:

// MSR_IA32_RTIT_CTL is context-switched manually instead of being
// stored inside VMCS, as of Q2'20 only the most recent processors
// support such field in VMCS

There is a special feature flag which indicates whether MSR_IA32_RTIT_CTL can be loaded using MSR load lists. During my experiments, I haven't found a single CPU available to me that declares such a feature flag. I was mostly testing CPUs that were launched in 2018, so I suppose that this feature is present only on very recent hardware. Unfortunately, it's not possible to check on Intel ARK, as this information is not listed there at all.
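
For illustration, a sketch of one such capability test. It assumes the
flag in question is the "load IA32_RTIT_CTL" VM-entry control (bit 18),
whose availability is reported in the allowed-1 half (upper 32 bits) of
the VM-entry controls capability MSR. The function takes the raw MSR value
as a parameter, so it does not read any MSR itself, and its name is
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* VM-entry control "load IA32_RTIT_CTL" (assumed to be bit 18). */
#define VM_ENTRY_LOAD_IA32_RTIT_CTL  (UINT32_C(1) << 18)

/*
 * entry_ctls_caps: raw value of the VM-entry controls capability MSR.
 * The upper 32 bits list the controls that are allowed to be set to 1.
 */
bool vmx_can_load_rtit_ctl(uint64_t entry_ctls_caps)
{
    uint32_t allowed1 = (uint32_t)(entry_ctls_caps >> 32);

    return (allowed1 & VM_ENTRY_LOAD_IA32_RTIT_CTL) != 0;
}
```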

Best regards,
Michał Leszczyński
CERT Polska

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v1 0/7] Implement support for external IPT monitoring
  2020-06-16 15:16 [PATCH v1 0/7] Implement support for external IPT monitoring Michał Leszczyński
                   ` (6 preceding siblings ...)
  2020-06-16 15:24 ` [PATCH v1 7/7] x86/vmx: switch IPT MSRs on vmentry/vmexit Michał Leszczyński
@ 2020-06-16 18:17 ` Andrew Cooper
  2020-06-16 18:47   ` Michał Leszczyński
  7 siblings, 1 reply; 59+ messages in thread
From: Andrew Cooper @ 2020-06-16 18:17 UTC (permalink / raw)
  To: Michał Leszczyński, Xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jan Beulich,
	Wei Liu, Ian Jackson, George Dunlap, Jun Nakajima,
	Roger Pau Monné

On 16/06/2020 16:16, Michał Leszczyński wrote:
> Intel Processor Trace is an architectural extension available in modern Intel family CPUs. It allows recording the detailed trace of activity while the processor executes the code. One might use the recorded trace to reconstruct the code flow. It means, to find out the executed code paths, determine branches taken, and so forth.
>
> The abovementioned feature is described in Intel(R) 64 and IA-32 Architectures Software Developer's Manual Volume 3C: System Programming Guide, Part 3, Chapter 36: "Intel Processor Trace."
>
> This patch series implements an interface that Dom0 could use in order to enable IPT for particular vCPUs in DomU, allowing for external monitoring. Such a feature has numerous applications like malware monitoring, fuzzing, or performance testing.

Hello,

I'm very excited to see support like this appearing.  However, be aware
that we're currently in code freeze for the 4.14 release, so in-depth
reviews will probably be delayed somewhat due to our bug queue and
release activities.

That said, I've had a very quick look through the series, and have a few
general questions first.

AFAICT, this is strictly for external monitoring of the VM, not for the
VM to use itself?  If so, it shouldn't have the H tag here:

XEN_CPUFEATURE(IPT,           5*32+25) /*H  Intel Processor Trace */

because that exposes the feature to the guest, with the implication that
all other parts of the feature work as advertised.

Are there any restrictions on EPT being enabled in the first place?  I'm
not aware of any, and in principle we could use this functionality for
PV guests as well (using the CPL filter).  Therefore, I think it would
be helpful to not tie the functionality to HVM guests, even if that is
the only option enabled to start with.

The buffer mapping and creation logic is fairly problematic.  Instead of
fighting with another opencoded example, take a look at the IOREQ
server's use of "acquire resource" which is a mapping interface which
supports allocating memory on behalf of the guest, outside of the guest
memory, for use by control tools.

I think what this wants is a bit somewhere in domain_create to indicate
that external tracing is used for this domain (and allocate whatever
structures/buffers are necessary), acquire resource to map the buffers
themselves, and a domctl for any necessary runtime controls.

What semantics do you want for the buffer becoming full?  Given that
debugging/tracing is the goal, I presume "pause vcpu on full" is the
preferred behaviour, rather than drop packets on full?

When this subject was broached on xen-devel before, one issue was the
fact that all actions which are intercepted don't end up writing any
appropriate packets.  This is perhaps less of an issue for this example,
where the external agent can see VMExits in the trace, but it still
results in missing information.  (It is a major problem for PT within
the guest, and needs Xen's intercept/emulation framework being updated
to be PT-aware so it can fill in the same packets which hardware would
have done for equivalent actions.)

Thanks,

~Andrew

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v1 0/7] Implement support for external IPT monitoring
  2020-06-16 18:17 ` [PATCH v1 0/7] Implement support for external IPT monitoring Andrew Cooper
@ 2020-06-16 18:47   ` Michał Leszczyński
  2020-06-16 20:16     ` Andrew Cooper
  2020-06-17  1:35     ` Tian, Kevin
  0 siblings, 2 replies; 59+ messages in thread
From: Michał Leszczyński @ 2020-06-16 18:47 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jan Beulich,
	Wei Liu, Ian Jackson, George Dunlap, Jun Nakajima, Xen-devel,
	Roger Pau Monné

----- On 16 Jun 2020 at 20:17, Andrew Cooper andrew.cooper3@citrix.com wrote:

> On 16/06/2020 16:16, Michał Leszczyński wrote:
>> Intel Processor Trace is an architectural extension available in modern Intel
>> family CPUs. It allows recording the detailed trace of activity while the
>> processor executes the code. One might use the recorded trace to reconstruct
>> the code flow. It means, to find out the executed code paths, determine
>> branches taken, and so forth.
>>
>> The abovementioned feature is described in Intel(R) 64 and IA-32 Architectures
>> Software Developer's Manual Volume 3C: System Programming Guide, Part 3,
>> Chapter 36: "Intel Processor Trace."
>>
>> This patch series implements an interface that Dom0 could use in order to enable
>> IPT for particular vCPUs in DomU, allowing for external monitoring. Such a
>> feature has numerous applications like malware monitoring, fuzzing, or
>> performance testing.
> 
> Hello,
> 
> I'm very excited to see support like this appearing.  However, be aware
> that we're currently in code freeze for the 4.14 release, so in-depth
> reviews will probably be delayed somewhat due to our bug queue and
> release activities.

Sure, take your time :)


> 
> That said, I've had a very quick look through the series, and have a few
> general questions first.
> 
> AFAICT, this is strictly for external monitoring of the VM, not for the
> VM to use itself?  If so, it shouldn't have the H tag here:
> 
> XEN_CPUFEATURE(IPT,           5*32+25) /*H  Intel Processor Trace */
> 
> because that exposes the feature to the guest, with the implication that
> all other parts of the feature work as advertised.

Ok, I will remove the H tag.


> 
> 
> Are there any restrictions on EPT being enabled in the first place?  I'm
> not aware of any, and in principle we could use this functionality for
> PV guests as well (using the CPL filter).  Therefore, I think it would
> be helpful to not tie the functionality to HVM guests, even if that is
> the only option enabled to start with.

I think at the moment it's not required to have EPT. This patch series doesn't use any translation feature flags, so the output address is always a machine physical address, regardless of context. I will check if it could be easily used with PV.


> 
> The buffer mapping and creation logic is fairly problematic.  Instead of
> fighting with another opencoded example, take a look at the IOREQ
> server's use of "acquire resource" which is a mapping interface which
> supports allocating memory on behalf of the guest, outside of the guest
> memory, for use by control tools.
> 
> I think what this wants is a bit somewhere in domain_create to indicate
> that external tracing is used for this domain (and allocate whatever
> structures/buffers are necessary), acquire resource to map the buffers
> themselves, and a domctl for any necessary runtime controls.
> 

I will check this out, this sounds like a good option as it would remove lots of complexity from the existing ipt_enable domctl.

> 
> What semantics do you want for the buffer becoming full?  Given that
> debugging/tracing is the goal, I presume "pause vcpu on full" is the
> preferred behaviour, rather than drop packets on full?
> 

Right now this is a ring-style buffer; when it becomes full it simply wraps and overwrites the old data.

> 
> When this subject was broached on xen-devel before, one issue was the
> fact that all actions which are intercepted don't end up writing any
> appropriate packets.  This is perhaps less of an issue for this example,
> where the external agent can see VMExits in the trace, but it still
> results in missing information.  (It is a major problem for PT within
> the guest, and needs Xen's intercept/emulation framework being updated
> to be PT-aware so it can fill in the same packets which hardware would
> have done for equivalent actions.)

Ok, this sounds like a hard issue. Could you point out what could be the particular problematic cases? For instance, if something would alter EIP/RIP or CR3 then I believe it would still be recorded in PT trace (i.e. these values will be logged on VM entry).

> 
> 
> Thanks,
> 
> ~Andrew


Best regards,
Michał Leszczyński
CERT Polska


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v1 0/7] Implement support for external IPT monitoring
  2020-06-16 18:47   ` Michał Leszczyński
@ 2020-06-16 20:16     ` Andrew Cooper
  2020-06-17  3:02       ` Tamas K Lengyel
  2020-06-17  1:35     ` Tian, Kevin
  1 sibling, 1 reply; 59+ messages in thread
From: Andrew Cooper @ 2020-06-16 20:16 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jan Beulich,
	Wei Liu, Ian Jackson, George Dunlap, Jun Nakajima, Xen-devel,
	Roger Pau Monné

On 16/06/2020 19:47, Michał Leszczyński wrote:
> ----- On 16 Jun 2020 at 20:17, Andrew Cooper andrew.cooper3@citrix.com wrote:
>
>> Are there any restrictions on EPT being enabled in the first place?  I'm
>> not aware of any, and in principle we could use this functionality for
>> PV guests as well (using the CPL filter).  Therefore, I think it would
>> be helpful to not tie the functionality to HVM guests, even if that is
>> the only option enabled to start with.
> I think at the moment it's not required to have EPT. This patch series doesn't use any translation feature flags, so the output address is always a machine physical address, regardless of context. I will check if it could be easily used with PV.

If it's trivial to add PV support then please do.  If it's not, then don't
feel obliged, but please do at least consider how PV support might look
in the eventual feature.

(Generally speaking, considering "how would I make this work in other
modes where it is possible" leads to a better design.)

>> The buffer mapping and creation logic is fairly problematic.  Instead of
>> fighting with another opencoded example, take a look at the IOREQ
>> server's use of "acquire resource" which is a mapping interface which
>> supports allocating memory on behalf of the guest, outside of the guest
>> memory, for use by control tools.
>>
>> I think what this wants is a bit somewhere in domain_create to indicate
>> that external tracing is used for this domain (and allocate whatever
>> structures/buffers are necessary), acquire resource to map the buffers
>> themselves, and a domctl for any necessary runtime controls.
>>
> I will check this out, this sounds like a good option as it would remove lots of complexity from the existing ipt_enable domctl.

Xen has traditionally opted for a "and turn this extra thing on
dynamically" model, but this has caused no end of security issues and
broken corner cases.

You can see this still existing in the difference between
XEN_DOMCTL_createdomain and XEN_DOMCTL_max_vcpus, (the latter being
required to choose the number of vcpus for the domain) and we're making
good progress undoing this particular wart (before 4.13, it was
concerningly easy to get Xen to fall over a NULL d->vcpu[] pointer by
issuing other hypercalls between these two).

There are a lot of settings which should be immutable for the lifetime of
the domain, and external monitoring looks like another one of these. 
Specifying it at createdomain time allows for far better runtime
behaviour (you are no longer in a situation where the first time you try
to turn tracing on, you end up with -ENOMEM because another VM booted in
the meantime and used the remaining memory), and it makes for rather
more simple code in Xen itself (at runtime, you can rely on it having
been set up properly, because a failure setting up will have killed the
domain already).

>> What semantics do you want for the buffer becoming full?  Given that
>> debugging/tracing is the goal, I presume "pause vcpu on full" is the
>> preferred behaviour, rather than drop packets on full?
>>
> Right now this is a ring-style buffer; when it becomes full it simply wraps and overwrites the old data.

How does the consumer spot that the data has wrapped?  What happens if
data starts getting logged, but no one is listening?  What happens if the
consumer exits/crashes/etc and stops listening as a consequence?

It's fine to simply state what will happen, and possibly even "don't do
that then", but the corner cases do at least need thinking about.
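
To make the first question concrete, here is a sketch of the only wrap
detection a consumer has if all it can poll is the current write offset.
The struct and function names are hypothetical. The sketch can notice a
single wrap between two polls (the offset moves backwards), but it cannot
tell one wrap from several, and it cannot tell "no new data" from "exactly
one full buffer written".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical consumer-side state for polling a ring-style buffer. */
struct trace_reader {
    uint32_t buf_size;     /* total buffer size in bytes */
    uint32_t last_offset;  /* write offset seen at the previous poll */
};

/*
 * Returns the number of bytes written since the previous poll, assuming
 * at most one wrap happened in between. *wrapped is set when the offset
 * moved backwards, which is the only visible sign of a wrap.
 */
uint32_t trace_new_bytes(struct trace_reader *r, uint32_t offset,
                         bool *wrapped)
{
    uint32_t n;

    *wrapped = offset < r->last_offset;

    if ( *wrapped )
        n = r->buf_size - r->last_offset + offset;
    else
        n = offset - r->last_offset;

    r->last_offset = offset;

    return n;
}
```

A consumer that polls too slowly silently loses data under this scheme,
which is why the semantics on a full buffer need to be stated explicitly.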

>> When this subject was broached on xen-devel before, one issue was the
>> fact that all actions which are intercepted don't end up writing any
>> appropriate packets.  This is perhaps less of an issue for this example,
>> where the external agent can see VMExits in the trace, but it still
>> results in missing information.  (It is a major problem for PT within
>> the guest, and needs Xen's intercept/emulation framework being updated
>> to be PT-aware so it can fill in the same packets which hardware would
>> have done for equivalent actions.)
> Ok, this sounds like a hard issue. Could you point out what could be the particular problematic cases? For instance, if something would alter EIP/RIP or CR3 then I believe it would still be recorded in PT trace (i.e. these values will be logged on VM entry).

One easy case is what happens on a Pstate transition while in the
hypervisor.  That won't be recorded.  (Perhaps this bit of data isn't
terribly interesting.)

More complicated cases exist when you start combining Xen features. 
E.g. with Introspection, a function pointer call which happens to set a
pagetable access bit which is write-protected will trap for
emulation, and be completed by the emulator (this is far faster than
pausing the domain, changing EPT permissions, singlestepping the vcpu,
then reinstating reduced EPT permissions).

In this case, no TIP would be generated unless the x86 emulator were
updated to know how to do this.

~Andrew

^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [PATCH v1 0/7] Implement support for external IPT monitoring
  2020-06-16 18:47   ` Michał Leszczyński
  2020-06-16 20:16     ` Andrew Cooper
@ 2020-06-17  1:35     ` Tian, Kevin
  2020-06-17  6:45       ` Kang, Luwei
  1 sibling, 1 reply; 59+ messages in thread
From: Tian, Kevin @ 2020-06-17  1:35 UTC (permalink / raw)
  To: Michał Leszczyński, Andrew Cooper
  Cc: Stefano Stabellini, Julien Grall, Jan Beulich, Wei Liu,
	Ian Jackson, George Dunlap, Kang, Luwei, Nakajima, Jun,
	Xen-devel, Roger Pau Monné

+Luwei, who developed PT for KVM and is the best one who can help
review VMX changes from the Intel side. Please include him in future
posts or discussions.

> -----Original Message-----
> From: Michał Leszczyński <michal.leszczynski@cert.pl>
> Sent: Wednesday, June 17, 2020 2:48 AM
> To: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Xen-devel <xen-devel@lists.xenproject.org>; Jan Beulich
> <jbeulich@suse.com>; Wei Liu <wl@xen.org>; Roger Pau Monné
> <roger.pau@citrix.com>; Nakajima, Jun <jun.nakajima@intel.com>; Tian,
> Kevin <kevin.tian@intel.com>; George Dunlap <george.dunlap@citrix.com>;
> Ian Jackson <ian.jackson@eu.citrix.com>; Julien Grall <julien@xen.org>;
> Stefano Stabellini <sstabellini@kernel.org>
> Subject: Re: [PATCH v1 0/7] Implement support for external IPT monitoring
> 
> ----- On 16 Jun 2020 at 20:17, Andrew Cooper andrew.cooper3@citrix.com
> wrote:
> 
> > On 16/06/2020 16:16, Michał Leszczyński wrote:
> >> Intel Processor Trace is an architectural extension available in modern
> Intel
> >> family CPUs. It allows recording the detailed trace of activity while the
> >> processor executes the code. One might use the recorded trace to
> reconstruct
> >> the code flow. It means, to find out the executed code paths, determine
> >> branches taken, and so forth.
> >>
> >> The abovementioned feature is described in Intel(R) 64 and IA-32
> Architectures
> >> Software Developer's Manual Volume 3C: System Programming Guide,
> Part 3,
> >> Chapter 36: "Intel Processor Trace."
> >>
> >> This patch series implements an interface that Dom0 could use in order to
> enable
> >> IPT for particular vCPUs in DomU, allowing for external monitoring. Such a
> >> feature has numerous applications like malware monitoring, fuzzing, or
> >> performance testing.
> >
> > Hello,
> >
> > I'm very excited to see support like this appearing.  However, be aware
> > that we're currently in code freeze for the 4.14 release, so in-depth
> > reviews will probably be delayed somewhat due to our bug queue and
> > release activities.
> 
> Sure, take your time :)
> 
> 
> >
> > That said, I've had a very quick look through the series, and have a few
> > general questions first.
> >
> > AFAICT, this is strictly for external monitoring of the VM, not for the
> > VM to use itself?  If so, it shouldn't have the H tag here:
> >
> > XEN_CPUFEATURE(IPT,           5*32+25) /*H  Intel Processor Trace */
> >
> > because that exposes the feature to the guest, with the implication that
> > all other parts of the feature work as advertised.
> 
> Ok, I will remove the H tag.
> 
> 
> >
> >
> > Are there any restrictions on EPT being enabled in the first place?  I'm
> > not aware of any, and in principle we could use this functionality for
> > PV guests as well (using the CPL filter).  Therefore, I think it would
> > be helpful to not tie the functionality to HVM guests, even if that is
> > the only option enabled to start with.
> 
> I think at the moment it's not required to have EPT. This patch series doesn't
> use any translation feature flags, so the output address is always a machine
> physical address, regardless of context. I will check if it could be easily used
> with PV.
> 
> 
> >
> > The buffer mapping and creation logic is fairly problematic.  Instead of
> > fighting with another opencoded example, take a look at the IOREQ
> > server's use of "acquire resource" which is a mapping interface which
> > supports allocating memory on behalf of the guest, outside of the guest
> > memory, for use by control tools.
> >
> > I think what this wants is a bit somewhere in domain_create to indicate
> > that external tracing is used for this domain (and allocate whatever
> > structures/buffers are necessary), acquire resource to map the buffers
> > themselves, and a domctl for any necessary runtime controls.
> >
> 
> I will check this out, this sounds like a good option as it would remove lots of
> complexity from the existing ipt_enable domctl.
> 
> >
> > What semantics do you want for the buffer becoming full?  Given that
> > debugging/tracing is the goal, I presume "pause vcpu on full" is the
> > preferred behaviour, rather than drop packets on full?
> >
> 
> Right now this is a ring-style buffer; when it becomes full it simply
> wraps and overwrites the old data.
> 
> >
> > When this subject was broached on xen-devel before, one issue was the
> > fact that all actions which are intercepted don't end up writing any
> > appropriate packets.  This is perhaps less of an issue for this example,
> > where the external agent can see VMExits in the trace, but it still
> > results in missing information.  (It is a major problem for PT within
> > the guest, and needs Xen's intercept/emulation framework being updated
> > to be PT-aware so it can fill in the same packets which hardware would
> > have done for equivalent actions.)
> 
> Ok, this sounds like a hard issue. Could you point out what could be the
> particular problematic cases? For instance, if something would alter EIP/RIP
> or CR3 then I believe it would still be recorded in PT trace (i.e. these values will
> be logged on VM entry).
> 
> >
> >
> > Thanks,
> >
> > ~Andrew
> 
> 
> Best regards,
> Michał Leszczyński
> CERT Polska

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v1 0/7] Implement support for external IPT monitoring
  2020-06-16 20:16     ` Andrew Cooper
@ 2020-06-17  3:02       ` Tamas K Lengyel
  2020-06-17 16:19         ` Andrew Cooper
  0 siblings, 1 reply; 59+ messages in thread
From: Tamas K Lengyel @ 2020-06-17  3:02 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jun Nakajima,
	Wei Liu, Michał Leszczyński, Ian Jackson,
	George Dunlap, Jan Beulich, Xen-devel, Roger Pau Monné

On Tue, Jun 16, 2020 at 2:17 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 16/06/2020 19:47, Michał Leszczyński wrote:
> > ----- On 16 Jun 2020 at 20:17, Andrew Cooper andrew.cooper3@citrix.com wrote:
> >
> >> Are there any restrictions on EPT being enabled in the first place?  I'm
> >> not aware of any, and in principle we could use this functionality for
> >> PV guests as well (using the CPL filter).  Therefore, I think it would
> >> be helpful to not tie the functionality to HVM guests, even if that is
> >> the only option enabled to start with.
> > I think at the moment it's not required to have EPT. This patch series doesn't use any translation feature flags, so the output address is always a machine physical address, regardless of context. I will check if it could be easily used with PV.
>
> If it's trivial to add PV support then please do.  If it's not, then don't
> feel obliged, but please do at least consider how PV support might look
> in the eventual feature.
>
> (Generally speaking, considering "how would I make this work in other
> modes where it is possible" leads to a better design.)
>
> >> The buffer mapping and creation logic is fairly problematic.  Instead of
> >> fighting with another opencoded example, take a look at the IOREQ
> >> server's use of "acquire resource" which is a mapping interface which
> >> supports allocating memory on behalf of the guest, outside of the guest
> >> memory, for use by control tools.
> >>
> >> I think what this wants is a bit somewhere in domain_create to indicate
> >> that external tracing is used for this domain (and allocate whatever
> >> structures/buffers are necessary), acquire resource to map the buffers
> >> themselves, and a domctl for any necessary runtime controls.
> >>
> > I will check this out, this sounds like a good option as it would remove lots of complexity from the existing ipt_enable domctl.
>
> Xen has traditionally opted for a "and turn this extra thing on
> dynamically" model, but this has caused no end of security issues and
> broken corner cases.
>
> You can see this still existing in the difference between
> XEN_DOMCTL_createdomain and XEN_DOMCTL_max_vcpus, (the latter being
> required to choose the number of vcpus for the domain) and we're making
> good progress undoing this particular wart (before 4.13, it was
> concerningly easy to get Xen to fall over a NULL d->vcpu[] pointer by
> issuing other hypercalls between these two).
>
> There are a lot of settings which should be immutable for the lifetime of
> the domain, and external monitoring looks like another one of these.
> Specifying it at createdomain time allows for far better runtime
> behaviour (you are no longer in a situation where the first time you try
> to turn tracing on, you end up with -ENOMEM because another VM booted in
> the meantime and used the remaining memory), and it makes for rather
> more simple code in Xen itself (at runtime, you can rely on it having
> been set up properly, because a failure setting up will have killed the
> domain already).

I'm not in favor of this being a flag that gets set at domain
creation time. Some users would certainly want this on from start to
finish, but in other cases you may want to enable it only
intermittently, for some time in between particular events. If it's an
on/off flag at domain creation, you pretty much force that choice on
the users, and while the overhead of PT is better than, say, MTF, it's
certainly not nothing. If enabling IPT dynamically runs into an OOM
situation, the user can always just pause the VM and wait until memory
becomes available.

>
> >> What semantics do you want for the buffer becoming full?  Given that
> >> debugging/tracing is the goal, I presume "pause vcpu on full" is the
> >> preferred behaviour, rather than drop packets on full?
> >>
> > Right now this is a ring-style buffer and when it becomes full it simply wraps and overwrites the old data.
>
> How does the consumer spot that the data has wrapped?  What happens if
> data starts getting logged, but no one is listening?  What happens if the
> consumer exits/crashes/etc and stops listening as a consequence?
>
> It's fine to simply state what will happen, and possibly even "don't do
> that then", but the corner cases do at least need thinking about.

AFAIU the current use-case is predominantly in conjunction with VMI
events, where you want to be able to see the trace leading up to a
particular vmexit. So if the buffer wraps in between events and data
is lost, that's not really a concern.
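
(To make the "how does the consumer spot a wrap" question above concrete, here is a minimal software model of a wrapping trace buffer. It is only a sketch under stated assumptions: struct trace_ring, ring_write and ring_lost are hypothetical names, and the series itself does not keep a running byte count like this. The idea is that a monotonically increasing write counter lets a consumer work out how much data was overwritten.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a wrapping trace buffer; not code from the series. */
struct trace_ring {
    uint8_t *buf;      /* backing storage */
    size_t size;       /* capacity in bytes, must be non-zero */
    uint64_t written;  /* total bytes ever produced, monotonically increasing */
};

/* Producer: append bytes, overwriting the oldest data once the ring is full. */
static void ring_write(struct trace_ring *r, const uint8_t *data, size_t len)
{
    assert(r->size != 0);

    for (size_t i = 0; i < len; i++) {
        r->buf[r->written % r->size] = data[i];
        r->written++;
    }
}

/* Consumer: number of bytes after read_pos that were overwritten unread. */
static uint64_t ring_lost(const struct trace_ring *r, uint64_t read_pos)
{
    uint64_t pending = r->written - read_pos;

    return pending > r->size ? pending - r->size : 0;
}
```

A consumer that only cares about the tail of the trace leading up to an event can ignore a non-zero ring_lost() result; one that needs a complete trace would have to treat it as an error.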

Tamas


^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [PATCH v1 0/7] Implement support for external IPT monitoring
  2020-06-17  1:35     ` Tian, Kevin
@ 2020-06-17  6:45       ` Kang, Luwei
  2020-06-17  9:21         ` Roger Pau Monné
  0 siblings, 1 reply; 59+ messages in thread
From: Kang, Luwei @ 2020-06-17  6:45 UTC (permalink / raw)
  To: Tian, Kevin, Michał Leszczyński, Andrew Cooper
  Cc: Stefano Stabellini, Julien Grall, Jan Beulich, Wei Liu,
	Ian Jackson, George Dunlap, Nakajima, Jun, Xen-devel,
	Roger Pau Monné

> -----Original Message-----
> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Wednesday, June 17, 2020 9:35 AM
> To: Michał Leszczyński <michal.leszczynski@cert.pl>; Andrew Cooper
> <andrew.cooper3@citrix.com>
> Cc: Xen-devel <xen-devel@lists.xenproject.org>; Jan Beulich
> <jbeulich@suse.com>; Wei Liu <wl@xen.org>; Roger Pau Monné
> <roger.pau@citrix.com>; Nakajima, Jun <jun.nakajima@intel.com>; George
> Dunlap <george.dunlap@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>;
> Julien Grall <julien@xen.org>; Stefano Stabellini <sstabellini@kernel.org>;
> Kang, Luwei <luwei.kang@intel.com>
> Subject: RE: [PATCH v1 0/7] Implement support for external IPT monitoring
> 
> +Luwei, who developed PT for KVM and is the best one who can help
> review VMX changes from Intel side. Please include him in future post or
> discussion.
> 
> > -----Original Message-----
> > From: Michał Leszczyński <michal.leszczynski@cert.pl>
> > Sent: Wednesday, June 17, 2020 2:48 AM
> > To: Andrew Cooper <andrew.cooper3@citrix.com>
> > Cc: Xen-devel <xen-devel@lists.xenproject.org>; Jan Beulich
> > <jbeulich@suse.com>; Wei Liu <wl@xen.org>; Roger Pau Monné
> > <roger.pau@citrix.com>; Nakajima, Jun <jun.nakajima@intel.com>; Tian,
> > Kevin <kevin.tian@intel.com>; George Dunlap
> > <george.dunlap@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>;
> > Julien Grall <julien@xen.org>; Stefano Stabellini
> > <sstabellini@kernel.org>
> > Subject: Re: [PATCH v1 0/7] Implement support for external IPT
> > monitoring
> >
> > ----- 16 cze 2020 o 20:17, Andrew Cooper andrew.cooper3@citrix.com
> > napisał(a):
> >
> > > On 16/06/2020 16:16, Michał Leszczyński wrote:
> > >> Intel Processor Trace is an architectural extension available in
> > >> modern
> > Intel
> > >> family CPUs. It allows recording the detailed trace of activity
> > >> while the processor executes the code. One might use the recorded
> > >> trace to
> > reconstruct
> > >> the code flow. It means, to find out the executed code paths,
> > >> determine branches taken, and so forth.
> > >>
> > >> The abovementioned feature is described in Intel(R) 64 and IA-32
> > Architectures
> > >> Software Developer's Manual Volume 3C: System Programming Guide,
> > Part 3,
> > >> Chapter 36: "Intel Processor Trace."
> > >>
> > >> This patch series implements an interface that Dom0 could use in
> > >> order to
> > enable
> > >> IPT for particular vCPUs in DomU, allowing for external monitoring.
> > >> Such a feature has numerous applications like malware monitoring,
> > >> fuzzing, or performance testing.
> > >
> > > Hello,
> > >
> > > I'm very excited to see support like this appearing.  However, be
> > > aware that we're currently in code freeze for the 4.14 release, so
> > > in-depth reviews will probably be delayed somewhat due to our bug
> > > queue and release activities.
> >
> > Sure, take your time :)
> >
> >
> > >
> > > That said, I've had a very quick look through the series, and have a
> > > few general questions first.
> > >
> > > AFAICT, this is strictly for external monitoring of the VM, not for
> > > the VM to use itself?  If so, it shouldn't have the H tag here:
> > >
> > > XEN_CPUFEATURE(IPT,           5*32+25) /*H  Intel Processor Trace */
> > >
> > > because that exposes the feature to the guest, with the implication
> > > that all other parts of the feature work as advertised.
> >
> > Ok, I will remove the H tag.
> >
> >
> > >
> > >
> > > Are there any restrictions on EPT being enabled in the first place?
> > > I'm not aware of any, and in principle we could use this
> > > functionality for PV guests as well (using the CPL filter).
> > > Therefore, I think it would be helpful to not tie the functionality
> > > to HVM guests, even if that is the only option enabled to start with.
> >
> > I think at the moment it's not required to have EPT. This patch series
> > doesn't use any translation feature flags, so the output address is
> > always a machine physical address, regardless of context. I will check
> > if it could be easily used with PV.
> >
> >
> > >
> > > The buffer mapping and creation logic is fairly problematic.
> > > Instead of fighting with another opencoded example, take a look at
> > > the IOREQ server's use of "acquire resource" which is a mapping
> > > interface which supports allocating memory on behalf of the guest,
> > > outside of the guest memory, for use by control tools.
> > >
> > > I think what this wants is a bit somewhere in domain_create to
> > > indicate that external tracing is used for this domain (and allocate
> > > whatever structures/buffers are necessary), acquire resource to map
> > > the buffers themselves, and a domctl for any necessary runtime controls.
> > >
> >
> > I will check this out, this sounds like a good option as it would
> > remove lots of complexity from the existing ipt_enable domctl.
> >
> > >
> > > What semantics do you want for the buffer becoming full?  Given that
> > > debugging/tracing is the goal, I presume "pause vcpu on full" is the
> > > preferred behaviour, rather than drop packets on full?
> > >
> >
> > > Right now this is a ring-style buffer and when it becomes full it
> > > simply wraps and overwrites the old data.
> >
> > >
> > > When this subject was broached on xen-devel before, one issue was
> > > the fact that all actions which are intercepted don't end up writing
> > > any appropriate packets.  This is perhaps less of an issue for this
> > > example, where the external agent can see VMExits in the trace, but
> > > it still results in missing information.  (It is a major problem for
> > > PT within the guest, and needs Xen's intercept/emulation framework
> > > being updated to be PT-aware so it can fill in the same packets
> > > which hardware would have done for equivalent actions.)
> >
> > Ok, this sounds like a hard issue. Could you point out what could be
> > the particular problematic cases? For instance, if something would
> > > alter EIP/RIP or CR3 then I believe it would still be recorded in PT
> > trace (i.e. these values will be logged on VM entry).

E.g. if a VM exit is taken on a guest write to CR3 (including “MOV CR3” as well as task switches), the PIP packet
normally generated on the CR3 write will be missing. The PIP packet needs to be written to the PT buffer by software. Another example is a VM exit taken on RDTSC.

For VM introspection, all the Intel PT packets may need to be emulated by software. The SDM describes this as follows:
If a VMM emulates an element of processor state by taking a VM exit on reads and/or writes to that piece of state, and the state element impacts Intel PT packet generation or values, it may be incumbent upon the VMM to insert or modify the output trace data.

Thanks,
Luwei Kang

> >
> > >
> > >
> > > Thanks,
> > >
> > > ~Andrew
> >
> >
> > Best regards,
> > Michał Leszczyński
> > CERT Polska

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v1 7/7] x86/vmx: switch IPT MSRs on vmentry/vmexit
  2020-06-16 17:47     ` Michał Leszczyński
@ 2020-06-17  9:09       ` Roger Pau Monné
  2020-06-17 11:54         ` Michał Leszczyński
  0 siblings, 1 reply; 59+ messages in thread
From: Roger Pau Monné @ 2020-06-17  9:09 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Kevin Tian, Jun Nakajima, Wei Liu, Andrew Cooper, Jan Beulich, Xen-devel

On Tue, Jun 16, 2020 at 07:47:07PM +0200, Michał Leszczyński wrote:
> ----- 16 cze 2020 o 19:38, Roger Pau Monné roger.pau@citrix.com napisał(a):
> 
> > On Tue, Jun 16, 2020 at 05:24:11PM +0200, Michał Leszczyński wrote:
> >> Enable IPT when entering the VM and disable it on vmexit.
> >> Register state is persisted using vCPU ipt_state structure.
> > 
> > Shouldn't this be better done using Intel MSR load lists?
> > 
> > That seems to be what the SDM recommends for tracing VM events.
> > 
> > Thanks, Roger.
> 
> 
> This is intentional, additionally described by the comment:
> 
> // MSR_IA32_RTIT_CTL is context-switched manually instead of being
> // stored inside VMCS, as of Q2'20 only the most recent processors
> // support such field in VMCS
> 
> 
> There is a special feature flag which indicates whether MSR_IA32_RTIT_CTL can be loaded using MSR load lists.

I've been looking at the Intel SDM and I'm not able to find which bit
signals whether MSR_IA32_RTIT_CTL can be loaded using MSR load lists.
Sorry to ask, but can you elaborate on where this is signaled?

Thanks, Roger.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v1 0/7] Implement support for external IPT monitoring
  2020-06-17  6:45       ` Kang, Luwei
@ 2020-06-17  9:21         ` Roger Pau Monné
  2020-06-17 12:37           ` Kang, Luwei
  0 siblings, 1 reply; 59+ messages in thread
From: Roger Pau Monné @ 2020-06-17  9:21 UTC (permalink / raw)
  To: Kang, Luwei
  Cc: Tian, Kevin, Stefano Stabellini, Julien Grall, Nakajima, Jun,
	Wei Liu, Andrew Cooper, Michał Leszczyński,
	Ian Jackson, George Dunlap, Jan Beulich, Xen-devel

On Wed, Jun 17, 2020 at 06:45:22AM +0000, Kang, Luwei wrote:
> > -----Original Message-----
> > From: Tian, Kevin <kevin.tian@intel.com>
> > Sent: Wednesday, June 17, 2020 9:35 AM
> > To: Michał Leszczyński <michal.leszczynski@cert.pl>; Andrew Cooper
> > <andrew.cooper3@citrix.com>
> > Cc: Xen-devel <xen-devel@lists.xenproject.org>; Jan Beulich
> > <jbeulich@suse.com>; Wei Liu <wl@xen.org>; Roger Pau Monné
> > <roger.pau@citrix.com>; Nakajima, Jun <jun.nakajima@intel.com>; George
> > Dunlap <george.dunlap@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>;
> > Julien Grall <julien@xen.org>; Stefano Stabellini <sstabellini@kernel.org>;
> > Kang, Luwei <luwei.kang@intel.com>
> > Subject: RE: [PATCH v1 0/7] Implement support for external IPT monitoring
> > 
> > +Luwei, who developed PT for KVM and is the best one who can help
> > review VMX changes from Intel side. Please include him in future post or
> > discussion.
> > 
> > > -----Original Message-----
> > > From: Michał Leszczyński <michal.leszczynski@cert.pl>
> > > Sent: Wednesday, June 17, 2020 2:48 AM
> > > To: Andrew Cooper <andrew.cooper3@citrix.com>
> > > Cc: Xen-devel <xen-devel@lists.xenproject.org>; Jan Beulich
> > > <jbeulich@suse.com>; Wei Liu <wl@xen.org>; Roger Pau Monné
> > > <roger.pau@citrix.com>; Nakajima, Jun <jun.nakajima@intel.com>; Tian,
> > > Kevin <kevin.tian@intel.com>; George Dunlap
> > > <george.dunlap@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>;
> > > Julien Grall <julien@xen.org>; Stefano Stabellini
> > > <sstabellini@kernel.org>
> > > Subject: Re: [PATCH v1 0/7] Implement support for external IPT
> > > monitoring
> > >
> > > ----- 16 cze 2020 o 20:17, Andrew Cooper andrew.cooper3@citrix.com
> > > napisał(a):
> > >
> > > > On 16/06/2020 16:16, Michał Leszczyński wrote:
> > > > When this subject was broached on xen-devel before, one issue was
> > > > the fact that all actions which are intercepted don't end up writing
> > > > any appropriate packets.  This is perhaps less of an issue for this
> > > > example, where the external agent can see VMExits in the trace, but
> > > > it still results in missing information.  (It is a major problem for
> > > > PT within the guest, and needs Xen's intercept/emulation framework
> > > > being updated to be PT-aware so it can fill in the same packets
> > > > which hardware would have done for equivalent actions.)
> > >
> > > Ok, this sounds like a hard issue. Could you point out what could be
> > > the particular problematic cases? For instance, if something would
> > > > alter EIP/RIP or CR3 then I believe it would still be recorded in PT
> > > trace (i.e. these values will be logged on VM entry).
> 
> e.g. If a VM exit is taken on a guest write to CR3 (including “MOV CR3” as well as task switches), the PIP packet
> normally generated on the CR3 write will be missing. The PIP packet needs to be written to the PT buffer by software. Another example is VM-exit taken on RDTSC. 
> 
> For VM introspection, all the Intel PT packets may need to be emulated by software. The SDM describes this as follows:
> If a VMM emulates an element of processor state by taking a VM exit on reads and/or writes to that piece of state, and the state element impacts Intel PT packet generation or values, it may be incumbent upon the VMM to insert or modify the output trace data.

I got the impression that IPT was mostly useful together with
introspection, as you can then get events for trapped (and likely
emulated) instructions from the introspection interface, while still
getting the processor trace for non-trapped events.

I'm not sure whether there would be corner cases with trapped
instructions not being handled by the introspection framework.

How does KVM deal with this? Does it insert/modify trace packets for
instructions that are trapped and emulated by the VMM?

Roger.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v1 2/7] x86/vmx: add IPT cpu feature
  2020-06-16 16:30   ` Roger Pau Monné
@ 2020-06-17 11:34     ` Jan Beulich
  0 siblings, 0 replies; 59+ messages in thread
From: Jan Beulich @ 2020-06-17 11:34 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Kevin Tian, Wei Liu, Andrew Cooper, Jun Nakajima, Xen-devel,
	Roger Pau Monné

On 16.06.2020 18:30, Roger Pau Monné wrote:
> On Tue, Jun 16, 2020 at 05:20:39PM +0200, Michał Leszczyński wrote:
>> Check if Intel Processor Trace feature is supported by current
>> processor. Define hvm_ipt_supported function.
>>
>> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
>> ---
>>  xen/arch/x86/hvm/vmx/vmx.c                  | 24 +++++++++++++++++++++
>>  xen/include/asm-x86/cpufeature.h            |  1 +
>>  xen/include/asm-x86/hvm/hvm.h               |  9 ++++++++
>>  xen/include/asm-x86/hvm/vmx/vmcs.h          |  1 +
>>  xen/include/public/arch-x86/cpufeatureset.h |  1 +
>>  5 files changed, 36 insertions(+)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>> index ab19d9424e..a91bbdb798 100644
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -2484,6 +2484,7 @@ static bool __init has_if_pschange_mc(void)
>>  
>>  const struct hvm_function_table * __init start_vmx(void)
>>  {
>> +    u64 _vmx_misc_cap;
> 
> Please use uint64_t, and you can drop the leading _vmx prefix, this is
> already vmx specific.

Actually, all of _vmx_ should be dropped (i.e. in particular local
variables shouldn't start with an underscore).

Jan


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v1 7/7] x86/vmx: switch IPT MSRs on vmentry/vmexit
  2020-06-17  9:09       ` Roger Pau Monné
@ 2020-06-17 11:54         ` Michał Leszczyński
  2020-06-17 12:51           ` Roger Pau Monné
  0 siblings, 1 reply; 59+ messages in thread
From: Michał Leszczyński @ 2020-06-17 11:54 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Kevin Tian, Jun Nakajima, Wei Liu, Andrew Cooper, Jan Beulich, Xen-devel

----- 17 cze 2020 o 11:09, Roger Pau Monné roger.pau@citrix.com napisał(a):

> On Tue, Jun 16, 2020 at 07:47:07PM +0200, Michał Leszczyński wrote:
>> ----- 16 cze 2020 o 19:38, Roger Pau Monné roger.pau@citrix.com napisał(a):
>> 
>> > On Tue, Jun 16, 2020 at 05:24:11PM +0200, Michał Leszczyński wrote:
>> >> Enable IPT when entering the VM and disable it on vmexit.
>> >> Register state is persisted using vCPU ipt_state structure.
>> > 
>> > Shouldn't this be better done using Intel MSR load lists?
>> > 
>> > That seems to be what the SDM recommends for tracing VM events.
>> > 
>> > Thanks, Roger.
>> 
>> 
>> This is intentional, additionally described by the comment:
>> 
>> // MSR_IA32_RTIT_CTL is context-switched manually instead of being
>> // stored inside VMCS, as of Q2'20 only the most recent processors
>> // support such field in VMCS
>> 
>> 
>> There is a special feature flag which indicates whether MSR_IA32_RTIT_CTL can be
>> loaded using MSR load lists.
> 
> I've been looking at the Intel SDM and I'm not able to find which bit
> signals whether MSR_IA32_RTIT_CTL can be loaded using MSR load lists.
> Sorry to ask, but can you elaborate on where this is signaled?
> 
> Thanks, Roger.


According to SDM:

> 24 Virtual Machine Control Structures -> 24.4 Guest-state Area -> 24.4.1 Guest Register State

> IA32_RTIT_CTL (64 bits). This field is supported only on processors that support either the 1-setting of the "load IA32_RTIT_CTL" VM-entry control or that of the "clear IA32_RTIT_CTL" VM-exit control.


> 24 Virtual Machine Control Structures -> 24.8 VM-entry Control Fields -> 24.8.1 VM-Entry Controls

> Software should consult the VMX capability MSRs IA32_VMX_ENTRY_CTLS to determine how it should set the reserved bits.

Please look at bit position 18 "Load IA32_RTIT_CTL".
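
(As an illustration of the check described above, a minimal sketch follows. It assumes the usual layout of the VMX capability MSRs, where bits 63:32 of IA32_VMX_ENTRY_CTLS report which VM-entry controls may be set to 1. The macro and function names are hypothetical and are not taken from the series; the raw MSR value is passed in as a parameter because reading it needs ring 0.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit position of "Load IA32_RTIT_CTL" in the VM-entry controls
 * (hypothetical macro name). */
#define VM_ENTRY_LOAD_IA32_RTIT_CTL_BIT 18

/*
 * Report whether a VM-entry control may be set to 1, given the raw value
 * of the IA32_VMX_ENTRY_CTLS capability MSR.  The upper 32 bits of that
 * MSR hold the "allowed 1-settings" of the 32-bit control field.
 */
static bool vmx_entry_ctl_allowed_1(uint64_t entry_ctls_msr, unsigned int bit)
{
    assert(bit < 32);

    return (entry_ctls_msr >> (32 + bit)) & 1;
}
```

With this, support for the VMCS-based approach would be detected by calling vmx_entry_ctl_allowed_1(msr_value, VM_ENTRY_LOAD_IA32_RTIT_CTL_BIT).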



Best regards,
Michał Leszczyński
CERT Polska


^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [PATCH v1 0/7] Implement support for external IPT monitoring
  2020-06-17  9:21         ` Roger Pau Monné
@ 2020-06-17 12:37           ` Kang, Luwei
  2020-06-17 12:53             ` Roger Pau Monné
  0 siblings, 1 reply; 59+ messages in thread
From: Kang, Luwei @ 2020-06-17 12:37 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Tian, Kevin, Stefano Stabellini, Julien Grall, Nakajima, Jun,
	Wei Liu, Andrew Cooper, Michał Leszczyński,
	Ian Jackson, George Dunlap, Jan Beulich, Xen-devel

> > > -----Original Message-----
> > > From: Tian, Kevin <kevin.tian@intel.com>
> > > Sent: Wednesday, June 17, 2020 9:35 AM
> > > To: Michał Leszczyński <michal.leszczynski@cert.pl>; Andrew Cooper
> > > <andrew.cooper3@citrix.com>
> > > Cc: Xen-devel <xen-devel@lists.xenproject.org>; Jan Beulich
> > > <jbeulich@suse.com>; Wei Liu <wl@xen.org>; Roger Pau Monné
> > > <roger.pau@citrix.com>; Nakajima, Jun <jun.nakajima@intel.com>;
> > > George Dunlap <george.dunlap@citrix.com>; Ian Jackson
> > > <ian.jackson@eu.citrix.com>; Julien Grall <julien@xen.org>; Stefano
> > > Stabellini <sstabellini@kernel.org>; Kang, Luwei
> > > <luwei.kang@intel.com>
> > > Subject: RE: [PATCH v1 0/7] Implement support for external IPT
> > > monitoring
> > >
> > > +Luwei, who developed PT for KVM and is the best one who can help
> > > review VMX changes from Intel side. Please include him in future
> > > post or discussion.
> > >
> > > > -----Original Message-----
> > > > From: Michał Leszczyński <michal.leszczynski@cert.pl>
> > > > Sent: Wednesday, June 17, 2020 2:48 AM
> > > > To: Andrew Cooper <andrew.cooper3@citrix.com>
> > > > Cc: Xen-devel <xen-devel@lists.xenproject.org>; Jan Beulich
> > > > <jbeulich@suse.com>; Wei Liu <wl@xen.org>; Roger Pau Monné
> > > > <roger.pau@citrix.com>; Nakajima, Jun <jun.nakajima@intel.com>;
> > > > Tian, Kevin <kevin.tian@intel.com>; George Dunlap
> > > > <george.dunlap@citrix.com>; Ian Jackson
> > > > <ian.jackson@eu.citrix.com>; Julien Grall <julien@xen.org>;
> > > > Stefano Stabellini <sstabellini@kernel.org>
> > > > Subject: Re: [PATCH v1 0/7] Implement support for external IPT
> > > > monitoring
> > > >
> > > > ----- 16 cze 2020 o 20:17, Andrew Cooper andrew.cooper3@citrix.com
> > > > napisał(a):
> > > >
> > > > > On 16/06/2020 16:16, Michał Leszczyński wrote:
> > > > > When this subject was broached on xen-devel before, one issue
> > > > > was the fact that all actions which are intercepted don't end up
> > > > > writing any appropriate packets.  This is perhaps less of an
> > > > > issue for this example, where the external agent can see VMExits
> > > > > in the trace, but it still results in missing information.  (It
> > > > > is a major problem for PT within the guest, and needs Xen's
> > > > > intercept/emulation framework being updated to be PT-aware so it
> > > > > can fill in the same packets which hardware would have done for
> > > > > equivalent actions.)
> > > >
> > > > Ok, this sounds like a hard issue. Could you point out what could
> > > > be the particular problematic cases? For instance, if something
> > > > > would alter EIP/RIP or CR3 then I believe it would still be
> > > > recorded in PT trace (i.e. these values will be logged on VM entry).
> >
> > e.g. If a VM exit is taken on a guest write to CR3 (including “MOV
> > CR3” as well as task switches), the PIP packet normally generated on the CR3
> write will be missing. The PIP packet needs to be written to the PT buffer by
> software. Another example is VM-exit taken on RDTSC.
> >
> > > For VM introspection, all the Intel PT packets may need to be emulated by
> software. Some description in SDM as below:
> > If a VMM emulates an element of processor state by taking a VM exit on
> reads and/or writes to that piece of state, and the state element impacts Intel
> PT packet generation or values, it may be incumbent upon the VMM to insert
> or modify the output trace data.
> 
> I got the impression that IPT was mostly useful together with introspection, as
> you can then get events from trapped instructions (and likely emulated) from
> the introspection interface, while being able to get the processor trace for non-
> trapped events.
> 
> I'm not sure whether there would be corner cases with trapped instructions
> not being handled by the introspection framework.
> 
> How does KVM deal with this, do they insert/modify trace packets on trapped
> and emulated instructions by the VMM?

KVM includes an instruction decoder and emulator (arch/x86/kvm/emulate.c), and the guest's memory can be write-protected as well. But it doesn't emulate Intel PT packets in software. In KVM, the Intel PT feature is exposed to the guest, and the guest can use it as it would on native hardware.

Thanks,
Luwei Kang

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v1 7/7] x86/vmx: switch IPT MSRs on vmentry/vmexit
  2020-06-17 11:54         ` Michał Leszczyński
@ 2020-06-17 12:51           ` Roger Pau Monné
  2020-06-17 15:14             ` Andrew Cooper
  0 siblings, 1 reply; 59+ messages in thread
From: Roger Pau Monné @ 2020-06-17 12:51 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Kevin Tian, Jun Nakajima, Wei Liu, Andrew Cooper, Jan Beulich, Xen-devel

On Wed, Jun 17, 2020 at 01:54:45PM +0200, Michał Leszczyński wrote:
> ----- 17 cze 2020 o 11:09, Roger Pau Monné roger.pau@citrix.com napisał(a):
> 
> > On Tue, Jun 16, 2020 at 07:47:07PM +0200, Michał Leszczyński wrote:
> >> ----- 16 cze 2020 o 19:38, Roger Pau Monné roger.pau@citrix.com napisał(a):
> >> 
> >> > On Tue, Jun 16, 2020 at 05:24:11PM +0200, Michał Leszczyński wrote:
> >> >> Enable IPT when entering the VM and disable it on vmexit.
> >> >> Register state is persisted using vCPU ipt_state structure.
> >> > 
> >> > Shouldn't this be better done using Intel MSR load lists?
> >> > 
> >> > That seems to be what the SDM recommends for tracing VM events.
> >> > 
> >> > Thanks, Roger.
> >> 
> >> 
> >> This is intentional, additionally described by the comment:
> >> 
> >> // MSR_IA32_RTIT_CTL is context-switched manually instead of being
> >> // stored inside VMCS, as of Q2'20 only the most recent processors
> >> // support such field in VMCS
> >> 
> >> 
> >> There is a special feature flag which indicates whether MSR_IA32_RTIT_CTL can be
> >> loaded using MSR load lists.
> > 
> > I've been looking at the Intel SDM and I'm not able to find which bit
> > signals whether MSR_IA32_RTIT_CTL can be loaded using MSR load lists.
> > Sorry to ask, but can you elaborate on where this is signaled?
> > 
> > Thanks, Roger.
> 
> 
> According to SDM:
> 
> > 24 Virtual Machine Control Structures -> 24.4 Guest-state Area -> 24.4.1 Guest Register State
> 
> > IA32_RTIT_CTL (64 bits). This field is supported only on processors that support either the 1-setting of the "load IA32_RTIT_CTL" VM-entry control or that of the "clear IA32_RTIT_CTL" VM-exit control.
> 
> 
> > 24 Virtual Machine Control Structures -> 24.8 VM-entry Control Fields -> 24.8.1 VM-Entry Controls
> 
> > Software should consult the VMX capability MSRs IA32_VMX_ENTRY_CTLS to determine how it should set the reserved bits.
> 
> Please look at bit position 18 "Load IA32_RTIT_CTL".

I think this is something different from what I was referring to.
The options you refer to (load/clear IA32_RTIT_CTL) deal with
loading/storing a specific field in the VMCS that maps to the guest
IA32_RTIT_CTL.

OTOH MSR load lists can be used to load and store any arbitrary MSR on
vmentry/vmexit, see section 26.4 LOADING MSRS in the SDM. There's
already infrastructure in Xen to do so, see vmx_{add/del/find}_msr.
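
(For readers unfamiliar with MSR load lists, a rough standalone model follows. It is not Xen's vmx_{add/del/find}_msr implementation and does not use its signatures: struct msr_list, msr_list_find, msr_list_set and MSR_LIST_MAX are hypothetical names. What it does assume from the SDM is the 16-byte entry layout of MSR index, reserved dword, then 64-bit value, and that IA32_RTIT_CTL is MSR 0x570.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_IA32_RTIT_CTL 0x570u
#define MSR_LIST_MAX      8      /* arbitrary capacity for this sketch */

/* One entry of a VM-entry/VM-exit MSR area: 16 bytes per MSR. */
struct msr_entry {
    uint32_t index;     /* MSR number */
    uint32_t reserved;  /* must be zero */
    uint64_t data;      /* value loaded (or stored) by hardware */
};

/* Hypothetical container; the real area is a page referenced by the VMCS. */
struct msr_list {
    struct msr_entry entries[MSR_LIST_MAX];
    size_t count;
};

/* Look up an MSR in the list, or return NULL if it is not present. */
static struct msr_entry *msr_list_find(struct msr_list *l, uint32_t index)
{
    assert(l != NULL);

    for (size_t i = 0; i < l->count; i++)
        if (l->entries[i].index == index)
            return &l->entries[i];

    return NULL;
}

/* Add the MSR if missing, then set its value.  Returns -1 if the list is full. */
static int msr_list_set(struct msr_list *l, uint32_t index, uint64_t data)
{
    struct msr_entry *e = msr_list_find(l, index);

    if (!e) {
        if (l->count == MSR_LIST_MAX)
            return -1;
        e = &l->entries[l->count++];
        e->index = index;
        e->reserved = 0;
    }
    e->data = data;

    return 0;
}
```

In this model, the guest's value would be kept in the VM-entry load list and the host's value (tracing off) in the VM-exit load list, so that hardware does the switch instead of explicit WRMSRs around vmentry/vmexit.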

Roger.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v1 0/7] Implement support for external IPT monitoring
  2020-06-17 12:37           ` Kang, Luwei
@ 2020-06-17 12:53             ` Roger Pau Monné
  2020-06-17 23:29               ` Kang, Luwei
  0 siblings, 1 reply; 59+ messages in thread
From: Roger Pau Monné @ 2020-06-17 12:53 UTC (permalink / raw)
  To: Kang, Luwei
  Cc: Tian, Kevin, Stefano Stabellini, Julien Grall, Nakajima, Jun,
	Wei Liu, Andrew Cooper, Michał Leszczyński,
	Ian Jackson, George Dunlap, Jan Beulich, Xen-devel

On Wed, Jun 17, 2020 at 12:37:13PM +0000, Kang, Luwei wrote:
> > How does KVM deal with this, do they insert/modify trace packets on trapped
> > and emulated instructions by the VMM?
> 
> KVM includes an instruction decoder and emulator (arch/x86/kvm/emulate.c), and the guest's memory can be write-protected as well. But it doesn't emulate Intel PT packets in software. In KVM, the Intel PT feature is exposed to the guest, and the guest can use it as it would on native hardware.

But if such a feature is exposed to the guest for its own usage, won't
it be missing packets for instructions emulated by the VMM?

Thanks, Roger.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v1 7/7] x86/vmx: switch IPT MSRs on vmentry/vmexit
  2020-06-17 12:51           ` Roger Pau Monné
@ 2020-06-17 15:14             ` Andrew Cooper
  2020-06-17 18:56               ` Michał Leszczyński
  2020-06-17 23:30               ` Kang, Luwei
  0 siblings, 2 replies; 59+ messages in thread
From: Andrew Cooper @ 2020-06-17 15:14 UTC (permalink / raw)
  To: Roger Pau Monné, Michał Leszczyński
  Cc: Kevin Tian, luwei.kang, Jun Nakajima, Wei Liu, Jan Beulich, Xen-devel

On 17/06/2020 13:51, Roger Pau Monné wrote:
> On Wed, Jun 17, 2020 at 01:54:45PM +0200, Michał Leszczyński wrote:
>> ----- 17 cze 2020 o 11:09, Roger Pau Monné roger.pau@citrix.com napisał(a):
>>
>>> 24 Virtual Machine Control Structures -> 24.8 VM-entry Control Fields -> 24.8.1 VM-Entry Controls
>>> Software should consult the VMX capability MSRs IA32_VMX_ENTRY_CTLS to determine how it should set the reserved bits.
>> Please look at bit position 18 "Load IA32_RTIT_CTL".
> I think this is something different from what I was referring to.
> Those options you refer to (load/clear IA32_RTIT_CTL) deal with
> loading/storing a specific field on the vmcs that maps to the guest
> IA32_RTIT_CTL.
>
> OTOH MSR load lists can be used to load and store any arbitrary MSR on
> vmentry/vmexit, see section 26.4 LOADING MSRS on the SDM. There's
> already infrastructure on Xen to do so, see vmx_{add/del/find}_msr.

If I remember the historic roadmaps correctly, there are 3 cases.

The first hardware to support PT (Broadwell?) prohibited its use
completely in VMX operations.  In this case, we can use it to trace PV
guests iff we don't enable VMX in hardware to begin with.

This was relaxed in later hardware (Skylake?) to permit use within VMX
operations, but without any help in the VMCS.  (i.e. manual context
switching per this patch, or MSR load lists as noted in the SDM.)

Subsequent support for "virtualised PT" was added (IceLake?) which adds
the load/save controls, and the ability to translate the output buffer
under EPT.

All of this is from memory so I'm quite possibly wrong with details, but
I believe this is why the current complexity exists.
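
(The three cases above could be captured roughly as below. This is a sketch only: the enum and function names are hypothetical, the three inputs stand in for whatever CPUID/VMX capability checks apply, and the mapping inherits the "from memory" caveat of the description.)

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of how PT state can be switched for a guest. */
enum pt_switch_mode {
    PT_UNAVAILABLE,         /* no PT on this CPU */
    PT_ONLY_WITHOUT_VMX,    /* case 1: PT prohibited in VMX operation */
    PT_MANUAL_OR_MSR_LIST,  /* case 2: usable in VMX, no VMCS help */
    PT_VMCS_CONTROLS,       /* case 3: load/clear IA32_RTIT_CTL controls */
};

static enum pt_switch_mode pt_pick_mode(bool has_pt, bool pt_in_vmx,
                                        bool has_vmcs_controls)
{
    /* VMCS controls for PT only make sense if PT works in VMX operation. */
    assert(pt_in_vmx || !has_vmcs_controls);

    if (!has_pt)
        return PT_UNAVAILABLE;
    if (!pt_in_vmx)
        return PT_ONLY_WITHOUT_VMX;
    if (!has_vmcs_controls)
        return PT_MANUAL_OR_MSR_LIST;

    return PT_VMCS_CONTROLS;
}
```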

~Andrew

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v1 0/7] Implement support for external IPT monitoring
  2020-06-17  3:02       ` Tamas K Lengyel
@ 2020-06-17 16:19         ` Andrew Cooper
  2020-06-17 16:27           ` Tamas K Lengyel
                             ` (2 more replies)
  0 siblings, 3 replies; 59+ messages in thread
From: Andrew Cooper @ 2020-06-17 16:19 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jun Nakajima,
	Wei Liu, Michał Leszczyński, Ian Jackson,
	George Dunlap, Jan Beulich, Xen-devel, Roger Pau Monné

On 17/06/2020 04:02, Tamas K Lengyel wrote:
> On Tue, Jun 16, 2020 at 2:17 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 16/06/2020 19:47, Michał Leszczyński wrote:
>>> ----- On 16 Jun 2020 at 20:17, Andrew Cooper andrew.cooper3@citrix.com wrote:
>>>
>>>> Are there any restrictions on EPT being enabled in the first place?  I'm
>>>> not aware of any, and in principle we could use this functionality for
>>>> PV guests as well (using the CPL filter).  Therefore, I think it would
>>>> be helpful to not tie the functionality to HVM guests, even if that is
>>>> the only option enabled to start with.
>>> I think at the moment it's not required to have EPT. This patch series doesn't use any translation feature flags, so the output address is always a machine physical address, regardless of context. I will check if it could be easily used with PV.
>> If it's trivial to add PV support then please do.  If it's not, then don't
>> feel obliged, but please do at least consider how PV support might look
>> in the eventual feature.
>>
>> (Generally speaking, considering "how would I make this work in other
>> modes where it is possible" leads to a better design.)
>>
>>>> The buffer mapping and creation logic is fairly problematic.  Instead of
>>>> fighting with another opencoded example, take a look at the IOREQ
>>>> server's use of "acquire resource" which is a mapping interface which
>>>> supports allocating memory on behalf of the guest, outside of the guest
>>>> memory, for use by control tools.
>>>>
>>>> I think what this wants is a bit somewhere in domain_create to indicate
>>>> that external tracing is used for this domain (and allocate whatever
>>>> structures/buffers are necessary), acquire resource to map the buffers
>>>> themselves, and a domctl for any necessary runtime controls.
>>>>
>>> I will check this out, this sounds like a good option as it would remove lots of complexity from the existing ipt_enable domctl.
>> Xen has traditionally opted for a "and turn this extra thing on
>> dynamically" model, but this has caused no end of security issues and
>> broken corner cases.
>>
>> You can see this still existing in the difference between
>> XEN_DOMCTL_createdomain and XEN_DOMCTL_max_vcpus, (the latter being
>> required to choose the number of vcpus for the domain) and we're making
>> good progress undoing this particular wart (before 4.13, it was
>> concerningly easy to get Xen to fall over a NULL d->vcpu[] pointer by
>> issuing other hypercalls between these two).
>>
>> There are a lot of settings which should be immutable for the lifetime of
>> the domain, and external monitoring looks like another one of these.
>> Specifying it at createdomain time allows for far better runtime
>> behaviour (you are no longer in a situation where the first time you try
>> to turn tracing on, you end up with -ENOMEM because another VM booted in
>> the meantime and used the remaining memory), and it makes for rather
>> more simple code in Xen itself (at runtime, you can rely on it having
>> been set up properly, because a failure setting up will have killed the
>> domain already).
> I'm not in favor of this being a flag that gets set during domain
> creation time. It could certainly be the case that some users would
> want this being on from the start till the end but in other cases you
> may want to enable it intermittently only for some time in-between
> particular events. If it's an on/off flag during domain creation you
> pretty much force that choice on the users and while the overhead of
> PT is better than say MTF it's certainly not nothing. In case there is
> an OOM situation enabling IPT dynamically the user can always just
> pause the VM and wait till memory becomes available.

There is nothing wrong with having "turn tracing on/off at runtime"
hypercalls.  It is specifically what I suggested two posts up in this
thread, but it should be limited to the TraceEn bit in RTIT_CTL.

What isn't ok is trying to allocate the buffers, write the TOPA, etc on
first-enable or first-map, because the runtime complexity of logic like
this is large, and it is far too easy to get wrong in security-relevant ways.

The domain create flag would mean "I wish to use tracing with this
domain", and not "I want tracing enabled from the getgo".
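
A minimal sketch of that split, assuming a hypothetical per-vCPU
structure that is fully set up at domain creation.  None of these names
exist in Xen; only TraceEn being bit 0 of IA32_RTIT_CTL comes from the
SDM.  The runtime path does no allocation and cannot fail for lack of
memory:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RTIT_CTL_TRACEEN (1ull << 0)   /* TraceEn is bit 0 of IA32_RTIT_CTL */

/* Hypothetical per-vCPU state, allocated once at domain creation. */
struct vmtrace_state {
    uint64_t ctl;          /* image of IA32_RTIT_CTL for this vCPU */
    uint64_t output_base;  /* buffer address, fixed for the domain's lifetime */
    uint64_t output_mask;
};

/* Runtime control: no allocation, no buffer setup, just flip one bit. */
int vmtrace_set_enabled(struct vmtrace_state *s, bool enable)
{
    if ( s == NULL )
        return -EOPNOTSUPP;   /* domain was not created with tracing */

    if ( enable )
        s->ctl |= RTIT_CTL_TRACEEN;
    else
        s->ctl &= ~RTIT_CTL_TRACEEN;

    return 0;
}
```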

>>>> What semantics do you want for the buffer becoming full?  Given that
>>>> debugging/tracing is the goal, I presume "pause vcpu on full" is the
>>>> preferred behaviour, rather than drop packets on full?
>>>>
>>> Right now this is a ring-style buffer and when it would become full it would simply wrap and overwrite the old data.
>> How does the consumer spot that the data has wrapped?  What happens if
>> data starts getting logged, but no one is listening?  What happens if the
>> consumer exits/crashes/etc and stops listening as a consequence?
>>
>> It's fine to simply state what will happen, and possibly even "don't do
>> that then", but the corner cases do at least need thinking about.
> AFAIU the current use-case is predominantly to be used in conjunction
> with VMI events where you want to be able to see the trace leading up
> to a particular vmexit. So in the case when the buffer is wrapped
> in-between events and data is lost that's not really of concern.

That's all fine.  I imagine the output here is voluminous, and needs
help being cut down as much as possible.

On a tangent, I presume you'd like to include VM-fork eventually, which
ought to include copying the trace buffer on fork?

~Andrew


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v1 0/7] Implement support for external IPT monitoring
  2020-06-17 16:19         ` Andrew Cooper
@ 2020-06-17 16:27           ` Tamas K Lengyel
  2020-06-17 17:23             ` Andrew Cooper
  2020-06-17 19:30             ` Michał Leszczyński
  2020-06-17 20:20           ` Michał Leszczyński
  2020-06-18 14:59           ` Michał Leszczyński
  2 siblings, 2 replies; 59+ messages in thread
From: Tamas K Lengyel @ 2020-06-17 16:27 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jun Nakajima,
	Wei Liu, Michał Leszczyński, Ian Jackson,
	George Dunlap, Jan Beulich, Xen-devel, Roger Pau Monné

On Wed, Jun 17, 2020 at 10:19 AM Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>
> On 17/06/2020 04:02, Tamas K Lengyel wrote:
> > On Tue, Jun 16, 2020 at 2:17 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >> On 16/06/2020 19:47, Michał Leszczyński wrote:
> >>> ----- On 16 Jun 2020 at 20:17, Andrew Cooper andrew.cooper3@citrix.com wrote:
> >>>
> >>>> Are there any restrictions on EPT being enabled in the first place?  I'm
> >>>> not aware of any, and in principle we could use this functionality for
> >>>> PV guests as well (using the CPL filter).  Therefore, I think it would
> >>>> be helpful to not tie the functionality to HVM guests, even if that is
> >>>> the only option enabled to start with.
> >>> I think at the moment it's not required to have EPT. This patch series doesn't use any translation feature flags, so the output address is always a machine physical address, regardless of context. I will check if it could be easily used with PV.
> >> If it's trivial to add PV support then please do.  If it's not, then don't
> >> feel obliged, but please do at least consider how PV support might look
> >> in the eventual feature.
> >>
> >> (Generally speaking, considering "how would I make this work in other
> >> modes where it is possible" leads to a better design.)
> >>
> >>>> The buffer mapping and creation logic is fairly problematic.  Instead of
> >>>> fighting with another opencoded example, take a look at the IOREQ
> >>>> server's use of "acquire resource" which is a mapping interface which
> >>>> supports allocating memory on behalf of the guest, outside of the guest
> >>>> memory, for use by control tools.
> >>>>
> >>>> I think what this wants is a bit somewhere in domain_create to indicate
> >>>> that external tracing is used for this domain (and allocate whatever
> >>>> structures/buffers are necessary), acquire resource to map the buffers
> >>>> themselves, and a domctl for any necessary runtime controls.
> >>>>
> >>> I will check this out, this sounds like a good option as it would remove lots of complexity from the existing ipt_enable domctl.
> >> Xen has traditionally opted for a "and turn this extra thing on
> >> dynamically" model, but this has caused no end of security issues and
> >> broken corner cases.
> >>
> >> You can see this still existing in the difference between
> >> XEN_DOMCTL_createdomain and XEN_DOMCTL_max_vcpus, (the latter being
> >> required to choose the number of vcpus for the domain) and we're making
> >> good progress undoing this particular wart (before 4.13, it was
> >> concerningly easy to get Xen to fall over a NULL d->vcpu[] pointer by
> >> issuing other hypercalls between these two).
> >>
> >> There is a lot of settings which should be immutable for the lifetime of
> >> the domain, and external monitoring looks like another one of these.
> >> Specifying it at createdomain time allows for far better runtime
> >> behaviour (you are no longer in a situation where the first time you try
> >> to turn tracing on, you end up with -ENOMEM because another VM booted in
> >> the meantime and used the remaining memory), and it makes for rather
> >> more simple code in Xen itself (at runtime, you can rely on it having
> >> been set up properly, because a failure setting up will have killed the
> >> domain already).
> > I'm not in favor of this being a flag that gets set during domain
> > creation time. It could certainly be the case that some users would
> > want this being on from the start till the end but in other cases you
> > may want to enable it intermittently only for some time in-between
> > particular events. If it's an on/off flag during domain creation you
> > pretty much force that choice on the users and while the overhead of
> > PT is better than say MTF it's certainly not nothing. In case there is
> > an OOM situation enabling IPT dynamically the user can always just
> > pause the VM and wait till memory becomes available.
>
> There is nothing wrong with having "turn tracing on/off at runtime"
> hypercalls.  It is specifically what I suggested two posts up in this
> thread, but it should be limited to the TraceEn bit in RTIT_CTL.
>
> What isn't ok is trying to allocate the buffers, write the TOPA, etc on
> first-enable or first-map, because the runtime complexity of logic like
> this is large, and it is far too easy to get wrong in security-relevant ways.
>
> The domain create flag would mean "I wish to use tracing with this
> domain", and not "I want tracing enabled from the getgo".

Gotcha, that's reasonable.

>
> >>>> What semantics do you want for the buffer becoming full?  Given that
> >>>> debugging/tracing is the goal, I presume "pause vcpu on full" is the
> >>>> preferred behaviour, rather than drop packets on full?
> >>>>
> >>> Right now this is a ring-style buffer and when it would become full it would simply wrap and overwrite the old data.
> >> How does the consumer spot that the data has wrapped?  What happens if
> >> data starts getting logged, but no one is listening?  What happens if the
> >> consumer exits/crashes/etc and stops listening as a consequence?
> >>
> >> It's fine to simply state what will happen, and possibly even "don't do
> >> that then", but the corner cases do at least need thinking about.
> > AFAIU the current use-case is predominantly to be used in conjunction
> > with VMI events where you want to be able to see the trace leading up
> > to a particular vmexit. So in the case when the buffer is wrapped
> > in-between events and data is lost that's not really of concern.
>
> That's all fine.  I imagine the output here is voluminous, and needs
> help being cut down as much as possible.
>
> On a tangent, I presume you'd like to include VM-fork eventually, which
> ought to include copying the trace buffer on fork?

I would eventually like to use it to reconstruct the branch history so
we can update AFL's coverage map with that instead of having to do the
current breakpoint-singlestep dance. But for that I would only care
about the trace starting after the fork, so copying the parent's PT
buffer is not needed. We'll also probably only use PT if the branch
history is larger than what LBR can hold. I asked Michal to name the
hypercall interface "vmtrace" for this reason so we can add other
stuff like LBR later using the same interface (which I already
implemented in https://github.com/tklengyel/xen/commits/lbr).
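
For reference, the AFL-style map update this would feed looks roughly
like the sketch below.  It assumes the PT (or LBR) data has already been
decoded into a plain array of branch target addresses; loc_id() and
update_coverage() are hypothetical helpers, and only the
"map[cur ^ prev]++; prev = cur >> 1" step is AFL's documented scheme.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_SIZE (1u << 16)   /* AFL's default coverage map size */

/* Hypothetical helper: fold a branch target address into a map index. */
static uint32_t loc_id(uint64_t addr)
{
    return (uint32_t)((addr >> 4) ^ (addr << 8)) & (MAP_SIZE - 1);
}

/* Walk the decoded branch targets and bump one counter per edge taken. */
void update_coverage(uint8_t *map, const uint64_t *targets, size_t n)
{
    uint32_t prev = 0;

    for ( size_t i = 0; i < n; i++ )
    {
        uint32_t cur = loc_id(targets[i]);

        map[cur ^ prev]++;   /* edge (prev -> cur) */
        prev = cur >> 1;     /* shift so that A->B and B->A differ */
    }
}
```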

Tamas


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v1 0/7] Implement support for external IPT monitoring
  2020-06-17 16:27           ` Tamas K Lengyel
@ 2020-06-17 17:23             ` Andrew Cooper
  2020-06-17 19:31               ` Tamas K Lengyel
  2020-06-17 19:30             ` Michał Leszczyński
  1 sibling, 1 reply; 59+ messages in thread
From: Andrew Cooper @ 2020-06-17 17:23 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jun Nakajima,
	Wei Liu, Michał Leszczyński, Ian Jackson,
	George Dunlap, Jan Beulich, Xen-devel, Roger Pau Monné

On 17/06/2020 17:27, Tamas K Lengyel wrote:
>>>>>> What semantics do you want for the buffer becoming full?  Given that
>>>>>> debugging/tracing is the goal, I presume "pause vcpu on full" is the
>>>>>> preferred behaviour, rather than drop packets on full?
>>>>>>
>>>>> Right now this is a ring-style buffer and when it would become full it would simply wrap and overwrite the old data.
>>>> How does the consumer spot that the data has wrapped?  What happens if
>>>> data starts getting logged, but no one is listening?  What happens if the
>>>> consumer exits/crashes/etc and stops listening as a consequence?
>>>>
>>>> It's fine to simply state what will happen, and possibly even "don't do
>>>> that then", but the corner cases do at least need thinking about.
>>> AFAIU the current use-case is predominantly to be used in conjunction
>>> with VMI events where you want to be able to see the trace leading up
>>> to a particular vmexit. So in the case when the buffer is wrapped
>>> in-between events and data is lost that's not really of concern.
>> That's all fine.  I imagine the output here is voluminous, and needs
>> help being cut down as much as possible.
>>
>> On a tangent, I presume you'd like to include VM-fork eventually, which
>> ought to include copying the trace buffer on fork?
> I would eventually like to use it to reconstruct the branch history so
> we can update AFL's coverage map with that instead of having to do the
> current breakpoint-singlestep dance. But for that I would only care
> about the trace starting after the fork, so copying the parent's PT
> buffer is not needed. We'll also probably only use PT if the branch
> history is larger than what LBR can hold. I asked Michal to name the
> hypercall interface "vmtrace" for this reason so we can add other
> stuff like LBR later using the same interface (which I already
> implemented in https://github.com/tklengyel/xen/commits/lbr).

I was wondering when someone was going to want LBR data like this.
Can't you borrow the LBR-stitching tricks from Linux's perf to recover
the call trace even when it's deeper than the LBR stack?

What about PEBS?  ISTR there is a fairly complicated matrix of which
features work in combination.


As for naming, we should definitely have something fairly generic. 
AFAICT, it would be applicable to ARM's CoreSight facilities as well.

~Andrew


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v1 7/7] x86/vmx: switch IPT MSRs on vmentry/vmexit
  2020-06-17 15:14             ` Andrew Cooper
@ 2020-06-17 18:56               ` Michał Leszczyński
  2020-06-18  8:52                 ` Roger Pau Monné
  2020-06-17 23:30               ` Kang, Luwei
  1 sibling, 1 reply; 59+ messages in thread
From: Michał Leszczyński @ 2020-06-17 18:56 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, luwei kang, Jun Nakajima, Wei Liu, Jan Beulich,
	Xen-devel, Roger Pau Monné

----- On 17 Jun 2020 at 17:14, Andrew Cooper andrew.cooper3@citrix.com wrote:

> On 17/06/2020 13:51, Roger Pau Monné wrote:
>> On Wed, Jun 17, 2020 at 01:54:45PM +0200, Michał Leszczyński wrote:
>>> ----- On 17 Jun 2020 at 11:09, Roger Pau Monné roger.pau@citrix.com wrote:
>>>
>>>> 24 Virtual Machine Control Structures -> 24.8 VM-entry Control Fields -> 24.8.1
>>>> VM-Entry Controls
>>>> Software should consult the VMX capability MSRs IA32_VMX_ENTRY_CTLS to determine
>>>> how it should set the reserved bits.
>>> Please look at bit position 18 "Load IA32_RTIT_CTL".
>> I think this is something different from what I was referring to.
>> Those options you refer to (load/clear IA32_RTIT_CTL) deal with
>> loading/storing a specific field on the vmcs that maps to the guest
>> IA32_RTIT_CTL.
>>
>> OTOH MSR load lists can be used to load and store any arbitrary MSR on
>> vmentry/vmexit, see section 26.4 LOADING MSRS on the SDM. There's
>> already infrastructure on Xen to do so, see vmx_{add/del/find}_msr.
> 
> If I remember the historic roadmaps correctly, there are 3 cases.
> 
> The first hardware to support PT (Broadwell?) prohibited its use
> completely in VMX operations.  In this case, we can use it to trace PV
> guests iff we don't enable VMX in hardware to begin with.
> 
> This was relaxed in later hardware (Skylake?) to permit use within VMX
> operations, but without any help in the VMCS.  (i.e. manual context
> switching per this patch, or MSR load lists as noted in the SDM.)
> 
> Subsequent support for "virtualised PT" was added (IceLake?) which adds
> the load/save controls, and the ability to translate the output buffer
> under EPT.
> 
> 
> All of this is from memory so I'm quite possibly wrong with details, but
> I believe this is why the current complexity exists.
> 
> ~Andrew


I've managed to toggle MSR_IA32_RTIT_CTL values using MSR load lists, as in:

> 35.5.2.2 Guest-Only Tracing
> "For this usage, VM-entry is programmed to enable trace packet generation, while VM-exit is programmed to clear MSR_IA32_RTIT_CTL.TraceEn so as to disable trace-packet generation in the host."

It actually helped a bit. With patch v1, parts of the hypervisor were recorded in the trace (i.e. the window between TRACE_EN being set and the actual vmentry, and the window between vmexit and TRACE_EN being cleared). Using the MSR load list eliminated this. This change will be reflected in patch v2.


However, I can't implement any working scenario in which all of these MSRs are managed using MSR load lists. As described in "35.3.3 Flushing Trace Output", packets are buffered internally and are flushed only when the TRACE_EN bit in MSR_IA32_RTIT_CTL is set to 0. The values of the remaining registers will be stable only once everything is serialized. I think this is too complex for the load lists alone. I believe that the SDM currently instructs to use load lists only for toggling this single bit on or off.


Thus, for now I propose to stay with MSR_IA32_RTIT_CTL being managed by MSR load lists and the rest of the related MSRs being managed manually.
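
A rough sketch of the guest-only arrangement, not the actual v2 code. It assumes the 16-byte MSR entry layout from the SDM (index, reserved dword, data) and MSR index 0x570 for IA32_RTIT_CTL; struct msr_entry and setup_rtit_ctl_lists() are hypothetical names. In Xen the entries would presumably be added through the existing vmx_add_msr infrastructure instead.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_IA32_RTIT_CTL  0x00000570u
#define RTIT_CTL_TRACEEN   (1ull << 0)

/* One 16-byte entry of a VMX MSR load/store area (layout per the SDM). */
struct msr_entry {
    uint32_t index;
    uint32_t mbz;     /* reserved, must be zero */
    uint64_t data;
};

/*
 * Hypothetical helper: fill the VM-entry (guest) and VM-exit (host) load
 * list entries so that tracing is on only while the guest runs.
 */
void setup_rtit_ctl_lists(struct msr_entry *guest, struct msr_entry *host,
                          uint64_t guest_ctl)
{
    guest->index = MSR_IA32_RTIT_CTL;
    guest->mbz = 0;
    guest->data = guest_ctl | RTIT_CTL_TRACEEN;

    host->index = MSR_IA32_RTIT_CTL;
    host->mbz = 0;
    host->data = 0;   /* TraceEn clear: no packets while in the hypervisor */
}
```

Only RTIT_CTL is handled here. The output base, mask and status registers are still read and written by hand after TraceEn has been cleared, which matches the proposal above.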


Best regards,
Michał Leszczyński
CERT Polska


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v1 4/7] x86/vmx: add do_vmtrace_op
  2020-06-16 17:23   ` Roger Pau Monné
@ 2020-06-17 19:13     ` Michał Leszczyński
  2020-06-18  3:20       ` Tamas K Lengyel
  2020-06-18  8:46       ` Roger Pau Monné
  2020-06-18 15:25     ` Michał Leszczyński
  1 sibling, 2 replies; 59+ messages in thread
From: Michał Leszczyński @ 2020-06-17 19:13 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Jan Beulich, Xen-devel

----- On 16 Jun 2020 at 19:23, Roger Pau Monné roger.pau@citrix.com wrote:

> On Tue, Jun 16, 2020 at 05:22:06PM +0200, Michał Leszczyński wrote:
>> Provide an interface for privileged domains to manage
>> external IPT monitoring.
>> 
>> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
> 
> Thanks for the patch! I have some questions below which require your
> input.
> 
>> ---
>>  xen/arch/x86/hvm/hvm.c          | 170 ++++++++++++++++++++++++++++++++
>>  xen/include/public/hvm/hvm_op.h |  27 +++++
>>  2 files changed, 197 insertions(+)
>> 
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index 5bb47583b3..9292caebe0 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -4949,6 +4949,172 @@ static int compat_altp2m_op(
>>      return rc;
>>  }
>>  
>> +static int do_vmtrace_op(
>> +    XEN_GUEST_HANDLE_PARAM(void) arg)
> 
> No need for the newline, this can fit on a single line.
> 
>> +{
>> +    struct xen_hvm_vmtrace_op a;
>> +    struct domain *d = NULL;
> 
> I don't think you need to init d to NULL (at least by looking at the
> current code below).
> 
>> +    int rc = -EFAULT;
> 
> No need to init rc.
> 
>> +    int i;
> 
> unsigned since it's used as a loop counter.
> 
>> +    struct vcpu *v;
>> +    void* buf;
> 
> Nit: '*' should be prepended to the variable name.
> 
>> +    uint32_t buf_size;
> 
> size_t
> 
>> +    uint32_t buf_order;
> 
> Order is generally fine using unsigned int, no need to use a
> specifically sized type.
> 
>> +    uint64_t buf_mfn;
> 
> Could this use the mfn type?
> 
>> +    struct page_info *pg;
>> +
>> +    if ( !hvm_ipt_supported() )
>> +        return -EOPNOTSUPP;
>> +
>> +    if ( copy_from_guest(&a, arg, 1) )
>> +        return -EFAULT;
>> +
>> +    if ( a.version != HVMOP_VMTRACE_INTERFACE_VERSION )
>> +        return -EINVAL;
>> +
>> +    switch ( a.cmd )
>> +    {
>> +    case HVMOP_vmtrace_ipt_enable:
>> +    case HVMOP_vmtrace_ipt_disable:
>> +    case HVMOP_vmtrace_ipt_get_buf:
>> +    case HVMOP_vmtrace_ipt_get_offset:
>> +        break;
>> +
>> +    default:
>> +        return -EOPNOTSUPP;
>> +    }
>> +
>> +    d = rcu_lock_domain_by_any_id(a.domain);
>> +
>> +    if ( d == NULL )
>> +        return -ESRCH;
>> +
>> +    if ( !is_hvm_domain(d) )
>> +    {
>> +        rc = -EOPNOTSUPP;
>> +        goto out;
>> +    }
>> +
>> +    domain_pause(d);
>> +
>> +    if ( a.vcpu >= d->max_vcpus )
>> +    {
>> +        rc = -EINVAL;
>> +        goto out;
>> +    }
>> +
>> +    v = d->vcpu[a.vcpu];
>> +
>> +    if ( a.cmd == HVMOP_vmtrace_ipt_enable )
> 
> Please use a switch here, you might even consider re-using the switch
> from above and moving the domain checks before actually checking the
> command field, so that you don't need to perform two switches against
> a.cmd.
> 
>> +    {
>> +        if ( v->arch.hvm.vmx.ipt_state ) {
> 
> Coding style, brace should be on newline (there are more below which
> I'm not going to comment on).
> 
>> +            // already enabled
> 
> Comments should use /* ... */, there are multiple instances of this below
> which I'm not going to comment on, please check CODING_STYLE.
> 
> Also, the interface looks racy, I think you are missing a lock to
> protect v->arch.hvm.vmx.ipt_state from being freed under your feet if
> you issue concurrent calls to the interface.
> 
>> +            rc = -EINVAL;
>> +            goto out;
>> +        }
>> +
>> +        if ( a.size < PAGE_SIZE || a.size > 1000000 * PAGE_SIZE ) {
> 
> You can use GB(4) which is easier to read. Should the size also be a
> multiple of PAGE_SIZE?
> 
>> +            // we don't accept trace buffer size smaller than single page
>> +            // and the upper bound is defined as 4GB in the specification
>> +            rc = -EINVAL;
>> +            goto out;
>> +	}
> 
> Stray tab.
> 
>> +
>> +        buf_order = get_order_from_bytes(a.size);
>> +
>> +        if ( (a.size >> PAGE_SHIFT) != (1 << buf_order) ) {
> 
> Oh here is the check. I think you can move this with the checks above
> by doing a.size & ~PAGE_MASK.


I believe it's stricter than a.size & ~PAGE_MASK. I think the CPU expects the buffer size to be a power of 2, so you can have a 64 MB or 128 MB buffer, but not a 96 MB one.
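
A minimal sketch of the combined check, assuming the single-range output scheme used by this series, 4 kB pages, and the 4 GB upper bound mentioned in the patch. ipt_buf_size_valid() is a hypothetical name, not something from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE     4096ull
#define MAX_BUF_SIZE  (4ull << 30)   /* 4 GB upper bound quoted in the patch */

/* True if size is a power of two between one page and 4 GB inclusive. */
bool ipt_buf_size_valid(uint64_t size)
{
    if ( size < PAGE_SIZE || size > MAX_BUF_SIZE )
        return false;

    return (size & (size - 1)) == 0;   /* exactly one bit set */
}
```

A power of two that is at least PAGE_SIZE is automatically page aligned, so no separate a.size & ~PAGE_MASK test is needed.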


> 
>> +            rc = -EINVAL;
>> +            goto out;
>> +        }
>> +
>> +        buf = page_to_virt(alloc_domheap_pages(d, buf_order,
>> MEMF_no_refcount));
> 
> What if alloc_domheap_pages return NULL?
> 
> Since I think you only want the linear address of the page to zero it
> I would suggest using clear_domain_page.
> 

Hmm. This was fixed already. Most probably I did something strange with git and this change was not stored. I will correct this with patch v2.


>> +        buf_size = a.size;
>> +
>> +        if ( !buf ) {
>> +            rc = -EFAULT;
>> +            goto out;
>> +        }
>> +
>> +        memset(buf, 0, buf_size);
>> +
>> +        for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ ) {
>> +            share_xen_page_with_privileged_guests(virt_to_page(buf) + i,
>> SHARE_ro);
> 
> This line (and some more below) exceeds 80 characters, please split
> it.
> 
>> +        }
>> +
>> +        v->arch.hvm.vmx.ipt_state = xmalloc(struct ipt_state);
> 
> You should check that xmalloc has succeeded before trying to access
> ipt_state.
> 
>> +        v->arch.hvm.vmx.ipt_state->output_base = virt_to_mfn(buf) <<
>> PAGE_SHIFT;
>> +        v->arch.hvm.vmx.ipt_state->output_mask = buf_size - 1;
>> +        v->arch.hvm.vmx.ipt_state->status = 0;
>> +        v->arch.hvm.vmx.ipt_state->ctl = RTIT_CTL_TRACEEN | RTIT_CTL_OS |
>> RTIT_CTL_USR | RTIT_CTL_BRANCH_EN;
> 
> Shouldn't the user be able to select what tracing should be enabled?
> 
>> +    }
>> +    else if ( a.cmd == HVMOP_vmtrace_ipt_disable )
>> +    {
>> +        if ( !v->arch.hvm.vmx.ipt_state ) {
>> +            rc = -EINVAL;
>> +            goto out;
>> +        }
>> +
>> +        buf_mfn = v->arch.hvm.vmx.ipt_state->output_base >> PAGE_SHIFT;
>> +        buf_size = ( v->arch.hvm.vmx.ipt_state->output_mask + 1 ) &
>> 0xFFFFFFFFUL;
>> +
>> +        for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ )
>> +        {
>> +            if ( (mfn_to_page(_mfn(buf_mfn + i))->count_info & PGC_count_mask)
>> != 1 )
>> +            {
>> +                rc = -EBUSY;
>> +                goto out;
>> +            }
>> +        }
>> +
>> +        xfree(v->arch.hvm.vmx.ipt_state);
>> +	v->arch.hvm.vmx.ipt_state = NULL;
>> +
>> +        for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ )
>> +        {
>> +            pg = mfn_to_page(_mfn(buf_mfn + i));
>> +            put_page_alloc_ref(pg);
>> +            if ( !test_and_clear_bit(_PGC_xen_heap, &pg->count_info) )
>> +                ASSERT_UNREACHABLE();
>> +            pg->u.inuse.type_info = 0;
>> +            page_set_owner(pg, NULL);
>> +            free_domheap_page(pg);
> 
> Hm, this seems fairly dangerous, what guarantees that the caller is
> not going to map the buffer while you are trying to tear it down?
> 
> You perform a check before freeing ipt_state, but between the check
> and the actual tearing down the domain might have setup mappings to
> them.
> 
> I wonder, could you expand a bit on why trace buffers are allocated
> from domheap memory by Xen?


In general, I thought it would be good to account trace buffers to particular DomUs, so it would be easier to troubleshoot the memory usage.


> 
> There are a couple of options here, maybe the caller could provide
> its own buffer, then Xen would take an extra reference to those pages
> and set them up to be used as buffers.
> 
> Another alternative would be to use domheap memory but not let the
> caller map it directly, and instead introduce a hypercall to copy
> from the internal Xen buffer into a user-provided one.
> 
> How much memory is used on average by those buffers? That would help
> decide a model that would best fit the usage.


From 4 kB to 4 GB. Right now I use 128 MB buffers and it takes just a few seconds to fill them up completely.

I think I've just copied the pattern which is already present in Xen's code, e.g. interfaces used by xenbaked/xentrace tools.


> 
>> +        }
>> +    }
>> +    else if ( a.cmd == HVMOP_vmtrace_ipt_get_buf )
>> +    {
>> +        if ( !v->arch.hvm.vmx.ipt_state ) {
>> +            rc = -EINVAL;
>> +            goto out;
>> +        }
>> +
>> +        a.mfn = v->arch.hvm.vmx.ipt_state->output_base >> PAGE_SHIFT;
> 
> This will not work for translated domains, ie: a PVH or HVM domain
> won't be able to use this interface since it has no way to request the
> mapping of a specific mfn into its physmap. I think we need to take
> this into account when deciding how the interface should be, so that
> we don't corner ourselves with a PV only interface.

Please be aware that this is only going to be used by Dom0. Is it a well-supported case for somebody to use a PVH/HVM Dom0?

I think that all Virtual Machine Introspection stuff currently requires a PV Dom0. Our main goal is to have this working well in combination with VMI.


> 
>> +        a.size = (v->arch.hvm.vmx.ipt_state->output_mask + 1) & 0xFFFFFFFFUL;
> 
> You can truncate it easier by casting to uint32_t I think.
> 
> Or even better, you could put output_mask in a union like:
> 
> union {
>    uint64_t raw;
>    struct {
>        uint32_t size;
>	uint32_t offset;
>    }
> }
> 
> Then you can avoid the shifting and the castings.
> 
>> +    }
>> +    else if ( a.cmd == HVMOP_vmtrace_ipt_get_offset )
>> +    {
>> +        if ( !v->arch.hvm.vmx.ipt_state ) {
>> +            rc = -EINVAL;
>> +            goto out;
>> +        }
>> +
>> +        a.offset = v->arch.hvm.vmx.ipt_state->output_mask >> 32;
>> +    }
>> +
>> +    rc = -EFAULT;
>> +    if ( __copy_to_guest(arg, &a, 1) )
>> +      goto out;
>> +    rc = 0;
>> +
>> + out:
>> +    smp_wmb();
> 
> Why do you need a barrier here?
> 
>> +    domain_unpause(d);
>> +    rcu_unlock_domain(d);
>> +
>> +    return rc;
>> +}
>> +
>> +DEFINE_XEN_GUEST_HANDLE(compat_hvm_vmtrace_op_t);
>> +
>>  static int hvmop_get_mem_type(
>>      XEN_GUEST_HANDLE_PARAM(xen_hvm_get_mem_type_t) arg)
>>  {
>> @@ -5101,6 +5267,10 @@ long do_hvm_op(unsigned long op,
>> XEN_GUEST_HANDLE_PARAM(void) arg)
>>          rc = current->hcall_compat ? compat_altp2m_op(arg) : do_altp2m_op(arg);
>>          break;
>>  
>> +    case HVMOP_vmtrace:
>> +        rc = do_vmtrace_op(arg);
>> +        break;
>> +
>>      default:
>>      {
>>          gdprintk(XENLOG_DEBUG, "Bad HVM op %ld.\n", op);
>> diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
>> index 870ec52060..3bbcd54c96 100644
>> --- a/xen/include/public/hvm/hvm_op.h
>> +++ b/xen/include/public/hvm/hvm_op.h
>> @@ -382,6 +382,33 @@ struct xen_hvm_altp2m_op {
>>  typedef struct xen_hvm_altp2m_op xen_hvm_altp2m_op_t;
>>  DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_op_t);
>>  
>> +/* HVMOP_vmtrace: Perform VM tracing related operation */
>> +#define HVMOP_vmtrace 26
>> +
>> +#define HVMOP_VMTRACE_INTERFACE_VERSION 0x00000001
>> +
>> +struct xen_hvm_vmtrace_op {
>> +    /* IN variable */
>> +    uint32_t version;   /* HVMOP_VMTRACE_INTERFACE_VERSION */
>> +    uint32_t cmd;
>> +/* Enable/disable external vmtrace for given domain */
>> +#define HVMOP_vmtrace_ipt_enable      1
>> +#define HVMOP_vmtrace_ipt_disable     2
>> +#define HVMOP_vmtrace_ipt_get_buf     3
>> +#define HVMOP_vmtrace_ipt_get_offset  4
>> +    domid_t domain;
> 
> You are missing a padding field here AFAICT.
> 
> Roger.


Thanks for your feedback, I will apply all the remaining suggestions in patch v2.

Best regards,
Michał Leszczyński
CERT Polska


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v1 0/7] Implement support for external IPT monitoring
  2020-06-17 16:27           ` Tamas K Lengyel
  2020-06-17 17:23             ` Andrew Cooper
@ 2020-06-17 19:30             ` Michał Leszczyński
  1 sibling, 0 replies; 59+ messages in thread
From: Michał Leszczyński @ 2020-06-17 19:30 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jun Nakajima,
	Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich,
	Xen-devel, Roger Pau Monné

----- On 17 Jun 2020 at 18:27, Tamas K Lengyel tamas.k.lengyel@gmail.com wrote:

> On Wed, Jun 17, 2020 at 10:19 AM Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>>
>> On 17/06/2020 04:02, Tamas K Lengyel wrote:
>> > On Tue, Jun 16, 2020 at 2:17 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> >> On 16/06/2020 19:47, Michał Leszczyński wrote:
>> >>> ----- On 16 Jun 2020 at 20:17, Andrew Cooper andrew.cooper3@citrix.com wrote:
>> >>>
>> >>>> Are there any restrictions on EPT being enabled in the first place?  I'm
>> >>>> not aware of any, and in principle we could use this functionality for
>> >>>> PV guests as well (using the CPL filter).  Therefore, I think it would
>> >>>> be helpful to not tie the functionality to HVM guests, even if that is
>> >>>> the only option enabled to start with.
>> >>> I think at the moment it's not required to have EPT. This patch series doesn't
>> >>> use any translation feature flags, so the output address is always a machine
>> >>> physical address, regardless of context. I will check if it could be easily
>> >>> used with PV.
>> >> If its trivial to add PV support then please do.  If its not, then don't
>> >> feel obliged, but please do at least consider how PV support might look
>> >> in the eventual feature.
>> >>
>> >> (Generally speaking, considering "how would I make this work in other
>> >> modes where it is possible" leads to a better design.)
>> >>
>> >>>> The buffer mapping and creation logic is fairly problematic.  Instead of
>> >>>> fighting with another opencoded example, take a look at the IOREQ
>> >>>> server's use of "acquire resource" which is a mapping interface which
>> >>>> supports allocating memory on behalf of the guest, outside of the guest
>> >>>> memory, for use by control tools.
>> >>>>
>> >>>> I think what this wants is a bit somewhere in domain_create to indicate
>> >>>> that external tracing is used for this domain (and allocate whatever
>> >>>> structures/buffers are necessary), acquire resource to map the buffers
>> >>>> themselves, and a domctl for any necessary runtime controls.
>> >>>>
>> >>> I will check this out, this sounds like a good option as it would remove lots of
>> >>> complexity from the existing ipt_enable domctl.
>> >> Xen has traditionally opted for a "and turn this extra thing on
>> >> dynamically" model, but this has caused no end of security issues and
>> >> broken corner cases.
>> >>
>> >> You can see this still existing in the difference between
>> >> XEN_DOMCTL_createdomain and XEN_DOMCTL_max_vcpus, (the latter being
>> >> required to chose the number of vcpus for the domain) and we're making
>> >> good progress undoing this particular wart (before 4.13, it was
>> >> concerning easy to get Xen to fall over a NULL d->vcpu[] pointer by
>> >> issuing other hypercalls between these two).
>> >>
>> >> There is a lot of settings which should be immutable for the lifetime of
>> >> the domain, and external monitoring looks like another one of these.
>> >> Specifying it at createdomain time allows for far better runtime
>> >> behaviour (you are no longer in a situation where the first time you try
>> >> to turn tracing on, you end up with -ENOMEM because another VM booted in
>> >> the meantime and used the remaining memory), and it makes for rather
>> >> more simple code in Xen itself (at runtime, you can rely on it having
>> >> been set up properly, because a failure setting up will have killed the
>> >> domain already).
>> > I'm not in favor of this being a flag that gets set during domain
>> > creation time. It could certainly be the case that some users would
>> > want this being on from the start till the end but in other cases you
>> > may want to enable it intermittently only for some time in-between
>> > particular events. If it's an on/off flag during domain creation you
>> > pretty much force that choice on the users and while the overhead of
>> > PT is better than say MTF it's certainly not nothing. In case there is
>> > an OOM situation enabling IPT dynamically the user can always just
>> > pause the VM and wait till memory becomes available.
>>
>> There is nothing wrong with having "turn tracing on/off at runtime"
>> hypercalls.  It is specifically what I suggested two posts up in this
>> thread, but it should be limited to the TraceEn bit in RTIT_CTL.
>>
>> What isn't ok is trying to allocate the buffers, write the TOPA, etc on
>> first-enable or first-map, because the runtime complexity of logic like
>> this large, and far too easy to get wrong in security relevant ways.
>>
>> The domain create flag would mean "I wish to use tracing with this
>> domain", and not "I want tracing enabled from the getgo".
> 
> Gotcha, that's reasonable.
> 


I think I also agree with this, i.e. allocating the buffers at domain creation and only enabling/disabling the feature at runtime. This would remove some complexity from the runtime path. I think it's usually (always?) known in advance whether we would like to use external monitoring on a domain or not.

I will try to adopt this approach in patch v2.
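
To make the split concrete, here is a minimal user-space sketch of "allocate at creation, only toggle TraceEn at runtime". All `demo_` names are hypothetical, `calloc` merely stands in for whatever allocator Xen would use, and `rtit_ctl` is assumed to be a shadow value loaded on vmentry; it is not the code from this series.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_RTIT_CTL_TRACE_EN (1ULL << 0)  /* TraceEn is bit 0 of IA32_RTIT_CTL */

struct demo_vcpu_trace {
    void *buf;          /* trace buffer, allocated once at creation */
    size_t size;
    uint64_t rtit_ctl;  /* shadow value, assumed to be loaded on vmentry */
};

/* Creation time: the only place where allocation can fail. */
static int demo_trace_create(struct demo_vcpu_trace *t, size_t size)
{
    t->buf = calloc(1, size);
    if ( !t->buf )
        return -1;      /* domain creation fails; no -ENOMEM at runtime */
    t->size = size;
    t->rtit_ctl = 0;
    return 0;
}

/* Runtime: no allocation, only the TraceEn bit changes. */
static int demo_trace_set_enabled(struct demo_vcpu_trace *t, bool on)
{
    if ( !t->buf )
        return -1;      /* tracing was not requested at creation */
    if ( on )
        t->rtit_ctl |= DEMO_RTIT_CTL_TRACE_EN;
    else
        t->rtit_ctl &= ~DEMO_RTIT_CTL_TRACE_EN;
    return 0;
}

static void demo_trace_destroy(struct demo_vcpu_trace *t)
{
    free(t->buf);
    t->buf = NULL;
    t->size = 0;
    t->rtit_ctl = 0;
}
```

The runtime path then has a single failure mode (tracing not set up for this domain), which is the simplification discussed above.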


>>
>> >>>> What semantics do you want for the buffer becoming full?  Given that
>> >>>> debugging/tracing is the goal, I presume "pause vcpu on full" is the
>> >>>> preferred behaviour, rather than drop packets on full?
>> >>>>
>> >>> Right now this is a ring-style buffer and when it would become full it would
>> >>> simply wrap and override the old data.
>> >> How does the consumer spot that the data has wrapped?  What happens if
>> >> data starts getting logged, but noone is listening?  What happens if the
>> >> consumer exits/crashes/etc and stops listening as a consequence?
>> >>
>> >> It's fine to simply state what will happen, and possibly even "don't do
>> >> that then", but the corner cases do at least need thinking about.
>> > AFAIU the current use-case is predominantly to be used in conjunction
>> > with VMI events where you want to be able to see the trace leading up
>> > to a particular vmexit. So in the case when the buffer is wrapped
>> > in-between events and data is lost that's not really of concern.
>>
>> That's all fine.  I imagine the output here is voluminous, and needs
>> help being cut down as much as possible.
>>
>> On a tangent, I presume you'd like to include VM-fork eventually, which
>> ought to include copying the trace buffer on fork?
> 
> I would eventually like to use it to reconstruct the branch history so
> we can update AFL's coverage map with that instead of having to do the
> current breakpoint-singlestep dance. But for that I would only care
> about the trace starting after the fork, so copying the parent's PT
> buffer is not needed. We'll also probably only use PT if the branch
> history is larger than what LBR can hold. I asked Michal to name the
> hypercall interface "vmtrace" for this reason so we can add other
> stuff like LBR later using the same interface (which I already
> implemented in https://github.com/tklengyel/xen/commits/lbr).
> 
> Tamas



* Re: [PATCH v1 0/7] Implement support for external IPT monitoring
  2020-06-17 17:23             ` Andrew Cooper
@ 2020-06-17 19:31               ` Tamas K Lengyel
  0 siblings, 0 replies; 59+ messages in thread
From: Tamas K Lengyel @ 2020-06-17 19:31 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jun Nakajima,
	Wei Liu, Michał Leszczyński, Ian Jackson,
	George Dunlap, Jan Beulich, Xen-devel, Roger Pau Monné

On Wed, Jun 17, 2020 at 11:23 AM Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>
> On 17/06/2020 17:27, Tamas K Lengyel wrote:
> >>>>>> What semantics do you want for the buffer becoming full?  Given that
> >>>>>> debugging/tracing is the goal, I presume "pause vcpu on full" is the
> >>>>>> preferred behaviour, rather than drop packets on full?
> >>>>>>
> >>>>> Right now this is a ring-style buffer and when it would become full it would simply wrap and override the old data.
> >>>> How does the consumer spot that the data has wrapped?  What happens if
> >>>> data starts getting logged, but noone is listening?  What happens if the
> >>>> consumer exits/crashes/etc and stops listening as a consequence?
> >>>>
> >>>> It's fine to simply state what will happen, and possibly even "don't do
> >>>> that then", but the corner cases do at least need thinking about.
> >>> AFAIU the current use-case is predominantly to be used in conjunction
> >>> with VMI events where you want to be able to see the trace leading up
> >>> to a particular vmexit. So in the case when the buffer is wrapped
> >>> in-between events and data is lost that's not really of concern.
> >> That's all fine.  I imagine the output here is voluminous, and needs
> >> help being cut down as much as possible.
> >>
> >> On a tangent, I presume you'd like to include VM-fork eventually, which
> >> ought to include copying the trace buffer on fork?
> > I would eventually like to use it to reconstruct the branch history so
> > we can update AFL's coverage map with that instead of having to do the
> > current breakpoint-singlestep dance. But for that I would only care
> > about the trace starting after the fork, so copying the parent's PT
> > buffer is not needed. We'll also probably only use PT if the branch
> > history is larger than what LBR can hold. I asked Michal to name the
> > hypercall interface "vmtrace" for this reason so we can add other
> > stuff like LBR later using the same interface (which I already
> > implemented in https://github.com/tklengyel/xen/commits/lbr).
>
> I was wondering when someone was going to want LBR data like this.
> Can't you borrow the LBR-stitching tricks from Linux's perf to recover
> the call trace even when its deeper than the LBR stack?

TBH I only spent like an hour putting it together so I haven't
investigated the topic too much. But thanks for the tip, first I heard
about this LBR-stitching trick ;)

>
> What about PEBS?  ISTR there is a fairly complicated matrix of which
> features work in combination.

There is also BTS. I would assume it would take some experimentation
to figure out what works and when and in what combination. Right now I
have no plans for doing that experimentation or adding support for
additional tracers.

>
>
> As for naming, we should definitely have something fairly generic.
> AFAICT, it would be applicable to ARM's CoreSight facilities as well.

IMHO XEN_DOMCTL_vmtrace would be a good name for controlling these features.

Tamas



* Re: [PATCH v1 0/7] Implement support for external IPT monitoring
  2020-06-17 16:19         ` Andrew Cooper
  2020-06-17 16:27           ` Tamas K Lengyel
@ 2020-06-17 20:20           ` Michał Leszczyński
  2020-06-18  8:25             ` Roger Pau Monné
  2020-06-18 14:59           ` Michał Leszczyński
  2 siblings, 1 reply; 59+ messages in thread
From: Michał Leszczyński @ 2020-06-17 20:20 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jun Nakajima,
	Wei Liu, Tamas K Lengyel, Ian Jackson, George Dunlap,
	Jan Beulich, Xen-devel, Roger Pau Monné

----- On 17 Jun 2020 at 18:19, Andrew Cooper andrew.cooper3@citrix.com wrote:

> On 17/06/2020 04:02, Tamas K Lengyel wrote:
>> On Tue, Jun 16, 2020 at 2:17 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> On 16/06/2020 19:47, Michał Leszczyński wrote:
>>>> ----- On 16 Jun 2020 at 20:17, Andrew Cooper andrew.cooper3@citrix.com wrote:
>>>>
>>>>> Are there any restrictions on EPT being enabled in the first place?  I'm
>>>>> not aware of any, and in principle we could use this functionality for
>>>>> PV guests as well (using the CPL filter).  Therefore, I think it would
>>>>> be helpful to not tie the functionality to HVM guests, even if that is
>>>>> the only option enabled to start with.
>>>> I think at the moment it's not required to have EPT. This patch series doesn't
>>>> use any translation feature flags, so the output address is always a machine
>>>> physical address, regardless of context. I will check if it could be easily
>>>> used with PV.
>>> If its trivial to add PV support then please do.  If its not, then don't
>>> feel obliged, but please do at least consider how PV support might look
>>> in the eventual feature.
>>>
>>> (Generally speaking, considering "how would I make this work in other
>>> modes where it is possible" leads to a better design.)
>>>
>>>>> The buffer mapping and creation logic is fairly problematic.  Instead of
>>>>> fighting with another opencoded example, take a look at the IOREQ
>>>>> server's use of "acquire resource" which is a mapping interface which
>>>>> supports allocating memory on behalf of the guest, outside of the guest
>>>>> memory, for use by control tools.
>>>>>


One thing that remains unclear to me is the "acquire resource" part. Could you give some more details on that?

Assuming that buffers are allocated right from the domain creation, what mechanism (instead of xc_map_foreign_range) should I use to map the IPT buffers into Dom0?


>>>>> I think what this wants is a bit somewhere in domain_create to indicate
>>>>> that external tracing is used for this domain (and allocate whatever
>>>>> structures/buffers are necessary), acquire resource to map the buffers
>>>>> themselves, and a domctl for any necessary runtime controls.
>>>>>
>>>> I will check this out, this sounds like a good option as it would remove lots of
>>>> complexity from the existing ipt_enable domctl.
>>> Xen has traditionally opted for a "and turn this extra thing on
>>> dynamically" model, but this has caused no end of security issues and
>>> broken corner cases.
>>>
>>> You can see this still existing in the difference between
>>> XEN_DOMCTL_createdomain and XEN_DOMCTL_max_vcpus, (the latter being
>>> required to chose the number of vcpus for the domain) and we're making
>>> good progress undoing this particular wart (before 4.13, it was
>>> concerning easy to get Xen to fall over a NULL d->vcpu[] pointer by
>>> issuing other hypercalls between these two).
>>>
>>> There is a lot of settings which should be immutable for the lifetime of
>>> the domain, and external monitoring looks like another one of these.
>>> Specifying it at createdomain time allows for far better runtime
>>> behaviour (you are no longer in a situation where the first time you try
>>> to turn tracing on, you end up with -ENOMEM because another VM booted in
>>> the meantime and used the remaining memory), and it makes for rather
>>> more simple code in Xen itself (at runtime, you can rely on it having
>>> been set up properly, because a failure setting up will have killed the
>>> domain already).
>> I'm not in favor of this being a flag that gets set during domain
>> creation time. It could certainly be the case that some users would
>> want this being on from the start till the end but in other cases you
>> may want to enable it intermittently only for some time in-between
>> particular events. If it's an on/off flag during domain creation you
>> pretty much force that choice on the users and while the overhead of
>> PT is better than say MTF it's certainly not nothing. In case there is
>> an OOM situation enabling IPT dynamically the user can always just
>> pause the VM and wait till memory becomes available.
> 
> There is nothing wrong with having "turn tracing on/off at runtime"
> hypercalls.  It is specifically what I suggested two posts up in this
> thread, but it should be limited to the TraceEn bit in RTIT_CTL.
> 
> What isn't ok is trying to allocate the buffers, write the TOPA, etc on
> first-enable or first-map, because the runtime complexity of logic like
> this large, and far too easy to get wrong in security relevant ways.
> 
> The domain create flag would mean "I wish to use tracing with this
> domain", and not "I want tracing enabled from the getgo".
> 
>>>>> What semantics do you want for the buffer becoming full?  Given that
>>>>> debugging/tracing is the goal, I presume "pause vcpu on full" is the
>>>>> preferred behaviour, rather than drop packets on full?
>>>>>
>>>> Right now this is a ring-style buffer and when it would become full it would
>>>> simply wrap and override the old data.
>>> How does the consumer spot that the data has wrapped?  What happens if
>>> data starts getting logged, but noone is listening?  What happens if the
>>> consumer exits/crashes/etc and stops listening as a consequence?
>>>
>>> It's fine to simply state what will happen, and possibly even "don't do
>>> that then", but the corner cases do at least need thinking about.
>> AFAIU the current use-case is predominantly to be used in conjunction
>> with VMI events where you want to be able to see the trace leading up
>> to a particular vmexit. So in the case when the buffer is wrapped
>> in-between events and data is lost that's not really of concern.
> 
> That's all fine.  I imagine the output here is voluminous, and needs
> help being cut down as much as possible.
> 
> On a tangent, I presume you'd like to include VM-fork eventually, which
> ought to include copying the trace buffer on fork?
> 
> ~Andrew



* RE: [PATCH v1 0/7] Implement support for external IPT monitoring
  2020-06-17 12:53             ` Roger Pau Monné
@ 2020-06-17 23:29               ` Kang, Luwei
  2020-06-18  0:56                 ` Michał Leszczyński
  0 siblings, 1 reply; 59+ messages in thread
From: Kang, Luwei @ 2020-06-17 23:29 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Tian, Kevin, Stefano Stabellini, Julien Grall, Nakajima, Jun,
	Wei Liu, Andrew Cooper, Michał Leszczyński,
	Ian Jackson, George Dunlap, Jan Beulich, Xen-devel

> > > How does KVM deal with this, do they insert/modify trace packets on
> > > trapped and emulated instructions by the VMM?
> >
> > The KVM includes instruction decoder and
> > emulator(arch/x86/kvm/emulate.c), and the guest's memory can be set to
> > write-protect as well. But it doesn't support Intel PT packets software emulator.
> > For KVM, the Intel PT feature will be exposed to KVM guest and KVM guest can
> > use Intel PT feature like native.
> 
> But if such feature is exposed to the guest for it's own usage, won't it be
> missing packets for instructions emulated by the VMM?

If the guest's memory is set to write-protect, then I think yes.

Thanks,
Luwei Kang

> 
> Thanks, Roger.


* RE: [PATCH v1 7/7] x86/vmx: switch IPT MSRs on vmentry/vmexit
  2020-06-17 15:14             ` Andrew Cooper
  2020-06-17 18:56               ` Michał Leszczyński
@ 2020-06-17 23:30               ` Kang, Luwei
  2020-06-18 10:02                 ` Andrew Cooper
  1 sibling, 1 reply; 59+ messages in thread
From: Kang, Luwei @ 2020-06-17 23:30 UTC (permalink / raw)
  To: Andrew Cooper, Roger Pau Monné, Michał Leszczyński
  Cc: Xen-devel, Tian, Kevin, Wei Liu, Jan Beulich, Nakajima, Jun

> > On Wed, Jun 17, 2020 at 01:54:45PM +0200, Michał Leszczyński wrote:
> >> ----- On 17 Jun 2020 at 11:09, Roger Pau Monné roger.pau@citrix.com wrote:
> >>
> >>> 24 Virtual Machine Control Structures -> 24.8 VM-entry Control
> >>> Fields -> 24.8.1 VM-Entry Controls
> >>> Software should consult the VMX capability MSRs IA32_VMX_ENTRY_CTLS
> >>> to determine how it should set the reserved bits.
> >> Please look at bit position 18 "Load IA32_RTIT_CTL".
> > I think this is something different from what I was referring to.
> > Those options you refer to (load/clear IA32_RTIT_CTL) deal with
> > loading/storing a specific field on the vmcs that maps to the guest
> > IA32_RTIT_CTL.
> >
> > OTOH MSR load lists can be used to load and store any arbitrary MSR on
> > vmentry/vmexit, see section 26.4 LOADING MSRS on the SDM. There's
> > already infrastructure on Xen to do so, see vmx_{add/del/find}_msr.
> 
> If I remember the historic roadmaps correctly, there are 3 cases.
> 
> The first hardware to support PT (Broadwell?) prohibited its use completely in
> VMX operations.  In this case, we can use it to trace PV guests iff we don't
> enable VMX in hardware to begin with.
> 
> This was relaxed in later hardware (Skylake?) to permit use within VMX
> operations, but without any help in the VMCS.  (i.e. manual context switching
> per this patch, or MSR load lists as noted in the SDM.)
> 
> Subsequent support for "virtualised PT" was added (IceLake?) which adds the
> load/save controls, and the ability to translate the output buffer under EPT.
> 
> 
> All of this is from memory so I'm quite possibly wrong with details, but I believe
> this is why the current complexity exists.

Yes, there are 3 cases.
1. Before IA32_VMX_MISC[bit 14]:
     Intel PT doesn't support tracing in VMX operation. Execution of the VMXON instruction clears IA32_RTIT_CTL.TraceEn, and any attempt to write IA32_RTIT_CTL in VMX operation causes a general-protection exception (#GP).
2. IA32_VMX_MISC[bit 14] is supported, but there is no EPT translation of the PT output:
    Intel PT can be enabled across VMX, but the address of the Intel PT buffer is always an HPA from the HW point of view. There is no VMCS support at this stage. The MSR load list can be used for the Intel PT context switch (VM-Entry/Exit).
3. Intel PT VM improvements (starting from Icelake):
    A new guest IA32_RTIT_CTL field is added to the VMCS, and the HW treats the PT output addresses as GPAs and translates them using EPT.
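
The three cases above could be told apart along these lines. This is a hedged sketch only: the `demo_` names are hypothetical, and the `has_vmcs_pt_controls` flag is an assumed input standing in for however the VMCS-level support of case 3 would actually be probed.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_VMX_MISC_PT_IN_VMX (1ULL << 14)  /* IA32_VMX_MISC[bit 14] */

enum demo_pt_mode {
    DEMO_PT_NOT_IN_VMX,   /* case 1: PT unusable in VMX operation */
    DEMO_PT_HOST_ONLY,    /* case 2: output is HPA, MSR load lists or manual switching */
    DEMO_PT_VIRTUALISED,  /* case 3: guest IA32_RTIT_CTL in VMCS, output via EPT */
};

/*
 * vmx_misc is the raw IA32_VMX_MISC value; has_vmcs_pt_controls is an
 * assumption of this sketch (the message does not say how to detect it).
 */
static enum demo_pt_mode demo_classify_pt(uint64_t vmx_misc,
                                          bool has_vmcs_pt_controls)
{
    if ( !(vmx_misc & DEMO_VMX_MISC_PT_IN_VMX) )
        return DEMO_PT_NOT_IN_VMX;

    return has_vmcs_pt_controls ? DEMO_PT_VIRTUALISED : DEMO_PT_HOST_ONLY;
}
```

Note that bit 14 is checked first, so the VMCS flag is ignored on hardware that cannot trace in VMX operation at all.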

Thanks,
Luwei Kang


* Re: [PATCH v1 0/7] Implement support for external IPT monitoring
  2020-06-17 23:29               ` Kang, Luwei
@ 2020-06-18  0:56                 ` Michał Leszczyński
  2020-06-18  7:00                   ` Roger Pau Monné
  0 siblings, 1 reply; 59+ messages in thread
From: Michał Leszczyński @ 2020-06-18  0:56 UTC (permalink / raw)
  To: Kang, Luwei
  Cc: Tian, Kevin, Stefano Stabellini, Julien Grall, Nakajima, Jun,
	Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich,
	Tamas K Lengyel, Xen-devel, Roger Pau Monné

----- On 18 Jun 2020 at 1:29, Kang, Luwei luwei.kang@intel.com wrote:

>> > > How does KVM deal with this, do they insert/modify trace packets on
>> > > trapped and emulated instructions by the VMM?
>> >
>> > The KVM includes instruction decoder and
>> emulator(arch/x86/kvm/emulate.c), and the guest's memory can be set to
>> write-protect as well. But it doesn't support Intel PT packets software
>> emulator.
>> For KVM, the Intel PT feature will be exposed to KVM guest and KVM guest can
>> use Intel PT feature like native.
>> 
>> But if such feature is exposed to the guest for it's own usage, won't it be
>> missing packets for instructions emulated by the VMM?
> 
> If setting the guest's memory write-protect, I think yes.


Thus, I propose to leave it as it is right now. Anybody who purposely alters the VM state should not only consult the IPT trace but also understand what was done "in the meantime" by additional features, e.g. when something was altered by a vm_event callback. As Tamas said previously, we usually just want to see the path leading up to a vmexit.

Please also note that there is a PTWRITE instruction that could be used in the future in order to add custom payloads/hints to the PT trace, when needed.


> 
> Thanks,
> Luwei Kang
> 
>> 
> > Thanks, Roger.



* Re: [PATCH v1 4/7] x86/vmx: add do_vmtrace_op
  2020-06-17 19:13     ` Michał Leszczyński
@ 2020-06-18  3:20       ` Tamas K Lengyel
  2020-06-18 11:01         ` Michał Leszczyński
  2020-06-18  8:46       ` Roger Pau Monné
  1 sibling, 1 reply; 59+ messages in thread
From: Tamas K Lengyel @ 2020-06-18  3:20 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Jan Beulich, Xen-devel,
	Roger Pau Monné

> >> +
> >> +        a.mfn = v->arch.hvm.vmx.ipt_state->output_base >> PAGE_SHIFT;
> >
> > This will not work for translated domains, ie: a PVH or HVM domain
> > won't be able to use this interface since it has no way to request the
> > mapping of a specific mfn into its physmap. I think we need to take
> > this into account when deciding how the interface should be, so that
> > we don't corner ourselves with a PV only interface.
>
> Please be aware that this is only going to be used by Dom0. Is it a well-supported case that somebody is using a PVH/HVM Dom0?
>
> I think that all Virtual Machine Introspection stuff currently requires to have Dom0 PV. Our main goal is to have this working well in combo with VMI.

FYI the VMI interface doesn't require a PV domain. It works fine from
PVH dom0 or even from a secondary privileged HVM DomU as well,
provided you have the right XSM policy to allow that.

Tamas



* Re: [PATCH v1 0/7] Implement support for external IPT monitoring
  2020-06-18  0:56                 ` Michał Leszczyński
@ 2020-06-18  7:00                   ` Roger Pau Monné
  0 siblings, 0 replies; 59+ messages in thread
From: Roger Pau Monné @ 2020-06-18  7:00 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Julien Grall, Tian, Kevin, Stefano Stabellini, Kang, Luwei,
	Nakajima, Jun, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Jan Beulich, Tamas K Lengyel, Xen-devel

On Thu, Jun 18, 2020 at 02:56:17AM +0200, Michał Leszczyński wrote:
> ----- On 18 Jun 2020 at 1:29, Kang, Luwei luwei.kang@intel.com wrote:
> 
> >> > > How does KVM deal with this, do they insert/modify trace packets on
> >> > > trapped and emulated instructions by the VMM?
> >> >
> >> > The KVM includes instruction decoder and
> >> emulator(arch/x86/kvm/emulate.c), and the guest's memory can be set to
> >> write-protect as well. But it doesn't support Intel PT packets software
> >> emulator.
> >> For KVM, the Intel PT feature will be exposed to KVM guest and KVM guest can
> >> use Intel PT feature like native.
> >> 
> >> But if such feature is exposed to the guest for it's own usage, won't it be
> >> missing packets for instructions emulated by the VMM?
> > 
> > If setting the guest's memory write-protect, I think yes.
> 
> 
> Thus, I propose to leave it as it is right now. If somebody is purposely altering the VM state then he/she should consult not only the IPT but also understand what was done "in the meantime" by additional features, e.g. when something was altered by vm_event callback. As Tamas said previously, we usually just want to see certain path leading to vmexit.
> 
> Please also note that there is a PTWRITE instruction that could be used in the future in order to add custom payloads/hints to the PT trace, when needed.

Yes, I think the usage of IPT by a third party against a guest is
fine, as such a third party can also use introspection and get the
information about the emulated instructions.

OTOH exposing the feature to the guest itself for its own usage seems
wrong without adding the packets related to the emulated instructions.

I understand the current series only cares about the first option, so
that's perfectly fine.

Roger.



* Re: [PATCH v1 0/7] Implement support for external IPT monitoring
  2020-06-17 20:20           ` Michał Leszczyński
@ 2020-06-18  8:25             ` Roger Pau Monné
  0 siblings, 0 replies; 59+ messages in thread
From: Roger Pau Monné @ 2020-06-18  8:25 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jun Nakajima,
	Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich,
	Tamas K Lengyel, Xen-devel

On Wed, Jun 17, 2020 at 10:20:20PM +0200, Michał Leszczyński wrote:
> ----- On 17 Jun 2020 at 18:19, Andrew Cooper andrew.cooper3@citrix.com wrote:
> 
> > On 17/06/2020 04:02, Tamas K Lengyel wrote:
> >> On Tue, Jun 16, 2020 at 2:17 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >>> On 16/06/2020 19:47, Michał Leszczyński wrote:
> >>>> ----- On 16 Jun 2020 at 20:17, Andrew Cooper andrew.cooper3@citrix.com wrote:
> >>>>
> >>>>> Are there any restrictions on EPT being enabled in the first place?  I'm
> >>>>> not aware of any, and in principle we could use this functionality for
> >>>>> PV guests as well (using the CPL filter).  Therefore, I think it would
> >>>>> be helpful to not tie the functionality to HVM guests, even if that is
> >>>>> the only option enabled to start with.
> >>>> I think at the moment it's not required to have EPT. This patch series doesn't
> >>>> use any translation feature flags, so the output address is always a machine
> >>>> physical address, regardless of context. I will check if it could be easily
> >>>> used with PV.
> >>> If its trivial to add PV support then please do.  If its not, then don't
> >>> feel obliged, but please do at least consider how PV support might look
> >>> in the eventual feature.
> >>>
> >>> (Generally speaking, considering "how would I make this work in other
> >>> modes where it is possible" leads to a better design.)
> >>>
> >>>>> The buffer mapping and creation logic is fairly problematic.  Instead of
> >>>>> fighting with another opencoded example, take a look at the IOREQ
> >>>>> server's use of "acquire resource" which is a mapping interface which
> >>>>> supports allocating memory on behalf of the guest, outside of the guest
> >>>>> memory, for use by control tools.
> >>>>>
> 
> 
> One thing that remains unclear to me is the "acquire resource" part. Could you give some more details on that?
> 
> Assuming that buffers are allocated right from the domain creation, what mechanism (instead of xc_map_foreign_range) should I use to map the IPT buffers into Dom0?

Take a look at demu's demu_initialize function [0] (and its usage of
xenforeignmemory_map_resource), you likely need something similar for
the trace buffers, introducing a new XENMEM_resource_trace_data kind
of resource (naming subject to change), and use the id field in
xen_mem_acquire_resource to signal which vCPU buffer you want to
map.

That's usable by both PV and HVM guests.

Roger.

[0] http://xenbits.xen.org/gitweb/?p=people/pauldu/demu.git;a=blob;f=demu.c;h=f785b394d0cf141dffa05bdddecf338214358aea;hb=refs/heads/master#l453



* Re: [PATCH v1 4/7] x86/vmx: add do_vmtrace_op
  2020-06-17 19:13     ` Michał Leszczyński
  2020-06-18  3:20       ` Tamas K Lengyel
@ 2020-06-18  8:46       ` Roger Pau Monné
  1 sibling, 0 replies; 59+ messages in thread
From: Roger Pau Monné @ 2020-06-18  8:46 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Jan Beulich, Xen-devel

On Wed, Jun 17, 2020 at 09:13:05PM +0200, Michał Leszczyński wrote:
> ----- On 16 Jun 2020 at 19:23, Roger Pau Monné roger.pau@citrix.com wrote:
> 
> > On Tue, Jun 16, 2020 at 05:22:06PM +0200, Michał Leszczyński wrote:
> >> +        buf_order = get_order_from_bytes(a.size);
> >> +
> >> +        if ( (a.size >> PAGE_SHIFT) != (1 << buf_order) ) {
> > 
> > Oh here is the check. I think you can move this with the checks above
> > by doing a.size & ~PAGE_MASK.
> 
> 
> I believe it's stricter than a.size & ~PAGE_MASK. I think that the CPU expects the buffer size to be a power of 2, so you can have a 64 MB or 128 MB buffer, but not a 96 MB one.

Oh, sorry, didn't realize. I think it's clearer to check if a size is
not a power of two by doing (size & (size - 1)). This could be joined
with the previous checks.
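
As a sketch of what I mean (the `demo_` names and the 4 kB page size are assumptions made for illustration, this is not the code from the patch):

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1ULL << DEMO_PAGE_SHIFT)

/*
 * Accept only sizes that are a power of two and at least one page.
 * Any power of two >= PAGE_SIZE is automatically page aligned, so a
 * separate (size & ~PAGE_MASK) check becomes redundant.
 */
static bool demo_buf_size_ok(uint64_t size)
{
    if ( size < DEMO_PAGE_SIZE )       /* also rejects size == 0 */
        return false;

    return (size & (size - 1)) == 0;   /* true iff exactly one bit is set */
}
```

With this, 64 MB and 128 MB pass while 96 MB is rejected, matching the example quoted above.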

> > There are a couple of options here, maybe the caller could provide
> > its own buffer, then Xen would take an extra reference to those pages
> > and set them up to be used as buffers.
> > 
> > Another alternative would be to use domheap memory but not let the
> > caller map it directly, and instead introduce a hypercall to copy
> > from the internal Xen buffer into a user-provided one.
> > 
> > How much memory is used on average by those buffers? That would help
> > decide a model that would best fit the usage.
> 
> 
> From 4 kB to 4 GB. Right now I use 128 MB buffers and it takes just a few seconds to fill them up completely.
> 
> I think I've just copied the pattern which is already present in Xen's code, e.g. interfaces used by xenbaked/xentrace tools.

I think using XENMEM_acquire_resource will result in cleaner code
overall, it would also avoid having to share the pages with Xen
AFAICT. It's also more in line with how new interfaces deal with this
kind of memory sharing.

Roger.



* Re: [PATCH v1 7/7] x86/vmx: switch IPT MSRs on vmentry/vmexit
  2020-06-17 18:56               ` Michał Leszczyński
@ 2020-06-18  8:52                 ` Roger Pau Monné
  2020-06-18 11:07                   ` Michał Leszczyński
  0 siblings, 1 reply; 59+ messages in thread
From: Roger Pau Monné @ 2020-06-18  8:52 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Kevin Tian, luwei kang, Jun Nakajima, Wei Liu, Andrew Cooper,
	Jan Beulich, Xen-devel

On Wed, Jun 17, 2020 at 08:56:57PM +0200, Michał Leszczyński wrote:
> ----- On 17 Jun 2020 at 17:14, Andrew Cooper andrew.cooper3@citrix.com wrote:
> 
> > On 17/06/2020 13:51, Roger Pau Monné wrote:
> >> On Wed, Jun 17, 2020 at 01:54:45PM +0200, Michał Leszczyński wrote:
> >>> ----- On 17 Jun 2020 at 11:09, Roger Pau Monné roger.pau@citrix.com wrote:
> >>>
> >>>> 24 Virtual Machine Control Structures -> 24.8 VM-entry Control Fields -> 24.8.1
> >>>> VM-Entry Controls
> >>>> Software should consult the VMX capability MSRs IA32_VMX_ENTRY_CTLS to determine
> >>>> how it should set the reserved bits.
> >>> Please look at bit position 18 "Load IA32_RTIT_CTL".
> >> I think this is something different from what I was referring to.
> >> Those options you refer to (load/clear IA32_RTIT_CTL) deal with
> >> loading/storing a specific field on the vmcs that maps to the guest
> >> IA32_RTIT_CTL.
> >>
> >> OTOH MSR load lists can be used to load and store any arbitrary MSR on
> >> vmentry/vmexit, see section 26.4 LOADING MSRS on the SDM. There's
> >> already infrastructure on Xen to do so, see vmx_{add/del/find}_msr.
> > 
> > If I remember the historic roadmaps correctly, there are 3 cases.
> > 
> > The first hardware to support PT (Broadwell?) prohibited its use
> > completely in VMX operations.  In this case, we can use it to trace PV
> > guests iff we don't enable VMX in hardware to begin with.
> > 
> > This was relaxed in later hardware (Skylake?) to permit use within VMX
> > operations, but without any help in the VMCS.  (i.e. manual context
> > switching per this patch, or MSR load lists as noted in the SDM.)
> > 
> > Subsequent support for "virtualised PT" was added (IceLake?) which adds
> > the load/save controls, and the ability to translate the output buffer
> > under EPT.
> > 
> > 
> > All of this is from memory so I'm quite possibly wrong with details, but
> > I believe this is why the current complexity exists.
> > 
> > ~Andrew
> 
> 
> I've managed to toggle MSR_IA32_RTIT_CTL values using MSR load lists, as in:
> 
> > 35.5.2.2 Guest-Only Tracing
> > "For this usage, VM-entry is programmed to enable trace packet generation, while VM-exit is programmed to clear MSR_IA32_RTIT_CTL.TraceEn so as to disable trace-packet generation in the host."
> 
> it actually helped a bit. With patch v1 there were parts of hypervisor recorded in the trace (i.e. the moment between TRACE_EN being set and actual vmenter, and the moment between vmexit and TRACE_EN being unset). Using MSR load list this was eliminated. This change will be reflected in patch v2.
> 
> 
> However, I can't implement any working scenario in which all these MSRs are managed using MSR load lists. As in "35.3.3 Flushing Trace Output": packets are buffered internally and are flushed only when the TRACE_EN bit in MSR_IA32_RTIT_CTL is set to 0. The values of the remaining registers will be stable after everything is serialized. I think this is too complex for the load lists alone. I believe that the SDM currently instructs to use load lists only for toggling this single bit on or off.

I think that's exactly what we want: handling TraceEn at
vmentry/vmexit, so that no hypervisor packets are recorded. The rest
of the MSRs can be handled in VMM mode without issues. Switching those
on every vmentry/vmexit would also add more overhead than needed,
since I assume they don't need to be modified on every entry/exit?
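
To make the TraceEn-only handling concrete, here is a small sketch of
the two values an MSR load list would carry for RTIT_CTL. The struct
and helper names are hypothetical stand-ins (they are not Xen's
vmx_{add/del/find}_msr interface); only the MSR index and the TraceEn
bit come from patch 1/7:

```c
#include <stdint.h>

#define MSR_IA32_RTIT_CTL  0x00000570
#define RTIT_CTL_TRACEEN   (1ULL << 0)

/* Hypothetical stand-in for one 16-byte entry of a VMX MSR-load area. */
struct msr_load_entry {
    uint32_t index;
    uint32_t reserved;
    uint64_t data;
};

/* Value loaded on VM entry: the configured control bits with TraceEn set. */
static struct msr_load_entry rtit_ctl_guest_entry(uint64_t ctl)
{
    struct msr_load_entry e = { MSR_IA32_RTIT_CTL, 0, ctl | RTIT_CTL_TRACEEN };

    return e;
}

/* Value loaded on VM exit: same bits with TraceEn clear, so tracing stops. */
static struct msr_load_entry rtit_ctl_host_entry(uint64_t ctl)
{
    struct msr_load_entry e = { MSR_IA32_RTIT_CTL, 0, ctl & ~RTIT_CTL_TRACEEN };

    return e;
}
```

Only this one MSR changes across entry/exit in the sketch; everything
else stays out of the load lists.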

> 
> Thus, for now I propose to stay with MSR_IA32_RTIT_CTL being managed by MSR load lists and the rest of related MSRs being managed manually.

Yes, that seems like a good approach.

Roger.



* Re: [PATCH v1 7/7] x86/vmx: switch IPT MSRs on vmentry/vmexit
  2020-06-17 23:30               ` Kang, Luwei
@ 2020-06-18 10:02                 ` Andrew Cooper
  0 siblings, 0 replies; 59+ messages in thread
From: Andrew Cooper @ 2020-06-18 10:02 UTC (permalink / raw)
  To: Kang, Luwei, Roger Pau Monné, Michał Leszczyński
  Cc: Xen-devel, Tian, Kevin, Wei Liu, Jan Beulich, Nakajima, Jun

On 18/06/2020 00:30, Kang, Luwei wrote:
>>> On Wed, Jun 17, 2020 at 01:54:45PM +0200, Michał Leszczyński wrote:
>>>> ----- On 17 Jun 2020 at 11:09, Roger Pau Monné roger.pau@citrix.com wrote:
>>>>
>>>>> 24 Virtual Machine Control Structures -> 24.8 VM-entry Control
>>>>> Fields -> 24.8.1 VM-Entry Controls Software should consult the VMX
>> capability MSRs IA32_VMX_ENTRY_CTLS to determine how it should set the
>> reserved bits.
>>>> Please look at bit position 18 "Load IA32_RTIT_CTL".
>>> I think this is something different from what I was referring to.
>>> Those options you refer to (load/clear IA32_RTIT_CTL) deal with
>>> loading/storing a specific field on the vmcs that maps to the guest
>>> IA32_RTIT_CTL.
>>>
>>> OTOH MSR load lists can be used to load and store any arbitrary MSR on
>>> vmentry/vmexit, see section 26.4 LOADING MSRS on the SDM. There's
>>> already infrastructure on Xen to do so, see vmx_{add/del/find}_msr.
>> If I remember the historic roadmaps correctly, there are 3 cases.
>>
>> The first hardware to support PT (Broadwell?) prohibited its use completely in
>> VMX operations.  In this case, we can use it to trace PV guests iff we don't
>> enable VMX in hardware to begin with.
>>
>> This was relaxed in later hardware (Skylake?) to permit use within VMX
>> operations, but without any help in the VMCS.  (i.e. manual context switching
>> per this patch, or MSR load lists as noted in the SDM.)
>>
>> Subsequent support for "virtualised PT" was added (IceLake?) which adds the
>> load/save controls, and the ability to translate the output buffer under EPT.
>>
>>
>> All of this is from memory so I'm quite possibly wrong with details, but I believe
>> this is why the current complexity exists.
> Yes, it includes 3 cases.
> 1. Before IA32_VMX_MISC[bit 14]:
>      Intel PT doesn't support tracing in VMX operation. Execution of the VMXON instruction clears IA32_RTIT_CTL.TraceEn and any attempt to write IA32_RTIT_CTL in VMX operation causes a general-protection exception (#GP).
> 2. Support for IA32_VMX_MISC[bit 14] but no EPT to direct PT output:
>     Intel PT can be enabled across VMX but the address of the Intel PT buffer is always an HPA from the HW point of view. There is no VMCS support at this stage. The MSR load list can be used for the Intel PT context switch (VM-Entry/Exit).
> 3. Intel PT VM improvements (starting from Icelake):
>     Add a new guest IA32_RTIT_CTL field in the VMCS, and HW treats the PT output addresses as GPAs and translates them using EPT.

Thanks for the details, and confirming.  I think for now we can ignore
case 1 for simplicity, as I don't think it is likely that we'll have
someone on Broadwell hardware intending to run without VMX.  (If people
really want it, we can retrofit it, but I don't think the effort is
worth it for now)
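
For reference, the three cases could be told apart along these lines.
This is only a sketch: the enum and function names are invented, the
caller is assumed to pass in the raw IA32_VMX_MISC and
IA32_VMX_ENTRY_CTLS values, and treating the "Load IA32_RTIT_CTL"
entry control (bit 18) as the marker for case 3 is my assumption:

```c
#include <stdint.h>

enum pt_vmx_mode {
    PT_VMX_UNSUPPORTED,  /* case 1: no tracing in VMX operation */
    PT_VMX_HOST_PHYS,    /* case 2: usable in VMX, output addresses are HPAs */
    PT_VMX_VIRTUALISED,  /* case 3: VMCS load/save controls available */
};

#define VMX_MISC_PT_IN_VMX      (UINT64_C(1) << 14)
#define VM_ENTRY_LOAD_RTIT_CTL  18

static enum pt_vmx_mode classify_pt(uint64_t vmx_misc, uint64_t entry_ctls_cap)
{
    if ( !(vmx_misc & VMX_MISC_PT_IN_VMX) )
        return PT_VMX_UNSUPPORTED;

    /* Allowed-1 settings of the VM-entry controls are in bits 63:32. */
    if ( entry_ctls_cap & (UINT64_C(1) << (32 + VM_ENTRY_LOAD_RTIT_CTL)) )
        return PT_VMX_VIRTUALISED;

    return PT_VMX_HOST_PHYS;
}
```

Ignoring case 1 then simply means refusing to enable the feature when
the function returns PT_VMX_UNSUPPORTED.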

~Andrew



* Re: [PATCH v1 4/7] x86/vmx: add do_vmtrace_op
  2020-06-18  3:20       ` Tamas K Lengyel
@ 2020-06-18 11:01         ` Michał Leszczyński
  2020-06-18 11:55           ` Roger Pau Monné
  0 siblings, 1 reply; 59+ messages in thread
From: Michał Leszczyński @ 2020-06-18 11:01 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Jan Beulich, Xen-devel,
	Roger Pau Monné

----- On 18 Jun 2020 at 5:20, Tamas K Lengyel tamas.k.lengyel@gmail.com wrote:

>> >> +
>> >> +        a.mfn = v->arch.hvm.vmx.ipt_state->output_base >> PAGE_SHIFT;
>> >
>> > This will not work for translated domains, ie: a PVH or HVM domain
>> > won't be able to use this interface since it has no way to request the
>> > mapping of a specific mfn into its physmap. I think we need to take
>> > this into account when deciding how the interface should be, so that
>> > we don't corner ourselves with a PV only interface.
>>
>> Please be aware that this is only going to be used by Dom0. Is it a well-supported
>> case that somebody is using a PVH/HVM Dom0?
>>
>> I think that all Virtual Machine Introspection stuff currently requires a
>> PV Dom0. Our main goal is to have this working well in combo with VMI.
> 
> FYI the VMI interface doesn't require a PV domain. It works fine from
> PVH dom0 or even from a secondary privileged HVM DomU as well,
> provided you have the right XSM policy to allow that.
> 
> Tamas


It was previously stated that:

> PVH or HVM domain
> won't be able to use this interface since it has no way to request the
> mapping of a specific mfn into its physmap.

However, taking LibVMI as an example:

https://github.com/libvmi/libvmi/blob/c461e20ae88bc62c08c27f50fcead23c27a30f9e/libvmi/driver/xen/xen.c#L51

An essential abstraction, xen_get_memory(), relies on xc_map_foreign_range(). Doesn't this mean that it's not usable from PVH or HVM domains, or did I get it all wrong?


Best regards,
Michał Leszczyński
CERT Polska



* Re: [PATCH v1 7/7] x86/vmx: switch IPT MSRs on vmentry/vmexit
  2020-06-18  8:52                 ` Roger Pau Monné
@ 2020-06-18 11:07                   ` Michał Leszczyński
  2020-06-18 11:49                     ` Roger Pau Monné
  0 siblings, 1 reply; 59+ messages in thread
From: Michał Leszczyński @ 2020-06-18 11:07 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Kevin Tian, luwei kang, Jun Nakajima, Wei Liu, Andrew Cooper,
	Jan Beulich, Xen-devel

----- On 18 Jun 2020 at 10:52, Roger Pau Monné roger.pau@citrix.com wrote:

> On Wed, Jun 17, 2020 at 08:56:57PM +0200, Michał Leszczyński wrote:
>> ----- On 17 Jun 2020 at 17:14, Andrew Cooper andrew.cooper3@citrix.com wrote:
>> 
>> > On 17/06/2020 13:51, Roger Pau Monné wrote:
>> >> On Wed, Jun 17, 2020 at 01:54:45PM +0200, Michał Leszczyński wrote:
>> >>> ----- On 17 Jun 2020 at 11:09, Roger Pau Monné roger.pau@citrix.com wrote:
>> >>>
>> >>>> 24 Virtual Machine Control Structures -> 24.8 VM-entry Control Fields -> 24.8.1
>> >>>> VM-Entry Controls
>> >>>> Software should consult the VMX capability MSRs IA32_VMX_ENTRY_CTLS to determine
>> >>>> how it should set the reserved bits.
>> >>> Please look at bit position 18 "Load IA32_RTIT_CTL".
>> >> I think this is something different from what I was referring to.
>> >> Those options you refer to (load/clear IA32_RTIT_CTL) deal with
>> >> loading/storing a specific field on the vmcs that maps to the guest
>> >> IA32_RTIT_CTL.
>> >>
>> >> OTOH MSR load lists can be used to load and store any arbitrary MSR on
>> >> vmentry/vmexit, see section 26.4 LOADING MSRS on the SDM. There's
>> >> already infrastructure on Xen to do so, see vmx_{add/del/find}_msr.
>> > 
>> > If I remember the historic roadmaps correctly, there are 3 cases.
>> > 
>> > The first hardware to support PT (Broadwell?) prohibited its use
>> > completely in VMX operations.  In this case, we can use it to trace PV
>> > guests iff we don't enable VMX in hardware to begin with.
>> > 
>> > This was relaxed in later hardware (Skylake?) to permit use within VMX
>> > operations, but without any help in the VMCS.  (i.e. manual context
>> > switching per this patch, or MSR load lists as noted in the SDM.)
>> > 
>> > Subsequent support for "virtualised PT" was added (IceLake?) which adds
>> > the load/save controls, and the ability to translate the output buffer
>> > under EPT.
>> > 
>> > 
>> > All of this is from memory so I'm quite possibly wrong with details, but
>> > I believe this is why the current complexity exists.
>> > 
>> > ~Andrew
>> 
>> 
>> I've managed to toggle MSR_IA32_RTIT_CTL values using MSR load lists, as in:
>> 
>> > 35.5.2.2 Guest-Only Tracing
>> > "For this usage, VM-entry is programmed to enable trace packet generation, while
>> > VM-exit is programmed to clear MSR_IA32_RTIT_CTL.TraceEn so as to disable
>> > trace-packet generation in the host."
>> 
>> it actually helped a bit. With patch v1 there were parts of hypervisor recorded
>> in the trace (i.e. the moment between TRACE_EN being set and actual vmenter,
>> and the moment between vmexit and TRACE_EN being unset). Using MSR load list
>> this was eliminated. This change will be reflected in patch v2.
>> 
>> 
>> I can't however implement any working scenario in which all these MSRs are
>> managed using MSR load lists. As in "35.3.3 Flushing Trace Output": packets are
>> buffered internally and are flushed only when TRACE_EN bit in MSR_IA32_RTIT_CTL
>> is set to 0. The values of remaining registers will be stable after everything
>> is serialized. I think this is too complex for the load lists alone. I believe
>> that currently SDM instructs to use load lists only for toggling this single
>> bit on-or-off.
> 
> I think that's exactly what we want: handling TraceEn at
> vmentry/vmexit, so that no hypervisor packets are recorded. The rest
> of the MSRs can be handled in VMM mode without issues. Switching those
> on every vmentry/vmexit would also add more overhead than needed,
> since I assume they don't need to be modified on every entry/exit?


Assuming that there is a single DomU per pCPU and that they are never migrated between pCPUs, you never need to modify the remaining MSRs.

In case DomUs are floating or there are multiple DomUs per pCPU, we need to read out a few MSRs on vm-exit and restore them on vm-entry. Right now I always use this approach, as I'm not quite sure how to optimize it without introducing additional bugs. I will show the implementation in patch v2.


> 
>> 
>> Thus, for now I propose to stay with MSR_IA32_RTIT_CTL being managed by MSR load
>> lists and the rest of related MSRs being managed manually.
> 
> Yes, that seems like a good approach.
> 
> Roger.



* Re: [PATCH v1 7/7] x86/vmx: switch IPT MSRs on vmentry/vmexit
  2020-06-18 11:07                   ` Michał Leszczyński
@ 2020-06-18 11:49                     ` Roger Pau Monné
  0 siblings, 0 replies; 59+ messages in thread
From: Roger Pau Monné @ 2020-06-18 11:49 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Kevin Tian, luwei kang, Jun Nakajima, Wei Liu, Andrew Cooper,
	Jan Beulich, Xen-devel

On Thu, Jun 18, 2020 at 01:07:33PM +0200, Michał Leszczyński wrote:
> ----- On 18 Jun 2020 at 10:52, Roger Pau Monné roger.pau@citrix.com wrote:
> 
> > On Wed, Jun 17, 2020 at 08:56:57PM +0200, Michał Leszczyński wrote:
> >> ----- On 17 Jun 2020 at 17:14, Andrew Cooper andrew.cooper3@citrix.com wrote:
> >> 
> >> > On 17/06/2020 13:51, Roger Pau Monné wrote:
> >> >> On Wed, Jun 17, 2020 at 01:54:45PM +0200, Michał Leszczyński wrote:
> >> >>> ----- On 17 Jun 2020 at 11:09, Roger Pau Monné roger.pau@citrix.com wrote:
> >> >>>
> >> >>>> 24 Virtual Machine Control Structures -> 24.8 VM-entry Control Fields -> 24.8.1
> >> >>>> VM-Entry Controls
> >> >>>> Software should consult the VMX capability MSRs IA32_VMX_ENTRY_CTLS to determine
> >> >>>> how it should set the reserved bits.
> >> >>> Please look at bit position 18 "Load IA32_RTIT_CTL".
> >> >> I think this is something different from what I was referring to.
> >> >> Those options you refer to (load/clear IA32_RTIT_CTL) deal with
> >> >> loading/storing a specific field on the vmcs that maps to the guest
> >> >> IA32_RTIT_CTL.
> >> >>
> >> >> OTOH MSR load lists can be used to load and store any arbitrary MSR on
> >> >> vmentry/vmexit, see section 26.4 LOADING MSRS on the SDM. There's
> >> >> already infrastructure on Xen to do so, see vmx_{add/del/find}_msr.
> >> > 
> >> > If I remember the historic roadmaps correctly, there are 3 cases.
> >> > 
> >> > The first hardware to support PT (Broadwell?) prohibited its use
> >> > completely in VMX operations.  In this case, we can use it to trace PV
> >> > guests iff we don't enable VMX in hardware to begin with.
> >> > 
> >> > This was relaxed in later hardware (Skylake?) to permit use within VMX
> >> > operations, but without any help in the VMCS.  (i.e. manual context
> >> > switching per this patch, or MSR load lists as noted in the SDM.)
> >> > 
> >> > Subsequent support for "virtualised PT" was added (IceLake?) which adds
> >> > the load/save controls, and the ability to translate the output buffer
> >> > under EPT.
> >> > 
> >> > 
> >> > All of this is from memory so I'm quite possibly wrong with details, but
> >> > I believe this is why the current complexity exists.
> >> > 
> >> > ~Andrew
> >> 
> >> 
> >> I've managed to toggle MSR_IA32_RTIT_CTL values using MSR load lists, as in:
> >> 
> >> > 35.5.2.2 Guest-Only Tracing
> >> > "For this usage, VM-entry is programmed to enable trace packet generation, while
> >> > VM-exit is programmed to clear MSR_IA32_RTIT_CTL.TraceEn so as to disable
> >> > trace-packet generation in the host."
> >> 
> >> it actually helped a bit. With patch v1 there were parts of hypervisor recorded
> >> in the trace (i.e. the moment between TRACE_EN being set and actual vmenter,
> >> and the moment between vmexit and TRACE_EN being unset). Using MSR load list
> >> this was eliminated. This change will be reflected in patch v2.
> >> 
> >> 
> >> I can't however implement any working scenario in which all these MSRs are
> >> managed using MSR load lists. As in "35.3.3 Flushing Trace Output": packets are
> >> buffered internally and are flushed only when TRACE_EN bit in MSR_IA32_RTIT_CTL
> >> is set to 0. The values of remaining registers will be stable after everything
> >> is serialized. I think this is too complex for the load lists alone. I believe
> >> that currently SDM instructs to use load lists only for toggling this single
> >> bit on-or-off.
> > 
> > I think that's exactly what we want: handling TraceEn at
> > vmentry/vmexit, so that no hypervisor packets are recorded. The rest
> > of the MSRs can be handled in VMM mode without issues. Switching those
> > on every vmentry/vmexit would also add more overhead than needed,
> > since I assume they don't need to be modified on every entry/exit?
> 
> 
> Assuming that there is a single DomU per pCPU and that they are never migrated between pCPUs, you never need to modify the remaining MSRs.
> 
> In case DomUs are floating or there are multiple DomUs per pCPU, we need to read out a few MSRs on vm-exit and restore them on vm-entry. Right now I always use this approach, as I'm not quite sure how to optimize it without introducing additional bugs. I will show the implementation in patch v2.

I think you likely only need to modify the MSRs when context
switching between domains, instead of doing it on every
vmentry/vmexit?
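
Roughly what I have in mind, as a standalone sketch. The MSR accesses
are simulated with an array so it can run in user space (real code
would use privileged rdmsr/wrmsr), and struct ipt_ctx, ipt_save and
ipt_restore are hypothetical names, not the ipt_state from the patch:

```c
#include <stdint.h>

#define MSR_IA32_RTIT_OUTPUT_BASE  0x00000560
#define MSR_IA32_RTIT_OUTPUT_MASK  0x00000561
#define MSR_IA32_RTIT_STATUS       0x00000571

/* Simulated MSR file standing in for the per-pCPU hardware registers. */
static uint64_t fake_msr[0x600];

static uint64_t rdmsr_sim(uint32_t idx) { return fake_msr[idx]; }
static void wrmsr_sim(uint32_t idx, uint64_t val) { fake_msr[idx] = val; }

/* Hypothetical per-vCPU copy of the PT registers other than RTIT_CTL. */
struct ipt_ctx {
    uint64_t output_base, output_mask, status;
};

/* Called when a vCPU is descheduled from a pCPU (tracing already stopped). */
static void ipt_save(struct ipt_ctx *c)
{
    c->output_base = rdmsr_sim(MSR_IA32_RTIT_OUTPUT_BASE);
    c->output_mask = rdmsr_sim(MSR_IA32_RTIT_OUTPUT_MASK);
    c->status      = rdmsr_sim(MSR_IA32_RTIT_STATUS);
}

/* Called when a vCPU is scheduled onto a pCPU, before tracing resumes. */
static void ipt_restore(const struct ipt_ctx *c)
{
    wrmsr_sim(MSR_IA32_RTIT_OUTPUT_BASE, c->output_base);
    wrmsr_sim(MSR_IA32_RTIT_OUTPUT_MASK, c->output_mask);
    wrmsr_sim(MSR_IA32_RTIT_STATUS, c->status);
}
```

Between two context switches the hardware registers simply keep the
running vCPU's values, so vmentry/vmexit only has to deal with TraceEn.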

Roger.



* Re: [PATCH v1 4/7] x86/vmx: add do_vmtrace_op
  2020-06-18 11:01         ` Michał Leszczyński
@ 2020-06-18 11:55           ` Roger Pau Monné
  2020-06-18 12:51             ` Jan Beulich
  0 siblings, 1 reply; 59+ messages in thread
From: Roger Pau Monné @ 2020-06-18 11:55 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Tamas K Lengyel,
	Ian Jackson, George Dunlap, Jan Beulich, Andrew Cooper,
	Xen-devel

On Thu, Jun 18, 2020 at 01:01:39PM +0200, Michał Leszczyński wrote:
> ----- On 18 Jun 2020 at 5:20, Tamas K Lengyel tamas.k.lengyel@gmail.com wrote:
> 
> >> >> +
> >> >> +        a.mfn = v->arch.hvm.vmx.ipt_state->output_base >> PAGE_SHIFT;
> >> >
> >> > This will not work for translated domains, ie: a PVH or HVM domain
> >> > won't be able to use this interface since it has no way to request the
> >> > mapping of a specific mfn into its physmap. I think we need to take
> >> > this into account when deciding how the interface should be, so that
> >> > we don't corner ourselves with a PV only interface.
> >>
> >> Please be aware that this is only going to be used by Dom0. Is it a well-supported
> >> case that somebody is using a PVH/HVM Dom0?
> >>
> >> I think that all Virtual Machine Introspection stuff currently requires a
> >> PV Dom0. Our main goal is to have this working well in combo with VMI.
> > 
> > FYI the VMI interface doesn't require a PV domain. It works fine from
> > PVH dom0 or even from a secondary privileged HVM DomU as well,
> > provided you have the right XSM policy to allow that.
> > 
> > Tamas
> 
> 
> It was previously stated that:
> 
> > PVH or HVM domain
> > won't be able to use this interface since it has no way to request the
> > mapping of a specific mfn into its physmap.
> 
> However, taking LibVMI as an example:
> 
> https://github.com/libvmi/libvmi/blob/c461e20ae88bc62c08c27f50fcead23c27a30f9e/libvmi/driver/xen/xen.c#L51
> 
> An essential abstraction, xen_get_memory(), relies on xc_map_foreign_range(). Doesn't this mean that it's not usable from PVH or HVM domains, or did I get it all wrong?

That was my fault, so the buffer mfns are assigned to Xen, and then
the Xen domain ID is used to map those, which should work on both PV
and HVM (or PVH).

I still think using XENMEM_acquire_resource might be better, but I
would let others comment.

Roger.



* Re: [PATCH v1 4/7] x86/vmx: add do_vmtrace_op
  2020-06-18 11:55           ` Roger Pau Monné
@ 2020-06-18 12:51             ` Jan Beulich
  2020-06-18 13:09               ` Michał Leszczyński
  0 siblings, 1 reply; 59+ messages in thread
From: Jan Beulich @ 2020-06-18 12:51 UTC (permalink / raw)
  To: Roger Pau Monné, Michał Leszczyński
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Tamas K Lengyel,
	Ian Jackson, George Dunlap, Andrew Cooper, Xen-devel

On 18.06.2020 13:55, Roger Pau Monné wrote:
> On Thu, Jun 18, 2020 at 01:01:39PM +0200, Michał Leszczyński wrote:
>> It was previously stated that:
>>
>>> PVH or HVM domain
>>> won't be able to use this interface since it has no way to request the
>>> mapping of a specific mfn into its physmap.
>>
>> However, taking LibVMI as an example:
>>
>> https://github.com/libvmi/libvmi/blob/c461e20ae88bc62c08c27f50fcead23c27a30f9e/libvmi/driver/xen/xen.c#L51
>>
>> An essential abstraction, xen_get_memory(), relies on xc_map_foreign_range(). Doesn't this mean that it's not usable from PVH or HVM domains, or did I get it all wrong?
> 
> That was my fault, so the buffer mfns are assigned to Xen, and then
> the Xen domain ID is used to map those, which should work on both PV
> and HVM (or PVH).
> 
> I still think using XENMEM_acquire_resource might be better, but I
> would let others comment.

+1

Jan



* Re: [PATCH v1 4/7] x86/vmx: add do_vmtrace_op
  2020-06-18 12:51             ` Jan Beulich
@ 2020-06-18 13:09               ` Michał Leszczyński
  2020-06-18 13:24                 ` Jan Beulich
  2020-06-18 13:40                 ` Roger Pau Monné
  0 siblings, 2 replies; 59+ messages in thread
From: Michał Leszczyński @ 2020-06-18 13:09 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Tamas K Lengyel,
	Ian Jackson, George Dunlap, Andrew Cooper, Xen-devel,
	Roger Pau Monné

----- On 18 Jun 2020 at 14:51, Jan Beulich jbeulich@suse.com wrote:

> On 18.06.2020 13:55, Roger Pau Monné wrote:
>> On Thu, Jun 18, 2020 at 01:01:39PM +0200, Michał Leszczyński wrote:
>>> It was previously stated that:
>>>
>>>> PVH or HVM domain
>>>> won't be able to use this interface since it has no way to request the
>>>> mapping of a specific mfn into its physmap.
>>>
>>> However, taking LibVMI as an example:
>>>
>>> https://github.com/libvmi/libvmi/blob/c461e20ae88bc62c08c27f50fcead23c27a30f9e/libvmi/driver/xen/xen.c#L51
>>>
>>> An essential abstraction, xen_get_memory(), relies on xc_map_foreign_range().
>>> Doesn't this mean that it's not usable from PVH or HVM domains, or did I get it
>>> all wrong?
>> 
>> That was my fault, so the buffer mfns are assigned to Xen, and then
>> the Xen domain ID is used to map those, which should work on both PV
>> and HVM (or PVH).
>> 
>> I still think using XENMEM_acquire_resource might be better, but I
>> would let others comment.
> 
> +1
> 
> Jan


I'm trying to implement this right now. I've added some very simple code to mm.c just for testing:

---

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e376fc7e8f..aaaefe6d23 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4624,6 +4624,26 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
         }
         break;
     }
+
+    case XENMEM_resource_vmtrace_buf:
+    {
+        uint64_t output_base;
+        mfn_t mfn;
+        unsigned int i;
+
+        printk("vmtrace buf acquire\n");
+        output_base = d->vcpu[id]->arch.hvm.vmx.ipt_state->output_base;
+        mfn = mfn_x(output_base >> PAGE_SHIFT);
+
+        rc = 0;
+        for ( i = 0; i < nr_frames; i++ )
+        {
+            __map_domain_page_global(mfn_to_page(mfn + i));
+            mfn_list[i] = mfn + i;
+        }
+
+        break;
+    }
 #endif

     default:

---


and then in my "proctrace" tool I'm trying to acquire it like this:

    fres = xenforeignmemory_map_resource(
        fmem, domid, XENMEM_resource_vmtrace_buf,
        /* vcpu: */ 0, /* frame: */ 0, /* num_frames: */ 128, (void **)&buf,
        PROT_READ, 0);


ioctl fails with "Argument list too long". It works fine when I provide some small number of frames (e.g. num_frames: 1 or 32), but doesn't work for any larger quantity.

How should I proceed with this? The PT buffer could be large, even up to 4 GB.


Best regards,
Michał Leszczyński
CERT Polska



* Re: [PATCH v1 4/7] x86/vmx: add do_vmtrace_op
  2020-06-18 13:09               ` Michał Leszczyński
@ 2020-06-18 13:24                 ` Jan Beulich
  2020-06-18 13:40                 ` Roger Pau Monné
  1 sibling, 0 replies; 59+ messages in thread
From: Jan Beulich @ 2020-06-18 13:24 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Tamas K Lengyel,
	Ian Jackson, George Dunlap, Andrew Cooper, Xen-devel,
	Roger Pau Monné

On 18.06.2020 15:09, Michał Leszczyński wrote:
> ----- On 18 Jun 2020 at 14:51, Jan Beulich jbeulich@suse.com wrote:
> 
>> On 18.06.2020 13:55, Roger Pau Monné wrote:
>>> On Thu, Jun 18, 2020 at 01:01:39PM +0200, Michał Leszczyński wrote:
>>>> It was previously stated that:
>>>>
>>>>> PVH or HVM domain
>>>>> won't be able to use this interface since it has no way to request the
>>>>> mapping of a specific mfn into its physmap.
>>>>
>>>> However, taking LibVMI as an example:
>>>>
>>>> https://github.com/libvmi/libvmi/blob/c461e20ae88bc62c08c27f50fcead23c27a30f9e/libvmi/driver/xen/xen.c#L51
>>>>
>>>> An essential abstraction, xen_get_memory(), relies on xc_map_foreign_range().
>>>> Doesn't this mean that it's not usable from PVH or HVM domains, or did I get it
>>>> all wrong?
>>>
>>> That was my fault, so the buffer mfns are assigned to Xen, and then
>>> the Xen domain ID is used to map those, which should work on both PV
>>> and HVM (or PVH).
>>>
>>> I still think using XENMEM_acquire_resource might be better, but I
>>> would let others comment.
>>
>> +1
>>
>> Jan
> 
> 
> I'm trying to implement this right now. I've added some very simple code to mm.c just for testing:
> 
> ---
> 
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index e376fc7e8f..aaaefe6d23 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4624,6 +4624,26 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
>          }
>          break;
>      }
> +
> +    case XENMEM_resource_vmtrace_buf:
> +    {
> +        uint64_t output_base;
> +        mfn_t mfn;
> +        unsigned int i;
> +
> +        printk("vmtrace buf acquire\n");
> +        output_base = d->vcpu[id]->arch.hvm.vmx.ipt_state->output_base;
> +        mfn = mfn_x(output_base >> PAGE_SHIFT);
> +
> +        rc = 0;
> +        for ( i = 0; i < nr_frames; i++ )
> +        {
> +            __map_domain_page_global(mfn_to_page(mfn + i));
> +            mfn_list[i] = mfn + i;
> +        }
> +
> +        break;
> +    }
>  #endif
> 
>      default:
> 
> ---
> 
> 
> and then in my "proctrace" tool I'm trying to acquire it like this:
> 
>     fres = xenforeignmemory_map_resource(
>         fmem, domid, XENMEM_resource_vmtrace_buf,
>         /* vcpu: */ 0, /* frame: */ 0, /* num_frames: */ 128, (void **)&buf,
>         PROT_READ, 0);
> 
> 
> ioctl fails with "Argument list too long". It works fine when I provide some small number of frames (e.g. num_frames: 1 or 32), but doesn't work for any larger quantity.

See xen/common/memory.c:acquire_resource(). So far larger quantities
weren't needed, so there's an implementation limit of 32 right now.
This can be lifted, but please not by growing the on-stack array (as
the comment there also suggests).
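
One possible shape for lifting the limit without a bigger stack array
is to loop over the request in fixed-size batches. The following is a
self-contained illustration of that pattern only; it is not the code
in xen/common/memory.c, and batch_fn, acquire_in_batches and demo_fill
are names invented for the example:

```c
#include <stdint.h>

#define BATCH_FRAMES 32  /* mirrors the 32-frame limit mentioned above */

/* Hypothetical worker: fills mfns[0..nr) for frames starting at 'frame'. */
typedef int (*batch_fn)(uint64_t frame, unsigned int nr, uint64_t *mfns);

/*
 * Handle nr_frames in chunks of at most BATCH_FRAMES, copying each chunk
 * to 'out'. Returns the number of batches issued, or a negative error.
 */
static int acquire_in_batches(uint64_t frame, uint64_t nr_frames,
                              batch_fn fn, uint64_t *out)
{
    uint64_t batch[BATCH_FRAMES];  /* fixed size regardless of nr_frames */
    uint64_t done = 0;
    int batches = 0;

    while ( done < nr_frames )
    {
        uint64_t left = nr_frames - done;
        unsigned int nr = left < BATCH_FRAMES ? (unsigned int)left
                                              : BATCH_FRAMES;
        unsigned int i;
        int rc = fn(frame + done, nr, batch);

        if ( rc < 0 )
            return rc;

        for ( i = 0; i < nr; i++ )
            out[done + i] = batch[i];

        done += nr;
        batches++;
    }

    return batches;
}

/* Example worker: pretends frame N of the buffer lives at MFN 0x1000 + N. */
static int demo_fill(uint64_t frame, unsigned int nr, uint64_t *mfns)
{
    unsigned int i;

    for ( i = 0; i < nr; i++ )
        mfns[i] = 0x1000 + frame + i;

    return 0;
}
```

A 128-frame request then becomes four batches of 32, and the stack
usage stays constant however large the PT buffer is.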

Jan



* Re: [PATCH v1 1/7] x86/vmx: add Intel PT MSR definitions
  2020-06-16 15:19 ` [PATCH v1 1/7] x86/vmx: add Intel PT MSR definitions Michał Leszczyński
@ 2020-06-18 13:31   ` Jan Beulich
  0 siblings, 0 replies; 59+ messages in thread
From: Jan Beulich @ 2020-06-18 13:31 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Xen-devel, Roger Pau Monné, Wei Liu, Andrew Cooper

On 16.06.2020 17:19, Michał Leszczyński wrote:
> --- a/xen/include/asm-x86/msr-index.h
> +++ b/xen/include/asm-x86/msr-index.h
> @@ -621,4 +621,41 @@
>  #define MSR_PKGC9_IRTL			0x00000634
>  #define MSR_PKGC10_IRTL			0x00000635
>  
> +/* Intel PT MSRs */
> +#define MSR_IA32_RTIT_CTL              0x00000570
> +#define RTIT_CTL_TRACEEN               (1ULL << 0)
> +#define RTIT_CTL_CYCEN                 (1ULL << 1)
> +#define RTIT_CTL_OS                    (1ULL << 2)
> +#define RTIT_CTL_USR                   (1ULL << 3)
> +#define RTIT_CTL_PWR_EVT_EN            (1ULL << 4)
> +#define RTIT_CTL_FUP_ON_PTW            (1ULL << 5)
> +#define RTIT_CTL_FABRIC_EN             (1ULL << 6)
> +#define RTIT_CTL_CR3_FILTER            (1ULL << 7)
> +#define RTIT_CTL_TOPA                  (1ULL << 8)
> +#define RTIT_CTL_MTC_EN                (1ULL << 9)
> +#define RTIT_CTL_TSC_EN                (1ULL << 10)
> +#define RTIT_CTL_DIS_RETC              (1ULL << 11)
> +#define RTIT_CTL_PTW_EN                (1ULL << 12)
> +#define RTIT_CTL_BRANCH_EN             (1ULL << 13)
> +#define RTIT_CTL_MTC_FREQ_OFFSET       14
> +#define RTIT_CTL_MTC_FREQ              (0x0fULL << RTIT_CTL_MTC_FREQ_OFFSET)
> +#define RTIT_CTL_CYC_THRESH_OFFSET     19
> +#define RTIT_CTL_CYC_THRESH            (0x0fULL << RTIT_CTL_CYC_THRESH_OFFSET)
> +#define RTIT_CTL_PSB_FREQ_OFFSET       24
> +#define RTIT_CTL_PSB_FREQ              (0x0fULL << RTIT_CTL_PSB_FREQ_OFFSET)
> +#define RTIT_CTL_ADDR_OFFSET(n)        (32 + 4 * (n))
> +#define RTIT_CTL_ADDR(n)               (0x0fULL << RTIT_CTL_ADDR_OFFSET(n))
> +#define MSR_IA32_RTIT_STATUS           0x00000571
> +#define RTIT_STATUS_FILTER_EN          (1ULL << 0)
> +#define RTIT_STATUS_CONTEXT_EN         (1ULL << 1)
> +#define RTIT_STATUS_TRIGGER_EN         (1ULL << 2)
> +#define RTIT_STATUS_ERROR              (1ULL << 4)
> +#define RTIT_STATUS_STOPPED            (1ULL << 5)
> +#define RTIT_STATUS_BYTECNT            (0x1ffffULL << 32)
> +#define MSR_IA32_RTIT_CR3_MATCH        0x00000572
> +#define MSR_IA32_RTIT_OUTPUT_BASE      0x00000560
> +#define MSR_IA32_RTIT_OUTPUT_MASK      0x00000561
> +#define MSR_IA32_RTIT_ADDR_A(n)        (0x00000580 + (n) * 2)
> +#define MSR_IA32_RTIT_ADDR_B(n)        (0x00000581 + (n) * 2)

Please honor the comment at the top of the file as well as the one
separating new content from not necessarily well-formed one further
down. I also think Andrew wants no IA32 infixes anymore in new
additions, albeit I'm still not fully convinced the resulting
deviation from SDM naming is really helpful.

Jan



* Re: [PATCH v1 4/7] x86/vmx: add do_vmtrace_op
  2020-06-18 13:09               ` Michał Leszczyński
  2020-06-18 13:24                 ` Jan Beulich
@ 2020-06-18 13:40                 ` Roger Pau Monné
  1 sibling, 0 replies; 59+ messages in thread
From: Roger Pau Monné @ 2020-06-18 13:40 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Tamas K Lengyel,
	Ian Jackson, George Dunlap, Jan Beulich, Andrew Cooper,
	Xen-devel

On Thu, Jun 18, 2020 at 03:09:57PM +0200, Michał Leszczyński wrote:
> ----- On 18 Jun 2020 at 14:51, Jan Beulich jbeulich@suse.com wrote:
> 
> > On 18.06.2020 13:55, Roger Pau Monné wrote:
> >> On Thu, Jun 18, 2020 at 01:01:39PM +0200, Michał Leszczyński wrote:
> >>> It was previously stated that:
> >>>
> >>>> PVH or HVM domain
> >>>> won't be able to use this interface since it has no way to request the
> >>>> mapping of a specific mfn into its physmap.
> >>>
> >>> however, taking LibVMI as an example:
> >>>
> >>> https://github.com/libvmi/libvmi/blob/c461e20ae88bc62c08c27f50fcead23c27a30f9e/libvmi/driver/xen/xen.c#L51
> >>>
> >>> An essential abstraction xen_get_memory() relies on xc_map_foreign_range().
> >>> Doesn't this mean that it's not usable from PVH or HVM domains, or did I get it
> >>> all wrong?
> >> 
> >> That was my fault, so the buffer mfns are assigned to Xen, and then
> >> the Xen domain ID is used to map those, which should work on both PV
> >> and HVM (or PVH).
> >> 
> >> I still think using XENMEM_acquire_resource might be better, but I
> >> would let others comment.
> > 
> > +1
> > 
> > Jan
> 
> 
> I'm trying to implement this right now. I've added some very simple code to mm.c just for testing:
> 
> ---
> 
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index e376fc7e8f..aaaefe6d23 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4624,6 +4624,26 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
>          }
>          break;
>      }
> +
> +    case XENMEM_resource_vmtrace_buf:
> +    {
> +        uint64_t output_base;
> +        mfn_t mfn;
> +        unsigned int i;
> +
> +        printk("vmtrace buf acquire\n");
> +        output_base = d->vcpu[id]->arch.hvm.vmx.ipt_state->output_base;
> +        mfn = mfn_x(output_base >> PAGE_SHIFT);
> +
> +        rc = 0;
> +        for ( i = 0; i < nr_frames; i++ )
> +        {
> +            __map_domain_page_global(mfn_to_page(mfn + i));

I don't think you need the __map_domain_page_global?

> +            mfn_list[i] = mfn + i;

I think you need mfn_add here, or else this won't build?
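
For context on the remark above, a minimal standalone sketch of how a
typesafe frame number behaves. It is modelled on the idea behind Xen's
mfn_t, _mfn(), mfn_x() and mfn_add(), but the definitions below are
simplified stand-ins written for this example, not copies of Xen's
headers. PAGE_SHIFT of 12 and the fill_frames() helper are assumptions.

```c
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Simplified stand-in for a typesafe machine frame number. */
typedef struct { unsigned long mfn; } mfn_t;

/* Wrap a raw frame number into the typesafe form. */
static inline mfn_t _mfn(unsigned long n) { return (mfn_t){ n }; }

/* Unwrap the typesafe form back into a raw frame number. */
static inline unsigned long mfn_x(mfn_t m) { return m.mfn; }

/* Advance by i frames; a plain "mfn + i" does not compile on a struct. */
static inline mfn_t mfn_add(mfn_t m, unsigned long i)
{
    return _mfn(mfn_x(m) + i);
}

/*
 * Hypothetical helper: fill list[] with nr consecutive raw frame numbers,
 * starting at the frame that contains the byte address output_base.
 */
static void fill_frames(uint64_t output_base, unsigned long *list,
                        unsigned int nr)
{
    /* Wrap with _mfn(); mfn_x() goes the other way (unwrap). */
    mfn_t mfn = _mfn((unsigned long)(output_base >> PAGE_SHIFT));
    unsigned int i;

    for ( i = 0; i < nr; i++ )
        list[i] = mfn_x(mfn_add(mfn, i));
}
```

With a wrapper type like this, both "mfn + i" and assigning the result
of mfn_x() to an mfn_t are rejected at compile time, which is the kind
of build failure the comment above anticipates.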

> +        }
> +
> +        break;
> +    }
>  #endif
> 
>      default:
> 
> ---
> 
> 
> and then in my "proctrace" tool I'm trying to acquire it like this:
> 
>     fres = xenforeignmemory_map_resource(
>         fmem, domid, XENMEM_resource_vmtrace_buf,
>         /* vcpu: */ 0, /* frame: */ 0, /* num_frames: */ 128, (void **)&buf,
>         PROT_READ, 0);
> 
> 
> ioctl fails with "Argument list too long". It works fine when I provide some small number of frames (e.g. num_frames: 1 or 32), but doesn't work for any larger quantity.
> 
> How should I proceed with this? The PT buffer could be large, even up to 4 GB.

I think adding a loop and hypercall continuation support could make
this work without having to expand the size of mfn_list and
gfn_list?
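
A minimal sketch of the batching idea, assuming a per-call limit of 32
frames (the figure mentioned earlier in the thread). acquire_batch() and
acquire_all() are hypothetical names used only for illustration; this
shows the loop shape and does not reproduce Xen's hypercall continuation
mechanism.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_BATCH 32u /* assumption: per-call frame limit */

/*
 * Hypothetical stand-in for one bounded request: resolves `count` frames
 * starting at index `first`. Rejects oversized requests, mirroring the
 * "Argument list too long" failure described above.
 */
static int acquire_batch(uint64_t base_frame, unsigned int first,
                         unsigned int count, uint64_t *out)
{
    unsigned int i;

    if ( count > MAX_BATCH )
        return -1;

    for ( i = 0; i < count; i++ )
        out[i] = base_frame + first + i;

    return 0;
}

/*
 * Walk nr_frames in batches, reusing one small fixed-size array instead
 * of growing it. Returns the number of frames handled, or -1 on error.
 */
static long acquire_all(uint64_t base_frame, unsigned int nr_frames,
                        uint64_t *dest)
{
    uint64_t batch[MAX_BATCH];
    unsigned int done = 0;

    while ( done < nr_frames )
    {
        unsigned int todo = nr_frames - done;
        unsigned int i;

        if ( todo > MAX_BATCH )
            todo = MAX_BATCH;

        if ( acquire_batch(base_frame, done, todo, batch) )
            return -1;

        for ( i = 0; i < todo; i++ )
            dest[done + i] = batch[i];

        done += todo;
        /*
         * A hypervisor-side loop would check for pending preemption at
         * this point and, if needed, record `done` and resume later.
         */
    }

    return (long)done;
}
```

The on-stack array stays at 32 entries regardless of the total request
size, so a 128-frame request simply becomes four passes through the loop.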

Thanks, Roger.



* Re: [PATCH v1 0/7] Implement support for external IPT monitoring
  2020-06-17 16:19         ` Andrew Cooper
  2020-06-17 16:27           ` Tamas K Lengyel
  2020-06-17 20:20           ` Michał Leszczyński
@ 2020-06-18 14:59           ` Michał Leszczyński
  2 siblings, 0 replies; 59+ messages in thread
From: Michał Leszczyński @ 2020-06-18 14:59 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jun Nakajima,
	Wei Liu, Tamas K Lengyel, Ian Jackson, George Dunlap,
	Jan Beulich, Xen-devel, Roger Pau Monné

----- On 17 Jun 2020 at 18:19, Andrew Cooper andrew.cooper3@citrix.com wrote:

> On 17/06/2020 04:02, Tamas K Lengyel wrote:
>> On Tue, Jun 16, 2020 at 2:17 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> On 16/06/2020 19:47, Michał Leszczyński wrote:
>>>> ----- On 16 Jun 2020 at 20:17, Andrew Cooper andrew.cooper3@citrix.com wrote:
>>>>
>>>>> Are there any restrictions on EPT being enabled in the first place?  I'm
>>>>> not aware of any, and in principle we could use this functionality for
>>>>> PV guests as well (using the CPL filter).  Therefore, I think it would
>>>>> be helpful to not tie the functionality to HVM guests, even if that is
>>>>> the only option enabled to start with.
>>>> I think at the moment it's not required to have EPT. This patch series doesn't
>>>> use any translation feature flags, so the output address is always a machine
>>>> physical address, regardless of context. I will check if it could be easily
>>>> used with PV.
>>> If it's trivial to add PV support then please do.  If it's not, then don't
>>> feel obliged, but please do at least consider how PV support might look
>>> in the eventual feature.
>>>
>>> (Generally speaking, considering "how would I make this work in other
>>> modes where it is possible" leads to a better design.)
>>>
>>>>> The buffer mapping and creation logic is fairly problematic.  Instead of
>>>>> fighting with another opencoded example, take a look at the IOREQ
>>>>> server's use of "acquire resource" which is a mapping interface which
>>>>> supports allocating memory on behalf of the guest, outside of the guest
>>>>> memory, for use by control tools.
>>>>>
>>>>> I think what this wants is a bit somewhere in domain_create to indicate
>>>>> that external tracing is used for this domain (and allocate whatever
>>>>> structures/buffers are necessary), acquire resource to map the buffers
>>>>> themselves, and a domctl for any necessary runtime controls.
>>>>>
>>>> I will check this out, this sounds like a good option as it would remove lots of
>>>> complexity from the existing ipt_enable domctl.
>>> Xen has traditionally opted for a "and turn this extra thing on
>>> dynamically" model, but this has caused no end of security issues and
>>> broken corner cases.
>>>
>>> You can see this still existing in the difference between
>>> XEN_DOMCTL_createdomain and XEN_DOMCTL_max_vcpus, (the latter being
>>> required to choose the number of vcpus for the domain) and we're making
>>> good progress undoing this particular wart (before 4.13, it was
>>> concerningly easy to get Xen to fall over a NULL d->vcpu[] pointer by
>>> issuing other hypercalls between these two).
>>>
>>> There are a lot of settings which should be immutable for the lifetime of
>>> the domain, and external monitoring looks like another one of these.
>>> Specifying it at createdomain time allows for far better runtime
>>> behaviour (you are no longer in a situation where the first time you try
>>> to turn tracing on, you end up with -ENOMEM because another VM booted in
>>> the meantime and used the remaining memory), and it makes for rather
>>> more simple code in Xen itself (at runtime, you can rely on it having
>>> been set up properly, because a failure setting up will have killed the
>>> domain already).
>> I'm not in favor of this being a flag that gets set during domain
>> creation time. It could certainly be the case that some users would
>> want this being on from the start till the end but in other cases you
>> may want to enable it intermittently only for some time in-between
>> particular events. If it's an on/off flag during domain creation you
>> pretty much force that choice on the users and while the overhead of
>> PT is better than say MTF it's certainly not nothing. In case there is
>> an OOM situation enabling IPT dynamically the user can always just
>> pause the VM and wait till memory becomes available.
> 
> There is nothing wrong with having "turn tracing on/off at runtime"
> hypercalls.  It is specifically what I suggested two posts up in this
> thread, but it should be limited to the TraceEn bit in RTIT_CTL.
> 
> What isn't ok is trying to allocate the buffers, write the TOPA, etc on
> first-enable or first-map, because the runtime complexity of logic like
> this is large, and far too easy to get wrong in security relevant ways.
> 
> The domain create flag would mean "I wish to use tracing with this
> domain", and not "I want tracing enabled from the getgo".
> 


I'm trying to implement this suggestion right now. I've already switched to acquire_resource and now I want to make buffers statically allocated on domain creation.

I think it would be reasonable to add an option like "vmtrace_ipt_size" in xl.cfg. By default it would be 0 (meaning "disabled"), and when the user sets it to a non-zero value, IPT buffers of the given size will be allocated for each vCPU upon domain creation.

Could you give some hints about how to correctly add a new option in xl.cfg so that it's accessible on the hypervisor side? The configuration parsing/processing part is what I don't understand yet.


>>>>> What semantics do you want for the buffer becoming full?  Given that
>>>>> debugging/tracing is the goal, I presume "pause vcpu on full" is the
>>>>> preferred behaviour, rather than drop packets on full?
>>>>>
>>>> Right now this is a ring-style buffer: when it becomes full it simply
>>>> wraps and overwrites the old data.
>>> How does the consumer spot that the data has wrapped?  What happens if
>>> data starts getting logged, but no one is listening?  What happens if the
>>> consumer exits/crashes/etc and stops listening as a consequence?
>>>
>>> It's fine to simply state what will happen, and possibly even "don't do
>>> that then", but the corner cases do at least need thinking about.
>> AFAIU the current use-case is predominantly to be used in conjunction
>> with VMI events where you want to be able to see the trace leading up
>> to a particular vmexit. So in the case when the buffer is wrapped
>> in-between events and data is lost that's not really of concern.
> 
> That's all fine.  I imagine the output here is voluminous, and needs
> help being cut down as much as possible.
> 
> On a tangent, I presume you'd like to include VM-fork eventually, which
> ought to include copying the trace buffer on fork?
> 
> ~Andrew



* Re: [PATCH v1 4/7] x86/vmx: add do_vmtrace_op
  2020-06-16 17:23   ` Roger Pau Monné
  2020-06-17 19:13     ` Michał Leszczyński
@ 2020-06-18 15:25     ` Michał Leszczyński
  2020-06-18 15:39       ` Jan Beulich
  1 sibling, 1 reply; 59+ messages in thread
From: Michał Leszczyński @ 2020-06-18 15:25 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Jan Beulich, Xen-devel

----- On 16 Jun 2020 at 19:23, Roger Pau Monné roger.pau@citrix.com wrote:

> On Tue, Jun 16, 2020 at 05:22:06PM +0200, Michał Leszczyński wrote:
>> Provide an interface for privileged domains to manage
>> external IPT monitoring.
>> 
>> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
> 
> Thanks for the patch! I have some questions below which require your
> input.
> 
>> ---
>>  xen/arch/x86/hvm/hvm.c          | 170 ++++++++++++++++++++++++++++++++
>>  xen/include/public/hvm/hvm_op.h |  27 +++++
>>  2 files changed, 197 insertions(+)
>> 
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index 5bb47583b3..9292caebe0 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -4949,6 +4949,172 @@ static int compat_altp2m_op(
>>      return rc;
>>  }
>>  
>> +static int do_vmtrace_op(
>> +    XEN_GUEST_HANDLE_PARAM(void) arg)
> 
> No need for the newline, this can fit on a single line.
> 
>> +{
>> +    struct xen_hvm_vmtrace_op a;
>> +    struct domain *d = NULL;
> 
> I don't think you need to init d to NULL (at least by looking at the
> current code below).
> 
>> +    int rc = -EFAULT;
> 
> No need to init rc.
> 
>> +    int i;
> 
> unsigned since it's used as a loop counter.
> 
>> +    struct vcpu *v;
>> +    void* buf;
> 
> Nit: '*' should be prepended to the variable name.
> 
>> +    uint32_t buf_size;
> 
> size_t
> 
>> +    uint32_t buf_order;
> 
> Order is generally fine using unsigned int, no need to use a
> specifically sized type.
> 
>> +    uint64_t buf_mfn;
> 
> Could this use the mfn type?
> 
>> +    struct page_info *pg;
>> +
>> +    if ( !hvm_ipt_supported() )
>> +        return -EOPNOTSUPP;
>> +
>> +    if ( copy_from_guest(&a, arg, 1) )
>> +        return -EFAULT;
>> +
>> +    if ( a.version != HVMOP_VMTRACE_INTERFACE_VERSION )
>> +        return -EINVAL;
>> +
>> +    switch ( a.cmd )
>> +    {
>> +    case HVMOP_vmtrace_ipt_enable:
>> +    case HVMOP_vmtrace_ipt_disable:
>> +    case HVMOP_vmtrace_ipt_get_buf:
>> +    case HVMOP_vmtrace_ipt_get_offset:
>> +        break;
>> +
>> +    default:
>> +        return -EOPNOTSUPP;
>> +    }
>> +
>> +    d = rcu_lock_domain_by_any_id(a.domain);
>> +
>> +    if ( d == NULL )
>> +        return -ESRCH;
>> +
>> +    if ( !is_hvm_domain(d) )
>> +    {
>> +        rc = -EOPNOTSUPP;
>> +        goto out;
>> +    }
>> +
>> +    domain_pause(d);
>> +
>> +    if ( a.vcpu >= d->max_vcpus )
>> +    {
>> +        rc = -EINVAL;
>> +        goto out;
>> +    }
>> +
>> +    v = d->vcpu[a.vcpu];
>> +
>> +    if ( a.cmd == HVMOP_vmtrace_ipt_enable )
> 
> Please use a switch here, you might even consider re-using the switch
> from above and moving the domain checks before actually checking the
> command field, so that you don't need to perform two switches against
> a.cmd.
> 
>> +    {
>> +        if ( v->arch.hvm.vmx.ipt_state ) {
> 
> Coding style, brace should be on newline (there are more below which
> I'm not going to comment on).
> 
>> +            // already enabled
> 
> Comments should use /* ... */, there are multiple instances of this below
> which I'm not going to comment on, please check CODING_STYLE.
> 
> Also, the interface looks racy, I think you are missing a lock to
> protect v->arch.hvm.vmx.ipt_state from being freed under your feet if
> you issue concurrent calls to the interface.
> 
>> +            rc = -EINVAL;
>> +            goto out;
>> +        }
>> +
>> +        if ( a.size < PAGE_SIZE || a.size > 1000000 * PAGE_SIZE ) {
> 
> You can use GB(4) which is easier to read. Should the size also be a
> multiple of a PAGE_SIZE?
> 
>> +            // we don't accept trace buffer size smaller than single page
>> +            // and the upper bound is defined as 4GB in the specification
>> +            rc = -EINVAL;
>> +            goto out;
>> +	}
> 
> Stray tab.
> 
>> +
>> +        buf_order = get_order_from_bytes(a.size);
>> +
>> +        if ( (a.size >> PAGE_SHIFT) != (1 << buf_order) ) {
> 
> Oh here is the check. I think you can move this with the checks above
> by doing a.size & ~PAGE_MASK.
> 
>> +            rc = -EINVAL;
>> +            goto out;
>> +        }
>> +
>> +        buf = page_to_virt(alloc_domheap_pages(d, buf_order,
>> MEMF_no_refcount));
> 
> What if alloc_domheap_pages return NULL?
> 
> Since I think you only want the linear address of the page to zero it
> I would suggest using clear_domain_page.
> 
>> +        buf_size = a.size;
>> +
>> +        if ( !buf ) {
>> +            rc = -EFAULT;
>> +            goto out;
>> +        }
>> +
>> +        memset(buf, 0, buf_size);
>> +
>> +        for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ ) {
>> +            share_xen_page_with_privileged_guests(virt_to_page(buf) + i,
>> SHARE_ro);
> 
> This line (and some more below) exceeds 80 characters, please split
> it.
> 
>> +        }
>> +
>> +        v->arch.hvm.vmx.ipt_state = xmalloc(struct ipt_state);
> 
> You should check that xmalloc has succeeded before trying to access
> ipt_state.
> 
>> +        v->arch.hvm.vmx.ipt_state->output_base = virt_to_mfn(buf) <<
>> PAGE_SHIFT;
>> +        v->arch.hvm.vmx.ipt_state->output_mask = buf_size - 1;
>> +        v->arch.hvm.vmx.ipt_state->status = 0;
>> +        v->arch.hvm.vmx.ipt_state->ctl = RTIT_CTL_TRACEEN | RTIT_CTL_OS |
>> RTIT_CTL_USR | RTIT_CTL_BRANCH_EN;
> 
> Shouldn't the user be able to select what tracing should be enabled?
> 
>> +    }
>> +    else if ( a.cmd == HVMOP_vmtrace_ipt_disable )
>> +    {
>> +        if ( !v->arch.hvm.vmx.ipt_state ) {
>> +            rc = -EINVAL;
>> +            goto out;
>> +        }
>> +
>> +        buf_mfn = v->arch.hvm.vmx.ipt_state->output_base >> PAGE_SHIFT;
>> +        buf_size = ( v->arch.hvm.vmx.ipt_state->output_mask + 1 ) &
>> 0xFFFFFFFFUL;
>> +
>> +        for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ )
>> +        {
>> +            if ( (mfn_to_page(_mfn(buf_mfn + i))->count_info & PGC_count_mask)
>> != 1 )
>> +            {
>> +                rc = -EBUSY;
>> +                goto out;
>> +            }
>> +        }
>> +
>> +        xfree(v->arch.hvm.vmx.ipt_state);
>> +	v->arch.hvm.vmx.ipt_state = NULL;
>> +
>> +        for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ )
>> +        {
>> +            pg = mfn_to_page(_mfn(buf_mfn + i));
>> +            put_page_alloc_ref(pg);
>> +            if ( !test_and_clear_bit(_PGC_xen_heap, &pg->count_info) )
>> +                ASSERT_UNREACHABLE();
>> +            pg->u.inuse.type_info = 0;
>> +            page_set_owner(pg, NULL);
>> +            free_domheap_page(pg);
> 
> Hm, this seems fairly dangerous, what guarantees that the caller is
> not going to map the buffer while you are trying to tear it down?
> 
> You perform a check before freeing ipt_state, but between the check
> and the actual tearing down the domain might have set up mappings to
> them.
> 
> I wonder, could you expand a bit on why trace buffers are allocated
> from domheap memory by Xen?
> 
> There are a couple of options here, maybe the caller could provide
> its own buffer, then Xen would take an extra reference to those pages
> and set them up to be used as buffers.
> 
> Another alternative would be to use domheap memory but not let the
> caller map it directly, and instead introduce a hypercall to copy
> from the internal Xen buffer into a user-provided one.
> 
> How much memory is used on average by those buffers? That would help
> decide a model that would best fit the usage.
> 
>> +        }
>> +    }
>> +    else if ( a.cmd == HVMOP_vmtrace_ipt_get_buf )
>> +    {
>> +        if ( !v->arch.hvm.vmx.ipt_state ) {
>> +            rc = -EINVAL;
>> +            goto out;
>> +        }
>> +
>> +        a.mfn = v->arch.hvm.vmx.ipt_state->output_base >> PAGE_SHIFT;
> 
> This will not work for translated domains, ie: a PVH or HVM domain
> won't be able to use this interface since it has no way to request the
> mapping of a specific mfn into its physmap. I think we need to take
> this into account when deciding how the interface should be, so that
> we don't corner ourselves with a PV only interface.
> 
>> +        a.size = (v->arch.hvm.vmx.ipt_state->output_mask + 1) & 0xFFFFFFFFUL;
> 
> You can truncate it easier by casting to uint32_t I think.
> 
> Or even better, you could put output_mask in a union like:
> 
> union {
>     uint64_t raw;
>     struct {
>         uint32_t size;
>         uint32_t offset;
>     };
> };
> 
> Then you can avoid the shifting and the castings.
> 
>> +    }
>> +    else if ( a.cmd == HVMOP_vmtrace_ipt_get_offset )
>> +    {
>> +        if ( !v->arch.hvm.vmx.ipt_state ) {
>> +            rc = -EINVAL;
>> +            goto out;
>> +        }
>> +
>> +        a.offset = v->arch.hvm.vmx.ipt_state->output_mask >> 32;
>> +    }
>> +
>> +    rc = -EFAULT;
>> +    if ( __copy_to_guest(arg, &a, 1) )
>> +      goto out;
>> +    rc = 0;
>> +
>> + out:
>> +    smp_wmb();
> 
> Why do you need a barrier here?
> 
>> +    domain_unpause(d);
>> +    rcu_unlock_domain(d);
>> +
>> +    return rc;
>> +}
>> +
>> +DEFINE_XEN_GUEST_HANDLE(compat_hvm_vmtrace_op_t);
>> +
>>  static int hvmop_get_mem_type(
>>      XEN_GUEST_HANDLE_PARAM(xen_hvm_get_mem_type_t) arg)
>>  {
>> @@ -5101,6 +5267,10 @@ long do_hvm_op(unsigned long op,
>> XEN_GUEST_HANDLE_PARAM(void) arg)
>>          rc = current->hcall_compat ? compat_altp2m_op(arg) : do_altp2m_op(arg);
>>          break;
>>  
>> +    case HVMOP_vmtrace:
>> +        rc = do_vmtrace_op(arg);
>> +        break;
>> +
>>      default:
>>      {
>>          gdprintk(XENLOG_DEBUG, "Bad HVM op %ld.\n", op);
>> diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
>> index 870ec52060..3bbcd54c96 100644
>> --- a/xen/include/public/hvm/hvm_op.h
>> +++ b/xen/include/public/hvm/hvm_op.h
>> @@ -382,6 +382,33 @@ struct xen_hvm_altp2m_op {
>>  typedef struct xen_hvm_altp2m_op xen_hvm_altp2m_op_t;
>>  DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_op_t);
>>  
>> +/* HVMOP_vmtrace: Perform VM tracing related operation */
>> +#define HVMOP_vmtrace 26
>> +
>> +#define HVMOP_VMTRACE_INTERFACE_VERSION 0x00000001
>> +
>> +struct xen_hvm_vmtrace_op {
>> +    /* IN variable */
>> +    uint32_t version;   /* HVMOP_VMTRACE_INTERFACE_VERSION */
>> +    uint32_t cmd;
>> +/* Enable/disable external vmtrace for given domain */
>> +#define HVMOP_vmtrace_ipt_enable      1
>> +#define HVMOP_vmtrace_ipt_disable     2
>> +#define HVMOP_vmtrace_ipt_get_buf     3
>> +#define HVMOP_vmtrace_ipt_get_offset  4
>> +    domid_t domain;
> 
> You are missing a padding field here AFAICT.


Could you point out what is the purpose of this padding field and what should be the size in this case? It's pretty unclear to me.


> 
> Roger.



* Re: [PATCH v1 4/7] x86/vmx: add do_vmtrace_op
  2020-06-18 15:25     ` Michał Leszczyński
@ 2020-06-18 15:39       ` Jan Beulich
  2020-06-18 15:47         ` Tamas K Lengyel
  0 siblings, 1 reply; 59+ messages in thread
From: Jan Beulich @ 2020-06-18 15:39 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Xen-devel, Roger Pau Monné

On 18.06.2020 17:25, Michał Leszczyński wrote:
> ----- On 16 Jun 2020 at 19:23, Roger Pau Monné roger.pau@citrix.com wrote:
>> On Tue, Jun 16, 2020 at 05:22:06PM +0200, Michał Leszczyński wrote:
>>> --- a/xen/include/public/hvm/hvm_op.h
>>> +++ b/xen/include/public/hvm/hvm_op.h
>>> @@ -382,6 +382,33 @@ struct xen_hvm_altp2m_op {
>>>  typedef struct xen_hvm_altp2m_op xen_hvm_altp2m_op_t;
>>>  DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_op_t);
>>>  
>>> +/* HVMOP_vmtrace: Perform VM tracing related operation */
>>> +#define HVMOP_vmtrace 26
>>> +
>>> +#define HVMOP_VMTRACE_INTERFACE_VERSION 0x00000001
>>> +
>>> +struct xen_hvm_vmtrace_op {
>>> +    /* IN variable */
>>> +    uint32_t version;   /* HVMOP_VMTRACE_INTERFACE_VERSION */
>>> +    uint32_t cmd;
>>> +/* Enable/disable external vmtrace for given domain */
>>> +#define HVMOP_vmtrace_ipt_enable      1
>>> +#define HVMOP_vmtrace_ipt_disable     2
>>> +#define HVMOP_vmtrace_ipt_get_buf     3
>>> +#define HVMOP_vmtrace_ipt_get_offset  4
>>> +    domid_t domain;
>>
>> You are missing a padding field here AFAICT.
> 
> 
> Could you point out what is the purpose of this padding field and what should be the size in this case? It's pretty unclear to me.

In the public interface we aim at making all padding explicit.

Also, as a general remark: Please trim your replies.

Jan



* Re: [PATCH v1 4/7] x86/vmx: add do_vmtrace_op
  2020-06-18 15:39       ` Jan Beulich
@ 2020-06-18 15:47         ` Tamas K Lengyel
  2020-06-18 15:49           ` Tamas K Lengyel
  0 siblings, 1 reply; 59+ messages in thread
From: Tamas K Lengyel @ 2020-06-18 15:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Michał Leszczyński, Ian Jackson, George Dunlap,
	Xen-devel, Roger Pau Monné

On Thu, Jun 18, 2020 at 9:41 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 18.06.2020 17:25, Michał Leszczyński wrote:
> > ----- On 16 Jun 2020 at 19:23, Roger Pau Monné roger.pau@citrix.com wrote:
> >> On Tue, Jun 16, 2020 at 05:22:06PM +0200, Michał Leszczyński wrote:
> >>> --- a/xen/include/public/hvm/hvm_op.h
> >>> +++ b/xen/include/public/hvm/hvm_op.h
> >>> @@ -382,6 +382,33 @@ struct xen_hvm_altp2m_op {
> >>>  typedef struct xen_hvm_altp2m_op xen_hvm_altp2m_op_t;
> >>>  DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_op_t);
> >>>
> >>> +/* HVMOP_vmtrace: Perform VM tracing related operation */
> >>> +#define HVMOP_vmtrace 26
> >>> +
> >>> +#define HVMOP_VMTRACE_INTERFACE_VERSION 0x00000001
> >>> +
> >>> +struct xen_hvm_vmtrace_op {
> >>> +    /* IN variable */
> >>> +    uint32_t version;   /* HVMOP_VMTRACE_INTERFACE_VERSION */
> >>> +    uint32_t cmd;
> >>> +/* Enable/disable external vmtrace for given domain */
> >>> +#define HVMOP_vmtrace_ipt_enable      1
> >>> +#define HVMOP_vmtrace_ipt_disable     2
> >>> +#define HVMOP_vmtrace_ipt_get_buf     3
> >>> +#define HVMOP_vmtrace_ipt_get_offset  4
> >>> +    domid_t domain;
> >>
> >> You are missing a padding field here AFAICT.
> >
> >
> > Could you point out what is the purpose of this padding field and what should be the size in this case? It's pretty unclear to me.
>
> In the public interface we aim at making all padding explicit.

Just to expand a bit on this: this is an ABI meaning the hypervisor
and the tool sending this structure communicate via memory only. Since
the hypervisor and the compiler can be compiled using different
compilers, using stuff that's not explicit in the C standard needs to
be avoided. For example using standard types like "int" or "long" is
no good since it's really up to the compiler to decide how much memory
those need. The C specification is great like that. The same goes for
padding. Different compilers can decide to align things differently,
pack things or not pack things, so we have to manually add the padding
to take that choice away from the compiler.
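
A minimal sketch of what explicit padding looks like in practice. The
struct below is hypothetical: it borrows the first three fields from the
patch under review, while the pad, vcpu and mfn members, their order and
the struct name are assumptions made for this example. domid_t is
declared locally as a 16-bit type so that the snippet stands alone.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t domid_t; /* local stand-in, 16 bits wide */

/* Hypothetical layout: every byte of the structure is accounted for. */
struct example_vmtrace_op {
    uint32_t version;  /* offset 0  */
    uint32_t cmd;      /* offset 4  */
    domid_t  domain;   /* offset 8  */
    uint16_t pad;      /* offset 10: explicit, instead of compiler-chosen */
    uint32_t vcpu;     /* offset 12 */
    uint64_t mfn;      /* offset 16 */
};

/* Compile-time checks that the layout is the one written down above. */
_Static_assert(offsetof(struct example_vmtrace_op, pad) == 10,
               "pad must directly follow domain");
_Static_assert(offsetof(struct example_vmtrace_op, vcpu) == 12,
               "vcpu must sit on a 4-byte boundary");
_Static_assert(offsetof(struct example_vmtrace_op, mfn) == 16,
               "mfn must sit on an 8-byte boundary");
_Static_assert(sizeof(struct example_vmtrace_op) == 24,
               "no hidden tail padding expected");
```

Without the pad member, the two bytes after domain would still exist on
typical ABIs, but their presence and contents would be left to the
compiler. Naming them keeps the layout identical on both sides of the
interface and lets the receiver require them to be zero.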

As discussed many times on the list, using C struct as the base of the
ABI was a bad design decision to start with, but we are now stuck with
it. It would now make more sense to use something like JSON to pass
information like this between the hypervisor and the toolstack which
is what we opted to do in new hypervisors like Bareflank/Boxy.

Tamas


* Re: [PATCH v1 4/7] x86/vmx: add do_vmtrace_op
  2020-06-18 15:47         ` Tamas K Lengyel
@ 2020-06-18 15:49           ` Tamas K Lengyel
  0 siblings, 0 replies; 59+ messages in thread
From: Tamas K Lengyel @ 2020-06-18 15:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Michał Leszczyński, Ian Jackson, George Dunlap,
	Xen-devel, Roger Pau Monné

On Thu, Jun 18, 2020 at 9:47 AM Tamas K Lengyel
<tamas.k.lengyel@gmail.com> wrote:
>
> On Thu, Jun 18, 2020 at 9:41 AM Jan Beulich <jbeulich@suse.com> wrote:
> >
> > On 18.06.2020 17:25, Michał Leszczyński wrote:
> > > ----- On 16 Jun 2020 at 19:23, Roger Pau Monné roger.pau@citrix.com wrote:
> > >> On Tue, Jun 16, 2020 at 05:22:06PM +0200, Michał Leszczyński wrote:
> > >>> --- a/xen/include/public/hvm/hvm_op.h
> > >>> +++ b/xen/include/public/hvm/hvm_op.h
> > >>> @@ -382,6 +382,33 @@ struct xen_hvm_altp2m_op {
> > >>>  typedef struct xen_hvm_altp2m_op xen_hvm_altp2m_op_t;
> > >>>  DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_op_t);
> > >>>
> > >>> +/* HVMOP_vmtrace: Perform VM tracing related operation */
> > >>> +#define HVMOP_vmtrace 26
> > >>> +
> > >>> +#define HVMOP_VMTRACE_INTERFACE_VERSION 0x00000001
> > >>> +
> > >>> +struct xen_hvm_vmtrace_op {
> > >>> +    /* IN variable */
> > >>> +    uint32_t version;   /* HVMOP_VMTRACE_INTERFACE_VERSION */
> > >>> +    uint32_t cmd;
> > >>> +/* Enable/disable external vmtrace for given domain */
> > >>> +#define HVMOP_vmtrace_ipt_enable      1
> > >>> +#define HVMOP_vmtrace_ipt_disable     2
> > >>> +#define HVMOP_vmtrace_ipt_get_buf     3
> > >>> +#define HVMOP_vmtrace_ipt_get_offset  4
> > >>> +    domid_t domain;
> > >>
> > >> You are missing a padding field here AFAICT.
> > >
> > >
> > > Could you point out what is the purpose of this padding field and what should be the size in this case? It's pretty unclear to me.
> >
> > In the public interface we aim at making all padding explicit.
>
> Just to expand a bit on this: this is an ABI meaning the hypervisor
> and the tool sending this structure communicate via memory only. Since
> the hypervisor and the compiler can be compiled using different

   ^ meant to write "hypervisor and the toolstack" above

Tamas



* Re: [PATCH v1 7/7] x86/vmx: switch IPT MSRs on vmentry/vmexit
  2020-06-16 15:24 ` [PATCH v1 7/7] x86/vmx: switch IPT MSRs on vmentry/vmexit Michał Leszczyński
  2020-06-16 17:38   ` Roger Pau Monné
@ 2020-06-18 17:38   ` Andrew Cooper
  1 sibling, 0 replies; 59+ messages in thread
From: Andrew Cooper @ 2020-06-18 17:38 UTC (permalink / raw)
  To: Michał Leszczyński, Xen-devel
  Cc: Wei Liu, Kevin Tian, Jan Beulich, Jun Nakajima, Roger Pau Monné

On 16/06/2020 16:24, Michał Leszczyński wrote:
> Enable IPT when entering the VM and disable it on vmexit.
> Register state is persisted using vCPU ipt_state structure.
>
> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 97104c319e..01d9a7b584 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -3698,6 +3698,15 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>      __vmread(GUEST_RSP,    &regs->rsp);
>      __vmread(GUEST_RFLAGS, &regs->rflags);
>  
> +    if ( unlikely(v->arch.hvm.vmx.ipt_state) )
> +    {
> +        wrmsrl(MSR_IA32_RTIT_CTL, 0);
> +        smp_rmb();
> +
> +        rdmsrl(MSR_IA32_RTIT_STATUS, v->arch.hvm.vmx.ipt_state->status);
> +        rdmsrl(MSR_IA32_RTIT_OUTPUT_MASK, v->arch.hvm.vmx.ipt_state->output_mask);
> +    }
> +
>      hvm_invalidate_regs_fields(regs);
>  
>      if ( paging_mode_hap(v->domain) )
> @@ -4497,6 +4506,23 @@ bool vmx_vmenter_helper(const struct cpu_user_regs *regs)
>      }
>  
>   out:
> +    if ( unlikely(curr->arch.hvm.vmx.ipt_state) )
> +    {
> +        wrmsrl(MSR_IA32_RTIT_CTL, 0);
> +
> +        if (curr->arch.hvm.vmx.ipt_state->ctl)
> +        {
> +            wrmsrl(MSR_IA32_RTIT_OUTPUT_BASE, curr->arch.hvm.vmx.ipt_state->output_base);
> +            wrmsrl(MSR_IA32_RTIT_OUTPUT_MASK, curr->arch.hvm.vmx.ipt_state->output_mask);
> +            wrmsrl(MSR_IA32_RTIT_STATUS, curr->arch.hvm.vmx.ipt_state->status);
> +
> +            // MSR_IA32_RTIT_CTL is context-switched manually instead of being
> +            // stored inside VMCS, as of Q2'20 only the most recent processors
> +            // support such field in VMCS
> +            wrmsrl(MSR_IA32_RTIT_CTL, curr->arch.hvm.vmx.ipt_state->ctl);
> +        }
> +    }
> +

Some notes to help with v2.

RTIT_CTL wants managing by MSR load/save list.  See how
vmx_update_guest_efer() manages MSR_EFER for the Gen1 hardware case,
because RTIT_CTL is very similar until we get to IceLake hardware and
have a GUEST_RTIT_CTRL field.

With RTIT_CTL handled by MSR load/save list, we are now certain that
TraceEn is always clear in hypervisor context, so there's no need to
explicitly zero it before playing with other MSRs.
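
To illustrate the point, here is a rough sketch as a self-contained toy
model, not Xen code.  Every name in it (toy_msr_lists, toy_vmentry(),
toy_vmexit(), toy_hw_rtit_ctl) is made up for the example, and a plain
variable stands in for the physical MSR.  It only models the idea that
the CPU loads the guest value on VM entry, then stores it and loads the
host value (0) on VM exit:

```c
#include <stdint.h>
#include <stdbool.h>

/* Toy model only: none of these names are Xen or hardware interfaces. */
#define TOY_RTIT_CTL_TRACEEN (1ULL << 0)   /* TraceEn is bit 0 of RTIT_CTL */

struct toy_msr_lists {
    uint64_t guest_rtit_ctl;  /* loaded on VM entry, stored on VM exit */
    uint64_t host_rtit_ctl;   /* loaded on VM exit; kept at 0 */
};

static uint64_t toy_hw_rtit_ctl;  /* stands in for the physical MSR */

/* What the CPU would do with the lists on VM entry. */
static void toy_vmentry(const struct toy_msr_lists *l)
{
    toy_hw_rtit_ctl = l->guest_rtit_ctl;
}

/* What the CPU would do with the lists on VM exit. */
static void toy_vmexit(struct toy_msr_lists *l)
{
    l->guest_rtit_ctl = toy_hw_rtit_ctl;  /* VM-exit MSR-store */
    toy_hw_rtit_ctl = l->host_rtit_ctl;   /* VM-exit MSR-load */
}

/* True if tracing is enabled in the simulated hardware register. */
static bool toy_tracing_enabled(void)
{
    return toy_hw_rtit_ctl & TOY_RTIT_CTL_TRACEEN;
}
```

In this model, toy_tracing_enabled() can only return true between
toy_vmentry() and the matching toy_vmexit(), which is the property that
makes the explicit wrmsrl(MSR_IA32_RTIT_CTL, 0) redundant.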


You don't need to save/restore the values in vmentry/exit, because that
is very expensive and unnecessary.  Instead, you can use
vmx_ctxt_switch_{from,to}(), which are called when the vcpu is switched
in/out of context.

Specifically, from your current code, it looks to be safe to leave
RTIT_STATUS/OUTPUT_MASK dirty in hardware across multiple
vmentries/exits while the vcpu is in context.
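
As a rough sketch of the cost difference, again a toy model rather than
Xen code: all names below (toy_ipt_state, toy_rdmsr(), toy_wrmsr(),
toy_ctxt_switch_{to,from}(), toy_run_slice()) are hypothetical, an array
stands in for the physical MSRs, and a counter records simulated MSR
accesses.  It assumes RTIT_CTL is handled separately by the MSR
load/save list, so only the remaining registers are switched here:

```c
#include <stdint.h>

/* Toy model only: hypothetical names, not Xen code. */
enum { TOY_STATUS, TOY_OUTPUT_BASE, TOY_OUTPUT_MASK, TOY_NR_MSRS };

struct toy_ipt_state {
    uint64_t msr[TOY_NR_MSRS];
};

static uint64_t toy_hw[TOY_NR_MSRS];   /* stands in for the physical MSRs */
static unsigned int toy_msr_accesses;  /* counts simulated rdmsr/wrmsr */

static uint64_t toy_rdmsr(int idx)
{
    toy_msr_accesses++;
    return toy_hw[idx];
}

static void toy_wrmsr(int idx, uint64_t val)
{
    toy_msr_accesses++;
    toy_hw[idx] = val;
}

/* Run once when the vcpu is scheduled onto a physical CPU. */
static void toy_ctxt_switch_to(const struct toy_ipt_state *s)
{
    toy_wrmsr(TOY_OUTPUT_BASE, s->msr[TOY_OUTPUT_BASE]);
    toy_wrmsr(TOY_OUTPUT_MASK, s->msr[TOY_OUTPUT_MASK]);
    toy_wrmsr(TOY_STATUS, s->msr[TOY_STATUS]);
}

/* Run once when the vcpu is descheduled. */
static void toy_ctxt_switch_from(struct toy_ipt_state *s)
{
    s->msr[TOY_STATUS] = toy_rdmsr(TOY_STATUS);
    s->msr[TOY_OUTPUT_MASK] = toy_rdmsr(TOY_OUTPUT_MASK);
}

/* One scheduling slice containing nr_exits vmexit/vmentry round trips. */
static unsigned int toy_run_slice(struct toy_ipt_state *s,
                                  unsigned int nr_exits)
{
    toy_msr_accesses = 0;
    toy_ctxt_switch_to(s);

    for ( unsigned int i = 0; i < nr_exits; i++ )
        /* Stand-in for hardware updating its state while the guest runs;
         * the register is left dirty and software does not touch it. */
        toy_hw[TOY_OUTPUT_MASK] += 1ULL << 32;

    toy_ctxt_switch_from(s);
    return toy_msr_accesses;
}
```

In this model a slice costs five MSR accesses however many vmexits it
contains, whereas the v1 hunks quoted above perform three MSR accesses
on every vmexit and five on every vmentry with tracing enabled.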

~Andrew



end of thread, other threads:[~2020-06-18 17:38 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-16 15:16 [PATCH v1 0/7] Implement support for external IPT monitoring Michał Leszczyński
2020-06-16 15:19 ` [PATCH v1 1/7] x86/vmx: add Intel PT MSR definitions Michał Leszczyński
2020-06-18 13:31   ` Jan Beulich
2020-06-16 15:20 ` [PATCH v1 2/7] x86/vmx: add IPT cpu feature Michał Leszczyński
2020-06-16 16:30   ` Roger Pau Monné
2020-06-17 11:34     ` Jan Beulich
2020-06-16 15:21 ` [PATCH v1 3/7] x86/vmx: add ipt_state as part of vCPU state Michał Leszczyński
2020-06-16 16:33   ` Roger Pau Monné
2020-06-16 15:22 ` [PATCH v1 4/7] x86/vmx: add do_vmtrace_op Michał Leszczyński
2020-06-16 17:23   ` Roger Pau Monné
2020-06-17 19:13     ` Michał Leszczyński
2020-06-18  3:20       ` Tamas K Lengyel
2020-06-18 11:01         ` Michał Leszczyński
2020-06-18 11:55           ` Roger Pau Monné
2020-06-18 12:51             ` Jan Beulich
2020-06-18 13:09               ` Michał Leszczyński
2020-06-18 13:24                 ` Jan Beulich
2020-06-18 13:40                 ` Roger Pau Monné
2020-06-18  8:46       ` Roger Pau Monné
2020-06-18 15:25     ` Michał Leszczyński
2020-06-18 15:39       ` Jan Beulich
2020-06-18 15:47         ` Tamas K Lengyel
2020-06-18 15:49           ` Tamas K Lengyel
2020-06-16 15:22 ` [PATCH v1 5/7] tools/libxc: add xc_ptbuf_* functions Michał Leszczyński
2020-06-16 15:23 ` [PATCH v1 6/7] tools/proctrace: add proctrace tool Michał Leszczyński
2020-06-16 15:24 ` [PATCH v1 7/7] x86/vmx: switch IPT MSRs on vmentry/vmexit Michał Leszczyński
2020-06-16 17:38   ` Roger Pau Monné
2020-06-16 17:47     ` Michał Leszczyński
2020-06-17  9:09       ` Roger Pau Monné
2020-06-17 11:54         ` Michał Leszczyński
2020-06-17 12:51           ` Roger Pau Monné
2020-06-17 15:14             ` Andrew Cooper
2020-06-17 18:56               ` Michał Leszczyński
2020-06-18  8:52                 ` Roger Pau Monné
2020-06-18 11:07                   ` Michał Leszczyński
2020-06-18 11:49                     ` Roger Pau Monné
2020-06-17 23:30               ` Kang, Luwei
2020-06-18 10:02                 ` Andrew Cooper
2020-06-18 17:38   ` Andrew Cooper
2020-06-16 18:17 ` [PATCH v1 0/7] Implement support for external IPT monitoring Andrew Cooper
2020-06-16 18:47   ` Michał Leszczyński
2020-06-16 20:16     ` Andrew Cooper
2020-06-17  3:02       ` Tamas K Lengyel
2020-06-17 16:19         ` Andrew Cooper
2020-06-17 16:27           ` Tamas K Lengyel
2020-06-17 17:23             ` Andrew Cooper
2020-06-17 19:31               ` Tamas K Lengyel
2020-06-17 19:30             ` Michał Leszczyński
2020-06-17 20:20           ` Michał Leszczyński
2020-06-18  8:25             ` Roger Pau Monné
2020-06-18 14:59           ` Michał Leszczyński
2020-06-17  1:35     ` Tian, Kevin
2020-06-17  6:45       ` Kang, Luwei
2020-06-17  9:21         ` Roger Pau Monné
2020-06-17 12:37           ` Kang, Luwei
2020-06-17 12:53             ` Roger Pau Monné
2020-06-17 23:29               ` Kang, Luwei
2020-06-18  0:56                 ` Michał Leszczyński
2020-06-18  7:00                   ` Roger Pau Monné
