xen-devel.lists.xenproject.org archive mirror
* [PATCH v2 0/7] Implement support for external IPT monitoring
@ 2020-06-18 23:34 Michał Leszczyński
  2020-06-18 23:38 ` [PATCH v2 1/7] xen/mm: lift 32 item limit from mfn/gfn arrays Michał Leszczyński
                   ` (7 more replies)
  0 siblings, 8 replies; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-18 23:34 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jun Nakajima,
	Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich,
	Tamas K Lengyel, Anthony PERARD, Kang, Luwei,
	Roger Pau Monné

Intel Processor Trace is an architectural extension available in modern Intel 
family CPUs. It records a detailed trace of activity while the processor 
executes code. The recorded trace can be used to reconstruct the code flow, 
i.e. to find out which code paths were executed and which branches were 
taken.

This feature is described in Intel(R) 64 and IA-32 Architectures Software 
Developer's Manual, Volume 3C: System Programming Guide, Part 3, 
Chapter 36: "Intel Processor Trace."

This patch series implements an interface that Dom0 can use to enable IPT 
for particular vCPUs of a DomU, allowing for external monitoring. Such a 
feature has numerous applications, e.g. malware analysis, fuzzing, or 
performance testing.

Thanks to Tamas K Lengyel for a few preliminary hints given before the
first version of this patch series was submitted to xen-devel.

Changed since v1:
  * MSR_RTIT_CTL is managed using MSR load lists
  * other PT-related MSRs are modified only when vCPU goes out of context
  * trace buffer is now acquired as a resource
  * added vmtrace_pt_size parameter in xl.cfg; the size of the trace
    buffer must be specified at domain creation time
  * trace buffers are allocated on domain creation and freed on
    domain destruction
  * HVMOP_vmtrace_ipt_enable/disable is limited to enabling/disabling PT;
    these calls no longer manage buffer memory
  * lifted 32 MFN/GFN array limit when acquiring resources
  * minor code style changes according to review
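
The vmtrace_pt_size value is constrained by patch 4 of this series: it must
be a power of two, at least one page and at most 4GB (the upper bound
defined in the specification). For illustration, that validation can be
mirrored as standalone C — PT_PAGE_SIZE and PT_MAX_SIZE are local names
chosen for this sketch, not Xen identifiers:

```c
#include <stdbool.h>
#include <stdint.h>

#define PT_PAGE_SIZE 4096ULL        /* x86 PAGE_SIZE */
#define PT_MAX_SIZE  (4ULL << 30)   /* 4GB upper bound from the spec */

/* Mirrors the check in hvm_set_vmtrace_pt_size(): the trace buffer
 * must be a power of two, at least one page, and at most 4GB. */
static bool vmtrace_pt_size_valid(uint64_t size)
{
    return size >= PT_PAGE_SIZE &&
           size <= PT_MAX_SIZE &&
           (size & (size - 1)) == 0;
}
```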

Michal Leszczynski (7):
  xen/mm: lift 32 item limit from mfn/gfn arrays
  x86/vmx: add Intel PT MSR definitions
  x86/vmx: add IPT cpu feature
  x86/vmx: add do_vmtrace_op
  tools/libxc: add xc_vmtrace_* functions
  tools/libxl: add vmtrace_pt_size parameter
  tools/proctrace: add proctrace tool

 tools/golang/xenlight/helpers.gen.go        |   2 +
 tools/golang/xenlight/types.gen.go          |   1 +
 tools/libxc/Makefile                        |   1 +
 tools/libxc/include/xenctrl.h               |  39 +++
 tools/libxc/xc_vmtrace.c                    |  97 ++++++
 tools/libxl/libxl_types.idl                 |   2 +
 tools/libxl/libxl_x86.c                     |   5 +
 tools/proctrace/COPYING                     | 339 ++++++++++++++++++++
 tools/proctrace/Makefile                    |  50 +++
 tools/proctrace/proctrace.c                 | 153 +++++++++
 tools/xl/xl_parse.c                         |   4 +
 xen/arch/x86/hvm/hvm.c                      | 167 ++++++++++
 xen/arch/x86/hvm/vmx/vmcs.c                 |   4 +
 xen/arch/x86/hvm/vmx/vmx.c                  |  24 ++
 xen/arch/x86/mm.c                           |  37 +++
 xen/common/domain.c                         |   1 +
 xen/common/memory.c                         |  39 +--
 xen/include/asm-x86/cpufeature.h            |   1 +
 xen/include/asm-x86/hvm/hvm.h               |   9 +
 xen/include/asm-x86/hvm/vmx/vmcs.h          |  17 +
 xen/include/asm-x86/msr-index.h             |  37 +++
 xen/include/public/arch-x86/cpufeatureset.h |   1 +
 xen/include/public/hvm/hvm_op.h             |  23 ++
 xen/include/public/hvm/params.h             |   5 +-
 xen/include/public/memory.h                 |   1 +
 xen/include/xen/sched.h                     |   3 +
 26 files changed, 1039 insertions(+), 23 deletions(-)
 create mode 100644 tools/libxc/xc_vmtrace.c
 create mode 100644 tools/proctrace/COPYING
 create mode 100644 tools/proctrace/Makefile
 create mode 100644 tools/proctrace/proctrace.c

-- 
2.20.1


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 1/7] xen/mm: lift 32 item limit from mfn/gfn arrays
  2020-06-18 23:34 [PATCH v2 0/7] Implement support for external IPT monitoring Michał Leszczyński
@ 2020-06-18 23:38 ` Michał Leszczyński
  2020-06-19 11:34   ` Roger Pau Monné
  2020-06-18 23:39 ` [PATCH v2 2/7] x86/vmx: add Intel PT MSR definitions Michał Leszczyński
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-18 23:38 UTC (permalink / raw)
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Jan Beulich, Tamas K Lengyel, Kang,
	Luwei

Replace the on-stack array allocation with a heap allocation
in order to lift the 32-item limit on the mfn/gfn arrays
used when calling acquire_resource.

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 xen/common/memory.c | 39 +++++++++++++++++----------------------
 1 file changed, 17 insertions(+), 22 deletions(-)

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 714077c1e5..e02606ebe5 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1050,12 +1050,7 @@ static int acquire_resource(
 {
     struct domain *d, *currd = current->domain;
     xen_mem_acquire_resource_t xmar;
-    /*
-     * The mfn_list and gfn_list (below) arrays are ok on stack for the
-     * moment since they are small, but if they need to grow in future
-     * use-cases then per-CPU arrays or heap allocations may be required.
-     */
-    xen_pfn_t mfn_list[32];
+    xen_pfn_t *mfn_list;
     int rc;
 
     if ( copy_from_guest(&xmar, arg, 1) )
@@ -1064,25 +1059,17 @@ static int acquire_resource(
     if ( xmar.pad != 0 )
         return -EINVAL;
 
-    if ( guest_handle_is_null(xmar.frame_list) )
-    {
-        if ( xmar.nr_frames )
-            return -EINVAL;
-
-        xmar.nr_frames = ARRAY_SIZE(mfn_list);
-
-        if ( __copy_field_to_guest(arg, &xmar, nr_frames) )
-            return -EFAULT;
-
-        return 0;
-    }
+    mfn_list = xmalloc_array(xen_pfn_t, xmar.nr_frames);
 
-    if ( xmar.nr_frames > ARRAY_SIZE(mfn_list) )
-        return -E2BIG;
+    if ( ! mfn_list )
+        return -EFAULT;
 
     rc = rcu_lock_remote_domain_by_id(xmar.domid, &d);
     if ( rc )
+    {
+        xfree(mfn_list);
         return rc;
+    }
 
     rc = xsm_domain_resource_map(XSM_DM_PRIV, d);
     if ( rc )
@@ -1111,7 +1098,7 @@ static int acquire_resource(
     }
     else
     {
-        xen_pfn_t gfn_list[ARRAY_SIZE(mfn_list)];
+        xen_pfn_t *gfn_list;
         unsigned int i;
 
         /*
@@ -1120,7 +1107,12 @@ static int acquire_resource(
          *        resource pages unless the caller is the hardware domain.
          */
         if ( !is_hardware_domain(currd) )
-            return -EACCES;
+        {
+            rc = -EACCES;
+            goto out;
+        }
+
+        gfn_list = xmalloc_array(xen_pfn_t, xmar.nr_frames);
 
         if ( copy_from_guest(gfn_list, xmar.frame_list, xmar.nr_frames) )
             rc = -EFAULT;
@@ -1133,9 +1125,12 @@ static int acquire_resource(
             if ( rc && i )
                 rc = -EIO;
         }
+
+        xfree(gfn_list);
     }
 
  out:
+    xfree(mfn_list);
     rcu_unlock_domain(d);
 
     return rc;
-- 
2.20.1
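
For context, the arrays removed here were kept on the stack and bounded to
32 entries mainly for stack-size reasons; after this patch they live on the
heap and the caller's frame count is no longer capped. A standalone sketch
of the sizing involved (illustrative only; mfn_list_bytes is a local helper,
and xen_pfn_t is 8 bytes on x86-64):

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t xen_pfn_t;  /* 8 bytes on x86-64 */

/* Bytes needed for an mfn/gfn array of nr_frames entries. With the old
 * on-stack limit of 32 entries this was a fixed 256 bytes per array;
 * a heap allocation removes that cap. */
static size_t mfn_list_bytes(unsigned int nr_frames)
{
    return (size_t)nr_frames * sizeof(xen_pfn_t);
}
```

This is what allows a monitoring application to acquire a large trace
buffer (patch 4's XENMEM_resource_vmtrace_buf) in a single call rather
than in 32-frame chunks.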



* [PATCH v2 2/7] x86/vmx: add Intel PT MSR definitions
  2020-06-18 23:34 [PATCH v2 0/7] Implement support for external IPT monitoring Michał Leszczyński
  2020-06-18 23:38 ` [PATCH v2 1/7] xen/mm: lift 32 item limit from mfn/gfn arrays Michał Leszczyński
@ 2020-06-18 23:39 ` Michał Leszczyński
  2020-06-22 12:35   ` Jan Beulich
  2020-06-18 23:40 ` [PATCH v2 3/7] x86/vmx: add IPT cpu feature Michał Leszczyński
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-18 23:39 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kang, Luwei, Wei Liu, Andrew Cooper, Jan Beulich,
	Tamas K Lengyel, Roger Pau Monné

Define constants related to Intel Processor Trace features.

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 xen/include/asm-x86/msr-index.h | 37 +++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index b328a47ed8..812516f340 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -621,4 +621,41 @@
 #define MSR_PKGC9_IRTL			0x00000634
 #define MSR_PKGC10_IRTL			0x00000635
 
+/* Intel PT MSRs */
+#define MSR_RTIT_OUTPUT_BASE           0x00000560
+#define MSR_RTIT_OUTPUT_MASK           0x00000561
+#define MSR_RTIT_CTL                   0x00000570
+#define RTIT_CTL_TRACEEN               (_AC(1, ULL) << 0)
+#define RTIT_CTL_CYCEN                 (_AC(1, ULL) << 1)
+#define RTIT_CTL_OS                    (_AC(1, ULL) << 2)
+#define RTIT_CTL_USR                   (_AC(1, ULL) << 3)
+#define RTIT_CTL_PWR_EVT_EN            (_AC(1, ULL) << 4)
+#define RTIT_CTL_FUP_ON_PTW            (_AC(1, ULL) << 5)
+#define RTIT_CTL_FABRIC_EN             (_AC(1, ULL) << 6)
+#define RTIT_CTL_CR3_FILTER            (_AC(1, ULL) << 7)
+#define RTIT_CTL_TOPA                  (_AC(1, ULL) << 8)
+#define RTIT_CTL_MTC_EN                (_AC(1, ULL) << 9)
+#define RTIT_CTL_TSC_EN                (_AC(1, ULL) << 10)
+#define RTIT_CTL_DIS_RETC              (_AC(1, ULL) << 11)
+#define RTIT_CTL_PTW_EN                (_AC(1, ULL) << 12)
+#define RTIT_CTL_BRANCH_EN             (_AC(1, ULL) << 13)
+#define RTIT_CTL_MTC_FREQ_OFFSET       14
+#define RTIT_CTL_MTC_FREQ              (0x0fULL << RTIT_CTL_MTC_FREQ_OFFSET)
+#define RTIT_CTL_CYC_THRESH_OFFSET     19
+#define RTIT_CTL_CYC_THRESH            (0x0fULL << RTIT_CTL_CYC_THRESH_OFFSET)
+#define RTIT_CTL_PSB_FREQ_OFFSET       24
+#define RTIT_CTL_PSB_FREQ              (0x0fULL << RTIT_CTL_PSB_FREQ_OFFSET)
+#define RTIT_CTL_ADDR_OFFSET(n)        (32 + 4 * (n))
+#define RTIT_CTL_ADDR(n)               (0x0fULL << RTIT_CTL_ADDR_OFFSET(n))
+#define MSR_RTIT_STATUS                0x00000571
+#define RTIT_STATUS_FILTER_EN          (_AC(1, ULL) << 0)
+#define RTIT_STATUS_CONTEXT_EN         (_AC(1, ULL) << 1)
+#define RTIT_STATUS_TRIGGER_EN         (_AC(1, ULL) << 2)
+#define RTIT_STATUS_ERROR              (_AC(1, ULL) << 4)
+#define RTIT_STATUS_STOPPED            (_AC(1, ULL) << 5)
+#define RTIT_STATUS_BYTECNT            (0x1ffffULL << 32)
+#define MSR_RTIT_CR3_MATCH             0x00000572
+#define MSR_RTIT_ADDR_A(n)             (0x00000580 + (n) * 2)
+#define MSR_RTIT_ADDR_B(n)             (0x00000581 + (n) * 2)
+
 #endif /* __ASM_MSR_INDEX_H */
-- 
2.20.1
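
As an illustration of how these bit definitions compose, the enable path in
patch 4 of this series programs MSR_RTIT_CTL with tracing enabled for both
ring 0 and ring 3 plus branch packets. A standalone sketch (the macro values
below mirror the definitions added above; ipt_enable_ctl is a local name for
this sketch):

```c
#include <stdint.h>

/* Bit positions as defined in this patch (msr-index.h) */
#define RTIT_CTL_TRACEEN    (1ULL << 0)
#define RTIT_CTL_OS         (1ULL << 2)
#define RTIT_CTL_USR        (1ULL << 3)
#define RTIT_CTL_BRANCH_EN  (1ULL << 13)

/* The value patch 4 loads into MSR_RTIT_CTL on HVMOP_vmtrace_ipt_enable:
 * tracing on, ring 0 and ring 3 both traced, branch (COFI) packets on. */
static uint64_t ipt_enable_ctl(void)
{
    return RTIT_CTL_TRACEEN | RTIT_CTL_OS | RTIT_CTL_USR |
           RTIT_CTL_BRANCH_EN;
}
```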




* [PATCH v2 3/7] x86/vmx: add IPT cpu feature
  2020-06-18 23:34 [PATCH v2 0/7] Implement support for external IPT monitoring Michał Leszczyński
  2020-06-18 23:38 ` [PATCH v2 1/7] xen/mm: lift 32 item limit from mfn/gfn arrays Michał Leszczyński
  2020-06-18 23:39 ` [PATCH v2 2/7] x86/vmx: add Intel PT MSR definitions Michał Leszczyński
@ 2020-06-18 23:40 ` Michał Leszczyński
  2020-06-19 13:44   ` Roger Pau Monné
  2020-06-22 12:40   ` Jan Beulich
  2020-06-18 23:41 ` [PATCH v2 4/7] x86/vmx: add do_vmtrace_op Michał Leszczyński
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-18 23:40 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Kang, Luwei, Jun Nakajima, Wei Liu, Andrew Cooper,
	Jan Beulich, Tamas K Lengyel, Roger Pau Monné

Check whether the Intel Processor Trace feature is supported by the
current processor. Define the hvm_pt_supported function.

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 xen/arch/x86/hvm/vmx/vmcs.c                 | 4 ++++
 xen/include/asm-x86/cpufeature.h            | 1 +
 xen/include/asm-x86/hvm/hvm.h               | 9 +++++++++
 xen/include/asm-x86/hvm/vmx/vmcs.h          | 1 +
 xen/include/public/arch-x86/cpufeatureset.h | 1 +
 5 files changed, 16 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index ca94c2bedc..8466ccb912 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -315,6 +315,10 @@ static int vmx_init_vmcs_config(void)
         if ( opt_ept_pml )
             opt |= SECONDARY_EXEC_ENABLE_PML;
 
+        /* Check whether IPT is supported in VMX operation */
+        hvm_funcs.pt_supported = cpu_has_ipt &&
+            ( _vmx_misc_cap & VMX_MISC_PT_SUPPORTED );
+
         /*
          * "APIC Register Virtualization" and "Virtual Interrupt Delivery"
          * can be set only when "use TPR shadow" is set
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index f790d5c1f8..8d7955dd87 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -104,6 +104,7 @@
 #define cpu_has_clwb            boot_cpu_has(X86_FEATURE_CLWB)
 #define cpu_has_avx512er        boot_cpu_has(X86_FEATURE_AVX512ER)
 #define cpu_has_avx512cd        boot_cpu_has(X86_FEATURE_AVX512CD)
+#define cpu_has_ipt             boot_cpu_has(X86_FEATURE_IPT)
 #define cpu_has_sha             boot_cpu_has(X86_FEATURE_SHA)
 #define cpu_has_avx512bw        boot_cpu_has(X86_FEATURE_AVX512BW)
 #define cpu_has_avx512vl        boot_cpu_has(X86_FEATURE_AVX512VL)
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 1eb377dd82..8c0d0ece67 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -96,6 +96,9 @@ struct hvm_function_table {
     /* Necessary hardware support for alternate p2m's? */
     bool altp2m_supported;
 
+    /* Hardware support for processor tracing? */
+    bool pt_supported;
+
     /* Hardware virtual interrupt delivery enable? */
     bool virtual_intr_delivery_enabled;
 
@@ -630,6 +633,12 @@ static inline bool hvm_altp2m_supported(void)
     return hvm_funcs.altp2m_supported;
 }
 
+/* returns true if hardware supports Intel Processor Trace */
+static inline bool hvm_pt_supported(void)
+{
+    return hvm_funcs.pt_supported;
+}
+
 /* updates the current hardware p2m */
 static inline void altp2m_vcpu_update_p2m(struct vcpu *v)
 {
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 906810592f..4c81093aba 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -285,6 +285,7 @@ extern u64 vmx_ept_vpid_cap;
 
 #define VMX_MISC_CR3_TARGET                     0x01ff0000
 #define VMX_MISC_VMWRITE_ALL                    0x20000000
+#define VMX_MISC_PT_SUPPORTED                   0x00004000
 
 #define VMX_TSC_MULTIPLIER_MAX                  0xffffffffffffffffULL
 
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index 5ca35d9d97..0d3f15f628 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -217,6 +217,7 @@ XEN_CPUFEATURE(SMAP,          5*32+20) /*S  Supervisor Mode Access Prevention */
 XEN_CPUFEATURE(AVX512_IFMA,   5*32+21) /*A  AVX-512 Integer Fused Multiply Add */
 XEN_CPUFEATURE(CLFLUSHOPT,    5*32+23) /*A  CLFLUSHOPT instruction */
 XEN_CPUFEATURE(CLWB,          5*32+24) /*A  CLWB instruction */
+XEN_CPUFEATURE(IPT,           5*32+25) /*   Intel Processor Trace */
 XEN_CPUFEATURE(AVX512PF,      5*32+26) /*A  AVX-512 Prefetch Instructions */
 XEN_CPUFEATURE(AVX512ER,      5*32+27) /*A  AVX-512 Exponent & Reciprocal Instrs */
 XEN_CPUFEATURE(AVX512CD,      5*32+28) /*A  AVX-512 Conflict Detection Instrs */
-- 
2.20.1
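
The detection logic added to vmx_init_vmcs_config() can be sketched as
standalone C for illustration — pt_supported here is a local stand-in for
hvm_funcs.pt_supported, with the CPUID bit and the VMX misc cap passed in
as parameters:

```c
#include <stdbool.h>
#include <stdint.h>

#define VMX_MISC_PT_SUPPORTED 0x00004000  /* bit 14 of IA32_VMX_MISC */

/* IPT is usable under VMX only when CPUID advertises PT (cpu_has_ipt)
 * AND IA32_VMX_MISC bit 14 is set, matching the check added to
 * vmx_init_vmcs_config(). */
static bool pt_supported(bool cpu_has_ipt, uint64_t vmx_misc_cap)
{
    return cpu_has_ipt && (vmx_misc_cap & VMX_MISC_PT_SUPPORTED);
}
```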




* [PATCH v2 4/7] x86/vmx: add do_vmtrace_op
  2020-06-18 23:34 [PATCH v2 0/7] Implement support for external IPT monitoring Michał Leszczyński
                   ` (2 preceding siblings ...)
  2020-06-18 23:40 ` [PATCH v2 3/7] x86/vmx: add IPT cpu feature Michał Leszczyński
@ 2020-06-18 23:41 ` Michał Leszczyński
  2020-06-19  0:46   ` Michał Leszczyński
                     ` (3 more replies)
  2020-06-18 23:41 ` [PATCH v2 5/7] tools/libxc: add xc_vmtrace_* functions Michał Leszczyński
                   ` (3 subsequent siblings)
  7 siblings, 4 replies; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-18 23:41 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jun Nakajima,
	Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich,
	Tamas K Lengyel, Kang, Luwei, Roger Pau Monné

Provide an interface for privileged domains to manage
external IPT monitoring. Guest IPT state is preserved
across vmentry/vmexit using the ipt_state structure.

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 xen/arch/x86/hvm/hvm.c             | 167 +++++++++++++++++++++++++++++
 xen/arch/x86/hvm/vmx/vmx.c         |  24 +++++
 xen/arch/x86/mm.c                  |  37 +++++++
 xen/common/domain.c                |   1 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |  16 +++
 xen/include/public/hvm/hvm_op.h    |  23 ++++
 xen/include/public/hvm/params.h    |   5 +-
 xen/include/public/memory.h        |   1 +
 xen/include/xen/sched.h            |   3 +
 9 files changed, 276 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5bb47583b3..145ad053d2 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1612,6 +1612,24 @@ int hvm_vcpu_initialise(struct vcpu *v)
     return rc;
 }
 
+void hvm_vmtrace_destroy(struct vcpu *v)
+{
+    unsigned int i;
+    struct page_info *pg;
+    struct ipt_state *ipt = v->arch.hvm.vmx.ipt_state;
+    mfn_t buf_mfn = ipt->output_base >> PAGE_SHIFT;
+    size_t buf_size = ipt->output_mask.size + 1;
+
+    xfree(ipt);
+    v->arch.hvm.vmx.ipt_state = NULL;
+
+    for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ )
+    {
+        pg = mfn_to_page(_mfn(mfn_add(buf_mfn, i)));
+        free_domheap_page(pg);
+    }
+}
+
 void hvm_vcpu_destroy(struct vcpu *v)
 {
     viridian_vcpu_deinit(v);
@@ -1631,6 +1649,8 @@ void hvm_vcpu_destroy(struct vcpu *v)
     vlapic_destroy(v);
 
     hvm_vcpu_cacheattr_destroy(v);
+
+    hvm_vmtrace_destroy(v);
 }
 
 void hvm_vcpu_down(struct vcpu *v)
@@ -4066,6 +4086,51 @@ static int hvmop_set_evtchn_upcall_vector(
     return 0;
 }
 
+static int hvm_set_vmtrace_pt_size(struct domain *d, uint64_t value)
+{
+    void *buf;
+    unsigned int buf_order;
+    struct page_info *pg;
+    struct ipt_state *ipt;
+    struct vcpu *v;
+
+    if ( value < PAGE_SIZE ||
+         value > GB(4) ||
+         ( value & (value - 1) ) ) {
+        /* We don't accept a trace buffer smaller than a single page;
+         * the upper bound is defined as 4GB in the specification. */
+        return -EINVAL;
+    }
+
+    for_each_vcpu ( d, v )
+    {
+        buf_order = get_order_from_bytes(value);
+        pg = alloc_domheap_pages(d, buf_order, MEMF_no_refcount);
+
+        if ( !pg )
+            return -EFAULT;
+
+        buf = page_to_virt(pg);
+
+        if ( vmx_add_host_load_msr(v, MSR_RTIT_CTL, 0) )
+            return -EFAULT;
+
+        ipt = xmalloc(struct ipt_state);
+
+        if ( !ipt )
+            return -EFAULT;
+
+        ipt->output_base = virt_to_mfn(buf) << PAGE_SHIFT;
+        ipt->output_mask.raw = value - 1;
+        ipt->status = 0;
+        ipt->active = 0;
+
+        v->arch.hvm.vmx.ipt_state = ipt;
+    }
+
+    return 0;
+}
+
 static int hvm_allow_set_param(struct domain *d,
                                uint32_t index,
                                uint64_t new_value)
@@ -4127,6 +4192,7 @@ static int hvm_allow_set_param(struct domain *d,
     case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
     case HVM_PARAM_ALTP2M:
     case HVM_PARAM_MCA_CAP:
+    case HVM_PARAM_VMTRACE_PT_SIZE:
         if ( value != 0 && new_value != value )
             rc = -EEXIST;
         break;
@@ -4328,6 +4394,9 @@ static int hvm_set_param(struct domain *d, uint32_t index, uint64_t value)
     case HVM_PARAM_MCA_CAP:
         rc = vmce_enable_mca_cap(d, value);
         break;
+    case HVM_PARAM_VMTRACE_PT_SIZE:
+        rc = hvm_set_vmtrace_pt_size(d, value);
+        break;
     }
 
     if ( !rc )
@@ -4949,6 +5018,100 @@ static int compat_altp2m_op(
     return rc;
 }
 
+static int do_vmtrace_op(XEN_GUEST_HANDLE_PARAM(void) arg)
+{
+    struct xen_hvm_vmtrace_op a;
+    struct domain *d;
+    int rc;
+    struct vcpu *v;
+    struct ipt_state *ipt;
+
+    if ( !hvm_pt_supported() )
+        return -EOPNOTSUPP;
+
+    if ( copy_from_guest(&a, arg, 1) )
+        return -EFAULT;
+
+    if ( a.version != HVMOP_VMTRACE_INTERFACE_VERSION )
+        return -EINVAL;
+
+    d = rcu_lock_domain_by_any_id(a.domain);
+    spin_lock(&d->vmtrace_lock);
+
+    if ( d == NULL )
+        return -ESRCH;
+
+    if ( !is_hvm_domain(d) )
+    {
+        rc = -EOPNOTSUPP;
+        goto out;
+    }
+
+    domain_pause(d);
+
+    if ( a.vcpu >= d->max_vcpus )
+    {
+        rc = -EINVAL;
+        goto out;
+    }
+
+    v = d->vcpu[a.vcpu];
+    ipt = v->arch.hvm.vmx.ipt_state;
+
+    if ( !ipt )
+    {
+        /*
+         * PT must first be initialized at domain creation.
+         */
+        rc = -EINVAL;
+        goto out;
+    }
+
+    switch ( a.cmd )
+    {
+    case HVMOP_vmtrace_ipt_enable:
+        if ( vmx_add_guest_msr(v, MSR_RTIT_CTL,
+                               RTIT_CTL_TRACEEN | RTIT_CTL_OS |
+                               RTIT_CTL_USR | RTIT_CTL_BRANCH_EN) )
+        {
+            rc = -EFAULT;
+            goto out;
+        }
+
+        ipt->active = 1;
+        break;
+    case HVMOP_vmtrace_ipt_disable:
+        if ( vmx_add_guest_msr(v, MSR_RTIT_CTL, 0) )
+        {
+            rc = -EFAULT;
+            goto out;
+        }
+
+        ipt->active = 0;
+        break;
+    case HVMOP_vmtrace_ipt_get_offset:
+        a.offset = ipt->output_mask.offset;
+        break;
+    default:
+        rc = -EOPNOTSUPP;
+        goto out;
+    }
+
+    rc = -EFAULT;
+    if ( __copy_to_guest(arg, &a, 1) )
+      goto out;
+    rc = 0;
+
+ out:
+    domain_unpause(d);
+    spin_unlock(&d->vmtrace_lock);
+    rcu_unlock_domain(d);
+
+    return rc;
+}
+
+DEFINE_XEN_GUEST_HANDLE(compat_hvm_vmtrace_op_t);
+
 static int hvmop_get_mem_type(
     XEN_GUEST_HANDLE_PARAM(xen_hvm_get_mem_type_t) arg)
 {
@@ -5101,6 +5264,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         rc = current->hcall_compat ? compat_altp2m_op(arg) : do_altp2m_op(arg);
         break;
 
+    case HVMOP_vmtrace:
+        rc = do_vmtrace_op(arg);
+        break;
+
     default:
     {
         gdprintk(XENLOG_DEBUG, "Bad HVM op %ld.\n", op);
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index ab19d9424e..51f0046483 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -508,11 +508,25 @@ static void vmx_restore_host_msrs(void)
 
 static void vmx_save_guest_msrs(struct vcpu *v)
 {
+    uint64_t rtit_ctl;
+
     /*
      * We cannot cache SHADOW_GS_BASE while the VCPU runs, as it can
      * be updated at any time via SWAPGS, which we cannot trap.
      */
     v->arch.hvm.vmx.shadow_gs = rdgsshadow();
+
+    if ( unlikely(v->arch.hvm.vmx.ipt_state && v->arch.hvm.vmx.ipt_state->active) )
+    {
+        smp_rmb();
+        rdmsrl(MSR_RTIT_CTL, rtit_ctl);
+
+        if ( rtit_ctl & RTIT_CTL_TRACEEN )
+            BUG();
+
+        rdmsrl(MSR_RTIT_STATUS, v->arch.hvm.vmx.ipt_state->status);
+        rdmsrl(MSR_RTIT_OUTPUT_MASK, v->arch.hvm.vmx.ipt_state->output_mask.raw);
+    }
 }
 
 static void vmx_restore_guest_msrs(struct vcpu *v)
@@ -524,6 +538,16 @@ static void vmx_restore_guest_msrs(struct vcpu *v)
 
     if ( cpu_has_msr_tsc_aux )
         wrmsr_tsc_aux(v->arch.msrs->tsc_aux);
+
+    if ( unlikely(v->arch.hvm.vmx.ipt_state && v->arch.hvm.vmx.ipt_state->active) )
+    {
+        wrmsrl(MSR_RTIT_OUTPUT_BASE,
+            v->arch.hvm.vmx.ipt_state->output_base);
+        wrmsrl(MSR_RTIT_OUTPUT_MASK,
+            v->arch.hvm.vmx.ipt_state->output_mask.raw);
+        wrmsrl(MSR_RTIT_STATUS,
+            v->arch.hvm.vmx.ipt_state->status);
+    }
 }
 
 void vmx_update_cpu_exec_control(struct vcpu *v)
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e376fc7e8f..e4658dc27f 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4624,6 +4624,43 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
         }
         break;
     }
+
+    case XENMEM_resource_vmtrace_buf:
+    {
+        mfn_t mfn;
+        unsigned int i;
+        struct ipt_state *ipt;
+
+        if ( id >= d->max_vcpus )
+        {
+            rc = -EINVAL;
+            break;
+        }
+
+        ipt = d->vcpu[id]->arch.hvm.vmx.ipt_state;
+
+        if ( !ipt )
+        {
+            rc = -EINVAL;
+            break;
+        }
+
+        mfn = mfn_x(ipt->output_base >> PAGE_SHIFT);
+
+        if (nr_frames > ( ( ipt->output_mask.size + 1 ) >> PAGE_SHIFT ))
+        {
+            rc = -EINVAL;
+            break;
+        }
+
+        rc = 0;
+        for ( i = 0; i < nr_frames; i++ )
+        {
+            mfn_list[i] = mfn_add(mfn, i);
+        }
+
+        break;
+    }
 #endif
 
     default:
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 7cc9526139..6f6458cd5b 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -414,6 +414,7 @@ struct domain *domain_create(domid_t domid,
     d->shutdown_code = SHUTDOWN_CODE_INVALID;
 
     spin_lock_init(&d->pbuf_lock);
+    spin_lock_init(&d->vmtrace_lock);
 
     rwlock_init(&d->vnuma_rwlock);
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 4c81093aba..9fc4653777 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -104,6 +104,19 @@ struct pi_blocking_vcpu {
     spinlock_t           *lock;
 };
 
+struct ipt_state {
+    uint64_t active;
+    uint64_t status;
+    uint64_t output_base;
+    union {
+        uint64_t raw;
+        struct {
+            uint32_t size;
+            uint32_t offset;
+        };
+    } output_mask;
+};
+
 struct vmx_vcpu {
     /* Physical address of VMCS. */
     paddr_t              vmcs_pa;
@@ -186,6 +199,9 @@ struct vmx_vcpu {
      * pCPU and wakeup the related vCPU.
      */
     struct pi_blocking_vcpu pi_blocking;
+
+    /* State of Intel Processor Trace feature */
+    struct ipt_state     *ipt_state;
 };
 
 int vmx_create_vmcs(struct vcpu *v);
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index 870ec52060..8cd0b6ea49 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -382,6 +382,29 @@ struct xen_hvm_altp2m_op {
 typedef struct xen_hvm_altp2m_op xen_hvm_altp2m_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_op_t);
 
+/* HVMOP_vmtrace: Perform VM tracing related operation */
+#define HVMOP_vmtrace 26
+
+#define HVMOP_VMTRACE_INTERFACE_VERSION 0x00000001
+
+struct xen_hvm_vmtrace_op {
+    /* IN variable */
+    uint32_t version;   /* HVMOP_VMTRACE_INTERFACE_VERSION */
+    uint32_t cmd;
+/* Enable/disable external vmtrace for given domain */
+#define HVMOP_vmtrace_ipt_enable      1
+#define HVMOP_vmtrace_ipt_disable     2
+#define HVMOP_vmtrace_ipt_get_offset  3
+    domid_t domain;
+    uint32_t vcpu;
+    uint64_t size;
+
+    /* OUT variable */
+    uint64_t offset;
+};
+typedef struct xen_hvm_vmtrace_op xen_hvm_vmtrace_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_vmtrace_op_t);
+
 #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
 
 /*
diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
index 0a91bfa749..adbc7e5d08 100644
--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -300,6 +300,9 @@
 #define XEN_HVM_MCA_CAP_LMCE   (xen_mk_ullong(1) << 0)
 #define XEN_HVM_MCA_CAP_MASK   XEN_HVM_MCA_CAP_LMCE
 
-#define HVM_NR_PARAMS 39
+/* VM trace capabilities */
+#define HVM_PARAM_VMTRACE_PT_SIZE 39
+
+#define HVM_NR_PARAMS 40
 
 #endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index dbd35305df..f823c784c3 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -620,6 +620,7 @@ struct xen_mem_acquire_resource {
 
 #define XENMEM_resource_ioreq_server 0
 #define XENMEM_resource_grant_table 1
+#define XENMEM_resource_vmtrace_buf 2
 
     /*
      * IN - a type-specific resource identifier, which must be zero
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index ac53519d7f..b3a36f3788 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -457,6 +457,9 @@ struct domain
     unsigned    pbuf_idx;
     spinlock_t  pbuf_lock;
 
+    /* Used by vmtrace domctls */
+    spinlock_t  vmtrace_lock;
+
     /* OProfile support. */
     struct xenoprof *xenoprof;
 
-- 
2.20.1
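
The output_mask union added above mirrors the layout of the output mask
MSR in single-range output mode: on little-endian x86 the low 32 bits hold
the size mask (buffer size - 1) and the high 32 bits the current write
offset. A standalone sketch (illustrative; assumes little-endian, and
pt_init_mask is a local name mirroring the initialisation in
hvm_set_vmtrace_pt_size()):

```c
#include <stdint.h>

/* Mirrors struct ipt_state's output_mask: low 32 bits = size mask,
 * high 32 bits = current write offset (little-endian layout). */
union pt_output_mask {
    uint64_t raw;
    struct {
        uint32_t size;    /* mask: buffer size - 1 */
        uint32_t offset;  /* next byte the CPU will write */
    };
};

/* hvm_set_vmtrace_pt_size() initialises raw = value - 1, i.e. the size
 * mask with the offset starting at 0. */
static union pt_output_mask pt_init_mask(uint64_t buf_size)
{
    union pt_output_mask m = { .raw = buf_size - 1 };
    return m;
}
```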




* [PATCH v2 5/7] tools/libxc: add xc_vmtrace_* functions
  2020-06-18 23:34 [PATCH v2 0/7] Implement support for external IPT monitoring Michał Leszczyński
                   ` (3 preceding siblings ...)
  2020-06-18 23:41 ` [PATCH v2 4/7] x86/vmx: add do_vmtrace_op Michał Leszczyński
@ 2020-06-18 23:41 ` Michał Leszczyński
  2020-06-18 23:42 ` [PATCH v2 6/7] tools/libxl: add vmtrace_pt_size parameter Michał Leszczyński
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-18 23:41 UTC (permalink / raw)
  To: Xen-devel; +Cc: Tamas K Lengyel, Ian Jackson, Kang, Luwei, Wei Liu

Add functions in libxc that use the new HVMOP_vmtrace interface.

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 tools/libxc/Makefile          |  1 +
 tools/libxc/include/xenctrl.h | 39 ++++++++++++++
 tools/libxc/xc_vmtrace.c      | 97 +++++++++++++++++++++++++++++++++++
 3 files changed, 137 insertions(+)
 create mode 100644 tools/libxc/xc_vmtrace.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index fae5969a73..605e44501d 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -27,6 +27,7 @@ CTRL_SRCS-y       += xc_csched2.c
 CTRL_SRCS-y       += xc_arinc653.c
 CTRL_SRCS-y       += xc_rt.c
 CTRL_SRCS-y       += xc_tbuf.c
+CTRL_SRCS-y       += xc_vmtrace.c
 CTRL_SRCS-y       += xc_pm.c
 CTRL_SRCS-y       += xc_cpu_hotplug.c
 CTRL_SRCS-y       += xc_resume.c
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 113ddd935d..101cc9b712 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1585,6 +1585,45 @@ int xc_tbuf_set_cpu_mask(xc_interface *xch, xc_cpumap_t mask);
 
 int xc_tbuf_set_evt_mask(xc_interface *xch, uint32_t mask);
 
+/**
+ * Enable Intel Processor Trace for a given vCPU in a given DomU.
+ * The trace buffer must already have been allocated at domain creation.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_ipt_enable(xc_interface *xch, uint32_t domid,
+                          uint32_t vcpu);
+
+/**
+ * Disable Intel Processor Trace for a given vCPU in a given DomU.
+ * The trace buffer itself remains allocated until domain destruction.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_ipt_disable(xc_interface *xch, uint32_t domid,
+                           uint32_t vcpu);
+
+/**
+ * Get the current offset inside the trace ringbuffer.
+ * This allows the caller to determine how much data has been written
+ * into the buffer. Once the buffer wraps around, the offset resets to 0
+ * and the previous data is overwritten.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @parm offset location where the current trace buffer offset is written
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_ipt_get_offset(xc_interface *xch, uint32_t domid,
+                              uint32_t vcpu, uint64_t *offset);
+
 int xc_domctl(xc_interface *xch, struct xen_domctl *domctl);
 int xc_sysctl(xc_interface *xch, struct xen_sysctl *sysctl);
 
diff --git a/tools/libxc/xc_vmtrace.c b/tools/libxc/xc_vmtrace.c
new file mode 100644
index 0000000000..5f0551ad71
--- /dev/null
+++ b/tools/libxc/xc_vmtrace.c
@@ -0,0 +1,97 @@
+/******************************************************************************
+ * xc_vmtrace.c
+ *
+ * API for manipulating hardware tracing features
+ *
+ * Copyright (c) 2020, Michal Leszczynski
+ *
+ * Copyright 2020 CERT Polska. All rights reserved.
+ * Use is subject to license terms.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "xc_private.h"
+#include <xen/trace.h>
+
+int xc_vmtrace_ipt_enable(
+        xc_interface *xch, uint32_t domid, uint32_t vcpu)
+{
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_vmtrace_op_t, arg);
+    int rc = -1;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    arg->version = HVMOP_VMTRACE_INTERFACE_VERSION;
+    arg->cmd = HVMOP_vmtrace_ipt_enable;
+    arg->domain = domid;
+    arg->vcpu = vcpu;
+
+    rc = xencall2(xch->xcall, __HYPERVISOR_hvm_op, HVMOP_vmtrace,
+                  HYPERCALL_BUFFER_AS_ARG(arg));
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_vmtrace_ipt_get_offset(
+        xc_interface *xch, uint32_t domid, uint32_t vcpu, uint64_t *offset)
+{
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_vmtrace_op_t, arg);
+    int rc = -1;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    arg->version = HVMOP_VMTRACE_INTERFACE_VERSION;
+    arg->cmd = HVMOP_vmtrace_ipt_get_offset;
+    arg->domain = domid;
+    arg->vcpu = vcpu;
+
+    rc = xencall2(xch->xcall, __HYPERVISOR_hvm_op, HVMOP_vmtrace,
+                  HYPERCALL_BUFFER_AS_ARG(arg));
+
+    if ( rc == 0 )
+    {
+        *offset = arg->offset;
+    }
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_vmtrace_ipt_disable(xc_interface *xch, uint32_t domid, uint32_t vcpu)
+{
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_vmtrace_op_t, arg);
+    int rc = -1;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    arg->version = HVMOP_VMTRACE_INTERFACE_VERSION;
+    arg->cmd = HVMOP_vmtrace_ipt_disable;
+    arg->domain = domid;
+    arg->vcpu = vcpu;
+
+    rc = xencall2(xch->xcall, __HYPERVISOR_hvm_op, HVMOP_vmtrace,
+                  HYPERCALL_BUFFER_AS_ARG(arg));
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
-- 
2.20.1
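The offset semantics documented for xc_vmtrace_ipt_get_offset() imply the usual single-producer ring-buffer drain pattern on the consumer side. The helper below is a self-contained sketch (plain C, no Xen dependencies; the function name and range representation are illustrative, not part of this series) of how a monitor decides which byte ranges became valid between two polls, including the wrap-around case:

```c
#include <stdint.h>

/*
 * Given the previous and current write offsets into a ring buffer of
 * `size` bytes, report the byte ranges that became valid in between
 * as {start, length} pairs.  Returns the number of ranges (0, 1 or 2).
 *
 * Note: if the producer wraps past `last` more than once between two
 * polls, the overwritten data is silently lost -- poll often enough.
 */
static int ring_delta(uint64_t last, uint64_t cur, uint64_t size,
                      uint64_t ranges[2][2])
{
    if (cur == last)
        return 0;               /* nothing new since the last poll */

    if (cur > last) {           /* no wrap: one contiguous range */
        ranges[0][0] = last;
        ranges[0][1] = cur - last;
        return 1;
    }

    /* cur < last: the producer wrapped past the end of the buffer */
    ranges[0][0] = last;
    ranges[0][1] = size - last; /* tail of the buffer */
    ranges[1][0] = 0;
    ranges[1][1] = cur;         /* head, up to the new offset */
    return 2;
}
```

The proctrace tool in patch 7 performs exactly this computation inline before each pair of fwrite() calls.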



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 6/7] tools/libxl: add vmtrace_pt_size parameter
  2020-06-18 23:34 [PATCH v2 0/7] Implement support for external IPT monitoring Michał Leszczyński
                   ` (4 preceding siblings ...)
  2020-06-18 23:41 ` [PATCH v2 5/7] tools/libxc: add xc_vmtrace_* functions Michał Leszczyński
@ 2020-06-18 23:42 ` Michał Leszczyński
  2020-06-18 23:42 ` [PATCH v2 7/7] tools/proctrace: add proctrace tool Michał Leszczyński
  2020-06-18 23:51 ` [PATCH v2 0/7] Implement support for external IPT monitoring Michał Leszczyński
  7 siblings, 0 replies; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-18 23:42 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kang, Luwei, Wei Liu, Tamas K Lengyel, Ian Jackson,
	George Dunlap, Anthony PERARD

Allow the size of the per-vCPU trace buffer to be specified at
domain creation time. It defaults to zero (meaning: tracing is not enabled).

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 tools/golang/xenlight/helpers.gen.go | 2 ++
 tools/golang/xenlight/types.gen.go   | 1 +
 tools/libxl/libxl_types.idl          | 2 ++
 tools/libxl/libxl_x86.c              | 5 +++++
 tools/xl/xl_parse.c                  | 4 ++++
 5 files changed, 14 insertions(+)

diff --git a/tools/golang/xenlight/helpers.gen.go b/tools/golang/xenlight/helpers.gen.go
index 935d3bc50a..986ebbd681 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -1117,6 +1117,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 x.ArchArm.GicVersion = GicVersion(xc.arch_arm.gic_version)
 x.ArchArm.Vuart = VuartType(xc.arch_arm.vuart)
 x.Altp2M = Altp2MMode(xc.altp2m)
+x.VmtracePtSize = int(xc.vmtrace_pt_size)
 
  return nil}
 
@@ -1592,6 +1593,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 xc.arch_arm.gic_version = C.libxl_gic_version(x.ArchArm.GicVersion)
 xc.arch_arm.vuart = C.libxl_vuart_type(x.ArchArm.Vuart)
 xc.altp2m = C.libxl_altp2m_mode(x.Altp2M)
+xc.vmtrace_pt_size = C.int(x.VmtracePtSize)
 
  return nil
  }
diff --git a/tools/golang/xenlight/types.gen.go b/tools/golang/xenlight/types.gen.go
index 663c1e86b4..41ec7cdd32 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -516,6 +516,7 @@ GicVersion GicVersion
 Vuart VuartType
 }
 Altp2M Altp2MMode
+VmtracePtSize int
 }
 
 type domainBuildInfoTypeUnion interface {
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 9d3f05f399..04c1704b72 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -645,6 +645,8 @@ libxl_domain_build_info = Struct("domain_build_info",[
     # supported by x86 HVM and ARM support is planned.
     ("altp2m", libxl_altp2m_mode),
 
+    ("vmtrace_pt_size", integer),
+
     ], dir=DIR_IN,
        copy_deprecated_fn="libxl__domain_build_info_copy_deprecated",
 )
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index e57f63282e..14be2b395a 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -404,6 +404,11 @@ static int hvm_set_conf_params(libxl__gc *gc, uint32_t domid,
             libxl_defbool_val(info->u.hvm.altp2m))
             altp2m = libxl_defbool_val(info->u.hvm.altp2m);
 
+        if (xc_hvm_param_set(xch, domid, HVM_PARAM_VMTRACE_PT_SIZE,
+                             info->vmtrace_pt_size)) {
+            LOG(ERROR, "Couldn't set HVM_PARAM_VMTRACE_PT_SIZE");
+            goto out;
+        }
         if (xc_hvm_param_set(xch, domid, HVM_PARAM_HPET_ENABLED,
                              libxl_defbool_val(info->u.hvm.hpet))) {
             LOG(ERROR, "Couldn't set HVM_PARAM_HPET_ENABLED");
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 61b4ef7b7e..6ab98dda55 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -1861,6 +1861,10 @@ void parse_config_data(const char *config_source,
         }
     }
 
+    if (!xlu_cfg_get_long(config, "vmtrace_pt_size", &l, 1)) {
+        b_info->vmtrace_pt_size = l;
+    }
+
     if (!xlu_cfg_get_list(config, "ioports", &ioports, &num_ioports, 0)) {
         b_info->num_ioports = num_ioports;
         b_info->ioports = calloc(num_ioports, sizeof(*b_info->ioports));
-- 
2.20.1
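With this parameter in place, requesting a trace buffer is a one-line addition to the guest config. A hypothetical xl.cfg fragment (the value and its unit are illustrative; this patch does not state whether the size is interpreted as bytes or pages, so check the hypervisor side of the series before relying on it):

```
# hypothetical xl.cfg fragment -- unit of vmtrace_pt_size assumed to be bytes
name = "traced-guest"
type = "hvm"
vmtrace_pt_size = 67108864    # 64 MiB per-vCPU trace buffer
```

Since the buffer is allocated at domain creation and freed at destruction, the size cannot be changed on a running domain.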




* [PATCH v2 7/7] tools/proctrace: add proctrace tool
  2020-06-18 23:34 [PATCH v2 0/7] Implement support for external IPT monitoring Michał Leszczyński
                   ` (5 preceding siblings ...)
  2020-06-18 23:42 ` [PATCH v2 6/7] tools/libxl: add vmtrace_pt_size parameter Michał Leszczyński
@ 2020-06-18 23:42 ` Michał Leszczyński
  2020-06-18 23:51 ` [PATCH v2 0/7] Implement support for external IPT monitoring Michał Leszczyński
  7 siblings, 0 replies; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-18 23:42 UTC (permalink / raw)
  To: Xen-devel; +Cc: Tamas K Lengyel, Ian Jackson, Kang, Luwei, Wei Liu

Add a demonstration tool that uses the xc_vmtrace_* calls to
manage external IPT monitoring of a DomU.

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 tools/proctrace/COPYING     | 339 ++++++++++++++++++++++++++++++++++++
 tools/proctrace/Makefile    |  50 ++++++
 tools/proctrace/proctrace.c | 153 ++++++++++++++++
 3 files changed, 542 insertions(+)
 create mode 100644 tools/proctrace/COPYING
 create mode 100644 tools/proctrace/Makefile
 create mode 100644 tools/proctrace/proctrace.c

diff --git a/tools/proctrace/COPYING b/tools/proctrace/COPYING
new file mode 100644
index 0000000000..c0a841112c
--- /dev/null
+++ b/tools/proctrace/COPYING
@@ -0,0 +1,339 @@
+		    GNU GENERAL PUBLIC LICENSE
+		       Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.
+                       59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+			    Preamble
+
+  The licenses for most software are designed to take away your
+freedom to share and change it.  By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users.  This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it.  (Some other Free Software Foundation software is covered by
+the GNU Library General Public License instead.)  You can apply it to
+your programs, too.
+
+  When we speak of free software, we are referring to freedom, not
+price.  Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+  To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+  For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have.  You must make sure that they, too, receive or can get the
+source code.  And you must show them these terms so they know their
+rights.
+
+  We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+  Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software.  If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+  Finally, any free program is threatened constantly by software
+patents.  We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary.  To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+  The precise terms and conditions for copying, distribution and
+modification follow.
+
+		    GNU GENERAL PUBLIC LICENSE
+   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+  0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License.  The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language.  (Hereinafter, translation is included without limitation in
+the term "modification".)  Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope.  The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+  1. You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+  2. You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+    a) You must cause the modified files to carry prominent notices
+    stating that you changed the files and the date of any change.
+
+    b) You must cause any work that you distribute or publish, that in
+    whole or in part contains or is derived from the Program or any
+    part thereof, to be licensed as a whole at no charge to all third
+    parties under the terms of this License.
+
+    c) If the modified program normally reads commands interactively
+    when run, you must cause it, when started running for such
+    interactive use in the most ordinary way, to print or display an
+    announcement including an appropriate copyright notice and a
+    notice that there is no warranty (or else, saying that you provide
+    a warranty) and that users may redistribute the program under
+    these conditions, and telling the user how to view a copy of this
+    License.  (Exception: if the Program itself is interactive but
+    does not normally print such an announcement, your work based on
+    the Program is not required to print an announcement.)
+
+These requirements apply to the modified work as a whole.  If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works.  But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+  3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+    a) Accompany it with the complete corresponding machine-readable
+    source code, which must be distributed under the terms of Sections
+    1 and 2 above on a medium customarily used for software interchange; or,
+
+    b) Accompany it with a written offer, valid for at least three
+    years, to give any third party, for a charge no more than your
+    cost of physically performing source distribution, a complete
+    machine-readable copy of the corresponding source code, to be
+    distributed under the terms of Sections 1 and 2 above on a medium
+    customarily used for software interchange; or,
+
+    c) Accompany it with the information you received as to the offer
+    to distribute corresponding source code.  (This alternative is
+    allowed only for noncommercial distribution and only if you
+    received the program in object code or executable form with such
+    an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it.  For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable.  However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+  4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License.  Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+  5. You are not required to accept this License, since you have not
+signed it.  However, nothing else grants you permission to modify or
+distribute the Program or its derivative works.  These actions are
+prohibited by law if you do not accept this License.  Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+  6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions.  You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+  7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License.  If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all.  For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices.  Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+  8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded.  In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+  9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time.  Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number.  If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation.  If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+  10. If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission.  For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this.  Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+			    NO WARRANTY
+
+  11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+  12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+		     END OF TERMS AND CONDITIONS
+
+	    How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+    This program is free software; you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation; either version 2 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program; If not, see <http://www.gnu.org/licenses/>.
+
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+    Gnomovision version 69, Copyright (C) year name of author
+    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+    This is free software, and you are welcome to redistribute it
+    under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License.  Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary.  Here is a sample; alter the names:
+
+  Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+  `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+  <signature of Ty Coon>, 1 April 1989
+  Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs.  If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library.  If this is what you want to do, use the GNU Library General
+Public License instead of this License.
diff --git a/tools/proctrace/Makefile b/tools/proctrace/Makefile
new file mode 100644
index 0000000000..76d7387a64
--- /dev/null
+++ b/tools/proctrace/Makefile
@@ -0,0 +1,50 @@
+# Copyright (C) CERT Polska - NASK PIB
+# Author: Michał Leszczyński <michal.leszczynski@cert.pl>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; under version 2 of the License.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+XEN_ROOT=$(CURDIR)/../..
+include $(XEN_ROOT)/tools/Rules.mk
+
+CFLAGS  += -Werror
+CFLAGS  += $(CFLAGS_libxenevtchn)
+CFLAGS  += $(CFLAGS_libxenctrl)
+LDLIBS  += $(LDLIBS_libxenctrl)
+LDLIBS  += $(LDLIBS_libxenevtchn)
+LDLIBS  += $(LDLIBS_libxenforeignmemory)
+
+# SCRIPTS = xenmon.py
+
+.PHONY: all
+all: build
+
+.PHONY: build
+build: proctrace
+
+.PHONY: install
+install: build
+	$(INSTALL_DIR) $(DESTDIR)$(sbindir)
+	$(INSTALL_PROG) proctrace $(DESTDIR)$(sbindir)/proctrace
+
+.PHONY: uninstall
+uninstall:
+	rm -f $(DESTDIR)$(sbindir)/proctrace
+
+.PHONY: clean
+clean:
+	$(RM) -f $(DEPS_RM)
+
+.PHONY: distclean
+distclean: clean
+
+proctrace: proctrace.o Makefile
+	$(CC) $(LDFLAGS) $< -o $@ $(LDLIBS) $(APPEND_LDFLAGS)
+
+-include $(DEPS_INCLUDE)
diff --git a/tools/proctrace/proctrace.c b/tools/proctrace/proctrace.c
new file mode 100644
index 0000000000..b2c3da49f1
--- /dev/null
+++ b/tools/proctrace/proctrace.c
@@ -0,0 +1,153 @@
+/******************************************************************************
+ * tools/proctrace.c
+ *
+ * Demonstration tool for collecting Intel Processor Trace data from Xen.
+ *  Can be used to externally monitor a given vCPU in a given DomU.
+ *
+ * Copyright (C) 2020 by CERT Polska - NASK PIB
+ *
+ * Authors: Michał Leszczyński, michal.leszczynski@cert.pl
+ * Date:    June, 2020
+ * 
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; under version 2 of the License.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <time.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <sys/mman.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <errno.h>
+#include <signal.h>
+#include <xenevtchn.h>
+#include <xenctrl.h>
+#include <xen/xen.h>
+#include <string.h>
+#include <sys/select.h>
+#include <getopt.h>
+
+#include <xen/xen.h>
+#include <xen/trace.h>
+#include <xenforeignmemory.h>
+#define XC_WANT_COMPAT_MAP_FOREIGN_API
+#include <xenevtchn.h>
+#include <xenctrl.h>
+
+#define BUF_SIZE (16384 * 4096)
+
+volatile int interrupted = 0;
+
+void term_handler(int signum) {
+    interrupted = 1;
+}
+
+int main(int argc, char* argv[]) {
+    xc_interface *xc;
+    uint32_t domid;
+    uint32_t vcpu_id;
+
+    int rc = -1;
+    uint8_t *buf;
+    uint64_t last_offset = 0;
+
+    xenforeignmemory_handle* fmem;
+    xenforeignmemory_resource_handle *fres;
+
+    signal(SIGINT, term_handler);
+
+    if (argc != 3) {
+        fprintf(stderr, "Usage: %s <domid> <vcpu_id>\n", argv[0]);
+        fprintf(stderr, "It's recommended to redirect this "
+                        "program's output to a file\n");
+        fprintf(stderr, "or to pipe its output to xxd or another program.\n");
+        return 1;
+    }
+
+    domid = atoi(argv[1]);
+    vcpu_id = atoi(argv[2]);
+
+    xc = xc_interface_open(0, 0, 0);
+
+    fmem = xenforeignmemory_open(0, 0);
+
+    if (!xc) {
+        fprintf(stderr, "Failed to open xc interface\n");
+        return 1;
+    }
+
+    rc = xc_vmtrace_ipt_enable(xc, domid, vcpu_id);
+
+    if (rc) {
+        fprintf(stderr, "Failed to call xc_vmtrace_ipt_enable\n");
+        return 1;
+    }
+
+    fres = xenforeignmemory_map_resource(
+        fmem, domid, XENMEM_resource_vmtrace_buf,
+        /* vcpu: */ vcpu_id,
+        /* frame: */ 0,
+        /* num_frames: */ BUF_SIZE >> XC_PAGE_SHIFT,
+        (void **)&buf,
+        PROT_READ, 0);
+
+    while (!interrupted) {
+        uint64_t offset;
+        rc = xc_vmtrace_ipt_get_offset(xc, domid, vcpu_id, &offset);
+
+        if (rc) {
+            fprintf(stderr, "Failed to call xc_vmtrace_ipt_get_offset\n");
+            return 1;
+        }
+
+        if (offset > last_offset)
+        {
+            fwrite(buf + last_offset, offset - last_offset, 1, stdout);
+        }
+        else if (offset < last_offset)
+        {
+            /* The trace buffer wrapped around. */
+            fwrite(buf + last_offset, BUF_SIZE - last_offset, 1, stdout);
+            fwrite(buf, offset, 1, stdout);
+        }
+
+        last_offset = offset;
+        usleep(1000 * 100);
+    }
+
+    rc = xenforeignmemory_unmap_resource(fmem, fres);
+
+    if (rc) {
+        fprintf(stderr, "Failed to unmap resource\n");
+        return 1;
+    }
+
+    rc = xc_vmtrace_ipt_disable(xc, domid, vcpu_id);
+
+    if (rc) {
+        fprintf(stderr, "Failed to call xc_vmtrace_ipt_disable\n");
+        return 1;
+    }
+
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 0/7] Implement support for external IPT monitoring
  2020-06-18 23:34 [PATCH v2 0/7] Implement support for external IPT monitoring Michał Leszczyński
                   ` (6 preceding siblings ...)
  2020-06-18 23:42 ` [PATCH v2 7/7] tools/proctrace: add proctrace tool Michał Leszczyński
@ 2020-06-18 23:51 ` Michał Leszczyński
  7 siblings, 0 replies; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-18 23:51 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jan Beulich,
	Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jun Nakajima,
	Tamas K Lengyel, Anthony PERARD, Kang, Luwei,
	Roger Pau Monné

----- 19 cze 2020 o 1:34, Michał Leszczyński michal.leszczynski@cert.pl napisał(a):

> Intel Processor Trace is an architectural extension available in modern Intel
> family CPUs. It allows recording the detailed trace of activity while the
> processor executes the code. One might use the recorded trace to reconstruct
> the code flow. It means, to find out the executed code paths, determine
> branches taken, and so forth.
> 
> The abovementioned feature is described in Intel(R) 64 and IA-32 Architectures
> Software Developer's Manual Volume 3C: System Programming Guide, Part 3,
> Chapter 36: "Intel Processor Trace."
> 
> This patch series implements an interface that Dom0 could use in order to
> enable IPT for particular vCPUs in DomU, allowing for external monitoring. Such
> a feature has numerous applications like malware monitoring, fuzzing, or
> performance testing.
> 
> Also thanks to Tamas K Lengyel for a few preliminary hints before
> first version of this patch was submitted to xen-devel.
> 
> Changed since v1:
>  * MSR_RTIT_CTL is managed using MSR load lists
>  * other PT-related MSRs are modified only when vCPU goes out of context
>  * trace buffer is now acquired as a resource
>  * added vmtrace_pt_size parameter in xl.cfg, the size of trace buffer
>    must be specified in the moment of domain creation
>  * trace buffers are allocated on domain creation, destructed on
>    domain destruction
>  * HVMOP_vmtrace_ipt_enable/disable is limited to enabling/disabling PT
>    these calls don't manage buffer memory anymore
>  * lifted 32 MFN/GFN array limit when acquiring resources
>  * minor code style changes according to review
> 
> Michal Leszczynski (7):
>  xen/mm: lift 32 item limit from mfn/gfn arrays
>  x86/vmx: add Intel PT MSR definitions
>  x86/vmx: add IPT cpu feature
>  x86/vmx: add do_vmtrace_op
>  tools/libxc: add xc_vmtrace_* functions
>  tools/libxl: add vmtrace_pt_size parameter
>  tools/proctrace: add proctrace tool
> 
> tools/golang/xenlight/helpers.gen.go        |   2 +
> tools/golang/xenlight/types.gen.go          |   1 +
> tools/libxc/Makefile                        |   1 +
> tools/libxc/include/xenctrl.h               |  39 +++
> tools/libxc/xc_vmtrace.c                    |  97 ++++++
> tools/libxl/libxl_types.idl                 |   2 +
> tools/libxl/libxl_x86.c                     |   5 +
> tools/proctrace/COPYING                     | 339 ++++++++++++++++++++
> tools/proctrace/Makefile                    |  50 +++
> tools/proctrace/proctrace.c                 | 153 +++++++++
> tools/xl/xl_parse.c                         |   4 +
> xen/arch/x86/hvm/hvm.c                      | 167 ++++++++++
> xen/arch/x86/hvm/vmx/vmcs.c                 |   4 +
> xen/arch/x86/hvm/vmx/vmx.c                  |  24 ++
> xen/arch/x86/mm.c                           |  37 +++
> xen/common/domain.c                         |   1 +
> xen/common/memory.c                         |  39 +--
> xen/include/asm-x86/cpufeature.h            |   1 +
> xen/include/asm-x86/hvm/hvm.h               |   9 +
> xen/include/asm-x86/hvm/vmx/vmcs.h          |  17 +
> xen/include/asm-x86/msr-index.h             |  37 +++
> xen/include/public/arch-x86/cpufeatureset.h |   1 +
> xen/include/public/hvm/hvm_op.h             |  23 ++
> xen/include/public/hvm/params.h             |   5 +-
> xen/include/public/memory.h                 |   1 +
> xen/include/xen/sched.h                     |   3 +
> 26 files changed, 1039 insertions(+), 23 deletions(-)
> create mode 100644 tools/libxc/xc_vmtrace.c
> create mode 100644 tools/proctrace/COPYING
> create mode 100644 tools/proctrace/Makefile
> create mode 100644 tools/proctrace/proctrace.c
> 
> --
> 2.20.1


Thanks for all comments related to v1. I did my best to address all of them,
and thus almost all of the code was altered. Because of that, I've decided to
post the next version at this stage.



* Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op
  2020-06-18 23:41 ` [PATCH v2 4/7] x86/vmx: add do_vmtrace_op Michał Leszczyński
@ 2020-06-19  0:46   ` Michał Leszczyński
  2020-06-19 15:30   ` Roger Pau Monné
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-19  0:46 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jan Beulich,
	Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jun Nakajima,
	Tamas K Lengyel, Kang, Luwei, Roger Pau Monné

----- 19 cze 2020 o 1:41, Michał Leszczyński michal.leszczynski@cert.pl napisał(a):

> Provide an interface for privileged domains to manage
> external IPT monitoring. Guest IPT state will be preserved
> across vmentry/vmexit using ipt_state structure.
> 
> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
> ---

...

> +struct xen_hvm_vmtrace_op {
> +    /* IN variable */
> +    uint32_t version;   /* HVMOP_VMTRACE_INTERFACE_VERSION */
> +    uint32_t cmd;
> +/* Enable/disable external vmtrace for given domain */
> +#define HVMOP_vmtrace_ipt_enable      1
> +#define HVMOP_vmtrace_ipt_disable     2
> +#define HVMOP_vmtrace_ipt_get_offset  3
> +    domid_t domain;
> +    uint32_t vcpu;
> +    uint64_t size;
> +
> +    /* OUT variable */
> +    uint64_t offset;
> +};
> +typedef struct xen_hvm_vmtrace_op xen_hvm_vmtrace_op_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_vmtrace_op_t);
> +

I've forgotten about the padding thing here. This will be fixed in the next patch version, sorry.

ml



* Re: [PATCH v2 1/7] xen/mm: lift 32 item limit from mfn/gfn arrays
  2020-06-18 23:38 ` [PATCH v2 1/7] xen/mm: lift 32 item limit from mfn/gfn arrays Michał Leszczyński
@ 2020-06-19 11:34   ` Roger Pau Monné
  2020-06-19 11:36     ` Michał Leszczyński
  2020-06-19 12:35     ` Michał Leszczyński
  0 siblings, 2 replies; 46+ messages in thread
From: Roger Pau Monné @ 2020-06-19 11:34 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Kang, Luwei, Jan Beulich,
	Tamas K Lengyel, Xen-devel

On Fri, Jun 19, 2020 at 01:38:00AM +0200, Michał Leszczyński wrote:
> Replace on-stack array allocation with heap allocation
> in order to lift the limit of 32 items in mfn/gfn arrays
> when calling acquire_resource.

I'm afraid this is not correct, you cannot allow unbounded amounts of
items to be processed like this, it's likely that you manage to
trigger the watchdog if the list is long enough, specially when doing
set_foreign_p2m_entry.

You need to process the items in batches (32 was IMO a good start), and
then add support for hypercall continuations. Take a look at how
XENMEM_populate_physmap just a couple of lines below makes use of
hypercall_create_continuation.

After processing every batch you need to check if
hypercall_preempt_check returns true and if so use
hypercall_create_continuation in order to encode a continuation.

Thanks, Roger.



* Re: [PATCH v2 1/7] xen/mm: lift 32 item limit from mfn/gfn arrays
  2020-06-19 11:34   ` Roger Pau Monné
@ 2020-06-19 11:36     ` Michał Leszczyński
  2020-06-19 11:48       ` Jan Beulich
  2020-06-19 12:35     ` Michał Leszczyński
  1 sibling, 1 reply; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-19 11:36 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Kang, Luwei, Jan Beulich,
	Tamas K Lengyel, Xen-devel

----- 19 cze 2020 o 13:34, Roger Pau Monné roger.pau@citrix.com napisał(a):

> On Fri, Jun 19, 2020 at 01:38:00AM +0200, Michał Leszczyński wrote:
>> Replace on-stack array allocation with heap allocation
>> in order to lift the limit of 32 items in mfn/gfn arrays
>> when calling acquire_resource.
> 
> I'm afraid this is not correct, you cannot allow unbounded amounts of
> items to be processed like this, it's likely that you manage to
> trigger the watchdog if the list is long enough, specially when doing
> set_foreign_p2m_entry.
> 
> You need to process the items in batches (32 was IMO a good start), and
> then add support for hypercall continuations. Take a look at how
> XENMEM_populate_physmap just a couple of lines below makes use of
> hypercall_create_continuation.
> 
> After processing every batch you need to check if
> hypercall_preempt_check returns true and if so use
> hypercall_create_continuation in order to encode a continuation.
> 
> Thanks, Roger.


Somebody previously suggested that this limit could be lifted this way,
so I would like to hear some more opinions on that.


Best regards,
Michał Leszczyński
CERT Polska



* Re: [PATCH v2 1/7] xen/mm: lift 32 item limit from mfn/gfn arrays
  2020-06-19 11:36     ` Michał Leszczyński
@ 2020-06-19 11:48       ` Jan Beulich
  2020-06-19 11:51         ` Michał Leszczyński
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Beulich @ 2020-06-19 11:48 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Kang, Luwei, Tamas K Lengyel,
	Xen-devel, Roger Pau Monné

On 19.06.2020 13:36, Michał Leszczyński wrote:
> ----- 19 cze 2020 o 13:34, Roger Pau Monné roger.pau@citrix.com napisał(a):
> 
>> On Fri, Jun 19, 2020 at 01:38:00AM +0200, Michał Leszczyński wrote:
>>> Replace on-stack array allocation with heap allocation
>>> in order to lift the limit of 32 items in mfn/gfn arrays
>>> when calling acquire_resource.
>>
>> I'm afraid this is not correct, you cannot allow unbounded amounts of
>> items to be processed like this, it's likely that you manage to
>> trigger the watchdog if the list is long enough, specially when doing
>> set_foreign_p2m_entry.
>>
>> You need to process the items in batches (32 was IMO a good start), and
>> then add support for hypercall continuations. Take a look at how
>> XENMEM_populate_physmap just a couple of lines below makes use of
>> hypercall_create_continuation.
>>
>> After processing every batch you need to check if
>> hypercall_preempt_check returns true and if so use
>> hypercall_create_continuation in order to encode a continuation.
>>
>> Thanks, Roger.
> 
> 
> Somebody previously suggested that this limit could be lifted this way,
> so I would like to hear some more opinions on that.

I did suggest the limit can be lifted, but not by processing all
pieces in one go. Whether batches of 32 or 64 or 128 are chosen
is a different thing, but you can't do arbitrary amounts without
any preemption checks.

Jan



* Re: [PATCH v2 1/7] xen/mm: lift 32 item limit from mfn/gfn arrays
  2020-06-19 11:48       ` Jan Beulich
@ 2020-06-19 11:51         ` Michał Leszczyński
  0 siblings, 0 replies; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-19 11:51 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Kang, Luwei, Tamas K Lengyel,
	Xen-devel, Roger Pau Monné

----- 19 cze 2020 o 13:48, Jan Beulich jbeulich@suse.com napisał(a):

> On 19.06.2020 13:36, Michał Leszczyński wrote:
>> ----- 19 cze 2020 o 13:34, Roger Pau Monné roger.pau@citrix.com napisał(a):
>> 
>>> On Fri, Jun 19, 2020 at 01:38:00AM +0200, Michał Leszczyński wrote:
>>>> Replace on-stack array allocation with heap allocation
>>>> in order to lift the limit of 32 items in mfn/gfn arrays
>>>> when calling acquire_resource.
>>>
>>> I'm afraid this is not correct, you cannot allow unbounded amounts of
>>> items to be processed like this, it's likely that you manage to
>>> trigger the watchdog if the list is long enough, specially when doing
>>> set_foreign_p2m_entry.
>>>
>>> You need to process the items in batches (32 was IMO a good start), and
>>> then add support for hypercall continuations. Take a look at how
>>> XENMEM_populate_physmap just a couple of lines below makes use of
>>> hypercall_create_continuation.
>>>
>>> After processing every batch you need to check if
>>> hypercall_preempt_check returns true and if so use
>>> hypercall_create_continuation in order to encode a continuation.
>>>
>>> Thanks, Roger.
>> 
>> 
>> Somebody previously suggested that this limit could be lifted this way,
>> so I would like to hear some more opinions on that.
> 
> I did suggest the limit can be lifted, but not by processing all
> pieces in one go. Whether batches of 32 or 64 or 128 are chosen
> is a different thing, but you can't do arbitrary amounts without
> any preemption checks.
> 
> Jan


Okay. I will try to correct it within v3.

Best regards,
Michał Leszczyński
CERT Polska



* Re: [PATCH v2 1/7] xen/mm: lift 32 item limit from mfn/gfn arrays
  2020-06-19 11:34   ` Roger Pau Monné
  2020-06-19 11:36     ` Michał Leszczyński
@ 2020-06-19 12:35     ` Michał Leszczyński
  2020-06-19 12:39       ` Jan Beulich
  1 sibling, 1 reply; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-19 12:35 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Kang, Luwei, Jan Beulich,
	Tamas K Lengyel, Xen-devel

----- 19 cze 2020 o 13:34, Roger Pau Monné roger.pau@citrix.com napisał(a):

> On Fri, Jun 19, 2020 at 01:38:00AM +0200, Michał Leszczyński wrote:
>> Replace on-stack array allocation with heap allocation
>> in order to lift the limit of 32 items in mfn/gfn arrays
>> when calling acquire_resource.
> 
> I'm afraid this is not correct, you cannot allow unbounded amounts of
> items to be processed like this, it's likely that you manage to
> trigger the watchdog if the list is long enough, specially when doing
> set_foreign_p2m_entry.
> 
> You need to process the items in batches (32 was IMO a good start), and
> then add support for hypercall continuations. Take a look at how
> XENMEM_populate_physmap just a couple of lines below makes use of
> hypercall_create_continuation.
> 
> After processing every batch you need to check if
> hypercall_preempt_check returns true and if so use
> hypercall_create_continuation in order to encode a continuation.
> 
> Thanks, Roger.


One more question. Are these continuations transparent from the caller side,
or do I also need to add something on the invoker side to properly handle these
continuations?


Best regards,
Michał Leszczyński
CERT Polska



* Re: [PATCH v2 1/7] xen/mm: lift 32 item limit from mfn/gfn arrays
  2020-06-19 12:35     ` Michał Leszczyński
@ 2020-06-19 12:39       ` Jan Beulich
  2020-06-22  3:00         ` Michał Leszczyński
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Beulich @ 2020-06-19 12:39 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Kang, Luwei, Tamas K Lengyel,
	Xen-devel, Roger Pau Monné

On 19.06.2020 14:35, Michał Leszczyński wrote:
> ----- 19 cze 2020 o 13:34, Roger Pau Monné roger.pau@citrix.com napisał(a):
> 
>> On Fri, Jun 19, 2020 at 01:38:00AM +0200, Michał Leszczyński wrote:
>>> Replace on-stack array allocation with heap allocation
>>> in order to lift the limit of 32 items in mfn/gfn arrays
>>> when calling acquire_resource.
>>
>> I'm afraid this is not correct, you cannot allow unbounded amounts of
>> items to be processed like this, it's likely that you manage to
>> trigger the watchdog if the list is long enough, specially when doing
>> set_foreign_p2m_entry.
>>
>> You need to process the items in batches (32 was IMO a good start), and
>> then add support for hypercall continuations. Take a look at how
>> XENMEM_populate_physmap just a couple of lines below makes use of
>> hypercall_create_continuation.
>>
>> After processing every batch you need to check if
>> hypercall_preempt_check returns true and if so use
>> hypercall_create_continuation in order to encode a continuation.
> 
> One more question. Are these continuations transparent from the caller side,
> or do I also need to add something on the invoker side to properly handle these
> continuations?

They are (mostly) transparent to the guest, yes. "Mostly" because we
have cases (iirc) where the continuation data is stored in a way that
a guest could observe it. But it still wouldn't need to do anything
in order for the hypercall to get continued until it completes (which
may be "fails", faod).

Jan



* Re: [PATCH v2 3/7] x86/vmx: add IPT cpu feature
  2020-06-18 23:40 ` [PATCH v2 3/7] x86/vmx: add IPT cpu feature Michał Leszczyński
@ 2020-06-19 13:44   ` Roger Pau Monné
  2020-06-19 14:22     ` Michał Leszczyński
  2020-06-22  2:49     ` Michał Leszczyński
  2020-06-22 12:40   ` Jan Beulich
  1 sibling, 2 replies; 46+ messages in thread
From: Roger Pau Monné @ 2020-06-19 13:44 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Kevin Tian, Kang, Luwei, Jun Nakajima, Wei Liu, Andrew Cooper,
	Jan Beulich, Tamas K Lengyel, Xen-devel

On Fri, Jun 19, 2020 at 01:40:21AM +0200, Michał Leszczyński wrote:
> Check if Intel Processor Trace feature is supported by current
> processor. Define hvm_ipt_supported function.
> 
> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
> ---

We usually keep a short list of the changes between versions, so it's
easier for the reviewers to know what changed. As an example:

https://lore.kernel.org/xen-devel/20200613184132.11880-1-julien@xen.org/

>  xen/arch/x86/hvm/vmx/vmcs.c                 | 4 ++++
>  xen/include/asm-x86/cpufeature.h            | 1 +
>  xen/include/asm-x86/hvm/hvm.h               | 9 +++++++++
>  xen/include/asm-x86/hvm/vmx/vmcs.h          | 1 +
>  xen/include/public/arch-x86/cpufeatureset.h | 1 +
>  5 files changed, 16 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index ca94c2bedc..8466ccb912 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -315,6 +315,10 @@ static int vmx_init_vmcs_config(void)
>          if ( opt_ept_pml )
>              opt |= SECONDARY_EXEC_ENABLE_PML;
>  
> +        /* Check whether IPT is supported in VMX operation */
> +        hvm_funcs.pt_supported = cpu_has_ipt &&
> +            ( _vmx_misc_cap & VMX_MISC_PT_SUPPORTED );

By the placement of this chunk you are tying IPT support to the
secondary exec availability, but I don't think that's required?

Ie: You should move the read of misc_cap to the top-level of the
function and perform the VMX_MISC_PT_SUPPORTED check there also.

Note that space inside parentheses is only required for conditions of
'if', 'for' and those kinds of statements; here it's not required, so
this should be:

    hvm_funcs.pt_supported = cpu_has_ipt &&
                             (_vmx_misc_cap & VMX_MISC_PT_SUPPORTED);

I also think this should look like:

    if ( !smp_processor_id() )
    	hvm_funcs.pt_supported = cpu_has_ipt &&
                                 (_vmx_misc_cap & VMX_MISC_PT_SUPPORTED);
    else if ( hvm_funcs.pt_supported &&
              !(_vmx_misc_cap & VMX_MISC_PT_SUPPORTED) )
    {
        printk("VMX: IPT capabilities fatally differ between CPU%u and CPU0\n",
               smp_processor_id());
        return -EINVAL;
    }


So that you can detect mismatches between CPUs.

Thanks, Roger.



* Re: [PATCH v2 3/7] x86/vmx: add IPT cpu feature
  2020-06-19 13:44   ` Roger Pau Monné
@ 2020-06-19 14:22     ` Michał Leszczyński
  2020-06-19 15:31       ` Roger Pau Monné
  2020-06-22  2:49     ` Michał Leszczyński
  1 sibling, 1 reply; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-19 14:22 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Kevin Tian, Kang, Luwei, Jun Nakajima, Wei Liu, Andrew Cooper,
	Jan Beulich, Tamas K Lengyel, Xen-devel

----- 19 cze 2020 o 15:44, Roger Pau Monné roger.pau@citrix.com napisał(a):

> On Fri, Jun 19, 2020 at 01:40:21AM +0200, Michał Leszczyński wrote:
>> Check if Intel Processor Trace feature is supported by current
>> processor. Define hvm_ipt_supported function.
>> 
>> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
>> ---
> 
> We usually keep a short list of the changes between versions, so it's
> easier for the reviewers to know what changed. As an example:
> 
> https://lore.kernel.org/xen-devel/20200613184132.11880-1-julien@xen.org/
> 

There is a change list in the cover letter. Should I also add a changelog
for each individual patch?


>>  xen/arch/x86/hvm/vmx/vmcs.c                 | 4 ++++
>>  xen/include/asm-x86/cpufeature.h            | 1 +
>>  xen/include/asm-x86/hvm/hvm.h               | 9 +++++++++
>>  xen/include/asm-x86/hvm/vmx/vmcs.h          | 1 +
>>  xen/include/public/arch-x86/cpufeatureset.h | 1 +
>>  5 files changed, 16 insertions(+)
>> 
>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>> index ca94c2bedc..8466ccb912 100644
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -315,6 +315,10 @@ static int vmx_init_vmcs_config(void)
>>          if ( opt_ept_pml )
>>              opt |= SECONDARY_EXEC_ENABLE_PML;
>>  
>> +        /* Check whether IPT is supported in VMX operation */
>> +        hvm_funcs.pt_supported = cpu_has_ipt &&
>> +            ( _vmx_misc_cap & VMX_MISC_PT_SUPPORTED );
> 
> By the placement of this chunk you are tying IPT support to the
> secondary exec availability, but I don't think that's required?
> 
> Ie: You should move the read of misc_cap to the top-level of the
> function and perform the VMX_MISC_PT_SUPPORTED check there also.
> 
> Note that space inside parentheses is only required for conditions of
> 'if', 'for' and those kinds of statements; here it's not required, so
> this should be:
> 
>    hvm_funcs.pt_supported = cpu_has_ipt &&
>                             (_vmx_misc_cap & VMX_MISC_PT_SUPPORTED);
> 
> I also think this should look like:
> 
>    if ( !smp_processor_id() )
>    	hvm_funcs.pt_supported = cpu_has_ipt &&
>                                 (_vmx_misc_cap & VMX_MISC_PT_SUPPORTED);
>    else if ( hvm_funcs.pt_supported &&
>              !(_vmx_misc_cap & VMX_MISC_PT_SUPPORTED) )
>    {
>        printk("VMX: IPT capabilities fatally differ between CPU%u and CPU0\n",
>               smp_processor_id());
>        return -EINVAL;
>    }
> 
> 
> So that you can detect mismatches between CPUs.


I will fix this.


> 
> Thanks, Roger.



* Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op
  2020-06-18 23:41 ` [PATCH v2 4/7] x86/vmx: add do_vmtrace_op Michał Leszczyński
  2020-06-19  0:46   ` Michał Leszczyński
@ 2020-06-19 15:30   ` Roger Pau Monné
  2020-06-19 15:50     ` Jan Beulich
  2020-06-22  2:56   ` Michał Leszczyński
  2020-06-22 13:25   ` Jan Beulich
  3 siblings, 1 reply; 46+ messages in thread
From: Roger Pau Monné @ 2020-06-19 15:30 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jun Nakajima,
	Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Kang, Luwei,
	Jan Beulich, Tamas K Lengyel, Xen-devel

On Fri, Jun 19, 2020 at 01:41:03AM +0200, Michał Leszczyński wrote:
> Provide an interface for privileged domains to manage
> external IPT monitoring. Guest IPT state will be preserved
> across vmentry/vmexit using ipt_state structure.

Thanks! I have some comments below, some of them are cosmetic coding
style issues. Sorry the Xen coding style is quite different from other
projects, and it takes a bit to get used to.

> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
> ---
>  xen/arch/x86/hvm/hvm.c             | 167 +++++++++++++++++++++++++++++
>  xen/arch/x86/hvm/vmx/vmx.c         |  24 +++++
>  xen/arch/x86/mm.c                  |  37 +++++++
>  xen/common/domain.c                |   1 +
>  xen/include/asm-x86/hvm/vmx/vmcs.h |  16 +++
>  xen/include/public/hvm/hvm_op.h    |  23 ++++
>  xen/include/public/hvm/params.h    |   5 +-
>  xen/include/public/memory.h        |   1 +
>  xen/include/xen/sched.h            |   3 +
>  9 files changed, 276 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 5bb47583b3..145ad053d2 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -1612,6 +1612,24 @@ int hvm_vcpu_initialise(struct vcpu *v)
>      return rc;
>  }
>  
> +void hvm_vmtrace_destroy(struct vcpu *v)
> +{
> +    unsigned int i;
> +    struct page_info *pg;
> +    struct ipt_state *ipt = v->arch.hvm.vmx.ipt_state;
> +    mfn_t buf_mfn = ipt->output_base >> PAGE_SHIFT;

Does this build? I think you are missing a _mfn(...) here?

> +    size_t buf_size = ipt->output_mask.size + 1;
> +
> +    xfree(ipt);
> +    v->arch.hvm.vmx.ipt_state = NULL;
> +
> +    for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ )
> +    {
> +        pg = mfn_to_page(_mfn(mfn_add(buf_mfn, i)));
> +        free_domheap_page(pg);

There's no need for the loop, just use free_domheap_pages and free the
whole allocated area at once. Also you can use maddr_to_page to get
the page from output_base.

> +    }
> +}
> +
>  void hvm_vcpu_destroy(struct vcpu *v)
>  {
>      viridian_vcpu_deinit(v);
> @@ -1631,6 +1649,8 @@ void hvm_vcpu_destroy(struct vcpu *v)
>      vlapic_destroy(v);
>  
>      hvm_vcpu_cacheattr_destroy(v);
> +
> +    hvm_vmtrace_destroy(v);
>  }
>  
>  void hvm_vcpu_down(struct vcpu *v)
> @@ -4066,6 +4086,51 @@ static int hvmop_set_evtchn_upcall_vector(
>      return 0;
>  }
>  
> +static int hvm_set_vmtrace_pt_size(struct domain *d, uint64_t value)

No need for the hvm_ prefix since it's a local function, and then I
would name it:

vmtrace_alloc_buffers(struct domain *d, uint64_t size)

I would name the variable size, so its purpose is clearer.

> +{
> +    void *buf;
> +    unsigned int buf_order;
> +    struct page_info *pg;
> +    struct ipt_state *ipt;
> +    struct vcpu *v;
> +
> +    if ( value < PAGE_SIZE ||
> +         value > GB(4) ||
> +         ( value & (value - 1) ) ) {

Extra spaces, and no need to place each condition on a new line as
long as you don't exceed the 80 character limit. Also the opening
brace should be on a new line:

    if ( value < PAGE_SIZE || value > GB(4) || (value & (value - 1)) )
    {


> +        /* we don't accept trace buffer size smaller than single page
> +         * and the upper bound is defined as 4GB in the specification */

Comment style is wrong, according to CODING_STYLE this should be:

        /*
	 * We don't accept trace buffer size smaller than single page
         * and the upper bound is defined as 4GB in the specification.
	 */

Since you mention the limitations of the buffer size, I would also
mention that size must be a power of 2.

> +        return -EINVAL;
> +    }
> +
> +    for_each_vcpu ( d, v )
> +    {
> +        buf_order = get_order_from_bytes(value);

There's no need for the buf_order variable, just place the call inside
alloc_domheap_pages.

> +        pg = alloc_domheap_pages(d, buf_order, MEMF_no_refcount);

You can define the variable here:

struct page_info *pg = alloc_domheap_pages(d, get_order_from_bytes(value),
                                           MEMF_no_refcount);

The ipt variable can also be defined here to reduce its scope.

> +
> +        if ( !pg )
> +            return -EFAULT;

-ENOMEM

> +
> +        buf = page_to_virt(pg);

Why do you need buf?

You seem to get the virtual address of the page(s) just to use it in
virt_to_mfn, but you can directly use page_to_maddr and remove the buf
variable (and the shift done below).

> +
> +        if ( vmx_add_host_load_msr(v, MSR_RTIT_CTL, 0) )
> +            return -EFAULT;

You leak the pg allocation here...

> +
> +        ipt = xmalloc(struct ipt_state);

If you use xzalloc you can avoid setting the fields to 0 below.

> +
> +        if ( !ipt )
> +            return -EFAULT;

... and here in the failure paths. This should be -ENOMEM also.

> +
> +        ipt->output_base = virt_to_mfn(buf) << PAGE_SHIFT;
> +        ipt->output_mask.raw = value - 1;
> +        ipt->status = 0;
> +        ipt->active = 0;
> +
> +        v->arch.hvm.vmx.ipt_state = ipt;
> +    }
> +
> +    return 0;
> +}
> +
>  static int hvm_allow_set_param(struct domain *d,
>                                 uint32_t index,
>                                 uint64_t new_value)
> @@ -4127,6 +4192,7 @@ static int hvm_allow_set_param(struct domain *d,
>      case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
>      case HVM_PARAM_ALTP2M:
>      case HVM_PARAM_MCA_CAP:
> +    case HVM_PARAM_VMTRACE_PT_SIZE:
>          if ( value != 0 && new_value != value )
>              rc = -EEXIST;
>          break;
> @@ -4328,6 +4394,9 @@ static int hvm_set_param(struct domain *d, uint32_t index, uint64_t value)
>      case HVM_PARAM_MCA_CAP:
>          rc = vmce_enable_mca_cap(d, value);
>          break;
> +    case HVM_PARAM_VMTRACE_PT_SIZE:
> +        rc = hvm_set_vmtrace_pt_size(d, value);
> +        break;
>      }
>  
>      if ( !rc )
> @@ -4949,6 +5018,100 @@ static int compat_altp2m_op(
>      return rc;
>  }
>  
> +static int do_vmtrace_op(XEN_GUEST_HANDLE_PARAM(void) arg)
> +{
> +    struct xen_hvm_vmtrace_op a;
> +    struct domain *d;
> +    int rc;
> +    struct vcpu *v;
> +    struct ipt_state *ipt;
> +
> +    if ( !hvm_pt_supported() )
> +        return -EOPNOTSUPP;
> +
> +    if ( copy_from_guest(&a, arg, 1) )
> +        return -EFAULT;
> +
> +    if ( a.version != HVMOP_VMTRACE_INTERFACE_VERSION )
> +        return -EINVAL;
> +
> +    d = rcu_lock_domain_by_any_id(a.domain);
> +    spin_lock(&d->vmtrace_lock);

This must be moved after you have checked d is not NULL.

> +
> +    if ( d == NULL )
> +        return -ESRCH;
> +
> +    if ( !is_hvm_domain(d) )

You don't need to hold vmtrace_lock to check this...

> +    {
> +        rc = -EOPNOTSUPP;
> +        goto out;
> +    }
> +
> +    domain_pause(d);

Do you really need to pause the whole domain, or just the vCPU whose
parameters you are about to modify?

Also for HVMOP_vmtrace_ipt_get_offset there's no need to pause
anything?

> +
> +    if ( a.vcpu >= d->max_vcpus )

... nor this. I think you can reduce the locked section a bit.
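To illustrate, the preamble could look something like this (a hypothetical
standalone sketch with simplified types and errno values, not the actual
Xen code):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: do the cheap validation first, and take
 * vmtrace_lock only around the work that actually needs it.
 * Types, names and errno values are simplified stand-ins. */

static int lock_count;                 /* stands in for vmtrace_lock */

struct domain { int is_hvm; unsigned int max_vcpus; };

static void spin_lock(void)   { lock_count++; }
static void spin_unlock(void) { lock_count--; }

static int do_vmtrace_op(struct domain *d, unsigned int vcpu)
{
    int rc = 0;

    if ( d == NULL )
        return -3;                     /* -ESRCH */

    if ( !d->is_hvm )
        return -95;                    /* -EOPNOTSUPP */

    if ( vcpu >= d->max_vcpus )
        return -22;                    /* -EINVAL */

    spin_lock();
    /* ... per-vCPU state manipulation under the lock ... */
    spin_unlock();

    return rc;
}
```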

> +    {
> +        rc = -EINVAL;
> +        goto out;
> +    }
> +
> +    v = d->vcpu[a.vcpu];
> +    ipt = v->arch.hvm.vmx.ipt_state;
> +
> +    if ( !ipt )
> +    {
> +        /*
> +	 * PT must be first initialized upon domain creation.
> +	 */

You have a couple of hard tabs in the lines above. Also, single-line
comments should use the one-line /* ... */ form.

> +        rc = -EINVAL;
> +        goto out;
> +    }
> +
> +    switch ( a.cmd )
> +    {
> +    case HVMOP_vmtrace_ipt_enable:
> +        if ( vmx_add_guest_msr(v, MSR_RTIT_CTL,
> +                               RTIT_CTL_TRACEEN | RTIT_CTL_OS |
> +                               RTIT_CTL_USR | RTIT_CTL_BRANCH_EN) )
> +        {
> +            rc = -EFAULT;

vmx_add_guest_msr already returns a valid error code, please don't
replace it with -EFAULT unconditionally (here and below).
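In other words, something like the following (a hypothetical sketch; the
stub merely stands in for the real vmx_add_guest_msr()):

```c
#include <assert.h>

/* Hypothetical sketch of propagating the callee's error code instead
 * of clobbering it. The stub stands in for vmx_add_guest_msr(). */

static int vmx_add_guest_msr_stub(int fail)
{
    return fail ? -12 /* -ENOMEM */ : 0;
}

static int ipt_enable(int fail)
{
    int rc = vmx_add_guest_msr_stub(fail);

    if ( rc )
        return rc;      /* keep -ENOMEM/-ESRCH/... as returned */

    /* ... ipt->active = 1; ... */
    return 0;
}
```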

> +            goto out;
> +        }
> +
> +        ipt->active = 1;
> +        break;

Please add an empty newline after break;.

> +    case HVMOP_vmtrace_ipt_disable:
> +        if ( vmx_add_guest_msr(v, MSR_RTIT_CTL, 0) )
> +        {
> +            rc = -EFAULT;
> +            goto out;
> +        }
> +
> +        ipt->active = 0;
> +        break;
> +    case HVMOP_vmtrace_ipt_get_offset:
> +        a.offset = ipt->output_mask.offset;
> +        break;
> +    default:
> +        rc = -EOPNOTSUPP;
> +        goto out;
> +    }
> +
> +    rc = -EFAULT;
> +    if ( __copy_to_guest(arg, &a, 1) )
> +      goto out;
> +    rc = 0;
> +
> + out:
> +    domain_unpause(d);
> +    spin_unlock(&d->vmtrace_lock);
> +    rcu_unlock_domain(d);
> +
> +    return rc;
> +}
> +
> +DEFINE_XEN_GUEST_HANDLE(compat_hvm_vmtrace_op_t);
> +
>  static int hvmop_get_mem_type(
>      XEN_GUEST_HANDLE_PARAM(xen_hvm_get_mem_type_t) arg)
>  {
> @@ -5101,6 +5264,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>          rc = current->hcall_compat ? compat_altp2m_op(arg) : do_altp2m_op(arg);
>          break;
>  
> +    case HVMOP_vmtrace:
> +        rc = do_vmtrace_op(arg);
> +        break;
> +
>      default:
>      {
>          gdprintk(XENLOG_DEBUG, "Bad HVM op %ld.\n", op);
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index ab19d9424e..51f0046483 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -508,11 +508,25 @@ static void vmx_restore_host_msrs(void)
>  
>  static void vmx_save_guest_msrs(struct vcpu *v)
>  {
> +    uint64_t rtit_ctl;
> +
>      /*
>       * We cannot cache SHADOW_GS_BASE while the VCPU runs, as it can
>       * be updated at any time via SWAPGS, which we cannot trap.
>       */
>      v->arch.hvm.vmx.shadow_gs = rdgsshadow();
> +
> +    if ( unlikely(v->arch.hvm.vmx.ipt_state && v->arch.hvm.vmx.ipt_state->active) )
> +    {
> +        smp_rmb();

If this barrier is required I think it warrants a comment about why
it's needed.

> +        rdmsrl(MSR_RTIT_CTL, rtit_ctl);
> +
> +        if ( rtit_ctl & RTIT_CTL_TRACEEN )
> +            BUG();

BUG_ON(rtit_ctl & RTIT_CTL_TRACEEN);

> +
> +        rdmsrl(MSR_RTIT_STATUS, v->arch.hvm.vmx.ipt_state->status);
> +        rdmsrl(MSR_RTIT_OUTPUT_MASK, v->arch.hvm.vmx.ipt_state->output_mask.raw);
> +    }
>  }
>  
>  static void vmx_restore_guest_msrs(struct vcpu *v)
> @@ -524,6 +538,16 @@ static void vmx_restore_guest_msrs(struct vcpu *v)
>  
>      if ( cpu_has_msr_tsc_aux )
>          wrmsr_tsc_aux(v->arch.msrs->tsc_aux);
> +
> +    if ( unlikely(v->arch.hvm.vmx.ipt_state && v->arch.hvm.vmx.ipt_state->active) )

Line is too long.

> +    {
> +        wrmsrl(MSR_RTIT_OUTPUT_BASE,
> +            v->arch.hvm.vmx.ipt_state->output_base);
> +        wrmsrl(MSR_RTIT_OUTPUT_MASK,
> +            v->arch.hvm.vmx.ipt_state->output_mask.raw);
> +        wrmsrl(MSR_RTIT_STATUS,
> +            v->arch.hvm.vmx.ipt_state->status);

Can you please align to the start of the parentheses:

        wrmsrl(MSR_RTIT_STATUS,
               v->arch.hvm.vmx.ipt_state->status);

> +    }
>  }
>  
>  void vmx_update_cpu_exec_control(struct vcpu *v)
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index e376fc7e8f..e4658dc27f 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4624,6 +4624,43 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
>          }
>          break;
>      }
> +
> +    case XENMEM_resource_vmtrace_buf:
> +    {
> +        mfn_t mfn;
> +        unsigned int i;
> +        struct ipt_state *ipt;
> +
> +        if ( id >= d->max_vcpus )
> +        {
> +            rc = -EINVAL;

You can just set rc = -EINVAL at the top and then avoid setting it on
every error case.

> +            break;
> +        }
> +
> +        ipt = d->vcpu[id]->arch.hvm.vmx.ipt_state;
> +
> +        if ( !ipt )
> +        {
> +            rc = -EINVAL;
> +            break;
> +        }
> +
> +        mfn = mfn_x(ipt->output_base >> PAGE_SHIFT);
> +
> +        if (nr_frames > ( ( ipt->output_mask.size + 1 ) >> PAGE_SHIFT ))

Spaces are wrongly positioned. ie:

if ( frame + nr_frames > ((ipt->output_mask.size + 1) >> PAGE_SHIFT) )

> +        {
> +            rc = -EINVAL;
> +            break;
> +        }
> +
> +        rc = 0;
> +        for ( i = 0; i < nr_frames; i++ )

I think you should init i to the 'frame' field from
xen_mem_acquire_resource, since it's possible to select the offset you
want to map from a resource. You will also need to take it into
account, because IMO that's where I would store the last processed
frame when dealing with continuations.
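Roughly like this (a hypothetical standalone sketch with simplified types,
not the actual code):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of honouring the 'frame' offset from
 * xen_mem_acquire_resource when filling mfn_list. */

static int fill_mfn_list(uint64_t base_mfn, uint64_t total_frames,
                         uint64_t frame, unsigned int nr_frames,
                         uint64_t *mfn_list)
{
    unsigned int i;

    if ( frame + nr_frames > total_frames )
        return -22;                    /* -EINVAL */

    for ( i = 0; i < nr_frames; i++ )
        mfn_list[i] = base_mfn + frame + i;

    return 0;
}
```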

> +        {
> +            mfn_list[i] = mfn_add(mfn, i);
> +        }
> +
> +        break;
> +    }
>  #endif
>  
>      default:
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 7cc9526139..6f6458cd5b 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -414,6 +414,7 @@ struct domain *domain_create(domid_t domid,
>      d->shutdown_code = SHUTDOWN_CODE_INVALID;
>  
>      spin_lock_init(&d->pbuf_lock);
> +    spin_lock_init(&d->vmtrace_lock);
>  
>      rwlock_init(&d->vnuma_rwlock);
>  
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index 4c81093aba..9fc4653777 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -104,6 +104,19 @@ struct pi_blocking_vcpu {
>      spinlock_t           *lock;
>  };
>  
> +struct ipt_state {
> +    uint64_t active;

Could this be a boolean?

> +    uint64_t status;
> +    uint64_t output_base;
> +    union {
> +        uint64_t raw;
> +        struct {
> +            uint32_t size;
> +            uint32_t offset;
> +        };
> +    } output_mask;
> +};
> +
>  struct vmx_vcpu {
>      /* Physical address of VMCS. */
>      paddr_t              vmcs_pa;
> @@ -186,6 +199,9 @@ struct vmx_vcpu {
>       * pCPU and wakeup the related vCPU.
>       */
>      struct pi_blocking_vcpu pi_blocking;
> +
> +    /* State of Intel Processor Trace feature */
> +    struct ipt_state     *ipt_state;
>  };
>  
>  int vmx_create_vmcs(struct vcpu *v);
> diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
> index 870ec52060..8cd0b6ea49 100644
> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -382,6 +382,29 @@ struct xen_hvm_altp2m_op {
>  typedef struct xen_hvm_altp2m_op xen_hvm_altp2m_op_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_op_t);
>  
> +/* HVMOP_vmtrace: Perform VM tracing related operation */
> +#define HVMOP_vmtrace 26
> +
> +#define HVMOP_VMTRACE_INTERFACE_VERSION 0x00000001
> +
> +struct xen_hvm_vmtrace_op {
> +    /* IN variable */
> +    uint32_t version;   /* HVMOP_VMTRACE_INTERFACE_VERSION */
> +    uint32_t cmd;
> +/* Enable/disable external vmtrace for given domain */
> +#define HVMOP_vmtrace_ipt_enable      1
> +#define HVMOP_vmtrace_ipt_disable     2
> +#define HVMOP_vmtrace_ipt_get_offset  3
> +    domid_t domain;
> +    uint32_t vcpu;
> +    uint64_t size;
> +
> +    /* OUT variable */
> +    uint64_t offset;
> +};
> +typedef struct xen_hvm_vmtrace_op xen_hvm_vmtrace_op_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_vmtrace_op_t);
> +
>  #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
>  
>  /*
> diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
> index 0a91bfa749..adbc7e5d08 100644
> --- a/xen/include/public/hvm/params.h
> +++ b/xen/include/public/hvm/params.h
> @@ -300,6 +300,9 @@
>  #define XEN_HVM_MCA_CAP_LMCE   (xen_mk_ullong(1) << 0)
>  #define XEN_HVM_MCA_CAP_MASK   XEN_HVM_MCA_CAP_LMCE
>  
> -#define HVM_NR_PARAMS 39
> +/* VM trace capabilities */
> +#define HVM_PARAM_VMTRACE_PT_SIZE 39

I think it was recommended that IPT should be set at domain creation,
but using a HVM param still allows you to init this after the domain
has been created. I think it might be best to just add a new field to
xen_domctl_createdomain, as it seems like the interface could also be
helpful for other arches?

Thanks, Roger.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 3/7] x86/vmx: add IPT cpu feature
  2020-06-19 14:22     ` Michał Leszczyński
@ 2020-06-19 15:31       ` Roger Pau Monné
  0 siblings, 0 replies; 46+ messages in thread
From: Roger Pau Monné @ 2020-06-19 15:31 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Kevin Tian, Kang, Luwei, Jun Nakajima, Wei Liu, Andrew Cooper,
	Jan Beulich, Tamas K Lengyel, Xen-devel

On Fri, Jun 19, 2020 at 04:22:28PM +0200, Michał Leszczyński wrote:
> ----- 19 cze 2020 o 15:44, Roger Pau Monné roger.pau@citrix.com napisał(a):
> 
> > On Fri, Jun 19, 2020 at 01:40:21AM +0200, Michał Leszczyński wrote:
> >> Check if Intel Processor Trace feature is supported by current
> >> processor. Define hvm_ipt_supported function.
> >> 
> >> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
> >> ---
> > 
> > We usually keep a short list of the changes between versions, so it's
> > easier for the reviewers to know what changed. As an example:
> > 
> > https://lore.kernel.org/xen-devel/20200613184132.11880-1-julien@xen.org/
> > 
> 
> There is a change list in the cover letter. Should I also add changelog for
> each individual patch?

Oh, sorry, completely missed that. IMO yes, it's easier for reviewers
because we then have the diff and the changelog in the same place.

Changelogs in cover letters are also fine, but I would tend to only
add big architectural changes there.

Roger.



* Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op
  2020-06-19 15:30   ` Roger Pau Monné
@ 2020-06-19 15:50     ` Jan Beulich
  2020-06-22  2:45       ` Michał Leszczyński
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Beulich @ 2020-06-19 15:50 UTC (permalink / raw)
  To: Roger Pau Monné, Michał Leszczyński
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Kang, Luwei,
	Jun Nakajima, Tamas K Lengyel, Xen-devel

On 19.06.2020 17:30, Roger Pau Monné wrote:
> On Fri, Jun 19, 2020 at 01:41:03AM +0200, Michał Leszczyński wrote:
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -1612,6 +1612,24 @@ int hvm_vcpu_initialise(struct vcpu *v)
>>      return rc;
>>  }
>>  
>> +void hvm_vmtrace_destroy(struct vcpu *v)
>> +{
>> +    unsigned int i;
>> +    struct page_info *pg;
>> +    struct ipt_state *ipt = v->arch.hvm.vmx.ipt_state;
>> +    mfn_t buf_mfn = ipt->output_base >> PAGE_SHIFT;
> 
> Does this build? I think you are missing a _mfn(...) here?

This as well as ...

>> +    size_t buf_size = ipt->output_mask.size + 1;
>> +
>> +    xfree(ipt);
>> +    v->arch.hvm.vmx.ipt_state = NULL;
>> +
>> +    for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ )
>> +    {
>> +        pg = mfn_to_page(_mfn(mfn_add(buf_mfn, i)));

... the extra _mfn() here suggest the code was only ever built in
release mode so far.

Jan



* Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op
  2020-06-19 15:50     ` Jan Beulich
@ 2020-06-22  2:45       ` Michał Leszczyński
  0 siblings, 0 replies; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-22  2:45 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Kang, Luwei,
	Jun Nakajima, Tamas K Lengyel, Xen-devel, Roger Pau Monné

----- 19 cze 2020 o 17:50, Jan Beulich jbeulich@suse.com napisał(a):

> On 19.06.2020 17:30, Roger Pau Monné wrote:
>> On Fri, Jun 19, 2020 at 01:41:03AM +0200, Michał Leszczyński wrote:
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -1612,6 +1612,24 @@ int hvm_vcpu_initialise(struct vcpu *v)
>>>      return rc;
>>>  }
>>>  
>>> +void hvm_vmtrace_destroy(struct vcpu *v)
>>> +{
>>> +    unsigned int i;
>>> +    struct page_info *pg;
>>> +    struct ipt_state *ipt = v->arch.hvm.vmx.ipt_state;
>>> +    mfn_t buf_mfn = ipt->output_base >> PAGE_SHIFT;
>> 
>> Does this build? I think you are missing a _mfn(...) here?
> 
> This as well as ...
> 
>>> +    size_t buf_size = ipt->output_mask.size + 1;
>>> +
>>> +    xfree(ipt);
>>> +    v->arch.hvm.vmx.ipt_state = NULL;
>>> +
>>> +    for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ )
>>> +    {
>>> +        pg = mfn_to_page(_mfn(mfn_add(buf_mfn, i)));
> 
> ... the extra _mfn() here suggest the code was only ever built in
> release mode so far.
> 
> Jan


Ah, I forgot to enable developer checks. This will be corrected in v3.

ml



* Re: [PATCH v2 3/7] x86/vmx: add IPT cpu feature
  2020-06-19 13:44   ` Roger Pau Monné
  2020-06-19 14:22     ` Michał Leszczyński
@ 2020-06-22  2:49     ` Michał Leszczyński
  2020-06-22  8:31       ` Jan Beulich
  1 sibling, 1 reply; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-22  2:49 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Kevin Tian, Kang, Luwei, Jun Nakajima, Wei Liu, Andrew Cooper,
	Jan Beulich, Tamas K Lengyel, Xen-devel

----- 19 cze 2020 o 15:44, Roger Pau Monné roger.pau@citrix.com napisał(a):

> On Fri, Jun 19, 2020 at 01:40:21AM +0200, Michał Leszczyński wrote:
>> Check if Intel Processor Trace feature is supported by current
>> processor. Define hvm_ipt_supported function.
>> 
>> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
>> ---
> 
> > We usually keep a short list of the changes between versions, so it's
> easier for the reviewers to know what changed. As an example:
> 
> https://lore.kernel.org/xen-devel/20200613184132.11880-1-julien@xen.org/
> 
>>  xen/arch/x86/hvm/vmx/vmcs.c                 | 4 ++++
>>  xen/include/asm-x86/cpufeature.h            | 1 +
>>  xen/include/asm-x86/hvm/hvm.h               | 9 +++++++++
>>  xen/include/asm-x86/hvm/vmx/vmcs.h          | 1 +
>>  xen/include/public/arch-x86/cpufeatureset.h | 1 +
>>  5 files changed, 16 insertions(+)
>> 
>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>> index ca94c2bedc..8466ccb912 100644
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -315,6 +315,10 @@ static int vmx_init_vmcs_config(void)
>>          if ( opt_ept_pml )
>>              opt |= SECONDARY_EXEC_ENABLE_PML;
>>  
>> +        /* Check whether IPT is supported in VMX operation */
>> +        hvm_funcs.pt_supported = cpu_has_ipt &&
>> +            ( _vmx_misc_cap & VMX_MISC_PT_SUPPORTED );
> 
> By the placement of this chunk you are tying IPT support to the
> secondary exec availability, but I don't think that's required?
> 
> Ie: You should move the read of misc_cap to the top-level of the
> function and perform the VMX_MISC_PT_SUPPORTED check there also.
> 
> Note that space inside parentheses is only required for conditions of
> 'if', 'for' and those kind of statements, here it's not required, so
> this should be:
> 
>    hvm_funcs.pt_supported = cpu_has_ipt &&
>                             (_vmx_misc_cap & VMX_MISC_PT_SUPPORTED);
> 
> I also think this should look like:
> 
>    if ( !smp_processor_id() )
>    	hvm_funcs.pt_supported = cpu_has_ipt &&
>                                 (_vmx_misc_cap & VMX_MISC_PT_SUPPORTED);
>    else if ( hvm_funcs.pt_supported &&
>              !(_vmx_misc_cap & VMX_MISC_PT_SUPPORTED) )
>    {
>        printk("VMX: IPT capabilities fatally differ between CPU%u and CPU0\n",
>               smp_processor_id());
>        return -EINVAL;
>    }
> 
> 
> So that you can detect mismatches between CPUs.


I'm afraid this snippet doesn't work. All CPUs read
hvm_funcs.pt_supported as 0, even though it was set to 1 for CPU0. I'm
not sure whether this is some multithreading issue or whether there is
a separate hvm_funcs for each CPU?

ml


> 
> Thanks, Roger.



* Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op
  2020-06-18 23:41 ` [PATCH v2 4/7] x86/vmx: add do_vmtrace_op Michał Leszczyński
  2020-06-19  0:46   ` Michał Leszczyński
  2020-06-19 15:30   ` Roger Pau Monné
@ 2020-06-22  2:56   ` Michał Leszczyński
  2020-06-22  8:39     ` Jan Beulich
  2020-06-22 13:25   ` Jan Beulich
  3 siblings, 1 reply; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-22  2:56 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jan Beulich,
	Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jun Nakajima,
	Tamas K Lengyel, Kang, Luwei, Roger Pau Monné

----- 19 cze 2020 o 1:41, Michał Leszczyński michal.leszczynski@cert.pl napisał(a):

> Provide an interface for privileged domains to manage
> external IPT monitoring. Guest IPT state will be preserved
> across vmentry/vmexit using ipt_state structure.
> 
> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
> ---
> xen/arch/x86/hvm/hvm.c             | 167 +++++++++++++++++++++++++++++
> xen/arch/x86/hvm/vmx/vmx.c         |  24 +++++
> xen/arch/x86/mm.c                  |  37 +++++++
> xen/common/domain.c                |   1 +
> xen/include/asm-x86/hvm/vmx/vmcs.h |  16 +++
> xen/include/public/hvm/hvm_op.h    |  23 ++++
> xen/include/public/hvm/params.h    |   5 +-
> xen/include/public/memory.h        |   1 +
> xen/include/xen/sched.h            |   3 +
> 9 files changed, 276 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 5bb47583b3..145ad053d2 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -1612,6 +1612,24 @@ int hvm_vcpu_initialise(struct vcpu *v)
>     return rc;
> }
> 
> +void hvm_vmtrace_destroy(struct vcpu *v)
> +{
> +    unsigned int i;
> +    struct page_info *pg;
> +    struct ipt_state *ipt = v->arch.hvm.vmx.ipt_state;
> +    mfn_t buf_mfn = ipt->output_base >> PAGE_SHIFT;
> +    size_t buf_size = ipt->output_mask.size + 1;
> +
> +    xfree(ipt);
> +    v->arch.hvm.vmx.ipt_state = NULL;
> +
> +    for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ )
> +    {
> +        pg = mfn_to_page(_mfn(mfn_add(buf_mfn, i)));
> +        free_domheap_page(pg);
> +    }
> +}
> +
> void hvm_vcpu_destroy(struct vcpu *v)
> {
>     viridian_vcpu_deinit(v);
> @@ -1631,6 +1649,8 @@ void hvm_vcpu_destroy(struct vcpu *v)
>     vlapic_destroy(v);
> 
>     hvm_vcpu_cacheattr_destroy(v);
> +
> +    hvm_vmtrace_destroy(v);
> }
> 
> void hvm_vcpu_down(struct vcpu *v)
> @@ -4066,6 +4086,51 @@ static int hvmop_set_evtchn_upcall_vector(
>     return 0;
> }
> 
> +static int hvm_set_vmtrace_pt_size(struct domain *d, uint64_t value)
> +{
> +    void *buf;
> +    unsigned int buf_order;
> +    struct page_info *pg;
> +    struct ipt_state *ipt;
> +    struct vcpu *v;
> +
> +    if ( value < PAGE_SIZE ||
> +         value > GB(4) ||
> +         ( value & (value - 1) ) ) {
> +        /* we don't accept trace buffer size smaller than single page
> +         * and the upper bound is defined as 4GB in the specification */
> +        return -EINVAL;
> +    }
> +
> +    for_each_vcpu ( d, v )
> +    {
> +        buf_order = get_order_from_bytes(value);
> +        pg = alloc_domheap_pages(d, buf_order, MEMF_no_refcount);
> +
> +        if ( !pg )
> +            return -EFAULT;
> +
> +        buf = page_to_virt(pg);
> +
> +        if ( vmx_add_host_load_msr(v, MSR_RTIT_CTL, 0) )
> +            return -EFAULT;
> +
> +        ipt = xmalloc(struct ipt_state);
> +
> +        if ( !ipt )
> +            return -EFAULT;
> +
> +        ipt->output_base = virt_to_mfn(buf) << PAGE_SHIFT;
> +        ipt->output_mask.raw = value - 1;
> +        ipt->status = 0;
> +        ipt->active = 0;
> +
> +        v->arch.hvm.vmx.ipt_state = ipt;
> +    }
> +
> +    return 0;
> +}
> +
> static int hvm_allow_set_param(struct domain *d,
>                                uint32_t index,
>                                uint64_t new_value)
> @@ -4127,6 +4192,7 @@ static int hvm_allow_set_param(struct domain *d,
>     case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
>     case HVM_PARAM_ALTP2M:
>     case HVM_PARAM_MCA_CAP:
> +    case HVM_PARAM_VMTRACE_PT_SIZE:
>         if ( value != 0 && new_value != value )
>             rc = -EEXIST;
>         break;
> @@ -4328,6 +4394,9 @@ static int hvm_set_param(struct domain *d, uint32_t index,
> uint64_t value)
>     case HVM_PARAM_MCA_CAP:
>         rc = vmce_enable_mca_cap(d, value);
>         break;
> +    case HVM_PARAM_VMTRACE_PT_SIZE:
> +        rc = hvm_set_vmtrace_pt_size(d, value);
> +        break;
>     }
> 
>     if ( !rc )
> @@ -4949,6 +5018,100 @@ static int compat_altp2m_op(
>     return rc;
> }
> 
> +static int do_vmtrace_op(XEN_GUEST_HANDLE_PARAM(void) arg)
> +{
> +    struct xen_hvm_vmtrace_op a;
> +    struct domain *d;
> +    int rc;
> +    struct vcpu *v;
> +    struct ipt_state *ipt;
> +
> +    if ( !hvm_pt_supported() )
> +        return -EOPNOTSUPP;
> +
> +    if ( copy_from_guest(&a, arg, 1) )
> +        return -EFAULT;
> +
> +    if ( a.version != HVMOP_VMTRACE_INTERFACE_VERSION )
> +        return -EINVAL;
> +
> +    d = rcu_lock_domain_by_any_id(a.domain);
> +    spin_lock(&d->vmtrace_lock);
> +
> +    if ( d == NULL )
> +        return -ESRCH;
> +
> +    if ( !is_hvm_domain(d) )
> +    {
> +        rc = -EOPNOTSUPP;
> +        goto out;
> +    }
> +
> +    domain_pause(d);
> +
> +    if ( a.vcpu >= d->max_vcpus )
> +    {
> +        rc = -EINVAL;
> +        goto out;
> +    }
> +
> +    v = d->vcpu[a.vcpu];
> +    ipt = v->arch.hvm.vmx.ipt_state;
> +
> +    if ( !ipt )
> +    {
> +        /*
> +	 * PT must be first initialized upon domain creation.
> +	 */
> +        rc = -EINVAL;
> +        goto out;
> +    }
> +
> +    switch ( a.cmd )
> +    {
> +    case HVMOP_vmtrace_ipt_enable:
> +        if ( vmx_add_guest_msr(v, MSR_RTIT_CTL,
> +                               RTIT_CTL_TRACEEN | RTIT_CTL_OS |
> +                               RTIT_CTL_USR | RTIT_CTL_BRANCH_EN) )
> +        {
> +            rc = -EFAULT;
> +            goto out;
> +        }
> +
> +        ipt->active = 1;
> +        break;
> +    case HVMOP_vmtrace_ipt_disable:
> +        if ( vmx_add_guest_msr(v, MSR_RTIT_CTL, 0) )
> +        {
> +            rc = -EFAULT;
> +            goto out;
> +        }
> +
> +        ipt->active = 0;
> +        break;
> +    case HVMOP_vmtrace_ipt_get_offset:
> +        a.offset = ipt->output_mask.offset;
> +        break;
> +    default:
> +        rc = -EOPNOTSUPP;
> +        goto out;
> +    }
> +
> +    rc = -EFAULT;
> +    if ( __copy_to_guest(arg, &a, 1) )
> +      goto out;
> +    rc = 0;
> +
> + out:
> +    domain_unpause(d);
> +    spin_unlock(&d->vmtrace_lock);
> +    rcu_unlock_domain(d);
> +
> +    return rc;
> +}
> +
> +DEFINE_XEN_GUEST_HANDLE(compat_hvm_vmtrace_op_t);
> +
> static int hvmop_get_mem_type(
>     XEN_GUEST_HANDLE_PARAM(xen_hvm_get_mem_type_t) arg)
> {
> @@ -5101,6 +5264,10 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
>         rc = current->hcall_compat ? compat_altp2m_op(arg) : do_altp2m_op(arg);
>         break;
> 
> +    case HVMOP_vmtrace:
> +        rc = do_vmtrace_op(arg);
> +        break;
> +
>     default:
>     {
>         gdprintk(XENLOG_DEBUG, "Bad HVM op %ld.\n", op);
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index ab19d9424e..51f0046483 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -508,11 +508,25 @@ static void vmx_restore_host_msrs(void)
> 
> static void vmx_save_guest_msrs(struct vcpu *v)
> {
> +    uint64_t rtit_ctl;
> +
>     /*
>      * We cannot cache SHADOW_GS_BASE while the VCPU runs, as it can
>      * be updated at any time via SWAPGS, which we cannot trap.
>      */
>     v->arch.hvm.vmx.shadow_gs = rdgsshadow();
> +
> +    if ( unlikely(v->arch.hvm.vmx.ipt_state &&
> v->arch.hvm.vmx.ipt_state->active) )
> +    {
> +        smp_rmb();
> +        rdmsrl(MSR_RTIT_CTL, rtit_ctl);
> +
> +        if ( rtit_ctl & RTIT_CTL_TRACEEN )
> +            BUG();
> +
> +        rdmsrl(MSR_RTIT_STATUS, v->arch.hvm.vmx.ipt_state->status);
> +        rdmsrl(MSR_RTIT_OUTPUT_MASK,
> v->arch.hvm.vmx.ipt_state->output_mask.raw);
> +    }
> }


It was suggested to move this piece of code from vm-entry/vm-exit handling to
vmx_save_guest_msrs/vmx_restore_guest_msrs functions.

Please note that we do need to periodically read MSR_RTIT_OUTPUT_MASK in order
to know how much data has been written into the buffer by the processor. During
tests, I've spotted that in some cases vCPUs go out of context very rarely.

For instance: when IPT gets enabled, xc_vmtrace_ipt_get_offset() returns zero
for the first few seconds, because it relies on the value of
v->arch.hvm.vmx.ipt_state->output_mask, which we only read in
vmx_save_guest_msrs(), and in some cases this happens very rarely.

Could you guys suggest some solution to this problem? If we were to leave this
value dirty in hardware, AFAIK we have no way to access it directly, but
observing this particular value at runtime is very important.


Best regards,
Michał Leszczyński
CERT Polska


> 
> static void vmx_restore_guest_msrs(struct vcpu *v)
> @@ -524,6 +538,16 @@ static void vmx_restore_guest_msrs(struct vcpu *v)
> 
>     if ( cpu_has_msr_tsc_aux )
>         wrmsr_tsc_aux(v->arch.msrs->tsc_aux);
> +
> +    if ( unlikely(v->arch.hvm.vmx.ipt_state &&
> v->arch.hvm.vmx.ipt_state->active) )
> +    {
> +        wrmsrl(MSR_RTIT_OUTPUT_BASE,
> +            v->arch.hvm.vmx.ipt_state->output_base);
> +        wrmsrl(MSR_RTIT_OUTPUT_MASK,
> +            v->arch.hvm.vmx.ipt_state->output_mask.raw);
> +        wrmsrl(MSR_RTIT_STATUS,
> +            v->arch.hvm.vmx.ipt_state->status);
> +    }
> }
> 
> void vmx_update_cpu_exec_control(struct vcpu *v)
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index e376fc7e8f..e4658dc27f 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4624,6 +4624,43 @@ int arch_acquire_resource(struct domain *d, unsigned int
> type,
>         }
>         break;
>     }
> +
> +    case XENMEM_resource_vmtrace_buf:
> +    {
> +        mfn_t mfn;
> +        unsigned int i;
> +        struct ipt_state *ipt;
> +
> +        if ( id >= d->max_vcpus )
> +        {
> +            rc = -EINVAL;
> +            break;
> +        }
> +
> +        ipt = d->vcpu[id]->arch.hvm.vmx.ipt_state;
> +
> +        if ( !ipt )
> +        {
> +            rc = -EINVAL;
> +            break;
> +        }
> +
> +        mfn = mfn_x(ipt->output_base >> PAGE_SHIFT);
> +
> +        if (nr_frames > ( ( ipt->output_mask.size + 1 ) >> PAGE_SHIFT ))
> +        {
> +            rc = -EINVAL;
> +            break;
> +        }
> +
> +        rc = 0;
> +        for ( i = 0; i < nr_frames; i++ )
> +        {
> +            mfn_list[i] = mfn_add(mfn, i);
> +        }
> +
> +        break;
> +    }
> #endif
> 
>     default:
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 7cc9526139..6f6458cd5b 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -414,6 +414,7 @@ struct domain *domain_create(domid_t domid,
>     d->shutdown_code = SHUTDOWN_CODE_INVALID;
> 
>     spin_lock_init(&d->pbuf_lock);
> +    spin_lock_init(&d->vmtrace_lock);
> 
>     rwlock_init(&d->vnuma_rwlock);
> 
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h
> b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index 4c81093aba..9fc4653777 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -104,6 +104,19 @@ struct pi_blocking_vcpu {
>     spinlock_t           *lock;
> };
> 
> +struct ipt_state {
> +    uint64_t active;
> +    uint64_t status;
> +    uint64_t output_base;
> +    union {
> +        uint64_t raw;
> +        struct {
> +            uint32_t size;
> +            uint32_t offset;
> +        };
> +    } output_mask;
> +};
> +
> struct vmx_vcpu {
>     /* Physical address of VMCS. */
>     paddr_t              vmcs_pa;
> @@ -186,6 +199,9 @@ struct vmx_vcpu {
>      * pCPU and wakeup the related vCPU.
>      */
>     struct pi_blocking_vcpu pi_blocking;
> +
> +    /* State of Intel Processor Trace feature */
> +    struct ipt_state     *ipt_state;
> };
> 
> int vmx_create_vmcs(struct vcpu *v);
> diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
> index 870ec52060..8cd0b6ea49 100644
> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -382,6 +382,29 @@ struct xen_hvm_altp2m_op {
> typedef struct xen_hvm_altp2m_op xen_hvm_altp2m_op_t;
> DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_op_t);
> 
> +/* HVMOP_vmtrace: Perform VM tracing related operation */
> +#define HVMOP_vmtrace 26
> +
> +#define HVMOP_VMTRACE_INTERFACE_VERSION 0x00000001
> +
> +struct xen_hvm_vmtrace_op {
> +    /* IN variable */
> +    uint32_t version;   /* HVMOP_VMTRACE_INTERFACE_VERSION */
> +    uint32_t cmd;
> +/* Enable/disable external vmtrace for given domain */
> +#define HVMOP_vmtrace_ipt_enable      1
> +#define HVMOP_vmtrace_ipt_disable     2
> +#define HVMOP_vmtrace_ipt_get_offset  3
> +    domid_t domain;
> +    uint32_t vcpu;
> +    uint64_t size;
> +
> +    /* OUT variable */
> +    uint64_t offset;
> +};
> +typedef struct xen_hvm_vmtrace_op xen_hvm_vmtrace_op_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_vmtrace_op_t);
> +
> #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
> 
> /*
> diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
> index 0a91bfa749..adbc7e5d08 100644
> --- a/xen/include/public/hvm/params.h
> +++ b/xen/include/public/hvm/params.h
> @@ -300,6 +300,9 @@
> #define XEN_HVM_MCA_CAP_LMCE   (xen_mk_ullong(1) << 0)
> #define XEN_HVM_MCA_CAP_MASK   XEN_HVM_MCA_CAP_LMCE
> 
> -#define HVM_NR_PARAMS 39
> +/* VM trace capabilities */
> +#define HVM_PARAM_VMTRACE_PT_SIZE 39
> +
> +#define HVM_NR_PARAMS 40
> 
> #endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */
> diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
> index dbd35305df..f823c784c3 100644
> --- a/xen/include/public/memory.h
> +++ b/xen/include/public/memory.h
> @@ -620,6 +620,7 @@ struct xen_mem_acquire_resource {
> 
> #define XENMEM_resource_ioreq_server 0
> #define XENMEM_resource_grant_table 1
> +#define XENMEM_resource_vmtrace_buf 2
> 
>     /*
>      * IN - a type-specific resource identifier, which must be zero
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index ac53519d7f..b3a36f3788 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -457,6 +457,9 @@ struct domain
>     unsigned    pbuf_idx;
>     spinlock_t  pbuf_lock;
> 
> +    /* Used by vmtrace domctls */
> +    spinlock_t  vmtrace_lock;
> +
>     /* OProfile support. */
>     struct xenoprof *xenoprof;
> 
> --
> 2.20.1



* Re: [PATCH v2 1/7] xen/mm: lift 32 item limit from mfn/gfn arrays
  2020-06-19 12:39       ` Jan Beulich
@ 2020-06-22  3:00         ` Michał Leszczyński
  0 siblings, 0 replies; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-22  3:00 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Kang, Luwei, Tamas K Lengyel,
	Xen-devel, Roger Pau Monné

----- On 19 Jun 2020 at 14:39, Jan Beulich jbeulich@suse.com wrote:

> On 19.06.2020 14:35, Michał Leszczyński wrote:
>> ----- On 19 Jun 2020 at 13:34, Roger Pau Monné roger.pau@citrix.com wrote:
>> 
>>> On Fri, Jun 19, 2020 at 01:38:00AM +0200, Michał Leszczyński wrote:
>>>> Replace on-stack array allocation with heap allocation
>>>> in order to lift the limit of 32 items in mfn/gfn arrays
>>>> when calling acquire_resource.
>>>
>>> I'm afraid this is not correct, you cannot allow unbounded amounts of
>>> items to be processed like this, it's likely that you manage to
>>> trigger the watchdog if the list is long enough, specially when doing
>>> set_foreign_p2m_entry.
>>>
>>> You need to process the items in batches (32 was IMO a good start), and
>>> then add support for hypercall continuations. Take a look at how
>>> XENMEM_populate_physmap just a couple of lines below makes use of
>>> hypercall_create_continuation.
>>>
>>> After processing every batch you need to check if
>>> hypercall_preempt_check returns true and if so use
>>> hypercall_create_continuation in order to encode a continuation.
>> 
>> One more question. Are these continuations transparent from the caller side,
>> or do I also need to add something on the invoker side to properly handle these
>> continuations?
> 
> They are (mostly) transparent to the guest, yes. "Mostly" because we
> have cases (iirc) where the continuation data is stored in a way that
> a guest could observe it. But it still wouldn't need to do anything
> in order for the hypercall to get continued until it completes (which
> may be "fails", faod).
> 
> Jan


Okay, I've managed to implement continuations while still keeping these arrays small.
The operation simply processes max. 32 elements at a time and creates a continuation
until everything gets processed.
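For reference, the batching pattern can be sketched standalone as below. This is a
hedged, self-contained simulation rather than the actual Xen code: process_items()
and its resume index are placeholders standing in for the real
hypercall_preempt_check()/hypercall_create_continuation() mechanism.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE 32

/*
 * Process items in batches of 32 and return the index to resume from
 * when "preempted", mimicking how a hypercall continuation re-enters
 * the handler with updated progress.  In real Xen code the preempt
 * check would be hypercall_preempt_check() and the resume index would
 * be encoded via hypercall_create_continuation().
 */
static size_t process_items(const int *items, size_t total, size_t start,
                            long *sum, int simulate_preempt)
{
    size_t i = start;

    while ( i < total )
    {
        size_t end = (total - i > BATCH_SIZE) ? i + BATCH_SIZE : total;

        for ( ; i < end; i++ )
            *sum += items[i];

        if ( simulate_preempt && i < total )
            return i; /* continuation point: caller re-invokes with start = i */
    }

    return total;
}
```

Re-invoking with the returned index until it equals the total plays the role of the
guest transparently re-executing the continued hypercall.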

This will be in patch v3.


Best regards,
Michał Leszczyński
CERT Polska



* Re: [PATCH v2 3/7] x86/vmx: add IPT cpu feature
  2020-06-22  2:49     ` Michał Leszczyński
@ 2020-06-22  8:31       ` Jan Beulich
  0 siblings, 0 replies; 46+ messages in thread
From: Jan Beulich @ 2020-06-22  8:31 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Kevin Tian, Kang, Luwei, Wei Liu, Andrew Cooper, Jun Nakajima,
	Tamas K Lengyel, Xen-devel, Roger Pau Monné

On 22.06.2020 04:49, Michał Leszczyński wrote:
> ----- On 19 Jun 2020 at 15:44, Roger Pau Monné roger.pau@citrix.com wrote:
> 
>> On Fri, Jun 19, 2020 at 01:40:21AM +0200, Michał Leszczyński wrote:
>>> Check if Intel Processor Trace feature is supported by current
>>> processor. Define hvm_ipt_supported function.
>>>
>>> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
>>> ---
>>
>> We usually keep a short list of the changes between versions, so it's
>> easier for the reviewers to know what changed. As an example:
>>
>> https://lore.kernel.org/xen-devel/20200613184132.11880-1-julien@xen.org/
>>
>>>  xen/arch/x86/hvm/vmx/vmcs.c                 | 4 ++++
>>>  xen/include/asm-x86/cpufeature.h            | 1 +
>>>  xen/include/asm-x86/hvm/hvm.h               | 9 +++++++++
>>>  xen/include/asm-x86/hvm/vmx/vmcs.h          | 1 +
>>>  xen/include/public/arch-x86/cpufeatureset.h | 1 +
>>>  5 files changed, 16 insertions(+)
>>>
>>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>>> index ca94c2bedc..8466ccb912 100644
>>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>>> @@ -315,6 +315,10 @@ static int vmx_init_vmcs_config(void)
>>>          if ( opt_ept_pml )
>>>              opt |= SECONDARY_EXEC_ENABLE_PML;
>>>  
>>> +        /* Check whether IPT is supported in VMX operation */
>>> +        hvm_funcs.pt_supported = cpu_has_ipt &&
>>> +            ( _vmx_misc_cap & VMX_MISC_PT_SUPPORTED );
>>
>> By the placement of this chunk you are tying IPT support to the
>> secondary exec availability, but I don't think that's required?
>>
>> Ie: You should move the read of misc_cap to the top-level of the
>> function and perform the VMX_MISC_PT_SUPPORTED check there also.
>>
>> Note that space inside parentheses is only required for conditions of
>> 'if', 'for' and those kind of statements, here it's not required, so
>> this should be:
>>
>>    hvm_funcs.pt_supported = cpu_has_ipt &&
>>                             (_vmx_misc_cap & VMX_MISC_PT_SUPPORTED);
>>
>> I also think this should look like:
>>
>>    if ( !smp_processor_id() )
>>    	hvm_funcs.pt_supported = cpu_has_ipt &&
>>                                 (_vmx_misc_cap & VMX_MISC_PT_SUPPORTED);
>>    else if ( hvm_funcs.pt_supported &&
>>              !(_vmx_misc_cap & VMX_MISC_PT_SUPPORTED) )
>>    {
>>        printk("VMX: IPT capabilities fatally differ between CPU%u and CPU0\n",
>>               smp_processor_id());
>>        return -EINVAL;
>>    }
>>
>>
>> So that you can detect mismatches between CPUs.
> 
> 
> I'm afraid this snippet doesn't work. All CPUs read hvm_funcs.pt_supported as 0 even when it was set to 1 for CPU=0. I'm not sure whether this is some multithreading issue or whether there is a separate hvm_funcs for each CPU.

hvm_funcs will be set from start_vmx()'s return value, i.e. vmx_function_table.
Therefore at least prior to start_vmx() completing you need to fiddle with
vmx_function_table, not hvm_funcs. And in the code here, where for APs you
only read the value, you could easily use the former uniformly, I think.

Jan



* Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op
  2020-06-22  2:56   ` Michał Leszczyński
@ 2020-06-22  8:39     ` Jan Beulich
  0 siblings, 0 replies; 46+ messages in thread
From: Jan Beulich @ 2020-06-22  8:39 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Kang, Luwei,
	Jun Nakajima, Tamas K Lengyel, Xen-devel, Roger Pau Monné

On 22.06.2020 04:56, Michał Leszczyński wrote:
> ----- On 19 Jun 2020 at 1:41, Michał Leszczyński michal.leszczynski@cert.pl wrote:
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -508,11 +508,25 @@ static void vmx_restore_host_msrs(void)
>>
>> static void vmx_save_guest_msrs(struct vcpu *v)
>> {
>> +    uint64_t rtit_ctl;
>> +
>>     /*
>>      * We cannot cache SHADOW_GS_BASE while the VCPU runs, as it can
>>      * be updated at any time via SWAPGS, which we cannot trap.
>>      */
>>     v->arch.hvm.vmx.shadow_gs = rdgsshadow();
>> +
>> +    if ( unlikely(v->arch.hvm.vmx.ipt_state &&
>> v->arch.hvm.vmx.ipt_state->active) )
>> +    {
>> +        smp_rmb();
>> +        rdmsrl(MSR_RTIT_CTL, rtit_ctl);
>> +
>> +        if ( rtit_ctl & RTIT_CTL_TRACEEN )
>> +            BUG();
>> +
>> +        rdmsrl(MSR_RTIT_STATUS, v->arch.hvm.vmx.ipt_state->status);
>> +        rdmsrl(MSR_RTIT_OUTPUT_MASK,
>> v->arch.hvm.vmx.ipt_state->output_mask.raw);
>> +    }
>> }
> 
> 
> It was suggested to move this piece of code from vm-entry/vm-exit handling to
> vmx_save_guest_msrs/vmx_restore_guest_msrs functions.
> 
> Please note that we do need to periodically read MSR_RTIT_OUTPUT_MASK in order
> to know how much data was written into the buffer by the processor. During tests,
> I've spotted that in some cases vCPUs go out of context very rarely.
> 
> For instance: when IPT gets enabled, xc_vmtrace_ipt_get_offset() returns a zero
> value for the first few seconds, because it relies on the value of
> v->arch.hvm.vmx.ipt_state->output_mask, which we only read in vmx_save_guest_msrs(),
> and in some cases this occurs very rarely.
> 
> Could you guys suggest some solution to the problem? If we leave this value
> dirty in hardware, AFAIK we have no possibility to access it directly,
> but observing this particular value is very important at runtime.

Much depends on what state the vCPU in question is in when you need
to "periodically read" the value. If it's paused, you may need to
invoke sync_vcpu_execstate(). If it's actively running you can't
get at the value anyway except when you're on the CPU that this vCPU
is running on (and then you can RDMSR it quite easily). Any
intermediate state between paused and running is prone to change
immediately after you've established it anyway, so you need to get
the vCPU into one of the two "stable" states in order to get at
the register.

Also (and I think I said so before) - please trim your replies.

Jan



* Re: [PATCH v2 2/7] x86/vmx: add Intel PT MSR definitions
  2020-06-18 23:39 ` [PATCH v2 2/7] x86/vmx: add Intel PT MSR definitions Michał Leszczyński
@ 2020-06-22 12:35   ` Jan Beulich
  0 siblings, 0 replies; 46+ messages in thread
From: Jan Beulich @ 2020-06-22 12:35 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Kang, Luwei, Wei Liu, Tamas K Lengyel, Andrew Cooper, Xen-devel,
	Roger Pau Monné

On 19.06.2020 01:39, Michał Leszczyński wrote:
> --- a/xen/include/asm-x86/msr-index.h
> +++ b/xen/include/asm-x86/msr-index.h
> @@ -621,4 +621,41 @@
>  #define MSR_PKGC9_IRTL			0x00000634
>  #define MSR_PKGC10_IRTL			0x00000635
>  
> +/* Intel PT MSRs */
> +#define MSR_RTIT_OUTPUT_BASE           0x00000560
> +#define MSR_RTIT_OUTPUT_MASK           0x00000561
> +#define MSR_RTIT_CTL                   0x00000570
> +#define RTIT_CTL_TRACEEN               (_AC(1, ULL) << 0)
> +#define RTIT_CTL_CYCEN                 (_AC(1, ULL) << 1)
> +#define RTIT_CTL_OS                    (_AC(1, ULL) << 2)
> +#define RTIT_CTL_USR                   (_AC(1, ULL) << 3)
> +#define RTIT_CTL_PWR_EVT_EN            (_AC(1, ULL) << 4)
> +#define RTIT_CTL_FUP_ON_PTW            (_AC(1, ULL) << 5)
> +#define RTIT_CTL_FABRIC_EN             (_AC(1, ULL) << 6)
> +#define RTIT_CTL_CR3_FILTER            (_AC(1, ULL) << 7)
> +#define RTIT_CTL_TOPA                  (_AC(1, ULL) << 8)
> +#define RTIT_CTL_MTC_EN                (_AC(1, ULL) << 9)
> +#define RTIT_CTL_TSC_EN                (_AC(1, ULL) << 10)
> +#define RTIT_CTL_DIS_RETC              (_AC(1, ULL) << 11)
> +#define RTIT_CTL_PTW_EN                (_AC(1, ULL) << 12)
> +#define RTIT_CTL_BRANCH_EN             (_AC(1, ULL) << 13)
> +#define RTIT_CTL_MTC_FREQ_OFFSET       14
> +#define RTIT_CTL_MTC_FREQ              (0x0fULL << RTIT_CTL_MTC_FREQ_OFFSET)

This was a fair step in the right direction, but you've missed
some instances (like here) wanting to use _AC(), and you've
also not moved the whole block up above the legacy line. As
Andrew's "x86/msr: Disallow access to Processor Trace MSRs" is
likely to go in before 4.14 in some form, you'll want to
re-base over it eventually anyway. You may want to take a look
there right away, to see where in the header to place your
addition.

If you look further up in the file you'll also notice how we
try to visually separate MSR numbers from bit-within-MSR
definitions.

Finally I'd also like to ask that you omit all RTIT_CTL_*_OFFSET
values. Only the _OFFSET-less #define-s should really be needed
- see MASK_EXTR() and MASK_INSR() in case right now you're using
these for some open-coded shifting ...
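For the record, the MASK_EXTR()/MASK_INSR() pattern referred to above can be
exercised standalone. The macro bodies below are paraphrased from Xen's lib.h and
should be treated as illustrative; set_mtc_freq() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Paraphrased from Xen's xen/include/xen/lib.h -- treat as illustrative */
#define MASK_EXTR(v, m) (((v) & (m)) / ((m) & -(m)))
#define MASK_INSR(v, m) (((v) * ((m) & -(m))) & (m))

/* With just the mask, no separate *_OFFSET constant is needed */
#define RTIT_CTL_MTC_FREQ (0x0fULL << 14)

/* Clear the MTC frequency field and insert a new encoded value */
static uint64_t set_mtc_freq(uint64_t ctl, uint64_t freq)
{
    ctl &= ~RTIT_CTL_MTC_FREQ;
    return ctl | MASK_INSR(freq, RTIT_CTL_MTC_FREQ);
}
```

The trick is that `(m) & -(m)` isolates the mask's lowest set bit, so multiplying
or dividing by it shifts a value into or out of the field without open-coding the
shift count.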

Jan



* Re: [PATCH v2 3/7] x86/vmx: add IPT cpu feature
  2020-06-18 23:40 ` [PATCH v2 3/7] x86/vmx: add IPT cpu feature Michał Leszczyński
  2020-06-19 13:44   ` Roger Pau Monné
@ 2020-06-22 12:40   ` Jan Beulich
  1 sibling, 0 replies; 46+ messages in thread
From: Jan Beulich @ 2020-06-22 12:40 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Kevin Tian, Kang, Luwei, Wei Liu, Andrew Cooper, Jun Nakajima,
	Tamas K Lengyel, Xen-devel, Roger Pau Monné

On 19.06.2020 01:40, Michał Leszczyński wrote:
> @@ -630,6 +633,12 @@ static inline bool hvm_altp2m_supported(void)
>      return hvm_funcs.altp2m_supported;
>  }
>  
> +/* returns true if hardware supports Intel Processor Trace */
> +static inline bool hvm_pt_supported(void)
> +{
> +    return hvm_funcs.pt_supported;
> +}

This is vendor-independent code, hence I'd like to see "Intel"
dropped from the comment.

> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -285,6 +285,7 @@ extern u64 vmx_ept_vpid_cap;
>  
>  #define VMX_MISC_CR3_TARGET                     0x01ff0000
>  #define VMX_MISC_VMWRITE_ALL                    0x20000000
> +#define VMX_MISC_PT_SUPPORTED                   0x00004000

Move this up by two lines, so the values continue to be numerically
sorted?

Jan



* Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op
  2020-06-18 23:41 ` [PATCH v2 4/7] x86/vmx: add do_vmtrace_op Michał Leszczyński
                     ` (2 preceding siblings ...)
  2020-06-22  2:56   ` Michał Leszczyński
@ 2020-06-22 13:25   ` Jan Beulich
  2020-06-22 14:35     ` Michał Leszczyński
  3 siblings, 1 reply; 46+ messages in thread
From: Jan Beulich @ 2020-06-22 13:25 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Kang, Luwei,
	Jun Nakajima, Tamas K Lengyel, Xen-devel, Roger Pau Monné

On 19.06.2020 01:41, Michał Leszczyński wrote:
> @@ -1631,6 +1649,8 @@ void hvm_vcpu_destroy(struct vcpu *v)
>      vlapic_destroy(v);
>  
>      hvm_vcpu_cacheattr_destroy(v);
> +
> +    hvm_vmtrace_destroy(v);
>  }

Whenever possible resource cleanup should occur from
hvm_domain_relinquish_resources(). Is there anything preventing
this here?

> @@ -4066,6 +4086,51 @@ static int hvmop_set_evtchn_upcall_vector(
>      return 0;
>  }
>  
> +static int hvm_set_vmtrace_pt_size(struct domain *d, uint64_t value)
> +{
> +    void *buf;
> +    unsigned int buf_order;
> +    struct page_info *pg;
> +    struct ipt_state *ipt;
> +    struct vcpu *v;
> +
> +    if ( value < PAGE_SIZE ||
> +         value > GB(4) ||
> +         ( value & (value - 1) ) ) {
> +        /* we don't accept trace buffer size smaller than single page
> +         * and the upper bound is defined as 4GB in the specification */
> +        return -EINVAL;
> +    }
> +
> +    for_each_vcpu ( d, v )
> +    {
> +        buf_order = get_order_from_bytes(value);
> +        pg = alloc_domheap_pages(d, buf_order, MEMF_no_refcount);
> +
> +        if ( !pg )
> +            return -EFAULT;
> +
> +        buf = page_to_virt(pg);

In addition to what Roger has said here, just fyi that you can't
use page_to_virt() on anything returned from alloc_domheap_pages(),
unless you suitably restrict the address range such that the
result is guaranteed to have a direct mapping.
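As a side note, the size validation in the quoted hunk is easy to exercise on its
own. PAGE_SIZE and GB() below are illustrative stand-ins for Xen's definitions, and
vmtrace_pt_size_ok() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for Xen's constants */
#define PAGE_SIZE 4096ULL
#define GB(n)     ((uint64_t)(n) << 30)

/*
 * The trace buffer must be a power of two, at least one page, and at
 * most 4GB (the upper bound the patch takes from the SDM).  A power of
 * two has no bits in common with itself minus one.
 */
static bool vmtrace_pt_size_ok(uint64_t value)
{
    return value >= PAGE_SIZE && value <= GB(4) &&
           !(value & (value - 1));
}
```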

> @@ -4949,6 +5018,100 @@ static int compat_altp2m_op(
>      return rc;
>  }
>  
> +static int do_vmtrace_op(XEN_GUEST_HANDLE_PARAM(void) arg)
> +{
> +    struct xen_hvm_vmtrace_op a;
> +    struct domain *d;
> +    int rc;
> +    struct vcpu *v;
> +    struct ipt_state *ipt;
> +
> +    if ( !hvm_pt_supported() )
> +        return -EOPNOTSUPP;
> +
> +    if ( copy_from_guest(&a, arg, 1) )
> +        return -EFAULT;
> +
> +    if ( a.version != HVMOP_VMTRACE_INTERFACE_VERSION )
> +        return -EINVAL;
> +
> +    d = rcu_lock_domain_by_any_id(a.domain);
> +    spin_lock(&d->vmtrace_lock);
> +
> +    if ( d == NULL )
> +        return -ESRCH;
> +
> +    if ( !is_hvm_domain(d) )
> +    {
> +        rc = -EOPNOTSUPP;
> +        goto out;
> +    }
> +
> +    domain_pause(d);

Who's the intended caller of this interface? You making it a hvm-op
suggests the guest may itself call this. But of course a guest
can't pause itself. If this is supposed to be a tools-only interface,
then you should frame it suitably in the public header and of course
you need to enforce this here (which would e.g. mean you shouldn't
use rcu_lock_domain_by_any_id()).

Also please take a look at hvm/ioreq.c, which makes quite a bit of
use of domain_pause(). In particular I think you want to acquire
the lock only after having paused the domain.

> +    if ( a.vcpu >= d->max_vcpus )
> +    {
> +        rc = -EINVAL;
> +        goto out;
> +    }
> +
> +    v = d->vcpu[a.vcpu];
> +    ipt = v->arch.hvm.vmx.ipt_state;
> +
> +    if ( !ipt )
> +    {
> +        /*
> +	 * PT must be first initialized upon domain creation.
> +	 */
> +        rc = -EINVAL;
> +        goto out;
> +    }
> +
> +    switch ( a.cmd )
> +    {
> +    case HVMOP_vmtrace_ipt_enable:
> +        if ( vmx_add_guest_msr(v, MSR_RTIT_CTL,
> +                               RTIT_CTL_TRACEEN | RTIT_CTL_OS |
> +                               RTIT_CTL_USR | RTIT_CTL_BRANCH_EN) )
> +        {
> +            rc = -EFAULT;
> +            goto out;
> +        }
> +
> +        ipt->active = 1;
> +        break;
> +    case HVMOP_vmtrace_ipt_disable:
> +        if ( vmx_add_guest_msr(v, MSR_RTIT_CTL, 0) )

Shouldn't you rather remove the MSR from the load list here?

Is any of what you do in this switch() actually legitimate without
hvm_set_vmtrace_pt_size() having got called for the guest? From
remarks elsewhere I imply you expect the param that you currently
use to be set upon domain creation time, but at the very least the
potentially big buffer should imo not get allocated up front, but
only when tracing is to actually be enabled.

Also please have blank lines between individual case blocks.

> +        {
> +            rc = -EFAULT;
> +            goto out;
> +        }
> +
> +        ipt->active = 0;
> +        break;
> +    case HVMOP_vmtrace_ipt_get_offset:
> +        a.offset = ipt->output_mask.offset;
> +        break;
> +    default:
> +        rc = -EOPNOTSUPP;
> +        goto out;
> +    }
> +
> +    rc = -EFAULT;
> +    if ( __copy_to_guest(arg, &a, 1) )
> +      goto out;

Only HVMOP_vmtrace_ipt_get_offset needs this - please avoid copying
back on other paths. Even there you could in principle copy back
just the one field; the function taking XEN_GUEST_HANDLE_PARAM(void)
gets in the way of this, though.

The last line above also has an indentation issue, but the use of
"goto" in this case can be avoided anyway.

> +    rc = 0;
> +
> + out:
> +    domain_unpause(d);
> +    spin_unlock(&d->vmtrace_lock);
> +    rcu_unlock_domain(d);
> +
> +    return rc;
> +}
> +
> +DEFINE_XEN_GUEST_HANDLE(compat_hvm_vmtrace_op_t);

I don't see any use of this handle further down - why do you force
it being declared?

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4624,6 +4624,43 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
>          }
>          break;
>      }
> +
> +    case XENMEM_resource_vmtrace_buf:
> +    {
> +        mfn_t mfn;
> +        unsigned int i;
> +        struct ipt_state *ipt;
> +
> +        if ( id >= d->max_vcpus )
> +        {
> +            rc = -EINVAL;
> +            break;
> +        }
> +
> +        ipt = d->vcpu[id]->arch.hvm.vmx.ipt_state;

Please use domain_vcpu() here.

> +        if ( !ipt )
> +        {
> +            rc = -EINVAL;
> +            break;
> +        }
> +
> +        mfn = mfn_x(ipt->output_base >> PAGE_SHIFT);

maddr_to_mfn()?

> +        if (nr_frames > ( ( ipt->output_mask.size + 1 ) >> PAGE_SHIFT ))
> +        {
> +            rc = -EINVAL;
> +            break;
> +        }
> +
> +        rc = 0;
> +        for ( i = 0; i < nr_frames; i++ )
> +        {
> +            mfn_list[i] = mfn_add(mfn, i);
> +        }

Please omit the braces in cases like this one.

> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -104,6 +104,19 @@ struct pi_blocking_vcpu {
>      spinlock_t           *lock;
>  };
>  
> +struct ipt_state {
> +    uint64_t active;
> +    uint64_t status;
> +    uint64_t output_base;
> +    union {
> +        uint64_t raw;
> +        struct {
> +            uint32_t size;
> +            uint32_t offset;
> +        };
> +    } output_mask;
> +};

While referenced ...

>  struct vmx_vcpu {
>      /* Physical address of VMCS. */
>      paddr_t              vmcs_pa;
> @@ -186,6 +199,9 @@ struct vmx_vcpu {
>       * pCPU and wakeup the related vCPU.
>       */
>      struct pi_blocking_vcpu pi_blocking;
> +
> +    /* State of Intel Processor Trace feature */
> +    struct ipt_state     *ipt_state;

... here, the struct declaration itself doesn't really belong in this
header, as not being VMCS-related. The better place likely is vmx.h.
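To illustrate the output_mask sub-union quoted above: on a little-endian
architecture such as x86 the low 32 bits of `raw` alias `size` and the high 32 bits
alias `offset`, matching the IA32_RTIT_OUTPUT_MASK_PTRS split. A hedged sketch
(the endianness assumption is stated, not enforced):

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the ipt_state sub-union from the patch */
union output_mask {
    uint64_t raw;
    struct {
        uint32_t size;    /* low 32 bits on little-endian */
        uint32_t offset;  /* high 32 bits on little-endian */
    };
};
```

This lets the whole MSR be read or written via `raw` while the two fields remain
individually addressable.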

> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -382,6 +382,29 @@ struct xen_hvm_altp2m_op {
>  typedef struct xen_hvm_altp2m_op xen_hvm_altp2m_op_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_op_t);
>  
> +/* HVMOP_vmtrace: Perform VM tracing related operation */
> +#define HVMOP_vmtrace 26
> +
> +#define HVMOP_VMTRACE_INTERFACE_VERSION 0x00000001

I'm unconvinced we want to introduce yet another versioned interface.
In any event, as hinted at earlier, this suggests it wants to be a
tools-only interface instead (which, at least for the time being, is
not required to be a stable interface then, but that's also something
we apparently want to move away from, and hence you may better not
try to rely on it not needing to be stable).

> +struct xen_hvm_vmtrace_op {
> +    /* IN variable */
> +    uint32_t version;   /* HVMOP_VMTRACE_INTERFACE_VERSION */
> +    uint32_t cmd;
> +/* Enable/disable external vmtrace for given domain */
> +#define HVMOP_vmtrace_ipt_enable      1
> +#define HVMOP_vmtrace_ipt_disable     2
> +#define HVMOP_vmtrace_ipt_get_offset  3
> +    domid_t domain;
> +    uint32_t vcpu;
> +    uint64_t size;
> +
> +    /* OUT variable */
> +    uint64_t offset;

If this is to be a tools-only interface, please use uint64_aligned_t.

You also want to add an entry to xen/include/xlat.lst and use the
resulting macro to prove that the struct layout is the same for
native and compat callers.

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -457,6 +457,9 @@ struct domain
>      unsigned    pbuf_idx;
>      spinlock_t  pbuf_lock;
>  
> +    /* Used by vmtrace domctls */
> +    spinlock_t  vmtrace_lock;

There's no domctl invovled here anymore, I think.

Jan



* Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op
  2020-06-22 13:25   ` Jan Beulich
@ 2020-06-22 14:35     ` Michał Leszczyński
  2020-06-22 15:22       ` Jan Beulich
  0 siblings, 1 reply; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-22 14:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Kang, Luwei,
	Jun Nakajima, Tamas K Lengyel, Xen-devel, Roger Pau Monné

----- On 22 Jun 2020 at 15:25, Jan Beulich jbeulich@suse.com wrote:

> On 19.06.2020 01:41, Michał Leszczyński wrote:
>> +
>> +    domain_pause(d);
> 
> Who's the intended caller of this interface? You making it a hvm-op
> suggests the guest may itself call this. But of course a guest
> can't pause itself. If this is supposed to be a tools-only interface,
> then you should frame it suitably in the public header and of course
> you need to enforce this here (which would e.g. mean you shouldn't
> use rcu_lock_domain_by_any_id()).
> 

What should I use instead of rcu_lock_domain_by_any_id()?

> Also please take a look at hvm/ioreq.c, which makes quite a bit of
> use of domain_pause(). In particular I think you want to acquire
> the lock only after having paused the domain.
> 

This domain_pause() will be changed to vcpu_pause().

> Shouldn't you rather remove the MSR from the load list here?
> 

This will be fixed.

> Is any of what you do in this switch() actually legitimate without
> hvm_set_vmtrace_pt_size() having got called for the guest? From
> remarks elsewhere I imply you expect the param that you currently
> use to be set upon domain creation time, but at the very least the
> potentially big buffer should imo not get allocated up front, but
> only when tracing is to actually be enabled.

Wait... so you want to allocate these buffers at runtime?
Previously we were talking about there being too much runtime logic,
and that these enable/disable hypercalls should be stripped to the
absolute minimum.


>> --- a/xen/include/public/hvm/hvm_op.h
>> +++ b/xen/include/public/hvm/hvm_op.h
>> @@ -382,6 +382,29 @@ struct xen_hvm_altp2m_op {
>>  typedef struct xen_hvm_altp2m_op xen_hvm_altp2m_op_t;
>>  DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_op_t);
>>  
>> +/* HVMOP_vmtrace: Perform VM tracing related operation */
>> +#define HVMOP_vmtrace 26
>> +
>> +#define HVMOP_VMTRACE_INTERFACE_VERSION 0x00000001
> 
> I'm unconvinced we want to introduce yet another versioned interface.
> In any event, as hinted at earlier, this suggests it wants to be a
> tools-only interface instead (which, at least for the time being, is
> not required to be a stable interface then, but that's also something
> we apparently want to move away from, and hence you may better not
> try to rely on it not needing to be stable).

Ok. I will remove the interface version.

> 
>> +struct xen_hvm_vmtrace_op {
>> +    /* IN variable */
>> +    uint32_t version;   /* HVMOP_VMTRACE_INTERFACE_VERSION */
>> +    uint32_t cmd;
>> +/* Enable/disable external vmtrace for given domain */
>> +#define HVMOP_vmtrace_ipt_enable      1
>> +#define HVMOP_vmtrace_ipt_disable     2
>> +#define HVMOP_vmtrace_ipt_get_offset  3
>> +    domid_t domain;
>> +    uint32_t vcpu;
>> +    uint64_t size;
>> +
>> +    /* OUT variable */
>> +    uint64_t offset;
> 
> If this is to be a tools-only interface, please use uint64_aligned_t.
> 

This type is not defined within the hvm_op.h header. What should I do about it?

> You also want to add an entry to xen/include/xlat.lst and use the
> resulting macro to prove that the struct layout is the same for
> native and compat callers.

Could you tell me a little bit more about this? What are "native" and
"compat" callers, and what is the purpose of this file?


Best regards,
Michał Leszczyński
CERT Polska



* Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op
  2020-06-22 14:35     ` Michał Leszczyński
@ 2020-06-22 15:22       ` Jan Beulich
  2020-06-22 16:02         ` Michał Leszczyński
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Beulich @ 2020-06-22 15:22 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Kang, Luwei,
	Jun Nakajima, Tamas K Lengyel, Xen-devel, Roger Pau Monné

On 22.06.2020 16:35, Michał Leszczyński wrote:
> ----- On 22 Jun 2020 at 15:25, Jan Beulich jbeulich@suse.com wrote:
>> On 19.06.2020 01:41, Michał Leszczyński wrote:
>>> +
>>> +    domain_pause(d);
>>
>> Who's the intended caller of this interface? You making it a hvm-op
>> suggests the guest may itself call this. But of course a guest
>> can't pause itself. If this is supposed to be a tools-only interface,
>> then you should frame it suitably in the public header and of course
>> you need to enforce this here (which would e.g. mean you shouldn't
>> use rcu_lock_domain_by_any_id()).
>>
> 
> What should I use instead of rcu_lock_domain_by_any_id()?

Please take a look at the header where its declaration lives. It's
admittedly not the usual thing in Xen, but there are even comments
describing the differences between the four related by-id functions.
I guess rcu_lock_live_remote_domain_by_id() is the one you want to
use, though I'm puzzled by there being surprisingly few uses
elsewhere.

>> Also please take a look at hvm/ioreq.c, which makes quite a bit of
>> use of domain_pause(). In particular I think you want to acquire
>> the lock only after having paused the domain.
> 
> This domain_pause() will be changed to vcpu_pause().

And you understand that my comment then still applies?

>> Shouldn't you rather remove the MSR from the load list here?
> 
> This will be fixed.

Thanks for trimming your reply, but I think you've gone too far:
Context should still be such that one can see what the comments
are about without having to go back to the original mail. Please
try to find some middle ground.

>> Is any of what you do in this switch() actually legitimate without
>> hvm_set_vmtrace_pt_size() having got called for the guest? From
>> remarks elsewhere I imply you expect the param that you currently
>> use to be set upon domain creation time, but at the very least the
>> potentially big buffer should imo not get allocated up front, but
>> only when tracing is to actually be enabled.
> 
> Wait... so you want to allocate these buffers at runtime?
> Previously we were talking about there being too much runtime logic,
> and that these enable/disable hypercalls should be stripped to the
> absolute minimum.

Basic arrangements can be made at domain creation time. I don't
think though that it would be a good use of memory if you
allocated perhaps many gigabytes of memory just for possibly
wanting to enable tracing on a guest. 

>>> +struct xen_hvm_vmtrace_op {
>>> +    /* IN variable */
>>> +    uint32_t version;   /* HVMOP_VMTRACE_INTERFACE_VERSION */
>>> +    uint32_t cmd;
>>> +/* Enable/disable external vmtrace for given domain */
>>> +#define HVMOP_vmtrace_ipt_enable      1
>>> +#define HVMOP_vmtrace_ipt_disable     2
>>> +#define HVMOP_vmtrace_ipt_get_offset  3
>>> +    domid_t domain;
>>> +    uint32_t vcpu;
>>> +    uint64_t size;
>>> +
>>> +    /* OUT variable */
>>> +    uint64_t offset;
>>
>> If this is to be a tools-only interface, please use uint64_aligned_t.
>>
> 
> This type is not defined within the hvm_op.h header. What should I do about it?

It gets defined by xen.h, so should be available here. Its
definitions live in a

#if defined(__XEN__) || defined(__XEN_TOOLS__)

section, which is what I did recommend to put your interface in
as well. Unless you want this to be exposed to the guest itself,
at which point further constraints would arise.
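To make the layout concern concrete, here's a hedged sketch. The typedef below is
an illustrative stand-in for what the tools-facing headers provide (the real
definition lives behind the __XEN__/__XEN_TOOLS__ guard), and struct vmtrace_demo
is a hypothetical reduced struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for the tools-side uint64_aligned_t */
typedef uint64_t __attribute__((aligned(8))) uint64_aligned_t;

/*
 * With a plain uint64_t, a 32-bit (compat) compiler may place 'size'
 * at offset 4 while a 64-bit one places it at offset 8; forcing
 * 8-byte alignment keeps the layout identical for both callers.
 */
struct vmtrace_demo {
    uint32_t cmd;
    uint64_aligned_t size;
};
```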

>> You also want to add an entry to xen/include/xlat.lst and use the
>> resulting macro to prove that the struct layout is the same for
>> native and compat callers.
> 
> Could you tell me a little bit more about this? What are "native" and
> "compat" callers, and what is the purpose of this file?

A native caller is one whose bitness matches Xen's, i.e. for x86
a guest running in 64-bit mode. A compat guest is one running in
32-bit mode. Interface structure layout, at least for historical
reasons, can differ between the two. If it does, these
structures need translation. In the case here the layouts look
to match, which we still want to be verified at build time. If
you add a suitable line to xlat.lst starting with a ?, a macro
will be generated that you can then invoke somewhere near the
code that actually handles this sub-hypercall. See e.g. the top
of xen/common/hypfs.c for a relatively recent addition of such.
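For a concrete picture (hypothetical -- the exact entry and macro name depend on
the final struct name), the xlat.lst line and the check it generates would look
something like:

```
/* In xen/include/xlat.lst, a '?' entry asks only for a layout check: */
?	hvm_vmtrace_op			hvm/hvm_op.h

/* Near the code handling the sub-hypercall, the generated macro then
 * proves at build time that native and compat layouts match: */
CHECK_hvm_vmtrace_op;
```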

Jan



* Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op
  2020-06-22 15:22       ` Jan Beulich
@ 2020-06-22 16:02         ` Michał Leszczyński
  2020-06-22 16:16           ` Jan Beulich
  2020-06-22 17:05           ` Michał Leszczyński
  0 siblings, 2 replies; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-22 16:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Kang, Luwei,
	Jun Nakajima, Tamas K Lengyel, Xen-devel, Roger Pau Monné

----- On 22 Jun 2020 at 17:22, Jan Beulich jbeulich@suse.com wrote:

> On 22.06.2020 16:35, Michał Leszczyński wrote:
>> ----- On 22 Jun 2020 at 15:25, Jan Beulich jbeulich@suse.com wrote:
>>> On 19.06.2020 01:41, Michał Leszczyński wrote:
>>>> +
>>>> +    domain_pause(d);
>>>
>>> Who's the intended caller of this interface? You making it a hvm-op
>>> suggests the guest may itself call this. But of course a guest
>>> can't pause itself. If this is supposed to be a tools-only interface,
>>> then you should frame it suitably in the public header and of course
>>> you need to enforce this here (which would e.g. mean you shouldn't
>>> use rcu_lock_domain_by_any_id()).
>>>
>> 
>> What should I use instead of rcu_lock_domain_by_any_id()?
> 
> Please take a look at the header where its declaration lives. It's
> admittedly not the usual thing in Xen, but there are even comments
> describing the differences between the four related by-id functions.
> I guess rcu_lock_live_remote_domain_by_id() is the one you want to
> use, despite my being puzzled by there being surprisingly few uses
> elsewhere.
> 

Ok, I will correct this.

>>> Also please take a look at hvm/ioreq.c, which makes quite a bit of
>>> use of domain_pause(). In particular I think you want to acquire
>>> the lock only after having paused the domain.
>> 
>> This domain_pause() will be changed to vcpu_pause().
> 
> And you understand that my comment then still applies?

If you mean that we should first call vcpu_pause()
and then acquire the spinlock, then yes, this will be corrected in v3.

>>> Is any of what you do in this switch() actually legitimate without
>>> hvm_set_vmtrace_pt_size() having got called for the guest? From
>>> remarks elsewhere I imply you expect the param that you currently
>>> use to be set upon domain creation time, but at the very least the
>>> potentially big buffer should imo not get allocated up front, but
>>> only when tracing is to actually be enabled.
>> 
>> Wait... so you want to allocate these buffers in runtime?
>> Previously we were talking that there is too much runtime logic
>> and these enable/disable hypercalls should be stripped to absolute
>> minimum.
> 
> Basic arrangements can be made at domain creation time. I don't
> think though that it would be a good use of memory if you
> allocated perhaps many gigabytes of memory just for possibly
> wanting to enable tracing on a guest.
> 

From our previous conversations I thought that you wanted to have
as much logic as possible moved to domain creation.

Thus, a parameter "vmtrace_pt_size" was introduced. By default it's
zero (= disabled); if you set it to a non-zero value, trace buffers
of the given size will be allocated for the domain and you have the
possibility to use ipt_enable/ipt_disable at any moment.
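
[Editor's note: for concreteness, a hypothetical xl.cfg fragment for the knob described above. The parameter name and semantics come from this series' v2 cover letter and may well change in later revisions.]

```
# vmtrace_pt_size: per-vCPU IPT trace buffer, fixed at domain creation.
# 0 (the default) leaves tracing unavailable for the domain's lifetime.
vmtrace_pt_size = 67108864   # 64 MiB
```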

This way the runtime logic is as thin as possible. I assume the user
knows in advance whether he/she would want to use external monitoring
with IPT or not.

It's also not "many gigabytes". In most use cases a buffer of 16/32/64 MB
would suffice, I think.

If we want to fall back to the scenario where the trace buffer is
allocated dynamically, then we basically get back to patch v1
implementation.

>>>> +struct xen_hvm_vmtrace_op {
>>>> +    /* IN variable */
>>>> +    uint32_t version;   /* HVMOP_VMTRACE_INTERFACE_VERSION */
>>>> +    uint32_t cmd;
>>>> +/* Enable/disable external vmtrace for given domain */
>>>> +#define HVMOP_vmtrace_ipt_enable      1
>>>> +#define HVMOP_vmtrace_ipt_disable     2
>>>> +#define HVMOP_vmtrace_ipt_get_offset  3
>>>> +    domid_t domain;
>>>> +    uint32_t vcpu;
>>>> +    uint64_t size;
>>>> +
>>>> +    /* OUT variable */
>>>> +    uint64_t offset;
>>>
>>> If this is to be a tools-only interface, please use uint64_aligned_t.
>>>
>> 
>> This type is not defined within hvm_op.h header. What should I do about it?
> 
> It gets defined by xen.h, so should be available here. Its
> definitions live in a
> 
> #if defined(__XEN__) || defined(__XEN_TOOLS__)
> 
> section, which is what I did recommend to put your interface in
> as well. Unless you want this to be exposed to the guest itself,
> at which point further constraints would arise.
> 
>>> You also want to add an entry to xen/include/xlat.lst and use the
>>> resulting macro to prove that the struct layout is the same for
>>> native and compat callers.
>> 
>> Could you tell a little bit more about this? What are "native" and
>> "compat" callers and what is the purpose of this file?
> 
> A native caller is one whose bitness matches Xen's, i.e. for x86
> a guest running in 64-bit mode. A compat guest is one running in
> 32-bit mode. Interface structure layout, at least for historical
> reasons, can differ between the two. If it does, these
> structures need translation. In the case here the layouts look
> to match, which we still want to be verified at build time. If
> you add a suitable line to xlat.lst starting with a ?, a macro
> will be generated that you can then invoke somewhere near the
> code that actually handles this sub-hypercall. See e.g. the top
> of xen/common/hypfs.c for a relatively recent addition of such.
> 

Thanks for this explanation, I will try to address this.


Best regards,
Michał Leszczyński
CERT Polska


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op
  2020-06-22 16:02         ` Michał Leszczyński
@ 2020-06-22 16:16           ` Jan Beulich
  2020-06-22 16:22             ` Michał Leszczyński
                               ` (2 more replies)
  2020-06-22 17:05           ` Michał Leszczyński
  1 sibling, 3 replies; 46+ messages in thread
From: Jan Beulich @ 2020-06-22 16:16 UTC (permalink / raw)
  To: Michał Leszczyński, Andrew Cooper
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Wei Liu,
	Tamas K Lengyel, Ian Jackson, George Dunlap, Kang, Luwei,
	Jun Nakajima, Xen-devel, Roger Pau Monné

On 22.06.2020 18:02, Michał Leszczyński wrote:
> ----- On 22 Jun 2020 at 17:22, Jan Beulich jbeulich@suse.com wrote:
>> On 22.06.2020 16:35, Michał Leszczyński wrote:
>>> ----- On 22 Jun 2020 at 15:25, Jan Beulich jbeulich@suse.com wrote:
>>>> Is any of what you do in this switch() actually legitimate without
>>>> hvm_set_vmtrace_pt_size() having got called for the guest? From
>>>> remarks elsewhere I imply you expect the param that you currently
>>>> use to be set upon domain creation time, but at the very least the
>>>> potentially big buffer should imo not get allocated up front, but
>>>> only when tracing is to actually be enabled.
>>>
>>> Wait... so you want to allocate these buffers in runtime?
>>> Previously we were talking that there is too much runtime logic
>>> and these enable/disable hypercalls should be stripped to absolute
>>> minimum.
>>
>> Basic arrangements can be made at domain creation time. I don't
>> think though that it would be a good use of memory if you
>> allocated perhaps many gigabytes of memory just for possibly
>> wanting to enable tracing on a guest.
>>
> 
> From our previous conversations I thought that you want to have
> as much logic moved to the domain creation as possible.
> 
> Thus, a parameter "vmtrace_pt_size" was introduced. By default it's
> zero (= disabled), if you set it to a non-zero value, then trace buffers
> of given size will be allocated for the domain and you have possibility
> to use ipt_enable/ipt_disable at any moment.
> 
> This way the runtime logic is as thin as possible. I assume user knows
> in advance whether he/she would want to use external monitoring with IPT
> or not.

Andrew - I think you requested movement to domain_create(). Could
you clarify if indeed you mean to also allocate the big buffers
this early?

> It's also not "many gigabytes". In most use cases a buffer of 16/32/64 MB
> would suffice, I think.

But that's one such buffer per vCPU, isn't it? Plus these buffers
need to be physically contiguous, which is an additional possibly
severe constraint.

Jan


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op
  2020-06-22 16:16           ` Jan Beulich
@ 2020-06-22 16:22             ` Michał Leszczyński
  2020-06-22 16:25             ` Roger Pau Monné
  2020-06-23  1:04             ` Michał Leszczyński
  2 siblings, 0 replies; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-22 16:22 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Kang, Luwei,
	Jun Nakajima, Tamas K Lengyel, Xen-devel, Roger Pau Monné

----- On 22 Jun 2020 at 18:16, Jan Beulich jbeulich@suse.com wrote:

> On 22.06.2020 18:02, Michał Leszczyński wrote:
>> ----- On 22 Jun 2020 at 17:22, Jan Beulich jbeulich@suse.com wrote:
>>> On 22.06.2020 16:35, Michał Leszczyński wrote:
>>>> ----- On 22 Jun 2020 at 15:25, Jan Beulich jbeulich@suse.com wrote:
>>>>> Is any of what you do in this switch() actually legitimate without
>>>>> hvm_set_vmtrace_pt_size() having got called for the guest? From
>>>>> remarks elsewhere I imply you expect the param that you currently
>>>>> use to be set upon domain creation time, but at the very least the
>>>>> potentially big buffer should imo not get allocated up front, but
>>>>> only when tracing is to actually be enabled.
>>>>
>>>> Wait... so you want to allocate these buffers in runtime?
>>>> Previously we were talking that there is too much runtime logic
>>>> and these enable/disable hypercalls should be stripped to absolute
>>>> minimum.
>>>
>>> Basic arrangements can be made at domain creation time. I don't
>>> think though that it would be a good use of memory if you
>>> allocated perhaps many gigabytes of memory just for possibly
>>> wanting to enable tracing on a guest.
>>>
>> 
>> From our previous conversations I thought that you want to have
>> as much logic moved to the domain creation as possible.
>> 
>> Thus, a parameter "vmtrace_pt_size" was introduced. By default it's
>> zero (= disabled), if you set it to a non-zero value, then trace buffers
>> of given size will be allocated for the domain and you have possibility
>> to use ipt_enable/ipt_disable at any moment.
>> 
>> This way the runtime logic is as thin as possible. I assume user knows
>> in advance whether he/she would want to use external monitoring with IPT
>> or not.
> 
> Andrew - I think you requested movement to domain_create(). Could
> you clarify if indeed you mean to also allocate the big buffers
> this early?
> 
>> It's also not "many gigabytes". In most use cases a buffer of 16/32/64 MB
>> would suffice, I think.
> 
> But that one such buffer per vCPU, isn't it? Plus these buffers
> need to be physically contiguous, which is an additional possibly
> severe constraint.

Yes. For my use case (VMI stuff) I estimate 16-64 MB per vCPU, and for
fuzzing I think it would be even less.

And also yes - these buffers need to be physically contiguous and
aligned, because otherwise the CPU would refuse to use them.


Best regards,
Michał Leszczyński
CERT Polska



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op
  2020-06-22 16:16           ` Jan Beulich
  2020-06-22 16:22             ` Michał Leszczyński
@ 2020-06-22 16:25             ` Roger Pau Monné
  2020-06-22 16:33               ` Michał Leszczyński
  2020-06-23  1:04             ` Michał Leszczyński
  2 siblings, 1 reply; 46+ messages in thread
From: Roger Pau Monné @ 2020-06-22 16:25 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Michał Leszczyński, Ian Jackson,
	George Dunlap, Kang, Luwei, Jun Nakajima, Tamas K Lengyel,
	Xen-devel

On Mon, Jun 22, 2020 at 06:16:57PM +0200, Jan Beulich wrote:
> On 22.06.2020 18:02, Michał Leszczyński wrote:
> > ----- On 22 Jun 2020 at 17:22, Jan Beulich jbeulich@suse.com wrote:
> >> On 22.06.2020 16:35, Michał Leszczyński wrote:
> >>> ----- On 22 Jun 2020 at 15:25, Jan Beulich jbeulich@suse.com wrote:
> > It's also not "many gigabytes". In most use cases a buffer of 16/32/64 MB
> > would suffice, I think.
> 
> But that one such buffer per vCPU, isn't it? Plus these buffers
> need to be physically contiguous, which is an additional possibly
> severe constraint.

FTR, from my reading of the SDM you can use a mode called ToPA where
the buffer is some kind of linked list of tables that map to output
regions. That would be nice, but IMO it should be implemented in a
next iteration after the basic support is in.

Roger.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op
  2020-06-22 16:25             ` Roger Pau Monné
@ 2020-06-22 16:33               ` Michał Leszczyński
  0 siblings, 0 replies; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-22 16:33 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jun Nakajima,
	Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Kang, Luwei,
	Jan Beulich, Tamas K Lengyel, Xen-devel

----- On 22 Jun 2020 at 18:25, Roger Pau Monné roger.pau@citrix.com wrote:

> On Mon, Jun 22, 2020 at 06:16:57PM +0200, Jan Beulich wrote:
>> On 22.06.2020 18:02, Michał Leszczyński wrote:
>> > ----- On 22 Jun 2020 at 17:22, Jan Beulich jbeulich@suse.com wrote:
>> >> On 22.06.2020 16:35, Michał Leszczyński wrote:
>> >>> ----- On 22 Jun 2020 at 15:25, Jan Beulich jbeulich@suse.com wrote:
>> > It's also not "many gigabytes". In most use cases a buffer of 16/32/64 MB
>> > would suffice, I think.
>> 
>> But that one such buffer per vCPU, isn't it? Plus these buffers
>> need to be physically contiguous, which is an additional possibly
>> severe constraint.
> 
> FTR, from my reading of the SDM you can use a mode called ToPA where
> the buffer is some kind of linked list of tables that map to output
> regions. That would be nice, but IMO it should be implemented in a
> next iteration after the basic support is in.
> 
> Roger.

Yes. I'll keep that in mind, but right now I would like to go for the
minimum viable implementation, while ToPA could be added in the next
patch series.


Best regards,
Michał Leszczyński
CERT Polska


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op
  2020-06-22 16:02         ` Michał Leszczyński
  2020-06-22 16:16           ` Jan Beulich
@ 2020-06-22 17:05           ` Michał Leszczyński
  2020-06-23  8:49             ` Jan Beulich
  1 sibling, 1 reply; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-22 17:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Xen-devel, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Jun Nakajima,
	Tamas K Lengyel, Kang, Luwei, Roger Pau Monné

>>>>> +struct xen_hvm_vmtrace_op {
>>>>> +    /* IN variable */
>>>>> +    uint32_t version;   /* HVMOP_VMTRACE_INTERFACE_VERSION */
>>>>> +    uint32_t cmd;
>>>>> +/* Enable/disable external vmtrace for given domain */
>>>>> +#define HVMOP_vmtrace_ipt_enable      1
>>>>> +#define HVMOP_vmtrace_ipt_disable     2
>>>>> +#define HVMOP_vmtrace_ipt_get_offset  3
>>>>> +    domid_t domain;
>>>>> +    uint32_t vcpu;
>>>>> +    uint64_t size;
>>>>> +
>>>>> +    /* OUT variable */
>>>>> +    uint64_t offset;
>>>>
>>>> If this is to be a tools-only interface, please use uint64_aligned_t.
>>>>
>>> 
>>> This type is not defined within hvm_op.h header. What should I do about it?
>> 
>> It gets defined by xen.h, so should be available here. Its
>> definitions live in a
>> 
>> #if defined(__XEN__) || defined(__XEN_TOOLS__)
>> 
>> section, which is what I did recommend to put your interface in
>> as well. Unless you want this to be exposed to the guest itself,
>> at which point further constraints would arise.
>> 

When I put it into #if defined(__XEN__) || defined(__XEN_TOOLS__),
the build complains about a missing uint64_aligned_compat_t type.

I also can't spot a single instance of uint64_aligned_t within
this file.


ml


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op
  2020-06-22 16:16           ` Jan Beulich
  2020-06-22 16:22             ` Michał Leszczyński
  2020-06-22 16:25             ` Roger Pau Monné
@ 2020-06-23  1:04             ` Michał Leszczyński
  2020-06-23  8:51               ` Jan Beulich
  2 siblings, 1 reply; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-23  1:04 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Kang, Luwei,
	Jun Nakajima, Tamas K Lengyel, Xen-devel, Roger Pau Monné

----- On 22 Jun 2020 at 18:16, Jan Beulich jbeulich@suse.com wrote:

> On 22.06.2020 18:02, Michał Leszczyński wrote:
>> ----- On 22 Jun 2020 at 17:22, Jan Beulich jbeulich@suse.com wrote:
>>> On 22.06.2020 16:35, Michał Leszczyński wrote:
>>>> ----- On 22 Jun 2020 at 15:25, Jan Beulich jbeulich@suse.com wrote:
>>>>> Is any of what you do in this switch() actually legitimate without
>>>>> hvm_set_vmtrace_pt_size() having got called for the guest? From
>>>>> remarks elsewhere I imply you expect the param that you currently
>>>>> use to be set upon domain creation time, but at the very least the
>>>>> potentially big buffer should imo not get allocated up front, but
>>>>> only when tracing is to actually be enabled.
>>>>
>>>> Wait... so you want to allocate these buffers in runtime?
>>>> Previously we were talking that there is too much runtime logic
>>>> and these enable/disable hypercalls should be stripped to absolute
>>>> minimum.
>>>
>>> Basic arrangements can be made at domain creation time. I don't
>>> think though that it would be a good use of memory if you
>>> allocated perhaps many gigabytes of memory just for possibly
>>> wanting to enable tracing on a guest.
>>>
>> 
>> From our previous conversations I thought that you want to have
>> as much logic moved to the domain creation as possible.
>> 
>> Thus, a parameter "vmtrace_pt_size" was introduced. By default it's
>> zero (= disabled), if you set it to a non-zero value, then trace buffers
>> of given size will be allocated for the domain and you have possibility
>> to use ipt_enable/ipt_disable at any moment.
>> 
>> This way the runtime logic is as thin as possible. I assume user knows
>> in advance whether he/she would want to use external monitoring with IPT
>> or not.
> 
> Andrew - I think you requested movement to domain_create(). Could
> you clarify if indeed you mean to also allocate the big buffers
> this early?

I would like to recall what Andrew wrote a few days ago:

----- On 16 Jun 2020 at 22:16, Andrew Cooper andrew.cooper3@citrix.com wrote:
> Xen has traditionally opted for a "and turn this extra thing on
> dynamically" model, but this has caused no end of security issues and
> broken corner cases.
> 
> You can see this still existing in the difference between
> XEN_DOMCTL_createdomain and XEN_DOMCTL_max_vcpus, (the latter being
> required to choose the number of vcpus for the domain) and we're making
> good progress undoing this particular wart (before 4.13, it was
> concerningly easy to get Xen to fall over a NULL d->vcpu[] pointer by
> issuing other hypercalls between these two).
> 
> There is a lot of settings which should be immutable for the lifetime of
> the domain, and external monitoring looks like another one of these.
> Specifying it at createdomain time allows for far better runtime
> behaviour (you are no longer in a situation where the first time you try
> to turn tracing on, you end up with -ENOMEM because another VM booted in
> the meantime and used the remaining memory), and it makes for rather
> more simple code in Xen itself (at runtime, you can rely on it having
> been set up properly, because a failure setting up will have killed the
> domain already).
> 
> ...
> 
> ~Andrew

According to this quote, I've moved buffer allocation to domain
creation; the updated version was already sent to the list as patch v3.

Best regards,
Michał Leszczyński
CERT Polska


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op
  2020-06-22 17:05           ` Michał Leszczyński
@ 2020-06-23  8:49             ` Jan Beulich
  0 siblings, 0 replies; 46+ messages in thread
From: Jan Beulich @ 2020-06-23  8:49 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Xen-devel, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Jun Nakajima,
	Tamas K Lengyel, Kang, Luwei, Roger Pau Monné

On 22.06.2020 19:05, Michał Leszczyński wrote:
>>>>>> +struct xen_hvm_vmtrace_op {
>>>>>> +    /* IN variable */
>>>>>> +    uint32_t version;   /* HVMOP_VMTRACE_INTERFACE_VERSION */
>>>>>> +    uint32_t cmd;
>>>>>> +/* Enable/disable external vmtrace for given domain */
>>>>>> +#define HVMOP_vmtrace_ipt_enable      1
>>>>>> +#define HVMOP_vmtrace_ipt_disable     2
>>>>>> +#define HVMOP_vmtrace_ipt_get_offset  3
>>>>>> +    domid_t domain;
>>>>>> +    uint32_t vcpu;
>>>>>> +    uint64_t size;
>>>>>> +
>>>>>> +    /* OUT variable */
>>>>>> +    uint64_t offset;
>>>>>
>>>>> If this is to be a tools-only interface, please use uint64_aligned_t.
>>>>>
>>>>
>>>> This type is not defined within hvm_op.h header. What should I do about it?
>>>
>>> It gets defined by xen.h, so should be available here. Its
>>> definitions live in a
>>>
>>> #if defined(__XEN__) || defined(__XEN_TOOLS__)
>>>
>>> section, which is what I did recommend to put your interface in
>>> as well. Unless you want this to be exposed to the guest itself,
>>> at which point further constraints would arise.
>>>
> 
> When I put it into #if defined(__XEN__) || defined(__XEN_TOOLS__),
> the build complains about a missing uint64_aligned_compat_t type.

Once the interface is tools-only, the requirement for compat
checking isn't as strict anymore. As you can see from domctl
and sysctl, there's no checking there, but to compensate we
use uint64_aligned_t instead (arranging for struct layouts to
match).

> I also can't spot any single instance of uint64_aligned_t within
> this file.

That's unrelated.

Jan


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op
  2020-06-23  1:04             ` Michał Leszczyński
@ 2020-06-23  8:51               ` Jan Beulich
  2020-06-23 17:24                 ` Andrew Cooper
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Beulich @ 2020-06-23  8:51 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Kang, Luwei,
	Jun Nakajima, Tamas K Lengyel, Xen-devel, Roger Pau Monné

On 23.06.2020 03:04, Michał Leszczyński wrote:
> ----- On 22 Jun 2020 at 18:16, Jan Beulich jbeulich@suse.com wrote:
> 
>> On 22.06.2020 18:02, Michał Leszczyński wrote:
>>> ----- On 22 Jun 2020 at 17:22, Jan Beulich jbeulich@suse.com wrote:
>>>> On 22.06.2020 16:35, Michał Leszczyński wrote:
>>>>> ----- On 22 Jun 2020 at 15:25, Jan Beulich jbeulich@suse.com wrote:
>>>>>> Is any of what you do in this switch() actually legitimate without
>>>>>> hvm_set_vmtrace_pt_size() having got called for the guest? From
>>>>>> remarks elsewhere I imply you expect the param that you currently
>>>>>> use to be set upon domain creation time, but at the very least the
>>>>>> potentially big buffer should imo not get allocated up front, but
>>>>>> only when tracing is to actually be enabled.
>>>>>
>>>>> Wait... so you want to allocate these buffers in runtime?
>>>>> Previously we were talking that there is too much runtime logic
>>>>> and these enable/disable hypercalls should be stripped to absolute
>>>>> minimum.
>>>>
>>>> Basic arrangements can be made at domain creation time. I don't
>>>> think though that it would be a good use of memory if you
>>>> allocated perhaps many gigabytes of memory just for possibly
>>>> wanting to enable tracing on a guest.
>>>>
>>>
>>> From our previous conversations I thought that you want to have
>>> as much logic moved to the domain creation as possible.
>>>
>>> Thus, a parameter "vmtrace_pt_size" was introduced. By default it's
>>> zero (= disabled), if you set it to a non-zero value, then trace buffers
>>> of given size will be allocated for the domain and you have possibility
>>> to use ipt_enable/ipt_disable at any moment.
>>>
>>> This way the runtime logic is as thin as possible. I assume user knows
>>> in advance whether he/she would want to use external monitoring with IPT
>>> or not.
>>
>> Andrew - I think you requested movement to domain_create(). Could
>> you clarify if indeed you mean to also allocate the big buffers
>> this early?
> 
> I would like to recall what Andrew wrote a few days ago:
> 
> ----- On 16 Jun 2020 at 22:16, Andrew Cooper andrew.cooper3@citrix.com wrote:
>> Xen has traditionally opted for a "and turn this extra thing on
>> dynamically" model, but this has caused no end of security issues and
>> broken corner cases.
>>
>> You can see this still existing in the difference between
>> XEN_DOMCTL_createdomain and XEN_DOMCTL_max_vcpus, (the latter being
>> required to choose the number of vcpus for the domain) and we're making
>> good progress undoing this particular wart (before 4.13, it was
>> concerningly easy to get Xen to fall over a NULL d->vcpu[] pointer by
>> issuing other hypercalls between these two).
>>
>> There is a lot of settings which should be immutable for the lifetime of
>> the domain, and external monitoring looks like another one of these.
>> Specifying it at createdomain time allows for far better runtime
>> behaviour (you are no longer in a situation where the first time you try
>> to turn tracing on, you end up with -ENOMEM because another VM booted in
>> the meantime and used the remaining memory), and it makes for rather
>> more simple code in Xen itself (at runtime, you can rely on it having
>> been set up properly, because a failure setting up will have killed the
>> domain already).
>>
>> ...
>>
>> ~Andrew
> 
> according to this quote I've moved buffer allocation to the create_domain,
> the updated version was already sent to the list as patch v3.

I'd still like to see an explicit confirmation by him that this
use of memory is indeed what he has intended. There are much smaller
amounts of memory which we allocate on demand, just to avoid
allocating some without then ever using it.

Jan


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op
  2020-06-23  8:51               ` Jan Beulich
@ 2020-06-23 17:24                 ` Andrew Cooper
  2020-06-24 10:03                   ` Jan Beulich
  2020-06-24 12:23                   ` Michał Leszczyński
  0 siblings, 2 replies; 46+ messages in thread
From: Andrew Cooper @ 2020-06-23 17:24 UTC (permalink / raw)
  To: Jan Beulich, Michał Leszczyński
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Wei Liu,
	Tamas K Lengyel, Ian Jackson, George Dunlap, Kang, Luwei,
	Jun Nakajima, Xen-devel, Roger Pau Monné

On 23/06/2020 09:51, Jan Beulich wrote:
> On 23.06.2020 03:04, Michał Leszczyński wrote:
>> ----- On 22 Jun 2020 at 18:16, Jan Beulich jbeulich@suse.com wrote:
>>
>>> On 22.06.2020 18:02, Michał Leszczyński wrote:
>>>> ----- On 22 Jun 2020 at 17:22, Jan Beulich jbeulich@suse.com wrote:
>>>>> On 22.06.2020 16:35, Michał Leszczyński wrote:
>>>>>> ----- On 22 Jun 2020 at 15:25, Jan Beulich jbeulich@suse.com wrote:
>>>>>>> Is any of what you do in this switch() actually legitimate without
>>>>>>> hvm_set_vmtrace_pt_size() having got called for the guest? From
>>>>>>> remarks elsewhere I imply you expect the param that you currently
>>>>>>> use to be set upon domain creation time, but at the very least the
>>>>>>> potentially big buffer should imo not get allocated up front, but
>>>>>>> only when tracing is to actually be enabled.
>>>>>> Wait... so you want to allocate these buffers in runtime?
>>>>>> Previously we were talking that there is too much runtime logic
>>>>>> and these enable/disable hypercalls should be stripped to absolute
>>>>>> minimum.
>>>>> Basic arrangements can be made at domain creation time. I don't
>>>>> think though that it would be a good use of memory if you
>>>>> allocated perhaps many gigabytes of memory just for possibly
>>>>> wanting to enable tracing on a guest.
>>>>>
>>>> From our previous conversations I thought that you want to have
>>>> as much logic moved to the domain creation as possible.
>>>>
>>>> Thus, a parameter "vmtrace_pt_size" was introduced. By default it's
>>>> zero (= disabled), if you set it to a non-zero value, then trace buffers
>>>> of given size will be allocated for the domain and you have possibility
>>>> to use ipt_enable/ipt_disable at any moment.
>>>>
>>>> This way the runtime logic is as thin as possible. I assume user knows
>>>> in advance whether he/she would want to use external monitoring with IPT
>>>> or not.
>>> Andrew - I think you requested movement to domain_create(). Could
>>> you clarify if indeed you mean to also allocate the big buffers
>>> this early?
>> I would like to recall what Andrew wrote a few days ago:
>>
>> ----- On 16 Jun 2020 at 22:16, Andrew Cooper andrew.cooper3@citrix.com wrote:
>>> Xen has traditionally opted for a "and turn this extra thing on
>>> dynamically" model, but this has caused no end of security issues and
>>> broken corner cases.
>>>
>>> You can see this still existing in the difference between
>>> XEN_DOMCTL_createdomain and XEN_DOMCTL_max_vcpus, (the latter being
>>> required to choose the number of vcpus for the domain) and we're making
>>> good progress undoing this particular wart (before 4.13, it was
>>> concerningly easy to get Xen to fall over a NULL d->vcpu[] pointer by
>>> issuing other hypercalls between these two).
>>>
>>> There is a lot of settings which should be immutable for the lifetime of
>>> the domain, and external monitoring looks like another one of these.
>>> Specifying it at createdomain time allows for far better runtime
>>> behaviour (you are no longer in a situation where the first time you try
>>> to turn tracing on, you end up with -ENOMEM because another VM booted in
>>> the meantime and used the remaining memory), and it makes for rather
>>> more simple code in Xen itself (at runtime, you can rely on it having
>>> been set up properly, because a failure setting up will have killed the
>>> domain already).
>>>
>>> ...
>>>
>>> ~Andrew
>> according to this quote I've moved buffer allocation to the create_domain,
>> the updated version was already sent to the list as patch v3.
> I'd still like to see an explicit confirmation by him that this
> use of memory is indeed what he has intended. There are much smaller
> amounts of memory which we allocate on demand, just to avoid
> allocating some without then ever using it.

PT is a debug/diagnostic tool.  It's not something you'd run in
production against a production VM.

It's off by default (by virtue of having to explicitly ask to use it in
the first place), and those who've asked for it don't want to be finding
-ENOMEM after the domain has been running for a few seconds (or midway
through the vcpus), when they inevitably want to map the rings.

Those who request buffers in the first place and forget about them are
not semantically different from those who ask for a silly shadow memory
limit, or typo the guest memory and give it too much.  It's an admin
error, not a safety/correctness issue.

~Andrew


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op
  2020-06-23 17:24                 ` Andrew Cooper
@ 2020-06-24 10:03                   ` Jan Beulich
  2020-06-24 12:40                     ` Andrew Cooper
  2020-06-24 12:23                   ` Michał Leszczyński
  1 sibling, 1 reply; 46+ messages in thread
From: Jan Beulich @ 2020-06-24 10:03 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Wei Liu,
	Tamas K Lengyel, Michał Leszczyński, Ian Jackson,
	George Dunlap, Kang, Luwei, Jun Nakajima, Xen-devel,
	Roger Pau Monné

On 23.06.2020 19:24, Andrew Cooper wrote:
> On 23/06/2020 09:51, Jan Beulich wrote:
>> On 23.06.2020 03:04, Michał Leszczyński wrote:
>>> ----- On 22 Jun 2020 at 18:16, Jan Beulich jbeulich@suse.com wrote:
>>>
>>>> On 22.06.2020 18:02, Michał Leszczyński wrote:
>>>>> ----- 22 cze 2020 o 17:22, Jan Beulich jbeulich@suse.com napisał(a):
>>>>>> On 22.06.2020 16:35, Michał Leszczyński wrote:
>>>>>>> ----- 22 cze 2020 o 15:25, Jan Beulich jbeulich@suse.com napisał(a):
>>>>>>>> Is any of what you do in this switch() actually legitimate without
>>>>>>>> hvm_set_vmtrace_pt_size() having got called for the guest? From
>>>>>>>> remarks elsewhere I imply you expect the param that you currently
>>>>>>>> use to be set upon domain creation time, but at the very least the
>>>>>>>> potentially big buffer should imo not get allocated up front, but
>>>>>>>> only when tracing is to actually be enabled.
>>>>>>> Wait... so you want to allocate these buffers in runtime?
>>>>>>> Previously we were talking that there is too much runtime logic
>>>>>>> and these enable/disable hypercalls should be stripped to absolute
>>>>>>> minimum.
>>>>>> Basic arrangements can be made at domain creation time. I don't
>>>>>> think though that it would be a good use of memory if you
>>>>>> allocated perhaps many gigabytes of memory just for possibly
>>>>>> wanting to enable tracing on a guest.
>>>>>>
>>>>> From our previous conversations I thought that you want to have
>>>>> as much logic moved to the domain creation as possible.
>>>>>
>>>>> Thus, a parameter "vmtrace_pt_size" was introduced. By default it's
>>>>> zero (= disabled), if you set it to a non-zero value, then trace buffers
>>>>> of given size will be allocated for the domain and you have possibility
>>>>> to use ipt_enable/ipt_disable at any moment.
>>>>>
>>>>> This way the runtime logic is as thin as possible. I assume user knows
>>>>> in advance whether he/she would want to use external monitoring with IPT
>>>>> or not.
>>>> Andrew - I think you requested movement to domain_create(). Could
>>>> you clarify if indeed you mean to also allocate the big buffers
>>>> this early?
>>> I would like to recall what Andrew wrote few days ago:
>>>
>>> ----- 16 cze 2020 o 22:16, Andrew Cooper andrew.cooper3@citrix.com wrote:
>>>> Xen has traditionally opted for a "and turn this extra thing on
>>>> dynamically" model, but this has caused no end of security issues and
>>>> broken corner cases.
>>>>
>>>> You can see this still existing in the difference between
>>>> XEN_DOMCTL_createdomain and XEN_DOMCTL_max_vcpus, (the latter being
>>>> required to choose the number of vcpus for the domain) and we're making
>>>> good progress undoing this particular wart (before 4.13, it was
>>>> concerningly easy to get Xen to fall over a NULL d->vcpu[] pointer by
>>>> issuing other hypercalls between these two).
>>>>
>>>> There is a lot of settings which should be immutable for the lifetime of
>>>> the domain, and external monitoring looks like another one of these.
>>>> Specifying it at createdomain time allows for far better runtime
>>>> behaviour (you are no longer in a situation where the first time you try
>>>> to turn tracing on, you end up with -ENOMEM because another VM booted in
>>>> the meantime and used the remaining memory), and it makes for rather
>>>> more simple code in Xen itself (at runtime, you can rely on it having
>>>> been set up properly, because a failure setting up will have killed the
>>>> domain already).
>>>>
>>>> ...
>>>>
>>>> ~Andrew
>>> according to this quote I've moved buffer allocation to the create_domain,
>>> the updated version was already sent to the list as patch v3.
>> I'd still like to see an explicit confirmation by him that this
>> use of memory is indeed what he has intended. There are much smaller
>> amounts of memory which we allocate on demand, just to avoid
>> allocating some without then ever using it.
> 
> PT is a debug/diagnostic tool.  It's not something you'd run in
> production against a production VM.

Well, the suggested use with introspection made me assume differently.
If this is (now and forever) truly debug/diag only, so be it then.

Jan



* Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op
  2020-06-23 17:24                 ` Andrew Cooper
  2020-06-24 10:03                   ` Jan Beulich
@ 2020-06-24 12:23                   ` Michał Leszczyński
  1 sibling, 0 replies; 46+ messages in thread
From: Michał Leszczyński @ 2020-06-24 12:23 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jun Nakajima,
	Wei Liu, Tamas K Lengyel, Ian Jackson, George Dunlap, Kang,
	Luwei, Jan Beulich, Xen-devel, Roger Pau Monné

----- 23 cze 2020 o 19:24, Andrew Cooper andrew.cooper3@citrix.com napisał(a):
> On 23/06/2020 09:51, Jan Beulich wrote:
>> I'd still like to see an explicit confirmation by him that this
>> use of memory is indeed what he has intended. There are much smaller
>> amounts of memory which we allocate on demand, just to avoid
>> allocating some without then ever using it.
> 
> PT is a debug/diagnostic tool.  It's not something you'd run in
> production against a production VM.
> 
> It's off by default (by virtue of having to explicitly ask to use it in
> the first place), and those who've asked for it don't want to be finding
> -ENOMEM after the domain has been running for a few seconds (or midway
> through the vcpus), when they inevitably want to map the rings.
> 
> Those who request buffers in the first place and forget about them are
> not semantically different from those who ask for a silly shadow memory
> limit, or typo the guest memory and give it too much.  It's an admin
> error, not a safety/correctness issue.
> 
> ~Andrew


Absolutely +1.

Assuming that somebody wants to perform some advanced scenario and is trying
to run many domains (e.g. 20), it's much better to have 19 domains
working fine and 1 failing early with -ENOMEM,
rather than have all 20 domains randomly crashing at runtime because
it turns out there is a shortage of memory.


Best regards,
Michał Leszczyński
CERT Polska



* Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op
  2020-06-24 10:03                   ` Jan Beulich
@ 2020-06-24 12:40                     ` Andrew Cooper
  2020-06-24 12:52                       ` Tamas K Lengyel
  0 siblings, 1 reply; 46+ messages in thread
From: Andrew Cooper @ 2020-06-24 12:40 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Wei Liu,
	Tamas K Lengyel, Michał Leszczyński, Ian Jackson,
	George Dunlap, Kang, Luwei, Jun Nakajima, Xen-devel,
	Roger Pau Monné

On 24/06/2020 11:03, Jan Beulich wrote:
> On 23.06.2020 19:24, Andrew Cooper wrote:
>> On 23/06/2020 09:51, Jan Beulich wrote:
>>> On 23.06.2020 03:04, Michał Leszczyński wrote:
>>>> ----- 22 cze 2020 o 18:16, Jan Beulich jbeulich@suse.com napisał(a):
>>>>
>>>>> On 22.06.2020 18:02, Michał Leszczyński wrote:
>>>>>> ----- 22 cze 2020 o 17:22, Jan Beulich jbeulich@suse.com napisał(a):
>>>>>>> On 22.06.2020 16:35, Michał Leszczyński wrote:
>>>>>>>> ----- 22 cze 2020 o 15:25, Jan Beulich jbeulich@suse.com napisał(a):
>>>>>>>>> Is any of what you do in this switch() actually legitimate without
>>>>>>>>> hvm_set_vmtrace_pt_size() having got called for the guest? From
>>>>>>>>> remarks elsewhere I imply you expect the param that you currently
>>>>>>>>> use to be set upon domain creation time, but at the very least the
>>>>>>>>> potentially big buffer should imo not get allocated up front, but
>>>>>>>>> only when tracing is to actually be enabled.
>>>>>>>> Wait... so you want to allocate these buffers in runtime?
>>>>>>>> Previously we were talking that there is too much runtime logic
>>>>>>>> and these enable/disable hypercalls should be stripped to absolute
>>>>>>>> minimum.
>>>>>>> Basic arrangements can be made at domain creation time. I don't
>>>>>>> think though that it would be a good use of memory if you
>>>>>>> allocated perhaps many gigabytes of memory just for possibly
>>>>>>> wanting to enable tracing on a guest.
>>>>>>>
>>>>>> From our previous conversations I thought that you want to have
>>>>>> as much logic moved to the domain creation as possible.
>>>>>>
>>>>>> Thus, a parameter "vmtrace_pt_size" was introduced. By default it's
>>>>>> zero (= disabled), if you set it to a non-zero value, then trace buffers
>>>>>> of given size will be allocated for the domain and you have possibility
>>>>>> to use ipt_enable/ipt_disable at any moment.
>>>>>>
>>>>>> This way the runtime logic is as thin as possible. I assume user knows
>>>>>> in advance whether he/she would want to use external monitoring with IPT
>>>>>> or not.
>>>>> Andrew - I think you requested movement to domain_create(). Could
>>>>> you clarify if indeed you mean to also allocate the big buffers
>>>>> this early?
>>>> I would like to recall what Andrew wrote few days ago:
>>>>
>>>> ----- 16 cze 2020 o 22:16, Andrew Cooper andrew.cooper3@citrix.com wrote:
>>>>> Xen has traditionally opted for a "and turn this extra thing on
>>>>> dynamically" model, but this has caused no end of security issues and
>>>>> broken corner cases.
>>>>>
>>>>> You can see this still existing in the difference between
>>>>> XEN_DOMCTL_createdomain and XEN_DOMCTL_max_vcpus, (the latter being
>>>>> required to chose the number of vcpus for the domain) and we're making
>>>>> good progress undoing this particular wart (before 4.13, it was
>>>>> concerning easy to get Xen to fall over a NULL d->vcpu[] pointer by
>>>>> issuing other hypercalls between these two).
>>>>>
>>>>> There is a lot of settings which should be immutable for the lifetime of
>>>>> the domain, and external monitoring looks like another one of these.
>>>>> Specifying it at createdomain time allows for far better runtime
>>>>> behaviour (you are no longer in a situation where the first time you try
>>>>> to turn tracing on, you end up with -ENOMEM because another VM booted in
>>>>> the meantime and used the remaining memory), and it makes for rather
>>>>> more simple code in Xen itself (at runtime, you can rely on it having
>>>>> been set up properly, because a failure setting up will have killed the
>>>>> domain already).
>>>>>
>>>>> ...
>>>>>
>>>>> ~Andrew
>>>> according to this quote I've moved buffer allocation to the create_domain,
>>>> the updated version was already sent to the list as patch v3.
>>> I'd still like to see an explicit confirmation by him that this
>>> use of memory is indeed what he has intended. There are much smaller
>>> amounts of memory which we allocate on demand, just to avoid
>>> allocating some without then ever using it.
>> PT is a debug/diagnostic tool.  It's not something you'd run in
>> production against a production VM.
> Well, the suggested use with introspection made me assume differently.
> If this is (now and forever) truly debug/diag only, so be it then.

At the end of the day, it is a fine-grained profiling tool, meaning that
it is not used in the common case.

The use case presented is for fuzzing, using the execution trace
generated by the CPU, rather than the current scheme, which allegedly
involves shuffling breakpoints around.

There is a big disclaimer with PT saying "there is a perf hit from using
this".  This is a consequence of the core suddenly becoming far more
memory bound, and almost certainly being capable of generating trace
data faster than can be written out.


Even if it does find its way into some fairly custom production
scenarios (fuzzing as a service?), we're still in the position where the
only people who enable this will be the people intending to use it.

~Andrew



* Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op
  2020-06-24 12:40                     ` Andrew Cooper
@ 2020-06-24 12:52                       ` Tamas K Lengyel
  0 siblings, 0 replies; 46+ messages in thread
From: Tamas K Lengyel @ 2020-06-24 12:52 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jun Nakajima,
	Wei Liu, Michał Leszczyński, Ian Jackson,
	George Dunlap, Kang, Luwei, Jan Beulich, Xen-devel,
	Roger Pau Monné

On Wed, Jun 24, 2020 at 6:40 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 24/06/2020 11:03, Jan Beulich wrote:
> > On 23.06.2020 19:24, Andrew Cooper wrote:
> >> On 23/06/2020 09:51, Jan Beulich wrote:
> >>> On 23.06.2020 03:04, Michał Leszczyński wrote:
> >>>> ----- 22 cze 2020 o 18:16, Jan Beulich jbeulich@suse.com napisał(a):
> >>>>
> >>>>> On 22.06.2020 18:02, Michał Leszczyński wrote:
> >>>>>> ----- 22 cze 2020 o 17:22, Jan Beulich jbeulich@suse.com napisał(a):
> >>>>>>> On 22.06.2020 16:35, Michał Leszczyński wrote:
> >>>>>>>> ----- 22 cze 2020 o 15:25, Jan Beulich jbeulich@suse.com napisał(a):
> >>>>>>>>> Is any of what you do in this switch() actually legitimate without
> >>>>>>>>> hvm_set_vmtrace_pt_size() having got called for the guest? From
> >>>>>>>>> remarks elsewhere I imply you expect the param that you currently
> >>>>>>>>> use to be set upon domain creation time, but at the very least the
> >>>>>>>>> potentially big buffer should imo not get allocated up front, but
> >>>>>>>>> only when tracing is to actually be enabled.
> >>>>>>>> Wait... so you want to allocate these buffers in runtime?
> >>>>>>>> Previously we were talking that there is too much runtime logic
> >>>>>>>> and these enable/disable hypercalls should be stripped to absolute
> >>>>>>>> minimum.
> >>>>>>> Basic arrangements can be made at domain creation time. I don't
> >>>>>>> think though that it would be a good use of memory if you
> >>>>>>> allocated perhaps many gigabytes of memory just for possibly
> >>>>>>> wanting to enable tracing on a guest.
> >>>>>>>
> >>>>>> From our previous conversations I thought that you want to have
> >>>>>> as much logic moved to the domain creation as possible.
> >>>>>>
> >>>>>> Thus, a parameter "vmtrace_pt_size" was introduced. By default it's
> >>>>>> zero (= disabled), if you set it to a non-zero value, then trace buffers
> >>>>>> of given size will be allocated for the domain and you have possibility
> >>>>>> to use ipt_enable/ipt_disable at any moment.
> >>>>>>
> >>>>>> This way the runtime logic is as thin as possible. I assume user knows
> >>>>>> in advance whether he/she would want to use external monitoring with IPT
> >>>>>> or not.
> >>>>> Andrew - I think you requested movement to domain_create(). Could
> >>>>> you clarify if indeed you mean to also allocate the big buffers
> >>>>> this early?
> >>>> I would like to recall what Andrew wrote few days ago:
> >>>>
> >>>> ----- 16 cze 2020 o 22:16, Andrew Cooper andrew.cooper3@citrix.com wrote:
> >>>>> Xen has traditionally opted for a "and turn this extra thing on
> >>>>> dynamically" model, but this has caused no end of security issues and
> >>>>> broken corner cases.
> >>>>>
> >>>>> You can see this still existing in the difference between
> >>>>> XEN_DOMCTL_createdomain and XEN_DOMCTL_max_vcpus, (the latter being
> >>>>> required to chose the number of vcpus for the domain) and we're making
> >>>>> good progress undoing this particular wart (before 4.13, it was
> >>>>> concerning easy to get Xen to fall over a NULL d->vcpu[] pointer by
> >>>>> issuing other hypercalls between these two).
> >>>>>
> >>>>> There is a lot of settings which should be immutable for the lifetime of
> >>>>> the domain, and external monitoring looks like another one of these.
> >>>>> Specifying it at createdomain time allows for far better runtime
> >>>>> behaviour (you are no longer in a situation where the first time you try
> >>>>> to turn tracing on, you end up with -ENOMEM because another VM booted in
> >>>>> the meantime and used the remaining memory), and it makes for rather
> >>>>> more simple code in Xen itself (at runtime, you can rely on it having
> >>>>> been set up properly, because a failure setting up will have killed the
> >>>>> domain already).
> >>>>>
> >>>>> ...
> >>>>>
> >>>>> ~Andrew
> >>>> according to this quote I've moved buffer allocation to the create_domain,
> >>>> the updated version was already sent to the list as patch v3.
> >>> I'd still like to see an explicit confirmation by him that this
> >>> use of memory is indeed what he has intended. There are much smaller
> >>> amounts of memory which we allocate on demand, just to avoid
> >>> allocating some without then ever using it.
> >> PT is a debug/diagnostic tool.  Its not something you'd run in
> >> production against a production VM.
> > Well, the suggested use with introspection made me assume differently.
> > If this is (now and forever) truly debug/diag only, so be it then.
>
> At the end of the day, it is a fine grain profiling tool, meaning that
> it is not used in the common case.
>
> The usecase presented is for fuzzing, using the execution trace
> generated by the CPU, rather than the current scheme which allegedly
> involves shuffling breakpoints around.

That's indeed the use case we are looking to use it for. But there are
also some experimental malware analysis scenarios it is targeted at,
which is what I believe the CERT folks are mostly interested in, like
https://www.vmray.com/cyber-security-blog/back-to-the-past-using-intels-processor-trace-for-enhanced-analysis.
I could also imagine this being useful for IDS/IPS solutions like what
Bitdefender is building, but I'm just speculating, since the overhead
might be prohibitive for that use case. I would consider all these
scenarios experimental at this point and absolutely not something to
enable by default. If someone presents a need for this to be supported
on production VMs that requires turning it on/off at runtime, without
knowing whether the feature will be used when the domain is created, we
can certainly revisit the topic. But I don't expect that to happen any
time soon, if at all.

Tamas



end of thread, other threads:[~2020-06-24 12:53 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-18 23:34 [PATCH v2 0/7] Implement support for external IPT monitoring Michał Leszczyński
2020-06-18 23:38 ` [PATCH v2 1/7] xen/mm: lift 32 item limit from mfn/gfn arrays Michał Leszczyński
2020-06-19 11:34   ` Roger Pau Monné
2020-06-19 11:36     ` Michał Leszczyński
2020-06-19 11:48       ` Jan Beulich
2020-06-19 11:51         ` Michał Leszczyński
2020-06-19 12:35     ` Michał Leszczyński
2020-06-19 12:39       ` Jan Beulich
2020-06-22  3:00         ` Michał Leszczyński
2020-06-18 23:39 ` [PATCH v2 2/7] x86/vmx: add Intel PT MSR definitions Michał Leszczyński
2020-06-22 12:35   ` Jan Beulich
2020-06-18 23:40 ` [PATCH v2 3/7] x86/vmx: add IPT cpu feature Michał Leszczyński
2020-06-19 13:44   ` Roger Pau Monné
2020-06-19 14:22     ` Michał Leszczyński
2020-06-19 15:31       ` Roger Pau Monné
2020-06-22  2:49     ` Michał Leszczyński
2020-06-22  8:31       ` Jan Beulich
2020-06-22 12:40   ` Jan Beulich
2020-06-18 23:41 ` [PATCH v2 4/7] x86/vmx: add do_vmtrace_op Michał Leszczyński
2020-06-19  0:46   ` Michał Leszczyński
2020-06-19 15:30   ` Roger Pau Monné
2020-06-19 15:50     ` Jan Beulich
2020-06-22  2:45       ` Michał Leszczyński
2020-06-22  2:56   ` Michał Leszczyński
2020-06-22  8:39     ` Jan Beulich
2020-06-22 13:25   ` Jan Beulich
2020-06-22 14:35     ` Michał Leszczyński
2020-06-22 15:22       ` Jan Beulich
2020-06-22 16:02         ` Michał Leszczyński
2020-06-22 16:16           ` Jan Beulich
2020-06-22 16:22             ` Michał Leszczyński
2020-06-22 16:25             ` Roger Pau Monné
2020-06-22 16:33               ` Michał Leszczyński
2020-06-23  1:04             ` Michał Leszczyński
2020-06-23  8:51               ` Jan Beulich
2020-06-23 17:24                 ` Andrew Cooper
2020-06-24 10:03                   ` Jan Beulich
2020-06-24 12:40                     ` Andrew Cooper
2020-06-24 12:52                       ` Tamas K Lengyel
2020-06-24 12:23                   ` Michał Leszczyński
2020-06-22 17:05           ` Michał Leszczyński
2020-06-23  8:49             ` Jan Beulich
2020-06-18 23:41 ` [PATCH v2 5/7] tools/libxc: add xc_vmtrace_* functions Michał Leszczyński
2020-06-18 23:42 ` [PATCH v2 6/7] tools/libxl: add vmtrace_pt_size parameter Michał Leszczyński
2020-06-18 23:42 ` [PATCH v2 7/7] tools/proctrace: add proctrace tool Michał Leszczyński
2020-06-18 23:51 ` [PATCH v2 0/7] Implement support for external IPT monitoring Michał Leszczyński
