* [PATCH v4 00/10] Implement support for external IPT monitoring
@ 2020-06-30 12:33 Michał Leszczyński
  2020-06-30 12:33 ` [PATCH v4 01/10] x86/vmx: add Intel PT MSR definitions Michał Leszczyński
                   ` (10 more replies)
  0 siblings, 11 replies; 75+ messages in thread
From: Michał Leszczyński @ 2020-06-30 12:33 UTC
  To: xen-devel
  Cc: Julien Grall, Kevin Tian, Stefano Stabellini, tamas.lengyel,
	Jun Nakajima, Wei Liu, Andrew Cooper, Michal Leszczynski,
	Ian Jackson, George Dunlap, Jan Beulich, Anthony PERARD,
	luwei.kang, Roger Pau Monné

From: Michal Leszczynski <michal.leszczynski@cert.pl>

Intel Processor Trace (IPT) is an architectural extension available in modern
Intel CPUs. It records a detailed trace of activity while the processor
executes code. The recorded trace can be used to reconstruct the code flow,
i.e. to find out which code paths were executed, which branches were taken,
and so forth.

This feature is described in the Intel(R) 64 and IA-32 Architectures
Software Developer's Manual, Volume 3C: System Programming Guide, Part 3,
Chapter 36: "Intel Processor Trace".

This patch series implements an interface that Dom0 can use to enable IPT
for particular vCPUs of a DomU, allowing for external monitoring. Such a
feature has numerous applications, such as malware monitoring, fuzzing, or
performance testing.

Thanks also to Tamas K Lengyel for a few preliminary hints before the
first version of this patch was submitted to xen-devel.

Changed since v1:
  * MSR_RTIT_CTL is managed using MSR load lists
  * other PT-related MSRs are modified only when vCPU goes out of context
  * trace buffer is now acquired as a resource
  * added vmtrace_pt_size parameter in xl.cfg; the size of the trace buffer
    must be specified at domain creation time
  * trace buffers are allocated on domain creation and freed on
    domain destruction
  * HVMOP_vmtrace_ipt_enable/disable is limited to enabling/disabling PT;
    these calls no longer manage buffer memory
  * lifted 32 MFN/GFN array limit when acquiring resources
  * minor code style changes according to review

Changed since v2:
  * trace buffer is now allocated on domain creation (in v2 it was
    allocated when the HVM param was set)
  * restored 32-item limit in mfn/gfn arrays in acquire_resource
    and instead implemented hypercall continuations
  * code changes according to Jan's and Roger's review

Changed since v3:
  * vmtrace HVMOPs are now implemented as DOMCTLs
  * patches split up according to Andrew's comments
  * code changes according to v3 review on the mailing list


Michal Leszczynski (10):
  x86/vmx: add Intel PT MSR definitions
  x86/vmx: add IPT cpu feature
  tools/libxl: add vmtrace_pt_size parameter
  x86/vmx: implement processor tracing for VMX
  common/domain: allocate vmtrace_pt_buffer
  memory: batch processing in acquire_resource()
  x86/mm: add vmtrace_buf resource type
  x86/domctl: add XEN_DOMCTL_vmtrace_op
  tools/libxc: add xc_vmtrace_* functions
  tools/proctrace: add proctrace tool

 docs/man/xl.cfg.5.pod.in                    |  10 +
 tools/golang/xenlight/helpers.gen.go        |   2 +
 tools/golang/xenlight/types.gen.go          |   1 +
 tools/libxc/Makefile                        |   1 +
 tools/libxc/include/xenctrl.h               |  39 +++
 tools/libxc/xc_vmtrace.c                    |  73 +++++
 tools/libxl/libxl.h                         |   8 +
 tools/libxl/libxl_create.c                  |   1 +
 tools/libxl/libxl_types.idl                 |   2 +
 tools/proctrace/COPYING                     | 339 ++++++++++++++++++++
 tools/proctrace/Makefile                    |  48 +++
 tools/proctrace/proctrace.c                 | 163 ++++++++++
 tools/xl/xl_parse.c                         |  20 ++
 xen/arch/x86/domain.c                       |  11 +
 xen/arch/x86/domctl.c                       |  48 +++
 xen/arch/x86/hvm/vmx/vmcs.c                 |   7 +-
 xen/arch/x86/hvm/vmx/vmx.c                  |  89 +++++
 xen/arch/x86/mm.c                           |  25 ++
 xen/common/domain.c                         |  46 +++
 xen/common/memory.c                         |  32 +-
 xen/include/asm-x86/cpufeature.h            |   1 +
 xen/include/asm-x86/domain.h                |   4 +
 xen/include/asm-x86/hvm/hvm.h               |  38 +++
 xen/include/asm-x86/hvm/vmx/vmcs.h          |   4 +
 xen/include/asm-x86/hvm/vmx/vmx.h           |  14 +
 xen/include/asm-x86/msr-index.h             |  37 +++
 xen/include/public/arch-x86/cpufeatureset.h |   1 +
 xen/include/public/domctl.h                 |  27 ++
 xen/include/public/memory.h                 |   1 +
 xen/include/xen/domain.h                    |   2 +
 xen/include/xen/sched.h                     |   4 +
 31 files changed, 1094 insertions(+), 4 deletions(-)
 create mode 100644 tools/libxc/xc_vmtrace.c
 create mode 100644 tools/proctrace/COPYING
 create mode 100644 tools/proctrace/Makefile
 create mode 100644 tools/proctrace/proctrace.c

-- 
2.20.1




* [PATCH v4 01/10] x86/vmx: add Intel PT MSR definitions
  2020-06-30 12:33 [PATCH v4 00/10] Implement support for external IPT monitoring Michał Leszczyński
@ 2020-06-30 12:33 ` Michał Leszczyński
  2020-06-30 16:23   ` Jan Beulich
                     ` (2 more replies)
  2020-06-30 12:33 ` [PATCH v4 02/10] x86/vmx: add IPT cpu feature Michał Leszczyński
                   ` (9 subsequent siblings)
  10 siblings, 3 replies; 75+ messages in thread
From: Michał Leszczyński @ 2020-06-30 12:33 UTC
  To: xen-devel
  Cc: tamas.lengyel, Wei Liu, Andrew Cooper, Michal Leszczynski,
	Jan Beulich, luwei.kang, Roger Pau Monné

From: Michal Leszczynski <michal.leszczynski@cert.pl>

Define constants related to Intel Processor Trace features.

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 xen/include/asm-x86/msr-index.h | 37 +++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index b328a47ed8..0203029be9 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -69,6 +69,43 @@
 #define MSR_MCU_OPT_CTRL                    0x00000123
 #define  MCU_OPT_CTRL_RNGDS_MITG_DIS        (_AC(1, ULL) <<  0)
 
+/* Intel PT MSRs */
+#define MSR_RTIT_OUTPUT_BASE                0x00000560
+
+#define MSR_RTIT_OUTPUT_MASK                0x00000561
+
+#define MSR_RTIT_CTL                        0x00000570
+#define  RTIT_CTL_TRACEEN                    (_AC(1, ULL) <<  0)
+#define  RTIT_CTL_CYCEN                      (_AC(1, ULL) <<  1)
+#define  RTIT_CTL_OS                         (_AC(1, ULL) <<  2)
+#define  RTIT_CTL_USR                        (_AC(1, ULL) <<  3)
+#define  RTIT_CTL_PWR_EVT_EN                 (_AC(1, ULL) <<  4)
+#define  RTIT_CTL_FUP_ON_PTW                 (_AC(1, ULL) <<  5)
+#define  RTIT_CTL_FABRIC_EN                  (_AC(1, ULL) <<  6)
+#define  RTIT_CTL_CR3_FILTER                 (_AC(1, ULL) <<  7)
+#define  RTIT_CTL_TOPA                       (_AC(1, ULL) <<  8)
+#define  RTIT_CTL_MTC_EN                     (_AC(1, ULL) <<  9)
+#define  RTIT_CTL_TSC_EN                     (_AC(1, ULL) <<  10)
+#define  RTIT_CTL_DIS_RETC                   (_AC(1, ULL) <<  11)
+#define  RTIT_CTL_PTW_EN                     (_AC(1, ULL) <<  12)
+#define  RTIT_CTL_BRANCH_EN                  (_AC(1, ULL) <<  13)
+#define  RTIT_CTL_MTC_FREQ                   (_AC(0x0F, ULL) <<  14)
+#define  RTIT_CTL_CYC_THRESH                 (_AC(0x0F, ULL) <<  19)
+#define  RTIT_CTL_PSB_FREQ                   (_AC(0x0F, ULL) <<  24)
+#define  RTIT_CTL_ADDR(n)                    (_AC(0x0F, ULL) <<  (32 + (4 * (n))))
+
+#define MSR_RTIT_STATUS                     0x00000571
+#define  RTIT_STATUS_FILTER_EN               (_AC(1, ULL) <<  0)
+#define  RTIT_STATUS_CONTEXT_EN              (_AC(1, ULL) <<  1)
+#define  RTIT_STATUS_TRIGGER_EN              (_AC(1, ULL) <<  2)
+#define  RTIT_STATUS_ERROR                   (_AC(1, ULL) <<  4)
+#define  RTIT_STATUS_STOPPED                 (_AC(1, ULL) <<  5)
+#define  RTIT_STATUS_BYTECNT                 (_AC(0x1FFFF, ULL) <<  32)
+
+#define MSR_RTIT_CR3_MATCH                  0x00000572
+#define MSR_RTIT_ADDR_A(n)                  (0x00000580 + (n) * 2)
+#define MSR_RTIT_ADDR_B(n)                  (0x00000581 + (n) * 2)
+
 #define MSR_U_CET                           0x000006a0
 #define MSR_S_CET                           0x000006a2
 #define  CET_SHSTK_EN                       (_AC(1, ULL) <<  0)
-- 
2.20.1




* [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-06-30 12:33 [PATCH v4 00/10] Implement support for external IPT monitoring Michał Leszczyński
  2020-06-30 12:33 ` [PATCH v4 01/10] x86/vmx: add Intel PT MSR definitions Michał Leszczyński
@ 2020-06-30 12:33 ` Michał Leszczyński
  2020-07-01  9:49   ` Roger Pau Monné
                     ` (2 more replies)
  2020-06-30 12:33 ` [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter Michał Leszczyński
                   ` (8 subsequent siblings)
  10 siblings, 3 replies; 75+ messages in thread
From: Michał Leszczyński @ 2020-06-30 12:33 UTC
  To: xen-devel
  Cc: Julien Grall, Kevin Tian, Stefano Stabellini, tamas.lengyel,
	Jun Nakajima, Wei Liu, Andrew Cooper, Michal Leszczynski,
	Ian Jackson, George Dunlap, Jan Beulich, luwei.kang,
	Roger Pau Monné

From: Michal Leszczynski <michal.leszczynski@cert.pl>

Check whether Intel Processor Trace is supported by the current
processor and usable in VMX operation, and define the
vmtrace_supported global variable.

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 xen/arch/x86/hvm/vmx/vmcs.c                 | 7 ++++++-
 xen/common/domain.c                         | 2 ++
 xen/include/asm-x86/cpufeature.h            | 1 +
 xen/include/asm-x86/hvm/vmx/vmcs.h          | 1 +
 xen/include/public/arch-x86/cpufeatureset.h | 1 +
 xen/include/xen/domain.h                    | 2 ++
 6 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index ca94c2bedc..b73d824357 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -291,6 +291,12 @@ static int vmx_init_vmcs_config(void)
         _vmx_cpu_based_exec_control &=
             ~(CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING);
 
+    rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
+
+    /* Check whether IPT is supported in VMX operation. */
+    vmtrace_supported = cpu_has_ipt &&
+                        (_vmx_misc_cap & VMX_MISC_PT_SUPPORTED);
+
     if ( _vmx_cpu_based_exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS )
     {
         min = 0;
@@ -305,7 +311,6 @@ static int vmx_init_vmcs_config(void)
                SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS |
                SECONDARY_EXEC_XSAVES |
                SECONDARY_EXEC_TSC_SCALING);
-        rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
         if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
             opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
         if ( opt_vpid_enabled )
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 7cc9526139..0a33e0dfd6 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -82,6 +82,8 @@ struct vcpu *idle_vcpu[NR_CPUS] __read_mostly;
 
 vcpu_info_t dummy_vcpu_info;
 
+bool_t vmtrace_supported;
+
 static void __domain_finalise_shutdown(struct domain *d)
 {
     struct vcpu *v;
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index f790d5c1f8..8d7955dd87 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -104,6 +104,7 @@
 #define cpu_has_clwb            boot_cpu_has(X86_FEATURE_CLWB)
 #define cpu_has_avx512er        boot_cpu_has(X86_FEATURE_AVX512ER)
 #define cpu_has_avx512cd        boot_cpu_has(X86_FEATURE_AVX512CD)
+#define cpu_has_ipt             boot_cpu_has(X86_FEATURE_IPT)
 #define cpu_has_sha             boot_cpu_has(X86_FEATURE_SHA)
 #define cpu_has_avx512bw        boot_cpu_has(X86_FEATURE_AVX512BW)
 #define cpu_has_avx512vl        boot_cpu_has(X86_FEATURE_AVX512VL)
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 906810592f..0e9a0b8de6 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -283,6 +283,7 @@ extern u32 vmx_secondary_exec_control;
 #define VMX_VPID_INVVPID_SINGLE_CONTEXT_RETAINING_GLOBAL 0x80000000000ULL
 extern u64 vmx_ept_vpid_cap;
 
+#define VMX_MISC_PT_SUPPORTED                   0x00004000
 #define VMX_MISC_CR3_TARGET                     0x01ff0000
 #define VMX_MISC_VMWRITE_ALL                    0x20000000
 
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index 5ca35d9d97..0d3f15f628 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -217,6 +217,7 @@ XEN_CPUFEATURE(SMAP,          5*32+20) /*S  Supervisor Mode Access Prevention */
 XEN_CPUFEATURE(AVX512_IFMA,   5*32+21) /*A  AVX-512 Integer Fused Multiply Add */
 XEN_CPUFEATURE(CLFLUSHOPT,    5*32+23) /*A  CLFLUSHOPT instruction */
 XEN_CPUFEATURE(CLWB,          5*32+24) /*A  CLWB instruction */
+XEN_CPUFEATURE(IPT,           5*32+25) /*   Intel Processor Trace */
 XEN_CPUFEATURE(AVX512PF,      5*32+26) /*A  AVX-512 Prefetch Instructions */
 XEN_CPUFEATURE(AVX512ER,      5*32+27) /*A  AVX-512 Exponent & Reciprocal Instrs */
 XEN_CPUFEATURE(AVX512CD,      5*32+28) /*A  AVX-512 Conflict Detection Instrs */
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index 7e51d361de..6c786a56c2 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -130,4 +130,6 @@ struct vnuma_info {
 
 void vnuma_destroy(struct vnuma_info *vnuma);
 
+extern bool_t vmtrace_supported;
+
 #endif /* __XEN_DOMAIN_H__ */
-- 
2.20.1




* [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter
  2020-06-30 12:33 [PATCH v4 00/10] Implement support for external IPT monitoring Michał Leszczyński
  2020-06-30 12:33 ` [PATCH v4 01/10] x86/vmx: add Intel PT MSR definitions Michał Leszczyński
  2020-06-30 12:33 ` [PATCH v4 02/10] x86/vmx: add IPT cpu feature Michał Leszczyński
@ 2020-06-30 12:33 ` Michał Leszczyński
  2020-07-01 10:05   ` Roger Pau Monné
                     ` (3 more replies)
  2020-06-30 12:33 ` [PATCH v4 04/10] x86/vmx: implement processor tracing for VMX Michał Leszczyński
                   ` (7 subsequent siblings)
  10 siblings, 4 replies; 75+ messages in thread
From: Michał Leszczyński @ 2020-06-30 12:33 UTC
  To: xen-devel
  Cc: Julien Grall, Stefano Stabellini, tamas.lengyel, Wei Liu,
	Andrew Cooper, Michal Leszczynski, Ian Jackson, George Dunlap,
	Jan Beulich, Anthony PERARD, luwei.kang

From: Michal Leszczynski <michal.leszczynski@cert.pl>

Allow specifying the size of the per-vCPU trace buffer upon
domain creation. It is zero by default (meaning: not enabled).

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 docs/man/xl.cfg.5.pod.in             | 10 ++++++++++
 tools/golang/xenlight/helpers.gen.go |  2 ++
 tools/golang/xenlight/types.gen.go   |  1 +
 tools/libxl/libxl.h                  |  8 ++++++++
 tools/libxl/libxl_create.c           |  1 +
 tools/libxl/libxl_types.idl          |  2 ++
 tools/xl/xl_parse.c                  | 20 ++++++++++++++++++++
 xen/common/domain.c                  | 12 ++++++++++++
 xen/include/public/domctl.h          |  1 +
 9 files changed, 57 insertions(+)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 0532739c1f..78f434b722 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -278,6 +278,16 @@ memory=8096 will report significantly less memory available for use
 than a system with maxmem=8096 memory=8096 due to the memory overhead
 of having to track the unused pages.
 
+=item B<vmtrace_pt_size=BYTES>
+
+Specifies the size of the processor trace buffer that will be allocated
+for each vCPU belonging to this domain. Disabled (i.e. B<vmtrace_pt_size=0>)
+by default. This must be set to a non-zero value in order to use
+processor tracing features with this domain.
+
+B<NOTE>: The size value must be between 4 kB and 4 GB, and it must
+also be a power of 2.
+
 =back
 
 =head3 Guest Virtual NUMA Configuration
diff --git a/tools/golang/xenlight/helpers.gen.go b/tools/golang/xenlight/helpers.gen.go
index 935d3bc50a..ecace9634e 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -1117,6 +1117,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 x.ArchArm.GicVersion = GicVersion(xc.arch_arm.gic_version)
 x.ArchArm.Vuart = VuartType(xc.arch_arm.vuart)
 x.Altp2M = Altp2MMode(xc.altp2m)
+x.VmtracePtOrder = int(xc.vmtrace_pt_order)
 
  return nil}
 
@@ -1592,6 +1593,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 xc.arch_arm.gic_version = C.libxl_gic_version(x.ArchArm.GicVersion)
 xc.arch_arm.vuart = C.libxl_vuart_type(x.ArchArm.Vuart)
 xc.altp2m = C.libxl_altp2m_mode(x.Altp2M)
+xc.vmtrace_pt_order = C.int(x.VmtracePtOrder)
 
  return nil
  }
diff --git a/tools/golang/xenlight/types.gen.go b/tools/golang/xenlight/types.gen.go
index 663c1e86b4..f9b07ac862 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -516,6 +516,7 @@ GicVersion GicVersion
 Vuart VuartType
 }
 Altp2M Altp2MMode
+VmtracePtOrder int
 }
 
 type domainBuildInfoTypeUnion interface {
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 71709dc585..891e8e28d6 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -438,6 +438,14 @@
  */
 #define LIBXL_HAVE_CREATEINFO_PASSTHROUGH 1
 
+/*
+ * LIBXL_HAVE_VMTRACE_PT_ORDER indicates that
+ * libxl_domain_build_info has a vmtrace_pt_order parameter, which
+ * allows enabling pre-allocation of processor tracing buffers
+ * of the given size order.
+ */
+#define LIBXL_HAVE_VMTRACE_PT_ORDER 1
+
 /*
  * libxl ABI compatibility
  *
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 75862dc6ed..651d1f4c0f 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -608,6 +608,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
             .max_evtchn_port = b_info->event_channels,
             .max_grant_frames = b_info->max_grant_frames,
             .max_maptrack_frames = b_info->max_maptrack_frames,
+            .vmtrace_pt_order = b_info->vmtrace_pt_order,
         };
 
         if (info->type != LIBXL_DOMAIN_TYPE_PV) {
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 9d3f05f399..1c5dd43e4d 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -645,6 +645,8 @@ libxl_domain_build_info = Struct("domain_build_info",[
     # supported by x86 HVM and ARM support is planned.
     ("altp2m", libxl_altp2m_mode),
 
+    ("vmtrace_pt_order", integer),
+
     ], dir=DIR_IN,
        copy_deprecated_fn="libxl__domain_build_info_copy_deprecated",
 )
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 61b4ef7b7e..4eba224590 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -1861,6 +1861,26 @@ void parse_config_data(const char *config_source,
         }
     }
 
+    if (!xlu_cfg_get_long(config, "vmtrace_pt_size", &l, 1) && l) {
+        int32_t shift = 0;
+
+        if (l & (l - 1))
+        {
+            fprintf(stderr, "ERROR: pt buffer size must be a power of 2\n");
+            exit(1);
+        }
+
+        while (l >>= 1) ++shift;
+
+        if (shift <= XEN_PAGE_SHIFT)
+        {
+            fprintf(stderr, "ERROR: too small pt buffer\n");
+            exit(1);
+        }
+
+        b_info->vmtrace_pt_order = shift - XEN_PAGE_SHIFT;
+    }
+
     if (!xlu_cfg_get_list(config, "ioports", &ioports, &num_ioports, 0)) {
         b_info->num_ioports = num_ioports;
         b_info->ioports = calloc(num_ioports, sizeof(*b_info->ioports));
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 0a33e0dfd6..27dcfbac8c 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -338,6 +338,12 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config)
         return -EINVAL;
     }
 
+    if ( config->vmtrace_pt_order && !vmtrace_supported )
+    {
+        dprintk(XENLOG_INFO, "Processor tracing is not supported\n");
+        return -EINVAL;
+    }
+
     return arch_sanitise_domain_config(config);
 }
 
@@ -443,6 +449,12 @@ struct domain *domain_create(domid_t domid,
         d->nr_pirqs = min(d->nr_pirqs, nr_irqs);
 
         radix_tree_init(&d->pirq_tree);
+
+        if ( config->vmtrace_pt_order )
+        {
+            uint32_t shift_val = config->vmtrace_pt_order + PAGE_SHIFT;
+            d->vmtrace_pt_size = (1ULL << shift_val);
+        }
     }
 
     if ( (err = arch_domain_create(d, config)) != 0 )
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 59bdc28c89..7b8289d436 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -92,6 +92,7 @@ struct xen_domctl_createdomain {
     uint32_t max_evtchn_port;
     int32_t max_grant_frames;
     int32_t max_maptrack_frames;
+    uint8_t vmtrace_pt_order;
 
     struct xen_arch_domainconfig arch;
 };
-- 
2.20.1




* [PATCH v4 04/10] x86/vmx: implement processor tracing for VMX
  2020-06-30 12:33 [PATCH v4 00/10] Implement support for external IPT monitoring Michał Leszczyński
                   ` (2 preceding siblings ...)
  2020-06-30 12:33 ` [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter Michał Leszczyński
@ 2020-06-30 12:33 ` Michał Leszczyński
  2020-07-01 10:30   ` Roger Pau Monné
  2020-06-30 12:33 ` [PATCH v4 05/10] common/domain: allocate vmtrace_pt_buffer Michał Leszczyński
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 75+ messages in thread
From: Michał Leszczyński @ 2020-06-30 12:33 UTC
  To: xen-devel
  Cc: Kevin Tian, tamas.lengyel, Jun Nakajima, Wei Liu, Andrew Cooper,
	Michal Leszczynski, Jan Beulich, luwei.kang, Roger Pau Monné

From: Michal Leszczynski <michal.leszczynski@cert.pl>

Use the Intel Processor Trace feature to implement the vmtrace
hooks (init, destroy, control, get offset) for VMX.

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 xen/arch/x86/hvm/vmx/vmx.c         | 89 ++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/hvm.h      | 38 +++++++++++++
 xen/include/asm-x86/hvm/vmx/vmcs.h |  3 +
 xen/include/asm-x86/hvm/vmx/vmx.h  | 14 +++++
 4 files changed, 144 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index ab19d9424e..db3f051b40 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -508,11 +508,24 @@ static void vmx_restore_host_msrs(void)
 
 static void vmx_save_guest_msrs(struct vcpu *v)
 {
+    uint64_t rtit_ctl;
+
     /*
      * We cannot cache SHADOW_GS_BASE while the VCPU runs, as it can
      * be updated at any time via SWAPGS, which we cannot trap.
      */
     v->arch.hvm.vmx.shadow_gs = rdgsshadow();
+
+    if ( unlikely(v->arch.hvm.vmx.pt_state &&
+                  v->arch.hvm.vmx.pt_state->active) )
+    {
+        rdmsrl(MSR_RTIT_CTL, rtit_ctl);
+        BUG_ON(rtit_ctl & RTIT_CTL_TRACEEN);
+
+        rdmsrl(MSR_RTIT_STATUS, v->arch.hvm.vmx.pt_state->status);
+        rdmsrl(MSR_RTIT_OUTPUT_MASK,
+               v->arch.hvm.vmx.pt_state->output_mask.raw);
+    }
 }
 
 static void vmx_restore_guest_msrs(struct vcpu *v)
@@ -524,6 +537,17 @@ static void vmx_restore_guest_msrs(struct vcpu *v)
 
     if ( cpu_has_msr_tsc_aux )
         wrmsr_tsc_aux(v->arch.msrs->tsc_aux);
+
+    if ( unlikely(v->arch.hvm.vmx.pt_state &&
+                  v->arch.hvm.vmx.pt_state->active) )
+    {
+        wrmsrl(MSR_RTIT_OUTPUT_BASE,
+               v->arch.hvm.vmx.pt_state->output_base);
+        wrmsrl(MSR_RTIT_OUTPUT_MASK,
+               v->arch.hvm.vmx.pt_state->output_mask.raw);
+        wrmsrl(MSR_RTIT_STATUS,
+               v->arch.hvm.vmx.pt_state->status);
+    }
 }
 
 void vmx_update_cpu_exec_control(struct vcpu *v)
@@ -2240,6 +2264,60 @@ static bool vmx_get_pending_event(struct vcpu *v, struct x86_event *info)
     return true;
 }
 
+static int vmx_init_pt(struct vcpu *v)
+{
+    v->arch.hvm.vmx.pt_state = xzalloc(struct pt_state);
+
+    if ( !v->arch.hvm.vmx.pt_state )
+        return -EFAULT;
+
+    if ( !v->arch.vmtrace.pt_buf )
+        return -EINVAL;
+
+    if ( !v->domain->vmtrace_pt_size )
+        return -EINVAL;
+
+    v->arch.hvm.vmx.pt_state->output_base = page_to_maddr(v->arch.vmtrace.pt_buf);
+    v->arch.hvm.vmx.pt_state->output_mask.raw = v->domain->vmtrace_pt_size - 1;
+
+    if ( vmx_add_host_load_msr(v, MSR_RTIT_CTL, 0) )
+        return -EFAULT;
+
+    if ( vmx_add_guest_msr(v, MSR_RTIT_CTL,
+                              RTIT_CTL_TRACEEN | RTIT_CTL_OS |
+                              RTIT_CTL_USR | RTIT_CTL_BRANCH_EN) )
+        return -EFAULT;
+
+    return 0;
+}
+
+static int vmx_destroy_pt(struct vcpu *v)
+{
+    if ( v->arch.hvm.vmx.pt_state )
+        xfree(v->arch.hvm.vmx.pt_state);
+
+    v->arch.hvm.vmx.pt_state = NULL;
+    return 0;
+}
+
+static int vmx_control_pt(struct vcpu *v, bool_t enable)
+{
+    if ( !v->arch.hvm.vmx.pt_state )
+        return -EINVAL;
+
+    v->arch.hvm.vmx.pt_state->active = enable;
+    return 0;
+}
+
+static int vmx_get_pt_offset(struct vcpu *v, uint64_t *offset)
+{
+    if ( !v->arch.hvm.vmx.pt_state )
+        return -EINVAL;
+
+    *offset = v->arch.hvm.vmx.pt_state->output_mask.offset;
+    return 0;
+}
+
 static struct hvm_function_table __initdata vmx_function_table = {
     .name                 = "VMX",
     .cpu_up_prepare       = vmx_cpu_up_prepare,
@@ -2295,6 +2373,10 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .altp2m_vcpu_update_vmfunc_ve = vmx_vcpu_update_vmfunc_ve,
     .altp2m_vcpu_emulate_ve = vmx_vcpu_emulate_ve,
     .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
+    .vmtrace_init_pt = vmx_init_pt,
+    .vmtrace_destroy_pt = vmx_destroy_pt,
+    .vmtrace_control_pt = vmx_control_pt,
+    .vmtrace_get_pt_offset = vmx_get_pt_offset,
     .tsc_scaling = {
         .max_ratio = VMX_TSC_MULTIPLIER_MAX,
     },
@@ -3674,6 +3756,13 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 
     hvm_invalidate_regs_fields(regs);
 
+    if ( unlikely(v->arch.hvm.vmx.pt_state &&
+                  v->arch.hvm.vmx.pt_state->active) )
+    {
+        rdmsrl(MSR_RTIT_OUTPUT_MASK,
+               v->arch.hvm.vmx.pt_state->output_mask.raw);
+    }
+
     if ( paging_mode_hap(v->domain) )
     {
         /*
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 1eb377dd82..8f194889e5 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -214,6 +214,12 @@ struct hvm_function_table {
     bool_t (*altp2m_vcpu_emulate_ve)(struct vcpu *v);
     int (*altp2m_vcpu_emulate_vmfunc)(const struct cpu_user_regs *regs);
 
+    /* vmtrace */
+    int (*vmtrace_init_pt)(struct vcpu *v);
+    int (*vmtrace_destroy_pt)(struct vcpu *v);
+    int (*vmtrace_control_pt)(struct vcpu *v, bool_t enable);
+    int (*vmtrace_get_pt_offset)(struct vcpu *v, uint64_t *offset);
+
     /*
      * Parameters and callbacks for hardware-assisted TSC scaling,
      * which are valid only when the hardware feature is available.
@@ -655,6 +661,38 @@ static inline bool altp2m_vcpu_emulate_ve(struct vcpu *v)
     return false;
 }
 
+static inline int vmtrace_init_pt(struct vcpu *v)
+{
+    if ( hvm_funcs.vmtrace_init_pt )
+        return hvm_funcs.vmtrace_init_pt(v);
+
+    return -EOPNOTSUPP;
+}
+
+static inline int vmtrace_destroy_pt(struct vcpu *v)
+{
+    if ( hvm_funcs.vmtrace_destroy_pt )
+        return hvm_funcs.vmtrace_destroy_pt(v);
+
+    return -EOPNOTSUPP;
+}
+
+static inline int vmtrace_control_pt(struct vcpu *v, bool_t enable)
+{
+    if ( hvm_funcs.vmtrace_control_pt )
+        return hvm_funcs.vmtrace_control_pt(v, enable);
+
+    return -EOPNOTSUPP;
+}
+
+static inline int vmtrace_get_pt_offset(struct vcpu *v, uint64_t *offset)
+{
+    if ( hvm_funcs.vmtrace_get_pt_offset )
+        return hvm_funcs.vmtrace_get_pt_offset(v, offset);
+
+    return -EOPNOTSUPP;
+}
+
 /*
  * This must be defined as a macro instead of an inline function,
  * because it uses 'struct vcpu' and 'struct domain' which have
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 0e9a0b8de6..64c0d82614 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -186,6 +186,9 @@ struct vmx_vcpu {
      * pCPU and wakeup the related vCPU.
      */
     struct pi_blocking_vcpu pi_blocking;
+
+    /* State of processor trace feature */
+    struct pt_state      *pt_state;
 };
 
 int vmx_create_vmcs(struct vcpu *v);
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 111ccd7e61..be7213d3c0 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -689,4 +689,18 @@ typedef union ldt_or_tr_instr_info {
     };
 } ldt_or_tr_instr_info_t;
 
+/* Processor Trace state per vCPU */
+struct pt_state {
+    bool_t active;
+    uint64_t status;
+    uint64_t output_base;
+    union {
+        uint64_t raw;
+        struct {
+            uint32_t size;
+            uint32_t offset;
+        };
+    } output_mask;
+};
+
 #endif /* __ASM_X86_HVM_VMX_VMX_H__ */
-- 
2.20.1




* [PATCH v4 05/10] common/domain: allocate vmtrace_pt_buffer
  2020-06-30 12:33 [PATCH v4 00/10] Implement support for external IPT monitoring Michał Leszczyński
                   ` (3 preceding siblings ...)
  2020-06-30 12:33 ` [PATCH v4 04/10] x86/vmx: implement processor tracing for VMX Michał Leszczyński
@ 2020-06-30 12:33 ` Michał Leszczyński
  2020-07-01 10:38   ` Roger Pau Monné
  2020-07-01 15:35   ` Julien Grall
  2020-06-30 12:33 ` [PATCH v4 06/10] memory: batch processing in acquire_resource() Michał Leszczyński
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 75+ messages in thread
From: Michał Leszczyński @ 2020-06-30 12:33 UTC
  To: xen-devel
  Cc: Julien Grall, Stefano Stabellini, tamas.lengyel, Wei Liu,
	Andrew Cooper, Michal Leszczynski, Ian Jackson, George Dunlap,
	Jan Beulich, luwei.kang, Roger Pau Monné

From: Michal Leszczynski <michal.leszczynski@cert.pl>

Allocate a processor trace buffer for each vCPU when the domain
is created, and free the trace buffers on domain destruction.

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 xen/arch/x86/domain.c        | 11 +++++++++++
 xen/common/domain.c          | 32 ++++++++++++++++++++++++++++++++
 xen/include/asm-x86/domain.h |  4 ++++
 xen/include/xen/sched.h      |  4 ++++
 4 files changed, 51 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index fee6c3931a..0d79fd390c 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2199,6 +2199,17 @@ int domain_relinquish_resources(struct domain *d)
                 altp2m_vcpu_disable_ve(v);
         }
 
+        for_each_vcpu ( d, v )
+        {
+            if ( !v->arch.vmtrace.pt_buf )
+                continue;
+
+            vmtrace_destroy_pt(v);
+
+            free_domheap_pages(v->arch.vmtrace.pt_buf,
+                get_order_from_bytes(v->domain->vmtrace_pt_size));
+        }
+
         if ( is_pv_domain(d) )
         {
             for_each_vcpu ( d, v )
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 27dcfbac8c..8513659ef8 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -137,6 +137,31 @@ static void vcpu_destroy(struct vcpu *v)
     free_vcpu_struct(v);
 }
 
+static int vmtrace_alloc_buffers(struct vcpu *v)
+{
+    struct page_info *pg;
+    uint64_t size = v->domain->vmtrace_pt_size;
+
+    if ( size < PAGE_SIZE || size > GB(4) || (size & (size - 1)) )
+    {
+        /*
+         * We don't accept a trace buffer smaller than a single page, and
+         * the specification defines the upper bound as 4GB. The buffer
+         * size must also be a power of 2.
+         */
+        return -EINVAL;
+    }
+
+    pg = alloc_domheap_pages(v->domain, get_order_from_bytes(size),
+                             MEMF_no_refcount);
+
+    if ( !pg )
+        return -ENOMEM;
+
+    v->arch.vmtrace.pt_buf = pg;
+    return 0;
+}
+
 struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id)
 {
     struct vcpu *v;
@@ -162,6 +187,9 @@ struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id)
     v->vcpu_id = vcpu_id;
     v->dirty_cpu = VCPU_CPU_CLEAN;
 
+    if ( d->vmtrace_pt_size && vmtrace_alloc_buffers(v) != 0 )
+        return NULL;
+
     spin_lock_init(&v->virq_lock);
 
     tasklet_init(&v->continue_hypercall_tasklet, NULL, NULL);
@@ -188,6 +216,9 @@ struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id)
     if ( arch_vcpu_create(v) != 0 )
         goto fail_sched;
 
+    if ( d->vmtrace_pt_size && vmtrace_init_pt(v) != 0 )
+        goto fail_sched;
+
     d->vcpu[vcpu_id] = v;
     if ( vcpu_id != 0 )
     {
@@ -422,6 +453,7 @@ struct domain *domain_create(domid_t domid,
     d->shutdown_code = SHUTDOWN_CODE_INVALID;
 
     spin_lock_init(&d->pbuf_lock);
+    spin_lock_init(&d->vmtrace_lock);
 
     rwlock_init(&d->vnuma_rwlock);
 
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 6fd94c2e14..b01c107f5c 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -627,6 +627,10 @@ struct arch_vcpu
     struct {
         bool next_interrupt_enabled;
     } monitor;
+
+    struct {
+        struct page_info *pt_buf;
+    } vmtrace;
 };
 
 struct guest_memory_policy
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index ac53519d7f..48f0a61bbd 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -457,6 +457,10 @@ struct domain
     unsigned    pbuf_idx;
     spinlock_t  pbuf_lock;
 
+    /* Used by vmtrace features */
+    spinlock_t  vmtrace_lock;
+    uint64_t    vmtrace_pt_size;
+
     /* OProfile support. */
     struct xenoprof *xenoprof;
 
-- 
2.20.1
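
The size rule enforced by vmtrace_alloc_buffers() can be checked outside
the hypervisor. Below is a minimal standalone sketch, assuming 4 KiB pages;
pt_size_valid(), pt_order_from_bytes() and the SKETCH_* macros are
hypothetical names, and pt_order_from_bytes() is only a simplified stand-in
for Xen's get_order_from_bytes().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. These names are local to the sketch. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)
#define SKETCH_GB(n)      ((uint64_t)(n) << 30)

/*
 * Same acceptance rule as vmtrace_alloc_buffers(): at least one page,
 * at most 4 GiB, and a power of two.
 */
static bool pt_size_valid(uint64_t size)
{
    return size >= SKETCH_PAGE_SIZE && size <= SKETCH_GB(4) &&
           (size & (size - 1)) == 0;
}

/*
 * Smallest order such that (PAGE_SIZE << order) >= size. Allocation and
 * freeing must both derive the order from the same size so they agree on
 * the number of pages.
 */
static unsigned int pt_order_from_bytes(uint64_t size)
{
    unsigned int order = 0;

    while ( (SKETCH_PAGE_SIZE << order) < size )
        order++;

    return order;
}
```

Since every accepted size is a power of two of at least one page, the
resulting order covers the buffer exactly, with no rounding slack.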




* [PATCH v4 06/10] memory: batch processing in acquire_resource()
  2020-06-30 12:33 [PATCH v4 00/10] Implement support for external IPT monitoring Michał Leszczyński
                   ` (4 preceding siblings ...)
  2020-06-30 12:33 ` [PATCH v4 05/10] common/domain: allocate vmtrace_pt_buffer Michał Leszczyński
@ 2020-06-30 12:33 ` Michał Leszczyński
  2020-07-01 10:46   ` Roger Pau Monné
  2020-07-03 10:35   ` Julien Grall
  2020-06-30 12:33 ` [PATCH v4 07/10] x86/mm: add vmtrace_buf resource type Michał Leszczyński
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 75+ messages in thread
From: Michał Leszczyński @ 2020-06-30 12:33 UTC
  To: xen-devel
  Cc: Julien Grall, Stefano Stabellini, tamas.lengyel, Wei Liu,
	Andrew Cooper, Michal Leszczynski, Ian Jackson, George Dunlap,
	Jan Beulich, luwei.kang

From: Michal Leszczynski <michal.leszczynski@cert.pl>

Allow large resources to be acquired by letting acquire_resource()
process frames in batches, using hypercall continuations.

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 xen/common/memory.c | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 714077c1e5..3ab06581a2 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1046,10 +1046,12 @@ static int acquire_grant_table(struct domain *d, unsigned int id,
 }
 
 static int acquire_resource(
-    XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg)
+    XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg,
+    unsigned long *start_extent)
 {
     struct domain *d, *currd = current->domain;
     xen_mem_acquire_resource_t xmar;
+    uint32_t total_frames;
     /*
      * The mfn_list and gfn_list (below) arrays are ok on stack for the
      * moment since they are small, but if they need to grow in future
@@ -1077,8 +1079,17 @@ static int acquire_resource(
         return 0;
     }
 
+    total_frames = xmar.nr_frames;
+
+    if ( *start_extent )
+    {
+        xmar.frame += *start_extent;
+        xmar.nr_frames -= *start_extent;
+        guest_handle_add_offset(xmar.frame_list, *start_extent);
+    }
+
     if ( xmar.nr_frames > ARRAY_SIZE(mfn_list) )
-        return -E2BIG;
+        xmar.nr_frames = ARRAY_SIZE(mfn_list);
 
     rc = rcu_lock_remote_domain_by_id(xmar.domid, &d);
     if ( rc )
@@ -1135,6 +1146,14 @@ static int acquire_resource(
         }
     }
 
+    if ( !rc )
+    {
+        *start_extent += xmar.nr_frames;
+
+        if ( *start_extent != total_frames )
+            rc = -ERESTART;
+    }
+
  out:
     rcu_unlock_domain(d);
 
@@ -1600,7 +1619,14 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     case XENMEM_acquire_resource:
         rc = acquire_resource(
-            guest_handle_cast(arg, xen_mem_acquire_resource_t));
+            guest_handle_cast(arg, xen_mem_acquire_resource_t),
+            &start_extent);
+
+        if ( rc == -ERESTART )
+            return hypercall_create_continuation(
+                __HYPERVISOR_memory_op, "lh",
+                op | (start_extent << MEMOP_EXTENT_SHIFT), arg);
+
         break;
 
     default:
-- 
2.20.1
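
The continuation scheme can be modelled in isolation. The sketch below
assumes a batch size of 32 frames (standing in for ARRAY_SIZE(mfn_list))
and Xen's MEMOP_EXTENT_SHIFT of 6, and shows how the resume point travels
in the upper bits of cmd between passes. encode_cmd(), acquire_batch(),
run_acquire() and the ACQ_* codes are hypothetical names. Unlike the patch,
acquire_batch() rejects a resume point beyond the total frame count, since
acquire_resource() re-reads its argument from guest memory on every pass.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch constants: the shift matches MEMOP_EXTENT_SHIFT, the batch is assumed. */
#define SKETCH_EXTENT_SHIFT 6
#define SKETCH_CMD_MASK     ((1UL << SKETCH_EXTENT_SHIFT) - 1)
#define SKETCH_BATCH        32UL

enum acq_rc { ACQ_INVALID = -1, ACQ_DONE = 0, ACQ_RESTART = 1 };

/* Pack the sub-op and the resume point into one cmd word. */
static unsigned long encode_cmd(unsigned long op, unsigned long start_extent)
{
    return op | (start_extent << SKETCH_EXTENT_SHIFT);
}

/*
 * One pass of the batched acquire: handle at most SKETCH_BATCH frames
 * starting at *start_extent, advance it, and request a restart while
 * frames remain.
 */
static enum acq_rc acquire_batch(unsigned long total_frames,
                                 unsigned long *start_extent)
{
    unsigned long nr;

    /* Reject a resume point past the end instead of underflowing. */
    if ( *start_extent > total_frames )
        return ACQ_INVALID;

    nr = total_frames - *start_extent;
    if ( nr > SKETCH_BATCH )
        nr = SKETCH_BATCH;

    /* The real code would fill and copy out nr entries of mfn_list here. */
    *start_extent += nr;

    return *start_extent != total_frames ? ACQ_RESTART : ACQ_DONE;
}

/* Re-issue the "hypercall" until it completes; returns the number of passes. */
static unsigned int run_acquire(unsigned long op, unsigned long total_frames)
{
    unsigned long cmd = encode_cmd(op, 0);
    unsigned int passes = 0;
    enum acq_rc rc;

    do {
        unsigned long start = cmd >> SKETCH_EXTENT_SHIFT;

        rc = acquire_batch(total_frames, &start);
        passes++;
        cmd = encode_cmd(cmd & SKETCH_CMD_MASK, start);
    } while ( rc == ACQ_RESTART );

    return passes;
}
```

For example, a 100-frame request completes in four passes (32 + 32 + 32 + 4).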




* [PATCH v4 07/10] x86/mm: add vmtrace_buf resource type
  2020-06-30 12:33 [PATCH v4 00/10] Implement support for external IPT monitoring Michał Leszczyński
                   ` (5 preceding siblings ...)
  2020-06-30 12:33 ` [PATCH v4 06/10] memory: batch processing in acquire_resource() Michał Leszczyński
@ 2020-06-30 12:33 ` Michał Leszczyński
  2020-07-01 10:52   ` Roger Pau Monné
  2020-06-30 12:33 ` [PATCH v4 08/10] x86/domctl: add XEN_DOMCTL_vmtrace_op Michał Leszczyński
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 75+ messages in thread
From: Michał Leszczyński @ 2020-06-30 12:33 UTC
  To: xen-devel
  Cc: Julien Grall, Stefano Stabellini, tamas.lengyel, Wei Liu,
	Andrew Cooper, Michal Leszczynski, Ian Jackson, George Dunlap,
	Jan Beulich, luwei.kang, Roger Pau Monné

From: Michal Leszczynski <michal.leszczynski@cert.pl>

Allow the processor trace buffer to be mapped using
acquire_resource().

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 xen/arch/x86/mm.c           | 25 +++++++++++++++++++++++++
 xen/include/public/memory.h |  1 +
 2 files changed, 26 insertions(+)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e376fc7e8f..bb781bd90c 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4624,6 +4624,31 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
         }
         break;
     }
+
+    case XENMEM_resource_vmtrace_buf:
+    {
+        mfn_t mfn;
+        unsigned int i;
+        struct vcpu *v = domain_vcpu(d, id);
+        rc = -EINVAL;
+
+        if ( !v )
+            break;
+
+        if ( !v->arch.vmtrace.pt_buf )
+            break;
+
+        mfn = page_to_mfn(v->arch.vmtrace.pt_buf);
+
+        if ( frame + nr_frames > (v->domain->vmtrace_pt_size >> PAGE_SHIFT) )
+            break;
+
+        rc = 0;
+        for ( i = 0; i < nr_frames; i++ )
+            mfn_list[i] = mfn_x(mfn_add(mfn, frame + i));
+
+        break;
+    }
 #endif
 
     default:
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index dbd35305df..f823c784c3 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -620,6 +620,7 @@ struct xen_mem_acquire_resource {
 
 #define XENMEM_resource_ioreq_server 0
 #define XENMEM_resource_grant_table 1
+#define XENMEM_resource_vmtrace_buf 2
 
     /*
      * IN - a type-specific resource identifier, which must be zero
-- 
2.20.1
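
Because the trace buffer is a single contiguous domheap allocation, the
resource frames map linearly onto MFNs. A standalone sketch of that
translation, assuming 4 KiB pages; map_vmtrace_frames() is a hypothetical
helper, and its bounds check is written so that frame + nr_frames cannot
overflow, which is slightly stricter than the check in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/*
 * Model of the XENMEM_resource_vmtrace_buf case: frame i of the resource
 * is base_mfn + i. Returns 0 on success or -EINVAL if the requested range
 * does not fit inside the buffer.
 */
static int map_vmtrace_frames(uint64_t base_mfn, uint64_t buf_size,
                              uint64_t frame, unsigned int nr_frames,
                              uint64_t *mfn_list)
{
    uint64_t buf_frames = buf_size >> SKETCH_PAGE_SHIFT;
    unsigned int i;

    /* Overflow-safe form of "frame + nr_frames > buf_frames". */
    if ( frame > buf_frames || nr_frames > buf_frames - frame )
        return -EINVAL;

    for ( i = 0; i < nr_frames; i++ )
        mfn_list[i] = base_mfn + frame + i;

    return 0;
}
```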




* [PATCH v4 08/10] x86/domctl: add XEN_DOMCTL_vmtrace_op
  2020-06-30 12:33 [PATCH v4 00/10] Implement support for external IPT monitoring Michał Leszczyński
                   ` (6 preceding siblings ...)
  2020-06-30 12:33 ` [PATCH v4 07/10] x86/mm: add vmtrace_buf resource type Michał Leszczyński
@ 2020-06-30 12:33 ` Michał Leszczyński
  2020-07-01 11:00   ` Roger Pau Monné
  2020-06-30 12:33 ` [PATCH v4 09/10] tools/libxc: add xc_vmtrace_* functions Michał Leszczyński
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 75+ messages in thread
From: Michał Leszczyński @ 2020-06-30 12:33 UTC
  To: xen-devel
  Cc: Julien Grall, Stefano Stabellini, tamas.lengyel, Wei Liu,
	Andrew Cooper, Michal Leszczynski, Ian Jackson, George Dunlap,
	Jan Beulich, luwei.kang, Roger Pau Monné

From: Michal Leszczynski <michal.leszczynski@cert.pl>

Implement a domctl to manage the runtime state of
the processor trace feature.

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 xen/arch/x86/domctl.c       | 48 +++++++++++++++++++++++++++++++++++++
 xen/include/public/domctl.h | 26 ++++++++++++++++++++
 2 files changed, 74 insertions(+)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 6f2c69788d..a041b724d8 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -322,6 +322,48 @@ void arch_get_domain_info(const struct domain *d,
     info->arch_config.emulation_flags = d->arch.emulation_flags;
 }
 
+static int do_vmtrace_op(struct domain *d, struct xen_domctl_vmtrace_op *op,
+                         XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
+{
+    int rc;
+    struct vcpu *v;
+
+    if ( !vmtrace_supported )
+        return -EOPNOTSUPP;
+
+    if ( !is_hvm_domain(d) )
+        return -EOPNOTSUPP;
+
+    v = domain_vcpu(d, op->vcpu);
+    if ( !v )
+        return -EINVAL;
+
+    rc = 0;
+
+    switch ( op->cmd )
+    {
+    case XEN_DOMCTL_vmtrace_pt_enable:
+    case XEN_DOMCTL_vmtrace_pt_disable:
+        vcpu_pause(v);
+        spin_lock(&d->vmtrace_lock);
+
+        rc = vmtrace_control_pt(v, op->cmd == XEN_DOMCTL_vmtrace_pt_enable);
+
+        spin_unlock(&d->vmtrace_lock);
+        vcpu_unpause(v);
+        break;
+
+    case XEN_DOMCTL_vmtrace_pt_get_offset:
+        rc = vmtrace_get_pt_offset(v, &op->offset);
+        break;
+
+    default:
+        rc = -EOPNOTSUPP;
+    }
+
+    return rc;
+}
+
 #define MAX_IOPORTS 0x10000
 
 long arch_do_domctl(
@@ -337,6 +379,12 @@ long arch_do_domctl(
     switch ( domctl->cmd )
     {
 
+    case XEN_DOMCTL_vmtrace_op:
+        ret = do_vmtrace_op(d, &domctl->u.vmtrace_op, u_domctl);
+        if ( !ret )
+            copyback = true;
+        break;
+
     case XEN_DOMCTL_shadow_op:
         ret = paging_domctl(d, &domctl->u.shadow_op, u_domctl, 0);
         if ( ret == -ERESTART )
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 7b8289d436..f836cb5970 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1136,6 +1136,28 @@ struct xen_domctl_vuart_op {
                                  */
 };
 
+/* XEN_DOMCTL_vmtrace_op: Perform VM tracing related operation */
+#if defined(__XEN__) || defined(__XEN_TOOLS__)
+
+struct xen_domctl_vmtrace_op {
+    /* IN variable */
+    uint32_t cmd;
+/* Enable/disable external vmtrace for a given vCPU, or query its offset */
+#define XEN_DOMCTL_vmtrace_pt_enable      1
+#define XEN_DOMCTL_vmtrace_pt_disable     2
+#define XEN_DOMCTL_vmtrace_pt_get_offset  3
+    domid_t domain;
+    uint32_t vcpu;
+    uint64_aligned_t size;
+
+    /* OUT variable */
+    uint64_aligned_t offset;
+};
+typedef struct xen_domctl_vmtrace_op xen_domctl_vmtrace_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_vmtrace_op_t);
+
+#endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
+
 struct xen_domctl {
     uint32_t cmd;
 #define XEN_DOMCTL_createdomain                   1
@@ -1217,6 +1239,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_vuart_op                      81
 #define XEN_DOMCTL_get_cpu_policy                82
 #define XEN_DOMCTL_set_cpu_policy                83
+#define XEN_DOMCTL_vmtrace_op                    84
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -1277,6 +1300,9 @@ struct xen_domctl {
         struct xen_domctl_monitor_op        monitor_op;
         struct xen_domctl_psr_alloc         psr_alloc;
         struct xen_domctl_vuart_op          vuart_op;
+#if defined(__XEN__) || defined(__XEN_TOOLS__)
+        struct xen_domctl_vmtrace_op        vmtrace_op;
+#endif
         uint8_t                             pad[128];
     } u;
 };
-- 
2.20.1




* [PATCH v4 09/10] tools/libxc: add xc_vmtrace_* functions
  2020-06-30 12:33 [PATCH v4 00/10] Implement support for external IPT monitoring Michał Leszczyński
                   ` (7 preceding siblings ...)
  2020-06-30 12:33 ` [PATCH v4 08/10] x86/domctl: add XEN_DOMCTL_vmtrace_op Michał Leszczyński
@ 2020-06-30 12:33 ` Michał Leszczyński
  2020-07-21 10:52   ` Wei Liu
  2020-06-30 12:33 ` [PATCH v4 10/10] tools/proctrace: add proctrace tool Michał Leszczyński
  2020-06-30 12:48 ` [PATCH v4 00/10] Implement support for external IPT monitoring Hubert Jasudowicz
  10 siblings, 1 reply; 75+ messages in thread
From: Michał Leszczyński @ 2020-06-30 12:33 UTC
  To: xen-devel
  Cc: tamas.lengyel, Michal Leszczynski, luwei.kang, Ian Jackson, Wei Liu

From: Michal Leszczynski <michal.leszczynski@cert.pl>

Add libxc functions that use the new XEN_DOMCTL_vmtrace_op interface.

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 tools/libxc/Makefile          |  1 +
 tools/libxc/include/xenctrl.h | 39 +++++++++++++++++++
 tools/libxc/xc_vmtrace.c      | 73 +++++++++++++++++++++++++++++++++++
 3 files changed, 113 insertions(+)
 create mode 100644 tools/libxc/xc_vmtrace.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index fae5969a73..605e44501d 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -27,6 +27,7 @@ CTRL_SRCS-y       += xc_csched2.c
 CTRL_SRCS-y       += xc_arinc653.c
 CTRL_SRCS-y       += xc_rt.c
 CTRL_SRCS-y       += xc_tbuf.c
+CTRL_SRCS-y       += xc_vmtrace.c
 CTRL_SRCS-y       += xc_pm.c
 CTRL_SRCS-y       += xc_cpu_hotplug.c
 CTRL_SRCS-y       += xc_resume.c
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 113ddd935d..66966f6c17 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1585,6 +1585,45 @@ int xc_tbuf_set_cpu_mask(xc_interface *xch, xc_cpumap_t mask);
 
 int xc_tbuf_set_evt_mask(xc_interface *xch, uint32_t mask);
 
+/**
+ * Enable processor trace for the given vCPU in the given DomU.
+ * The trace buffer itself is allocated at domain creation time.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_pt_enable(xc_interface *xch, uint32_t domid,
+                         uint32_t vcpu);
+
+/**
+ * Disable processor trace for the given vCPU in the given DomU.
+ * The trace buffer is kept until the domain is destroyed.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_pt_disable(xc_interface *xch, uint32_t domid,
+                          uint32_t vcpu);
+
+/**
+ * Get the current write offset inside the trace ring buffer.
+ * This allows determining how much data has been written into the buffer.
+ * Once the buffer wraps around, the offset resets to 0 and older
+ * data is overwritten.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @parm offset where the current offset inside the trace buffer is stored
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_pt_get_offset(xc_interface *xch, uint32_t domid,
+                             uint32_t vcpu, uint64_t *offset);
+
 int xc_domctl(xc_interface *xch, struct xen_domctl *domctl);
 int xc_sysctl(xc_interface *xch, struct xen_sysctl *sysctl);
 
diff --git a/tools/libxc/xc_vmtrace.c b/tools/libxc/xc_vmtrace.c
new file mode 100644
index 0000000000..32f90a6203
--- /dev/null
+++ b/tools/libxc/xc_vmtrace.c
@@ -0,0 +1,73 @@
+/******************************************************************************
+ * xc_vmtrace.c
+ *
+ * API for manipulating hardware tracing features
+ *
+ * Copyright (c) 2020, Michal Leszczynski
+ *
+ * Copyright 2020 CERT Polska. All rights reserved.
+ * Use is subject to license terms.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "xc_private.h"
+#include <xen/trace.h>
+
+int xc_vmtrace_pt_enable(
+        xc_interface *xch, uint32_t domid, uint32_t vcpu)
+{
+    DECLARE_DOMCTL;
+    int rc;
+
+    domctl.cmd = XEN_DOMCTL_vmtrace_op;
+    domctl.domain = domid;
+    domctl.u.vmtrace_op.cmd = XEN_DOMCTL_vmtrace_pt_enable;
+    domctl.u.vmtrace_op.vcpu = vcpu;
+
+    rc = do_domctl(xch, &domctl);
+    return rc;
+}
+
+int xc_vmtrace_pt_get_offset(
+        xc_interface *xch, uint32_t domid, uint32_t vcpu, uint64_t *offset)
+{
+    DECLARE_DOMCTL;
+    int rc;
+
+    domctl.cmd = XEN_DOMCTL_vmtrace_op;
+    domctl.domain = domid;
+    domctl.u.vmtrace_op.cmd = XEN_DOMCTL_vmtrace_pt_get_offset;
+    domctl.u.vmtrace_op.vcpu = vcpu;
+
+    rc = do_domctl(xch, &domctl);
+    if ( !rc )
+        *offset = domctl.u.vmtrace_op.offset;
+    return rc;
+}
+
+int xc_vmtrace_pt_disable(xc_interface *xch, uint32_t domid, uint32_t vcpu)
+{
+    DECLARE_DOMCTL;
+    int rc;
+
+    domctl.cmd = XEN_DOMCTL_vmtrace_op;
+    domctl.domain = domid;
+    domctl.u.vmtrace_op.cmd = XEN_DOMCTL_vmtrace_pt_disable;
+    domctl.u.vmtrace_op.vcpu = vcpu;
+
+    rc = do_domctl(xch, &domctl);
+    return rc;
+}
+
-- 
2.20.1
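
A consumer of xc_vmtrace_pt_get_offset() typically polls the offset and
copies out whatever was written since the previous poll. The sketch below
shows that bookkeeping on an already-mapped buffer; pt_copy_new_data() is a
hypothetical helper, and it assumes the producer wrapped at most once
between two polls, since a second wrap cannot be detected from the offset
alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy the bytes written since the previous poll out of a mapped trace
 * ring buffer. last_off is the offset seen on the previous poll and
 * cur_off the value just returned by the offset query. Returns the number
 * of bytes copied into out, which must hold at least buf_size bytes.
 */
static size_t pt_copy_new_data(const uint8_t *buf, uint64_t buf_size,
                               uint64_t last_off, uint64_t cur_off,
                               uint8_t *out)
{
    if ( cur_off >= last_off )
    {
        /* No wrap: the new data is contiguous. */
        memcpy(out, buf + last_off, cur_off - last_off);
        return cur_off - last_off;
    }

    /* Wrapped once: tail of the buffer first, then the start up to cur_off. */
    memcpy(out, buf + last_off, buf_size - last_off);
    memcpy(out + (buf_size - last_off), buf, cur_off);
    return (buf_size - last_off) + cur_off;
}
```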




* [PATCH v4 10/10] tools/proctrace: add proctrace tool
  2020-06-30 12:33 [PATCH v4 00/10] Implement support for external IPT monitoring Michał Leszczyński
                   ` (8 preceding siblings ...)
  2020-06-30 12:33 ` [PATCH v4 09/10] tools/libxc: add xc_vmtrace_* functions Michał Leszczyński
@ 2020-06-30 12:33 ` Michał Leszczyński
  2020-07-02 15:10   ` Andrew Cooper
  2020-06-30 12:48 ` [PATCH v4 00/10] Implement support for external IPT monitoring Hubert Jasudowicz
  10 siblings, 1 reply; 75+ messages in thread
From: Michał Leszczyński @ 2020-06-30 12:33 UTC
  To: xen-devel
  Cc: tamas.lengyel, Michal Leszczynski, luwei.kang, Ian Jackson, Wei Liu

From: Michal Leszczynski <michal.leszczynski@cert.pl>

Add a demonstration tool that uses the xc_vmtrace_* calls to manage
external IPT monitoring of a DomU.

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 tools/proctrace/COPYING     | 339 ++++++++++++++++++++++++++++++++++++
 tools/proctrace/Makefile    |  48 +++++
 tools/proctrace/proctrace.c | 163 +++++++++++++++++
 3 files changed, 550 insertions(+)
 create mode 100644 tools/proctrace/COPYING
 create mode 100644 tools/proctrace/Makefile
 create mode 100644 tools/proctrace/proctrace.c

diff --git a/tools/proctrace/COPYING b/tools/proctrace/COPYING
new file mode 100644
index 0000000000..c0a841112c
--- /dev/null
+++ b/tools/proctrace/COPYING
@@ -0,0 +1,339 @@
+		    GNU GENERAL PUBLIC LICENSE
+		       Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.
+                       59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+			    Preamble
+
+  The licenses for most software are designed to take away your
+freedom to share and change it.  By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users.  This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it.  (Some other Free Software Foundation software is covered by
+the GNU Library General Public License instead.)  You can apply it to
+your programs, too.
+
+  When we speak of free software, we are referring to freedom, not
+price.  Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+  To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+  For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have.  You must make sure that they, too, receive or can get the
+source code.  And you must show them these terms so they know their
+rights.
+
+  We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+  Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software.  If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+  Finally, any free program is threatened constantly by software
+patents.  We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary.  To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+  The precise terms and conditions for copying, distribution and
+modification follow.
+
+		    GNU GENERAL PUBLIC LICENSE
+   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+  0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License.  The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language.  (Hereinafter, translation is included without limitation in
+the term "modification".)  Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope.  The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+  1. You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+  2. You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+    a) You must cause the modified files to carry prominent notices
+    stating that you changed the files and the date of any change.
+
+    b) You must cause any work that you distribute or publish, that in
+    whole or in part contains or is derived from the Program or any
+    part thereof, to be licensed as a whole at no charge to all third
+    parties under the terms of this License.
+
+    c) If the modified program normally reads commands interactively
+    when run, you must cause it, when started running for such
+    interactive use in the most ordinary way, to print or display an
+    announcement including an appropriate copyright notice and a
+    notice that there is no warranty (or else, saying that you provide
+    a warranty) and that users may redistribute the program under
+    these conditions, and telling the user how to view a copy of this
+    License.  (Exception: if the Program itself is interactive but
+    does not normally print such an announcement, your work based on
+    the Program is not required to print an announcement.)
+
+These requirements apply to the modified work as a whole.  If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works.  But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+  3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+    a) Accompany it with the complete corresponding machine-readable
+    source code, which must be distributed under the terms of Sections
+    1 and 2 above on a medium customarily used for software interchange; or,
+
+    b) Accompany it with a written offer, valid for at least three
+    years, to give any third party, for a charge no more than your
+    cost of physically performing source distribution, a complete
+    machine-readable copy of the corresponding source code, to be
+    distributed under the terms of Sections 1 and 2 above on a medium
+    customarily used for software interchange; or,
+
+    c) Accompany it with the information you received as to the offer
+    to distribute corresponding source code.  (This alternative is
+    allowed only for noncommercial distribution and only if you
+    received the program in object code or executable form with such
+    an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it.  For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable.  However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+  4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License.  Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+  5. You are not required to accept this License, since you have not
+signed it.  However, nothing else grants you permission to modify or
+distribute the Program or its derivative works.  These actions are
+prohibited by law if you do not accept this License.  Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+  6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions.  You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+  7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License.  If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all.  For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices.  Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+  8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded.  In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+  9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time.  Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number.  If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation.  If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+  10. If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission.  For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this.  Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+			    NO WARRANTY
+
+  11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+  12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+		     END OF TERMS AND CONDITIONS
+
+	    How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+    This program is free software; you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation; either version 2 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program; If not, see <http://www.gnu.org/licenses/>.
+
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+    Gnomovision version 69, Copyright (C) year name of author
+    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+    This is free software, and you are welcome to redistribute it
+    under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License.  Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary.  Here is a sample; alter the names:
+
+  Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+  `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+  <signature of Ty Coon>, 1 April 1989
+  Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs.  If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library.  If this is what you want to do, use the GNU Library General
+Public License instead of this License.
diff --git a/tools/proctrace/Makefile b/tools/proctrace/Makefile
new file mode 100644
index 0000000000..2983c477fe
--- /dev/null
+++ b/tools/proctrace/Makefile
@@ -0,0 +1,48 @@
+# Copyright (C) CERT Polska - NASK PIB
+# Author: Michał Leszczyński <michal.leszczynski@cert.pl>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; under version 2 of the License.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+XEN_ROOT=$(CURDIR)/../..
+include $(XEN_ROOT)/tools/Rules.mk
+
+CFLAGS  += -Werror
+CFLAGS  += $(CFLAGS_libxenevtchn)
+CFLAGS  += $(CFLAGS_libxenctrl)
+LDLIBS  += $(LDLIBS_libxenctrl)
+LDLIBS  += $(LDLIBS_libxenevtchn)
+LDLIBS  += $(LDLIBS_libxenforeignmemory)
+
+.PHONY: all
+all: build
+
+.PHONY: build
+build: proctrace
+
+.PHONY: install
+install: build
+	$(INSTALL_DIR) $(DESTDIR)$(sbindir)
+	$(INSTALL_PROG) proctrace $(DESTDIR)$(sbindir)/proctrace
+
+.PHONY: uninstall
+uninstall:
+	rm -f $(DESTDIR)$(sbindir)/proctrace
+
+.PHONY: clean
+clean:
+	$(RM) -f *.o proctrace $(DEPS_RM)
+
+.PHONY: distclean
+distclean: clean
+
+proctrace: proctrace.o Makefile
+	$(CC) $(LDFLAGS) $< -o $@ $(LDLIBS) $(APPEND_LDFLAGS)
+
+-include $(DEPS_INCLUDE)
diff --git a/tools/proctrace/proctrace.c b/tools/proctrace/proctrace.c
new file mode 100644
index 0000000000..22bf91db8d
--- /dev/null
+++ b/tools/proctrace/proctrace.c
@@ -0,0 +1,163 @@
+/******************************************************************************
+ * tools/proctrace/proctrace.c
+ *
+ * Demonstrative tool for collecting Intel Processor Trace data from Xen.
+ *  Could be used to externally monitor a given vCPU in given DomU.
+ *
+ * Copyright (C) 2020 by CERT Polska - NASK PIB
+ *
+ * Authors: Michał Leszczyński, michal.leszczynski@cert.pl
+ * Date:    June, 2020
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; under version 2 of the License.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <sys/mman.h>
+#include <signal.h>
+#include <unistd.h>
+#include <xenctrl.h>
+#include <xen/xen.h>
+#include <xenforeignmemory.h>
+
+#define BUF_SIZE (16384 * XC_PAGE_SIZE) /* 64 MiB; must match the domain's vmtrace_pt_size */
+
+static volatile sig_atomic_t interrupted = 0;
+
+static void term_handler(int signum) {
+    interrupted = 1;
+}
+
+int main(int argc, char* argv[]) {
+    xc_interface *xc;
+    uint32_t domid;
+    uint32_t vcpu_id;
+
+    int rc = -1;
+    uint8_t *buf = NULL;
+    uint64_t last_offset = 0;
+
+    xenforeignmemory_handle *fmem;
+    xenforeignmemory_resource_handle *fres;
+
+    if (signal(SIGINT, term_handler) == SIG_ERR)
+    {
+        fprintf(stderr, "Failed to register signal handler\n");
+        return 1;
+    }
+
+    if (argc != 3) {
+        fprintf(stderr, "Usage: %s <domid> <vcpu_id>\n", argv[0]);
+        fprintf(stderr, "It's recommended to redirect this "
+                        "program's output to a file\n");
+        fprintf(stderr, "or to pipe its output to xxd or another program.\n");
+        return 1;
+    }
+
+    domid = atoi(argv[1]);
+    vcpu_id = atoi(argv[2]);
+
+    xc = xc_interface_open(0, 0, 0);
+
+    fmem = xenforeignmemory_open(0, 0);
+
+    if (!xc || !fmem) {
+        fprintf(stderr, "Failed to open xc or xenforeignmemory interface\n");
+        return 1;
+    }
+
+    rc = xc_vmtrace_pt_enable(xc, domid, vcpu_id);
+
+    if (rc) {
+        fprintf(stderr, "Failed to call xc_vmtrace_pt_enable\n");
+        return 1;
+    }
+
+    fres = xenforeignmemory_map_resource(
+        fmem, domid, XENMEM_resource_vmtrace_buf,
+        /* vcpu: */ vcpu_id,
+        /* frame: */ 0,
+        /* num_frames: */ BUF_SIZE >> XC_PAGE_SHIFT,
+        (void **)&buf,
+        PROT_READ, 0);
+
+    if (!fres) {
+        fprintf(stderr, "Failed to map trace buffer\n");
+        return 1;
+    }
+
+    while (!interrupted) {
+        uint64_t offset;
+        rc = xc_vmtrace_pt_get_offset(xc, domid, vcpu_id, &offset);
+
+        if (rc) {
+            fprintf(stderr, "Failed to call xc_vmtrace_pt_get_offset\n");
+            return 1;
+        }
+
+        if (offset > last_offset)
+        {
+            fwrite(buf + last_offset, offset - last_offset, 1, stdout);
+        }
+        else if (offset < last_offset)
+        {
+            /* The write pointer wrapped around the end of the buffer. */
+            fwrite(buf + last_offset, BUF_SIZE - last_offset, 1, stdout);
+            fwrite(buf, offset, 1, stdout);
+        }
+
+        last_offset = offset;
+        usleep(1000 * 100);
+    }
+
+    rc = xenforeignmemory_unmap_resource(fmem, fres);
+
+    if (rc) {
+        fprintf(stderr, "Failed to unmap resource\n");
+        return 1;
+    }
+
+    rc = xenforeignmemory_close(fmem);
+
+    if (rc) {
+        fprintf(stderr, "Failed to close fmem\n");
+        return 1;
+    }
+
+    rc = xc_vmtrace_pt_disable(xc, domid, vcpu_id);
+
+    if (rc) {
+        fprintf(stderr, "Failed to call xc_vmtrace_pt_disable\n");
+        return 1;
+    }
+
+    rc = xc_interface_close(xc);
+
+    if (rc) {
+        fprintf(stderr, "Failed to close xc interface\n");
+        return 1;
+    }
+
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.20.1
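The wrap-around handling in the polling loop of proctrace.c can be isolated into a pure function, which makes it easier to reason about and to test. This is only a sketch: ring_copy_new() is a hypothetical helper, not part of the series, and it assumes (as the tool does) that both offsets are smaller than the buffer size and that the writer never laps the reader between two polls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the polling loop in proctrace.c: copy
 * the bytes written since `last` up to `offset` into `out` (which must
 * hold at least buf_size bytes) and return how many were copied.
 */
static inline size_t ring_copy_new(const uint8_t *buf, size_t buf_size,
                                   size_t last, size_t offset, uint8_t *out)
{
    size_t tail;

    assert(last < buf_size && offset < buf_size);

    if (offset >= last) {
        /* No wrap: one contiguous chunk (possibly empty). */
        memcpy(out, buf + last, offset - last);
        return offset - last;
    }

    /* Wrapped: first [last, buf_size), then [0, offset). */
    tail = buf_size - last;
    memcpy(out, buf + last, tail);
    memcpy(out + tail, buf, offset);
    return tail + offset;
}
```

If the hardware writes more than a full buffer between two polls, the offsets alone cannot reveal it; the original loop shares this limitation, so a shorter poll interval or a larger vmtrace_pt_size only reduces the risk.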




* Re: [PATCH v4 00/10] Implement support for external IPT monitoring
From: Hubert Jasudowicz @ 2020-06-30 12:48 UTC (permalink / raw)
  To: Michał Leszczyński, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, tamas.lengyel, Jan Beulich,
	Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, luwei.kang,
	Jun Nakajima, Anthony PERARD, Julien Grall, Roger Pau Monné

On 6/30/20 2:33 PM, Michał Leszczyński wrote:
> From: Michal Leszczynski <michal.leszczynski@cert.pl>
> 
> Intel Processor Trace is an architectural extension available in modern Intel
> family CPUs. It allows recording a detailed trace of activity while the
> processor executes code. One might use the recorded trace to reconstruct
> the code flow, that is, to find out the executed code paths, determine
> branches taken, and so forth.
> 
> The abovementioned feature is described in Intel(R) 64 and IA-32 Architectures 
> Software Developer's Manual Volume 3C: System Programming Guide, Part 3, 
> Chapter 36: "Intel Processor Trace."
> 
> This patch series implements an interface that Dom0 could use in order to 
> enable IPT for particular vCPUs in DomU, allowing for external monitoring. Such 
> a feature has numerous applications like malware monitoring, fuzzing, or 
> performance testing.
> 
> Also thanks to Tamas K Lengyel for a few preliminary hints before
> the first version of this patch was submitted to xen-devel.
> 
> Changed since v1:
>   * MSR_RTIT_CTL is managed using MSR load lists
>   * other PT-related MSRs are modified only when vCPU goes out of context
>   * trace buffer is now acquired as a resource
>   * added vmtrace_pt_size parameter in xl.cfg, the size of trace buffer
>     must be specified at domain creation time
>   * trace buffers are allocated on domain creation, destructed on
>     domain destruction
>   * HVMOP_vmtrace_ipt_enable/disable is limited to enabling/disabling PT;
>     these calls don't manage buffer memory anymore
>   * lifted 32 MFN/GFN array limit when acquiring resources
>   * minor code style changes according to review
> 
> Changed since v2:
>   * trace buffer is now allocated on domain creation (in v2 it was
>     allocated when hvm param was set)
>   * restored 32-item limit in mfn/gfn arrays in acquire_resource
>     and instead implemented hypercall continuations
>   * code changes according to Jan's and Roger's review
> 
> Changed since v3:
>   * vmtrace HVMOPs are now implemented as DOMCTLs
>   * patches split up according to Andrew's comments
>   * code changes according to v3 review on the mailing list
> 
> 
> Michal Leszczynski (10):
>   x86/vmx: add Intel PT MSR definitions
>   x86/vmx: add IPT cpu feature
>   tools/libxl: add vmtrace_pt_size parameter
>   x86/vmx: implement processor tracing for VMX
>   common/domain: allocate vmtrace_pt_buffer
>   memory: batch processing in acquire_resource()
>   x86/mm: add vmtrace_buf resource type
>   x86/domctl: add XEN_DOMCTL_vmtrace_op
>   tools/libxc: add xc_vmtrace_* functions
>   tools/proctrace: add proctrace tool
> 
>  docs/man/xl.cfg.5.pod.in                    |  10 +
>  tools/golang/xenlight/helpers.gen.go        |   2 +
>  tools/golang/xenlight/types.gen.go          |   1 +
>  tools/libxc/Makefile                        |   1 +
>  tools/libxc/include/xenctrl.h               |  39 +++
>  tools/libxc/xc_vmtrace.c                    |  73 +++++
>  tools/libxl/libxl.h                         |   8 +
>  tools/libxl/libxl_create.c                  |   1 +
>  tools/libxl/libxl_types.idl                 |   2 +
>  tools/proctrace/COPYING                     | 339 ++++++++++++++++++++
>  tools/proctrace/Makefile                    |  48 +++
>  tools/proctrace/proctrace.c                 | 163 ++++++++++
>  tools/xl/xl_parse.c                         |  20 ++
>  xen/arch/x86/domain.c                       |  11 +
>  xen/arch/x86/domctl.c                       |  48 +++
>  xen/arch/x86/hvm/vmx/vmcs.c                 |   7 +-
>  xen/arch/x86/hvm/vmx/vmx.c                  |  89 +++++
>  xen/arch/x86/mm.c                           |  25 ++
>  xen/common/domain.c                         |  46 +++
>  xen/common/memory.c                         |  32 +-
>  xen/include/asm-x86/cpufeature.h            |   1 +
>  xen/include/asm-x86/domain.h                |   4 +
>  xen/include/asm-x86/hvm/hvm.h               |  38 +++
>  xen/include/asm-x86/hvm/vmx/vmcs.h          |   4 +
>  xen/include/asm-x86/hvm/vmx/vmx.h           |  14 +
>  xen/include/asm-x86/msr-index.h             |  37 +++
>  xen/include/public/arch-x86/cpufeatureset.h |   1 +
>  xen/include/public/domctl.h                 |  27 ++
>  xen/include/public/memory.h                 |   1 +
>  xen/include/xen/domain.h                    |   2 +
>  xen/include/xen/sched.h                     |   4 +
>  31 files changed, 1094 insertions(+), 4 deletions(-)
>  create mode 100644 tools/libxc/xc_vmtrace.c
>  create mode 100644 tools/proctrace/COPYING
>  create mode 100644 tools/proctrace/Makefile
>  create mode 100644 tools/proctrace/proctrace.c
> 

FYI, this patchset is also available at:
https://github.com/icedevml/xen/tree/ipt-patch-v4

Hubert Jasudowicz



* Re: [PATCH v4 01/10] x86/vmx: add Intel PT MSR definitions
From: Jan Beulich @ 2020-06-30 16:23 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: tamas.lengyel, Wei Liu, Andrew Cooper, luwei.kang, xen-devel,
	Roger Pau Monné

On 30.06.2020 14:33, Michał Leszczyński wrote:
> From: Michal Leszczynski <michal.leszczynski@cert.pl>
> 
> Define constants related to Intel Processor Trace features.
> 
> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>

This needs re-basing onto current staging, now that Andrew's patch
to add the MSR numbers has gone in. Apart from this a couple of
cosmetic requests:

> --- a/xen/include/asm-x86/msr-index.h
> +++ b/xen/include/asm-x86/msr-index.h
> @@ -69,6 +69,43 @@
>  #define MSR_MCU_OPT_CTRL                    0x00000123
>  #define  MCU_OPT_CTRL_RNGDS_MITG_DIS        (_AC(1, ULL) <<  0)
>  
> +/* Intel PT MSRs */
> +#define MSR_RTIT_OUTPUT_BASE                0x00000560
> +
> +#define MSR_RTIT_OUTPUT_MASK                0x00000561
> +
> +#define MSR_RTIT_CTL                        0x00000570
> +#define  RTIT_CTL_TRACEEN                    (_AC(1, ULL) <<  0)

The right side is indented one space too many - see the similar
#define in context above.

> +#define  RTIT_CTL_CYCEN                      (_AC(1, ULL) <<  1)
> +#define  RTIT_CTL_OS                         (_AC(1, ULL) <<  2)
> +#define  RTIT_CTL_USR                        (_AC(1, ULL) <<  3)
> +#define  RTIT_CTL_PWR_EVT_EN                 (_AC(1, ULL) <<  4)
> +#define  RTIT_CTL_FUP_ON_PTW                 (_AC(1, ULL) <<  5)
> +#define  RTIT_CTL_FABRIC_EN                  (_AC(1, ULL) <<  6)
> +#define  RTIT_CTL_CR3_FILTER                 (_AC(1, ULL) <<  7)
> +#define  RTIT_CTL_TOPA                       (_AC(1, ULL) <<  8)
> +#define  RTIT_CTL_MTC_EN                     (_AC(1, ULL) <<  9)
> +#define  RTIT_CTL_TSC_EN                     (_AC(1, ULL) <<  10)

The double blanks on the earlier lines exist such that here you
can reduce to a single one. You'll also find examples of this
further up in the file.

> +#define  RTIT_CTL_DIS_RETC                   (_AC(1, ULL) <<  11)
> +#define  RTIT_CTL_PTW_EN                     (_AC(1, ULL) <<  12)
> +#define  RTIT_CTL_BRANCH_EN                  (_AC(1, ULL) <<  13)
> +#define  RTIT_CTL_MTC_FREQ                   (_AC(0x0F, ULL) <<  14)

0xf please (i.e. lower case and no random number of leading
zeros).

> +#define  RTIT_CTL_CYC_THRESH                 (_AC(0x0F, ULL) <<  19)
> +#define  RTIT_CTL_PSB_FREQ                   (_AC(0x0F, ULL) <<  24)
> +#define  RTIT_CTL_ADDR(n)                    (_AC(0x0F, ULL) <<  (32 + (4 * (n))))

Strictly speaking we don't need the parentheses around the operands
of binary * here - in mathematics precedence between + and * is
well defined. (We do parenthesize certain other expressions, when
the precedence may not be as well known.)

Thanks, Jan
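For illustration, this is roughly how some of these definitions could look with the requests above applied: lower-case 0xf, a single space before two-digit shift counts, and no parentheses around the operands of *. The sketch swaps Xen's _AC(x, ULL) for plain ULL literals so it compiles on its own, and rtit_ctl_addr_cfg() is a hypothetical helper showing that RTIT_CTL_ADDR(n) selects the 4-bit field starting at bit 32 + 4n.

```c
#include <assert.h>
#include <stdint.h>

/* Xen spells these as _AC(x, ULL); plain ULL keeps the sketch stand-alone. */
#define  RTIT_CTL_TSC_EN                    (1ULL << 10)
#define  RTIT_CTL_MTC_FREQ                  (0xfULL << 14)
#define  RTIT_CTL_ADDR(n)                   (0xfULL << (32 + 4 * (n)))

/* Hypothetical helper: read the 4-bit ADDRn_CFG field (n = 0..3). */
static inline unsigned int rtit_ctl_addr_cfg(uint64_t ctl, unsigned int n)
{
    assert(n < 4);
    return (ctl & RTIT_CTL_ADDR(n)) >> (32 + 4 * n);
}
```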



* Re: [PATCH v4 01/10] x86/vmx: add Intel PT MSR definitions
From: Andrew Cooper @ 2020-06-30 17:37 UTC (permalink / raw)
  To: Michał Leszczyński, xen-devel
  Cc: tamas.lengyel, luwei.kang, Wei Liu, Jan Beulich, Roger Pau Monné

On 30/06/2020 13:33, Michał Leszczyński wrote:
> diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
> index b328a47ed8..0203029be9 100644
> --- a/xen/include/asm-x86/msr-index.h
> +++ b/xen/include/asm-x86/msr-index.h
> @@ -69,6 +69,43 @@
>  #define MSR_MCU_OPT_CTRL                    0x00000123
>  #define  MCU_OPT_CTRL_RNGDS_MITG_DIS        (_AC(1, ULL) <<  0)
>  
> +/* Intel PT MSRs */
> +#define MSR_RTIT_OUTPUT_BASE                0x00000560
> +
> +#define MSR_RTIT_OUTPUT_MASK                0x00000561
> +
> +#define MSR_RTIT_CTL                        0x00000570
> +#define  RTIT_CTL_TRACEEN                    (_AC(1, ULL) <<  0)
> +#define  RTIT_CTL_CYCEN                      (_AC(1, ULL) <<  1)

In addition to what Jan has said, please can we be consistent with an
underscore (or not) before EN.  Preferably with, so these would become
TRACE_EN and CYC_EN.

That said, there are a lot of bit definitions which aren't used at all. 
IMO, it would be better to introduce defines when you use them.

Thanks,

~Andrew
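A short sketch of the renamed bits in use. RTIT_CTL_TRACE_EN follows the naming suggested above, the bit positions are the ones from the patch, and rtit_ctl_branch_trace() is a hypothetical helper, not part of the series; plain ULL literals again stand in for _AC().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define  RTIT_CTL_TRACE_EN                  (1ULL <<  0)
#define  RTIT_CTL_OS                        (1ULL <<  2)
#define  RTIT_CTL_USR                       (1ULL <<  3)
#define  RTIT_CTL_BRANCH_EN                 (1ULL << 13)

/*
 * Hypothetical helper: RTIT_CTL value enabling branch tracing for
 * kernel-mode (CPL 0) and/or user-mode (CPL > 0) code.
 */
static inline uint64_t rtit_ctl_branch_trace(bool trace_os, bool trace_usr)
{
    uint64_t ctl = RTIT_CTL_TRACE_EN | RTIT_CTL_BRANCH_EN;

    /* With neither filter set, no CPL would be traced at all. */
    assert(trace_os || trace_usr);

    if ( trace_os )
        ctl |= RTIT_CTL_OS;
    if ( trace_usr )
        ctl |= RTIT_CTL_USR;

    return ctl;
}
```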



* Re: [PATCH v4 01/10] x86/vmx: add Intel PT MSR definitions
From: Tamas K Lengyel @ 2020-06-30 18:03 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Tamas K Lengyel, Wei Liu, Michał Leszczyński, Kang,
	Luwei, Jan Beulich, Xen-devel, Roger Pau Monné

On Tue, Jun 30, 2020 at 11:39 AM Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>
> On 30/06/2020 13:33, Michał Leszczyński wrote:
> > diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
> > index b328a47ed8..0203029be9 100644
> > --- a/xen/include/asm-x86/msr-index.h
> > +++ b/xen/include/asm-x86/msr-index.h
> > @@ -69,6 +69,43 @@
> >  #define MSR_MCU_OPT_CTRL                    0x00000123
> >  #define  MCU_OPT_CTRL_RNGDS_MITG_DIS        (_AC(1, ULL) <<  0)
> >
> > +/* Intel PT MSRs */
> > +#define MSR_RTIT_OUTPUT_BASE                0x00000560
> > +
> > +#define MSR_RTIT_OUTPUT_MASK                0x00000561
> > +
> > +#define MSR_RTIT_CTL                        0x00000570
> > +#define  RTIT_CTL_TRACEEN                    (_AC(1, ULL) <<  0)
> > +#define  RTIT_CTL_CYCEN                      (_AC(1, ULL) <<  1)
>
> In addition to what Jan has said, please can we be consistent with an
> underscore (or not) before EN.  Preferably with, so these would become
> TRACE_EN and CYC_EN.
>
> That said, there are a lot of bit definitions which aren't used at all.
> IMO, it would be better to introduce defines when you use them.

In the past I found it very valuable when this type of plumbing was
already present in Xen instead of me having to go into the SDM to dig
out the magic numbers. So while some of the bits might not be used
right now I also don't see any downside in having them, just in case.

Tamas



* Re: [PATCH v4 01/10] x86/vmx: add Intel PT MSR definitions
From: Michał Leszczyński @ 2020-06-30 18:27 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Tamas K Lengyel, Wei Liu, Andrew Cooper, Jan Beulich, Xen-devel,
	Kang, Luwei, Roger Pau Monné

----- On 30 Jun 2020, at 20:03, Tamas K Lengyel tamas.k.lengyel@gmail.com wrote:

> On Tue, Jun 30, 2020 at 11:39 AM Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>>
>> On 30/06/2020 13:33, Michał Leszczyński wrote:
>> > diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
>> > index b328a47ed8..0203029be9 100644
>> > --- a/xen/include/asm-x86/msr-index.h
>> > +++ b/xen/include/asm-x86/msr-index.h
>> > @@ -69,6 +69,43 @@
>> >  #define MSR_MCU_OPT_CTRL                    0x00000123
>> >  #define  MCU_OPT_CTRL_RNGDS_MITG_DIS        (_AC(1, ULL) <<  0)
>> >
>> > +/* Intel PT MSRs */
>> > +#define MSR_RTIT_OUTPUT_BASE                0x00000560
>> > +
>> > +#define MSR_RTIT_OUTPUT_MASK                0x00000561
>> > +
>> > +#define MSR_RTIT_CTL                        0x00000570
>> > +#define  RTIT_CTL_TRACEEN                    (_AC(1, ULL) <<  0)
>> > +#define  RTIT_CTL_CYCEN                      (_AC(1, ULL) <<  1)
>>
>> In addition to what Jan has said, please can we be consistent with an
>> underscore (or not) before EN.  Preferably with, so these would become
>> TRACE_EN and CYC_EN.
>>
>> That said, there are a lot of bit definitions which aren't used at all.
>> IMO, it would be better to introduce defines when you use them.
> 
> In the past I found it very valuable when this type of plumbing was
> already present in Xen instead of me having to go into the SDM to dig
> out the magic numbers. So while some of the bits might not be used
> right now I also don't see any downside in having them, just in case.
> 
> Tamas


+1 for keeping the unused #defines; they are a helpful piece of knowledge
that speeds up further patch development. They affect neither compilation
nor runtime, and they don't take up much space in the code, so I would
opt to keep them.

I will rebase this series onto the latest master in v5. The remaining
patches in this series are not affected and can still be reviewed,
so I will wait a few days before posting the new version.


Best regards,
Michał Leszczyński
CERT Polska



* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
From: Roger Pau Monné @ 2020-07-01  9:49 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Julien Grall, Kevin Tian, Stefano Stabellini, tamas.lengyel,
	Jun Nakajima, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap,
	luwei.kang, Jan Beulich, xen-devel

On Tue, Jun 30, 2020 at 02:33:45PM +0200, Michał Leszczyński wrote:
> From: Michal Leszczynski <michal.leszczynski@cert.pl>
> 
> Check whether the Intel Processor Trace feature is supported by the current
> processor. Define the vmtrace_supported global variable.
> 
> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
> ---
>  xen/arch/x86/hvm/vmx/vmcs.c                 | 7 ++++++-
>  xen/common/domain.c                         | 2 ++
>  xen/include/asm-x86/cpufeature.h            | 1 +
>  xen/include/asm-x86/hvm/vmx/vmcs.h          | 1 +
>  xen/include/public/arch-x86/cpufeatureset.h | 1 +
>  xen/include/xen/domain.h                    | 2 ++
>  6 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index ca94c2bedc..b73d824357 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -291,6 +291,12 @@ static int vmx_init_vmcs_config(void)
>          _vmx_cpu_based_exec_control &=
>              ~(CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING);
>  
> +    rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
> +
> +    /* Check whether IPT is supported in VMX operation. */
> +    vmtrace_supported = cpu_has_ipt &&
> +                        (_vmx_misc_cap & VMX_MISC_PT_SUPPORTED);

This function gets called for every CPU that's brought up, so you need
to set it on the BSP, and then check that the APs also support the
feature or else fail to bring them up AFAICT. If not, you could end up
with a non-working system.

I agree it's very unlikely to encounter a system with such differences
between CPUs, but better safe than sorry.

> +
>      if ( _vmx_cpu_based_exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS )
>      {
>          min = 0;
> @@ -305,7 +311,6 @@ static int vmx_init_vmcs_config(void)
>                 SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS |
>                 SECONDARY_EXEC_XSAVES |
>                 SECONDARY_EXEC_TSC_SCALING);
> -        rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
>          if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
>              opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
>          if ( opt_vpid_enabled )
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 7cc9526139..0a33e0dfd6 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -82,6 +82,8 @@ struct vcpu *idle_vcpu[NR_CPUS] __read_mostly;
>  
>  vcpu_info_t dummy_vcpu_info;
>  
> +bool_t vmtrace_supported;

Plain bool, and I think it wants to be __read_mostly.

I'm also unsure whether this is the best place to put such a variable;
since no users are introduced in this patch, it's hard to tell.

Thanks, Roger.
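A user-space sketch of the BSP/AP consistency check suggested above, with the flag as a plain bool. vmtrace_check_cpu() and its parameters are hypothetical, __read_mostly is omitted because it only exists inside Xen, and returning -EINVAL is just one possible way to refuse an AP.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* In Xen this would be `bool __read_mostly vmtrace_supported;`. */
static bool vmtrace_supported;

/*
 * Hypothetical per-CPU hook: the BSP decides whether vmtrace is usable,
 * and every AP brought up later must match, otherwise bring-up fails.
 */
static inline int vmtrace_check_cpu(bool is_bsp, bool cpu_has_pt_in_vmx)
{
    if ( is_bsp )
    {
        vmtrace_supported = cpu_has_pt_in_vmx;
        return 0;
    }

    /* An AP without the feature could break domains relying on it. */
    if ( vmtrace_supported && !cpu_has_pt_in_vmx )
        return -EINVAL;

    return 0;
}
```

An AP that has the feature while the BSP lacks it is accepted here, since vmtrace simply stays disabled system-wide in that case.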



* Re: [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter
From: Roger Pau Monné @ 2020-07-01 10:05 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Julien Grall, Stefano Stabellini, tamas.lengyel, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, luwei.kang,
	Jan Beulich, Anthony PERARD, xen-devel

On Tue, Jun 30, 2020 at 02:33:46PM +0200, Michał Leszczyński wrote:
> From: Michal Leszczynski <michal.leszczynski@cert.pl>
> 
> Allow specifying the size of the per-vCPU trace buffer upon
> domain creation. This is zero by default (meaning: not enabled).
> 
> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
> ---
>  docs/man/xl.cfg.5.pod.in             | 10 ++++++++++
>  tools/golang/xenlight/helpers.gen.go |  2 ++
>  tools/golang/xenlight/types.gen.go   |  1 +
>  tools/libxl/libxl.h                  |  8 ++++++++
>  tools/libxl/libxl_create.c           |  1 +
>  tools/libxl/libxl_types.idl          |  2 ++
>  tools/xl/xl_parse.c                  | 20 ++++++++++++++++++++
>  xen/common/domain.c                  | 12 ++++++++++++
>  xen/include/public/domctl.h          |  1 +
>  9 files changed, 57 insertions(+)
> 
> diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> index 0532739c1f..78f434b722 100644
> --- a/docs/man/xl.cfg.5.pod.in
> +++ b/docs/man/xl.cfg.5.pod.in
> @@ -278,6 +278,16 @@ memory=8096 will report significantly less memory available for use
>  than a system with maxmem=8096 memory=8096 due to the memory overhead
>  of having to track the unused pages.
>  
> +=item B<vmtrace_pt_size=BYTES>
> +
> +Specifies the size of processor trace buffer that would be allocated
> +for each vCPU belonging to this domain. Disabled (i.e. B<vmtrace_pt_size=0>
> +by default. This must be set to non-zero value in order to be able to
> +use processor tracing features with this domain.
> +
> +B<NOTE>: The size value must be between 4 kB and 4 GB and it must

I think the minimum value is 8kB, since 4kB would be order 0, which
is used to signal that the feature is disabled?

> diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
> index 61b4ef7b7e..4eba224590 100644
> --- a/tools/xl/xl_parse.c
> +++ b/tools/xl/xl_parse.c
> @@ -1861,6 +1861,26 @@ void parse_config_data(const char *config_source,
>          }
>      }
>  
> +    if (!xlu_cfg_get_long(config, "vmtrace_pt_size", &l, 1) && l) {
> +        int32_t shift = 0;

unsigned int? I don't think there's a reason for this to be a fixed
width signed integer.
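For illustration, a minimal sketch of the size-to-order conversion using an unsigned shift. It assumes 4 KiB pages (XEN_PAGE_SHIFT == 12) and applies the 8 kB minimum raised above, since order 0 signals "disabled". The helper name pt_size_to_order() is hypothetical; in xl these checks would stay inline in parse_config_data().

```c
/* Assumption: 4 KiB pages, as on x86. */
#define XEN_PAGE_SHIFT 12

/*
 * Hypothetical helper: convert a trace buffer size in bytes into a page
 * order. Returns the order (>= 1) on success, or -1 if the size is zero,
 * not a power of two, or too small. Order 0 is reserved to mean
 * "disabled", so the smallest accepted size is 8 KiB (order 1).
 */
static int pt_size_to_order(unsigned long size)
{
    unsigned int shift = 0;     /* plain unsigned: no need for int32_t */

    if ( size == 0 || (size & (size - 1)) )
        return -1;              /* not a power of 2 */

    while ( size >>= 1 )
        ++shift;                /* shift ends up as log2(original size) */

    if ( shift <= XEN_PAGE_SHIFT )
        return -1;              /* 4 KiB or less would be order 0 */

    return (int)(shift - XEN_PAGE_SHIFT);
}
```

With this, 8192 maps to order 1 and 65536 to order 4, while 4096 and non-powers of two such as 12288 are rejected.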

> +
> +        if (l & (l - 1))
> +        {
> +            fprintf(stderr, "ERROR: pt buffer size must be a power of 2\n");
> +            exit(1);
> +        }
> +
> +        while (l >>= 1) ++shift;
> +
> +        if (shift <= XEN_PAGE_SHIFT)
> +        {
> +            fprintf(stderr, "ERROR: too small pt buffer\n");
> +            exit(1);
> +        }
> +
> +        b_info->vmtrace_pt_order = shift - XEN_PAGE_SHIFT;
> +    }
> +
>      if (!xlu_cfg_get_list(config, "ioports", &ioports, &num_ioports, 0)) {
>          b_info->num_ioports = num_ioports;
>          b_info->ioports = calloc(num_ioports, sizeof(*b_info->ioports));
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 0a33e0dfd6..27dcfbac8c 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -338,6 +338,12 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config)
>          return -EINVAL;
>      }
>  
> +    if ( config->vmtrace_pt_order && !vmtrace_supported )
> +    {
> +        dprintk(XENLOG_INFO, "Processor tracing is not supported\n");
> +        return -EINVAL;
> +    }
> +
>      return arch_sanitise_domain_config(config);
>  }
>  
> @@ -443,6 +449,12 @@ struct domain *domain_create(domid_t domid,
>          d->nr_pirqs = min(d->nr_pirqs, nr_irqs);
>  
>          radix_tree_init(&d->pirq_tree);
> +
> +        if ( config->vmtrace_pt_order )
> +        {
> +            uint32_t shift_val = config->vmtrace_pt_order + PAGE_SHIFT;
> +            d->vmtrace_pt_size = (1ULL << shift_val);

I don't think the vmtrace_pt_size domain field has been introduced
yet?

Please check that each patch builds on its own, or else we would
break bisectability of the tree.

Also, I would consider just storing this directly as an order; is
there any reason to convert it back to a size?

Thanks, Roger.


* Re: [PATCH v4 04/10] x86/vmx: implement processor tracing for VMX
  2020-06-30 12:33 ` [PATCH v4 04/10] x86/vmx: implement processor tracing for VMX Michał Leszczyński
@ 2020-07-01 10:30   ` Roger Pau Monné
  0 siblings, 0 replies; 75+ messages in thread
From: Roger Pau Monné @ 2020-07-01 10:30 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Kevin Tian, tamas.lengyel, Jun Nakajima, Wei Liu, Andrew Cooper,
	luwei.kang, Jan Beulich, xen-devel

On Tue, Jun 30, 2020 at 02:33:47PM +0200, Michał Leszczyński wrote:
> From: Michal Leszczynski <michal.leszczynski@cert.pl>
> 
> Use Intel Processor Trace feature in order to
> provision vmtrace_pt_* features.
> 
> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c         | 89 ++++++++++++++++++++++++++++++
>  xen/include/asm-x86/hvm/hvm.h      | 38 +++++++++++++
>  xen/include/asm-x86/hvm/vmx/vmcs.h |  3 +
>  xen/include/asm-x86/hvm/vmx/vmx.h  | 14 +++++
>  4 files changed, 144 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index ab19d9424e..db3f051b40 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -508,11 +508,24 @@ static void vmx_restore_host_msrs(void)
>  
>  static void vmx_save_guest_msrs(struct vcpu *v)
>  {
> +    uint64_t rtit_ctl;
> +
>      /*
>       * We cannot cache SHADOW_GS_BASE while the VCPU runs, as it can
>       * be updated at any time via SWAPGS, which we cannot trap.
>       */
>      v->arch.hvm.vmx.shadow_gs = rdgsshadow();
> +
> +    if ( unlikely(v->arch.hvm.vmx.pt_state &&
> +                  v->arch.hvm.vmx.pt_state->active) )
> +    {

Nit: define rtit_ctl here to reduce the scope.

> +        rdmsrl(MSR_RTIT_CTL, rtit_ctl);
> +        BUG_ON(rtit_ctl & RTIT_CTL_TRACEEN);
> +
> +        rdmsrl(MSR_RTIT_STATUS, v->arch.hvm.vmx.pt_state->status);
> +        rdmsrl(MSR_RTIT_OUTPUT_MASK,
> +               v->arch.hvm.vmx.pt_state->output_mask.raw);
> +    }
>  }
>  
>  static void vmx_restore_guest_msrs(struct vcpu *v)
> @@ -524,6 +537,17 @@ static void vmx_restore_guest_msrs(struct vcpu *v)
>  
>      if ( cpu_has_msr_tsc_aux )
>          wrmsr_tsc_aux(v->arch.msrs->tsc_aux);
> +
> +    if ( unlikely(v->arch.hvm.vmx.pt_state &&
> +                  v->arch.hvm.vmx.pt_state->active) )
> +    {
> +        wrmsrl(MSR_RTIT_OUTPUT_BASE,
> +               v->arch.hvm.vmx.pt_state->output_base);
> +        wrmsrl(MSR_RTIT_OUTPUT_MASK,
> +               v->arch.hvm.vmx.pt_state->output_mask.raw);
> +        wrmsrl(MSR_RTIT_STATUS,
> +               v->arch.hvm.vmx.pt_state->status);
> +    }
>  }
>  
>  void vmx_update_cpu_exec_control(struct vcpu *v)
> @@ -2240,6 +2264,60 @@ static bool vmx_get_pending_event(struct vcpu *v, struct x86_event *info)
>      return true;
>  }
>  
> +static int vmx_init_pt(struct vcpu *v)
> +{
> +    v->arch.hvm.vmx.pt_state = xzalloc(struct pt_state);
> +
> +    if ( !v->arch.hvm.vmx.pt_state )
> +        return -EFAULT;

-ENOMEM

> +
> +    if ( !v->arch.vmtrace.pt_buf )

Again, I'm quite sure this doesn't build, since pt_buf is introduced
in patch 5.

I will try to continue reviewing, but it's quite hard when fields not
yet introduced are used in the code, as I have no idea what they are.

> +        return -EINVAL;
> +
> +    if ( !v->domain->vmtrace_pt_size )
> +	return -EINVAL;

Indentation (hard tab), and could be joined with the previous check,
since both return -EINVAL.

> +
> +    v->arch.hvm.vmx.pt_state->output_base = page_to_maddr(v->arch.vmtrace.pt_buf);
> +    v->arch.hvm.vmx.pt_state->output_mask.raw = v->domain->vmtrace_pt_size - 1;
> +
> +    if ( vmx_add_host_load_msr(v, MSR_RTIT_CTL, 0) )
> +        return -EFAULT;
> +
> +    if ( vmx_add_guest_msr(v, MSR_RTIT_CTL,
> +                              RTIT_CTL_TRACEEN | RTIT_CTL_OS |
> +                              RTIT_CTL_USR | RTIT_CTL_BRANCH_EN) )
> +        return -EFAULT;

I think I've already pointed this out before (in v2), but please don't
drop the returned error codes from vmx_add_host_load_msr and
vmx_add_guest_msr. Please store them in a local variable and return
those if != 0.
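A minimal sketch of the pattern being asked for. add_host_load_msr() and add_guest_msr() are hypothetical stand-ins for vmx_add_host_load_msr() and vmx_add_guest_msr(), and the fake_*_rc variables exist only so the error paths can be exercised: whatever error the callee reports is kept in rc and returned unchanged, rather than being replaced by -EFAULT.

```c
#include <errno.h>

/* Hypothetical knobs controlling what the stand-ins return. */
static int fake_host_rc, fake_guest_rc;

/* Hypothetical stand-ins for vmx_add_host_load_msr()/vmx_add_guest_msr(). */
static int add_host_load_msr(void) { return fake_host_rc; }
static int add_guest_msr(void) { return fake_guest_rc; }

static int init_pt_msrs(void)
{
    int rc;

    rc = add_host_load_msr();
    if ( rc )
        return rc;          /* propagate the callee's error code */

    rc = add_guest_msr();
    if ( rc )
        return rc;

    return 0;
}
```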

> +
> +    return 0;
> +}
> +
> +static int vmx_destroy_pt(struct vcpu* v)
> +{
> +    if ( v->arch.hvm.vmx.pt_state )
> +        xfree(v->arch.hvm.vmx.pt_state);
> +
> +    v->arch.hvm.vmx.pt_state = NULL;
> +    return 0;
> +}

I think those should be part of vmx_vcpu_{initialise/destroy}; there's
no need to introduce new hooks for them, as the allocation size will
already be known at domain creation.
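A rough sketch of what folding this into the regular vCPU init/destroy path could look like. The structures and functions below are simplified, hypothetical stand-ins, not the real struct vcpu or vmx_vcpu_initialise(); the state is allocated only when tracing was configured at domain creation, so no separate hook is needed.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the Xen structures. */
struct ipt_state {
    bool active;
    uint64_t output_base;
    uint64_t output_mask;
};

struct vcpu_sketch {
    unsigned int pt_order;          /* 0 means tracing is disabled */
    struct ipt_state *ipt_state;
};

/* Called from the normal vCPU initialisation path. */
static int vcpu_initialise_sketch(struct vcpu_sketch *v)
{
    if ( !v->pt_order )
        return 0;                   /* tracing not configured: nothing to do */

    v->ipt_state = calloc(1, sizeof(*v->ipt_state));
    if ( !v->ipt_state )
        return -ENOMEM;

    return 0;
}

/* Called from the normal vCPU destruction path. */
static void vcpu_destroy_sketch(struct vcpu_sketch *v)
{
    free(v->ipt_state);             /* free(NULL) is a no-op */
    v->ipt_state = NULL;
}
```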

> +static int vmx_control_pt(struct vcpu *v, bool_t enable)

Plain bool.

> +{
> +    if ( !v->arch.hvm.vmx.pt_state )
> +        return -EINVAL;
> +
> +    v->arch.hvm.vmx.pt_state->active = enable;
> +    return 0;
> +}
> +
> +static int vmx_get_pt_offset(struct vcpu *v, uint64_t *offset)
> +{
> +    if ( !v->arch.hvm.vmx.pt_state )
> +        return -EINVAL;
> +
> +    *offset = v->arch.hvm.vmx.pt_state->output_mask.offset;
> +    return 0;
> +}
> +
>  static struct hvm_function_table __initdata vmx_function_table = {
>      .name                 = "VMX",
>      .cpu_up_prepare       = vmx_cpu_up_prepare,
> @@ -2295,6 +2373,10 @@ static struct hvm_function_table __initdata vmx_function_table = {
>      .altp2m_vcpu_update_vmfunc_ve = vmx_vcpu_update_vmfunc_ve,
>      .altp2m_vcpu_emulate_ve = vmx_vcpu_emulate_ve,
>      .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
> +    .vmtrace_init_pt = vmx_init_pt,
> +    .vmtrace_destroy_pt = vmx_destroy_pt,
> +    .vmtrace_control_pt = vmx_control_pt,
> +    .vmtrace_get_pt_offset = vmx_get_pt_offset,

As pointed out above, vmtrace_init_pt and vmtrace_destroy_pt should
IMO be dropped and instead done in vmx_vcpu_{initialise/destroy}.

>      .tsc_scaling = {
>          .max_ratio = VMX_TSC_MULTIPLIER_MAX,
>      },
> @@ -3674,6 +3756,13 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>  
>      hvm_invalidate_regs_fields(regs);
>  
> +    if ( unlikely(v->arch.hvm.vmx.pt_state &&
> +                  v->arch.hvm.vmx.pt_state->active) )
> +    {
> +        rdmsrl(MSR_RTIT_OUTPUT_MASK,
> +               v->arch.hvm.vmx.pt_state->output_mask.raw);
> +    }
> +
>      if ( paging_mode_hap(v->domain) )
>      {
>          /*
> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
> index 1eb377dd82..8f194889e5 100644
> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -214,6 +214,12 @@ struct hvm_function_table {
>      bool_t (*altp2m_vcpu_emulate_ve)(struct vcpu *v);
>      int (*altp2m_vcpu_emulate_vmfunc)(const struct cpu_user_regs *regs);
>  
> +    /* vmtrace */
> +    int (*vmtrace_init_pt)(struct vcpu *v);
> +    int (*vmtrace_destroy_pt)(struct vcpu *v);
> +    int (*vmtrace_control_pt)(struct vcpu *v, bool_t enable);
> +    int (*vmtrace_get_pt_offset)(struct vcpu *v, uint64_t *offset);
> +
>      /*
>       * Parameters and callbacks for hardware-assisted TSC scaling,
>       * which are valid only when the hardware feature is available.
> @@ -655,6 +661,38 @@ static inline bool altp2m_vcpu_emulate_ve(struct vcpu *v)
>      return false;
>  }
>  
> +static inline int vmtrace_init_pt(struct vcpu *v)
> +{
> +    if ( hvm_funcs.vmtrace_init_pt )
> +        return hvm_funcs.vmtrace_init_pt(v);
> +
> +    return -EOPNOTSUPP;
> +}
> +
> +static inline int vmtrace_destroy_pt(struct vcpu *v)
> +{
> +    if ( hvm_funcs.vmtrace_destroy_pt )
> +        return hvm_funcs.vmtrace_destroy_pt(v);
> +
> +    return -EOPNOTSUPP;
> +}
> +
> +static inline int vmtrace_control_pt(struct vcpu *v, bool_t enable)
> +{
> +    if ( hvm_funcs.vmtrace_control_pt )
> +        return hvm_funcs.vmtrace_control_pt(v, enable);
> +
> +    return -EOPNOTSUPP;
> +}
> +
> +static inline int vmtrace_get_pt_offset(struct vcpu *v, uint64_t *offset)
> +{
> +    if ( hvm_funcs.vmtrace_get_pt_offset )
> +        return hvm_funcs.vmtrace_get_pt_offset(v, offset);
> +
> +    return -EOPNOTSUPP;
> +}
> +
>  /*
>   * This must be defined as a macro instead of an inline function,
>   * because it uses 'struct vcpu' and 'struct domain' which have
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index 0e9a0b8de6..64c0d82614 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -186,6 +186,9 @@ struct vmx_vcpu {
>       * pCPU and wakeup the related vCPU.
>       */
>      struct pi_blocking_vcpu pi_blocking;
> +
> +    /* State of processor trace feature */
> +    struct pt_state      *pt_state;

I think it's fine to add this here for now, but we might also consider
moving it out of an HVM-specific structure if it's to be used by PV
guests. Since all of this is HVM-specific, I'm fine with adding it
here.

>  };
>  
>  int vmx_create_vmcs(struct vcpu *v);
> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
> index 111ccd7e61..be7213d3c0 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> @@ -689,4 +689,18 @@ typedef union ldt_or_tr_instr_info {
>      };
>  } ldt_or_tr_instr_info_t;
>  
> +/* Processor Trace state per vCPU */
> +struct pt_state {

Please use ipt_state here, since this is an Intel specific structure.

> +    bool_t active;

Plain bool.

Thanks, Roger.


* Re: [PATCH v4 05/10] common/domain: allocate vmtrace_pt_buffer
  2020-06-30 12:33 ` [PATCH v4 05/10] common/domain: allocate vmtrace_pt_buffer Michał Leszczyński
@ 2020-07-01 10:38   ` Roger Pau Monné
  2020-07-01 15:35   ` Julien Grall
  1 sibling, 0 replies; 75+ messages in thread
From: Roger Pau Monné @ 2020-07-01 10:38 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Julien Grall, Stefano Stabellini, tamas.lengyel, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, luwei.kang,
	Jan Beulich, xen-devel

On Tue, Jun 30, 2020 at 02:33:48PM +0200, Michał Leszczyński wrote:
> From: Michal Leszczynski <michal.leszczynski@cert.pl>
> 
> Allocate processor trace buffer for each vCPU when the domain
> is created, deallocate trace buffers on domain destruction.
> 
> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
> ---
>  xen/arch/x86/domain.c        | 11 +++++++++++
>  xen/common/domain.c          | 32 ++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/domain.h |  4 ++++
>  xen/include/xen/sched.h      |  4 ++++
>  4 files changed, 51 insertions(+)
> 
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index fee6c3931a..0d79fd390c 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -2199,6 +2199,17 @@ int domain_relinquish_resources(struct domain *d)
>                  altp2m_vcpu_disable_ve(v);
>          }
>  
> +        for_each_vcpu ( d, v )
> +        {
> +            if ( !v->arch.vmtrace.pt_buf )
> +                continue;
> +
> +            vmtrace_destroy_pt(v);
> +
> +            free_domheap_pages(v->arch.vmtrace.pt_buf,
> +                get_order_from_bytes(v->domain->vmtrace_pt_size));
> +        }
> +
>          if ( is_pv_domain(d) )
>          {
>              for_each_vcpu ( d, v )
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 27dcfbac8c..8513659ef8 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -137,6 +137,31 @@ static void vcpu_destroy(struct vcpu *v)
>      free_vcpu_struct(v);
>  }
>  
> +static int vmtrace_alloc_buffers(struct vcpu *v)
> +{
> +    struct page_info *pg;
> +    uint64_t size = v->domain->vmtrace_pt_size;

IMO you would be better off just storing an order here (as it's
passed from the toolstack); that would avoid the checks and the
conversion to an order. Also, vmtrace_pt_size could be of type
unsigned int instead of uint64_t.

> +
> +    if ( size < PAGE_SIZE || size > GB(4) || (size & (size - 1)) )
> +    {
> +        /*
> +         * We don't accept trace buffer size smaller than single page
> +         * and the upper bound is defined as 4GB in the specification.
> +         * The buffer size must be also a power of 2.
> +         */
> +        return -EINVAL;
> +    }
> +
> +    pg = alloc_domheap_pages(v->domain, get_order_from_bytes(size),
> +                             MEMF_no_refcount);
> +
> +    if ( !pg )
> +        return -ENOMEM;
> +
> +    v->arch.vmtrace.pt_buf = pg;

You can assign to pt_buf directly IMO, no need for the pg local
variable.
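Roughly what the allocation helper reduces to if the domain stores an order, sketched with hypothetical stand-ins: alloc_pages_sketch() plays the role of alloc_domheap_pages(), and the order is assumed to have been validated once at domain creation, so no size checks or get_order_from_bytes() call remain. The result is assigned to pt_buf directly, without a local.

```c
#include <errno.h>
#include <stdlib.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Hypothetical, simplified stand-ins for struct domain / struct vcpu. */
struct domain_sketch {
    unsigned int vmtrace_pt_order;  /* stored as an order, as from the toolstack */
};

struct vcpu_sketch {
    struct domain_sketch *domain;
    void *pt_buf;
};

/* Hypothetical stand-in for alloc_domheap_pages(d, order, flags). */
static void *alloc_pages_sketch(unsigned int order)
{
    return malloc((size_t)1 << (order + PAGE_SHIFT));
}

static int vmtrace_alloc_buffer(struct vcpu_sketch *v)
{
    /* The order was validated at domain creation (assumption). */
    v->pt_buf = alloc_pages_sketch(v->domain->vmtrace_pt_order);

    return v->pt_buf ? 0 : -ENOMEM;
}
```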

> +    return 0;
> +}
> +
>  struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id)
>  {
>      struct vcpu *v;
> @@ -162,6 +187,9 @@ struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id)
>      v->vcpu_id = vcpu_id;
>      v->dirty_cpu = VCPU_CPU_CLEAN;
>  
> +    if ( d->vmtrace_pt_size && vmtrace_alloc_buffers(v) != 0 )
> +        return NULL;

You are leaking the allocated v here, see other error paths below in
the function.
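A sketch of the kind of cleanup path meant here, using hypothetical stand-ins for the vCPU structure and allocator (the real vcpu_create() has its own error labels further down): on failure, control jumps to a label that frees v instead of returning NULL and leaking it.

```c
#include <stdlib.h>

/* Hypothetical, simplified vCPU structure. */
struct vcpu_sketch {
    void *pt_buf;
};

static int frees;   /* counts cleanup calls, only to make the path observable */

/* Hypothetical stand-in for vmtrace_alloc_buffers(); fails on request. */
static int alloc_buffers_sketch(struct vcpu_sketch *v, int fail)
{
    if ( fail )
        return -1;
    v->pt_buf = malloc(4096);
    return v->pt_buf ? 0 : -1;
}

static void free_vcpu_sketch(struct vcpu_sketch *v)
{
    free(v->pt_buf);
    free(v);
    frees++;
}

static struct vcpu_sketch *vcpu_create_sketch(int want_trace, int fail_alloc)
{
    struct vcpu_sketch *v = calloc(1, sizeof(*v));

    if ( !v )
        return NULL;

    if ( want_trace && alloc_buffers_sketch(v, fail_alloc) != 0 )
        goto fail;              /* release v rather than leaking it */

    return v;

 fail:
    free_vcpu_sketch(v);
    return NULL;
}
```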

> +
>      spin_lock_init(&v->virq_lock);
>  
>      tasklet_init(&v->continue_hypercall_tasklet, NULL, NULL);
> @@ -188,6 +216,9 @@ struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id)
>      if ( arch_vcpu_create(v) != 0 )
>          goto fail_sched;
>  
> +    if ( d->vmtrace_pt_size && vmtrace_init_pt(v) != 0 )
> +        goto fail_sched;
> +
>      d->vcpu[vcpu_id] = v;
>      if ( vcpu_id != 0 )
>      {
> @@ -422,6 +453,7 @@ struct domain *domain_create(domid_t domid,
>      d->shutdown_code = SHUTDOWN_CODE_INVALID;
>  
>      spin_lock_init(&d->pbuf_lock);
> +    spin_lock_init(&d->vmtrace_lock);
>  
>      rwlock_init(&d->vnuma_rwlock);
>  
> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> index 6fd94c2e14..b01c107f5c 100644
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -627,6 +627,10 @@ struct arch_vcpu
>      struct {
>          bool next_interrupt_enabled;
>      } monitor;
> +
> +    struct {
> +        struct page_info *pt_buf;
> +    } vmtrace;
>  };
>  
>  struct guest_memory_policy
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index ac53519d7f..48f0a61bbd 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -457,6 +457,10 @@ struct domain
>      unsigned    pbuf_idx;
>      spinlock_t  pbuf_lock;
>  
> +    /* Used by vmtrace features */
> +    spinlock_t  vmtrace_lock;

Does this need to be per domain or rather per-vcpu? It's hard to tell
because there's no user of it in the patch.

Thanks, Roger.


* Re: [PATCH v4 06/10] memory: batch processing in acquire_resource()
  2020-06-30 12:33 ` [PATCH v4 06/10] memory: batch processing in acquire_resource() Michał Leszczyński
@ 2020-07-01 10:46   ` Roger Pau Monné
  2020-07-03 10:35   ` Julien Grall
  1 sibling, 0 replies; 75+ messages in thread
From: Roger Pau Monné @ 2020-07-01 10:46 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Julien Grall, Stefano Stabellini, tamas.lengyel, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, luwei.kang,
	Jan Beulich, xen-devel

On Tue, Jun 30, 2020 at 02:33:49PM +0200, Michał Leszczyński wrote:
> From: Michal Leszczynski <michal.leszczynski@cert.pl>
> 
> Allow to acquire large resources by allowing acquire_resource()
> to process items in batches, using hypercall continuation.

This patch should be the first of the series IMO, since it can go in
independently of the rest, as it's a general improvement to
XENMEM_acquire_resource.

> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
> ---
>  xen/common/memory.c | 32 +++++++++++++++++++++++++++++---
>  1 file changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index 714077c1e5..3ab06581a2 100644
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -1046,10 +1046,12 @@ static int acquire_grant_table(struct domain *d, unsigned int id,
>  }
>  
>  static int acquire_resource(
> -    XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg)
> +    XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg,
> +    unsigned long *start_extent)
>  {
>      struct domain *d, *currd = current->domain;
>      xen_mem_acquire_resource_t xmar;
> +    uint32_t total_frames;
>      /*
>       * The mfn_list and gfn_list (below) arrays are ok on stack for the
>       * moment since they are small, but if they need to grow in future
> @@ -1077,8 +1079,17 @@ static int acquire_resource(
>          return 0;
>      }
>  
> +    total_frames = xmar.nr_frames;
> +
> +    if ( *start_extent )
> +    {
> +        xmar.frame += *start_extent;
> +        xmar.nr_frames -= *start_extent;
> +        guest_handle_add_offset(xmar.frame_list, *start_extent);
> +    }
> +
>      if ( xmar.nr_frames > ARRAY_SIZE(mfn_list) )
> -        return -E2BIG;
> +        xmar.nr_frames = ARRAY_SIZE(mfn_list);
>  
>      rc = rcu_lock_remote_domain_by_id(xmar.domid, &d);
>      if ( rc )
> @@ -1135,6 +1146,14 @@ static int acquire_resource(
>          }
>      }
>  
> +    if ( !rc )
> +    {
> +        *start_extent += xmar.nr_frames;
> +
> +        if ( *start_extent != total_frames )
> +            rc = -ERESTART;
> +    }

I think you should add some kind of loop here: processing just 32
frames before preempting is likely too little work per continuation.
You generally want to loop doing batches of 32 entries until
hypercall_preempt_check() returns true.
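A sketch of the suggested loop, under stated assumptions: map_batch() is a hypothetical stand-in for mapping one batch of up to 32 frames (the size of the on-stack mfn_list), and preempt_check stands in for hypercall_preempt_check(). Batches are processed back to back, and -ERESTART is returned only when preemption is requested and frames remain.

```c
#include <errno.h>

#ifndef ERESTART
#define ERESTART 85             /* fallback only; Xen defines its own ERESTART */
#endif

#define BATCH 32                /* size of the on-stack mfn_list array */

static unsigned long frames_done;   /* total frames "mapped", for illustration */

/* Hypothetical stand-in for mapping one batch of frames. */
static int map_batch(unsigned long start, unsigned int count)
{
    (void)start;
    frames_done += count;
    return 0;
}

/* Hypothetical stand-ins for hypercall_preempt_check(). */
static int never_preempt(void) { return 0; }
static int always_preempt(void) { return 1; }
static int (*preempt_check)(void) = never_preempt;

/* Resume from *start_extent and keep going until done or preempted. */
static int acquire_frames(unsigned long nr_frames, unsigned long *start_extent)
{
    while ( *start_extent < nr_frames )
    {
        unsigned long left = nr_frames - *start_extent;
        unsigned int count = left > BATCH ? BATCH : (unsigned int)left;
        int rc = map_batch(*start_extent, count);

        if ( rc )
            return rc;

        *start_extent += count;

        /* Only set up a continuation if work remains and we must yield. */
        if ( *start_extent < nr_frames && preempt_check() )
            return -ERESTART;
    }

    return 0;
}
```

With a check that never requests preemption, 100 frames complete in a single call; with one that always does, they take four calls (32 + 32 + 32 + 4).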

Thanks, Roger.


* Re: [PATCH v4 07/10] x86/mm: add vmtrace_buf resource type
  2020-06-30 12:33 ` [PATCH v4 07/10] x86/mm: add vmtrace_buf resource type Michał Leszczyński
@ 2020-07-01 10:52   ` Roger Pau Monné
  0 siblings, 0 replies; 75+ messages in thread
From: Roger Pau Monné @ 2020-07-01 10:52 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Julien Grall, Stefano Stabellini, tamas.lengyel, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, luwei.kang,
	Jan Beulich, xen-devel

On Tue, Jun 30, 2020 at 02:33:50PM +0200, Michał Leszczyński wrote:
> From: Michal Leszczynski <michal.leszczynski@cert.pl>
> 
> Allow to map processor trace buffer using
> acquire_resource().
> 
> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
> ---
>  xen/arch/x86/mm.c           | 25 +++++++++++++++++++++++++
>  xen/include/public/memory.h |  1 +
>  2 files changed, 26 insertions(+)
> 
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index e376fc7e8f..bb781bd90c 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4624,6 +4624,31 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
>          }
>          break;
>      }
> +
> +    case XENMEM_resource_vmtrace_buf:
> +    {
> +        mfn_t mfn;
> +        unsigned int i;
> +        struct vcpu *v = domain_vcpu(d, id);

Missing blank line between variable definitions and code.

> +        rc = -EINVAL;
> +
> +        if ( !v )
> +            break;
> +
> +        if ( !v->arch.vmtrace.pt_buf )
> +            break;
> +
> +        mfn = page_to_mfn(v->arch.vmtrace.pt_buf);
> +
> +        if ( frame + nr_frames > (v->domain->vmtrace_pt_size >> PAGE_SHIFT) )
> +            break;

You can place all the checks done above in a single if.
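For example, the checks could be folded into one condition along these lines. This is a sketch with a hypothetical, simplified vCPU structure whose pt_size field stands in for v->domain->vmtrace_pt_size, assuming 4 KiB pages:

```c
#include <errno.h>
#include <stddef.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Hypothetical, simplified vCPU structure. */
struct vcpu_sketch {
    void *pt_buf;
    unsigned long pt_size;      /* trace buffer size in bytes */
};

static int check_vmtrace_request(const struct vcpu_sketch *v,
                                 unsigned long frame, unsigned int nr_frames)
{
    /* Missing vCPU, no buffer, or range past the end: all -EINVAL. */
    if ( !v || !v->pt_buf ||
         frame + nr_frames > (v->pt_size >> PAGE_SHIFT) )
        return -EINVAL;

    return 0;
}
```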

Thanks, Roger.


* Re: [PATCH v4 08/10] x86/domctl: add XEN_DOMCTL_vmtrace_op
  2020-06-30 12:33 ` [PATCH v4 08/10] x86/domctl: add XEN_DOMCTL_vmtrace_op Michał Leszczyński
@ 2020-07-01 11:00   ` Roger Pau Monné
  0 siblings, 0 replies; 75+ messages in thread
From: Roger Pau Monné @ 2020-07-01 11:00 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Julien Grall, Stefano Stabellini, tamas.lengyel, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, luwei.kang,
	Jan Beulich, xen-devel

On Tue, Jun 30, 2020 at 02:33:51PM +0200, Michał Leszczyński wrote:
> From: Michal Leszczynski <michal.leszczynski@cert.pl>
> 
> Implement domctl to manage the runtime state of
> processor trace feature.
> 
> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
> ---
>  xen/arch/x86/domctl.c       | 48 +++++++++++++++++++++++++++++++++++++
>  xen/include/public/domctl.h | 26 ++++++++++++++++++++
>  2 files changed, 74 insertions(+)
> 
> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
> index 6f2c69788d..a041b724d8 100644
> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -322,6 +322,48 @@ void arch_get_domain_info(const struct domain *d,
>      info->arch_config.emulation_flags = d->arch.emulation_flags;
>  }
>  
> +static int do_vmtrace_op(struct domain *d, struct xen_domctl_vmtrace_op *op,
> +                         XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
> +{
> +    int rc;
> +    struct vcpu *v;
> +
> +    if ( !vmtrace_supported )
> +        return -EOPNOTSUPP;
> +
> +    if ( !is_hvm_domain(d) )
> +        return -EOPNOTSUPP;

You can join both checks.

> +
> +    if ( op->vcpu >= d->max_vcpus )
> +        return -EINVAL;
> +
> +    v = domain_vcpu(d, op->vcpu);
> +    rc = 0;

No need to init rc to zero, after the switch below it will always be
initialized.

> +
> +    switch ( op->cmd )
> +    {
> +    case XEN_DOMCTL_vmtrace_pt_enable:
> +    case XEN_DOMCTL_vmtrace_pt_disable:
> +        vcpu_pause(v);
> +        spin_lock(&d->vmtrace_lock);
> +
> +        rc = vmtrace_control_pt(v, op->cmd == XEN_DOMCTL_vmtrace_pt_enable);
> +
> +        spin_unlock(&d->vmtrace_lock);
> +        vcpu_unpause(v);
> +        break;
> +
> +    case XEN_DOMCTL_vmtrace_pt_get_offset:
> +        rc = vmtrace_get_pt_offset(v, &op->offset);

Since you don't pause the target vCPU here, I think you want to use
atomic operations to update and read
v->arch.hvm.vmx.pt_state->output_mask.raw; otherwise you could see
inconsistent results if a vmexit is updating it in parallel.
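A minimal sketch using C11 atomics, with a hypothetical structure holding only the racing field; inside Xen, the read_atomic()/write_atomic() helpers would presumably be the natural equivalent. It also assumes the offset is taken from bits 63:32 of IA32_RTIT_OUTPUT_MASK_PTRS, which is where the SDM places the output offset in single-range output mode.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for pt_state; only the shared field is shown. */
struct ipt_state_sketch {
    _Atomic uint64_t output_mask_raw;
};

/* Writer side: what the vmexit path would do after rdmsrl(). */
static void store_output_mask(struct ipt_state_sketch *s, uint64_t raw)
{
    atomic_store_explicit(&s->output_mask_raw, raw, memory_order_relaxed);
}

/*
 * Reader side: the domctl path. A single 64-bit atomic load observes
 * either the old or the new value, never a torn mix of the two.
 */
static uint64_t load_output_offset(struct ipt_state_sketch *s)
{
    uint64_t raw = atomic_load_explicit(&s->output_mask_raw,
                                        memory_order_relaxed);

    return raw >> 32;           /* OutputOffset lives in bits 63:32 */
}
```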

Thanks, Roger.


* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-06-30 12:33 ` [PATCH v4 02/10] x86/vmx: add IPT cpu feature Michał Leszczyński
  2020-07-01  9:49   ` Roger Pau Monné
@ 2020-07-01 15:12   ` Julien Grall
  2020-07-01 16:06     ` Andrew Cooper
  2020-07-01 21:42   ` Andrew Cooper
  2 siblings, 1 reply; 75+ messages in thread
From: Julien Grall @ 2020-07-01 15:12 UTC (permalink / raw)
  To: Michał Leszczyński, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, tamas.lengyel, Jun Nakajima,
	Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich,
	luwei.kang, Roger Pau Monné

On 30/06/2020 13:33, Michał Leszczyński wrote:
> @@ -305,7 +311,6 @@ static int vmx_init_vmcs_config(void)
>                  SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS |
>                  SECONDARY_EXEC_XSAVES |
>                  SECONDARY_EXEC_TSC_SCALING);
> -        rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
>           if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
>               opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
>           if ( opt_vpid_enabled )
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 7cc9526139..0a33e0dfd6 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -82,6 +82,8 @@ struct vcpu *idle_vcpu[NR_CPUS] __read_mostly;
>   
>   vcpu_info_t dummy_vcpu_info;
>   
> +bool_t vmtrace_supported;

All the code looks x86 specific. So may I ask why this was implemented 
in common code?

Cheers,

-- 
Julien Grall


* Re: [PATCH v4 05/10] common/domain: allocate vmtrace_pt_buffer
  2020-06-30 12:33 ` [PATCH v4 05/10] common/domain: allocate vmtrace_pt_buffer Michał Leszczyński
  2020-07-01 10:38   ` Roger Pau Monné
@ 2020-07-01 15:35   ` Julien Grall
  1 sibling, 0 replies; 75+ messages in thread
From: Julien Grall @ 2020-07-01 15:35 UTC (permalink / raw)
  To: Michał Leszczyński, xen-devel
  Cc: Stefano Stabellini, tamas.lengyel, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Jan Beulich, luwei.kang,
	Roger Pau Monné

Hi,

On 30/06/2020 13:33, Michał Leszczyński wrote:
> +static int vmtrace_alloc_buffers(struct vcpu *v)
> +{
> +    struct page_info *pg;
> +    uint64_t size = v->domain->vmtrace_pt_size;
> +
> +    if ( size < PAGE_SIZE || size > GB(4) || (size & (size - 1)) )
> +    {
> +        /*
> +         * We don't accept trace buffer size smaller than single page
> +         * and the upper bound is defined as 4GB in the specification.

This is common code, so what specification are you talking about?

I am guessing this is an Intel one, but I don't think Intel should 
dictate the common code implementation.

> +         * The buffer size must be also a power of 2.
> +         */
> +        return -EINVAL;
> +    }
> +
> +    pg = alloc_domheap_pages(v->domain, get_order_from_bytes(size),
> +                             MEMF_no_refcount);
> +
> +    if ( !pg )
> +        return -ENOMEM;
> +
> +    v->arch.vmtrace.pt_buf = pg;

v->arch.vmtrace.pt_buf is not defined on Arm. Please make sure common
code builds on all architectures.

Cheers,

-- 
Julien Grall


* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-07-01 15:12   ` Julien Grall
@ 2020-07-01 16:06     ` Andrew Cooper
  2020-07-01 16:17       ` Julien Grall
  0 siblings, 1 reply; 75+ messages in thread
From: Andrew Cooper @ 2020-07-01 16:06 UTC (permalink / raw)
  To: Julien Grall, Michał Leszczyński, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, tamas.lengyel, Jun Nakajima,
	Wei Liu, Ian Jackson, George Dunlap, Jan Beulich, luwei.kang,
	Roger Pau Monné

On 01/07/2020 16:12, Julien Grall wrote:
> On 30/06/2020 13:33, Michał Leszczyński wrote:
>> @@ -305,7 +311,6 @@ static int vmx_init_vmcs_config(void)
>>                  SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS |
>>                  SECONDARY_EXEC_XSAVES |
>>                  SECONDARY_EXEC_TSC_SCALING);
>> -        rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
>>           if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
>>               opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
>>           if ( opt_vpid_enabled )
>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>> index 7cc9526139..0a33e0dfd6 100644
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -82,6 +82,8 @@ struct vcpu *idle_vcpu[NR_CPUS] __read_mostly;
>>     vcpu_info_t dummy_vcpu_info;
>>   +bool_t vmtrace_supported;
>
> All the code looks x86 specific. So may I ask why this was implemented
> in common code?

There were some questions directed specifically at the ARM maintainers
about CoreSight, which have gone unanswered.

~Andrew


* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-07-01 16:06     ` Andrew Cooper
@ 2020-07-01 16:17       ` Julien Grall
  2020-07-01 16:18         ` Julien Grall
  0 siblings, 1 reply; 75+ messages in thread
From: Julien Grall @ 2020-07-01 16:17 UTC (permalink / raw)
  To: Andrew Cooper, Michał Leszczyński, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, tamas.lengyel, Jun Nakajima,
	Wei Liu, Ian Jackson, George Dunlap, Jan Beulich, luwei.kang,
	Roger Pau Monné



On 01/07/2020 17:06, Andrew Cooper wrote:
> On 01/07/2020 16:12, Julien Grall wrote:
>> On 30/06/2020 13:33, Michał Leszczyński wrote:
>>> @@ -305,7 +311,6 @@ static int vmx_init_vmcs_config(void)
>>>                   SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS |
>>>                   SECONDARY_EXEC_XSAVES |
>>>                   SECONDARY_EXEC_TSC_SCALING);
>>> -        rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
>>>            if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
>>>                opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
>>>            if ( opt_vpid_enabled )
>>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>>> index 7cc9526139..0a33e0dfd6 100644
>>> --- a/xen/common/domain.c
>>> +++ b/xen/common/domain.c
>>> @@ -82,6 +82,8 @@ struct vcpu *idle_vcpu[NR_CPUS] __read_mostly;
>>>      vcpu_info_t dummy_vcpu_info;
>>>    +bool_t vmtrace_supported;
>>
>> All the code looks x86 specific. So may I ask why this was implemented
>> in common code?
> 
> There were some questions directed specifically at the ARM maintainers
> about CoreSight, which have gone unanswered.

I can only find one question related to the size. Is there any other?

I don't know what the interface will look like, given that AFAICT the
buffer may be embedded in the HW. We would need to investigate how to
differentiate between two domUs in this case without impacting
performance in the common code.

So I think it is a little premature to implement this in common code and
have it always compiled in for Arm. It would be best if this stayed in
x86 code.

Cheers,

-- 
Julien Grall



* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-07-01 16:17       ` Julien Grall
@ 2020-07-01 16:18         ` Julien Grall
  2020-07-01 17:26           ` Andrew Cooper
  0 siblings, 1 reply; 75+ messages in thread
From: Julien Grall @ 2020-07-01 16:18 UTC (permalink / raw)
  To: Andrew Cooper, Michał Leszczyński, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, tamas.lengyel, Jun Nakajima,
	Wei Liu, Ian Jackson, George Dunlap, Jan Beulich, luwei.kang,
	Roger Pau Monné



On 01/07/2020 17:17, Julien Grall wrote:
> I don't know what the interface will look like, given that AFAICT the
> buffer may be embedded in the HW. We would need to investigate how to
> differentiate between two domUs in this case without impacting the
> performance in the common code.

s/in the common code/during the context switch/

> So I think it is a little premature to implement this in common code and
> have it always compiled in for Arm. It would be best if this stayed in x86 code.
> 
> Cheers,
> 

-- 
Julien Grall



* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-07-01 16:18         ` Julien Grall
@ 2020-07-01 17:26           ` Andrew Cooper
  2020-07-01 18:02             ` Julien Grall
  0 siblings, 1 reply; 75+ messages in thread
From: Andrew Cooper @ 2020-07-01 17:26 UTC (permalink / raw)
  To: Julien Grall, Michał Leszczyński, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, tamas.lengyel, Jun Nakajima,
	Wei Liu, Ian Jackson, George Dunlap, Jan Beulich, luwei.kang,
	Roger Pau Monné

On 01/07/2020 17:18, Julien Grall wrote:
>
>
> On 01/07/2020 17:17, Julien Grall wrote:
>>
>>
>> I can only find one question, related to the size. Are there any others?
>>
>> I don't know what the interface will look like, given that AFAICT the
>> buffer may be embedded in the HW. We would need to investigate how to
>> differentiate between two domUs in this case without impacting the
>> performance in the common code.
>
> s/in the common code/during the context switch/
>
>> So I think it is a little premature to implement this in common code
>> and have it always compiled in for Arm. It would be best if this stayed
>> in x86 code.

I've just checked with a colleague.  CoreSight can dump to a memory
buffer - there's even a decode library for the packet stream
https://github.com/Linaro/OpenCSD, although ultimately it is platform
specific as to whether the feature is supported.

Furthermore, the choice isn't "x86 vs ARM", now that RISC-V support is
on-list, and Power9 is floating on the horizon.

For the sake of what is literally just one byte in common code, I stand
by my original suggestion of this being a common interface.  It is not
something which should be x86 specific.

~Andrew



* Re: [PATCH v4 01/10] x86/vmx: add Intel PT MSR definitions
  2020-06-30 12:33 ` [PATCH v4 01/10] x86/vmx: add Intel PT MSR definitions Michał Leszczyński
  2020-06-30 16:23   ` Jan Beulich
  2020-06-30 17:37   ` Andrew Cooper
@ 2020-07-01 17:52   ` Andrew Cooper
  2 siblings, 0 replies; 75+ messages in thread
From: Andrew Cooper @ 2020-07-01 17:52 UTC (permalink / raw)
  To: Michał Leszczyński, xen-devel
  Cc: tamas.lengyel, luwei.kang, Wei Liu, Jan Beulich, Roger Pau Monné

On 30/06/2020 13:33, Michał Leszczyński wrote:
> From: Michal Leszczynski <michal.leszczynski@cert.pl>
>
> Define constants related to Intel Processor Trace features.
>
> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

I wanted to have a play with the series, and have ended up having to do
the rebase anyway.

As we're in code freeze for 4.14, I've started x86-next in its usual
location
(https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/x86-next)
and will commit this (and any other accumulated patches) once 4.15 opens.

~Andrew



* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-07-01 17:26           ` Andrew Cooper
@ 2020-07-01 18:02             ` Julien Grall
  2020-07-01 18:06               ` Andrew Cooper
  0 siblings, 1 reply; 75+ messages in thread
From: Julien Grall @ 2020-07-01 18:02 UTC (permalink / raw)
  To: Andrew Cooper, Michał Leszczyński, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, tamas.lengyel, Jun Nakajima,
	Wei Liu, Ian Jackson, George Dunlap, Jan Beulich, luwei.kang,
	Roger Pau Monné

Hi,

On 01/07/2020 18:26, Andrew Cooper wrote:
> 
> I've just checked with a colleague.  CoreSight can dump to a memory
> buffer - there's even a decode library for the packet stream
> https://github.com/Linaro/OpenCSD, although ultimately it is platform
> specific as to whether the feature is supported.
> 
> Furthermore, the choice isn't "x86 vs ARM", now that RISC-V support is
> on-list, and Power9 is floating on the horizon.
> 
> For the sake of what is literally just one byte in common code, I stand
> by my original suggestion of this being a common interface.  It is not
> something which should be x86 specific.

This argument can also be used against putting it in common code. What
concerns me most is that we are trying to guess what the interface will
look like for another architecture. Your suggested interface may work,
but it may also end up being a complete mess.

So I think we should wait for a new architecture to use vmtrace before
moving it to common code. It is not going to be a massive effort to move
that bit into common code if needed.

Cheers,

-- 
Julien Grall



* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-07-01 18:02             ` Julien Grall
@ 2020-07-01 18:06               ` Andrew Cooper
  2020-07-01 18:09                 ` Julien Grall
  0 siblings, 1 reply; 75+ messages in thread
From: Andrew Cooper @ 2020-07-01 18:06 UTC (permalink / raw)
  To: Julien Grall, Michał Leszczyński, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, tamas.lengyel, Jun Nakajima,
	Wei Liu, Ian Jackson, George Dunlap, Jan Beulich, luwei.kang,
	Roger Pau Monné

On 01/07/2020 19:02, Julien Grall wrote:
> Hi,
>
> On 01/07/2020 18:26, Andrew Cooper wrote:
>> For the sake of what is literally just one byte in common code, I stand
>> by my original suggestion of this being a common interface.  It is not
>> something which should be x86 specific.
>
> This argument can also be used against putting it in common code. What
> concerns me most is that we are trying to guess what the interface will
> look like for another architecture. Your suggested interface may work,
> but it may also end up being a complete mess.
>
> So I think we should wait for a new architecture to use vmtrace before
> moving it to common code. It is not going to be a massive effort to move
> that bit into common code if needed.

I suggest you read the series.

The only thing in common code is the bit of the interface saying "I'd
like buffers this big please".

~Andrew



* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-07-01 18:06               ` Andrew Cooper
@ 2020-07-01 18:09                 ` Julien Grall
  2020-07-02  8:29                   ` Jan Beulich
  0 siblings, 1 reply; 75+ messages in thread
From: Julien Grall @ 2020-07-01 18:09 UTC (permalink / raw)
  To: Andrew Cooper, Michał Leszczyński, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, tamas.lengyel, Jun Nakajima,
	Wei Liu, Ian Jackson, George Dunlap, Jan Beulich, luwei.kang,
	Roger Pau Monné



On 01/07/2020 19:06, Andrew Cooper wrote:
> On 01/07/2020 19:02, Julien Grall wrote:
>> Hi,
>>
>> On 01/07/2020 18:26, Andrew Cooper wrote:
>>> For the sake of what is literally just one byte in common code, I stand
>>> by my original suggestion of this being a common interface.  It is not
>>> something which should be x86 specific.
>>
>> This argument can also be used against putting it in common code. What
>> concerns me most is that we are trying to guess what the interface will
>> look like for another architecture. Your suggested interface may work,
>> but it may also end up being a complete mess.
>>
>> So I think we should wait for a new architecture to use vmtrace before
>> moving it to common code. It is not going to be a massive effort to move
>> that bit into common code if needed.
> 
> I suggest you read the series.

Already went through the series and ...

> 
> The only thing in common code is the bit of the interface saying "I'd
> like buffers this big please".

... I stand by my point. There is no need to have this code in common
code until someone else needs it. This code can easily be implemented in
arch_domain_create().

Cheers,

-- 
Julien Grall



* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-06-30 12:33 ` [PATCH v4 02/10] x86/vmx: add IPT cpu feature Michał Leszczyński
  2020-07-01  9:49   ` Roger Pau Monné
  2020-07-01 15:12   ` Julien Grall
@ 2020-07-01 21:42   ` Andrew Cooper
  2020-07-02  8:10     ` Roger Pau Monné
  2 siblings, 1 reply; 75+ messages in thread
From: Andrew Cooper @ 2020-07-01 21:42 UTC (permalink / raw)
  To: Michał Leszczyński, xen-devel
  Cc: Julien Grall, Kevin Tian, Stefano Stabellini, tamas.lengyel,
	Jun Nakajima, Wei Liu, Ian Jackson, George Dunlap, Jan Beulich,
	luwei.kang, Roger Pau Monné

On 30/06/2020 13:33, Michał Leszczyński wrote:
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index ca94c2bedc..b73d824357 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -291,6 +291,12 @@ static int vmx_init_vmcs_config(void)
>          _vmx_cpu_based_exec_control &=
>              ~(CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING);
>  
> +    rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
> +
> +    /* Check whether IPT is supported in VMX operation. */
> +    vmtrace_supported = cpu_has_ipt &&
> +                        (_vmx_misc_cap & VMX_MISC_PT_SUPPORTED);

There is a subtle corner case here.  vmx_init_vmcs_config() is called on
all CPUs, and is supposed to level things down safely if we find any
asymmetry.

If instead you go with something like this:

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index b73d824357..6960109183 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -294,8 +294,8 @@ static int vmx_init_vmcs_config(void)
     rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
 
     /* Check whether IPT is supported in VMX operation. */
-    vmtrace_supported = cpu_has_ipt &&
-                        (_vmx_misc_cap & VMX_MISC_PT_SUPPORTED);
+    if ( !(_vmx_misc_cap & VMX_MISC_PT_SUPPORTED) )
+        vmtrace_supported = false;
 
     if ( _vmx_cpu_based_exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS )
     {
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index c9b6af826d..9d7822e006 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1092,6 +1092,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 #endif
     }
 
+    /* Set a default for VMTrace before HVM setup occurs. */
+    vmtrace_supported = cpu_has_ipt;
+
     /* Sanitise the raw E820 map to produce a final clean version. */
     max_page = raw_max_page = init_e820(memmap_type, &e820_raw);
 

Then you'll also get a vmtrace_supported=true which works correctly in
the Broadwell and no-VT-x case as well.
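To make the levelling pattern above concrete, here is a minimal, self-contained sketch in plain C. It is not Xen code: the sketch_* names are hypothetical stand-ins for vmtrace_supported, cpu_has_ipt and vmx_init_vmcs_config(), and only the 0x00004000 mask (the VMX_MISC bit tested in the patch) is taken from the series. The flag is seeded once from the boot CPU, and each CPU's VMX setup may only clear it, never set it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_VMX_MISC_PROC_TRACE 0x00004000u /* IA32_VMX_MISC bit 14 */

/* Hypothetical stand-in for the global vmtrace_supported flag. */
static bool sketch_vmtrace_supported;

/* Boot-time default, analogous to "vmtrace_supported = cpu_has_ipt;". */
void sketch_boot_default(bool cpu_has_ipt)
{
    sketch_vmtrace_supported = cpu_has_ipt;
}

/* Per-CPU VMX setup: may only clear the flag, never set it. */
void sketch_vmx_init_cpu(uint64_t vmx_misc)
{
    if ( !(vmx_misc & SKETCH_VMX_MISC_PROC_TRACE) )
        sketch_vmtrace_supported = false;
}

/* Seed from the boot CPU, then run the per-CPU setup on nr_cpus CPUs. */
bool sketch_level(bool cpu_has_ipt, const uint64_t *vmx_misc,
                  size_t nr_cpus)
{
    sketch_boot_default(cpu_has_ipt);
    for ( size_t i = 0; i < nr_cpus; i++ )
        sketch_vmx_init_cpu(vmx_misc[i]);
    return sketch_vmtrace_supported;
}
```

A CPU that reports IPT via CPUID but lacks the VMX_MISC bit ends up with the flag cleared, and if no VMX setup runs at all the CPUID-derived default is left as-is. Clearing the flag this way is only safe before any domain can depend on it.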


> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 7cc9526139..0a33e0dfd6 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -82,6 +82,8 @@ struct vcpu *idle_vcpu[NR_CPUS] __read_mostly;
>  
>  vcpu_info_t dummy_vcpu_info;
>  
> +bool_t vmtrace_supported;

bool please.  We're in the process of converting over to C99 bools, and
objection was taken to a tree-wide cleanup.

> +
>  static void __domain_finalise_shutdown(struct domain *d)
>  {
>      struct vcpu *v;
> diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
> index f790d5c1f8..8d7955dd87 100644
> --- a/xen/include/asm-x86/cpufeature.h
> +++ b/xen/include/asm-x86/cpufeature.h
> @@ -104,6 +104,7 @@
>  #define cpu_has_clwb            boot_cpu_has(X86_FEATURE_CLWB)
>  #define cpu_has_avx512er        boot_cpu_has(X86_FEATURE_AVX512ER)
>  #define cpu_has_avx512cd        boot_cpu_has(X86_FEATURE_AVX512CD)
> +#define cpu_has_ipt             boot_cpu_has(X86_FEATURE_IPT)
>  #define cpu_has_sha             boot_cpu_has(X86_FEATURE_SHA)
>  #define cpu_has_avx512bw        boot_cpu_has(X86_FEATURE_AVX512BW)
>  #define cpu_has_avx512vl        boot_cpu_has(X86_FEATURE_AVX512VL)
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index 906810592f..0e9a0b8de6 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -283,6 +283,7 @@ extern u32 vmx_secondary_exec_control;
>  #define VMX_VPID_INVVPID_SINGLE_CONTEXT_RETAINING_GLOBAL 0x80000000000ULL
>  extern u64 vmx_ept_vpid_cap;
>  
> +#define VMX_MISC_PT_SUPPORTED                   0x00004000

VMX_MISC_PROC_TRACE, and ...

>  #define VMX_MISC_CR3_TARGET                     0x01ff0000
>  #define VMX_MISC_VMWRITE_ALL                    0x20000000
>  
> diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
> index 5ca35d9d97..0d3f15f628 100644
> --- a/xen/include/public/arch-x86/cpufeatureset.h
> +++ b/xen/include/public/arch-x86/cpufeatureset.h
> @@ -217,6 +217,7 @@ XEN_CPUFEATURE(SMAP,          5*32+20) /*S  Supervisor Mode Access Prevention */
>  XEN_CPUFEATURE(AVX512_IFMA,   5*32+21) /*A  AVX-512 Integer Fused Multiply Add */
>  XEN_CPUFEATURE(CLFLUSHOPT,    5*32+23) /*A  CLFLUSHOPT instruction */
>  XEN_CPUFEATURE(CLWB,          5*32+24) /*A  CLWB instruction */
> +XEN_CPUFEATURE(IPT,           5*32+25) /*   Intel Processor Trace */

.. any chance we can spell this out as PROC_TRACE?  The "Intel" part
won't be true if any of the other vendors choose to implement this
interface to the spec.
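For reference, a featureset index such as 5*32+25 encodes word 5 (CPUID leaf 7, subleaf 0, EBX in Xen's featureset layout) and bit 25. VMX_MISC_PT_SUPPORTED (0x00004000) is bit 14 of IA32_VMX_MISC. The sketch below shows how the two checks in the patch combine. It uses hypothetical names that follow the suggested PROC_TRACE spelling, and it models the logic only, without reading real CPUID or MSR values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Featureset index -> (word, bit), mirroring the "5*32+25" notation. */
#define SKETCH_FEATURE(word, bit)  ((word) * 32u + (bit))
#define SKETCH_PROC_TRACE          SKETCH_FEATURE(5, 25) /* CPUID.7.0:EBX[25] */
#define SKETCH_VMX_MISC_PROC_TRACE 0x00004000u          /* IA32_VMX_MISC[14] */

unsigned int sketch_feature_word(unsigned int feat) { return feat / 32; }
unsigned int sketch_feature_bit(unsigned int feat)  { return feat % 32; }

/* Test one feature in a featureset stored as an array of 32-bit words. */
bool sketch_feature_test(const uint32_t *fs, unsigned int feat)
{
    return fs[sketch_feature_word(feat)] &
           (UINT32_C(1) << sketch_feature_bit(feat));
}

/* Usable for vmtrace only if CPUID reports IPT *and* VMX permits it. */
bool sketch_proc_trace_in_vmx(const uint32_t *fs, uint64_t vmx_misc)
{
    return sketch_feature_test(fs, SKETCH_PROC_TRACE) &&
           (vmx_misc & SKETCH_VMX_MISC_PROC_TRACE);
}
```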

Otherwise, LGTM.

~Andrew



* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-07-01 21:42   ` Andrew Cooper
@ 2020-07-02  8:10     ` Roger Pau Monné
  2020-07-02  8:34       ` Jan Beulich
  0 siblings, 1 reply; 75+ messages in thread
From: Roger Pau Monné @ 2020-07-02  8:10 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Julien Grall, Kevin Tian, Stefano Stabellini, tamas.lengyel,
	Jan Beulich, Wei Liu, Michał Leszczyński, Ian Jackson,
	George Dunlap, luwei.kang, Jun Nakajima, xen-devel

On Wed, Jul 01, 2020 at 10:42:55PM +0100, Andrew Cooper wrote:
> There is a subtle corner case here.  vmx_init_vmcs_config() is called on
> all CPUs, and is supposed to level things down safely if we find any
> asymmetry.
> 
> If instead you go with something like this:
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index b73d824357..6960109183 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -294,8 +294,8 @@ static int vmx_init_vmcs_config(void)
>      rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
>  
>      /* Check whether IPT is supported in VMX operation. */
> -    vmtrace_supported = cpu_has_ipt &&
> -                        (_vmx_misc_cap & VMX_MISC_PT_SUPPORTED);
> +    if ( !(_vmx_misc_cap & VMX_MISC_PT_SUPPORTED) )
> +        vmtrace_supported = false;

This is also used during hotplug, so I'm not sure it's safe to turn
vmtrace_supported off at runtime, when VMs might already be using
it. IMO it would be easier to just set it on the BSP, and then refuse
to bring up any AP that doesn't have the feature. TBH I don't think we
are likely to find any system with such a configuration, but it seems
more robust than changing vmtrace_supported at runtime.
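The alternative described above might look roughly like this: decide once on the BSP, then refuse any AP that lacks the capability. This is a hypothetical sketch rather than Xen code. The function names, and the choice of -EINVAL as the refusal code, are assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_VMX_MISC_PROC_TRACE 0x00004000u /* IA32_VMX_MISC bit 14 */

/* Decided once on the BSP and never changed afterwards. */
static bool sketch_vmtrace_on_bsp;

void sketch_bsp_init(bool cpu_has_ipt, uint64_t vmx_misc)
{
    sketch_vmtrace_on_bsp = cpu_has_ipt &&
                            (vmx_misc & SKETCH_VMX_MISC_PROC_TRACE);
}

/*
 * Per-AP check, at boot or on hotplug: 0 allows bring-up, -EINVAL
 * refuses a CPU that lacks a capability the BSP already advertised.
 */
int sketch_ap_init(uint64_t vmx_misc)
{
    if ( sketch_vmtrace_on_bsp &&
         !(vmx_misc & SKETCH_VMX_MISC_PROC_TRACE) )
        return -EINVAL;

    return 0;
}
```

Unlike the levelling approach, the global flag is never cleared after boot. A weaker CPU simply fails to come online, so domains already using vmtrace are unaffected.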

Roger.



* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-07-01 18:09                 ` Julien Grall
@ 2020-07-02  8:29                   ` Jan Beulich
  2020-07-02  8:42                     ` Julien Grall
  0 siblings, 1 reply; 75+ messages in thread
From: Jan Beulich @ 2020-07-02  8:29 UTC (permalink / raw)
  To: Julien Grall
  Cc: Kevin Tian, Stefano Stabellini, tamas.lengyel, Wei Liu,
	Andrew Cooper, Michał Leszczyński, Ian Jackson,
	George Dunlap, Jun Nakajima, xen-devel, luwei.kang,
	Roger Pau Monné

On 01.07.2020 20:09, Julien Grall wrote:
> On 01/07/2020 19:06, Andrew Cooper wrote:
>> On 01/07/2020 19:02, Julien Grall wrote:
>>> On 01/07/2020 18:26, Andrew Cooper wrote:
>>>> For the sake of what is literally just one byte in common code, I stand
>>>> by my original suggestion of this being a common interface.  It is not
>>>> something which should be x86 specific.
>>>
>>> This argument can also be used against putting it in common code. What
>>> concerns me most is that we are trying to guess what the interface will
>>> look like for another architecture. Your suggested interface may work,
>>> but it may also end up being a complete mess.
>>>
>>> So I think we should wait for a new architecture to use vmtrace before
>>> moving it to common code. It is not going to be a massive effort to move
>>> that bit into common code if needed.
>>
>> I suggest you read the series.
> 
> Already went through the series and ...
> 
>>
>> The only thing in common code is the bit of the interface saying "I'd
>> like buffers this big please".
> 
> ... I stand by my point. There is no need to have this code in common
> code until someone else needs it. This code can easily be implemented in
> arch_domain_create().

I'm with Andrew here, fwiw, as long as the little bit of code that
is actually put in common/ or include/xen/ doesn't imply arbitrary
restrictions on acceptable values. For example, if there is proof
that an order value (as opposed to a size) is fine for all
architectures of interest, now or in the not too distant future,
then an order field would be fine to live in common code imo.
Otherwise it would need to be a size, with per-arch enforcement of
any further restrictions (like needing to be a power of 2).
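To illustrate the order-versus-size distinction, note that an order field can only express PAGE_SIZE << order. A byte-size field, by contrast, can carry any value and leave the arch code to reject what it cannot support. The sketch below is hypothetical. It assumes a 4 KiB page and a power-of-two rule purely as an example of a per-arch restriction, and the names are invented for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size */

/* What an order-based field can express: PAGE_SIZE << order. */
uint64_t sketch_order_to_size(unsigned int order)
{
    return (uint64_t)SKETCH_PAGE_SIZE << order;
}

bool sketch_is_power_of_2(uint64_t x)
{
    return x && !(x & (x - 1));
}

/*
 * Hypothetical per-arch hook: common code passes a plain byte size and
 * the architecture rejects values it cannot support.
 */
int sketch_arch_check_trace_size(uint64_t size)
{
    if ( size < SKETCH_PAGE_SIZE || !sketch_is_power_of_2(size) )
        return -EINVAL;

    return 0;
}
```

With this split, a 3-page request is representable in common code but rejected by the assumed x86-style check. An order field could not have expressed such a request in the first place.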

Jan



* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-07-02  8:10     ` Roger Pau Monné
@ 2020-07-02  8:34       ` Jan Beulich
  2020-07-02 20:29         ` Michał Leszczyński
  0 siblings, 1 reply; 75+ messages in thread
From: Jan Beulich @ 2020-07-02  8:34 UTC (permalink / raw)
  To: Roger Pau Monné, Andrew Cooper
  Cc: Julien Grall, Kevin Tian, Stefano Stabellini, tamas.lengyel,
	Wei Liu, Michał Leszczyński, Ian Jackson,
	George Dunlap, luwei.kang, Jun Nakajima, xen-devel

On 02.07.2020 10:10, Roger Pau Monné wrote:
> On Wed, Jul 01, 2020 at 10:42:55PM +0100, Andrew Cooper wrote:
>> There is a subtle corner case here.  vmx_init_vmcs_config() is called on
>> all CPUs, and is supposed to level things down safely if we find any
>> asymmetry.
>>
>> If instead you go with something like this:
>>
>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>> index b73d824357..6960109183 100644
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -294,8 +294,8 @@ static int vmx_init_vmcs_config(void)
>>      rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
>>  
>>      /* Check whether IPT is supported in VMX operation. */
>> -    vmtrace_supported = cpu_has_ipt &&
>> -                        (_vmx_misc_cap & VMX_MISC_PT_SUPPORTED);
>> +    if ( !(_vmx_misc_cap & VMX_MISC_PT_SUPPORTED) )
>> +        vmtrace_supported = false;
> 
> This is also used during hotplug, so I'm not sure it's safe to turn
> vmtrace_supported off at runtime, when VMs might already be using
> it. IMO it would be easier to just set it on the BSP, and then refuse
> to bring up any AP that doesn't have the feature.

+1

IOW I also don't think that "vmx_init_vmcs_config() ... is supposed to
level things down safely". Instead I think the expectation is for
CPU onlining to fail if a CPU lacks features compared to the BSP. As
is implied by what Roger says, doing what you suggest may be fine
during boot, but after that only at times when we know there's no
user of a given feature, and where discarding the feature flag won't
lead to other inconsistencies (which may very well mean "never").

Jan



* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-07-02  8:29                   ` Jan Beulich
@ 2020-07-02  8:42                     ` Julien Grall
  2020-07-02  8:50                       ` Jan Beulich
  0 siblings, 1 reply; 75+ messages in thread
From: Julien Grall @ 2020-07-02  8:42 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, tamas.lengyel, Wei Liu,
	Andrew Cooper, Michał Leszczyński, Ian Jackson,
	George Dunlap, Jun Nakajima, xen-devel, luwei.kang,
	Roger Pau Monné

Hi Jan,

On 02/07/2020 09:29, Jan Beulich wrote:
> On 01.07.2020 20:09, Julien Grall wrote:
>> On 01/07/2020 19:06, Andrew Cooper wrote:
>>> On 01/07/2020 19:02, Julien Grall wrote:
>>>> On 01/07/2020 18:26, Andrew Cooper wrote:
>>>>> For the sake of what is literally just one byte in common code, I stand
>>>>> by my original suggestion of this being a common interface.  It is not
>>>>> something which should be x86 specific.
>>>>
>>>> This argument can also be used against putting it in common code. What
>>>> concerns me most is that we are trying to guess what the interface will
>>>> look like for another architecture. Your suggested interface may work,
>>>> but it may also end up being a complete mess.
>>>>
>>>> So I think we want to wait for a new architecture to use vmtrace
>>>> before moving to common code. This is not going to be a massive effort
>>>> to move that bit in common if needed.
>>>
>>> I suggest you read the series.
>>
>> Already went through the series and ...
>>
>>>
>>> The only thing in common code is the bit of the interface saying "I'd
>>> like buffers this big please".
>>
>> ... I stand by my point. There is no need to have this code in common
>> code until someone else needs it. This code can be easily implemented in
>> arch_domain_create().
> 
> I'm with Andrew here, fwiw, as long as the little bit of code that
> is actually put in common/ or include/xen/ doesn't imply arbitrary
> restrictions on acceptable values.
Well, yes, the code is simple. However, the code as it is wouldn't be
usable on other architectures without additional work (aside from
arch-specific code). For instance, there is no way to map the buffer
outside of Xen as it is all x86 specific.

If you want the allocation to be in the common code, then the
infrastructure to map/unmap the buffer should also be in common code.
Otherwise, there is no point in allocating it in common code.

Cheers,

-- 
Julien Grall


* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-07-02  8:42                     ` Julien Grall
@ 2020-07-02  8:50                       ` Jan Beulich
  2020-07-02  8:54                         ` Julien Grall
  0 siblings, 1 reply; 75+ messages in thread
From: Jan Beulich @ 2020-07-02  8:50 UTC (permalink / raw)
  To: Julien Grall
  Cc: Kevin Tian, Stefano Stabellini, tamas.lengyel, Wei Liu,
	Andrew Cooper, Michał Leszczyński, Ian Jackson,
	George Dunlap, Jun Nakajima, xen-devel, luwei.kang,
	Roger Pau Monné

On 02.07.2020 10:42, Julien Grall wrote:
> On 02/07/2020 09:29, Jan Beulich wrote:
>> I'm with Andrew here, fwiw, as long as the little bit of code that
>> is actually put in common/ or include/xen/ doesn't imply arbitrary
>> restrictions on acceptable values.
> Well, yes, the code is simple. However, the code as it is wouldn't be
> usable on other architectures without additional work (aside from
> arch-specific code). For instance, there is no way to map the buffer
> outside of Xen as it is all x86 specific.
>
> If you want the allocation to be in the common code, then the
> infrastructure to map/unmap the buffer should also be in common code.
> Otherwise, there is no point in allocating it in common code.

I don't think I agree here - I see nothing wrong with exposing of
the memory being arch specific, when allocation is generic. This
is no different from, in just x86, allocation logic being common
to PV and HVM, but exposing being different for both.
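Here is a minimal sketch of that split. Every name in it is hypothetical and is not taken from the series. Common code only allocates the per-vCPU buffer, and each architecture passes its own exposure hook. One hook leaves mapping to a later acquire-style request from the toolstack. The other models a contrived arch where the buffer simply appears at a fixed address.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TRACE_PAGE_SIZE 4096UL

/* Hypothetical vCPU state; not Xen's struct vcpu. */
struct vcpu {
    unsigned int id;
    void *trace_buf;
    unsigned int trace_order;   /* buffer is TRACE_PAGE_SIZE << trace_order bytes */
    unsigned long trace_addr;   /* where the arch exposed it, 0 if not (yet) */
};

/* Arch-specific exposure step; each architecture supplies its own. */
typedef int (*vmtrace_expose_fn)(struct vcpu *v);

/* Common code: allocation only, independent of how the buffer is exposed. */
static int vmtrace_alloc(struct vcpu *v, unsigned int order,
                         vmtrace_expose_fn expose)
{
    size_t size = TRACE_PAGE_SIZE << order;
    int rc;

    v->trace_buf = aligned_alloc(TRACE_PAGE_SIZE, size);
    if ( !v->trace_buf )
        return -1;
    memset(v->trace_buf, 0, size);
    v->trace_order = order;

    rc = expose(v);
    if ( rc )
    {
        free(v->trace_buf);
        v->trace_buf = NULL;
    }

    return rc;
}

/* Arch A: nothing to do now; the toolstack maps it later on request. */
static int expose_on_request(struct vcpu *v)
{
    v->trace_addr = 0;
    return 0;
}

/* Arch B (contrived): the buffer appears at a fixed, arch-mandated address. */
#define FIXED_TRACE_BASE 0xf0000000UL

static int expose_fixed(struct vcpu *v)
{
    v->trace_addr = FIXED_TRACE_BASE + ((unsigned long)v->id << 24);
    return 0;
}
```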

Jan


* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-07-02  8:50                       ` Jan Beulich
@ 2020-07-02  8:54                         ` Julien Grall
  2020-07-02  9:18                           ` Jan Beulich
  0 siblings, 1 reply; 75+ messages in thread
From: Julien Grall @ 2020-07-02  8:54 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, tamas.lengyel, Wei Liu,
	Andrew Cooper, Michał Leszczyński, Ian Jackson,
	George Dunlap, Jun Nakajima, xen-devel, luwei.kang,
	Roger Pau Monné



On 02/07/2020 09:50, Jan Beulich wrote:
> On 02.07.2020 10:42, Julien Grall wrote:
>> On 02/07/2020 09:29, Jan Beulich wrote:
>>> I'm with Andrew here, fwiw, as long as the little bit of code that
>>> is actually put in common/ or include/xen/ doesn't imply arbitrary
>>> restrictions on acceptable values.
>> Well, yes, the code is simple. However, the code as it is wouldn't be
>> usable on other architectures without additional work (aside from
>> arch-specific code). For instance, there is no way to map the buffer
>> outside of Xen as it is all x86 specific.
>>
>> If you want the allocation to be in the common code, then the
>> infrastructure to map/unmap the buffer should also be in common code.
>> Otherwise, there is no point in allocating it in common code.
> 
> I don't think I agree here - I see nothing wrong with exposing of
> the memory being arch specific, when allocation is generic. This
> is no different from, in just x86, allocation logic being common
> to PV and HVM, but exposing being different for both.

Are you suggesting that the way it would be exposed may be different for 
other architectures?

If so, this is one more reason not to impose a way of allocating the
buffer in common code until another arch adds support for vmtrace.

Cheers,

-- 
Julien Grall


* Re: [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter
  2020-06-30 12:33 ` [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter Michał Leszczyński
  2020-07-01 10:05   ` Roger Pau Monné
@ 2020-07-02  9:00   ` Roger Pau Monné
  2020-07-02 16:23     ` Michał Leszczyński
  2020-07-02 10:24   ` Anthony PERARD
  2020-07-04 17:48   ` Julien Grall
  3 siblings, 1 reply; 75+ messages in thread
From: Roger Pau Monné @ 2020-07-02  9:00 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Julien Grall, Stefano Stabellini, tamas.lengyel, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, luwei.kang,
	Jan Beulich, Anthony PERARD, xen-devel

On Tue, Jun 30, 2020 at 02:33:46PM +0200, Michał Leszczyński wrote:
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 59bdc28c89..7b8289d436 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -92,6 +92,7 @@ struct xen_domctl_createdomain {
>      uint32_t max_evtchn_port;
>      int32_t max_grant_frames;
>      int32_t max_maptrack_frames;
> +    uint8_t vmtrace_pt_order;

I've been thinking about this, and even though this is a domctl (so
not a stable interface) we might want to consider using a size (or a
number of pages) here rather than an order. IPT also supports
TOPA mode (kind of a linked list of buffers) that would allow for
sizes not rounded to order boundaries to be used, since then only each
item in the linked list needs to be rounded to an order boundary, so
you could for example use three 4K pages in TOPA mode AFAICT.
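This is a rough illustration of why a page count is more flexible than an order. It assumes, from my reading of the SDM rather than from this series, that each ToPA entry describes one naturally aligned output region of 4 KiB << n, with n up to 15 (128 MiB). Under that assumption, any page count can be covered greedily by power-of-two regions. topa_split() is hypothetical. Region alignment and the terminating END entry are left out.

```c
#include <assert.h>

#define TOPA_MAX_ORDER 15   /* assumption: largest region is 4 KiB << 15 = 128 MiB */

/*
 * Hypothetical helper: cover nr_pages 4 KiB pages with power-of-two
 * regions, largest first. Stores each region's order in orders[] and
 * returns the number of regions, or -1 if more than max are needed.
 */
static int topa_split(unsigned long nr_pages, unsigned int orders[], int max)
{
    unsigned int order = TOPA_MAX_ORDER;
    int n = 0;

    while ( nr_pages )
    {
        /* Shrink the region until it fits in what is left. */
        while ( (1UL << order) > nr_pages )
            order--;

        if ( n == max )
            return -1;

        orders[n++] = order;
        nr_pages -= 1UL << order;
    }

    return n;
}
```

With this split, three pages become one 8 KiB region plus one 4 KiB region (two entries). Three separate 4 KiB entries would also work, at the cost of one more entry.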

Roger.


* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-07-02  8:54                         ` Julien Grall
@ 2020-07-02  9:18                           ` Jan Beulich
  2020-07-02  9:57                             ` Julien Grall
  0 siblings, 1 reply; 75+ messages in thread
From: Jan Beulich @ 2020-07-02  9:18 UTC (permalink / raw)
  To: Julien Grall
  Cc: Kevin Tian, Stefano Stabellini, tamas.lengyel, Wei Liu,
	Andrew Cooper, Michał Leszczyński, Ian Jackson,
	George Dunlap, Jun Nakajima, xen-devel, luwei.kang,
	Roger Pau Monné

On 02.07.2020 10:54, Julien Grall wrote:
> 
> 
> On 02/07/2020 09:50, Jan Beulich wrote:
>> On 02.07.2020 10:42, Julien Grall wrote:
>>> On 02/07/2020 09:29, Jan Beulich wrote:
>>>> I'm with Andrew here, fwiw, as long as the little bit of code that
>>>> is actually put in common/ or include/xen/ doesn't imply arbitrary
>>>> restrictions on acceptable values.
>>> Well, yes, the code is simple. However, the code as it is wouldn't be
>>> usable on other architectures without additional work (aside from
>>> arch-specific code). For instance, there is no way to map the buffer
>>> outside of Xen as it is all x86 specific.
>>>
>>> If you want the allocation to be in the common code, then the
>>> infrastructure to map/unmap the buffer should also be in common code.
>>> Otherwise, there is no point in allocating it in common code.
>>
>> I don't think I agree here - I see nothing wrong with exposing of
>> the memory being arch specific, when allocation is generic. This
>> is no different from, in just x86, allocation logic being common
>> to PV and HVM, but exposing being different for both.
> 
> Are you suggesting that the way it would be exposed may be different for 
> other architectures?

Why not? To take a possibly extreme example - consider an arch
where (for bare metal) the buffer is specified to appear at a
fixed range of addresses. This would then want to be the same
in the virtualized case as well. There'd be no point in using
any common logic to map the buffer at a guest-requested
address. Instead it would simply appear at the arch-mandated
one, without the guest needing to take any action.

> If so, this is one more reason not to impose a way of allocating the
> buffer in common code until another arch adds support for vmtrace.

I'm still not seeing why allocation and exposure need to be done
at the same place.

Jan


* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-07-02  9:18                           ` Jan Beulich
@ 2020-07-02  9:57                             ` Julien Grall
  2020-07-02 13:30                               ` Jan Beulich
  0 siblings, 1 reply; 75+ messages in thread
From: Julien Grall @ 2020-07-02  9:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, tamas.lengyel, Wei Liu,
	Andrew Cooper, Michał Leszczyński, Ian Jackson,
	George Dunlap, Jun Nakajima, xen-devel, luwei.kang,
	Roger Pau Monné

Hi,

On 02/07/2020 10:18, Jan Beulich wrote:
> On 02.07.2020 10:54, Julien Grall wrote:
>>
>>
>> On 02/07/2020 09:50, Jan Beulich wrote:
>>> On 02.07.2020 10:42, Julien Grall wrote:
>>>> On 02/07/2020 09:29, Jan Beulich wrote:
>>>>> I'm with Andrew here, fwiw, as long as the little bit of code that
>>>>> is actually put in common/ or include/xen/ doesn't imply arbitrary
>>>>> restrictions on acceptable values.
>>>> Well, yes, the code is simple. However, the code as it is wouldn't be
>>>> usable on other architectures without additional work (aside from
>>>> arch-specific code). For instance, there is no way to map the buffer
>>>> outside of Xen as it is all x86 specific.
>>>>
>>>> If you want the allocation to be in the common code, then the
>>>> infrastructure to map/unmap the buffer should also be in common code.
>>>> Otherwise, there is no point in allocating it in common code.
>>>
>>> I don't think I agree here - I see nothing wrong with exposing of
>>> the memory being arch specific, when allocation is generic. This
>>> is no different from, in just x86, allocation logic being common
>>> to PV and HVM, but exposing being different for both.
>>
>> Are you suggesting that the way it would be exposed may be different for
>> other architectures?
> 
> Why not? To take a possibly extreme example - consider an arch
> where (for bare metal) the buffer is specified to appear at a
> fixed range of addresses.

I am probably missing something here... The current goal is the buffer 
will be mapped in the dom0. Most likely the way to map it will be using 
the acquire hypercall (unless you invent a brand new one...).

For a guest, you could possibly reserve a fixed range and then map it 
when creating the vCPU in Xen. But then, you will likely want a fixed 
size... So why would you bother to ask the user to define the size?

Another way to do it would be for the toolstack to do the mapping. At that
point, you still need a hypercall to do the mapping (probably the
acquire hypercall).

> 
>> If so, this is one more reason not to impose a way of allocating the
>> buffer in common code until another arch adds support for vmtrace.
> 
> I'm still not seeing why allocation and exposure need to be done
> at the same place.

If I were going to add support for CoreSight on Arm, then the acquire
hypercall is likely going to be the way to go for mapping the resource
in the tools. At that point this will need to be common.

I am still not entirely happy about defining the interface yet, but this
would still be better than making the allocation in common code
and then leaving the mapping aside. After all, this is only a "little bit"
more code.

Cheers,

-- 
Julien Grall


* Re: [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter
  2020-06-30 12:33 ` [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter Michał Leszczyński
  2020-07-01 10:05   ` Roger Pau Monné
  2020-07-02  9:00   ` Roger Pau Monné
@ 2020-07-02 10:24   ` Anthony PERARD
  2020-07-04 17:48   ` Julien Grall
  3 siblings, 0 replies; 75+ messages in thread
From: Anthony PERARD @ 2020-07-02 10:24 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Julien Grall, Stefano Stabellini, tamas.lengyel, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, luwei.kang,
	Jan Beulich, xen-devel

Hi Michał,

On Tue, Jun 30, 2020 at 02:33:46PM +0200, Michał Leszczyński wrote:
> From: Michal Leszczynski <michal.leszczynski@cert.pl>
> 
> Allow to specify the size of per-vCPU trace buffer upon
> domain creation. This is zero by default (meaning: not enabled).
> 
> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
> ---
> 
> diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> index 0532739c1f..78f434b722 100644
> --- a/docs/man/xl.cfg.5.pod.in
> +++ b/docs/man/xl.cfg.5.pod.in
> @@ -278,6 +278,16 @@ memory=8096 will report significantly less memory available for use
>  than a system with maxmem=8096 memory=8096 due to the memory overhead
>  of having to track the unused pages.
>  
> +=item B<vmtrace_pt_size=BYTES>

I don't much like this new configuration name. To me, "pt" sounds like
passthrough, as in PCI passthrough. But it seems to stand for "processor
trace" (or tracing), doesn't it? If so, then we have "trace" twice
in the name, and I don't think that configuration is about tracing the
processor tracing feature. (Also, I don't think we need "vm" in the
name either, as every configuration option should be about a VM.)

How about a name that is easier to understand without having to know all
the possible abbreviations? Maybe "processor_trace_buffer_size" or
similar?

> +
> +Specifies the size of processor trace buffer that would be allocated
> +for each vCPU belonging to this domain. Disabled (i.e. B<vmtrace_pt_size=0>
> +by default. This must be set to non-zero value in order to be able to
> +use processor tracing features with this domain.
> +
> +B<NOTE>: The size value must be between 4 kB and 4 GB and it must
> +be also a power of 2.

Maybe the configuration variable could take KBYTES (kilobytes)
instead of just BYTES, since the minimum is 4 kB?

Also that item seems to be in the "Memory Allocation" section, but I
don't think that's a good place as the other options are for the size of
guest RAM. I don't know in which section this would be better but maybe
"Other Options" would be OK.

>  =back
>  
>  =head3 Guest Virtual NUMA Configuration
> diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
> index 61b4ef7b7e..4eba224590 100644
> --- a/tools/xl/xl_parse.c
> +++ b/tools/xl/xl_parse.c
> @@ -1861,6 +1861,26 @@ void parse_config_data(const char *config_source,
>          }
>      }
>  
> +    if (!xlu_cfg_get_long(config, "vmtrace_pt_size", &l, 1) && l) {
> +        int32_t shift = 0;
> +
> +        if (l & (l - 1))
> +        {
> +            fprintf(stderr, "ERROR: pt buffer size must be a power of 2\n");

It would be better to state the option name in the error message.
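Here is a minimal sketch of the size-to-order conversion that puts the option name in every message. It assumes 4 KiB pages and that the documented inclusive range (4 kB to 4 GB) is what is intended. As quoted, the check `shift <= XEN_PAGE_SHIFT` would reject exactly 4 kB, and there is no upper-bound check. The function name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define XEN_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/*
 * Convert the "vmtrace_pt_size" value (bytes) into a page order.
 * Returns the order, or -1 after printing an error that names the
 * option. Accepts 4 KiB (order 0) through 4 GiB (order 20). A value
 * of 0 (disabled) should be handled by the caller; here it is rejected.
 */
static int vmtrace_pt_size_to_order(unsigned long long bytes)
{
    int shift = 0;

    if (bytes & (bytes - 1)) {
        fprintf(stderr, "ERROR: vmtrace_pt_size=%llu is not a power of 2\n",
                bytes);
        return -1;
    }

    while (bytes >>= 1)
        shift++;

    if (shift < XEN_PAGE_SHIFT || shift > 32) {
        fprintf(stderr,
                "ERROR: vmtrace_pt_size must be between 4 KiB and 4 GiB\n");
        return -1;
    }

    return shift - XEN_PAGE_SHIFT;
}
```

The largest order this yields is 20, which fits in the uint8_t vmtrace_pt_order field.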

> +            exit(1);
> +        }
> +
> +        while (l >>= 1) ++shift;
> +
> +        if (shift <= XEN_PAGE_SHIFT)
> +        {
> +            fprintf(stderr, "ERROR: too small pt buffer\n");
> +            exit(1);
> +        }
> +
> +        b_info->vmtrace_pt_order = shift - XEN_PAGE_SHIFT;
> +    }
> +
>      if (!xlu_cfg_get_list(config, "ioports", &ioports, &num_ioports, 0)) {
>          b_info->num_ioports = num_ioports;
>          b_info->ioports = calloc(num_ioports, sizeof(*b_info->ioports));
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 0a33e0dfd6..27dcfbac8c 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 59bdc28c89..7b8289d436 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h

I don't think it's wise to modify the toolstack, the hypervisor, and the
hypercall ABI in the same patch. Can you change these last two files in a
separate patch?

Thank you,

-- 
Anthony PERARD


* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-07-02  9:57                             ` Julien Grall
@ 2020-07-02 13:30                               ` Jan Beulich
  2020-07-02 14:14                                 ` Julien Grall
  0 siblings, 1 reply; 75+ messages in thread
From: Jan Beulich @ 2020-07-02 13:30 UTC (permalink / raw)
  To: Julien Grall
  Cc: Kevin Tian, Stefano Stabellini, tamas.lengyel, Wei Liu,
	Andrew Cooper, Michał Leszczyński, Ian Jackson,
	George Dunlap, Jun Nakajima, xen-devel, luwei.kang,
	Roger Pau Monné

On 02.07.2020 11:57, Julien Grall wrote:
> Hi,
> 
> On 02/07/2020 10:18, Jan Beulich wrote:
>> On 02.07.2020 10:54, Julien Grall wrote:
>>>
>>>
>>> On 02/07/2020 09:50, Jan Beulich wrote:
>>>> On 02.07.2020 10:42, Julien Grall wrote:
>>>>> On 02/07/2020 09:29, Jan Beulich wrote:
>>>>>> I'm with Andrew here, fwiw, as long as the little bit of code that
>>>>>> is actually put in common/ or include/xen/ doesn't imply arbitrary
>>>>>> restrictions on acceptable values.
>>>>> Well, yes, the code is simple. However, the code as it is wouldn't be
>>>>> usable on other architectures without additional work (aside from
>>>>> arch-specific code). For instance, there is no way to map the buffer
>>>>> outside of Xen as it is all x86 specific.
>>>>>
>>>>> If you want the allocation to be in the common code, then the
>>>>> infrastructure to map/unmap the buffer should also be in common code.
>>>>> Otherwise, there is no point in allocating it in common code.
>>>>
>>>> I don't think I agree here - I see nothing wrong with exposing of
>>>> the memory being arch specific, when allocation is generic. This
>>>> is no different from, in just x86, allocation logic being common
>>>> to PV and HVM, but exposing being different for both.
>>>
>>> Are you suggesting that the way it would be exposed may be different for
>>> other architectures?
>>
>> Why not? To take a possibly extreme example - consider an arch
>> where (for bare metal) the buffer is specified to appear at a
>> fixed range of addresses.
> 
> I am probably missing something here... The current goal is the buffer 
> will be mapped in the dom0. Most likely the way to map it will be using 
> the acquire hypercall (unless you invent a brand new one...).
> 
> For a guest, you could possibly reserve a fixed range and then map it 
> when creating the vCPU in Xen. But then, you will likely want a fixed 
> size... So why would you bother to ask the user to define the size?

Because there may be the option to only populate part of the fixed
range?

> Another way to do it would be for the toolstack to do the mapping. At that
> point, you still need a hypercall to do the mapping (probably the
> acquire hypercall).

There may not be any mapping to do in such a contrived, fixed-range
environment. This scenario was specifically to demonstrate that the
way the mapping gets done may be arch-specific (here: a no-op)
despite the allocation not being so.

Jan


* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-07-02 13:30                               ` Jan Beulich
@ 2020-07-02 14:14                                 ` Julien Grall
  2020-07-02 14:17                                   ` Jan Beulich
  0 siblings, 1 reply; 75+ messages in thread
From: Julien Grall @ 2020-07-02 14:14 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, tamas.lengyel, Wei Liu,
	Andrew Cooper, Michał Leszczyński, Ian Jackson,
	George Dunlap, Jun Nakajima, xen-devel, luwei.kang,
	Roger Pau Monné

Hi,

On 02/07/2020 14:30, Jan Beulich wrote:
> On 02.07.2020 11:57, Julien Grall wrote:
>> Hi,
>>
>> On 02/07/2020 10:18, Jan Beulich wrote:
>>> On 02.07.2020 10:54, Julien Grall wrote:
>>>>
>>>>
>>>> On 02/07/2020 09:50, Jan Beulich wrote:
>>>>> On 02.07.2020 10:42, Julien Grall wrote:
>>>>>> On 02/07/2020 09:29, Jan Beulich wrote:
>>>>>>> I'm with Andrew here, fwiw, as long as the little bit of code that
>>>>>>> is actually put in common/ or include/xen/ doesn't imply arbitrary
>>>>>>> restrictions on acceptable values.
>>>>>> Well, yes, the code is simple. However, the code as it is wouldn't be
>>>>>> usable on other architectures without additional work (aside from
>>>>>> arch-specific code). For instance, there is no way to map the buffer
>>>>>> outside of Xen as it is all x86 specific.
>>>>>>
>>>>>> If you want the allocation to be in the common code, then the
>>>>>> infrastructure to map/unmap the buffer should also be in common code.
>>>>>> Otherwise, there is no point in allocating it in common code.
>>>>>
>>>>> I don't think I agree here - I see nothing wrong with exposing of
>>>>> the memory being arch specific, when allocation is generic. This
>>>>> is no different from, in just x86, allocation logic being common
>>>>> to PV and HVM, but exposing being different for both.
>>>>
>>>> Are you suggesting that the way it would be exposed may be different for
>>>> other architectures?
>>>
>>> Why not? To take a possibly extreme example - consider an arch
>>> where (for bare metal) the buffer is specified to appear at a
>>> fixed range of addresses.
>>
>> I am probably missing something here... The current goal is the buffer
>> will be mapped in the dom0. Most likely the way to map it will be using
>> the acquire hypercall (unless you invent a brand new one...).
>>
>> For a guest, you could possibly reserve a fixed range and then map it
>> when creating the vCPU in Xen. But then, you will likely want a fixed
>> size... So why would you bother to ask the user to define the size?
> 
> Because there may be the option to only populate part of the fixed
> range?

It was yet another extreme case ;).

> 
>> Another way to do it would be for the toolstack to do the mapping. At that
>> point, you still need a hypercall to do the mapping (probably the
>> acquire hypercall).
> 
> There may not be any mapping to do in such a contrived, fixed-range
> environment. This scenario was specifically to demonstrate that the
> way the mapping gets done may be arch-specific (here: a no-op)
> despite the allocation not being so.
You are arguing from extreme cases, which I don't think is really helpful
here. Yes, if you want to map at a fixed address in a guest you may not
need the acquire hypercall. But in most other cases (such as for
the tools) you will need it.

So what's the problem with requesting to have the acquire hypercall 
implemented in common code?

Cheers,

-- 
Julien Grall


* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-07-02 14:14                                 ` Julien Grall
@ 2020-07-02 14:17                                   ` Jan Beulich
  2020-07-02 14:31                                     ` Julien Grall
  0 siblings, 1 reply; 75+ messages in thread
From: Jan Beulich @ 2020-07-02 14:17 UTC (permalink / raw)
  To: Julien Grall
  Cc: Kevin Tian, Stefano Stabellini, tamas.lengyel, Wei Liu,
	Andrew Cooper, Michał Leszczyński, Ian Jackson,
	George Dunlap, Jun Nakajima, xen-devel, luwei.kang,
	Roger Pau Monné

On 02.07.2020 16:14, Julien Grall wrote:
> Hi,
> 
> On 02/07/2020 14:30, Jan Beulich wrote:
>> On 02.07.2020 11:57, Julien Grall wrote:
>>> Hi,
>>>
>>> On 02/07/2020 10:18, Jan Beulich wrote:
>>>> On 02.07.2020 10:54, Julien Grall wrote:
>>>>>
>>>>>
>>>>> On 02/07/2020 09:50, Jan Beulich wrote:
>>>>>> On 02.07.2020 10:42, Julien Grall wrote:
>>>>>>> On 02/07/2020 09:29, Jan Beulich wrote:
>>>>>>>> I'm with Andrew here, fwiw, as long as the little bit of code that
>>>>>>>> is actually put in common/ or include/xen/ doesn't imply arbitrary
>>>>>>>> restrictions on acceptable values.
>>>>>>> Well, yes, the code is simple. However, the code as it is wouldn't be
>>>>>>> usable on other architectures without additional work (aside from
>>>>>>> arch-specific code). For instance, there is no way to map the buffer
>>>>>>> outside of Xen as it is all x86 specific.
>>>>>>>
>>>>>>> If you want the allocation to be in the common code, then the
>>>>>>> infrastructure to map/unmap the buffer should also be in common code.
>>>>>>> Otherwise, there is no point in allocating it in common code.
>>>>>>
>>>>>> I don't think I agree here - I see nothing wrong with exposing of
>>>>>> the memory being arch specific, when allocation is generic. This
>>>>>> is no different from, in just x86, allocation logic being common
>>>>>> to PV and HVM, but exposing being different for both.
>>>>>
>>>>> Are you suggesting that the way it would be exposed may be different for
>>>>> other architectures?
>>>>
>>>> Why not? To take a possibly extreme example - consider an arch
>>>> where (for bare metal) the buffer is specified to appear at a
>>>> fixed range of addresses.
>>>
>>> I am probably missing something here... The current goal is the buffer
>>> will be mapped in the dom0. Most likely the way to map it will be using
>>> the acquire hypercall (unless you invent a brand new one...).
>>>
>>> For a guest, you could possibly reserve a fixed range and then map it
>>> when creating the vCPU in Xen. But then, you will likely want a fixed
>>> size... So why would you bother to ask the user to define the size?
>>
>> Because there may be the option to only populate part of the fixed
>> range?
> 
> It was yet another extreme case ;).

Yes, sure - just to demonstrate my point.

>>> Another way to do it would be for the toolstack to do the mapping. At that
>>> point, you still need a hypercall to do the mapping (probably the
>>> acquire hypercall).
>>
>> There may not be any mapping to do in such a contrived, fixed-range
>> environment. This scenario was specifically to demonstrate that the
>> way the mapping gets done may be arch-specific (here: a no-op)
>> despite the allocation not being so.
> You are arguing from extreme cases, which I don't think is really helpful
> here. Yes, if you want to map at a fixed address in a guest you may not
> need the acquire hypercall. But in most other cases (such as for
> the tools) you will need it.
> 
> So what's the problem with requesting to have the acquire hypercall 
> implemented in common code?

Didn't we start out by you asking that there be as little common code
as possible for the time being? I have no issue with putting the
acquire implementation there ...

Jan


* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-07-02 14:17                                   ` Jan Beulich
@ 2020-07-02 14:31                                     ` Julien Grall
  2020-07-02 20:28                                       ` Michał Leszczyński
  0 siblings, 1 reply; 75+ messages in thread
From: Julien Grall @ 2020-07-02 14:31 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, tamas.lengyel, Wei Liu,
	Andrew Cooper, Michał Leszczyński, Ian Jackson,
	George Dunlap, Jun Nakajima, xen-devel, luwei.kang,
	Roger Pau Monné



On 02/07/2020 15:17, Jan Beulich wrote:
> On 02.07.2020 16:14, Julien Grall wrote:
>> On 02/07/2020 14:30, Jan Beulich wrote:
>>> On 02.07.2020 11:57, Julien Grall wrote:
>>>> On 02/07/2020 10:18, Jan Beulich wrote:
>>>>> On 02.07.2020 10:54, Julien Grall wrote:
>>>>>> On 02/07/2020 09:50, Jan Beulich wrote:
>>>>>>> On 02.07.2020 10:42, Julien Grall wrote:
>>>>>>>> On 02/07/2020 09:29, Jan Beulich wrote:
>>>> Another way to do it would be for the toolstack to do the mapping. At that
>>>> point, you still need a hypercall to do the mapping (probably the
>>>> acquire hypercall).
>>>
>>> There may not be any mapping to do in such a contrived, fixed-range
>>> environment. This scenario was specifically to demonstrate that the
>>> way the mapping gets done may be arch-specific (here: a no-op)
>>> despite the allocation not being so.
>> You are arguing from extreme cases, which I don't think is really helpful
>> here. Yes, if you want to map at a fixed address in a guest you may not
>> need the acquire hypercall. But in most other cases (such as for
>> the tools) you will need it.
>>
>> So what's the problem with requesting to have the acquire hypercall
>> implemented in common code?
> 
> Didn't we start out by you asking that there be as little common code
> as possible for the time being?

Well as I said I am not in favor of having the allocation in common 
code, but if you want to keep it then you also want to implement 
map/unmap in the common code ([1], [2]).

> I have no issue with putting the
> acquire implementation there ...
This was definitely not clear given how you argued with extreme cases...

Cheers,

[1] <9a3f4d58-e5ad-c7a1-6c5f-42aa92101ca1@xen.org>
[2] <cf41855b-9e5e-13f2-9ab0-04b98f8b3cdd@xen.org>

-- 
Julien Grall


* Re: [PATCH v4 10/10] tools/proctrace: add proctrace tool
  2020-06-30 12:33 ` [PATCH v4 10/10] tools/proctrace: add proctrace tool Michał Leszczyński
@ 2020-07-02 15:10   ` Andrew Cooper
  2020-07-21 10:52     ` Wei Liu
  0 siblings, 1 reply; 75+ messages in thread
From: Andrew Cooper @ 2020-07-02 15:10 UTC (permalink / raw)
  To: Michał Leszczyński, xen-devel
  Cc: luwei.kang, tamas.lengyel, Ian Jackson, Wei Liu

On 30/06/2020 13:33, Michał Leszczyński wrote:
> diff --git a/tools/proctrace/COPYING b/tools/proctrace/COPYING
> new file mode 100644
> index 0000000000..c0a841112c
> --- /dev/null
> +++ b/tools/proctrace/COPYING

The top-level COPYING file is GPL2.  There shouldn't be any need to
include a second copy here.

> diff --git a/tools/proctrace/Makefile b/tools/proctrace/Makefile
> new file mode 100644
> index 0000000000..2983c477fe
> --- /dev/null
> +++ b/tools/proctrace/Makefile
> @@ -0,0 +1,48 @@
> +# Copyright (C) CERT Polska - NASK PIB
> +# Author: Michał Leszczyński <michal.leszczynski@cert.pl>
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; under version 2 of the License.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +
> +XEN_ROOT=$(CURDIR)/../..
> +include $(XEN_ROOT)/tools/Rules.mk
> +
> +CFLAGS  += -Werror
> +CFLAGS  += $(CFLAGS_libxenevtchn)
> +CFLAGS  += $(CFLAGS_libxenctrl)
> +LDLIBS  += $(LDLIBS_libxenctrl)
> +LDLIBS  += $(LDLIBS_libxenevtchn)
> +LDLIBS  += $(LDLIBS_libxenforeignmemory)
> +
> +.PHONY: all
> +all: build
> +
> +.PHONY: build
> +build: proctrace
> +
> +.PHONY: install
> +install: build
> +	$(INSTALL_DIR) $(DESTDIR)$(sbindir)
> +	$(INSTALL_PROG) proctrace $(DESTDIR)$(sbindir)/proctrace
> +
> +.PHONY: uninstall
> +uninstall:
> +	rm -f $(DESTDIR)$(sbindir)/proctrace
> +
> +.PHONY: clean
> +clean:
> +	$(RM) -f $(DEPS_RM)

You need to remove proctrace as well, for `make clean` to have the
intended semantics.

> +
> +.PHONY: distclean
> +distclean: clean
> +
> +iptlive: iptlive.o Makefile
> +	$(CC) $(LDFLAGS) $< -o $@ $(LDLIBS) $(APPEND_LDFLAGS)

This rule looks to be totally unused?

> +#include <stdlib.h>
> +#include <stdio.h>
> +#include <sys/mman.h>
> +#include <signal.h>
> +
> +#include <xenctrl.h>
> +#include <xen/xen.h>
> +#include <xenforeignmemory.h>
> +
> +#define BUF_SIZE (16384 * XC_PAGE_SIZE)

This hardcodes the size of the buffer which is configurable per VM. 
Mapping the buffer fails when it is smaller than this.
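Until the resource size can be queried from Xen, one possible workaround is for the tool to be told the order the domain was created with. It can then derive the number of frames to map from that order. This is an assumption, not something the series does. trace_frames_from_order() is hypothetical, and the mapping length would then be nr_frames * XC_PAGE_SIZE.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper for proctrace: take the trace buffer order the
 * domain was created with (e.g. from the command line) and return the
 * number of 4 KiB frames to map, rather than assuming 16384 frames.
 * Orders above 20 (4 GiB) are rejected.
 */
static int trace_frames_from_order(const char *arg, unsigned long *nr_frames)
{
    char *end;
    unsigned long order;

    errno = 0;
    order = strtoul(arg, &end, 10);
    if (errno || end == arg || *end != '\0' || order > 20) {
        fprintf(stderr, "invalid trace buffer order '%s'\n", arg);
        return -1;
    }

    *nr_frames = 1UL << order;
    return 0;
}
```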

It appears there is still an outstanding bug from the acquire_resource work
which never got fixed.  The guest_handle_is_null(xmar.frame_list) path
in Xen is supposed to report the size of the resource, not the size of
Xen's local buffer, so userspace can ask "how large is this resource".

I'll try and find some time to fix this and arrange for backports, but
the current behaviour is nonsense, and problematic for new users.

~Andrew


* Re: [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter
  2020-07-02  9:00   ` Roger Pau Monné
@ 2020-07-02 16:23     ` Michał Leszczyński
  2020-07-03  9:44       ` Roger Pau Monné
  0 siblings, 1 reply; 75+ messages in thread
From: Michał Leszczyński @ 2020-07-02 16:23 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Julien Grall, Stefano Stabellini, tamas lengyel, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, luwei kang,
	Jan Beulich, Anthony PERARD, xen-devel

----- 2 lip 2020 o 11:00, Roger Pau Monné roger.pau@citrix.com napisał(a):

> On Tue, Jun 30, 2020 at 02:33:46PM +0200, Michał Leszczyński wrote:
>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>> index 59bdc28c89..7b8289d436 100644
>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -92,6 +92,7 @@ struct xen_domctl_createdomain {
>>      uint32_t max_evtchn_port;
>>      int32_t max_grant_frames;
>>      int32_t max_maptrack_frames;
>> +    uint8_t vmtrace_pt_order;
> 
> I've been thinking about this, and even though this is a domctl (so
> not a stable interface) we might want to consider using a size (or a
> number of pages) here rather than an order. IPT also supports
> TOPA mode (kind of a linked list of buffers) that would allow for
> sizes not rounded to order boundaries to be used, since then only each
> item in the linked list needs to be rounded to an order boundary, so
> you could for example use three 4K pages in TOPA mode AFAICT.
> 
> Roger.

In previous versions it was "size" but it was requested to change it
to "order" in order to shrink the variable size from uint64_t to
uint8_t, because there is limited space for xen_domctl_createdomain
structure.

How should I proceed?

Best regards,
Michał Leszczyński
CERT Polska


* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-07-02 14:31                                     ` Julien Grall
@ 2020-07-02 20:28                                       ` Michał Leszczyński
  2020-07-03  7:58                                         ` Julien Grall
  0 siblings, 1 reply; 75+ messages in thread
From: Michał Leszczyński @ 2020-07-02 20:28 UTC (permalink / raw)
  To: Julien Grall
  Cc: Kevin Tian, Stefano Stabellini, tamas lengyel, Jan Beulich,
	Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jun Nakajima,
	xen-devel, luwei kang, Roger Pau Monné

----- 2 lip 2020 o 16:31, Julien Grall julien@xen.org napisał(a):

> On 02/07/2020 15:17, Jan Beulich wrote:
>> On 02.07.2020 16:14, Julien Grall wrote:
>>> On 02/07/2020 14:30, Jan Beulich wrote:
>>>> On 02.07.2020 11:57, Julien Grall wrote:
>>>>> On 02/07/2020 10:18, Jan Beulich wrote:
>>>>>> On 02.07.2020 10:54, Julien Grall wrote:
>>>>>>> On 02/07/2020 09:50, Jan Beulich wrote:
>>>>>>>> On 02.07.2020 10:42, Julien Grall wrote:
>>>>>>>>> On 02/07/2020 09:29, Jan Beulich wrote:
>>>>> Another way to do it would be for the toolstack to do the mapping, at
>>>>> which point you still need a hypercall to do the mapping (probably the
>>>>> acquire hypercall).
>>>>
>>>> There may not be any mapping to do in such a contrived, fixed-range
>>>> environment. This scenario was specifically to demonstrate that the
>>>> way the mapping gets done may be arch-specific (here: a no-op)
>>>> despite the allocation not being so.
>>> You are arguing on extreme cases which I don't think is really helpful
>>> here. Yes if you want to map at a fixed address in a guest you may not
>>> need the acquire hypercall. But in most of the other cases (e.g. for
>>> the tools) you will need it.
>>>
>>> So what's the problem with requesting to have the acquire hypercall
>>> implemented in common code?
>> 
>> Didn't we start out by you asking that there be as little common code
>> as possible for the time being?
> 
> Well as I said I am not in favor of having the allocation in common
> code, but if you want to keep it then you also want to implement
> map/unmap in the common code ([1], [2]).
> 
>> I have no issue with putting the
>> acquire implementation there ...
> This was definitely not clear given how you argued with extreme cases...
> 
> Cheers,
> 
> [1] <9a3f4d58-e5ad-c7a1-6c5f-42aa92101ca1@xen.org>
> [2] <cf41855b-9e5e-13f2-9ab0-04b98f8b3cdd@xen.org>
> 
> --
> Julien Grall


Guys,

could you express your final decision on this topic?

While I understand the discussion and the arguments you've raised,
I would like to know what particular elements should be moved where.

So are we going the abstract way, or the non-abstract, x86-only way?

Best regards,
Michał Leszczyński
CERT Polska


* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-07-02  8:34       ` Jan Beulich
@ 2020-07-02 20:29         ` Michał Leszczyński
  0 siblings, 0 replies; 75+ messages in thread
From: Michał Leszczyński @ 2020-07-02 20:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Julien Grall, Kevin Tian, Stefano Stabellini, tamas lengyel,
	Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jun Nakajima,
	xen-devel, luwei kang, Roger Pau Monné

----- 2 lip 2020 o 10:34, Jan Beulich jbeulich@suse.com napisał(a):

> On 02.07.2020 10:10, Roger Pau Monné wrote:
>> On Wed, Jul 01, 2020 at 10:42:55PM +0100, Andrew Cooper wrote:
>>> On 30/06/2020 13:33, Michał Leszczyński wrote:
>>>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>>>> index ca94c2bedc..b73d824357 100644
>>>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>>>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>>>> @@ -291,6 +291,12 @@ static int vmx_init_vmcs_config(void)
>>>>          _vmx_cpu_based_exec_control &=
>>>>              ~(CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING);
>>>>  
>>>> +    rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
>>>> +
>>>> +    /* Check whether IPT is supported in VMX operation. */
>>>> +    vmtrace_supported = cpu_has_ipt &&
>>>> +                        (_vmx_misc_cap & VMX_MISC_PT_SUPPORTED);
>>>
>>> There is a subtle corner case here.  vmx_init_vmcs_config() is called on
>>> all CPUs, and is supposed to level things down safely if we find any
>>> asymmetry.
>>>
>>> If instead you go with something like this:
>>>
>>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>>> index b73d824357..6960109183 100644
>>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>>> @@ -294,8 +294,8 @@ static int vmx_init_vmcs_config(void)
>>>      rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
>>>  
>>>      /* Check whether IPT is supported in VMX operation. */
>>> -    vmtrace_supported = cpu_has_ipt &&
>>> -                        (_vmx_misc_cap & VMX_MISC_PT_SUPPORTED);
>>> +    if ( !(_vmx_misc_cap & VMX_MISC_PT_SUPPORTED) )
>>> +        vmtrace_supported = false;
>> 
>> This is also used during hotplug, so I'm not sure it's safe to turn
>> vmtrace_supported off during runtime, where VMs might be already using
>> it. IMO it would be easier to just set it on the BSP, and then refuse
>> to bring up any AP that doesn't have the feature.
> 
> +1
> 
> IOW I also don't think that "vmx_init_vmcs_config() ... is supposed to
> level things down safely". Instead I think the expectation is for
> CPU onlining to fail if a CPU lacks features compared to the BSP. As
> can be implied from what Roger says, doing like what you suggest may
> be fine during boot, but past that only at times where we know there's
> no user of a certain feature, and where discarding the feature flag
> won't lead to other inconsistencies (which may very well mean "never").
> 
> Jan


OK, I will modify it the way Roger suggested for the previous patch
version: CPU onlining will fail if there is an inconsistency.
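
A minimal model of the approach agreed above: decide on the BSP, never clear the flag afterwards, and refuse to online an AP that lacks the feature. sketch_vmcs_config() is a hypothetical stand-in for vmx_init_vmcs_config(), and returning -EINVAL is an illustrative choice; only the IA32_VMX_MISC bit (bit 14, "Intel PT usable in VMX operation" per the SDM) comes from the architecture.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* IA32_VMX_MISC bit 14: Intel PT may be used in VMX operation (SDM). */
#define SKETCH_VMX_MISC_PT_SUPPORTED (UINT64_C(1) << 14)

/* Decided once on the BSP and never cleared afterwards. */
static bool vmtrace_supported;

/*
 * Hypothetical per-CPU hook, loosely modelled on vmx_init_vmcs_config().
 * misc_cap is this CPU's IA32_VMX_MISC value.
 */
static int sketch_vmcs_config(bool is_bsp, bool cpu_has_ipt, uint64_t misc_cap)
{
    bool has_pt = cpu_has_ipt && (misc_cap & SKETCH_VMX_MISC_PT_SUPPORTED);

    if ( is_bsp )
    {
        vmtrace_supported = has_pt;
        return 0;
    }

    /* Clearing the flag at runtime could break VMs already tracing. */
    if ( vmtrace_supported && !has_pt )
        return -EINVAL;   /* refuse to bring this AP online */

    return 0;
}
```

An AP that has the feature when the BSP does not is harmless in this model, since vmtrace_supported simply stays false.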

Best regards,
Michał Leszczyński
CERT Polska


* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-07-02 20:28                                       ` Michał Leszczyński
@ 2020-07-03  7:58                                         ` Julien Grall
  2020-07-04 19:16                                           ` Michał Leszczyński
  0 siblings, 1 reply; 75+ messages in thread
From: Julien Grall @ 2020-07-03  7:58 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Kevin Tian, Stefano Stabellini, tamas lengyel, Jan Beulich,
	Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jun Nakajima,
	xen-devel, luwei kang, Roger Pau Monné

Hi,

On 02/07/2020 21:28, Michał Leszczyński wrote:
> ----- 2 lip 2020 o 16:31, Julien Grall julien@xen.org napisał(a):
> 
>> On 02/07/2020 15:17, Jan Beulich wrote:
>>> On 02.07.2020 16:14, Julien Grall wrote:
>>>> On 02/07/2020 14:30, Jan Beulich wrote:
>>>>> On 02.07.2020 11:57, Julien Grall wrote:
>>>>>> On 02/07/2020 10:18, Jan Beulich wrote:
>>>>>>> On 02.07.2020 10:54, Julien Grall wrote:
>>>>>>>> On 02/07/2020 09:50, Jan Beulich wrote:
>>>>>>>>> On 02.07.2020 10:42, Julien Grall wrote:
>>>>>>>>>> On 02/07/2020 09:29, Jan Beulich wrote:
>>>>>> Another way to do it would be for the toolstack to do the mapping, at
>>>>>> which point you still need a hypercall to do the mapping (probably the
>>>>>> acquire hypercall).
>>>>>
>>>>> There may not be any mapping to do in such a contrived, fixed-range
>>>>> environment. This scenario was specifically to demonstrate that the
>>>>> way the mapping gets done may be arch-specific (here: a no-op)
>>>>> despite the allocation not being so.
>>>> You are arguing on extreme cases which I don't think is really helpful
>>>> here. Yes if you want to map at a fixed address in a guest you may not
>>>> need the acquire hypercall. But in most of the other cases (e.g. for
>>>> the tools) you will need it.
>>>>
>>>> So what's the problem with requesting to have the acquire hypercall
>>>> implemented in common code?
>>>
>>> Didn't we start out by you asking that there be as little common code
>>> as possible for the time being?
>>
>> Well as I said I am not in favor of having the allocation in common
>> code, but if you want to keep it then you also want to implement
>> map/unmap in the common code ([1], [2]).
>>
>>> I have no issue with putting the
>>> acquire implementation there ...
>> This was definitely not clear given how you argued with extreme cases...
>>
>> Cheers,
>>
>> [1] <9a3f4d58-e5ad-c7a1-6c5f-42aa92101ca1@xen.org>
>> [2] <cf41855b-9e5e-13f2-9ab0-04b98f8b3cdd@xen.org>
>>
>> --
>> Julien Grall
> 
> 
> Guys,
> 
> could you express your final decision on this topic?

Can you move the acquire implementation from x86 to common code?

Cheers,

-- 
Julien Grall


* Re: [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter
  2020-07-02 16:23     ` Michał Leszczyński
@ 2020-07-03  9:44       ` Roger Pau Monné
  2020-07-03  9:56         ` Jan Beulich
  0 siblings, 1 reply; 75+ messages in thread
From: Roger Pau Monné @ 2020-07-03  9:44 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Julien Grall, Stefano Stabellini, tamas lengyel, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, luwei kang,
	Jan Beulich, Anthony PERARD, xen-devel

On Thu, Jul 02, 2020 at 06:23:28PM +0200, Michał Leszczyński wrote:
> ----- 2 lip 2020 o 11:00, Roger Pau Monné roger.pau@citrix.com napisał(a):
> 
> > On Tue, Jun 30, 2020 at 02:33:46PM +0200, Michał Leszczyński wrote:
> >> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> >> index 59bdc28c89..7b8289d436 100644
> >> --- a/xen/include/public/domctl.h
> >> +++ b/xen/include/public/domctl.h
> >> @@ -92,6 +92,7 @@ struct xen_domctl_createdomain {
> >>      uint32_t max_evtchn_port;
> >>      int32_t max_grant_frames;
> >>      int32_t max_maptrack_frames;
> >> +    uint8_t vmtrace_pt_order;
> > 
> > I've been thinking about this, and even though this is a domctl (so
> > not a stable interface) we might want to consider using a size (or a
> > number of pages) here rather than an order. IPT also supports
> > TOPA mode (kind of a linked list of buffers) that would allow for
> > sizes not rounded to order boundaries to be used, since then only each
> > item in the linked list needs to be rounded to an order boundary, so
> > you could for example use three 4K pages in TOPA mode AFAICT.
> > 
> > Roger.
> 
> In previous versions it was "size" but it was requested to change it
> to "order" in order to shrink the variable size from uint64_t to
> uint8_t, because there is limited space for xen_domctl_createdomain
> structure.

It's likely I'm missing something here, but I wasn't aware
xen_domctl_createdomain had any constraints regarding its size. It's
currently 48 bytes, which seems fairly small.

There might be constraints on struct domain (the hypervisor-internal
domain tracking structure), but I think you are already using a size
field there IIRC.

> 
> How should I proceed?

This is an unstable interface, so we could always change it. It seems
like we might want to use a size parameter at some point to take
advantage of non physically contiguous buffers, but if there are other
blockers that prevent such a field from being wider ATM, I'm fine with
it.
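
The TOPA point quoted above can be checked with a little arithmetic, assuming 4 KiB pages. An order field expresses only power-of-two page counts, so a three-page (12 KiB) buffer has no order and would have to be rounded up to order 2 (16 KiB); a plain page count has no such restriction. pages_to_order() is a hypothetical helper for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: if nr_pages is a power of two, store its order
 * in *order and return true. Otherwise the size cannot be expressed
 * by an order-based field such as vmtrace_pt_order.
 */
static bool pages_to_order(uint32_t nr_pages, uint8_t *order)
{
    uint8_t o = 0;

    if ( nr_pages == 0 || (nr_pages & (nr_pages - 1)) != 0 )
        return false;            /* zero, or not a power of two */

    while ( (UINT32_C(1) << o) != nr_pages )
        o++;

    *order = o;
    return true;
}
```

For example, 4 pages maps to order 2 and 16384 pages to order 14, but 3 pages is rejected.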

Thanks, Roger.


* Re: [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter
  2020-07-03  9:44       ` Roger Pau Monné
@ 2020-07-03  9:56         ` Jan Beulich
  2020-07-03 10:11           ` Roger Pau Monné
  0 siblings, 1 reply; 75+ messages in thread
From: Jan Beulich @ 2020-07-03  9:56 UTC (permalink / raw)
  To: Roger Pau Monné, Michał Leszczyński
  Cc: Julien Grall, Stefano Stabellini, tamas lengyel, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, luwei kang,
	Anthony PERARD, xen-devel

On 03.07.2020 11:44, Roger Pau Monné wrote:
> On Thu, Jul 02, 2020 at 06:23:28PM +0200, Michał Leszczyński wrote:
>> ----- 2 lip 2020 o 11:00, Roger Pau Monné roger.pau@citrix.com napisał(a):
>>
>>> On Tue, Jun 30, 2020 at 02:33:46PM +0200, Michał Leszczyński wrote:
>>>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>>>> index 59bdc28c89..7b8289d436 100644
>>>> --- a/xen/include/public/domctl.h
>>>> +++ b/xen/include/public/domctl.h
>>>> @@ -92,6 +92,7 @@ struct xen_domctl_createdomain {
>>>>      uint32_t max_evtchn_port;
>>>>      int32_t max_grant_frames;
>>>>      int32_t max_maptrack_frames;
>>>> +    uint8_t vmtrace_pt_order;
>>>
>>> I've been thinking about this, and even though this is a domctl (so
>>> not a stable interface) we might want to consider using a size (or a
>>> number of pages) here rather than an order. IPT also supports
>>> TOPA mode (kind of a linked list of buffers) that would allow for
>>> sizes not rounded to order boundaries to be used, since then only each
>>> item in the linked list needs to be rounded to an order boundary, so
>>> you could for example use three 4K pages in TOPA mode AFAICT.
>>>
>>> Roger.
>>
>> In previous versions it was "size" but it was requested to change it
>> to "order" in order to shrink the variable size from uint64_t to
>> uint8_t, because there is limited space for xen_domctl_createdomain
>> structure.
> 
> It's likely I'm missing something here, but I wasn't aware
> xen_domctl_createdomain had any constraints regarding its size. It's
> currently 48 bytes, which seems fairly small.

Additionally I would guess a uint32_t could do here, if the value
passed was "number of pages" rather than "number of bytes"?
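
The suggestion above in numbers, assuming 4 KiB frames: a uint32_t frame count reaches (2^32 - 1) x 4 KiB, just under 2^44 bytes (16 TiB), which comfortably covers any realistic trace buffer while still taking only 4 bytes in the structure. frames_to_bytes() is illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* assumption: 4 KiB frames */

/* Bytes covered by a buffer described as nr_frames 4 KiB frames. */
static uint64_t frames_to_bytes(uint32_t nr_frames)
{
    /* Widen before shifting so large counts do not overflow. */
    return (uint64_t)nr_frames << SKETCH_PAGE_SHIFT;
}
```

Note that the count also expresses non-power-of-two sizes directly, e.g. frames_to_bytes(3) is 12 KiB.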

Jan


* Re: [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter
  2020-07-03  9:56         ` Jan Beulich
@ 2020-07-03 10:11           ` Roger Pau Monné
  2020-07-04 17:23             ` Julien Grall
  0 siblings, 1 reply; 75+ messages in thread
From: Roger Pau Monné @ 2020-07-03 10:11 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Julien Grall, Stefano Stabellini, tamas lengyel, Wei Liu,
	Andrew Cooper, Michał Leszczyński, Ian Jackson,
	George Dunlap, luwei kang, Anthony PERARD, xen-devel

On Fri, Jul 03, 2020 at 11:56:38AM +0200, Jan Beulich wrote:
> On 03.07.2020 11:44, Roger Pau Monné wrote:
> > On Thu, Jul 02, 2020 at 06:23:28PM +0200, Michał Leszczyński wrote:
> >> ----- 2 lip 2020 o 11:00, Roger Pau Monné roger.pau@citrix.com napisał(a):
> >>
> >>> On Tue, Jun 30, 2020 at 02:33:46PM +0200, Michał Leszczyński wrote:
> >>>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> >>>> index 59bdc28c89..7b8289d436 100644
> >>>> --- a/xen/include/public/domctl.h
> >>>> +++ b/xen/include/public/domctl.h
> >>>> @@ -92,6 +92,7 @@ struct xen_domctl_createdomain {
> >>>>      uint32_t max_evtchn_port;
> >>>>      int32_t max_grant_frames;
> >>>>      int32_t max_maptrack_frames;
> >>>> +    uint8_t vmtrace_pt_order;
> >>>
> >>> I've been thinking about this, and even though this is a domctl (so
> >>> not a stable interface) we might want to consider using a size (or a
> >>> number of pages) here rather than an order. IPT also supports
> >>> TOPA mode (kind of a linked list of buffers) that would allow for
> >>> sizes not rounded to order boundaries to be used, since then only each
> >>> item in the linked list needs to be rounded to an order boundary, so
> >>> you could for example use three 4K pages in TOPA mode AFAICT.
> >>>
> >>> Roger.
> >>
> >> In previous versions it was "size" but it was requested to change it
> >> to "order" in order to shrink the variable size from uint64_t to
> >> uint8_t, because there is limited space for xen_domctl_createdomain
> >> structure.
> > 
> > It's likely I'm missing something here, but I wasn't aware
> > xen_domctl_createdomain had any constraints regarding its size. It's
> > currently 48 bytes, which seems fairly small.
> 
> Additionally I would guess a uint32_t could do here, if the value
> passed was "number of pages" rather than "number of bytes"?

That could work, though I'm not sure whether it needs to state that those
will be 4K pages, since Arm can have a different minimum page size IIRC
(or maybe that's already the assumption for all number-of-frames fields).
vmtrace_nr_frames seems fine to me.

Roger.


* Re: [PATCH v4 06/10] memory: batch processing in acquire_resource()
  2020-06-30 12:33 ` [PATCH v4 06/10] memory: batch processing in acquire_resource() Michał Leszczyński
  2020-07-01 10:46   ` Roger Pau Monné
@ 2020-07-03 10:35   ` Julien Grall
  2020-07-03 10:52     ` Paul Durrant
  1 sibling, 1 reply; 75+ messages in thread
From: Julien Grall @ 2020-07-03 10:35 UTC (permalink / raw)
  To: Michał Leszczyński, xen-devel
  Cc: Stefano Stabellini, tamas.lengyel, Wei Liu, paul, Andrew Cooper,
	Ian Jackson, George Dunlap, Jan Beulich, luwei.kang

(+ Paul as the author of XENMEM_acquire_resource)

Hi,

On 30/06/2020 13:33, Michał Leszczyński wrote:
> From: Michal Leszczynski <michal.leszczynski@cert.pl>
> 
> Allow to acquire large resources by allowing acquire_resource()
> to process items in batches, using hypercall continuation.
> 
> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
> ---
>   xen/common/memory.c | 32 +++++++++++++++++++++++++++++---
>   1 file changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index 714077c1e5..3ab06581a2 100644
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -1046,10 +1046,12 @@ static int acquire_grant_table(struct domain *d, unsigned int id,
>   }
>   
>   static int acquire_resource(
> -    XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg)
> +    XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg,
> +    unsigned long *start_extent)
>   {
>       struct domain *d, *currd = current->domain;
>       xen_mem_acquire_resource_t xmar;
> +    uint32_t total_frames;
>       /*
>        * The mfn_list and gfn_list (below) arrays are ok on stack for the
>        * moment since they are small, but if they need to grow in future
> @@ -1077,8 +1079,17 @@ static int acquire_resource(
>           return 0;
>       }
>   
> +    total_frames = xmar.nr_frames;

On 32-bit, the start_extent would be 26 bits wide, which is not enough to
cover all possible values of xmar.nr_frames. Therefore, you want to check
that it is possible to encode a continuation. Something like:

/* Is the size too large for us to encode a continuation? */
if ( unlikely(xmar.nr_frames > (UINT_MAX >> MEMOP_EXTENT_SHIFT)) )
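
To make the 32-bit limit concrete: do_memory_op() carries start_extent in the bits of cmd above MEMOP_EXTENT_SHIFT (6 in Xen's public memory.h), so on a 32-bit build only 26 bits remain. A minimal model of that, using uint32_t to stand in for a 32-bit unsigned long; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MEMOP_EXTENT_SHIFT 6   /* value of MEMOP_EXTENT_SHIFT */
#define SKETCH_MEMOP_CMD_MASK ((1u << SKETCH_MEMOP_EXTENT_SHIFT) - 1)

/* Model of a 32-bit guest's unsigned long. */
typedef uint32_t ulong32;

/* True if every start_extent up to nr_frames fits above the op bits. */
static bool can_encode_continuation(uint32_t nr_frames)
{
    return nr_frames <= (UINT32_MAX >> SKETCH_MEMOP_EXTENT_SHIFT);
}

/* Pack op and start_extent into cmd, as the continuation does. */
static ulong32 encode_cmd(unsigned int op, ulong32 start_extent)
{
    return (op & SKETCH_MEMOP_CMD_MASK) |
           (ulong32)(start_extent << SKETCH_MEMOP_EXTENT_SHIFT);
}

/* Recover start_extent from cmd on re-entry. */
static ulong32 decode_start_extent(ulong32 cmd)
{
    return cmd >> SKETCH_MEMOP_EXTENT_SHIFT;
}
```

Without the check, encode_cmd(op, 1u << 26) loses the high bit, so decode_start_extent() returns 0 and the continuation would restart from the first frame.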

> +
> +    if ( *start_extent )
> +    {
> +        xmar.frame += *start_extent;
> +        xmar.nr_frames -= *start_extent;

As start_extent is exposed to the guest, you want to check that it is not
bigger than xmar.nr_frames.
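
A minimal sketch of one pass through the batched hypercall, including the bound check on the guest-supplied start_extent, plus a loop standing in for the continuation mechanism. It assumes a 32-frame batch (mirroring the on-stack mfn_list) and uses a hypothetical SKETCH_ERESTART in place of Xen's internal -ERESTART; frame_list handling, locking and the actual frame lookup are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_BATCH    32u    /* assumption: ARRAY_SIZE(mfn_list) */
#define SKETCH_ERESTART 1000   /* hypothetical stand-in for ERESTART */

/* The two xen_mem_acquire_resource_t fields this patch adjusts. */
struct sketch_xmar {
    uint64_t frame;       /* first frame of the resource to acquire */
    uint32_t nr_frames;   /* total frames requested by the caller */
};

/*
 * One invocation. *start_extent travels in cmd and is therefore guest
 * controlled, so it is validated before being used.
 */
static int sketch_acquire_step(const struct sketch_xmar *req,
                               unsigned long *start_extent,
                               uint64_t *out_first, uint32_t *out_count)
{
    uint32_t remaining;

    if ( *start_extent > req->nr_frames )
        return -EINVAL;

    remaining = req->nr_frames - (uint32_t)*start_extent;
    *out_first = req->frame + *start_extent;
    *out_count = remaining < SKETCH_BATCH ? remaining : SKETCH_BATCH;

    *start_extent += *out_count;
    return *start_extent == req->nr_frames ? 0 : -SKETCH_ERESTART;
}

/* Re-issue the step as a continuation would, counting the passes. */
static int sketch_acquire_all(const struct sketch_xmar *req,
                              unsigned int *rounds)
{
    unsigned long start_extent = 0;
    uint64_t first;
    uint32_t count;
    int rc;

    *rounds = 0;
    do {
        rc = sketch_acquire_step(req, &start_extent, &first, &count);
        (*rounds)++;
    } while ( rc == -SKETCH_ERESTART );

    return rc;
}
```

A 70-frame request completes in three passes (32, 32 and 6 frames), and the caller only observes the final return value.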

> +        guest_handle_add_offset(xmar.frame_list, *start_extent);
> +    }
> +
>       if ( xmar.nr_frames > ARRAY_SIZE(mfn_list) )
> -        return -E2BIG;
> +        xmar.nr_frames = ARRAY_SIZE(mfn_list);

The documentation of the hypercall suggests that if you pass a NULL
frame_list, it will return the maximum value of nr_frames supported by the
implementation. So technically a domain cannot use more than 
ARRAY_SIZE(mfn_list).

However, your new addition conflicts with the documentation. Can you
clarify how a domain will know that it can use more than 
ARRAY_SIZE(mfn_list)?

>   
>       rc = rcu_lock_remote_domain_by_id(xmar.domid, &d);
>       if ( rc )
> @@ -1135,6 +1146,14 @@ static int acquire_resource(
>           }
>       }
>   
> +    if ( !rc )
> +    {
> +        *start_extent += xmar.nr_frames;
> +
> +        if ( *start_extent != total_frames )
> +            rc = -ERESTART;
> +    }
> +
>    out:
>       rcu_unlock_domain(d);
>   
> @@ -1600,7 +1619,14 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>   
>       case XENMEM_acquire_resource:
>           rc = acquire_resource(
> -            guest_handle_cast(arg, xen_mem_acquire_resource_t));
> +            guest_handle_cast(arg, xen_mem_acquire_resource_t),
> +            &start_extent);

Hmmm... it looks like we forgot to check that start_extent is always 0 
when the hypercall was added.

As this is exposed to the guest, it technically means that there is no
guarantee that start_extent will always be 0.

However, in practice, this was likely the intention and should be the 
case. So it may just be enough to mention the potential breakage in the 
commit message.

@All: what do you think?

> +
> +        if ( rc == -ERESTART )
> +            return hypercall_create_continuation(
> +                __HYPERVISOR_memory_op, "lh",
> +                op | (start_extent << MEMOP_EXTENT_SHIFT), arg);
> +
>           break;
>   
>       default:
> 

Cheers,

-- 
Julien Grall


* RE: [PATCH v4 06/10] memory: batch processing in acquire_resource()
  2020-07-03 10:35   ` Julien Grall
@ 2020-07-03 10:52     ` Paul Durrant
  2020-07-03 11:17       ` Julien Grall
  0 siblings, 1 reply; 75+ messages in thread
From: Paul Durrant @ 2020-07-03 10:52 UTC (permalink / raw)
  To: 'Julien Grall', 'Michał Leszczyński',
	xen-devel
  Cc: 'Stefano Stabellini', tamas.lengyel, 'Wei Liu',
	'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Jan Beulich',
	luwei.kang

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 03 July 2020 11:36
> To: Michał Leszczyński <michal.leszczynski@cert.pl>; xen-devel@lists.xenproject.org
> Cc: luwei.kang@intel.com; tamas.lengyel@intel.com; Andrew Cooper <andrew.cooper3@citrix.com>; George
> Dunlap <george.dunlap@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>; Jan Beulich
> <jbeulich@suse.com>; Stefano Stabellini <sstabellini@kernel.org>; Wei Liu <wl@xen.org>; paul@xen.org
> Subject: Re: [PATCH v4 06/10] memory: batch processing in acquire_resource()
> 
> (+ Paul as the author of XENMEM_acquire_resource)
> 
> Hi,
> 
> On 30/06/2020 13:33, Michał Leszczyński wrote:
> > From: Michal Leszczynski <michal.leszczynski@cert.pl>
> >
> > Allow to acquire large resources by allowing acquire_resource()
> > to process items in batches, using hypercall continuation.
> >
> > Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
> > ---
> >   xen/common/memory.c | 32 +++++++++++++++++++++++++++++---
> >   1 file changed, 29 insertions(+), 3 deletions(-)
> >
> > diff --git a/xen/common/memory.c b/xen/common/memory.c
> > index 714077c1e5..3ab06581a2 100644
> > --- a/xen/common/memory.c
> > +++ b/xen/common/memory.c
> > @@ -1046,10 +1046,12 @@ static int acquire_grant_table(struct domain *d, unsigned int id,
> >   }
> >
> >   static int acquire_resource(
> > -    XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg)
> > +    XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg,
> > +    unsigned long *start_extent)
> >   {
> >       struct domain *d, *currd = current->domain;
> >       xen_mem_acquire_resource_t xmar;
> > +    uint32_t total_frames;
> >       /*
> >        * The mfn_list and gfn_list (below) arrays are ok on stack for the
> >        * moment since they are small, but if they need to grow in future
> > @@ -1077,8 +1079,17 @@ static int acquire_resource(
> >           return 0;
> >       }
> >
> > +    total_frames = xmar.nr_frames;
> 
> On 32-bit, the start_extent would be 26 bits wide, which is not enough to
> cover all possible values of xmar.nr_frames. Therefore, you want to check
> that it is possible to encode a continuation. Something like:
> 
> /* Is the size too large for us to encode a continuation? */
> if ( unlikely(xmar.nr_frames > (UINT_MAX >> MEMOP_EXTENT_SHIFT)) )
> 
> > +
> > +    if ( *start_extent )
> > +    {
> > +        xmar.frame += *start_extent;
> > +        xmar.nr_frames -= *start_extent;
> 
> As start_extent is exposed to the guest, you want to check that it is not
> bigger than xmar.nr_frames.
> 
> > +        guest_handle_add_offset(xmar.frame_list, *start_extent);
> > +    }
> > +
> >       if ( xmar.nr_frames > ARRAY_SIZE(mfn_list) )
> > -        return -E2BIG;
> > +        xmar.nr_frames = ARRAY_SIZE(mfn_list);
> 
> The documentation of the hypercall suggests that if you pass a NULL
> frame_list, it will return the maximum value of nr_frames supported by the
> implementation. So technically a domain cannot use more than
> ARRAY_SIZE(mfn_list).
> 
> However, your new addition conflicts with the documentation. Can you
> clarify how a domain will know that it can use more than
> ARRAY_SIZE(mfn_list)?

The domain should not need to know. It should be told the maximum number of frames of the type it wants. If we have to carve that up into batches inside Xen then the caller should not need to care, right?

> 
> >
> >       rc = rcu_lock_remote_domain_by_id(xmar.domid, &d);
> >       if ( rc )
> > @@ -1135,6 +1146,14 @@ static int acquire_resource(
> >           }
> >       }
> >
> > +    if ( !rc )
> > +    {
> > +        *start_extent += xmar.nr_frames;
> > +
> > +        if ( *start_extent != total_frames )
> > +            rc = -ERESTART;
> > +    }
> > +
> >    out:
> >       rcu_unlock_domain(d);
> >
> > @@ -1600,7 +1619,14 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >
> >       case XENMEM_acquire_resource:
> >           rc = acquire_resource(
> > -            guest_handle_cast(arg, xen_mem_acquire_resource_t));
> > +            guest_handle_cast(arg, xen_mem_acquire_resource_t),
> > +            &start_extent);
> 
> Hmmm... it looks like we forgot to check that start_extent is always 0
> when the hypercall was added.
> 
> As this is exposed to the guest, it technically means that there is no
> guarantee that start_extent will always be 0.
> 

I don't follow. A start extent != 0 means you are in a continuation. How can you check for 0 without breaking continuations?

  Paul

> However, in practice, this was likely the intention and should be the
> case. So it may just be enough to mention the potential breakage in the
> commit message.
> 
> @All: what do you think?
> 
> > +
> > +        if ( rc == -ERESTART )
> > +            return hypercall_create_continuation(
> > +                __HYPERVISOR_memory_op, "lh",
> > +                op | (start_extent << MEMOP_EXTENT_SHIFT), arg);
> > +
> >           break;
> >
> >       default:
> >
> 
> Cheers,
> 
> --
> Julien Grall



* Re: [PATCH v4 06/10] memory: batch processing in acquire_resource()
  2020-07-03 10:52     ` Paul Durrant
@ 2020-07-03 11:17       ` Julien Grall
  2020-07-03 11:22         ` Jan Beulich
  2020-07-03 11:40         ` Paul Durrant
  0 siblings, 2 replies; 75+ messages in thread
From: Julien Grall @ 2020-07-03 11:17 UTC (permalink / raw)
  To: paul, 'Michał Leszczyński', xen-devel
  Cc: 'Stefano Stabellini', tamas.lengyel, 'Wei Liu',
	'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Jan Beulich',
	luwei.kang

Hi,

On 03/07/2020 11:52, Paul Durrant wrote:
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 03 July 2020 11:36
>> To: Michał Leszczyński <michal.leszczynski@cert.pl>; xen-devel@lists.xenproject.org
>> Cc: luwei.kang@intel.com; tamas.lengyel@intel.com; Andrew Cooper <andrew.cooper3@citrix.com>; George
>> Dunlap <george.dunlap@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>; Jan Beulich
>> <jbeulich@suse.com>; Stefano Stabellini <sstabellini@kernel.org>; Wei Liu <wl@xen.org>; paul@xen.org
>> Subject: Re: [PATCH v4 06/10] memory: batch processing in acquire_resource()
>>
>> (+ Paul as the author of XENMEM_acquire_resource)
>>
>> Hi,
>>
>> On 30/06/2020 13:33, Michał Leszczyński wrote:
>>> From: Michal Leszczynski <michal.leszczynski@cert.pl>
>>>
>>> Allow to acquire large resources by allowing acquire_resource()
>>> to process items in batches, using hypercall continuation.
>>>
>>> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
>>> ---
>>>    xen/common/memory.c | 32 +++++++++++++++++++++++++++++---
>>>    1 file changed, 29 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/xen/common/memory.c b/xen/common/memory.c
>>> index 714077c1e5..3ab06581a2 100644
>>> --- a/xen/common/memory.c
>>> +++ b/xen/common/memory.c
>>> @@ -1046,10 +1046,12 @@ static int acquire_grant_table(struct domain *d, unsigned int id,
>>>    }
>>>
>>>    static int acquire_resource(
>>> -    XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg)
>>> +    XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg,
>>> +    unsigned long *start_extent)
>>>    {
>>>        struct domain *d, *currd = current->domain;
>>>        xen_mem_acquire_resource_t xmar;
>>> +    uint32_t total_frames;
>>>        /*
>>>         * The mfn_list and gfn_list (below) arrays are ok on stack for the
>>>         * moment since they are small, but if they need to grow in future
>>> @@ -1077,8 +1079,17 @@ static int acquire_resource(
>>>            return 0;
>>>        }
>>>
>>> +    total_frames = xmar.nr_frames;
>>
>> On 32-bit, the start_extent would be 26 bits wide, which is not enough to
>> cover all possible values of xmar.nr_frames. Therefore, you want to check
>> that it is possible to encode a continuation. Something like:
>>
>> /* Is the size too large for us to encode a continuation? */
>> if ( unlikely(xmar.nr_frames > (UINT_MAX >> MEMOP_EXTENT_SHIFT)) )
>>
>>> +
>>> +    if ( *start_extent )
>>> +    {
>>> +        xmar.frame += *start_extent;
>>> +        xmar.nr_frames -= *start_extent;
>>
>> As start_extent is exposed to the guest, you want to check that it is not
>> bigger than xmar.nr_frames.
>>
>>> +        guest_handle_add_offset(xmar.frame_list, *start_extent);
>>> +    }
>>> +
>>>        if ( xmar.nr_frames > ARRAY_SIZE(mfn_list) )
>>> -        return -E2BIG;
>>> +        xmar.nr_frames = ARRAY_SIZE(mfn_list);
>>
>> The documentation of the hypercall suggests that if you pass a NULL
>> frame_list, it will return the maximum value of nr_frames supported by the
>> implementation. So technically a domain cannot use more than
>> ARRAY_SIZE(mfn_list).
>>
>> However, you new addition conflict with the documentation. Can you
>> clarify how a domain will know that it can use more than
>> ARRAY_SIZE(mfn_list)?
> 
> The domain should not need to know. It should be told the maximum number of frames of the type it wants. If we have to carve that up into batches inside Xen then the caller should not need to care, right?

In the current implementation, we tell the guest how many frames it can
request in a batch. This number may be much smaller than the maximum
number of frames of the type.

Furthermore, this value is not tied to xmar.type. Therefore, it is
valid for a guest to call this hypercall only once at boot to figure out
the maximum batch.

So while the change you suggest looks like a good idea, I don't think it is
possible to do that with the current hypercall.

> 
>>
>>>
>>>        rc = rcu_lock_remote_domain_by_id(xmar.domid, &d);
>>>        if ( rc )
>>> @@ -1135,6 +1146,14 @@ static int acquire_resource(
>>>            }
>>>        }
>>>
>>> +    if ( !rc )
>>> +    {
>>> +        *start_extent += xmar.nr_frames;
>>> +
>>> +        if ( *start_extent != total_frames )
>>> +            rc = -ERESTART;
>>> +    }
>>> +
>>>     out:
>>>        rcu_unlock_domain(d);
>>>
>>> @@ -1600,7 +1619,14 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>
>>>        case XENMEM_acquire_resource:
>>>            rc = acquire_resource(
>>> -            guest_handle_cast(arg, xen_mem_acquire_resource_t));
>>> +            guest_handle_cast(arg, xen_mem_acquire_resource_t),
>>> +            &start_extent);
>>
>> Hmmm... it looks like we forgot to check that start_extent is always 0
>> when the hypercall was added.
>>
>> As this is exposed to the guest, it technically means that there is no
>> guarantee that start_extent will always be 0.
>>
> 
> I don't follow. A start extent != 0 means you are in a continuation. How can you check for 0 without breaking continuations?

I think you misunderstood my point. My point is that we never checked that
start_extent was 0. So a guest could validly pass a non-zero value in
start_extent and not break on older Xen releases.

Once this patch is merged, such a guest would behave differently. Or
did I miss any check/documentation for the start_extent value?
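To make the continuation mechanics under discussion concrete, here is a small standalone C model, not the patch's code. Progress is carried in the bits of `cmd` above MEMOP_EXTENT_SHIFT, the guest-controlled start_extent is bounded before use (as requested above), and each call handles at most one batch before asking to be restarted. `acquire_batch()`, `calls_needed()`, `SKETCH_ERESTART`, `SKETCH_OP` and `BATCH_MAX` (standing in for ARRAY_SIZE(mfn_list)) are hypothetical, and no frames are actually acquired.

```c
#include <errno.h>
#include <stdint.h>

/*
 * MEMOP_EXTENT_SHIFT mirrors Xen's public memory.h. SKETCH_ERESTART,
 * SKETCH_OP and BATCH_MAX are hypothetical stand-ins; BATCH_MAX plays the
 * role of ARRAY_SIZE(mfn_list).
 */
#define MEMOP_EXTENT_SHIFT 6
#define MEMOP_CMD_MASK     ((1UL << MEMOP_EXTENT_SHIFT) - 1)
#define SKETCH_ERESTART    85
#define SKETCH_OP          1UL
#define BATCH_MAX          32u

/* One invocation: handle at most BATCH_MAX frames, then ask for a restart. */
static int acquire_batch(uint32_t nr_frames, unsigned long *start_extent)
{
    uint32_t todo;

    /* start_extent comes from the guest, so bound it before using it. */
    if ( *start_extent > nr_frames )
        return -EINVAL;

    todo = nr_frames - (uint32_t)*start_extent;
    if ( todo > BATCH_MAX )
        todo = BATCH_MAX;

    /* The real code would acquire frames [*start_extent, +todo) here. */
    *start_extent += todo;

    return *start_extent == nr_frames ? 0 : -SKETCH_ERESTART;
}

/*
 * Model the continuation loop: progress lives in the bits of cmd above the
 * sub-op, and the call is re-issued until it stops asking for a restart.
 * Returns the number of invocations, or 0 on error.
 */
static unsigned int calls_needed(uint32_t nr_frames)
{
    unsigned long cmd = SKETCH_OP;    /* start_extent starts at 0 */
    unsigned int calls = 0;
    int rc;

    do {
        unsigned long start_extent = cmd >> MEMOP_EXTENT_SHIFT;

        ++calls;
        rc = acquire_batch(nr_frames, &start_extent);
        cmd = (cmd & MEMOP_CMD_MASK) | (start_extent << MEMOP_EXTENT_SHIFT);
    } while ( rc == -SKETCH_ERESTART );

    return rc ? 0 : calls;
}
```

With BATCH_MAX at 32, a 100-frame request takes four invocations (32 + 32 + 32 + 4). On a build with a 32-bit `unsigned long`, the shift that re-packs `cmd` would lose bits once start_extent exceeds UINT_MAX >> MEMOP_EXTENT_SHIFT, which is why the encodability check is needed.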

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 06/10] memory: batch processing in acquire_resource()
  2020-07-03 11:17       ` Julien Grall
@ 2020-07-03 11:22         ` Jan Beulich
  2020-07-03 11:36           ` Julien Grall
  2020-07-03 11:40         ` Paul Durrant
  1 sibling, 1 reply; 75+ messages in thread
From: Jan Beulich @ 2020-07-03 11:22 UTC (permalink / raw)
  To: Julien Grall
  Cc: 'Stefano Stabellini', tamas.lengyel, 'Wei Liu',
	paul, 'Andrew Cooper',
	'Michał Leszczyński', 'Ian Jackson',
	'George Dunlap',
	luwei.kang, xen-devel

On 03.07.2020 13:17, Julien Grall wrote:
> Hi,
> 
> On 03/07/2020 11:52, Paul Durrant wrote:
>>> -----Original Message-----
>>> From: Julien Grall <julien@xen.org>
>>> Sent: 03 July 2020 11:36
>>> To: Michał Leszczyński <michal.leszczynski@cert.pl>; xen-devel@lists.xenproject.org
>>> Cc: luwei.kang@intel.com; tamas.lengyel@intel.com; Andrew Cooper <andrew.cooper3@citrix.com>; George
>>> Dunlap <george.dunlap@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>; Jan Beulich
>>> <jbeulich@suse.com>; Stefano Stabellini <sstabellini@kernel.org>; Wei Liu <wl@xen.org>; paul@xen.org
>>> Subject: Re: [PATCH v4 06/10] memory: batch processing in acquire_resource()
>>>
>>> (+ Paul as the author of XENMEM_acquire_resource)
>>>
>>> Hi,
>>>
>>> On 30/06/2020 13:33, Michał Leszczyński wrote:
>>>> From: Michal Leszczynski <michal.leszczynski@cert.pl>
>>>>
>>>> Allow to acquire large resources by allowing acquire_resource()
>>>> to process items in batches, using hypercall continuation.
>>>>
>>>> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
>>>> ---
>>>>    xen/common/memory.c | 32 +++++++++++++++++++++++++++++---
>>>>    1 file changed, 29 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/xen/common/memory.c b/xen/common/memory.c
>>>> index 714077c1e5..3ab06581a2 100644
>>>> --- a/xen/common/memory.c
>>>> +++ b/xen/common/memory.c
>>>> @@ -1046,10 +1046,12 @@ static int acquire_grant_table(struct domain *d, unsigned int id,
>>>>    }
>>>>
>>>>    static int acquire_resource(
>>>> -    XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg)
>>>> +    XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg,
>>>> +    unsigned long *start_extent)
>>>>    {
>>>>        struct domain *d, *currd = current->domain;
>>>>        xen_mem_acquire_resource_t xmar;
>>>> +    uint32_t total_frames;
>>>>        /*
>>>>         * The mfn_list and gfn_list (below) arrays are ok on stack for the
>>>>         * moment since they are small, but if they need to grow in future
>>>> @@ -1077,8 +1079,17 @@ static int acquire_resource(
>>>>            return 0;
>>>>        }
>>>>
>>>> +    total_frames = xmar.nr_frames;
>>>
>>> On 32-bit, start_extent would be 26 bits wide, which is not enough to
>>> cover all of xmar.nr_frames. Therefore, you want to check that it is
>>> possible to encode a continuation. Something like:
>>>
>>> /* Is the size too large for us to encode a continuation? */
>>> if ( unlikely(xmar.nr_frames > (UINT_MAX >> MEMOP_EXTENT_SHIFT)) )
>>>
>>>> +
>>>> +    if ( *start_extent )
>>>> +    {
>>>> +        xmar.frame += *start_extent;
>>>> +        xmar.nr_frames -= *start_extent;
>>>
>>> As start_extent is exposed to the guest, you want to check if it is not
>>> bigger than xmar.nr_frames.
>>>
>>>> +        guest_handle_add_offset(xmar.frame_list, *start_extent);
>>>> +    }
>>>> +
>>>>        if ( xmar.nr_frames > ARRAY_SIZE(mfn_list) )
>>>> -        return -E2BIG;
>>>> +        xmar.nr_frames = ARRAY_SIZE(mfn_list);
>>>
>>> The documentation of the hypercall suggests that if you pass NULL, then
>>> it will return the maximum value for nr_frames supported by the
>>> implementation. So technically a domain cannot use more than
>>> ARRAY_SIZE(mfn_list).
>>>
>>> However, your new addition conflicts with the documentation. Can you
>>> clarify how a domain will know that it can use more than
>>> ARRAY_SIZE(mfn_list)?
>>
>> The domain should not need to know. It should be told the maximum number of frames of the type it wants. If we have to carve that up into batches inside Xen then the caller should not need to care, right?
> 
> In the current implementation, we tell the guest how many frames it can
> request in a batch. This number may be much smaller than the maximum
> number of frames of the type.
>
> Furthermore, this value is not tied to xmar.type. Therefore, it is
> valid for a guest to call this hypercall only once at boot to figure out
> the maximum batch.
>
> So while the change you suggest looks like a good idea, I don't think it is
> possible to do that with the current hypercall.

Doesn't the limit simply change to UINT_MAX >> MEMOP_EXTENT_SHIFT,
which then is what should be reported?
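The suggested limit can be sketched as follows, assuming a 32-bit `unsigned int` and MEMOP_EXTENT_SHIFT of 6 as in Xen's public memory.h. The value reported to a caller that passes a NULL frame_list would become the largest frame count whose progress still fits in `cmd` above the sub-op bits. Both helper names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors Xen's public memory.h: bits [5:0] of cmd hold the sub-op. */
#define MEMOP_EXTENT_SHIFT 6

/*
 * Hypothetical helper: the largest nr_frames that remains resumable when
 * progress is stored above the sub-op bits, i.e. the value a NULL
 * frame_list query would report instead of ARRAY_SIZE(mfn_list).
 */
static unsigned int max_acquire_frames(void)
{
    return UINT_MAX >> MEMOP_EXTENT_SHIFT;
}

/* Hypothetical helper: can a request of this size encode a continuation? */
static bool continuation_encodable(uint32_t nr_frames)
{
    return nr_frames <= max_acquire_frames();
}
```

With a 32-bit `unsigned int` this is 2^26 - 1 = 67,108,863 frames, just under 256GiB worth of 4KB frames.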

>>>> @@ -1135,6 +1146,14 @@ static int acquire_resource(
>>>>            }
>>>>        }
>>>>
>>>> +    if ( !rc )
>>>> +    {
>>>> +        *start_extent += xmar.nr_frames;
>>>> +
>>>> +        if ( *start_extent != total_frames )
>>>> +            rc = -ERESTART;
>>>> +    }
>>>> +
>>>>     out:
>>>>        rcu_unlock_domain(d);
>>>>
>>>> @@ -1600,7 +1619,14 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>
>>>>        case XENMEM_acquire_resource:
>>>>            rc = acquire_resource(
>>>> -            guest_handle_cast(arg, xen_mem_acquire_resource_t));
>>>> +            guest_handle_cast(arg, xen_mem_acquire_resource_t),
>>>> +            &start_extent);
>>>
>>> Hmmm... it looks like we forgot to check that start_extent is always 0
>>> when the hypercall was added.
>>>
>>> As this is exposed to the guest, it technically means that there is no
>>> guarantee that start_extent will always be 0.
>>>
>>
>> I don't follow. A start extent != 0 means you are in a continuation. How can you check for 0 without breaking continuations?
> 
> I think you misunderstood my point. My point is that we never checked that
> start_extent was 0. So a guest could validly pass a non-zero value in
> start_extent and not break on older Xen releases.
>
> Once this patch is merged, such a guest would behave differently. Or
> did I miss any check/documentation for the start_extent value?

I think we may have done the same in the past already when enabling
sub-ops for use of continuations. A guest specifying a non-zero
start_extent itself is effectively a request for an undefined sub-op,
with undefined behavior as a result.

Jan



* Re: [PATCH v4 06/10] memory: batch processing in acquire_resource()
  2020-07-03 11:22         ` Jan Beulich
@ 2020-07-03 11:36           ` Julien Grall
  2020-07-03 12:50             ` Jan Beulich
  0 siblings, 1 reply; 75+ messages in thread
From: Julien Grall @ 2020-07-03 11:36 UTC (permalink / raw)
  To: Jan Beulich
  Cc: 'Stefano Stabellini', tamas.lengyel, 'Wei Liu',
	paul, 'Andrew Cooper',
	'Michał Leszczyński', 'Ian Jackson',
	'George Dunlap',
	luwei.kang, xen-devel

Hi,

On 03/07/2020 12:22, Jan Beulich wrote:
> On 03.07.2020 13:17, Julien Grall wrote:
>> On 03/07/2020 11:52, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Julien Grall <julien@xen.org>
>>>> Sent: 03 July 2020 11:36
>>>> To: Michał Leszczyński <michal.leszczynski@cert.pl>; xen-devel@lists.xenproject.org
>>>> Cc: luwei.kang@intel.com; tamas.lengyel@intel.com; Andrew Cooper <andrew.cooper3@citrix.com>; George
>>>> Dunlap <george.dunlap@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>; Jan Beulich
>>>> <jbeulich@suse.com>; Stefano Stabellini <sstabellini@kernel.org>; Wei Liu <wl@xen.org>; paul@xen.org
>>>> Subject: Re: [PATCH v4 06/10] memory: batch processing in acquire_resource()
>>>>
>>>> (+ Paul as the author of XENMEM_acquire_resource)
>>>>
>>>> Hi,
>>>>
>>>> On 30/06/2020 13:33, Michał Leszczyński wrote:
>>>>> From: Michal Leszczynski <michal.leszczynski@cert.pl>
>>>>>
>>>>> Allow to acquire large resources by allowing acquire_resource()
>>>>> to process items in batches, using hypercall continuation.
>>>>>
>>>>> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
>>>>> ---
>>>>>     xen/common/memory.c | 32 +++++++++++++++++++++++++++++---
>>>>>     1 file changed, 29 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/xen/common/memory.c b/xen/common/memory.c
>>>>> index 714077c1e5..3ab06581a2 100644
>>>>> --- a/xen/common/memory.c
>>>>> +++ b/xen/common/memory.c
>>>>> @@ -1046,10 +1046,12 @@ static int acquire_grant_table(struct domain *d, unsigned int id,
>>>>>     }
>>>>>
>>>>>     static int acquire_resource(
>>>>> -    XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg)
>>>>> +    XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg,
>>>>> +    unsigned long *start_extent)
>>>>>     {
>>>>>         struct domain *d, *currd = current->domain;
>>>>>         xen_mem_acquire_resource_t xmar;
>>>>> +    uint32_t total_frames;
>>>>>         /*
>>>>>          * The mfn_list and gfn_list (below) arrays are ok on stack for the
>>>>>          * moment since they are small, but if they need to grow in future
>>>>> @@ -1077,8 +1079,17 @@ static int acquire_resource(
>>>>>             return 0;
>>>>>         }
>>>>>
>>>>> +    total_frames = xmar.nr_frames;
>>>>
>>>> On 32-bit, start_extent would be 26 bits wide, which is not enough to
>>>> cover all of xmar.nr_frames. Therefore, you want to check that it is
>>>> possible to encode a continuation. Something like:
>>>>
>>>> /* Is the size too large for us to encode a continuation? */
>>>> if ( unlikely(xmar.nr_frames > (UINT_MAX >> MEMOP_EXTENT_SHIFT)) )
>>>>
>>>>> +
>>>>> +    if ( *start_extent )
>>>>> +    {
>>>>> +        xmar.frame += *start_extent;
>>>>> +        xmar.nr_frames -= *start_extent;
>>>>
>>>> As start_extent is exposed to the guest, you want to check if it is not
>>>> bigger than xmar.nr_frames.
>>>>
>>>>> +        guest_handle_add_offset(xmar.frame_list, *start_extent);
>>>>> +    }
>>>>> +
>>>>>         if ( xmar.nr_frames > ARRAY_SIZE(mfn_list) )
>>>>> -        return -E2BIG;
>>>>> +        xmar.nr_frames = ARRAY_SIZE(mfn_list);
>>>>
>>>> The documentation of the hypercall suggests that if you pass NULL, then
>>>> it will return the maximum value for nr_frames supported by the
>>>> implementation. So technically a domain cannot use more than
>>>> ARRAY_SIZE(mfn_list).
>>>>
>>>> However, your new addition conflicts with the documentation. Can you
>>>> clarify how a domain will know that it can use more than
>>>> ARRAY_SIZE(mfn_list)?
>>>
>>> The domain should not need to know. It should be told the maximum number of frames of the type it wants. If we have to carve that up into batches inside Xen then the caller should not need to care, right?
>>
>> In the current implementation, we tell the guest how many frames it can
>> request in a batch. This number may be much smaller than the maximum
>> number of frames of the type.
>>
>> Furthermore, this value is not tied to xmar.type. Therefore, it is
>> valid for a guest to call this hypercall only once at boot to figure out
>> the maximum batch.
>>
>> So while the change you suggest looks like a good idea, I don't think it is
>> possible to do that with the current hypercall.
> 
> Doesn't the limit simply change to UINT_MAX >> MEMOP_EXTENT_SHIFT,
> which then is what should be reported?

Hmmm... Can you remind me whether we support migration to an older release?

But it may still not be a concern, as this can only be used by Dom0 or a
PV domain targeting another domain.

> 
>>>>> @@ -1135,6 +1146,14 @@ static int acquire_resource(
>>>>>             }
>>>>>         }
>>>>>
>>>>> +    if ( !rc )
>>>>> +    {
>>>>> +        *start_extent += xmar.nr_frames;
>>>>> +
>>>>> +        if ( *start_extent != total_frames )
>>>>> +            rc = -ERESTART;
>>>>> +    }
>>>>> +
>>>>>      out:
>>>>>         rcu_unlock_domain(d);
>>>>>
>>>>> @@ -1600,7 +1619,14 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>>
>>>>>         case XENMEM_acquire_resource:
>>>>>             rc = acquire_resource(
>>>>> -            guest_handle_cast(arg, xen_mem_acquire_resource_t));
>>>>> +            guest_handle_cast(arg, xen_mem_acquire_resource_t),
>>>>> +            &start_extent);
>>>>
>>>> Hmmm... it looks like we forgot to check that start_extent is always 0
>>>> when the hypercall was added.
>>>>
>>>> As this is exposed to the guest, it technically means that there is no
>>>> guarantee that start_extent will always be 0.
>>>>
>>>
>>> I don't follow. A start extent != 0 means you are in a continuation. How can you check for 0 without breaking continuations?
>>
>> I think you misunderstood my point. My point is that we never checked that
>> start_extent was 0. So a guest could validly pass a non-zero value in
>> start_extent and not break on older Xen releases.
>>
>> Once this patch is merged, such a guest would behave differently. Or
>> did I miss any check/documentation for the start_extent value?
> 
> I think we may have done the same in the past already when enabling
> sub-ops for use of continuations. A guest specifying a non-zero
> start_extent itself is effectively a request for an undefined sub-op,
> with undefined behavior as a result.

OK. So just mentioning the change in the commit message should be fine, then.

Cheers,

-- 
Julien Grall



* RE: [PATCH v4 06/10] memory: batch processing in acquire_resource()
  2020-07-03 11:17       ` Julien Grall
  2020-07-03 11:22         ` Jan Beulich
@ 2020-07-03 11:40         ` Paul Durrant
  1 sibling, 0 replies; 75+ messages in thread
From: Paul Durrant @ 2020-07-03 11:40 UTC (permalink / raw)
  To: 'Julien Grall', 'Michał Leszczyński',
	xen-devel
  Cc: 'Stefano Stabellini', tamas.lengyel, 'Wei Liu',
	'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Jan Beulich',
	luwei.kang

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 03 July 2020 12:18
> To: paul@xen.org; 'Michał Leszczyński' <michal.leszczynski@cert.pl>; xen-devel@lists.xenproject.org
> Cc: luwei.kang@intel.com; tamas.lengyel@intel.com; 'Andrew Cooper' <andrew.cooper3@citrix.com>;
> 'George Dunlap' <george.dunlap@citrix.com>; 'Ian Jackson' <ian.jackson@eu.citrix.com>; 'Jan Beulich'
> <jbeulich@suse.com>; 'Stefano Stabellini' <sstabellini@kernel.org>; 'Wei Liu' <wl@xen.org>
> Subject: Re: [PATCH v4 06/10] memory: batch processing in acquire_resource()
> 
> Hi,
> 
> On 03/07/2020 11:52, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: 03 July 2020 11:36
> >> To: Michał Leszczyński <michal.leszczynski@cert.pl>; xen-devel@lists.xenproject.org
> >> Cc: luwei.kang@intel.com; tamas.lengyel@intel.com; Andrew Cooper <andrew.cooper3@citrix.com>; George
> >> Dunlap <george.dunlap@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>; Jan Beulich
> >> <jbeulich@suse.com>; Stefano Stabellini <sstabellini@kernel.org>; Wei Liu <wl@xen.org>; paul@xen.org
> >> Subject: Re: [PATCH v4 06/10] memory: batch processing in acquire_resource()
> >>
> >> (+ Paul as the author of XENMEM_acquire_resource)
> >>
> >> Hi,
> >>
> >> On 30/06/2020 13:33, Michał Leszczyński wrote:
> >>> From: Michal Leszczynski <michal.leszczynski@cert.pl>
> >>>
> >>> Allow to acquire large resources by allowing acquire_resource()
> >>> to process items in batches, using hypercall continuation.
> >>>
> >>> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
> >>> ---
> >>>    xen/common/memory.c | 32 +++++++++++++++++++++++++++++---
> >>>    1 file changed, 29 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/xen/common/memory.c b/xen/common/memory.c
> >>> index 714077c1e5..3ab06581a2 100644
> >>> --- a/xen/common/memory.c
> >>> +++ b/xen/common/memory.c
> >>> @@ -1046,10 +1046,12 @@ static int acquire_grant_table(struct domain *d, unsigned int id,
> >>>    }
> >>>
> >>>    static int acquire_resource(
> >>> -    XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg)
> >>> +    XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg,
> >>> +    unsigned long *start_extent)
> >>>    {
> >>>        struct domain *d, *currd = current->domain;
> >>>        xen_mem_acquire_resource_t xmar;
> >>> +    uint32_t total_frames;
> >>>        /*
> >>>         * The mfn_list and gfn_list (below) arrays are ok on stack for the
> >>>         * moment since they are small, but if they need to grow in future
> >>> @@ -1077,8 +1079,17 @@ static int acquire_resource(
> >>>            return 0;
> >>>        }
> >>>
> >>> +    total_frames = xmar.nr_frames;
> >>
> >> On 32-bit, start_extent would be 26 bits wide, which is not enough to
> >> cover all of xmar.nr_frames. Therefore, you want to check that it is
> >> possible to encode a continuation. Something like:
> >>
> >> /* Is the size too large for us to encode a continuation? */
> >> if ( unlikely(xmar.nr_frames > (UINT_MAX >> MEMOP_EXTENT_SHIFT)) )
> >>
> >>> +
> >>> +    if ( *start_extent )
> >>> +    {
> >>> +        xmar.frame += *start_extent;
> >>> +        xmar.nr_frames -= *start_extent;
> >>
> >> As start_extent is exposed to the guest, you want to check if it is not
> >> bigger than xmar.nr_frames.
> >>
> >>> +        guest_handle_add_offset(xmar.frame_list, *start_extent);
> >>> +    }
> >>> +
> >>>        if ( xmar.nr_frames > ARRAY_SIZE(mfn_list) )
> >>> -        return -E2BIG;
> >>> +        xmar.nr_frames = ARRAY_SIZE(mfn_list);
> >>
> >> The documentation of the hypercall suggests that if you pass NULL, then
> >> it will return the maximum value for nr_frames supported by the
> >> implementation. So technically a domain cannot use more than
> >> ARRAY_SIZE(mfn_list).
> >>
> >> However, your new addition conflicts with the documentation. Can you
> >> clarify how a domain will know that it can use more than
> >> ARRAY_SIZE(mfn_list)?
> >
> > The domain should not need to know. It should be told the maximum number of frames of the type it
> > wants. If we have to carve that up into batches inside Xen then the caller should not need to care,
> > right?
> 
> In the current implementation, we tell the guest how many frames it can
> request in a batch. This number may be much smaller than the maximum
> number of frames of the type.
>
> Furthermore, this value is not tied to xmar.type. Therefore, it is
> valid for a guest to call this hypercall only once at boot to figure out
> the maximum batch.
>
> So while the change you suggest looks like a good idea, I don't think it is
> possible to do that with the current hypercall.
> 

Oh, I was clearly misremembering the semantics; I thought it was the implementation maximum for the given type, but indeed we just return the array size, so we expect the caller to know the individual resource type limitations.
So, as Jan says, passing back UINT_MAX >> MEMOP_EXTENT_SHIFT seems to be what we need.

  Paul




* Re: [PATCH v4 06/10] memory: batch processing in acquire_resource()
  2020-07-03 11:36           ` Julien Grall
@ 2020-07-03 12:50             ` Jan Beulich
  0 siblings, 0 replies; 75+ messages in thread
From: Jan Beulich @ 2020-07-03 12:50 UTC (permalink / raw)
  To: Julien Grall
  Cc: 'Stefano Stabellini', tamas.lengyel, 'Wei Liu',
	paul, 'Andrew Cooper',
	'Michał Leszczyński', 'Ian Jackson',
	'George Dunlap',
	luwei.kang, xen-devel

On 03.07.2020 13:36, Julien Grall wrote:
> On 03/07/2020 12:22, Jan Beulich wrote:
>> On 03.07.2020 13:17, Julien Grall wrote:
>>> In the current implementation, we tell the guest how many frames it can
>>> request in a batch. This number may be much smaller than the maximum
>>> number of frames of the type.
>>>
>>> Furthermore, this value is not tied to xmar.type. Therefore, it is
>>> valid for a guest to call this hypercall only once at boot to figure out
>>> the maximum batch.
>>>
>>> So while the change you suggest looks like a good idea, I don't think it is
>>> possible to do that with the current hypercall.
>>
>> Doesn't the limit simply change to UINT_MAX >> MEMOP_EXTENT_SHIFT,
>> which then is what should be reported?
> 
> Hmmm... Can you remind me whether we support migration to an older release?

I'm pretty sure we say "N -> N+1 only" somewhere, but this "somewhere"
clearly isn't SUPPORT.md.

Jan



* Re: [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter
  2020-07-03 10:11           ` Roger Pau Monné
@ 2020-07-04 17:23             ` Julien Grall
  2020-07-06  8:46               ` Jan Beulich
  0 siblings, 1 reply; 75+ messages in thread
From: Julien Grall @ 2020-07-04 17:23 UTC (permalink / raw)
  To: Roger Pau Monné, Jan Beulich
  Cc: Stefano Stabellini, tamas lengyel, Wei Liu, Andrew Cooper,
	Michał Leszczyński, Ian Jackson, George Dunlap,
	luwei kang, Anthony PERARD, xen-devel

Hi,

On 03/07/2020 11:11, Roger Pau Monné wrote:
> On Fri, Jul 03, 2020 at 11:56:38AM +0200, Jan Beulich wrote:
>> On 03.07.2020 11:44, Roger Pau Monné wrote:
>>> On Thu, Jul 02, 2020 at 06:23:28PM +0200, Michał Leszczyński wrote:
>>>> ----- On 2 Jul 2020 at 11:00, Roger Pau Monné roger.pau@citrix.com wrote:
>>>>
>>>>> On Tue, Jun 30, 2020 at 02:33:46PM +0200, Michał Leszczyński wrote:
>>>>>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>>>>>> index 59bdc28c89..7b8289d436 100644
>>>>>> --- a/xen/include/public/domctl.h
>>>>>> +++ b/xen/include/public/domctl.h
>>>>>> @@ -92,6 +92,7 @@ struct xen_domctl_createdomain {
>>>>>>       uint32_t max_evtchn_port;
>>>>>>       int32_t max_grant_frames;
>>>>>>       int32_t max_maptrack_frames;
>>>>>> +    uint8_t vmtrace_pt_order;
>>>>>
>>>>> I've been thinking about this, and even though this is a domctl (so
>>>>> not a stable interface) we might want to consider using a size (or a
>>>>> number of pages) here rather than an order. IPT also supports
>>>>> ToPA mode (kind of a linked list of buffers) that would allow for
>>>>> sizes not rounded to order boundaries to be used, since then only each
>>>>> item in the linked list needs to be rounded to an order boundary, so
>>>>> you could for example use three 4K pages in TOPA mode AFAICT.
>>>>>
>>>>> Roger.
>>>>
>>>> In previous versions it was "size", but it was requested to change it
>>>> to "order" in order to shrink the variable from uint64_t to
>>>> uint8_t, because there is limited space in the xen_domctl_createdomain
>>>> structure.
>>>
>>> It's likely I'm missing something here, but I wasn't aware
>>> xen_domctl_createdomain had any constraints regarding its size. It's
>>> currently 48 bytes, which seems fairly small.
>>
>> Additionally I would guess a uint32_t could do here, if the value
>> passed was "number of pages" rather than "number of bytes"?

Looking at the rest of the code, the toolstack accepts a 64-bit value.
So this would lead to truncation of the buffer size if it is bigger than
2^44 bytes.

I agree such a buffer is unlikely, yet I still think we want to harden the
code whenever we can. So the solution is to either check for truncation in
libxl or directly use a 64-bit field in the domctl.

My preference is the latter.
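For comparison, the first of those two options (refusing rather than truncating in libxl) might look like the sketch below. It assumes the 4KB hypercall granularity and a 32-bit frame-count field in the domctl; `bytes_to_nr_frames()`, `XEN_PAGE_SHIFT` and `XEN_PAGE_SIZE` are local, hypothetical names rather than existing libxl code.

```c
#include <errno.h>
#include <stdint.h>

/* Local stand-ins: hypercall ABI granularity is 4KB. */
#define XEN_PAGE_SHIFT 12
#define XEN_PAGE_SIZE  (UINT64_C(1) << XEN_PAGE_SHIFT)

/*
 * Hypothetical helper: convert a 64-bit byte count from the toolstack into
 * a 32-bit frame count, failing instead of silently truncating.
 */
static int bytes_to_nr_frames(uint64_t bytes, uint32_t *nr_frames)
{
    uint64_t frames;

    if ( bytes & (XEN_PAGE_SIZE - 1) )
        return -EINVAL;          /* not a whole number of 4KB frames */

    frames = bytes >> XEN_PAGE_SHIFT;
    if ( frames > UINT32_MAX )
        return -E2BIG;           /* 2^44 bytes or more would not fit */

    *nr_frames = (uint32_t)frames;
    return 0;
}
```

A 2^44-byte request (2^32 frames) is rejected with -E2BIG here, whereas a plain cast would have turned it into zero frames.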

> 
> That could work, not sure if it needs to state however that those will
> be 4K pages, since Arm can have a different minimum page size IIRC?
> (or that's already the assumption for all number of frames fields)
> vmtrace_nr_frames seems fine to me.

The hypercall interface uses the same page granularity as the
hypervisor (i.e. 4KB).

While we already support guests using 64KB page granularity, it is
impossible to have a 64KB Arm hypervisor in the current state. You are
going to either break existing guests (if you switch to 64KB page
granularity for the hypercall ABI) or render them insecure (the minimum
mapping in the P2M would be 64KB).

DOMCTLs are not stable yet, so using a number of pages is OK. However, I
would strongly suggest using a number of bytes for any xl/libxl/stable
library interfaces, as this avoids confusion and also makes them more
future-proof.

Cheers,

-- 
Julien Grall



* Re: [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter
  2020-06-30 12:33 ` [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter Michał Leszczyński
                     ` (2 preceding siblings ...)
  2020-07-02 10:24   ` Anthony PERARD
@ 2020-07-04 17:48   ` Julien Grall
  3 siblings, 0 replies; 75+ messages in thread
From: Julien Grall @ 2020-07-04 17:48 UTC (permalink / raw)
  To: Michał Leszczyński, xen-devel
  Cc: Stefano Stabellini, tamas.lengyel, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Jan Beulich, Anthony PERARD,
	luwei.kang

Hi,

On 30/06/2020 13:33, Michał Leszczyński wrote:
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index 71709dc585..891e8e28d6 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -438,6 +438,14 @@
>    */
>   #define LIBXL_HAVE_CREATEINFO_PASSTHROUGH 1
>   
> +/*
> + * LIBXL_HAVE_VMTRACE_PT_ORDER indicates that
> + * libxl_domain_create_info has a vmtrace_pt_order parameter, which
> + * allows to enable pre-allocation of processor tracing buffers
> + * with the given order of size.
> + */
> +#define LIBXL_HAVE_VMTRACE_PT_ORDER 1
> +
>   /*
>    * libxl ABI compatibility
>    *
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index 75862dc6ed..651d1f4c0f 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -608,6 +608,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
>               .max_evtchn_port = b_info->event_channels,
>               .max_grant_frames = b_info->max_grant_frames,
>               .max_maptrack_frames = b_info->max_maptrack_frames,
> +            .vmtrace_pt_order = b_info->vmtrace_pt_order,
>           };
>   
>           if (info->type != LIBXL_DOMAIN_TYPE_PV) {
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 9d3f05f399..1c5dd43e4d 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -645,6 +645,8 @@ libxl_domain_build_info = Struct("domain_build_info",[
>       # supported by x86 HVM and ARM support is planned.
>       ("altp2m", libxl_altp2m_mode),
>   
> +    ("vmtrace_pt_order", integer),

libxl can be used by external projects (such as libvirt) for implementing
their own toolstack.

While on x86 you always have the same granularity, on Arm the hypervisor
and each guest may have a different page granularity (e.g. 4KB, 16KB,
64KB). So it is unclear what order one would have to use.

I think it would be best if the external user only specified the number of
bytes. You can then sanity-check the value and convert it to an order (or
number of pages) in libxl before passing the value to the hypervisor.
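A minimal sketch of that libxl-side conversion, assuming the domctl keeps an order relative to 4KB frames (as vmtrace_pt_order does in this version of the series) and that only power-of-two sizes of at least one frame are acceptable. `bytes_to_order()` is a hypothetical helper, not part of libxl.

```c
#include <errno.h>
#include <stdint.h>

/* Local stand-in: hypercall ABI granularity is 4KB. */
#define XEN_PAGE_SHIFT 12

/*
 * Hypothetical helper: turn a user-supplied size in bytes into the page
 * order expected by the domctl, rejecting sizes that are not a
 * power-of-two multiple of 4KB.
 */
static int bytes_to_order(uint64_t bytes, uint8_t *order)
{
    uint8_t o = 0;

    /* Must be non-zero and a power of two ... */
    if ( bytes == 0 || (bytes & (bytes - 1)) )
        return -EINVAL;
    /* ... and cover at least one 4KB frame. */
    if ( bytes < (UINT64_C(1) << XEN_PAGE_SHIFT) )
        return -EINVAL;

    /* Count how many doublings of one frame make up the buffer. */
    bytes >>= XEN_PAGE_SHIFT;
    while ( bytes > 1 )
    {
        bytes >>= 1;
        o++;
    }

    *order = o;
    return 0;
}
```

For example, 64KB maps to order 4 and 2MB to order 9, while 12KB is rejected because it is not a power of two.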

Cheers,

-- 
Julien Grall



* Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature
  2020-07-03  7:58                                         ` Julien Grall
@ 2020-07-04 19:16                                           ` Michał Leszczyński
  0 siblings, 0 replies; 75+ messages in thread
From: Michał Leszczyński @ 2020-07-04 19:16 UTC (permalink / raw)
  To: Julien Grall
  Cc: Kevin Tian, Stefano Stabellini, tamas lengyel, Jan Beulich,
	Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jun Nakajima,
	xen-devel, luwei kang, Roger Pau Monné

----- On 3 Jul 2020 at 9:58, Julien Grall julien@xen.org wrote:

> Hi,
> 
> On 02/07/2020 21:28, Michał Leszczyński wrote:
>> ----- On 2 Jul 2020 at 16:31, Julien Grall julien@xen.org wrote:
>> 
>>> On 02/07/2020 15:17, Jan Beulich wrote:
>>>> On 02.07.2020 16:14, Julien Grall wrote:
>>>>> On 02/07/2020 14:30, Jan Beulich wrote:
>>>>>> On 02.07.2020 11:57, Julien Grall wrote:
>>>>>>> On 02/07/2020 10:18, Jan Beulich wrote:
>>>>>>>> On 02.07.2020 10:54, Julien Grall wrote:
>>>>>>>>> On 02/07/2020 09:50, Jan Beulich wrote:
>>>>>>>>>> On 02.07.2020 10:42, Julien Grall wrote:
>>>>>>>>>>> On 02/07/2020 09:29, Jan Beulich wrote:
>>>>>>> Another way to do it would be for the toolstack to do the mapping, at which
>>>>>>> point you still need a hypercall to do the mapping (probably the
>>>>>>> acquire hypercall).
>>>>>>
>>>>>> There may not be any mapping to do in such a contrived, fixed-range
>>>>>> environment. This scenario was specifically to demonstrate that the
>>>>>> way the mapping gets done may be arch-specific (here: a no-op)
>>>>>> despite the allocation not being so.
>>>>> You are arguing about extreme cases, which I don't think is really helpful
>>>>> here. Yes, if you want to map at a fixed address in a guest you may not
>>>>> need the acquire hypercall. But in most of the other cases (such as for
>>>>> the tools) you will need it.
>>>>>
>>>>> So what's the problem with requesting to have the acquire hypercall
>>>>> implemented in common code?
>>>>
>>>> Didn't we start out by you asking that there be as little common code
>>>> as possible for the time being?
>>>
>>> Well, as I said, I am not in favor of having the allocation in common
>>> code, but if you want to keep it then you also want to implement
>>> map/unmap in the common code ([1], [2]).
>>>
>>>> I have no issue with putting the
>>>> acquire implementation there ...
>>> This was definitely not clear given how you argued with extreme cases...
>>>
>>> Cheers,
>>>
>>> [1] <9a3f4d58-e5ad-c7a1-6c5f-42aa92101ca1@xen.org>
>>> [2] <cf41855b-9e5e-13f2-9ab0-04b98f8b3cdd@xen.org>
>>>
>>> --
>>> Julien Grall
>> 
>> 
>> Guys,
>> 
>> could you express your final decision on this topic?
> 
> Can you move the acquire implementation from x86 to common code?
> 
> Cheers,
> 
> --
> Julien Grall


OK, sure. This will be done in patch v5.

Best regards,
Michał Leszczyński
CERT Polska



* Re: [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter
  2020-07-04 17:23             ` Julien Grall
@ 2020-07-06  8:46               ` Jan Beulich
  2020-07-07  8:44                 ` Julien Grall
  0 siblings, 1 reply; 75+ messages in thread
From: Jan Beulich @ 2020-07-06  8:46 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, tamas lengyel, Wei Liu, Andrew Cooper,
	Michał Leszczyński, Ian Jackson, George Dunlap,
	luwei kang, Anthony PERARD, xen-devel, Roger Pau Monné

On 04.07.2020 19:23, Julien Grall wrote:
> Hi,
> 
> On 03/07/2020 11:11, Roger Pau Monné wrote:
>> On Fri, Jul 03, 2020 at 11:56:38AM +0200, Jan Beulich wrote:
>>> On 03.07.2020 11:44, Roger Pau Monné wrote:
>>>> On Thu, Jul 02, 2020 at 06:23:28PM +0200, Michał Leszczyński wrote:
>>>>> ----- On 2 Jul 2020 at 11:00, Roger Pau Monné roger.pau@citrix.com wrote:
>>>>>
>>>>>> On Tue, Jun 30, 2020 at 02:33:46PM +0200, Michał Leszczyński wrote:
>>>>>>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>>>>>>> index 59bdc28c89..7b8289d436 100644
>>>>>>> --- a/xen/include/public/domctl.h
>>>>>>> +++ b/xen/include/public/domctl.h
>>>>>>> @@ -92,6 +92,7 @@ struct xen_domctl_createdomain {
>>>>>>>       uint32_t max_evtchn_port;
>>>>>>>       int32_t max_grant_frames;
>>>>>>>       int32_t max_maptrack_frames;
>>>>>>> +    uint8_t vmtrace_pt_order;
>>>>>>
>>>>>> I've been thinking about this, and even though this is a domctl (so
>>>>>> not a stable interface) we might want to consider using a size (or a
>>>>>> number of pages) here rather than an order. IPT also supports
>>>>>> TOPA mode (kind of a linked list of buffers) that would allow for
>>>>>> sizes not rounded to order boundaries to be used, since then only each
>>>>>> item in the linked list needs to be rounded to an order boundary, so
>>>>>> you could for example use three 4K pages in TOPA mode AFAICT.
>>>>>>
>>>>>> Roger.
>>>>>
>>>>> In previous versions it was "size", but I was asked to change it
>>>>> to "order" so as to shrink the field from uint64_t to
>>>>> uint8_t, because there is limited space in the xen_domctl_createdomain
>>>>> structure.
>>>>
>>>> It's likely I'm missing something here, but I wasn't aware
>>>> xen_domctl_createdomain had any constraints regarding its size. It's
>>>> currently 48 bytes, which seems fairly small.
>>>
>>> Additionally I would guess a uint32_t could do here, if the value
>>> passed was "number of pages" rather than "number of bytes"?
> Looking at the rest of the code, the toolstack accepts a 64-bit value. 
> So this would lead to truncation of the buffer if it is bigger than 2^44 
> bytes.
> 
> I agree such a buffer is unlikely, yet I still think we want to harden the
> code whenever we can. So the solution is to either check for
> truncation in libxl or directly use a 64-bit field in the domctl.
> 
> My preference is the latter.
> 
>>
>> That could work, though I'm not sure whether it needs to state that those will
>> be 4K pages, since Arm can have a different minimum page size IIRC?
>> (Or is that already the assumption for all number-of-frames fields?)
>> vmtrace_nr_frames seems fine to me.
> 
> The hypercall interface uses the same page granularity as the
> hypervisor (i.e. 4KB).
> 
> While we already support guests using 64KB page granularity, it is
> impossible to have a 64KB Arm hypervisor in the current state. You are
> going to either break existing guests (if you switch to 64KB page
> granularity for the hypercall ABI) or render them insecure (the minimum
> mapping in the P2M would be 64KB).
> 
> DOMCTLs are not stable yet, so using a number of pages is OK. However, I
> would strongly suggest using a number of bytes for any xl/libxl/stable
> library interfaces, as this avoids confusion and is also more
> future-proof.

If we can't settle on what "page size" means in the public interface
(which IMO is embarrassing), then how about going with a number of kB,
like other libxl memory controls do? (I guess using MB, in line with
other config file controls, may end up being too coarse here.) This
would likely still allow a 32-bit field to be wide enough.

Jan



* Re: [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter
  2020-07-06  8:46               ` Jan Beulich
@ 2020-07-07  8:44                 ` Julien Grall
  2020-07-07  9:10                   ` Jan Beulich
  0 siblings, 1 reply; 75+ messages in thread
From: Julien Grall @ 2020-07-07  8:44 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, tamas lengyel, Wei Liu, Andrew Cooper,
	Michał Leszczyński, Ian Jackson, George Dunlap,
	luwei kang, Anthony PERARD, xen-devel, Roger Pau Monné

Hi,

On 06/07/2020 09:46, Jan Beulich wrote:
> If we can't settle on what "page size" means in the public interface
> (which IMO is embarrassing), then how about going with a number of kB,
> like other libxl memory controls do? (I guess using MB, in line with
> other config file controls, may end up being too coarse here.) This
> would likely still allow a 32-bit field to be wide enough.

A 32-bit field would definitely not be able to cover a full address
space. So would you mind explaining what upper bound you expect here?

Cheers,

-- 
Julien Grall



* Re: [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter
  2020-07-07  8:44                 ` Julien Grall
@ 2020-07-07  9:10                   ` Jan Beulich
  2020-07-07  9:16                     ` Julien Grall
  0 siblings, 1 reply; 75+ messages in thread
From: Jan Beulich @ 2020-07-07  9:10 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, tamas lengyel, Wei Liu, Andrew Cooper,
	Michał Leszczyński, Ian Jackson, George Dunlap,
	luwei kang, Anthony PERARD, xen-devel, Roger Pau Monné

On 07.07.2020 10:44, Julien Grall wrote:
> Hi,
> 
> On 06/07/2020 09:46, Jan Beulich wrote:
>> If we can't settle on what "page size" means in the public interface
>> (which IMO is embarrassing), then how about going with a number of kB,
>> like other libxl memory controls do? (I guess using MB, in line with
>> other config file controls, may end up being too coarse here.) This
>> would likely still allow a 32-bit field to be wide enough.
> 
> A 32-bit field would definitely not be able to cover a full address
> space. So would you mind explaining what upper bound you expect here?

Do you foresee a need for buffer sizes of 4TiB and up?

Jan



* Re: [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter
  2020-07-07  9:10                   ` Jan Beulich
@ 2020-07-07  9:16                     ` Julien Grall
  2020-07-07 11:17                       ` Michał Leszczyński
  0 siblings, 1 reply; 75+ messages in thread
From: Julien Grall @ 2020-07-07  9:16 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, tamas lengyel, Wei Liu, Andrew Cooper,
	Michał Leszczyński, Ian Jackson, George Dunlap,
	luwei kang, Anthony PERARD, xen-devel, Roger Pau Monné



On 07/07/2020 10:10, Jan Beulich wrote:
> On 07.07.2020 10:44, Julien Grall wrote:
>> Hi,
>>
>> On 06/07/2020 09:46, Jan Beulich wrote:
>>> If we can't settle on what "page size" means in the public interface
>>> (which IMO is embarrassing), then how about going with a number of kB,
>>> like other libxl memory controls do? (I guess using MB, in line with
>>> other config file controls, may end up being too coarse here.) This
>>> would likely still allow a 32-bit field to be wide enough.
>>
>> A 32-bit field would definitely not be able to cover a full address
>> space. So would you mind explaining what upper bound you expect here?
> 
> Do you foresee a need for buffer sizes of 4TiB and up?

Not that I am aware of... However, I think the question was worth asking,
given that "wide enough" can mean anything.

Cheers,

-- 
Julien Grall



* Re: [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter
  2020-07-07  9:16                     ` Julien Grall
@ 2020-07-07 11:17                       ` Michał Leszczyński
  2020-07-07 11:21                         ` Jan Beulich
  0 siblings, 1 reply; 75+ messages in thread
From: Michał Leszczyński @ 2020-07-07 11:17 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, tamas lengyel, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, luwei kang, Jan Beulich,
	Anthony PERARD, xen-devel, Roger Pau Monné

----- On 7 Jul 2020 at 11:16, Julien Grall julien@xen.org wrote:

> On 07/07/2020 10:10, Jan Beulich wrote:
>> On 07.07.2020 10:44, Julien Grall wrote:
>>> Hi,
>>>
>>> On 06/07/2020 09:46, Jan Beulich wrote:
>>>> If we can't settle on what "page size" means in the public interface
>>>> (which IMO is embarrassing), then how about going with a number of kB,
>>>> like other libxl memory controls do? (I guess using MB, in line with
>>>> other config file controls, may end up being too coarse here.) This
>>>> would likely still allow a 32-bit field to be wide enough.
>>>
>>> A 32-bit field would definitely not be able to cover a full address
>>> space. So would you mind explaining what upper bound you expect here?
>> 
>> Do you foresee a need for buffer sizes of 4TiB and up?
> 
> Not that I am aware of... However, I think the question was worth asking,
> given that "wide enough" can mean anything.
> 
> Cheers,
> 
> --
> Julien Grall


So would it be OK to use uint32_t everywhere and to store the trace buffer
size as a number of kB? I think this is the most straightforward option.

I would also stick with the name "processor_trace_buf_size"
everywhere (in the hypervisor, the ABI and the toolstack), with
comments noting that the size is in kB.


Best regards,
Michał Leszczyński
CERT Polska



* Re: [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter
  2020-07-07 11:17                       ` Michał Leszczyński
@ 2020-07-07 11:21                         ` Jan Beulich
  2020-07-07 11:35                           ` Michał Leszczyński
  0 siblings, 1 reply; 75+ messages in thread
From: Jan Beulich @ 2020-07-07 11:21 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: Julien Grall, Stefano Stabellini, tamas lengyel, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, luwei kang,
	Anthony PERARD, xen-devel, Roger Pau Monné

On 07.07.2020 13:17, Michał Leszczyński wrote:
> So would it be OK to use uint32_t everywhere and to store the trace buffer
> size as a number of kB? I think this is the most straightforward option.
> 
> I would also stick with the name "processor_trace_buf_size"
> everywhere (in the hypervisor, the ABI and the toolstack), with
> comments noting that the size is in kB.

Perhaps even more clearly "processor_trace_buf_kb" then?

Jan



* Re: [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter
  2020-07-07 11:21                         ` Jan Beulich
@ 2020-07-07 11:35                           ` Michał Leszczyński
  0 siblings, 0 replies; 75+ messages in thread
From: Michał Leszczyński @ 2020-07-07 11:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Julien Grall, Stefano Stabellini, tamas lengyel, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, luwei kang,
	Anthony PERARD, xen-devel, Roger Pau Monné

----- On 7 Jul 2020 at 13:21, Jan Beulich jbeulich@suse.com wrote:

> On 07.07.2020 13:17, Michał Leszczyński wrote:
>> So would it be OK to use uint32_t everywhere and to store the trace buffer
>> size as a number of kB? I think this is the most straightforward option.
>> 
>> I would also stick with the name "processor_trace_buf_size"
>> everywhere (in the hypervisor, the ABI and the toolstack), with
>> comments noting that the size is in kB.
> 
> Perhaps even more clearly "processor_trace_buf_kb" then?
> 
> Jan


Ok.

Best regards,
Michał Leszczyński
CERT Polska



* Re: [PATCH v4 10/10] tools/proctrace: add proctrace tool
  2020-07-02 15:10   ` Andrew Cooper
@ 2020-07-21 10:52     ` Wei Liu
  0 siblings, 0 replies; 75+ messages in thread
From: Wei Liu @ 2020-07-21 10:52 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: tamas.lengyel, Wei Liu, Michał Leszczyński,
	Ian Jackson, luwei.kang, xen-devel

On Thu, Jul 02, 2020 at 04:10:57PM +0100, Andrew Cooper wrote:
[...]
> 
> > +#include <stdlib.h>
> > +#include <stdio.h>
> > +#include <sys/mman.h>
> > +#include <signal.h>
> > +
> > +#include <xenctrl.h>
> > +#include <xen/xen.h>
> > +#include <xenforeignmemory.h>
> > +
> > +#define BUF_SIZE (16384 * XC_PAGE_SIZE)
> 
> This hardcodes the size of the buffer, which is configurable per VM.
> Mapping the buffer fails when the buffer is smaller than this.
> 
> It appears there is still an outstanding bug from the acquire_resource work
> which never got fixed.  The guest_handle_is_null(xmar.frame_list) path
> in Xen is supposed to report the size of the resource, not the size of
> Xen's local buffer, so userspace can ask "how large is this resource".
> 
> I'll try and find some time to fix this and arrange for backports, but
> the current behaviour is nonsense, and problematic for new users.

I can't quite figure out whether this is a blocking comment for this tool
to be accepted. Can you clarify?

Wei.

> 
> ~Andrew



* Re: [PATCH v4 09/10] tools/libxc: add xc_vmtrace_* functions
  2020-06-30 12:33 ` [PATCH v4 09/10] tools/libxc: add xc_vmtrace_* functions Michał Leszczyński
@ 2020-07-21 10:52   ` Wei Liu
  0 siblings, 0 replies; 75+ messages in thread
From: Wei Liu @ 2020-07-21 10:52 UTC (permalink / raw)
  To: Michał Leszczyński
  Cc: xen-devel, tamas.lengyel, luwei.kang, Ian Jackson, Wei Liu

On Tue, Jun 30, 2020 at 02:33:52PM +0200, Michał Leszczyński wrote:
> From: Michal Leszczynski <michal.leszczynski@cert.pl>
> 
> Add functions in libxc that use the new XEN_DOMCTL_vmtrace interface.
> 
> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>

Acked-by: Wei Liu <wl@xen.org>

(Subject to acceptance of the hypervisor patches)



end of thread, other threads:[~2020-07-21 10:53 UTC | newest]

Thread overview: 75+ messages
2020-06-30 12:33 [PATCH v4 00/10] Implement support for external IPT monitoring Michał Leszczyński
2020-06-30 12:33 ` [PATCH v4 01/10] x86/vmx: add Intel PT MSR definitions Michał Leszczyński
2020-06-30 16:23   ` Jan Beulich
2020-06-30 17:37   ` Andrew Cooper
2020-06-30 18:03     ` Tamas K Lengyel
2020-06-30 18:27       ` Michał Leszczyński
2020-07-01 17:52   ` Andrew Cooper
2020-06-30 12:33 ` [PATCH v4 02/10] x86/vmx: add IPT cpu feature Michał Leszczyński
2020-07-01  9:49   ` Roger Pau Monné
2020-07-01 15:12   ` Julien Grall
2020-07-01 16:06     ` Andrew Cooper
2020-07-01 16:17       ` Julien Grall
2020-07-01 16:18         ` Julien Grall
2020-07-01 17:26           ` Andrew Cooper
2020-07-01 18:02             ` Julien Grall
2020-07-01 18:06               ` Andrew Cooper
2020-07-01 18:09                 ` Julien Grall
2020-07-02  8:29                   ` Jan Beulich
2020-07-02  8:42                     ` Julien Grall
2020-07-02  8:50                       ` Jan Beulich
2020-07-02  8:54                         ` Julien Grall
2020-07-02  9:18                           ` Jan Beulich
2020-07-02  9:57                             ` Julien Grall
2020-07-02 13:30                               ` Jan Beulich
2020-07-02 14:14                                 ` Julien Grall
2020-07-02 14:17                                   ` Jan Beulich
2020-07-02 14:31                                     ` Julien Grall
2020-07-02 20:28                                       ` Michał Leszczyński
2020-07-03  7:58                                         ` Julien Grall
2020-07-04 19:16                                           ` Michał Leszczyński
2020-07-01 21:42   ` Andrew Cooper
2020-07-02  8:10     ` Roger Pau Monné
2020-07-02  8:34       ` Jan Beulich
2020-07-02 20:29         ` Michał Leszczyński
2020-06-30 12:33 ` [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter Michał Leszczyński
2020-07-01 10:05   ` Roger Pau Monné
2020-07-02  9:00   ` Roger Pau Monné
2020-07-02 16:23     ` Michał Leszczyński
2020-07-03  9:44       ` Roger Pau Monné
2020-07-03  9:56         ` Jan Beulich
2020-07-03 10:11           ` Roger Pau Monné
2020-07-04 17:23             ` Julien Grall
2020-07-06  8:46               ` Jan Beulich
2020-07-07  8:44                 ` Julien Grall
2020-07-07  9:10                   ` Jan Beulich
2020-07-07  9:16                     ` Julien Grall
2020-07-07 11:17                       ` Michał Leszczyński
2020-07-07 11:21                         ` Jan Beulich
2020-07-07 11:35                           ` Michał Leszczyński
2020-07-02 10:24   ` Anthony PERARD
2020-07-04 17:48   ` Julien Grall
2020-06-30 12:33 ` [PATCH v4 04/10] x86/vmx: implement processor tracing for VMX Michał Leszczyński
2020-07-01 10:30   ` Roger Pau Monné
2020-06-30 12:33 ` [PATCH v4 05/10] common/domain: allocate vmtrace_pt_buffer Michał Leszczyński
2020-07-01 10:38   ` Roger Pau Monné
2020-07-01 15:35   ` Julien Grall
2020-06-30 12:33 ` [PATCH v4 06/10] memory: batch processing in acquire_resource() Michał Leszczyński
2020-07-01 10:46   ` Roger Pau Monné
2020-07-03 10:35   ` Julien Grall
2020-07-03 10:52     ` Paul Durrant
2020-07-03 11:17       ` Julien Grall
2020-07-03 11:22         ` Jan Beulich
2020-07-03 11:36           ` Julien Grall
2020-07-03 12:50             ` Jan Beulich
2020-07-03 11:40         ` Paul Durrant
2020-06-30 12:33 ` [PATCH v4 07/10] x86/mm: add vmtrace_buf resource type Michał Leszczyński
2020-07-01 10:52   ` Roger Pau Monné
2020-06-30 12:33 ` [PATCH v4 08/10] x86/domctl: add XEN_DOMCTL_vmtrace_op Michał Leszczyński
2020-07-01 11:00   ` Roger Pau Monné
2020-06-30 12:33 ` [PATCH v4 09/10] tools/libxc: add xc_vmtrace_* functions Michał Leszczyński
2020-07-21 10:52   ` Wei Liu
2020-06-30 12:33 ` [PATCH v4 10/10] tools/proctrace: add proctrace tool Michał Leszczyński
2020-07-02 15:10   ` Andrew Cooper
2020-07-21 10:52     ` Wei Liu
2020-06-30 12:48 ` [PATCH v4 00/10] Implement support for external IPT monitoring Hubert Jasudowicz
