qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/5] Make Intel PT configurable
@ 2021-09-09 14:41 Xiaoyao Li
  2021-09-09 14:41 ` [RFC PATCH 1/5] target/i386: Print CPUID subleaf info for unsupported feature Xiaoyao Li
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Xiaoyao Li @ 2021-09-09 14:41 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti, Eduardo Habkost; +Cc: xiaoyao.li, qemu-devel

Initial Intel PT support was added by making it as fixed feature set as
ICX's capabilities, which allowed different CPU model with PT enabled
live migration on ICX host. However, it breaks the PT exposure/working
on SPR machine. Because SPR has less PT capabilities regrading
CPUID(0x14,1):EBX[15:0].

This series aims to make Intel PT configurable that named CPU model can
define its own PT feature set and "-cpu host/max" can use the host pass
through feature set of PT.

At the same time, it also ensure existing named CPU model to generate
the same PT CPUID set as before to not break live migration. 

Xiaoyao Li (5):
  target/i386: Print CPUID subleaf info for unsupported feature
  target/i386: Introduce FeatureWordInfo for Intel PT CPUID leaf 0xD
  target/i386: Enable host pass through of Intel PT
  target/i386: Define specific PT feature set for IceLake-server and
    Snowridge
  target/i386: Access MSR_IA32_RTIT_ADDRn based on guest CPUID
    configuration

 target/i386/cpu.c     | 215 ++++++++++++++++++++++++++++--------------
 target/i386/cpu.h     |  40 +++++++-
 target/i386/kvm/kvm.c |   8 +-
 3 files changed, 186 insertions(+), 77 deletions(-)

-- 
2.27.0



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC PATCH 1/5] target/i386: Print CPUID subleaf info for unsupported feature
  2021-09-09 14:41 [RFC PATCH 0/5] Make Intel PT configurable Xiaoyao Li
@ 2021-09-09 14:41 ` Xiaoyao Li
  2021-10-15 15:12   ` Eduardo Habkost
  2021-09-09 14:41 ` [RFC PATCH 2/5] target/i386: Introduce FeatureWordInfo for Intel PT CPUID leaf 0xD Xiaoyao Li
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Xiaoyao Li @ 2021-09-09 14:41 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti, Eduardo Habkost; +Cc: xiaoyao.li, qemu-devel

Some CPUID leaves have meaningful subleaf index. Print the subleaf info
in feature_word_description for CPUID features.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/cpu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 97e250e8760d..a06473c9e84c 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -4219,8 +4219,9 @@ static char *feature_word_description(FeatureWordInfo *f, uint32_t bit)
         {
             const char *reg = get_register_name_32(f->cpuid.reg);
             assert(reg);
-            return g_strdup_printf("CPUID.%02XH:%s",
-                                   f->cpuid.eax, reg);
+            return g_strdup_printf("CPUID.%02XH_%02XH:%s",
+                                   f->cpuid.eax,
+                                   f->cpuid.needs_ecx ? f->cpuid.ecx : 0, reg);
         }
     case MSR_FEATURE_WORD:
         return g_strdup_printf("MSR(%02XH)",
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 2/5] target/i386: Introduce FeatureWordInfo for Intel PT CPUID leaf 0xD
  2021-09-09 14:41 [RFC PATCH 0/5] Make Intel PT configurable Xiaoyao Li
  2021-09-09 14:41 ` [RFC PATCH 1/5] target/i386: Print CPUID subleaf info for unsupported feature Xiaoyao Li
@ 2021-09-09 14:41 ` Xiaoyao Li
  2021-10-15 16:04   ` Eduardo Habkost
  2021-09-09 14:41 ` [RFC PATCH 3/5] target/i386: Enable host pass through of Intel PT Xiaoyao Li
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Xiaoyao Li @ 2021-09-09 14:41 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti, Eduardo Habkost; +Cc: xiaoyao.li, qemu-devel

CPUID leaf 0x14 subleaf 0x0 and 0x1 enumerate the resource and
capability of Intel PT.

Introduce FeatureWord FEAT_14_0_EBX, FEAT_14_1_EAX and FEAT_14_1_EBX,
and complete FEAT_14_0_ECX. Thus all the features of Intel PT can be
expanded when "-cpu host/max" and can be configured in named CPU model.

Regarding FEAT_14_1_EAX and FEAT_14_1_EBX, don't define the name for
them since each bit of them doesn't represent a standalone sub-feature
of Intel PT. However, explicitly mark them as migratable. So the
meaningful bits of them can be autoenabled in x86_cpu_expand_features().

It has special handling for FEAT_14_1_EAX[2:0], because the 3 bits as a
whole represents the number of PT ADDRn_CFG ranges. Thus it has special
handling in mark_unavailable_features() and x86_cpu_filter_features().

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/cpu.c | 87 +++++++++++++++++++++++++++++++++++++++++------
 target/i386/cpu.h | 37 +++++++++++++++++++-
 2 files changed, 112 insertions(+), 12 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index a06473c9e84c..58e98210f219 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -567,7 +567,7 @@ static CPUCacheInfo legacy_l3_cache = {
 /* generated packets which contain IP payloads have LIP values */
 #define INTEL_PT_IP_LIP          (1 << 31)
 #define INTEL_PT_ADDR_RANGES_NUM 0x2 /* Number of configurable address ranges */
-#define INTEL_PT_ADDR_RANGES_NUM_MASK 0x3
+#define INTEL_PT_ADDR_RANGES_NUM_MASK 0x7
 #define INTEL_PT_MTC_BITMAP      (0x0249 << 16) /* Support ART(0,3,6,9) */
 #define INTEL_PT_CYCLE_BITMAP    0x1fff         /* Support 0,2^(0~11) */
 #define INTEL_PT_PSB_BITMAP      (0x003f << 16) /* Support 2K,4K,8K,16K,32K,64K */
@@ -1161,17 +1161,32 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
         }
     },
 
+    [FEAT_14_0_EBX] = {
+        .type = CPUID_FEATURE_WORD,
+        .feat_names = {
+            [0] = "intel-pt-cr3-filter",
+            [1] = "intel-pt-PSB",
+            [2] = "intel-pt-ip-filter",
+            [3] = "intel-pt-mtc",
+            [4] = "intel-pt-ptwrite",
+            [5] = "intel-pt-power-event",
+            [6] = "intel-pt-psb-pmi-preservation",
+        },
+        .cpuid = {
+            .eax = 0x14,
+            .needs_ecx = true, .ecx = 0,
+            .reg = R_EBX,
+        },
+    },
+
     [FEAT_14_0_ECX] = {
         .type = CPUID_FEATURE_WORD,
         .feat_names = {
-            NULL, NULL, NULL, NULL,
-            NULL, NULL, NULL, NULL,
-            NULL, NULL, NULL, NULL,
-            NULL, NULL, NULL, NULL,
-            NULL, NULL, NULL, NULL,
-            NULL, NULL, NULL, NULL,
-            NULL, NULL, NULL, NULL,
-            NULL, NULL, NULL, "intel-pt-lip",
+            [0] = "intel-pt-topa",
+            [1] = "intel-pt-multi-topa-entries",
+            [2] = "intel-pt-single-range",
+            [3] = "intel-pt-trace-transport-subsystem",
+            [31] = "intel-pt-lip",
         },
         .cpuid = {
             .eax = 0x14,
@@ -1181,6 +1196,26 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
         .tcg_features = TCG_14_0_ECX_FEATURES,
      },
 
+    [FEAT_14_1_EAX] = {
+        .type = CPUID_FEATURE_WORD,
+        .cpuid = {
+            .eax = 0x14,
+            .needs_ecx = true, .ecx = 1,
+            .reg = R_EAX,
+        },
+        .migratable_flags = ~0ull,
+    },
+
+    [FEAT_14_1_EBX] = {
+        .type = CPUID_FEATURE_WORD,
+        .cpuid = {
+            .eax = 0x14,
+            .needs_ecx = true, .ecx = 1,
+            .reg = R_EBX,
+        },
+        .migratable_flags = ~0ull,
+    },
+
 };
 
 typedef struct FeatureMask {
@@ -1253,10 +1288,22 @@ static FeatureDep feature_dependencies[] = {
         .from = { FEAT_7_0_EBX,             CPUID_7_0_EBX_RDSEED },
         .to = { FEAT_VMX_SECONDARY_CTLS,    VMX_SECONDARY_EXEC_RDSEED_EXITING },
     },
+    {
+        .from = { FEAT_7_0_EBX,             CPUID_7_0_EBX_INTEL_PT },
+        .to = { FEAT_14_0_EBX,              ~0ull },
+    },
     {
         .from = { FEAT_7_0_EBX,             CPUID_7_0_EBX_INTEL_PT },
         .to = { FEAT_14_0_ECX,              ~0ull },
     },
+    {
+        .from = { FEAT_7_0_EBX,             CPUID_7_0_EBX_INTEL_PT },
+        .to = { FEAT_14_1_EAX,              ~0ull },
+    },
+    {
+        .from = { FEAT_7_0_EBX,             CPUID_7_0_EBX_INTEL_PT },
+        .to = { FEAT_14_1_EBX,              ~0ull },
+    },
     {
         .from = { FEAT_8000_0001_EDX,       CPUID_EXT2_RDTSCP },
         .to = { FEAT_VMX_SECONDARY_CTLS,    VMX_SECONDARY_EXEC_RDTSCP },
@@ -4260,7 +4307,14 @@ static void mark_unavailable_features(X86CPU *cpu, FeatureWord w, uint64_t mask,
         return;
     }
 
-    for (i = 0; i < 64; ++i) {
+    if ((w == FEAT_14_1_EAX) && (mask & INTEL_PT_ADDR_RANGES_NUM_MASK)) {
+        warn_report("%s: CPUID.14H_01H:EAX [bit 2:0]", verbose_prefix);
+        i = 3;
+    } else {
+        i = 0;
+    }
+
+    for (; i < 64; ++i) {
         if ((1ULL << i) & mask) {
             g_autofree char *feat_word_str = feature_word_description(f, i);
             warn_report("%s: %s%s%s [bit %d]",
@@ -6038,7 +6092,18 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose)
         uint64_t host_feat =
             x86_cpu_get_supported_feature_word(w, false);
         uint64_t requested_features = env->features[w];
-        uint64_t unavailable_features = requested_features & ~host_feat;
+        uint64_t unavailable_features;
+        if (w == FEAT_14_1_EAX) {
+            unavailable_features = (requested_features & ~host_feat) &
+                                   ~INTEL_PT_ADDR_RANGES_NUM_MASK;
+            if ((requested_features & INTEL_PT_ADDR_RANGES_NUM_MASK) >
+                (host_feat & INTEL_PT_ADDR_RANGES_NUM_MASK)) {
+                unavailable_features |= requested_features &
+                                        INTEL_PT_ADDR_RANGES_NUM_MASK;
+            }
+        } else {
+            unavailable_features = requested_features & ~host_feat;
+        }
         mark_unavailable_features(cpu, w, unavailable_features, prefix);
     }
 
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 6c50d3ab4f1d..f5478a169f9a 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -566,7 +566,10 @@ typedef enum FeatureWord {
     FEAT_VMX_EPT_VPID_CAPS,
     FEAT_VMX_BASIC,
     FEAT_VMX_VMFUNC,
+    FEAT_14_0_EBX,
     FEAT_14_0_ECX,
+    FEAT_14_1_EAX,
+    FEAT_14_1_EBX,
     FEATURE_WORDS,
 } FeatureWord;
 
@@ -835,8 +838,40 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS];
 /* AVX512 BFloat16 Instruction */
 #define CPUID_7_1_EAX_AVX512_BF16       (1U << 5)
 
+/*
+ * IA32_RTIT_CTL.CR3 filter can be set to 1 and
+ * IA32_RTIT_CR3_MATCH can be accessed
+ */
+#define CPUID_14_0_EBX_CR3_FILTER               (1U << 0)
+/* Support Configurable PSB and Cycle-Accurate Mode */
+#define CPUID_14_0_EBX_PSB                      (1U << 1)
+/*
+ * Support IP Filtering, IP TraceStop, and preservation
+ * of Intel PT MSRs across warm reset
+ */
+#define CPUID_14_0_EBX_IP_FILTER                (1U << 2)
+/* Support MTC timing packet */
+#define CPUID_14_0_EBX_MTC                      (1U << 3)
+/* Support PTWRITE */
+#define CPUID_14_0_EBX_PTWRITE                  (1U << 4)
+/* Support Power Event Trace packet generation */
+#define CPUID_14_0_EBX_POWER_EVENT              (1U << 5)
+/* Support PSB and PMI Preservation */
+#define CPUID_14_0_EBX_PSB_PMI_PRESERVATION     (1U << 6)
+
+/* Tracing can be enabled with IA32_RTIT_CTL.ToPA = 1 */
+#define CPUID_14_0_ECX_TOPA                     (1U << 0)
+/*
+ * ToPA tables can hold any number of output entries, up to the maximum allowed
+ * by the MaskOrTableOffset field of IA32_RTIT_OUTPUT_MASK_PTRS
+ */
+#define CPUID_14_0_ECX_MULTI_ENTRIES            (1U << 1)
+/* Support Single-Range Output scheme */
+#define CPUID_14_0_ECX_SINGLE_RANGE             (1U << 2)
+/* Support IA32_RTIT_CTL.FabricEn */
+#define CPUID_14_0_ECX_TRACE_TRANS_SUBSYSTEM    (1U << 3)
 /* Packets which contain IP payload have LIP values */
-#define CPUID_14_0_ECX_LIP              (1U << 31)
+#define CPUID_14_0_ECX_LIP                      (1U << 31)
 
 /* CLZERO instruction */
 #define CPUID_8000_0008_EBX_CLZERO      (1U << 0)
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 3/5] target/i386: Enable host pass through of Intel PT
  2021-09-09 14:41 [RFC PATCH 0/5] Make Intel PT configurable Xiaoyao Li
  2021-09-09 14:41 ` [RFC PATCH 1/5] target/i386: Print CPUID subleaf info for unsupported feature Xiaoyao Li
  2021-09-09 14:41 ` [RFC PATCH 2/5] target/i386: Introduce FeatureWordInfo for Intel PT CPUID leaf 0xD Xiaoyao Li
@ 2021-09-09 14:41 ` Xiaoyao Li
  2021-10-15 20:22   ` Eduardo Habkost
  2021-09-09 14:41 ` [RFC PATCH 4/5] target/i386: Define specific PT feature set for IceLake-server and Snowridge Xiaoyao Li
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Xiaoyao Li @ 2021-09-09 14:41 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti, Eduardo Habkost; +Cc: xiaoyao.li, qemu-devel

commit e37a5c7fa459 ("i386: Add Intel Processor Trace feature support")
added the support of Intel PT by making CPUID[14] of PT as fixed feature
set (from ICX) for any CPU model on any host.

This truly breaks the PT exposing on Intel SPR platform because SPR has
less supported bitmap CPUID(0x14,1):EBX[15:0] than ICX.

This patch enables passing through host's PT capabilities for the case
"-cpu host/max" while ensure named CPU model still has the fixed
PT feature set to not break the live migration.

It introduces @has_specific_intel_pt_feature_set field for name CPU
model. If a named CPU model has this field as false, it will use fixed
PT feature set of ICX. Besides same to previous behavior, if fixed PT
feature set of ICX cannot be satisfied/supported by host, it disables PT
instead of adjusting the feature set based on host's capabilities.

In the future, new named CPU model, e.g., Sapphire Rapids, can define
its own PT feature set by setting @has_specific_intel_pt_feature_set to
true and defines it's own FEAT_14_0_EBX, FEAT_14_0_ECX, FEAT_14_1_EAX
and FEAT_14_1_EBX.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/cpu.c | 106 ++++++++++++++++++++--------------------------
 target/i386/cpu.h |   1 +
 2 files changed, 47 insertions(+), 60 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 58e98210f219..00c4ad23110d 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -543,34 +543,24 @@ static CPUCacheInfo legacy_l3_cache = {
 #define L2_ITLB_4K_ASSOC       4
 #define L2_ITLB_4K_ENTRIES   512
 
-/* CPUID Leaf 0x14 constants: */
-#define INTEL_PT_MAX_SUBLEAF     0x1
-/*
- * bit[00]: IA32_RTIT_CTL.CR3 filter can be set to 1 and IA32_RTIT_CR3_MATCH
- *          MSR can be accessed;
- * bit[01]: Support Configurable PSB and Cycle-Accurate Mode;
- * bit[02]: Support IP Filtering, TraceStop filtering, and preservation
- *          of Intel PT MSRs across warm reset;
- * bit[03]: Support MTC timing packet and suppression of COFI-based packets;
- */
-#define INTEL_PT_MINIMAL_EBX     0xf
-/*
- * bit[00]: Tracing can be enabled with IA32_RTIT_CTL.ToPA = 1 and
- *          IA32_RTIT_OUTPUT_BASE and IA32_RTIT_OUTPUT_MASK_PTRS MSRs can be
- *          accessed;
- * bit[01]: ToPA tables can hold any number of output entries, up to the
- *          maximum allowed by the MaskOrTableOffset field of
- *          IA32_RTIT_OUTPUT_MASK_PTRS;
- * bit[02]: Support Single-Range Output scheme;
- */
-#define INTEL_PT_MINIMAL_ECX     0x7
-/* generated packets which contain IP payloads have LIP values */
-#define INTEL_PT_IP_LIP          (1 << 31)
-#define INTEL_PT_ADDR_RANGES_NUM 0x2 /* Number of configurable address ranges */
-#define INTEL_PT_ADDR_RANGES_NUM_MASK 0x7
-#define INTEL_PT_MTC_BITMAP      (0x0249 << 16) /* Support ART(0,3,6,9) */
-#define INTEL_PT_CYCLE_BITMAP    0x1fff         /* Support 0,2^(0~11) */
-#define INTEL_PT_PSB_BITMAP      (0x003f << 16) /* Support 2K,4K,8K,16K,32K,64K */
+#define INTEL_PT_MAX_SUBLEAF                0x1
+#define INTEL_PT_DEFAULT_ADDR_RANGES_NUM    0x2
+#define INTEL_PT_ADDR_RANGES_NUM_MASK       0x7
+/* Support ART(0,3,6,9) */
+#define INTEL_PT_DEFAULT_MTC_BITMAP         0x0249
+/* Support 0,2^(0~11) */
+#define INTEL_PT_DEFAULT_CYCLE_BITMAP       0x1fff
+/* Support 2K,4K,8K,16K,32K,64K */
+#define INTEL_PT_DEFAULT_PSB_BITMAP         0x003f
+
+#define INTEL_PT_DEFAULT_0_EBX  (CPUID_14_0_EBX_CR3_FILTER | \
+            CPUID_14_0_EBX_PSB | CPUID_14_0_EBX_IP_FILTER | CPUID_14_0_EBX_MTC)
+#define INTEL_PT_DEFAULT_0_ECX  (CPUID_14_0_ECX_TOPA | \
+            CPUID_14_0_ECX_MULTI_ENTRIES | CPUID_14_0_ECX_SINGLE_RANGE)
+#define INTEL_PT_DEFAULT_1_EAX  (INTEL_PT_DEFAULT_MTC_BITMAP << 16 | \
+                                 INTEL_PT_DEFAULT_ADDR_RANGES_NUM)
+#define INTEL_PT_DEFAULT_1_EBX  (INTEL_PT_DEFAULT_PSB_BITMAP << 16 | \
+                                 INTEL_PT_DEFAULT_CYCLE_BITMAP)
 
 void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
                               uint32_t vendor2, uint32_t vendor3)
@@ -1517,6 +1507,7 @@ typedef struct X86CPUDefinition {
     int family;
     int model;
     int stepping;
+    bool has_specific_intel_pt_feature_set;
     FeatureWordArray features;
     const char *model_id;
     const CPUCaches *const cache_info;
@@ -5061,6 +5052,14 @@ static void x86_cpu_load_model(X86CPU *cpu, X86CPUModel *model)
         env->features[w] = def->features[w];
     }
 
+    if (!def->has_specific_intel_pt_feature_set) {
+        env->use_default_intel_pt = true;
+        env->features[FEAT_14_0_EBX] = INTEL_PT_DEFAULT_0_EBX;
+        env->features[FEAT_14_0_ECX] = INTEL_PT_DEFAULT_0_ECX;
+        env->features[FEAT_14_1_EAX] = INTEL_PT_DEFAULT_1_EAX;
+        env->features[FEAT_14_1_EBX] = INTEL_PT_DEFAULT_1_EBX;
+    }
+
     /* legacy-cache defaults to 'off' if CPU model provides cache info */
     cpu->legacy_cache = !def->cache_info;
 
@@ -5465,14 +5464,11 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 
         if (count == 0) {
             *eax = INTEL_PT_MAX_SUBLEAF;
-            *ebx = INTEL_PT_MINIMAL_EBX;
-            *ecx = INTEL_PT_MINIMAL_ECX;
-            if (env->features[FEAT_14_0_ECX] & CPUID_14_0_ECX_LIP) {
-                *ecx |= CPUID_14_0_ECX_LIP;
-            }
+            *ebx = env->features[FEAT_14_0_EBX];
+            *ecx = env->features[FEAT_14_0_ECX];
         } else if (count == 1) {
-            *eax = INTEL_PT_MTC_BITMAP | INTEL_PT_ADDR_RANGES_NUM;
-            *ebx = INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP;
+            *eax = env->features[FEAT_14_1_EAX];
+            *ebx = env->features[FEAT_14_1_EBX];
         }
         break;
     }
@@ -6081,6 +6077,7 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose)
     CPUX86State *env = &cpu->env;
     FeatureWord w;
     const char *prefix = NULL;
+    uint64_t host_feat;
 
     if (verbose) {
         prefix = accel_uses_host_cpuid()
@@ -6089,8 +6086,7 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose)
     }
 
     for (w = 0; w < FEATURE_WORDS; w++) {
-        uint64_t host_feat =
-            x86_cpu_get_supported_feature_word(w, false);
+        host_feat = x86_cpu_get_supported_feature_word(w, false);
         uint64_t requested_features = env->features[w];
         uint64_t unavailable_features;
         if (w == FEAT_14_1_EAX) {
@@ -6107,32 +6103,22 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose)
         mark_unavailable_features(cpu, w, unavailable_features, prefix);
     }
 
-    if ((env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) &&
-        kvm_enabled()) {
-        KVMState *s = CPU(cpu)->kvm_state;
-        uint32_t eax_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_EAX);
-        uint32_t ebx_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_EBX);
-        uint32_t ecx_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_ECX);
-        uint32_t eax_1 = kvm_arch_get_supported_cpuid(s, 0x14, 1, R_EAX);
-        uint32_t ebx_1 = kvm_arch_get_supported_cpuid(s, 0x14, 1, R_EBX);
+    if (env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) {
+        host_feat = x86_cpu_get_supported_feature_word(FEAT_14_0_ECX, false);
 
-        if (!eax_0 ||
-           ((ebx_0 & INTEL_PT_MINIMAL_EBX) != INTEL_PT_MINIMAL_EBX) ||
-           ((ecx_0 & INTEL_PT_MINIMAL_ECX) != INTEL_PT_MINIMAL_ECX) ||
-           ((eax_1 & INTEL_PT_MTC_BITMAP) != INTEL_PT_MTC_BITMAP) ||
-           ((eax_1 & INTEL_PT_ADDR_RANGES_NUM_MASK) <
-                                           INTEL_PT_ADDR_RANGES_NUM) ||
-           ((ebx_1 & (INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP)) !=
-                (INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP)) ||
-           ((ecx_0 & CPUID_14_0_ECX_LIP) !=
-                (env->features[FEAT_14_0_ECX] & CPUID_14_0_ECX_LIP))) {
-            /*
-             * Processor Trace capabilities aren't configurable, so if the
-             * host can't emulate the capabilities we report on
-             * cpu_x86_cpuid(), intel-pt can't be enabled on the current host.
-             */
+        if (env->use_default_intel_pt &&
+            ((env->features[FEAT_14_0_EBX] != INTEL_PT_DEFAULT_0_EBX) ||
+             ((env->features[FEAT_14_0_ECX] & ~CPUID_14_0_ECX_LIP) !=
+              INTEL_PT_DEFAULT_0_ECX) ||
+             (env->features[FEAT_14_1_EAX] != INTEL_PT_DEFAULT_1_EAX) ||
+             (env->features[FEAT_14_1_EBX] != INTEL_PT_DEFAULT_1_EBX))) {
             mark_unavailable_features(cpu, FEAT_7_0_EBX, CPUID_7_0_EBX_INTEL_PT, prefix);
         }
+
+        if ((env->features[FEAT_14_0_ECX] ^ host_feat) & CPUID_14_0_ECX_LIP) {
+            warn_report("Cannot configure different Intel PT IP payload format than hardware");
+            mark_unavailable_features(cpu, FEAT_7_0_EBX, CPUID_7_0_EBX_INTEL_PT, NULL);
+        }
     }
 }
 
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index f5478a169f9a..e6236c371c4f 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1639,6 +1639,7 @@ typedef struct CPUX86State {
     uint32_t cpuid_vendor2;
     uint32_t cpuid_vendor3;
     uint32_t cpuid_version;
+    bool use_default_intel_pt;
     FeatureWordArray features;
     /* Features that were explicitly enabled/disabled */
     FeatureWordArray user_features;
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 4/5] target/i386: Define specific PT feature set for IceLake-server and Snowridge
  2021-09-09 14:41 [RFC PATCH 0/5] Make Intel PT configurable Xiaoyao Li
                   ` (2 preceding siblings ...)
  2021-09-09 14:41 ` [RFC PATCH 3/5] target/i386: Enable host pass through of Intel PT Xiaoyao Li
@ 2021-09-09 14:41 ` Xiaoyao Li
  2021-09-09 14:41 ` [RFC PATCH 5/5] target/i386: Access MSR_IA32_RTIT_ADDRn based on guest CPUID configuration Xiaoyao Li
  2021-09-26  5:21 ` [RFC PATCH 0/5] Make Intel PT configurable Xiaoyao Li
  5 siblings, 0 replies; 15+ messages in thread
From: Xiaoyao Li @ 2021-09-09 14:41 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti, Eduardo Habkost; +Cc: xiaoyao.li, qemu-devel

For IceLake-server, it's just the same as using the default PT
feature set since the default one is exact taken from ICX.

For Snowridge, define it according to real SNR silicon capabilities.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/cpu.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 00c4ad23110d..2b50ccf79b92 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3383,6 +3383,15 @@ static const X86CPUDefinition builtin_x86_defs[] = {
         .features[FEAT_6_EAX] =
             CPUID_6_EAX_ARAT,
         /* Missing: Mode-based execute control (XS/XU), processor tracing, TSC scaling */
+        .has_specific_intel_pt_feature_set = true,
+        .features[FEAT_14_0_EBX] =
+            CPUID_14_0_EBX_CR3_FILTER | CPUID_14_0_EBX_PSB |
+            CPUID_14_0_EBX_IP_FILTER | CPUID_14_0_EBX_MTC,
+        .features[FEAT_14_0_ECX] =
+            CPUID_14_0_ECX_TOPA | CPUID_14_0_ECX_MULTI_ENTRIES |
+            CPUID_14_0_ECX_SINGLE_RANGE,
+        .features[FEAT_14_1_EAX] = 0x249 << 16 | 0x2,
+        .features[FEAT_14_1_EBX] = 0x003f << 16 | 0x1fff,
         .features[FEAT_VMX_BASIC] = MSR_VMX_BASIC_INS_OUTS |
              MSR_VMX_BASIC_TRUE_CTLS,
         .features[FEAT_VMX_ENTRY_CTLS] = VMX_VM_ENTRY_IA32E_MODE |
@@ -3652,6 +3661,17 @@ static const X86CPUDefinition builtin_x86_defs[] = {
             CPUID_XSAVE_XGETBV1,
         .features[FEAT_6_EAX] =
             CPUID_6_EAX_ARAT,
+        .has_specific_intel_pt_feature_set = true,
+        .features[FEAT_14_0_EBX] =
+            CPUID_14_0_EBX_CR3_FILTER | CPUID_14_0_EBX_PSB |
+            CPUID_14_0_EBX_IP_FILTER | CPUID_14_0_EBX_MTC |
+            CPUID_14_0_EBX_PTWRITE | CPUID_14_0_EBX_POWER_EVENT |
+            CPUID_14_0_EBX_PSB_PMI_PRESERVATION,
+        .features[FEAT_14_0_ECX] =
+            CPUID_14_0_ECX_TOPA | CPUID_14_0_ECX_MULTI_ENTRIES |
+            CPUID_14_0_ECX_SINGLE_RANGE | CPUID_14_0_ECX_LIP,
+        .features[FEAT_14_1_EAX] = 0x249 << 16 | 0x2,
+        .features[FEAT_14_1_EBX] = 0x003f << 16 | 0xffff,
         .features[FEAT_VMX_BASIC] = MSR_VMX_BASIC_INS_OUTS |
              MSR_VMX_BASIC_TRUE_CTLS,
         .features[FEAT_VMX_ENTRY_CTLS] = VMX_VM_ENTRY_IA32E_MODE |
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 5/5] target/i386: Access MSR_IA32_RTIT_ADDRn based on guest CPUID configuration
  2021-09-09 14:41 [RFC PATCH 0/5] Make Intel PT configurable Xiaoyao Li
                   ` (3 preceding siblings ...)
  2021-09-09 14:41 ` [RFC PATCH 4/5] target/i386: Define specific PT feature set for IceLake-server and Snowridge Xiaoyao Li
@ 2021-09-09 14:41 ` Xiaoyao Li
  2021-09-26  5:21 ` [RFC PATCH 0/5] Make Intel PT configurable Xiaoyao Li
  5 siblings, 0 replies; 15+ messages in thread
From: Xiaoyao Li @ 2021-09-09 14:41 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti, Eduardo Habkost; +Cc: xiaoyao.li, qemu-devel

KVM only allows userspace to access legal number of MSR_IA32_RTIT_ADDRn,
which is enumrated by guest's CPUID(0x14,0x1):EAX[2:0], i.e.,
env->features[FEAT_14_1_EAX] & INTEL_PT_ADDR_RANGES_NUM_MASK

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/cpu.c     | 1 -
 target/i386/cpu.h     | 2 ++
 target/i386/kvm/kvm.c | 8 ++++----
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 2b50ccf79b92..5ff70a8cba54 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -545,7 +545,6 @@ static CPUCacheInfo legacy_l3_cache = {
 
 #define INTEL_PT_MAX_SUBLEAF                0x1
 #define INTEL_PT_DEFAULT_ADDR_RANGES_NUM    0x2
-#define INTEL_PT_ADDR_RANGES_NUM_MASK       0x7
 /* Support ART(0,3,6,9) */
 #define INTEL_PT_DEFAULT_MTC_BITMAP         0x0249
 /* Support 0,2^(0~11) */
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index e6236c371c4f..20e579f147f8 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -873,6 +873,8 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS];
 /* Packets which contain IP payload have LIP values */
 #define CPUID_14_0_ECX_LIP                      (1U << 31)
 
+#define INTEL_PT_ADDR_RANGES_NUM_MASK       0x7
+
 /* CLZERO instruction */
 #define CPUID_8000_0008_EBX_CLZERO      (1U << 0)
 /* Always save/restore FP error pointers */
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 500d2e0e686f..a90115da9ee5 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3088,8 +3088,8 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
             }
         }
         if (env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) {
-            int addr_num = kvm_arch_get_supported_cpuid(kvm_state,
-                                                    0x14, 1, R_EAX) & 0x7;
+            int addr_num = env->features[FEAT_14_1_EAX] &
+                           INTEL_PT_ADDR_RANGES_NUM_MASK;
 
             kvm_msr_entry_add(cpu, MSR_IA32_RTIT_CTL,
                             env->msr_rtit_ctrl);
@@ -3433,8 +3433,8 @@ static int kvm_get_msrs(X86CPU *cpu)
     }
 
     if (env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) {
-        int addr_num =
-            kvm_arch_get_supported_cpuid(kvm_state, 0x14, 1, R_EAX) & 0x7;
+        int addr_num = env->features[FEAT_14_1_EAX] &
+                       INTEL_PT_ADDR_RANGES_NUM_MASK;
 
         kvm_msr_entry_add(cpu, MSR_IA32_RTIT_CTL, 0);
         kvm_msr_entry_add(cpu, MSR_IA32_RTIT_STATUS, 0);
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/5] Make Intel PT configurable
  2021-09-09 14:41 [RFC PATCH 0/5] Make Intel PT configurable Xiaoyao Li
                   ` (4 preceding siblings ...)
  2021-09-09 14:41 ` [RFC PATCH 5/5] target/i386: Access MSR_IA32_RTIT_ADDRn based on guest CPUID configuration Xiaoyao Li
@ 2021-09-26  5:21 ` Xiaoyao Li
  5 siblings, 0 replies; 15+ messages in thread
From: Xiaoyao Li @ 2021-09-26  5:21 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti, Eduardo Habkost; +Cc: qemu-devel

Ping...

Eduardo, could you please take a look at this series.

Thanks!
-Xiaoyao

On 9/9/2021 10:41 PM, Xiaoyao Li wrote:
> Initial Intel PT support was added by making it as fixed feature set as
> ICX's capabilities, which allowed different CPU model with PT enabled
> live migration on ICX host. However, it breaks the PT exposure/working
> on SPR machine. Because SPR has less PT capabilities regrading
> CPUID(0x14,1):EBX[15:0].
> 
> This series aims to make Intel PT configurable that named CPU model can
> define its own PT feature set and "-cpu host/max" can use the host pass
> through feature set of PT.
> 
> At the same time, it also ensure existing named CPU model to generate
> the same PT CPUID set as before to not break live migration.
> 
> Xiaoyao Li (5):
>    target/i386: Print CPUID subleaf info for unsupported feature
>    target/i386: Introduce FeatureWordInfo for Intel PT CPUID leaf 0xD
>    target/i386: Enable host pass through of Intel PT
>    target/i386: Define specific PT feature set for IceLake-server and
>      Snowridge
>    target/i386: Access MSR_IA32_RTIT_ADDRn based on guest CPUID
>      configuration
> 
>   target/i386/cpu.c     | 215 ++++++++++++++++++++++++++++--------------
>   target/i386/cpu.h     |  40 +++++++-
>   target/i386/kvm/kvm.c |   8 +-
>   3 files changed, 186 insertions(+), 77 deletions(-)
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 1/5] target/i386: Print CPUID subleaf info for unsupported feature
  2021-09-09 14:41 ` [RFC PATCH 1/5] target/i386: Print CPUID subleaf info for unsupported feature Xiaoyao Li
@ 2021-10-15 15:12   ` Eduardo Habkost
  0 siblings, 0 replies; 15+ messages in thread
From: Eduardo Habkost @ 2021-10-15 15:12 UTC (permalink / raw)
  To: Xiaoyao Li; +Cc: Paolo Bonzini, Marcelo Tosatti, qemu-devel

On Thu, Sep 09, 2021 at 10:41:46PM +0800, Xiaoyao Li wrote:
> Some CPUID leaves have meaningful subleaf index. Print the subleaf info
> in feature_word_description for CPUID features.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

-- 
Eduardo



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 2/5] target/i386: Introduce FeatureWordInfo for Intel PT CPUID leaf 0xD
  2021-09-09 14:41 ` [RFC PATCH 2/5] target/i386: Introduce FeatureWordInfo for Intel PT CPUID leaf 0xD Xiaoyao Li
@ 2021-10-15 16:04   ` Eduardo Habkost
  2021-10-17  7:53     ` Xiaoyao Li
  0 siblings, 1 reply; 15+ messages in thread
From: Eduardo Habkost @ 2021-10-15 16:04 UTC (permalink / raw)
  To: Xiaoyao Li; +Cc: Paolo Bonzini, Marcelo Tosatti, qemu-devel

Hi,

Apologies for the delay.  Comments below:

On Thu, Sep 09, 2021 at 10:41:47PM +0800, Xiaoyao Li wrote:
> CPUID leaf 0x14 subleaf 0x0 and 0x1 enumerate the resource and
> capability of Intel PT.
> 
> Introduce FeatureWord FEAT_14_0_EBX, FEAT_14_1_EAX and FEAT_14_1_EBX,
> and complete FEAT_14_0_ECX. Thus all the features of Intel PT can be
> expanded when "-cpu host/max" and can be configured in named CPU model.
> 
> Regarding FEAT_14_1_EAX and FEAT_14_1_EBX, don't define the name for
> them since each bit of them doesn't represent a standalone sub-feature
> of Intel PT. However, explicitly mark them as migratable. So the
> meaningful bits of them can be autoenabled in x86_cpu_expand_features().
> 
> It has special handling for FEAT_14_1_EAX[2:0], because the 3 bits as a
> whole represents the number of PT ADDRn_CFG ranges. Thus it has special
> handling in mark_unavailable_features() and x86_cpu_filter_features().
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>  target/i386/cpu.c | 87 +++++++++++++++++++++++++++++++++++++++++------
>  target/i386/cpu.h | 37 +++++++++++++++++++-
>  2 files changed, 112 insertions(+), 12 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index a06473c9e84c..58e98210f219 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -567,7 +567,7 @@ static CPUCacheInfo legacy_l3_cache = {
>  /* generated packets which contain IP payloads have LIP values */
>  #define INTEL_PT_IP_LIP          (1 << 31)
>  #define INTEL_PT_ADDR_RANGES_NUM 0x2 /* Number of configurable address ranges */
> -#define INTEL_PT_ADDR_RANGES_NUM_MASK 0x3
> +#define INTEL_PT_ADDR_RANGES_NUM_MASK 0x7
>  #define INTEL_PT_MTC_BITMAP      (0x0249 << 16) /* Support ART(0,3,6,9) */
>  #define INTEL_PT_CYCLE_BITMAP    0x1fff         /* Support 0,2^(0~11) */
>  #define INTEL_PT_PSB_BITMAP      (0x003f << 16) /* Support 2K,4K,8K,16K,32K,64K */
> @@ -1161,17 +1161,32 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
>          }
>      },
>  
> +    [FEAT_14_0_EBX] = {
> +        .type = CPUID_FEATURE_WORD,
> +        .feat_names = {
> +            [0] = "intel-pt-cr3-filter",
> +            [1] = "intel-pt-PSB",

I suggest using lowercase for all feature names, as it is the
usual convention for QOM property names.

> +            [2] = "intel-pt-ip-filter",
> +            [3] = "intel-pt-mtc",
> +            [4] = "intel-pt-ptwrite",
> +            [5] = "intel-pt-power-event",
> +            [6] = "intel-pt-psb-pmi-preservation",

This has a side effect: live migration with those flags enabled
will become possible.

All bits allow enumerate support for an optional feature to be
enabled by the OS, so it means we can emulate a CPU with bit=0
CPU on a bit=1 host.  So it will be safe as long as there's no
additional state that needs to be live migrated when those
features are used.  Do we have any additional state, or all MSRs
currently being migrated are enough?

> +        },
> +        .cpuid = {
> +            .eax = 0x14,
> +            .needs_ecx = true, .ecx = 0,
> +            .reg = R_EBX,
> +        },
> +    },
> +
>      [FEAT_14_0_ECX] = {
>          .type = CPUID_FEATURE_WORD,
>          .feat_names = {
> -            NULL, NULL, NULL, NULL,
> -            NULL, NULL, NULL, NULL,
> -            NULL, NULL, NULL, NULL,
> -            NULL, NULL, NULL, NULL,
> -            NULL, NULL, NULL, NULL,
> -            NULL, NULL, NULL, NULL,
> -            NULL, NULL, NULL, NULL,
> -            NULL, NULL, NULL, "intel-pt-lip",
> +            [0] = "intel-pt-topa",
> +            [1] = "intel-pt-multi-topa-entries",
> +            [2] = "intel-pt-single-range",
> +            [3] = "intel-pt-trace-transport-subsystem",

All bits above are also optional features to be enabled
explicitly by the OS, so it also seems OK.

> +            [31] = "intel-pt-lip",

This one is trickier because its value must match the host, but
it is already present in the existing code.  We already have an
explicit check for host LIP == guest LIP, so it's OK.

>          },
>          .cpuid = {
>              .eax = 0x14,
> @@ -1181,6 +1196,26 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
>          .tcg_features = TCG_14_0_ECX_FEATURES,
>       },
>  
> +    [FEAT_14_1_EAX] = {
> +        .type = CPUID_FEATURE_WORD,
> +        .cpuid = {
> +            .eax = 0x14,
> +            .needs_ecx = true, .ecx = 1,
> +            .reg = R_EAX,
> +        },
> +        .migratable_flags = ~0ull,

This one is trickier.  I see a few potential issues:

* Bits 15:3 are documented as reserved on my version of the Intel
  SDM (June 2021).  If they are reserved, I don't think we should
  make them migratable yet.  If they aren't, do you have a
  pointer to the documentation?

* Bits 2:0 is a number, not a simple boolean flag.  You are
  handling this as a special case in x86_cpu_filter_features()
  below, so it should be OK.

* The flags are migratable but have no names.  The only existing
  case of non-zero .migratable_flags required a special case at
  x86_cpu_feature_name().  I'm pretty sure this might break the
  getter of the "unavailable-features" property.  I think we need
  to make x86_cpu_feature_name() safer, somehow, or (a quicker
  solution) we can add names to all migratable bits here (even if
  they are not intended to be set by end users).

> +    },
> +
> +    [FEAT_14_1_EBX] = {
> +        .type = CPUID_FEATURE_WORD,
> +        .cpuid = {
> +            .eax = 0x14,
> +            .needs_ecx = true, .ecx = 1,
> +            .reg = R_EBX,
> +        },
> +        .migratable_flags = ~0ull,

Same observation above about flags with no names apply here.

> +    },
> +
>  };
>  
>  typedef struct FeatureMask {
> @@ -1253,10 +1288,22 @@ static FeatureDep feature_dependencies[] = {
>          .from = { FEAT_7_0_EBX,             CPUID_7_0_EBX_RDSEED },
>          .to = { FEAT_VMX_SECONDARY_CTLS,    VMX_SECONDARY_EXEC_RDSEED_EXITING },
>      },
> +    {
> +        .from = { FEAT_7_0_EBX,             CPUID_7_0_EBX_INTEL_PT },
> +        .to = { FEAT_14_0_EBX,              ~0ull },
> +    },
>      {
>          .from = { FEAT_7_0_EBX,             CPUID_7_0_EBX_INTEL_PT },
>          .to = { FEAT_14_0_ECX,              ~0ull },
>      },
> +    {
> +        .from = { FEAT_7_0_EBX,             CPUID_7_0_EBX_INTEL_PT },
> +        .to = { FEAT_14_1_EAX,              ~0ull },
> +    },
> +    {
> +        .from = { FEAT_7_0_EBX,             CPUID_7_0_EBX_INTEL_PT },
> +        .to = { FEAT_14_1_EBX,              ~0ull },
> +    },
>      {
>          .from = { FEAT_8000_0001_EDX,       CPUID_EXT2_RDTSCP },
>          .to = { FEAT_VMX_SECONDARY_CTLS,    VMX_SECONDARY_EXEC_RDTSCP },
> @@ -4260,7 +4307,14 @@ static void mark_unavailable_features(X86CPU *cpu, FeatureWord w, uint64_t mask,
>          return;
>      }
>  
> -    for (i = 0; i < 64; ++i) {
> +    if ((w == FEAT_14_1_EAX) && (mask & INTEL_PT_ADDR_RANGES_NUM_MASK)) {
> +        warn_report("%s: CPUID.14H_01H:EAX [bit 2:0]", verbose_prefix);
> +        i = 3;
> +    } else {
> +        i = 0;
> +    }

That's tricky and easy to break.  To be honest, I prefer having
duplicate error messages showing bit 2,1,0 than having this hack.

If you want to make some range of CPUID bits special, I'd prefer
to implement this as a FeatureWordInfo field that indicates that.
This way this function doesn't become a pile of special cases on
top of each other.

In either case, I suggest addressing the duplicate error messages
in a separate patch so this discussion doesn't block the rest.

> +
> +    for (; i < 64; ++i) {
>          if ((1ULL << i) & mask) {
>              g_autofree char *feat_word_str = feature_word_description(f, i);
>              warn_report("%s: %s%s%s [bit %d]",
> @@ -6038,7 +6092,18 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose)
>          uint64_t host_feat =
>              x86_cpu_get_supported_feature_word(w, false);
>          uint64_t requested_features = env->features[w];
> -        uint64_t unavailable_features = requested_features & ~host_feat;
> +        uint64_t unavailable_features;
> +        if (w == FEAT_14_1_EAX) {

I would love to find a way to make this more readable, but it's
simple enough.

Maybe a `switch (w)` statement would be better, to make it easier
to extend in the future, and discourage people from adding
special cases that are more complex than (w == FEAT_...).

Maybe it would be good to create a
  void x86_cpu_filter_feature_word(X86CPU *cpu, FeatureWord w)
function to keep the code complexity under control.

> +            unavailable_features = (requested_features & ~host_feat) &
> +                                   ~INTEL_PT_ADDR_RANGES_NUM_MASK;
> +            if ((requested_features & INTEL_PT_ADDR_RANGES_NUM_MASK) >
> +                (host_feat & INTEL_PT_ADDR_RANGES_NUM_MASK)) {
> +                unavailable_features |= requested_features &
> +                                        INTEL_PT_ADDR_RANGES_NUM_MASK;
> +            }
> +        } else {
> +            unavailable_features = requested_features & ~host_feat;
> +        }
>          mark_unavailable_features(cpu, w, unavailable_features, prefix);
>      }
>  

I miss the cpu_x86_cpuid() changes to actual use the new feature
words, here.  I think a cpu_x86_cpuid() change should be done in
the same patch, or we risk having a QEMU commit where the CPU
properties exist but do absolutely nothing.

> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 6c50d3ab4f1d..f5478a169f9a 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -566,7 +566,10 @@ typedef enum FeatureWord {
>      FEAT_VMX_EPT_VPID_CAPS,
>      FEAT_VMX_BASIC,
>      FEAT_VMX_VMFUNC,
> +    FEAT_14_0_EBX,
>      FEAT_14_0_ECX,
> +    FEAT_14_1_EAX,
> +    FEAT_14_1_EBX,
>      FEATURE_WORDS,
>  } FeatureWord;
>  
> @@ -835,8 +838,40 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS];
>  /* AVX512 BFloat16 Instruction */
>  #define CPUID_7_1_EAX_AVX512_BF16       (1U << 5)
>  
> +/*
> + * IA32_RTIT_CTL.CR3 filter can be set to 1 and
> + * IA32_RTIT_CR3_MATCH can be accessed
> + */
> +#define CPUID_14_0_EBX_CR3_FILTER               (1U << 0)
> +/* Support Configurable PSB and Cycle-Accurate Mode */
> +#define CPUID_14_0_EBX_PSB                      (1U << 1)
> +/*
> + * Support IP Filtering, IP TraceStop, and preservation
> + * of Intel PT MSRs across warm reset
> + */
> +#define CPUID_14_0_EBX_IP_FILTER                (1U << 2)
> +/* Support MTC timing packet */
> +#define CPUID_14_0_EBX_MTC                      (1U << 3)
> +/* Support PTWRITE */
> +#define CPUID_14_0_EBX_PTWRITE                  (1U << 4)
> +/* Support Power Event Trace packet generation */
> +#define CPUID_14_0_EBX_POWER_EVENT              (1U << 5)
> +/* Support PSB and PMI Preservation */
> +#define CPUID_14_0_EBX_PSB_PMI_PRESERVATION     (1U << 6)
> +
> +/* Tracing can be enabled with IA32_RTIT_CTL.ToPA = 1 */
> +#define CPUID_14_0_ECX_TOPA                     (1U << 0)
> +/*
> + * ToPA tables can hold any number of output entries, up to the maximum allowed
> + * by the MaskOrTableOffset field of IA32_RTIT_OUTPUT_MASK_PTRS
> + */
> +#define CPUID_14_0_ECX_MULTI_ENTRIES            (1U << 1)
> +/* Support Single-Range Output scheme */
> +#define CPUID_14_0_ECX_SINGLE_RANGE             (1U << 2)
> +/* Support IA32_RTIT_CTL.FabricEn */
> +#define CPUID_14_0_ECX_TRACE_TRANS_SUBSYSTEM    (1U << 3)
>  /* Packets which contain IP payload have LIP values */
> -#define CPUID_14_0_ECX_LIP              (1U << 31)
> +#define CPUID_14_0_ECX_LIP                      (1U << 31)
>  
>  /* CLZERO instruction */
>  #define CPUID_8000_0008_EBX_CLZERO      (1U << 0)
> -- 
> 2.27.0
> 

-- 
Eduardo



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 3/5] target/i386: Enable host pass through of Intel PT
  2021-09-09 14:41 ` [RFC PATCH 3/5] target/i386: Enable host pass through of Intel PT Xiaoyao Li
@ 2021-10-15 20:22   ` Eduardo Habkost
  2021-10-17 10:37     ` Xiaoyao Li
  2021-10-18  3:46     ` Xiaoyao Li
  0 siblings, 2 replies; 15+ messages in thread
From: Eduardo Habkost @ 2021-10-15 20:22 UTC (permalink / raw)
  To: Xiaoyao Li; +Cc: Paolo Bonzini, Marcelo Tosatti, qemu-devel

On Thu, Sep 09, 2021 at 10:41:48PM +0800, Xiaoyao Li wrote:
> commit e37a5c7fa459 ("i386: Add Intel Processor Trace feature support")
> added the support of Intel PT by making CPUID[14] of PT as fixed feature
> set (from ICX) for any CPU model on any host.
> 
> This truly breaks the PT exposing on Intel SPR platform because SPR has
> less supported bitmap CPUID(0x14,1):EBX[15:0] than ICX.
> 
> This patch enables passing through host's PT capabilities for the case
> "-cpu host/max" while ensure named CPU model still has the fixed
> PT feature set to not break the live migration.
> 
> It introduces @has_specific_intel_pt_feature_set field for name CPU
> model. If a named CPU model has this field as false, it will use fixed
> PT feature set of ICX. Besides same to previous behavior, if fixed PT
> feature set of ICX cannot be satisfied/supported by host, it disables PT
> instead of adjusting the feature set based on host's capabilities.
> 
> In the future, new named CPU model, e.g., Sapphire Rapids, can define
> its own PT feature set by setting @has_specific_intel_pt_feature_set to
> true and defines it's own FEAT_14_0_EBX, FEAT_14_0_ECX, FEAT_14_1_EAX
> and FEAT_14_1_EBX.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>  target/i386/cpu.c | 106 ++++++++++++++++++++--------------------------
>  target/i386/cpu.h |   1 +
>  2 files changed, 47 insertions(+), 60 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 58e98210f219..00c4ad23110d 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -543,34 +543,24 @@ static CPUCacheInfo legacy_l3_cache = {
>  #define L2_ITLB_4K_ASSOC       4
>  #define L2_ITLB_4K_ENTRIES   512
>  
> -/* CPUID Leaf 0x14 constants: */
> -#define INTEL_PT_MAX_SUBLEAF     0x1
> -/*
> - * bit[00]: IA32_RTIT_CTL.CR3 filter can be set to 1 and IA32_RTIT_CR3_MATCH
> - *          MSR can be accessed;
> - * bit[01]: Support Configurable PSB and Cycle-Accurate Mode;
> - * bit[02]: Support IP Filtering, TraceStop filtering, and preservation
> - *          of Intel PT MSRs across warm reset;
> - * bit[03]: Support MTC timing packet and suppression of COFI-based packets;
> - */
> -#define INTEL_PT_MINIMAL_EBX     0xf
> -/*
> - * bit[00]: Tracing can be enabled with IA32_RTIT_CTL.ToPA = 1 and
> - *          IA32_RTIT_OUTPUT_BASE and IA32_RTIT_OUTPUT_MASK_PTRS MSRs can be
> - *          accessed;
> - * bit[01]: ToPA tables can hold any number of output entries, up to the
> - *          maximum allowed by the MaskOrTableOffset field of
> - *          IA32_RTIT_OUTPUT_MASK_PTRS;
> - * bit[02]: Support Single-Range Output scheme;
> - */
> -#define INTEL_PT_MINIMAL_ECX     0x7
> -/* generated packets which contain IP payloads have LIP values */
> -#define INTEL_PT_IP_LIP          (1 << 31)
> -#define INTEL_PT_ADDR_RANGES_NUM 0x2 /* Number of configurable address ranges */
> -#define INTEL_PT_ADDR_RANGES_NUM_MASK 0x7
> -#define INTEL_PT_MTC_BITMAP      (0x0249 << 16) /* Support ART(0,3,6,9) */
> -#define INTEL_PT_CYCLE_BITMAP    0x1fff         /* Support 0,2^(0~11) */
> -#define INTEL_PT_PSB_BITMAP      (0x003f << 16) /* Support 2K,4K,8K,16K,32K,64K */
> +#define INTEL_PT_MAX_SUBLEAF                0x1
> +#define INTEL_PT_DEFAULT_ADDR_RANGES_NUM    0x2
> +#define INTEL_PT_ADDR_RANGES_NUM_MASK       0x7
> +/* Support ART(0,3,6,9) */
> +#define INTEL_PT_DEFAULT_MTC_BITMAP         0x0249
> +/* Support 0,2^(0~11) */
> +#define INTEL_PT_DEFAULT_CYCLE_BITMAP       0x1fff
> +/* Support 2K,4K,8K,16K,32K,64K */
> +#define INTEL_PT_DEFAULT_PSB_BITMAP         0x003f
> +
> +#define INTEL_PT_DEFAULT_0_EBX  (CPUID_14_0_EBX_CR3_FILTER | \
> +            CPUID_14_0_EBX_PSB | CPUID_14_0_EBX_IP_FILTER | CPUID_14_0_EBX_MTC)
> +#define INTEL_PT_DEFAULT_0_ECX  (CPUID_14_0_ECX_TOPA | \
> +            CPUID_14_0_ECX_MULTI_ENTRIES | CPUID_14_0_ECX_SINGLE_RANGE)
> +#define INTEL_PT_DEFAULT_1_EAX  (INTEL_PT_DEFAULT_MTC_BITMAP << 16 | \
> +                                 INTEL_PT_DEFAULT_ADDR_RANGES_NUM)
> +#define INTEL_PT_DEFAULT_1_EBX  (INTEL_PT_DEFAULT_PSB_BITMAP << 16 | \
> +                                 INTEL_PT_DEFAULT_CYCLE_BITMAP)

I like these new macros because they make the code at
x86_cpu_filter_features() clearer.  Can we do this in a separate
patch, to be applied before "Introduce FeatureWordInfo for Intel
PT CPUID leaf 0xD"?

>  
>  void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
>                                uint32_t vendor2, uint32_t vendor3)
> @@ -1517,6 +1507,7 @@ typedef struct X86CPUDefinition {
>      int family;
>      int model;
>      int stepping;
> +    bool has_specific_intel_pt_feature_set;
>      FeatureWordArray features;
>      const char *model_id;
>      const CPUCaches *const cache_info;
> @@ -5061,6 +5052,14 @@ static void x86_cpu_load_model(X86CPU *cpu, X86CPUModel *model)
>          env->features[w] = def->features[w];
>      }
>  
> +    if (!def->has_specific_intel_pt_feature_set) {
> +        env->use_default_intel_pt = true;
> +        env->features[FEAT_14_0_EBX] = INTEL_PT_DEFAULT_0_EBX;
> +        env->features[FEAT_14_0_ECX] = INTEL_PT_DEFAULT_0_ECX;
> +        env->features[FEAT_14_1_EAX] = INTEL_PT_DEFAULT_1_EAX;
> +        env->features[FEAT_14_1_EBX] = INTEL_PT_DEFAULT_1_EBX;
> +    }

There's nothing special about intel-pt, and other features might
benefit from having default values set if omitted from the CPU
model definition.  I don't know yet what's the best solution
here, but some possibilities are:

A) The simpler and more obvious solution: just set
  features[FEAT_14_*] explicitly to INTEL_PT_DEFAULT_* on all CPU
  models in builtin_x86_defs.

B) Treat (env->features[FEAT_14_0_EBX] == 0 &&
          env->features[FEAT_14_0_ECX] == 0 &&
          env->features[FEAT_14_1_EAX] == 0 &&
          env->features[FEAT_14_1_EBX] == 0) as a special case
  that indicates that INTEL_PT_DEFAULT_* should be used instead
  of 0.

C) Rework X86CPUDefinition completely and replace
    FeatureWordArray features;
  with something like:
    struct {
      FeatureWord w;
      uint32_t value;
    } *features;
  or:
    struct {
      const char *property, *value;
    } *features;
  so we can have a concept of omitted/default flags in
  builtin_x86_defs.

(A) and (C) are generic but require a lot more work, so we could
keep your solution or just implement (B).

In either case, I suggest you add a comment explaining why the
boolean flag or special case exists (I believe the answer is:
"because all CPU models have the same default value for
INTEL_PT_* and we don't want to manually copy/paste it to all
entries in builtin_x86_defs").

> +
>      /* legacy-cache defaults to 'off' if CPU model provides cache info */
>      cpu->legacy_cache = !def->cache_info;
>  
> @@ -5465,14 +5464,11 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>  
>          if (count == 0) {
>              *eax = INTEL_PT_MAX_SUBLEAF;
> -            *ebx = INTEL_PT_MINIMAL_EBX;
> -            *ecx = INTEL_PT_MINIMAL_ECX;
> -            if (env->features[FEAT_14_0_ECX] & CPUID_14_0_ECX_LIP) {
> -                *ecx |= CPUID_14_0_ECX_LIP;
> -            }
> +            *ebx = env->features[FEAT_14_0_EBX];
> +            *ecx = env->features[FEAT_14_0_ECX];
>          } else if (count == 1) {
> -            *eax = INTEL_PT_MTC_BITMAP | INTEL_PT_ADDR_RANGES_NUM;
> -            *ebx = INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP;
> +            *eax = env->features[FEAT_14_1_EAX];
> +            *ebx = env->features[FEAT_14_1_EBX];
>          }
>          break;
>      }
> @@ -6081,6 +6077,7 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose)
>      CPUX86State *env = &cpu->env;
>      FeatureWord w;
>      const char *prefix = NULL;
> +    uint64_t host_feat;
>  
>      if (verbose) {
>          prefix = accel_uses_host_cpuid()
> @@ -6089,8 +6086,7 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose)
>      }
>  
>      for (w = 0; w < FEATURE_WORDS; w++) {
> -        uint64_t host_feat =
> -            x86_cpu_get_supported_feature_word(w, false);
> +        host_feat = x86_cpu_get_supported_feature_word(w, false);

This doesn't look right.  The value of host_feat set here is
useless outside this for loop, and there's no reason to change
the scope of this variable.

>          uint64_t requested_features = env->features[w];
>          uint64_t unavailable_features;
>          if (w == FEAT_14_1_EAX) {
> @@ -6107,32 +6103,22 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose)
>          mark_unavailable_features(cpu, w, unavailable_features, prefix);
>      }
>  
> -    if ((env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) &&
> -        kvm_enabled()) {
> -        KVMState *s = CPU(cpu)->kvm_state;
> -        uint32_t eax_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_EAX);
> -        uint32_t ebx_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_EBX);
> -        uint32_t ecx_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_ECX);
> -        uint32_t eax_1 = kvm_arch_get_supported_cpuid(s, 0x14, 1, R_EAX);
> -        uint32_t ebx_1 = kvm_arch_get_supported_cpuid(s, 0x14, 1, R_EBX);
> +    if (env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) {
> +        host_feat = x86_cpu_get_supported_feature_word(FEAT_14_0_ECX, false);

This doesn't look right.  You seem to be using the same variable
(host_feat) for two completely different purposes.

>  
> -        if (!eax_0 ||
> -           ((ebx_0 & INTEL_PT_MINIMAL_EBX) != INTEL_PT_MINIMAL_EBX) ||
> -           ((ecx_0 & INTEL_PT_MINIMAL_ECX) != INTEL_PT_MINIMAL_ECX) ||
> -           ((eax_1 & INTEL_PT_MTC_BITMAP) != INTEL_PT_MTC_BITMAP) ||
> -           ((eax_1 & INTEL_PT_ADDR_RANGES_NUM_MASK) <
> -                                           INTEL_PT_ADDR_RANGES_NUM) ||
> -           ((ebx_1 & (INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP)) !=
> -                (INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP)) ||
> -           ((ecx_0 & CPUID_14_0_ECX_LIP) !=
> -                (env->features[FEAT_14_0_ECX] & CPUID_14_0_ECX_LIP))) {
> -            /*
> -             * Processor Trace capabilities aren't configurable, so if the
> -             * host can't emulate the capabilities we report on
> -             * cpu_x86_cpuid(), intel-pt can't be enabled on the current host.
> -             */
> +        if (env->use_default_intel_pt &&
> +            ((env->features[FEAT_14_0_EBX] != INTEL_PT_DEFAULT_0_EBX) ||
> +             ((env->features[FEAT_14_0_ECX] & ~CPUID_14_0_ECX_LIP) !=
> +              INTEL_PT_DEFAULT_0_ECX) ||
> +             (env->features[FEAT_14_1_EAX] != INTEL_PT_DEFAULT_1_EAX) ||
> +             (env->features[FEAT_14_1_EBX] != INTEL_PT_DEFAULT_1_EBX))) {
>              mark_unavailable_features(cpu, FEAT_7_0_EBX, CPUID_7_0_EBX_INTEL_PT, prefix);

Why is use_default_intel_pt necessary?  Why can't this function
simply validate whatever is inside env->features[FEAT_14_*]?

>          }
> +
> +        if ((env->features[FEAT_14_0_ECX] ^ host_feat) & CPUID_14_0_ECX_LIP) {

What if CPUID_7_0_EBX_INTEL_PT is not set?  What should be the
value of host_feat?

> +            warn_report("Cannot configure different Intel PT IP payload format than hardware");
> +            mark_unavailable_features(cpu, FEAT_7_0_EBX, CPUID_7_0_EBX_INTEL_PT, NULL);
> +        }
>      }
>  }
>  
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index f5478a169f9a..e6236c371c4f 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -1639,6 +1639,7 @@ typedef struct CPUX86State {
>      uint32_t cpuid_vendor2;
>      uint32_t cpuid_vendor3;
>      uint32_t cpuid_version;
> +    bool use_default_intel_pt;
>      FeatureWordArray features;
>      /* Features that were explicitly enabled/disabled */
>      FeatureWordArray user_features;
> -- 
> 2.27.0
> 

-- 
Eduardo



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 2/5] target/i386: Introduce FeatureWordInfo for Intel PT CPUID leaf 0xD
  2021-10-15 16:04   ` Eduardo Habkost
@ 2021-10-17  7:53     ` Xiaoyao Li
  0 siblings, 0 replies; 15+ messages in thread
From: Xiaoyao Li @ 2021-10-17  7:53 UTC (permalink / raw)
  To: Eduardo Habkost; +Cc: Paolo Bonzini, Marcelo Tosatti, qemu-devel

On 10/16/2021 12:04 AM, Eduardo Habkost wrote:
> Hi,
> 
> Apologies for the delay.  Comments below:
> 
> On Thu, Sep 09, 2021 at 10:41:47PM +0800, Xiaoyao Li wrote:
>> CPUID leaf 0x14 subleaf 0x0 and 0x1 enumerate the resource and
>> capability of Intel PT.
>>
>> Introduce FeatureWord FEAT_14_0_EBX, FEAT_14_1_EAX and FEAT_14_1_EBX,
>> and complete FEAT_14_0_ECX. Thus all the features of Intel PT can be
>> expanded when "-cpu host/max" and can be configured in named CPU model.
>>
>> Regarding FEAT_14_1_EAX and FEAT_14_1_EBX, don't define the name for
>> them since each bit of them doesn't represent a standalone sub-feature
>> of Intel PT. However, explicitly mark them as migratable. So the
>> meaningful bits of them can be autoenabled in x86_cpu_expand_features().
>>
>> It has special handling for FEAT_14_1_EAX[2:0], because the 3 bits as a
>> whole represents the number of PT ADDRn_CFG ranges. Thus it has special
>> handling in mark_unavailable_features() and x86_cpu_filter_features().
>>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> ---
>>   target/i386/cpu.c | 87 +++++++++++++++++++++++++++++++++++++++++------
>>   target/i386/cpu.h | 37 +++++++++++++++++++-
>>   2 files changed, 112 insertions(+), 12 deletions(-)
>>
>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>> index a06473c9e84c..58e98210f219 100644
>> --- a/target/i386/cpu.c
>> +++ b/target/i386/cpu.c
>> @@ -567,7 +567,7 @@ static CPUCacheInfo legacy_l3_cache = {
>>   /* generated packets which contain IP payloads have LIP values */
>>   #define INTEL_PT_IP_LIP          (1 << 31)
>>   #define INTEL_PT_ADDR_RANGES_NUM 0x2 /* Number of configurable address ranges */
>> -#define INTEL_PT_ADDR_RANGES_NUM_MASK 0x3
>> +#define INTEL_PT_ADDR_RANGES_NUM_MASK 0x7
>>   #define INTEL_PT_MTC_BITMAP      (0x0249 << 16) /* Support ART(0,3,6,9) */
>>   #define INTEL_PT_CYCLE_BITMAP    0x1fff         /* Support 0,2^(0~11) */
>>   #define INTEL_PT_PSB_BITMAP      (0x003f << 16) /* Support 2K,4K,8K,16K,32K,64K */
>> @@ -1161,17 +1161,32 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
>>           }
>>       },
>>   
>> +    [FEAT_14_0_EBX] = {
>> +        .type = CPUID_FEATURE_WORD,
>> +        .feat_names = {
>> +            [0] = "intel-pt-cr3-filter",
>> +            [1] = "intel-pt-PSB",
> 
> I suggest using lowercase for all feature names, as it is the
> usual convention for QOM property names.

Will do it.

>> +            [2] = "intel-pt-ip-filter",
>> +            [3] = "intel-pt-mtc",
>> +            [4] = "intel-pt-ptwrite",
>> +            [5] = "intel-pt-power-event",
>> +            [6] = "intel-pt-psb-pmi-preservation",
> 
> This has a side effect: live migration with those flags enabled
> will become possible.
> 
> All bits allow enumerate support for an optional feature to be
> enabled by the OS, so it means we can emulate a CPU with bit=0
> CPU on a bit=1 host.  So it will be safe as long as there's no
> additional state that needs to be live migrated when those
> features are used.  Do we have any additional state, or all MSRs
> currently being migrated are enough?

For Intel PT, QEMU already live migration support for it, but only with 
fixed PT feature set for all the CPU models + max/host. No ?

>> +        },
>> +        .cpuid = {
>> +            .eax = 0x14,
>> +            .needs_ecx = true, .ecx = 0,
>> +            .reg = R_EBX,
>> +        },
>> +    },
>> +
>>       [FEAT_14_0_ECX] = {
>>           .type = CPUID_FEATURE_WORD,
>>           .feat_names = {
>> -            NULL, NULL, NULL, NULL,
>> -            NULL, NULL, NULL, NULL,
>> -            NULL, NULL, NULL, NULL,
>> -            NULL, NULL, NULL, NULL,
>> -            NULL, NULL, NULL, NULL,
>> -            NULL, NULL, NULL, NULL,
>> -            NULL, NULL, NULL, NULL,
>> -            NULL, NULL, NULL, "intel-pt-lip",
>> +            [0] = "intel-pt-topa",
>> +            [1] = "intel-pt-multi-topa-entries",
>> +            [2] = "intel-pt-single-range",
>> +            [3] = "intel-pt-trace-transport-subsystem",
> 
> All bits above are also optional features to be enabled
> explicitly by the OS, so it also seems OK.
> 
>> +            [31] = "intel-pt-lip",
> 
> This one is trickier because its value must match the host, but
> it is already present in the existing code.  We already have an
> explicit check for host LIP == guest LIP, so it's OK.
> 
>>           },
>>           .cpuid = {
>>               .eax = 0x14,
>> @@ -1181,6 +1196,26 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
>>           .tcg_features = TCG_14_0_ECX_FEATURES,
>>        },
>>   
>> +    [FEAT_14_1_EAX] = {
>> +        .type = CPUID_FEATURE_WORD,
>> +        .cpuid = {
>> +            .eax = 0x14,
>> +            .needs_ecx = true, .ecx = 1,
>> +            .reg = R_EAX,
>> +        },
>> +        .migratable_flags = ~0ull,
> 
> This one is trickier.  I see a few potential issues:
> 
> * Bits 15:3 are documented as reserved on my version of the Intel
>    SDM (June 2021).  If they are reserved, I don't think we should
>    make them migratable yet.  If they aren't, do you have a
>    pointer to the documentation?

you are right that they are reserved.

I was just too lazy to calculate the bitmask. Will fix it.

> * Bits 2:0 is a number, not a simple boolean flag.  You are
>    handling this as a special case in x86_cpu_filter_features()
>    below, so it should be OK.
> 
> * The flags are migratable but have no names.  The only existing
>    case of non-zero .migratable_flags required a special case at
>    x86_cpu_feature_name().  I'm pretty sure this might break the
>    getter of the "unavailable-features" property.  I think we need
>    to make x86_cpu_feature_name() safer, somehow, or (a quicker
>    solution) we can add names to all migratable bits here (even if
>    they are not intended to be set by end users).

I think for the bitmap field (bits 31:16), we can give them a name 
because each bit is independent. e.g.,

[16] = "intel-pt-mtc-period-encoding-0"
[17] = "intel-pt-mtc-period-encoding-1"
...
[31] = "intel-pt-mte-period-encoding-15"

the problem is bits 2:0, they are as a whole to represent one thing.
Maybe name it as

[0] = "intel-pt-addr-range-num-bit0"
[1] = "intel-pt-addr-range-num-bit1"
[2] = "intel-pt-addr-range-num-bit2"

Then, they will be displayed as feature flags when "-cpu ?", is it ok?

>> +    },
>> +
>> +    [FEAT_14_1_EBX] = {
>> +        .type = CPUID_FEATURE_WORD,
>> +        .cpuid = {
>> +            .eax = 0x14,
>> +            .needs_ecx = true, .ecx = 1,
>> +            .reg = R_EBX,
>> +        },
>> +        .migratable_flags = ~0ull,
> 
> Same observation above about flags with no names apply here.

The same as above, we can give each bit a name because they are also the 
bitmap for other fields encoding.

>> +    },
>> +
>>   };
>>   
>>   typedef struct FeatureMask {
>> @@ -1253,10 +1288,22 @@ static FeatureDep feature_dependencies[] = {
>>           .from = { FEAT_7_0_EBX,             CPUID_7_0_EBX_RDSEED },
>>           .to = { FEAT_VMX_SECONDARY_CTLS,    VMX_SECONDARY_EXEC_RDSEED_EXITING },
>>       },
>> +    {
>> +        .from = { FEAT_7_0_EBX,             CPUID_7_0_EBX_INTEL_PT },
>> +        .to = { FEAT_14_0_EBX,              ~0ull },
>> +    },
>>       {
>>           .from = { FEAT_7_0_EBX,             CPUID_7_0_EBX_INTEL_PT },
>>           .to = { FEAT_14_0_ECX,              ~0ull },
>>       },
>> +    {
>> +        .from = { FEAT_7_0_EBX,             CPUID_7_0_EBX_INTEL_PT },
>> +        .to = { FEAT_14_1_EAX,              ~0ull },
>> +    },
>> +    {
>> +        .from = { FEAT_7_0_EBX,             CPUID_7_0_EBX_INTEL_PT },
>> +        .to = { FEAT_14_1_EBX,              ~0ull },
>> +    },
>>       {
>>           .from = { FEAT_8000_0001_EDX,       CPUID_EXT2_RDTSCP },
>>           .to = { FEAT_VMX_SECONDARY_CTLS,    VMX_SECONDARY_EXEC_RDTSCP },
>> @@ -4260,7 +4307,14 @@ static void mark_unavailable_features(X86CPU *cpu, FeatureWord w, uint64_t mask,
>>           return;
>>       }
>>   
>> -    for (i = 0; i < 64; ++i) {
>> +    if ((w == FEAT_14_1_EAX) && (mask & INTEL_PT_ADDR_RANGES_NUM_MASK)) {
>> +        warn_report("%s: CPUID.14H_01H:EAX [bit 2:0]", verbose_prefix);
>> +        i = 3;
>> +    } else {
>> +        i = 0;
>> +    }
> 
> That's tricky and easy to break.  To be honest, I prefer having
> duplicate error messages showing bit 2,1,0 than having this hack.
> 
> If you want to make some range of CPUID bits special, I'd prefer
> to implement this as a FeatureWordInfo field that indicates that.

I tried to go this direction. But I didn't think up any good idea. I 
will try to think about it again.

> This way this function doesn't become a pile of special cases on
> top of each other.
> 
> In either case, I suggest addressing the duplicate error messages
> in a separate patch so this discussion doesn't block the rest.

OK. I'll split it into a separate one.

>> +
>> +    for (; i < 64; ++i) {
>>           if ((1ULL << i) & mask) {
>>               g_autofree char *feat_word_str = feature_word_description(f, i);
>>               warn_report("%s: %s%s%s [bit %d]",
>> @@ -6038,7 +6092,18 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose)
>>           uint64_t host_feat =
>>               x86_cpu_get_supported_feature_word(w, false);
>>           uint64_t requested_features = env->features[w];
>> -        uint64_t unavailable_features = requested_features & ~host_feat;
>> +        uint64_t unavailable_features;
>> +        if (w == FEAT_14_1_EAX) {
> 
> I would love to find a way to make this more readable, but it's
> simple enough.
> 
> Maybe a `switch (w)` statement would be better, to make it easier
> to extend in the future, and discourage people from adding
> special cases that are more complex than (w == FEAT_...).
> 
> Maybe it would be good to create a
>    void x86_cpu_filter_feature_word(X86CPU *cpu, FeatureWord w)
> function to keep the code complexity under control.

good suggestion. I will implement it.

>> +            unavailable_features = (requested_features & ~host_feat) &
>> +                                   ~INTEL_PT_ADDR_RANGES_NUM_MASK;
>> +            if ((requested_features & INTEL_PT_ADDR_RANGES_NUM_MASK) >
>> +                (host_feat & INTEL_PT_ADDR_RANGES_NUM_MASK)) {
>> +                unavailable_features |= requested_features &
>> +                                        INTEL_PT_ADDR_RANGES_NUM_MASK;
>> +            }
>> +        } else {
>> +            unavailable_features = requested_features & ~host_feat;
>> +        }
>>           mark_unavailable_features(cpu, w, unavailable_features, prefix);
>>       }
>>   
> 
> I miss the cpu_x86_cpuid() changes to actual use the new feature
> words, here.  I think a cpu_x86_cpuid() change should be done in
> the same patch, or we risk having a QEMU commit where the CPU
> properties exist but do absolutely nothing.

It's intentional.

We cannot assign the new feature words to guest CPUID in cpu_x86_cpuid() 
until special handling for named_CPU_models+PT in next patch. Otherwise 
it generates different INTEL PT CPUID value comparing to old QEMU.

>> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
>> index 6c50d3ab4f1d..f5478a169f9a 100644
>> --- a/target/i386/cpu.h
>> +++ b/target/i386/cpu.h
>> @@ -566,7 +566,10 @@ typedef enum FeatureWord {
>>       FEAT_VMX_EPT_VPID_CAPS,
>>       FEAT_VMX_BASIC,
>>       FEAT_VMX_VMFUNC,
>> +    FEAT_14_0_EBX,
>>       FEAT_14_0_ECX,
>> +    FEAT_14_1_EAX,
>> +    FEAT_14_1_EBX,
>>       FEATURE_WORDS,
>>   } FeatureWord;
>>   
>> @@ -835,8 +838,40 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS];
>>   /* AVX512 BFloat16 Instruction */
>>   #define CPUID_7_1_EAX_AVX512_BF16       (1U << 5)
>>   
>> +/*
>> + * IA32_RTIT_CTL.CR3 filter can be set to 1 and
>> + * IA32_RTIT_CR3_MATCH can be accessed
>> + */
>> +#define CPUID_14_0_EBX_CR3_FILTER               (1U << 0)
>> +/* Support Configurable PSB and Cycle-Accurate Mode */
>> +#define CPUID_14_0_EBX_PSB                      (1U << 1)
>> +/*
>> + * Support IP Filtering, IP TraceStop, and preservation
>> + * of Intel PT MSRs across warm reset
>> + */
>> +#define CPUID_14_0_EBX_IP_FILTER                (1U << 2)
>> +/* Support MTC timing packet */
>> +#define CPUID_14_0_EBX_MTC                      (1U << 3)
>> +/* Support PTWRITE */
>> +#define CPUID_14_0_EBX_PTWRITE                  (1U << 4)
>> +/* Support Power Event Trace packet generation */
>> +#define CPUID_14_0_EBX_POWER_EVENT              (1U << 5)
>> +/* Support PSB and PMI Preservation */
>> +#define CPUID_14_0_EBX_PSB_PMI_PRESERVATION     (1U << 6)
>> +
>> +/* Tracing can be enabled with IA32_RTIT_CTL.ToPA = 1 */
>> +#define CPUID_14_0_ECX_TOPA                     (1U << 0)
>> +/*
>> + * ToPA tables can hold any number of output entries, up to the maximum allowed
>> + * by the MaskOrTableOffset field of IA32_RTIT_OUTPUT_MASK_PTRS
>> + */
>> +#define CPUID_14_0_ECX_MULTI_ENTRIES            (1U << 1)
>> +/* Support Single-Range Output scheme */
>> +#define CPUID_14_0_ECX_SINGLE_RANGE             (1U << 2)
>> +/* Support IA32_RTIT_CTL.FabricEn */
>> +#define CPUID_14_0_ECX_TRACE_TRANS_SUBSYSTEM    (1U << 3)
>>   /* Packets which contain IP payload have LIP values */
>> -#define CPUID_14_0_ECX_LIP              (1U << 31)
>> +#define CPUID_14_0_ECX_LIP                      (1U << 31)
>>   
>>   /* CLZERO instruction */
>>   #define CPUID_8000_0008_EBX_CLZERO      (1U << 0)
>> -- 
>> 2.27.0
>>
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 3/5] target/i386: Enable host pass through of Intel PT
  2021-10-15 20:22   ` Eduardo Habkost
@ 2021-10-17 10:37     ` Xiaoyao Li
  2021-10-18  3:46     ` Xiaoyao Li
  1 sibling, 0 replies; 15+ messages in thread
From: Xiaoyao Li @ 2021-10-17 10:37 UTC (permalink / raw)
  To: Eduardo Habkost; +Cc: Paolo Bonzini, Marcelo Tosatti, qemu-devel

On 10/16/2021 4:22 AM, Eduardo Habkost wrote:
> On Thu, Sep 09, 2021 at 10:41:48PM +0800, Xiaoyao Li wrote:
>> commit e37a5c7fa459 ("i386: Add Intel Processor Trace feature support")
>> added the support of Intel PT by making CPUID[14] of PT as fixed feature
>> set (from ICX) for any CPU model on any host.
>>
>> This truly breaks the PT exposing on Intel SPR platform because SPR has
>> less supported bitmap CPUID(0x14,1):EBX[15:0] than ICX.
>>
>> This patch enables passing through host's PT capabilities for the case
>> "-cpu host/max" while ensure named CPU model still has the fixed
>> PT feature set to not break the live migration.
>>
>> It introduces @has_specific_intel_pt_feature_set field for name CPU
>> model. If a named CPU model has this field as false, it will use fixed
>> PT feature set of ICX. Besides same to previous behavior, if fixed PT
>> feature set of ICX cannot be satisfied/supported by host, it disables PT
>> instead of adjusting the feature set based on host's capabilities.
>>
>> In the future, new named CPU model, e.g., Sapphire Rapids, can define
>> its own PT feature set by setting @has_specific_intel_pt_feature_set to
>> true and defines it's own FEAT_14_0_EBX, FEAT_14_0_ECX, FEAT_14_1_EAX
>> and FEAT_14_1_EBX.
>>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> ---
>>   target/i386/cpu.c | 106 ++++++++++++++++++++--------------------------
>>   target/i386/cpu.h |   1 +
>>   2 files changed, 47 insertions(+), 60 deletions(-)
>>
>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>> index 58e98210f219..00c4ad23110d 100644
>> --- a/target/i386/cpu.c
>> +++ b/target/i386/cpu.c
>> @@ -543,34 +543,24 @@ static CPUCacheInfo legacy_l3_cache = {
>>   #define L2_ITLB_4K_ASSOC       4
>>   #define L2_ITLB_4K_ENTRIES   512
>>   
>> -/* CPUID Leaf 0x14 constants: */
>> -#define INTEL_PT_MAX_SUBLEAF     0x1
>> -/*
>> - * bit[00]: IA32_RTIT_CTL.CR3 filter can be set to 1 and IA32_RTIT_CR3_MATCH
>> - *          MSR can be accessed;
>> - * bit[01]: Support Configurable PSB and Cycle-Accurate Mode;
>> - * bit[02]: Support IP Filtering, TraceStop filtering, and preservation
>> - *          of Intel PT MSRs across warm reset;
>> - * bit[03]: Support MTC timing packet and suppression of COFI-based packets;
>> - */
>> -#define INTEL_PT_MINIMAL_EBX     0xf
>> -/*
>> - * bit[00]: Tracing can be enabled with IA32_RTIT_CTL.ToPA = 1 and
>> - *          IA32_RTIT_OUTPUT_BASE and IA32_RTIT_OUTPUT_MASK_PTRS MSRs can be
>> - *          accessed;
>> - * bit[01]: ToPA tables can hold any number of output entries, up to the
>> - *          maximum allowed by the MaskOrTableOffset field of
>> - *          IA32_RTIT_OUTPUT_MASK_PTRS;
>> - * bit[02]: Support Single-Range Output scheme;
>> - */
>> -#define INTEL_PT_MINIMAL_ECX     0x7
>> -/* generated packets which contain IP payloads have LIP values */
>> -#define INTEL_PT_IP_LIP          (1 << 31)
>> -#define INTEL_PT_ADDR_RANGES_NUM 0x2 /* Number of configurable address ranges */
>> -#define INTEL_PT_ADDR_RANGES_NUM_MASK 0x7
>> -#define INTEL_PT_MTC_BITMAP      (0x0249 << 16) /* Support ART(0,3,6,9) */
>> -#define INTEL_PT_CYCLE_BITMAP    0x1fff         /* Support 0,2^(0~11) */
>> -#define INTEL_PT_PSB_BITMAP      (0x003f << 16) /* Support 2K,4K,8K,16K,32K,64K */
>> +#define INTEL_PT_MAX_SUBLEAF                0x1
>> +#define INTEL_PT_DEFAULT_ADDR_RANGES_NUM    0x2
>> +#define INTEL_PT_ADDR_RANGES_NUM_MASK       0x7
>> +/* Support ART(0,3,6,9) */
>> +#define INTEL_PT_DEFAULT_MTC_BITMAP         0x0249
>> +/* Support 0,2^(0~11) */
>> +#define INTEL_PT_DEFAULT_CYCLE_BITMAP       0x1fff
>> +/* Support 2K,4K,8K,16K,32K,64K */
>> +#define INTEL_PT_DEFAULT_PSB_BITMAP         0x003f
>> +
>> +#define INTEL_PT_DEFAULT_0_EBX  (CPUID_14_0_EBX_CR3_FILTER | \
>> +            CPUID_14_0_EBX_PSB | CPUID_14_0_EBX_IP_FILTER | CPUID_14_0_EBX_MTC)
>> +#define INTEL_PT_DEFAULT_0_ECX  (CPUID_14_0_ECX_TOPA | \
>> +            CPUID_14_0_ECX_MULTI_ENTRIES | CPUID_14_0_ECX_SINGLE_RANGE)
>> +#define INTEL_PT_DEFAULT_1_EAX  (INTEL_PT_DEFAULT_MTC_BITMAP << 16 | \
>> +                                 INTEL_PT_DEFAULT_ADDR_RANGES_NUM)
>> +#define INTEL_PT_DEFAULT_1_EBX  (INTEL_PT_DEFAULT_PSB_BITMAP << 16 | \
>> +                                 INTEL_PT_DEFAULT_CYCLE_BITMAP)
> 
> I like these new macros because they make the code at
> x86_cpu_filter_features() clearer.  Can we do this in a separate
> patch, to be applied before "Introduce FeatureWordInfo for Intel
> PT CPUID leaf 0xD"?
> 
>>   
>>   void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
>>                                 uint32_t vendor2, uint32_t vendor3)
>> @@ -1517,6 +1507,7 @@ typedef struct X86CPUDefinition {
>>       int family;
>>       int model;
>>       int stepping;
>> +    bool has_specific_intel_pt_feature_set;
>>       FeatureWordArray features;
>>       const char *model_id;
>>       const CPUCaches *const cache_info;
>> @@ -5061,6 +5052,14 @@ static void x86_cpu_load_model(X86CPU *cpu, X86CPUModel *model)
>>           env->features[w] = def->features[w];
>>       }
>>   
>> +    if (!def->has_specific_intel_pt_feature_set) {
>> +        env->use_default_intel_pt = true;
>> +        env->features[FEAT_14_0_EBX] = INTEL_PT_DEFAULT_0_EBX;
>> +        env->features[FEAT_14_0_ECX] = INTEL_PT_DEFAULT_0_ECX;
>> +        env->features[FEAT_14_1_EAX] = INTEL_PT_DEFAULT_1_EAX;
>> +        env->features[FEAT_14_1_EBX] = INTEL_PT_DEFAULT_1_EBX;
>> +    }
> 
> There's nothing special about intel-pt, and other features might
> benefit from having default values set if omitted from the CPU
> model definition.  I don't know yet what's the best solution
> here, but some possibilities are:
> 
> A) The simpler and more obvious solution: just set
>    features[FEAT_14_*] explicitly to INTEL_PT_DEFAULT_* on all CPU
>    models in builtin_x86_defs.
>
> B) Treat (env->features[FEAT_14_0_EBX] == 0 &&
>            env->features[FEAT_14_0_ECX] == 0 &&
>            env->features[FEAT_14_1_EAX] == 0 &&
>            env->features[FEAT_14_1_EBX] == 0) as a special case
>    that indicates that INTEL_PT_DEFAULT_* should be used instead
>    of 0.
> 
> C) Rework X86CPUDefinition completely and replace
>      FeatureWordArray features;
>    with something like:
>      struct {
>        FeatureWord w;
>        uint32_t value;
>      } *features;
>    or:
>      struct {
>        const char *property, *value;
>      } *features;
>    so we can have a concept of omitted/default flags in
>    builtin_x86_defs.
> 
> (A) and (C) are generic but require a lot more work, so we could
> keep your solution or just implement (B).

yew, I can implement B. Maybe for B), we can just use

	(env->features[FEAT_14_0_EBX] == 0)

for simplicity?

> In either case, I suggest you add a comment explaining why the
> boolean flag or special case exists (I believe the answer is:
> "because all CPU models have the same default value for
> INTEL_PT_* and we don't want to manually copy/paste it to all
> entries in builtin_x86_defs").

yes, I can add the comment.

(It's just because we want to keep compatible with old QEMU, which has a 
not-so-good design of INTEL_PT feature set)

> 
>> +
>>       /* legacy-cache defaults to 'off' if CPU model provides cache info */
>>       cpu->legacy_cache = !def->cache_info;
>>   
>> @@ -5465,14 +5464,11 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>>   
>>           if (count == 0) {
>>               *eax = INTEL_PT_MAX_SUBLEAF;
>> -            *ebx = INTEL_PT_MINIMAL_EBX;
>> -            *ecx = INTEL_PT_MINIMAL_ECX;
>> -            if (env->features[FEAT_14_0_ECX] & CPUID_14_0_ECX_LIP) {
>> -                *ecx |= CPUID_14_0_ECX_LIP;
>> -            }
>> +            *ebx = env->features[FEAT_14_0_EBX];
>> +            *ecx = env->features[FEAT_14_0_ECX];
>>           } else if (count == 1) {
>> -            *eax = INTEL_PT_MTC_BITMAP | INTEL_PT_ADDR_RANGES_NUM;
>> -            *ebx = INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP;
>> +            *eax = env->features[FEAT_14_1_EAX];
>> +            *ebx = env->features[FEAT_14_1_EBX];
>>           }
>>           break;
>>       }
>> @@ -6081,6 +6077,7 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose)
>>       CPUX86State *env = &cpu->env;
>>       FeatureWord w;
>>       const char *prefix = NULL;
>> +    uint64_t host_feat;
>>   
>>       if (verbose) {
>>           prefix = accel_uses_host_cpuid()
>> @@ -6089,8 +6086,7 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose)
>>       }
>>   
>>       for (w = 0; w < FEATURE_WORDS; w++) {
>> -        uint64_t host_feat =
>> -            x86_cpu_get_supported_feature_word(w, false);
>> +        host_feat = x86_cpu_get_supported_feature_word(w, false);
> 
> This doesn't look right.  The value of host_feat set here is
> useless outside this for loop, and there's no reason to change
> the scope of this variable.

I just don't want to define host_feat twice in different sub-block.

I don't think there is any problem I move the define of host_feat out of 
the loop. host_feat is just the variable to store the value when query 
host/kvm supported value for specific feature word.

In the later code, I need to query the value for FEAT_14_0_ECX again.

>>           uint64_t requested_features = env->features[w];
>>           uint64_t unavailable_features;
>>           if (w == FEAT_14_1_EAX) {
>> @@ -6107,32 +6103,22 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose)
>>           mark_unavailable_features(cpu, w, unavailable_features, prefix);
>>       }
>>   
>> -    if ((env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) &&
>> -        kvm_enabled()) {
>> -        KVMState *s = CPU(cpu)->kvm_state;
>> -        uint32_t eax_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_EAX);
>> -        uint32_t ebx_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_EBX);
>> -        uint32_t ecx_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_ECX);
>> -        uint32_t eax_1 = kvm_arch_get_supported_cpuid(s, 0x14, 1, R_EAX);
>> -        uint32_t ebx_1 = kvm_arch_get_supported_cpuid(s, 0x14, 1, R_EBX);
>> +    if (env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) {
>> +        host_feat = x86_cpu_get_supported_feature_word(FEAT_14_0_ECX, false);
> 
> This doesn't look right.  You seem to be using the same variable
> (host_feat) for two completely different purposes.

on the former, it's

	host_feat = x86_cpu_get_supported_feature_word(w, false);

here, it's

	host_feat = x86_cpu_get_supported_feature_word(FEAT_14_0_ECX, false);

why different?

>>   
>> -        if (!eax_0 ||
>> -           ((ebx_0 & INTEL_PT_MINIMAL_EBX) != INTEL_PT_MINIMAL_EBX) ||
>> -           ((ecx_0 & INTEL_PT_MINIMAL_ECX) != INTEL_PT_MINIMAL_ECX) ||
>> -           ((eax_1 & INTEL_PT_MTC_BITMAP) != INTEL_PT_MTC_BITMAP) ||
>> -           ((eax_1 & INTEL_PT_ADDR_RANGES_NUM_MASK) <
>> -                                           INTEL_PT_ADDR_RANGES_NUM) ||
>> -           ((ebx_1 & (INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP)) !=
>> -                (INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP)) ||
>> -           ((ecx_0 & CPUID_14_0_ECX_LIP) !=
>> -                (env->features[FEAT_14_0_ECX] & CPUID_14_0_ECX_LIP))) {
>> -            /*
>> -             * Processor Trace capabilities aren't configurable, so if the
>> -             * host can't emulate the capabilities we report on
>> -             * cpu_x86_cpuid(), intel-pt can't be enabled on the current host.
>> -             */
>> +        if (env->use_default_intel_pt &&
>> +            ((env->features[FEAT_14_0_EBX] != INTEL_PT_DEFAULT_0_EBX) ||
>> +             ((env->features[FEAT_14_0_ECX] & ~CPUID_14_0_ECX_LIP) !=
>> +              INTEL_PT_DEFAULT_0_ECX) ||
>> +             (env->features[FEAT_14_1_EAX] != INTEL_PT_DEFAULT_1_EAX) ||
>> +             (env->features[FEAT_14_1_EBX] != INTEL_PT_DEFAULT_1_EBX))) {
>>               mark_unavailable_features(cpu, FEAT_7_0_EBX, CPUID_7_0_EBX_INTEL_PT, prefix);
> 
> Why is use_default_intel_pt necessary?  Why can't this function
> simply validate whatever is inside env->features[FEAT_14_*]?

Because this handling is only to keep it consistent with old QEMU. you 
can see that it validate against INTEL_PT_DEFAULT_* , which is the PT 
set of ICX. If remove use_default_intel_pt check, it will also apply the 
check on other CPU models (e.g., in patch 4) that have different 
INTEL_PT set.

>>           }
>> +
>> +        if ((env->features[FEAT_14_0_ECX] ^ host_feat) & CPUID_14_0_ECX_LIP) {
> 
> What if CPUID_7_0_EBX_INTEL_PT is not set?  What should be the
> value of host_feat?

Above all handling is executed under the condition of

	if (env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) {

>> +            warn_report("Cannot configure different Intel PT IP payload format than hardware");
>> +            mark_unavailable_features(cpu, FEAT_7_0_EBX, CPUID_7_0_EBX_INTEL_PT, NULL);
>> +        }
>>       }
>>   }
>>   
>> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
>> index f5478a169f9a..e6236c371c4f 100644
>> --- a/target/i386/cpu.h
>> +++ b/target/i386/cpu.h
>> @@ -1639,6 +1639,7 @@ typedef struct CPUX86State {
>>       uint32_t cpuid_vendor2;
>>       uint32_t cpuid_vendor3;
>>       uint32_t cpuid_version;
>> +    bool use_default_intel_pt;
>>       FeatureWordArray features;
>>       /* Features that were explicitly enabled/disabled */
>>       FeatureWordArray user_features;
>> -- 
>> 2.27.0
>>
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 3/5] target/i386: Enable host pass through of Intel PT
  2021-10-15 20:22   ` Eduardo Habkost
  2021-10-17 10:37     ` Xiaoyao Li
@ 2021-10-18  3:46     ` Xiaoyao Li
  2021-10-18  5:37       ` Xiaoyao Li
  1 sibling, 1 reply; 15+ messages in thread
From: Xiaoyao Li @ 2021-10-18  3:46 UTC (permalink / raw)
  To: Eduardo Habkost; +Cc: Paolo Bonzini, Marcelo Tosatti, qemu-devel

On 10/16/2021 4:22 AM, Eduardo Habkost wrote:
> On Thu, Sep 09, 2021 at 10:41:48PM +0800, Xiaoyao Li wrote:
>> commit e37a5c7fa459 ("i386: Add Intel Processor Trace feature support")
>> added the support of Intel PT by making CPUID[14] of PT as fixed feature
>> set (from ICX) for any CPU model on any host.
>>
>> This truly breaks the PT exposing on Intel SPR platform because SPR has
>> less supported bitmap CPUID(0x14,1):EBX[15:0] than ICX.
>>
>> This patch enables passing through host's PT capabilities for the case
>> "-cpu host/max" while ensure named CPU model still has the fixed
>> PT feature set to not break the live migration.
>>
>> It introduces @has_specific_intel_pt_feature_set field for name CPU
>> model. If a named CPU model has this field as false, it will use fixed
>> PT feature set of ICX. Besides same to previous behavior, if fixed PT
>> feature set of ICX cannot be satisfied/supported by host, it disables PT
>> instead of adjusting the feature set based on host's capabilities.
>>
>> In the future, new named CPU model, e.g., Sapphire Rapids, can define
>> its own PT feature set by setting @has_specific_intel_pt_feature_set to
>> true and defines it's own FEAT_14_0_EBX, FEAT_14_0_ECX, FEAT_14_1_EAX
>> and FEAT_14_1_EBX.
>>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> ---
>>   target/i386/cpu.c | 106 ++++++++++++++++++++--------------------------
>>   target/i386/cpu.h |   1 +
>>   2 files changed, 47 insertions(+), 60 deletions(-)
>>
>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>> index 58e98210f219..00c4ad23110d 100644
>> --- a/target/i386/cpu.c
>> +++ b/target/i386/cpu.c
>> @@ -543,34 +543,24 @@ static CPUCacheInfo legacy_l3_cache = {
>>   #define L2_ITLB_4K_ASSOC       4
>>   #define L2_ITLB_4K_ENTRIES   512
>>   
>> -/* CPUID Leaf 0x14 constants: */
>> -#define INTEL_PT_MAX_SUBLEAF     0x1
>> -/*
>> - * bit[00]: IA32_RTIT_CTL.CR3 filter can be set to 1 and IA32_RTIT_CR3_MATCH
>> - *          MSR can be accessed;
>> - * bit[01]: Support Configurable PSB and Cycle-Accurate Mode;
>> - * bit[02]: Support IP Filtering, TraceStop filtering, and preservation
>> - *          of Intel PT MSRs across warm reset;
>> - * bit[03]: Support MTC timing packet and suppression of COFI-based packets;
>> - */
>> -#define INTEL_PT_MINIMAL_EBX     0xf
>> -/*
>> - * bit[00]: Tracing can be enabled with IA32_RTIT_CTL.ToPA = 1 and
>> - *          IA32_RTIT_OUTPUT_BASE and IA32_RTIT_OUTPUT_MASK_PTRS MSRs can be
>> - *          accessed;
>> - * bit[01]: ToPA tables can hold any number of output entries, up to the
>> - *          maximum allowed by the MaskOrTableOffset field of
>> - *          IA32_RTIT_OUTPUT_MASK_PTRS;
>> - * bit[02]: Support Single-Range Output scheme;
>> - */
>> -#define INTEL_PT_MINIMAL_ECX     0x7
>> -/* generated packets which contain IP payloads have LIP values */
>> -#define INTEL_PT_IP_LIP          (1 << 31)
>> -#define INTEL_PT_ADDR_RANGES_NUM 0x2 /* Number of configurable address ranges */
>> -#define INTEL_PT_ADDR_RANGES_NUM_MASK 0x7
>> -#define INTEL_PT_MTC_BITMAP      (0x0249 << 16) /* Support ART(0,3,6,9) */
>> -#define INTEL_PT_CYCLE_BITMAP    0x1fff         /* Support 0,2^(0~11) */
>> -#define INTEL_PT_PSB_BITMAP      (0x003f << 16) /* Support 2K,4K,8K,16K,32K,64K */
>> +#define INTEL_PT_MAX_SUBLEAF                0x1
>> +#define INTEL_PT_DEFAULT_ADDR_RANGES_NUM    0x2
>> +#define INTEL_PT_ADDR_RANGES_NUM_MASK       0x7
>> +/* Support ART(0,3,6,9) */
>> +#define INTEL_PT_DEFAULT_MTC_BITMAP         0x0249
>> +/* Support 0,2^(0~11) */
>> +#define INTEL_PT_DEFAULT_CYCLE_BITMAP       0x1fff
>> +/* Support 2K,4K,8K,16K,32K,64K */
>> +#define INTEL_PT_DEFAULT_PSB_BITMAP         0x003f
>> +
>> +#define INTEL_PT_DEFAULT_0_EBX  (CPUID_14_0_EBX_CR3_FILTER | \
>> +            CPUID_14_0_EBX_PSB | CPUID_14_0_EBX_IP_FILTER | CPUID_14_0_EBX_MTC)
>> +#define INTEL_PT_DEFAULT_0_ECX  (CPUID_14_0_ECX_TOPA | \
>> +            CPUID_14_0_ECX_MULTI_ENTRIES | CPUID_14_0_ECX_SINGLE_RANGE)
>> +#define INTEL_PT_DEFAULT_1_EAX  (INTEL_PT_DEFAULT_MTC_BITMAP << 16 | \
>> +                                 INTEL_PT_DEFAULT_ADDR_RANGES_NUM)
>> +#define INTEL_PT_DEFAULT_1_EBX  (INTEL_PT_DEFAULT_PSB_BITMAP << 16 | \
>> +                                 INTEL_PT_DEFAULT_CYCLE_BITMAP)
> 
> I like these new macros because they make the code at
> x86_cpu_filter_features() clearer.  Can we do this in a separate
> patch, to be applied before "Introduce FeatureWordInfo for Intel
> PT CPUID leaf 0xD"?
> 

Before?

These macros needs the individual CPUID_14_0_* definitions defined in 
"Introduce FeatureWordInfo for Intel PT CPUID leaf 0xD".

1) I can split the definitions of the CPUID bit from that patch and 
merge with the macros into a new patch.

2) create a new patch to introduce those macros after "Introduce 
FeatureWordInfo for Intel PT CPUID leaf 0xD"

which do you prefer?


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 3/5] target/i386: Enable host pass through of Intel PT
  2021-10-18  3:46     ` Xiaoyao Li
@ 2021-10-18  5:37       ` Xiaoyao Li
  2021-10-20 14:40         ` Eduardo Habkost
  0 siblings, 1 reply; 15+ messages in thread
From: Xiaoyao Li @ 2021-10-18  5:37 UTC (permalink / raw)
  To: Eduardo Habkost; +Cc: Paolo Bonzini, Marcelo Tosatti, qemu-devel

On 10/18/2021 11:46 AM, Xiaoyao Li wrote:
> On 10/16/2021 4:22 AM, Eduardo Habkost wrote:
>> On Thu, Sep 09, 2021 at 10:41:48PM +0800, Xiaoyao Li wrote:
>>> commit e37a5c7fa459 ("i386: Add Intel Processor Trace feature support")
>>> added the support of Intel PT by making CPUID[14] of PT as fixed feature
>>> set (from ICX) for any CPU model on any host.
>>>
>>> This truly breaks the PT exposing on Intel SPR platform because SPR has
>>> less supported bitmap CPUID(0x14,1):EBX[15:0] than ICX.
>>>
>>> This patch enables passing through host's PT capabilities for the case
>>> "-cpu host/max" while ensure named CPU model still has the fixed
>>> PT feature set to not break the live migration.
>>>
>>> It introduces @has_specific_intel_pt_feature_set field for name CPU
>>> model. If a named CPU model has this field as false, it will use fixed
>>> PT feature set of ICX. Besides same to previous behavior, if fixed PT
>>> feature set of ICX cannot be satisfied/supported by host, it disables PT
>>> instead of adjusting the feature set based on host's capabilities.
>>>
>>> In the future, new named CPU model, e.g., Sapphire Rapids, can define
>>> its own PT feature set by setting @has_specific_intel_pt_feature_set to
>>> true and defines it's own FEAT_14_0_EBX, FEAT_14_0_ECX, FEAT_14_1_EAX
>>> and FEAT_14_1_EBX.
>>>
>>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>> ---
>>>   target/i386/cpu.c | 106 ++++++++++++++++++++--------------------------
>>>   target/i386/cpu.h |   1 +
>>>   2 files changed, 47 insertions(+), 60 deletions(-)
>>>
>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>>> index 58e98210f219..00c4ad23110d 100644
>>> --- a/target/i386/cpu.c
>>> +++ b/target/i386/cpu.c
>>> @@ -543,34 +543,24 @@ static CPUCacheInfo legacy_l3_cache = {
>>>   #define L2_ITLB_4K_ASSOC       4
>>>   #define L2_ITLB_4K_ENTRIES   512
>>> -/* CPUID Leaf 0x14 constants: */
>>> -#define INTEL_PT_MAX_SUBLEAF     0x1
>>> -/*
>>> - * bit[00]: IA32_RTIT_CTL.CR3 filter can be set to 1 and 
>>> IA32_RTIT_CR3_MATCH
>>> - *          MSR can be accessed;
>>> - * bit[01]: Support Configurable PSB and Cycle-Accurate Mode;
>>> - * bit[02]: Support IP Filtering, TraceStop filtering, and preservation
>>> - *          of Intel PT MSRs across warm reset;
>>> - * bit[03]: Support MTC timing packet and suppression of COFI-based 
>>> packets;
>>> - */
>>> -#define INTEL_PT_MINIMAL_EBX     0xf
>>> -/*
>>> - * bit[00]: Tracing can be enabled with IA32_RTIT_CTL.ToPA = 1 and
>>> - *          IA32_RTIT_OUTPUT_BASE and IA32_RTIT_OUTPUT_MASK_PTRS 
>>> MSRs can be
>>> - *          accessed;
>>> - * bit[01]: ToPA tables can hold any number of output entries, up to 
>>> the
>>> - *          maximum allowed by the MaskOrTableOffset field of
>>> - *          IA32_RTIT_OUTPUT_MASK_PTRS;
>>> - * bit[02]: Support Single-Range Output scheme;
>>> - */
>>> -#define INTEL_PT_MINIMAL_ECX     0x7
>>> -/* generated packets which contain IP payloads have LIP values */
>>> -#define INTEL_PT_IP_LIP          (1 << 31)
>>> -#define INTEL_PT_ADDR_RANGES_NUM 0x2 /* Number of configurable 
>>> address ranges */
>>> -#define INTEL_PT_ADDR_RANGES_NUM_MASK 0x7
>>> -#define INTEL_PT_MTC_BITMAP      (0x0249 << 16) /* Support 
>>> ART(0,3,6,9) */
>>> -#define INTEL_PT_CYCLE_BITMAP    0x1fff         /* Support 
>>> 0,2^(0~11) */
>>> -#define INTEL_PT_PSB_BITMAP      (0x003f << 16) /* Support 
>>> 2K,4K,8K,16K,32K,64K */
>>> +#define INTEL_PT_MAX_SUBLEAF                0x1
>>> +#define INTEL_PT_DEFAULT_ADDR_RANGES_NUM    0x2
>>> +#define INTEL_PT_ADDR_RANGES_NUM_MASK       0x7
>>> +/* Support ART(0,3,6,9) */
>>> +#define INTEL_PT_DEFAULT_MTC_BITMAP         0x0249
>>> +/* Support 0,2^(0~11) */
>>> +#define INTEL_PT_DEFAULT_CYCLE_BITMAP       0x1fff
>>> +/* Support 2K,4K,8K,16K,32K,64K */
>>> +#define INTEL_PT_DEFAULT_PSB_BITMAP         0x003f
>>> +
>>> +#define INTEL_PT_DEFAULT_0_EBX  (CPUID_14_0_EBX_CR3_FILTER | \
>>> +            CPUID_14_0_EBX_PSB | CPUID_14_0_EBX_IP_FILTER | 
>>> CPUID_14_0_EBX_MTC)
>>> +#define INTEL_PT_DEFAULT_0_ECX  (CPUID_14_0_ECX_TOPA | \
>>> +            CPUID_14_0_ECX_MULTI_ENTRIES | CPUID_14_0_ECX_SINGLE_RANGE)
>>> +#define INTEL_PT_DEFAULT_1_EAX  (INTEL_PT_DEFAULT_MTC_BITMAP << 16 | \
>>> +                                 INTEL_PT_DEFAULT_ADDR_RANGES_NUM)
>>> +#define INTEL_PT_DEFAULT_1_EBX  (INTEL_PT_DEFAULT_PSB_BITMAP << 16 | \
>>> +                                 INTEL_PT_DEFAULT_CYCLE_BITMAP)
>>
>> I like these new macros because they make the code at
>> x86_cpu_filter_features() clearer.  

I tried it. But I find it doesn't make the code at 
x86_cpu_filter_features() clearer. It just introduces more code churn.

>> Can we do this in a separate
>> patch, to be applied before "Introduce FeatureWordInfo for Intel
>> PT CPUID leaf 0xD"?
>>
> 
> Before?
> 
> These macros needs the individual CPUID_14_0_* definitions defined in 
> "Introduce FeatureWordInfo for Intel PT CPUID leaf 0xD".
> 
> 1) I can split the definitions of the CPUID bit from that patch and 
> merge with the macros into a new patch.
> 
> 2) create a new patch to introduce those macros after "Introduce 
> FeatureWordInfo for Intel PT CPUID leaf 0xD"
> 
> which do you prefer?
> 




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 3/5] target/i386: Enable host pass through of Intel PT
  2021-10-18  5:37       ` Xiaoyao Li
@ 2021-10-20 14:40         ` Eduardo Habkost
  0 siblings, 0 replies; 15+ messages in thread
From: Eduardo Habkost @ 2021-10-20 14:40 UTC (permalink / raw)
  To: Xiaoyao Li; +Cc: Paolo Bonzini, Marcelo Tosatti, qemu-devel

On Mon, Oct 18, 2021 at 01:37:12PM +0800, Xiaoyao Li wrote:
> On 10/18/2021 11:46 AM, Xiaoyao Li wrote:
> > On 10/16/2021 4:22 AM, Eduardo Habkost wrote:
> > > On Thu, Sep 09, 2021 at 10:41:48PM +0800, Xiaoyao Li wrote:
[...]
> > > > +#define INTEL_PT_DEFAULT_0_EBX  (CPUID_14_0_EBX_CR3_FILTER | \
> > > > +            CPUID_14_0_EBX_PSB | CPUID_14_0_EBX_IP_FILTER |
> > > > CPUID_14_0_EBX_MTC)
> > > > +#define INTEL_PT_DEFAULT_0_ECX  (CPUID_14_0_ECX_TOPA | \
> > > > +            CPUID_14_0_ECX_MULTI_ENTRIES | CPUID_14_0_ECX_SINGLE_RANGE)
> > > > +#define INTEL_PT_DEFAULT_1_EAX  (INTEL_PT_DEFAULT_MTC_BITMAP << 16 | \
> > > > +                                 INTEL_PT_DEFAULT_ADDR_RANGES_NUM)
> > > > +#define INTEL_PT_DEFAULT_1_EBX  (INTEL_PT_DEFAULT_PSB_BITMAP << 16 | \
> > > > +                                 INTEL_PT_DEFAULT_CYCLE_BITMAP)
> > > 
> > > I like these new macros because they make the code at
> > > x86_cpu_filter_features() clearer.
> 
> I tried it. But I find it doesn't make the code at x86_cpu_filter_features()
> clearer. It just introduces more code churn.

Don't worry, this is not a requirement for getting the code
accepted, but just a suggestion to make the more complex parts of
the series smaller and easier to review.

-- 
Eduardo



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-10-20 14:43 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-09 14:41 [RFC PATCH 0/5] Make Intel PT configurable Xiaoyao Li
2021-09-09 14:41 ` [RFC PATCH 1/5] target/i386: Print CPUID subleaf info for unsupported feature Xiaoyao Li
2021-10-15 15:12   ` Eduardo Habkost
2021-09-09 14:41 ` [RFC PATCH 2/5] target/i386: Introduce FeatureWordInfo for Intel PT CPUID leaf 0xD Xiaoyao Li
2021-10-15 16:04   ` Eduardo Habkost
2021-10-17  7:53     ` Xiaoyao Li
2021-09-09 14:41 ` [RFC PATCH 3/5] target/i386: Enable host pass through of Intel PT Xiaoyao Li
2021-10-15 20:22   ` Eduardo Habkost
2021-10-17 10:37     ` Xiaoyao Li
2021-10-18  3:46     ` Xiaoyao Li
2021-10-18  5:37       ` Xiaoyao Li
2021-10-20 14:40         ` Eduardo Habkost
2021-09-09 14:41 ` [RFC PATCH 4/5] target/i386: Define specific PT feature set for IceLake-server and Snowridge Xiaoyao Li
2021-09-09 14:41 ` [RFC PATCH 5/5] target/i386: Access MSR_IA32_RTIT_ADDRn based on guest CPUID configuration Xiaoyao Li
2021-09-26  5:21 ` [RFC PATCH 0/5] Make Intel PT configurable Xiaoyao Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).