* [PATCH v5 0/7] Introduce and test masked events
@ 2022-09-20 17:45 Aaron Lewis
  2022-09-20 17:45 ` [PATCH v5 1/7] kvm: x86/pmu: Correct the mask used in a pmu event filter lookup Aaron Lewis
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: Aaron Lewis @ 2022-09-20 17:45 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, jmattson, seanjc, Aaron Lewis

This series introduces the concept of masked events to the pmu event
filter.  Masked events can reduce the number of entries needed in the
pmu event filter by allowing a more generalized match on the unit mask
when filtering guest events in the pmu.  With masked events, if an
event select should be restricted from the guest, a single masked
event that generalizes over its unit mask values can be added instead
of one entry for each event select + unit mask pair.
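
As a rough illustration of the intended usage (a sketch only: the
KVM_PMU_EVENT_ENCODE_MASKED_ENTRY() macro, KVM_PMU_EVENT_FLAG_MASKED_EVENTS
flag, and KVM_SET_PMU_EVENT_FILTER behavior come from patch 4; the helper
name is made up and error handling is omitted), a single masked entry can
cover every unit mask of an event select:

  #include <stdlib.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Allow event select 0xD0 with any unit mask, using one filter entry. */
  static int allow_event_any_umask(int vm_fd)
  {
	struct kvm_pmu_event_filter *f;
	int r;

	f = calloc(1, sizeof(*f) + sizeof(__u64));
	f->action = KVM_PMU_EVENT_ALLOW;
	f->flags = KVM_PMU_EVENT_FLAG_MASKED_EVENTS;
	f->nevents = 1;
	/* mask = 0, match = 0: (unit_mask & 0) == 0 for every unit mask. */
	f->events[0] = KVM_PMU_EVENT_ENCODE_MASKED_ENTRY(0xD0, 0, 0, 0);

	r = ioctl(vm_fd, KVM_SET_PMU_EVENT_FILTER, f);
	free(f);
	return r;
  }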

v4 -> v5
 - Patch #3, Simplified the return logic in filter_contains_match(). [Jim]
 - Patch #4, Corrected documentation for errors and returns. [Jim]
 - Patch #6, Added TEST_REQUIRE for KVM_CAP_PMU_EVENT_MASKED_EVENTS. [Jim]
 - Patch #7,
     - Removed TEST_REQUIRE for KVM_CAP_PMU_EVENT_MASKED_EVENTS (moved it
       to patch #6).
     - Changed the assert to a branch in supports_event_mem_inst_retired().
       [Jim]
     - Added a check to make sure 3 GP counters are available. [Jim]

v3 -> v4
 - Patch #1, Fix the mask for the guest event select used in bsearch.
 - Patch #2, Remove invalid events from the pmu event filter.
 - Patch #3, Create an internal/common representation for filter events.
 - Patch #4,
     - Use a common filter event to simplify kernel code. [Jim]
     - s/invalid/exclude for masked events. More descriptive. [Sean]
     - Simplified masked event layout. There was no need to complicate it.
     - Add KVM_CAP_PMU_EVENT_MASKED_EVENTS.
 - Patch #7, Rewrote the masked events tests using MEM_INST_RETIRED (0xd0)
   on Intel and LS Dispatch (0x29) on AMD. They have multiple unit masks
   each which were leveraged for improved masked events testing.

v2 -> v3
 - Reworked and documented the invert flag usage.  It was possible to
   get ambiguous results when using it.  That should not be possible
   now.
 - Added testing for inverted masked events to validate the updated
   implementation.
 - Removed testing for inverted masked events from the masked events test.
   They were meaningless with the updated implementation.  More meaningful
   tests were added separately.

v1 -> v2
 - Made has_invalid_event() static to fix warning.
 - Fixed checkpatch.pl errors and warnings.
 - Updated to account for KVM_X86_PMU_OP().

Aaron Lewis (7):
  kvm: x86/pmu: Correct the mask used in a pmu event filter lookup
  kvm: x86/pmu: Remove invalid raw events from the pmu event filter
  kvm: x86/pmu: prepare the pmu event filter for masked events
  kvm: x86/pmu: Introduce masked events to the pmu event filter
  selftests: kvm/x86: Add flags when creating a pmu event filter
  selftests: kvm/x86: Add testing for KVM_SET_PMU_EVENT_FILTER
  selftests: kvm/x86: Test masked events

 Documentation/virt/kvm/api.rst                |  82 +++-
 arch/x86/include/asm/kvm-x86-pmu-ops.h        |   1 +
 arch/x86/include/uapi/asm/kvm.h               |  28 ++
 arch/x86/kvm/pmu.c                            | 262 ++++++++++--
 arch/x86/kvm/pmu.h                            |  39 ++
 arch/x86/kvm/svm/pmu.c                        |   6 +
 arch/x86/kvm/vmx/pmu_intel.c                  |   6 +
 arch/x86/kvm/x86.c                            |   1 +
 include/uapi/linux/kvm.h                      |   1 +
 .../kvm/x86_64/pmu_event_filter_test.c        | 387 +++++++++++++++++-
 10 files changed, 771 insertions(+), 42 deletions(-)

-- 
2.37.3.968.ga6b4b080e4-goog



* [PATCH v5 1/7] kvm: x86/pmu: Correct the mask used in a pmu event filter lookup
  2022-09-20 17:45 [PATCH v5 0/7] Introduce and test masked events Aaron Lewis
@ 2022-09-20 17:45 ` Aaron Lewis
  2022-10-07 23:25   ` Sean Christopherson
  2022-09-20 17:45 ` [PATCH v5 2/7] kvm: x86/pmu: Remove invalid raw events from the pmu event filter Aaron Lewis
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Aaron Lewis @ 2022-09-20 17:45 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, jmattson, seanjc, Aaron Lewis

When checking if a pmu event the guest is attempting to program should
be filtered, only consider the event select + unit mask in that
decision. Use an architecture specific mask to mask out all other bits,
including bits 35:32 on Intel.  Those bits are not part of the event
select and should not be considered in that decision.
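
For reference, a standalone sketch of the masking being fixed here (mask
values restated from perf_event.h; the helper and define names are
illustrative only, and the actual patch routes the vendor-specific mask
through a new kvm_x86_pmu op rather than a flag):

  /* Event select bits: 7:0 on Intel; 7:0 plus 35:32 on AMD. */
  #define EVENTSEL_EVENT_INTEL	0x00000000000000FFULL
  #define EVENTSEL_EVENT_AMD	0x0000000F000000FFULL
  #define EVENTSEL_UMASK	0x000000000000FF00ULL	/* bits 15:8 */

  static unsigned long long filter_lookup_key(unsigned long long eventsel,
					      int is_amd)
  {
	unsigned long long event = is_amd ? EVENTSEL_EVENT_AMD
					  : EVENTSEL_EVENT_INTEL;

	/* Drop everything that is not event select + unit mask. */
	return eventsel & (event | EVENTSEL_UMASK);
  }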

Fixes: 66bb8a065f5a ("KVM: x86: PMU Event Filter")
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/include/asm/kvm-x86-pmu-ops.h |  1 +
 arch/x86/kvm/pmu.c                     | 15 ++++++++++++++-
 arch/x86/kvm/pmu.h                     |  1 +
 arch/x86/kvm/svm/pmu.c                 |  6 ++++++
 arch/x86/kvm/vmx/pmu_intel.c           |  6 ++++++
 5 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm-x86-pmu-ops.h b/arch/x86/include/asm/kvm-x86-pmu-ops.h
index c17e3e96fc1d..e0280cc3e6e4 100644
--- a/arch/x86/include/asm/kvm-x86-pmu-ops.h
+++ b/arch/x86/include/asm/kvm-x86-pmu-ops.h
@@ -24,6 +24,7 @@ KVM_X86_PMU_OP(set_msr)
 KVM_X86_PMU_OP(refresh)
 KVM_X86_PMU_OP(init)
 KVM_X86_PMU_OP(reset)
+KVM_X86_PMU_OP(get_eventsel_event_mask)
 KVM_X86_PMU_OP_OPTIONAL(deliver_pmi)
 KVM_X86_PMU_OP_OPTIONAL(cleanup)
 
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 02f9e4f245bd..98f383789579 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -247,6 +247,19 @@ static int cmp_u64(const void *pa, const void *pb)
 	return (a > b) - (a < b);
 }
 
+static inline u64 get_event_select(u64 eventsel)
+{
+	return eventsel & static_call(kvm_x86_pmu_get_eventsel_event_mask)();
+}
+
+static inline u64 get_raw_event(u64 eventsel)
+{
+	u64 event_select = get_event_select(eventsel);
+	u64 unit_mask = eventsel & ARCH_PERFMON_EVENTSEL_UMASK;
+
+	return event_select | unit_mask;
+}
+
 static bool check_pmu_event_filter(struct kvm_pmc *pmc)
 {
 	struct kvm_pmu_event_filter *filter;
@@ -263,7 +276,7 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
 		goto out;
 
 	if (pmc_is_gp(pmc)) {
-		key = pmc->eventsel & AMD64_RAW_EVENT_MASK_NB;
+		key = get_raw_event(pmc->eventsel);
 		if (bsearch(&key, filter->events, filter->nevents,
 			    sizeof(__u64), cmp_u64))
 			allow_event = filter->action == KVM_PMU_EVENT_ALLOW;
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 5cc5721f260b..4e22f9f55400 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -40,6 +40,7 @@ struct kvm_pmu_ops {
 	void (*reset)(struct kvm_vcpu *vcpu);
 	void (*deliver_pmi)(struct kvm_vcpu *vcpu);
 	void (*cleanup)(struct kvm_vcpu *vcpu);
+	u64 (*get_eventsel_event_mask)(void);
 };
 
 void kvm_pmu_ops_update(const struct kvm_pmu_ops *pmu_ops);
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index f24613a108c5..0b35eb04aa60 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -294,6 +294,11 @@ static void amd_pmu_reset(struct kvm_vcpu *vcpu)
 	}
 }
 
+static u64 amd_pmu_get_eventsel_event_mask(void)
+{
+	return AMD64_EVENTSEL_EVENT;
+}
+
 struct kvm_pmu_ops amd_pmu_ops __initdata = {
 	.hw_event_available = amd_hw_event_available,
 	.pmc_is_enabled = amd_pmc_is_enabled,
@@ -307,4 +312,5 @@ struct kvm_pmu_ops amd_pmu_ops __initdata = {
 	.refresh = amd_pmu_refresh,
 	.init = amd_pmu_init,
 	.reset = amd_pmu_reset,
+	.get_eventsel_event_mask = amd_pmu_get_eventsel_event_mask,
 };
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index c399637a3a79..0aec7576af0c 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -793,6 +793,11 @@ void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu)
 	}
 }
 
+static u64 intel_pmu_get_eventsel_event_mask(void)
+{
+	return ARCH_PERFMON_EVENTSEL_EVENT;
+}
+
 struct kvm_pmu_ops intel_pmu_ops __initdata = {
 	.hw_event_available = intel_hw_event_available,
 	.pmc_is_enabled = intel_pmc_is_enabled,
@@ -808,4 +813,5 @@ struct kvm_pmu_ops intel_pmu_ops __initdata = {
 	.reset = intel_pmu_reset,
 	.deliver_pmi = intel_pmu_deliver_pmi,
 	.cleanup = intel_pmu_cleanup,
+	.get_eventsel_event_mask = intel_pmu_get_eventsel_event_mask,
 };
-- 
2.37.3.968.ga6b4b080e4-goog



* [PATCH v5 2/7] kvm: x86/pmu: Remove invalid raw events from the pmu event filter
  2022-09-20 17:45 [PATCH v5 0/7] Introduce and test masked events Aaron Lewis
  2022-09-20 17:45 ` [PATCH v5 1/7] kvm: x86/pmu: Correct the mask used in a pmu event filter lookup Aaron Lewis
@ 2022-09-20 17:45 ` Aaron Lewis
  2022-10-07 23:48   ` Sean Christopherson
  2022-09-20 17:45 ` [PATCH v5 3/7] kvm: x86/pmu: prepare the pmu event filter for masked events Aaron Lewis
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Aaron Lewis @ 2022-09-20 17:45 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, jmattson, seanjc, Aaron Lewis

If a raw event is invalid, i.e. it has bits set outside the event
select + unit mask, the event will never match the search, so it's
pointless to have it in the list.  Opt for a shorter list by removing
invalid raw events.
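
A self-contained sketch of the rule being applied (illustration only;
the in-kernel version below compacts filter->events in place instead of
printing, and the variable names here are made up):

  #include <stdio.h>

  int main(void)
  {
	/* Intel: event select bits 7:0, unit mask bits 15:8. */
	unsigned long long valid = 0xFFULL | 0xFF00ULL;
	unsigned long long events[] = {
		0x01C0ULL,	/* event 0xC0, umask 0x01: kept */
		0x1000000C0ULL,	/* bit 32 set: can never match  */
	};
	int i;

	for (i = 0; i < 2; i++)
		printf("%#llx -> %s\n", events[i],
		       (events[i] & ~valid) ? "removed" : "kept");
	return 0;
  }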

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/pmu.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 98f383789579..e7d94e6b7f28 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -577,6 +577,38 @@ void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 perf_hw_id)
 }
 EXPORT_SYMBOL_GPL(kvm_pmu_trigger_event);
 
+static inline u64 get_event_filter_mask(void)
+{
+	u64 event_select_mask =
+		static_call(kvm_x86_pmu_get_eventsel_event_mask)();
+
+	return event_select_mask | ARCH_PERFMON_EVENTSEL_UMASK;
+}
+
+static inline bool is_event_valid(u64 event, u64 mask)
+{
+	return !(event & ~mask);
+}
+
+static void remove_invalid_raw_events(struct kvm_pmu_event_filter *filter)
+{
+	u64 raw_mask;
+	int i, j;
+
+	if (filter->flags)
+		return;
+
+	raw_mask = get_event_filter_mask();
+	for (i = 0, j = 0; i < filter->nevents; i++) {
+		u64 raw_event = filter->events[i];
+
+		if (is_event_valid(raw_event, raw_mask))
+			filter->events[j++] = raw_event;
+	}
+
+	filter->nevents = j;
+}
+
 int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_pmu_event_filter tmp, *filter;
@@ -608,6 +640,8 @@ int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
 	/* Ensure nevents can't be changed between the user copies. */
 	*filter = tmp;
 
+	remove_invalid_raw_events(filter);
+
 	/*
 	 * Sort the in-kernel list so that we can search it with bsearch.
 	 */
-- 
2.37.3.968.ga6b4b080e4-goog



* [PATCH v5 3/7] kvm: x86/pmu: prepare the pmu event filter for masked events
  2022-09-20 17:45 [PATCH v5 0/7] Introduce and test masked events Aaron Lewis
  2022-09-20 17:45 ` [PATCH v5 1/7] kvm: x86/pmu: Correct the mask used in a pmu event filter lookup Aaron Lewis
  2022-09-20 17:45 ` [PATCH v5 2/7] kvm: x86/pmu: Remove invalid raw events from the pmu event filter Aaron Lewis
@ 2022-09-20 17:45 ` Aaron Lewis
  2022-10-08  0:08   ` Sean Christopherson
  2022-09-20 17:46 ` [PATCH v5 4/7] kvm: x86/pmu: Introduce masked events to the pmu event filter Aaron Lewis
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Aaron Lewis @ 2022-09-20 17:45 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, jmattson, seanjc, Aaron Lewis

Create an internal representation for filter events to abstract the
events userspace uses from the events the kernel uses.  That will allow
the kernel to use a common event and a common code path between the
different types of filter events used in userspace once masked events
are introduced.

No functional change intended.
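
As a worked example of the conversion (a sketch of what get_event_select()
in the diff below computes; fold_event_select() is an illustrative name):
an AMD eventsel of 0x1000002D1 has event select bits 7:0 = 0xD1 and bits
35:32 = 0x1, which fold into the 12-bit internal event select 0x1D1; with
unit mask 0x02 the encoded filter entry is 0x21D1.

  /* Fold eventsel bits 35:32 into bits 11:8 of the internal event select. */
  static unsigned short fold_event_select(unsigned long long eventsel)
  {
	return (eventsel & 0xFFULL) | ((eventsel >> 24) & 0xF00ULL);
  }

  /*
   * fold_event_select(0x1000002D1ULL) == 0x1D1, and with unit mask 0x02
   * KVM_PMU_ENCODE_FILTER_ENTRY() yields (0x02 << 12) | 0x1D1 == 0x21D1.
   */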

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/pmu.c | 116 ++++++++++++++++++++++++++++++++-------------
 arch/x86/kvm/pmu.h |  16 +++++++
 2 files changed, 98 insertions(+), 34 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index e7d94e6b7f28..7ce8bfafea91 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -239,6 +239,19 @@ static bool pmc_resume_counter(struct kvm_pmc *pmc)
 	return true;
 }
 
+static inline u16 get_event_select(u64 eventsel)
+{
+	u64 e = eventsel &
+		static_call(kvm_x86_pmu_get_eventsel_event_mask)();
+
+	return (e & ARCH_PERFMON_EVENTSEL_EVENT) | ((e >> 24) & 0xF00ULL);
+}
+
+static inline u8 get_unit_mask(u64 eventsel)
+{
+	return (eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
+}
+
 static int cmp_u64(const void *pa, const void *pb)
 {
 	u64 a = *(u64 *)pa;
@@ -247,53 +260,61 @@ static int cmp_u64(const void *pa, const void *pb)
 	return (a > b) - (a < b);
 }
 
-static inline u64 get_event_select(u64 eventsel)
+static u64 *find_filter_entry(struct kvm_pmu_event_filter *filter, u64 key)
+{
+	return bsearch(&key, filter->events, filter->nevents,
+			  sizeof(filter->events[0]), cmp_u64);
+}
+
+static bool filter_contains_match(struct kvm_pmu_event_filter *filter,
+				  u64 eventsel)
 {
-	return eventsel & static_call(kvm_x86_pmu_get_eventsel_event_mask)();
+	u16 event_select = get_event_select(eventsel);
+	u8 unit_mask = get_unit_mask(eventsel);
+	u64 key;
+
+	key = KVM_PMU_ENCODE_FILTER_ENTRY(event_select, unit_mask);
+	return find_filter_entry(filter, key);
 }
 
-static inline u64 get_raw_event(u64 eventsel)
+static bool is_gp_event_allowed(struct kvm_pmu_event_filter *filter, u64 eventsel)
 {
-	u64 event_select = get_event_select(eventsel);
-	u64 unit_mask = eventsel & ARCH_PERFMON_EVENTSEL_UMASK;
+	if (filter_contains_match(filter, eventsel))
+		return filter->action == KVM_PMU_EVENT_ALLOW;
 
-	return event_select | unit_mask;
+	return filter->action == KVM_PMU_EVENT_DENY;
+}
+
+static bool is_fixed_event_allowed(struct kvm_pmu_event_filter *filter, int idx)
+{
+	int fixed_idx = idx - INTEL_PMC_IDX_FIXED;
+
+	if (filter->action == KVM_PMU_EVENT_DENY &&
+	    test_bit(fixed_idx, (ulong *)&filter->fixed_counter_bitmap))
+		return false;
+	if (filter->action == KVM_PMU_EVENT_ALLOW &&
+	    !test_bit(fixed_idx, (ulong *)&filter->fixed_counter_bitmap))
+		return false;
+
+	return true;
 }
 
 static bool check_pmu_event_filter(struct kvm_pmc *pmc)
 {
 	struct kvm_pmu_event_filter *filter;
 	struct kvm *kvm = pmc->vcpu->kvm;
-	bool allow_event = true;
-	__u64 key;
-	int idx;
 
 	if (!static_call(kvm_x86_pmu_hw_event_available)(pmc))
 		return false;
 
 	filter = srcu_dereference(kvm->arch.pmu_event_filter, &kvm->srcu);
 	if (!filter)
-		goto out;
+		return true;
 
-	if (pmc_is_gp(pmc)) {
-		key = get_raw_event(pmc->eventsel);
-		if (bsearch(&key, filter->events, filter->nevents,
-			    sizeof(__u64), cmp_u64))
-			allow_event = filter->action == KVM_PMU_EVENT_ALLOW;
-		else
-			allow_event = filter->action == KVM_PMU_EVENT_DENY;
-	} else {
-		idx = pmc->idx - INTEL_PMC_IDX_FIXED;
-		if (filter->action == KVM_PMU_EVENT_DENY &&
-		    test_bit(idx, (ulong *)&filter->fixed_counter_bitmap))
-			allow_event = false;
-		if (filter->action == KVM_PMU_EVENT_ALLOW &&
-		    !test_bit(idx, (ulong *)&filter->fixed_counter_bitmap))
-			allow_event = false;
-	}
+	if (pmc_is_gp(pmc))
+		return is_gp_event_allowed(filter, pmc->eventsel);
 
-out:
-	return allow_event;
+	return is_fixed_event_allowed(filter, pmc->idx);
 }
 
 void reprogram_counter(struct kvm_pmc *pmc)
@@ -609,6 +630,38 @@ static void remove_invalid_raw_events(struct kvm_pmu_event_filter *filter)
 	filter->nevents = j;
 }
 
+static inline u64 encode_filter_entry(u64 event)
+{
+	u16 event_select = get_event_select(event);
+	u8 unit_mask = get_unit_mask(event);
+
+	return KVM_PMU_ENCODE_FILTER_ENTRY(event_select, unit_mask);
+}
+
+static void convert_to_filter_events(struct kvm_pmu_event_filter *filter)
+{
+	int i;
+
+	for (i = 0; i < filter->nevents; i++) {
+		u64 e = filter->events[i];
+
+		filter->events[i] = encode_filter_entry(e);
+	}
+}
+
+static void prepare_filter_events(struct kvm_pmu_event_filter *filter)
+{
+	remove_invalid_raw_events(filter);
+
+	convert_to_filter_events(filter);
+
+	/*
+	 * Sort the in-kernel list so that we can search it with bsearch.
+	 */
+	sort(&filter->events, filter->nevents, sizeof(filter->events[0]),
+	     cmp_u64, NULL);
+}
+
 int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_pmu_event_filter tmp, *filter;
@@ -640,12 +693,7 @@ int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
 	/* Ensure nevents can't be changed between the user copies. */
 	*filter = tmp;
 
-	remove_invalid_raw_events(filter);
-
-	/*
-	 * Sort the in-kernel list so that we can search it with bsearch.
-	 */
-	sort(&filter->events, filter->nevents, sizeof(__u64), cmp_u64, NULL);
+	prepare_filter_events(filter);
 
 	mutex_lock(&kvm->lock);
 	filter = rcu_replace_pointer(kvm->arch.pmu_event_filter, filter,
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 4e22f9f55400..df4f81e5c685 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -205,4 +205,20 @@ bool is_vmware_backdoor_pmc(u32 pmc_idx);
 
 extern struct kvm_pmu_ops intel_pmu_ops;
 extern struct kvm_pmu_ops amd_pmu_ops;
+
+struct kvm_pmu_filter_entry {
+	union {
+		u64 raw;
+		struct {
+			u64 event_select:12;
+			u64 unit_mask:8;
+			u64 rsvd:44;
+		};
+	};
+};
+
+#define KVM_PMU_ENCODE_FILTER_ENTRY(event_select, unit_mask) \
+	(((event_select) & 0xFFFULL) | \
+	(((unit_mask) & 0xFFULL) << 12))
+
 #endif /* __KVM_X86_PMU_H */
-- 
2.37.3.968.ga6b4b080e4-goog



* [PATCH v5 4/7] kvm: x86/pmu: Introduce masked events to the pmu event filter
  2022-09-20 17:45 [PATCH v5 0/7] Introduce and test masked events Aaron Lewis
                   ` (2 preceding siblings ...)
  2022-09-20 17:45 ` [PATCH v5 3/7] kvm: x86/pmu: prepare the pmu event filter for masked events Aaron Lewis
@ 2022-09-20 17:46 ` Aaron Lewis
  2022-10-08  0:37   ` Sean Christopherson
  2022-10-10 15:42   ` Sean Christopherson
  2022-09-20 17:46 ` [PATCH v5 5/7] selftests: kvm/x86: Add flags when creating a " Aaron Lewis
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 23+ messages in thread
From: Aaron Lewis @ 2022-09-20 17:46 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, jmattson, seanjc, Aaron Lewis

When building a list of filter events, it can sometimes be a challenge
to fit all the events needed to adequately restrict the guest into the
limited space available in the pmu event filter.  This stems from the
fact that the pmu event filter requires that each raw event (i.e. event
select + unit mask) be listed, when the intention might be to restrict
the event select altogether, regardless of its unit mask.  Instead
of increasing the number of filter events in the pmu event filter, add
a new encoding that is able to do a more generalized match on the unit
mask.

Introduce masked events as a new encoding that the pmu event filter
understands in addition to raw events.  A masked event has the fields:
mask, match, and exclude.  When filtering based on these events, the
mask is applied to the guest's unit mask to see if it matches the match
value (i.e. unit_mask & mask == match).  The exclude bit can then be
used to exclude events from that match.  E.g. for a given event select,
if it's easier to say which unit mask values shouldn't be filtered, a
masked event can be set up to match all possible unit mask values, then
another masked event with the exclude bit set can be used to match the
unit mask values that shouldn't be filtered.
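
For instance (a fragment, assuming the KVM_PMU_EVENT_ENCODE_MASKED_ENTRY()
macro added to the uapi header below): to cover every unit mask of event
select 0xD0 except 0x83, one include entry matches everything and one
exclude entry carves out the single value.

  __u64 events[2];

  /* Include: mask = 0, match = 0 matches any unit mask of event 0xD0. */
  events[0] = KVM_PMU_EVENT_ENCODE_MASKED_ENTRY(0xD0, 0x00, 0x00, 0);
  /* Exclude: (unit_mask & 0xFF) == 0x83 pulls 0x83 back out. */
  events[1] = KVM_PMU_EVENT_ENCODE_MASKED_ENTRY(0xD0, 0xFF, 0x83, 1);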

Userspace can query to see if this feature exists by looking for the
capability, KVM_CAP_PMU_EVENT_MASKED_EVENTS.

This feature is enabled by setting the flags field in the pmu event
filter to KVM_PMU_EVENT_FLAG_MASKED_EVENTS.

Events can be encoded by using KVM_PMU_EVENT_ENCODE_MASKED_ENTRY().

It is an error to have a bit set outside the valid bits for a masked
event, and calls to KVM_SET_PMU_EVENT_FILTER will return -EINVAL in
such cases, including the high bits (11:8) of the event select if
called on Intel.

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
---
 Documentation/virt/kvm/api.rst  |  82 ++++++++++++++++--
 arch/x86/include/uapi/asm/kvm.h |  28 ++++++
 arch/x86/kvm/pmu.c              | 145 +++++++++++++++++++++++++++-----
 arch/x86/kvm/pmu.h              |  32 +++++--
 arch/x86/kvm/x86.c              |   1 +
 include/uapi/linux/kvm.h        |   1 +
 6 files changed, 258 insertions(+), 31 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index abd7c32126ce..16c3e6fd4ed7 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5029,6 +5029,15 @@ using this ioctl.
 :Parameters: struct kvm_pmu_event_filter (in)
 :Returns: 0 on success, -1 on error
 
+Errors:
+
+  ======     ============================================================
+  EFAULT     args[0] cannot be accessed
+  EINVAL     args[0] contains invalid data in the filter or filter events
+  E2BIG      nevents is too large
+  EBUSY      not enough memory to allocate the filter
+  ======     ============================================================
+
 ::
 
   struct kvm_pmu_event_filter {
@@ -5040,14 +5049,73 @@ using this ioctl.
 	__u64 events[0];
   };
 
-This ioctl restricts the set of PMU events that the guest can program.
-The argument holds a list of events which will be allowed or denied.
-The eventsel+umask of each event the guest attempts to program is compared
-against the events field to determine whether the guest should have access.
-The events field only controls general purpose counters; fixed purpose
-counters are controlled by the fixed_counter_bitmap.
+This ioctl restricts the set of PMU events the guest can program by limiting
+which event select and unit mask combinations are permitted.
+
+The argument holds a list of filter events which will be allowed or denied.
+
+Filter events only control general purpose counters; fixed purpose counters
+are controlled by the fixed_counter_bitmap.
+
+Valid values for 'flags'::
+
+``0``
+
+To use this mode, clear the 'flags' field.
+
+In this mode each filter event will contain an event select + unit mask.
+
+When the guest attempts to program the PMU the event select + unit mask from
+the event select register being programmed is compared against the filter
+events to determine whether the guest should have access.
+
+``KVM_PMU_EVENT_FLAG_MASKED_EVENTS``
+:Capability: KVM_CAP_PMU_EVENT_MASKED_EVENTS
+
+In this mode each filter event will contain an event select, mask, match, and
+exclude value.
+
+When the guest attempts to program the PMU, these steps are followed in
+determining if the guest should have access:
+ 1. Match the event select from the guest against the filter events.
+ 2. If a match is found, match the guest's unit mask to the mask and match
+    values of the included filter events.
+    I.e. (unit mask & mask) == match && !exclude.
+ 3. If a match is found, match the guest's unit mask to the mask and match
+    values of the excluded filter events.
+    I.e. (unit mask & mask) == match && exclude.
+ 4.
+   a. If an included match is found and an excluded match is not found, filter
+      the event.
+   b. For everything else, do not filter the event.
+ 5.
+   a. If the event is filtered and it's an allow list, allow the guest to
+      program the event.
+   b. If the event is filtered and it's a deny list, do not allow the guest to
+      program the event.
+
+To encode a filter event use:
+  KVM_PMU_EVENT_ENCODE_MASKED_ENTRY().
+
+To access individual components in a masked entry use:
+::
+  struct kvm_pmu_event_masked_entry {
+	union {
+		__u64 raw;
+		struct {
+			/* event_select bits 11:8 are only valid on AMD. */
+			__u64 event_select:12;
+			__u64 mask:8;
+			__u64 match:8;
+			__u64 exclude:1;
+			__u64 rsvd:35;
+		};
+	};
+  };
 
-No flags are defined yet, the field must be zero.
+-EINVAL will be returned if the 'rsvd' field is not zero or if any of
+the high bits (11:8) in the 'event_select' field are set when called
+on Intel.
 
 Valid values for 'action'::
 
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 46de10a809ec..c82400a06c8b 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -528,6 +528,34 @@ struct kvm_pmu_event_filter {
 #define KVM_PMU_EVENT_ALLOW 0
 #define KVM_PMU_EVENT_DENY 1
 
+#define KVM_PMU_EVENT_FLAG_MASKED_EVENTS BIT(0)
+#define KVM_PMU_EVENT_FLAGS_VALID_MASK (KVM_PMU_EVENT_FLAG_MASKED_EVENTS)
+
+struct kvm_pmu_event_masked_entry {
+	union {
+		__u64 raw;
+		struct {
+			/* event_select bits 11:8 are only valid on AMD. */
+			__u64 event_select:12;
+			__u64 mask:8;
+			__u64 match:8;
+			__u64 exclude:1;
+			__u64 rsvd:35;
+		};
+	};
+};
+
+#define KVM_PMU_EVENT_ENCODE_MASKED_ENTRY(event_select, mask, match, exclude) \
+	(((event_select) & 0xFFFULL) | \
+	(((mask) & 0xFFULL) << 12) | \
+	(((match) & 0xFFULL) << 20) | \
+	((__u64)(!!(exclude)) << 28))
+
+#define KVM_PMU_EVENT_MASKED_EVENTSEL_EVENT	  (GENMASK_ULL(11, 0))
+#define KVM_PMU_EVENT_MASKED_EVENTSEL_MASK	  (GENMASK_ULL(19, 12))
+#define KVM_PMU_EVENT_MASKED_EVENTSEL_MATCH	  (GENMASK_ULL(27, 20))
+#define KVM_PMU_EVENT_MASKED_EVENTSEL_EXCLUDE	  (BIT_ULL(28))
+
 /* for KVM_{GET,SET,HAS}_DEVICE_ATTR */
 #define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */
 #define   KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 7ce8bfafea91..b188ddb23f75 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -252,34 +252,61 @@ static inline u8 get_unit_mask(u64 eventsel)
 	return (eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
 }
 
-static int cmp_u64(const void *pa, const void *pb)
+static int filter_event_cmp(const void *pa, const void *pb)
 {
-	u64 a = *(u64 *)pa;
-	u64 b = *(u64 *)pb;
+	u64 a = *(u64 *)pa & (KVM_PMU_FILTER_EVENTSEL_INDEX |
+			      KVM_PMU_FILTER_EVENTSEL_EVENT |
+			      KVM_PMU_FILTER_EVENTSEL_EXCLUDE);
+	u64 b = *(u64 *)pb & (KVM_PMU_FILTER_EVENTSEL_INDEX |
+			      KVM_PMU_FILTER_EVENTSEL_EVENT |
+			      KVM_PMU_FILTER_EVENTSEL_EXCLUDE);
 
 	return (a > b) - (a < b);
 }
 
-static u64 *find_filter_entry(struct kvm_pmu_event_filter *filter, u64 key)
+static int find_filter_index(struct kvm_pmu_event_filter *filter, u64 key)
 {
-	return bsearch(&key, filter->events, filter->nevents,
-			  sizeof(filter->events[0]), cmp_u64);
+	u64 *fe = bsearch(&key, filter->events, filter->nevents,
+			  sizeof(filter->events[0]), filter_event_cmp);
+
+	if (!fe)
+		return -1;
+
+	return fe - filter->events;
 }
 
 static bool filter_contains_match(struct kvm_pmu_event_filter *filter,
-				  u64 eventsel)
+				  u64 eventsel, bool exclude)
 {
 	u16 event_select = get_event_select(eventsel);
 	u8 unit_mask = get_unit_mask(eventsel);
+	struct kvm_pmu_filter_entry fe;
+	int i, index;
 	u64 key;
 
-	key = KVM_PMU_ENCODE_FILTER_ENTRY(event_select, unit_mask);
-	return find_filter_entry(filter, key);
+	key = KVM_PMU_ENCODE_FILTER_ENTRY(event_select, 0, 0, exclude);
+	index = find_filter_index(filter, key);
+	if (index < 0)
+		return false;
+
+	for (i = index; i < filter->nevents; i++) {
+		fe.raw = filter->events[i];
+
+		if (fe.event_select != event_select ||
+		    fe.exclude != exclude)
+			break;
+
+		if ((unit_mask & fe.mask) == fe.match)
+			return true;
+	}
+
+	return false;
 }
 
 static bool is_gp_event_allowed(struct kvm_pmu_event_filter *filter, u64 eventsel)
 {
-	if (filter_contains_match(filter, eventsel))
+	if (filter_contains_match(filter, eventsel, false) &&
+	    !filter_contains_match(filter, eventsel, true))
 		return filter->action == KVM_PMU_EVENT_ALLOW;
 
 	return filter->action == KVM_PMU_EVENT_DENY;
@@ -598,11 +625,20 @@ void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 perf_hw_id)
 }
 EXPORT_SYMBOL_GPL(kvm_pmu_trigger_event);
 
-static inline u64 get_event_filter_mask(void)
+static inline u64 get_event_filter_mask(u32 flags)
 {
 	u64 event_select_mask =
 		static_call(kvm_x86_pmu_get_eventsel_event_mask)();
 
+	if (flags & KVM_PMU_EVENT_FLAG_MASKED_EVENTS) {
+		u64 masked_eventsel_event = get_event_select(event_select_mask);
+
+		return masked_eventsel_event |
+		       KVM_PMU_EVENT_MASKED_EVENTSEL_MASK |
+		       KVM_PMU_EVENT_MASKED_EVENTSEL_MATCH |
+		       KVM_PMU_EVENT_MASKED_EVENTSEL_EXCLUDE;
+	}
+
 	return event_select_mask | ARCH_PERFMON_EVENTSEL_UMASK;
 }
 
@@ -611,6 +647,29 @@ static inline bool is_event_valid(u64 event, u64 mask)
 	return !(event & ~mask);
 }
 
+static bool is_filter_valid(struct kvm_pmu_event_filter *filter)
+{
+	u64 event_mask;
+	int i;
+
+	/* To maintain backwards compatibility don't validate raw events. */
+	if (!filter->flags)
+		return true;
+
+	event_mask = get_event_filter_mask(filter->flags);
+	for (i = 0; i < filter->nevents; i++) {
+		if (!is_event_valid(filter->events[i], event_mask))
+			return false;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(filter->pad); i++) {
+		if (filter->pad[i])
+			return false;
+	}
+
+	return true;
+}
+
 static void remove_invalid_raw_events(struct kvm_pmu_event_filter *filter)
 {
 	u64 raw_mask;
@@ -619,7 +678,7 @@ static void remove_invalid_raw_events(struct kvm_pmu_event_filter *filter)
 	if (filter->flags)
 		return;
 
-	raw_mask = get_event_filter_mask();
+	raw_mask = get_event_filter_mask(filter->flags);
 	for (i = 0, j = 0; i < filter->nevents; i++) {
 		u64 raw_event = filter->events[i];
 
@@ -630,12 +689,27 @@ static void remove_invalid_raw_events(struct kvm_pmu_event_filter *filter)
 	filter->nevents = j;
 }
 
-static inline u64 encode_filter_entry(u64 event)
+static inline u64 encode_filter_entry(u64 event, u32 flags)
 {
-	u16 event_select = get_event_select(event);
-	u8 unit_mask = get_unit_mask(event);
+	u16 event_select;
+	u8 mask, match;
+	bool exclude;
 
-	return KVM_PMU_ENCODE_FILTER_ENTRY(event_select, unit_mask);
+	if (flags & KVM_PMU_EVENT_FLAG_MASKED_EVENTS) {
+		struct kvm_pmu_event_masked_entry me = { .raw = event };
+
+		mask = me.mask;
+		match = me.match;
+		event_select = me.event_select;
+		exclude = me.exclude;
+	} else {
+		mask = 0xFF;
+		match = get_unit_mask(event);
+		event_select = get_event_select(event);
+		exclude = false;
+	}
+
+	return KVM_PMU_ENCODE_FILTER_ENTRY(event_select, mask, match, exclude);
 }
 
 static void convert_to_filter_events(struct kvm_pmu_event_filter *filter)
@@ -645,7 +719,34 @@ static void convert_to_filter_events(struct kvm_pmu_event_filter *filter)
 	for (i = 0; i < filter->nevents; i++) {
 		u64 e = filter->events[i];
 
-		filter->events[i] = encode_filter_entry(e);
+		filter->events[i] = encode_filter_entry(e, filter->flags);
+	}
+}
+
+/*
+ * Sort will order the list by exclude, then event select.  This function will
+ * then index the sublists of event selects such that when a search is done on
+ * the list, the head of the event select sublist is returned.  This simplifies
+ * the logic in filter_contains_match() when walking the list.
+ */
+static void index_filter_events(struct kvm_pmu_event_filter *filter)
+{
+	struct kvm_pmu_filter_entry *prev, *curr;
+	int i, index = 0;
+
+	if (filter->nevents)
+		prev = (struct kvm_pmu_filter_entry *)(filter->events);
+
+	for (i = 0; i < filter->nevents; i++) {
+		curr = (struct kvm_pmu_filter_entry *)(&filter->events[i]);
+
+		if (curr->event_select != prev->event_select ||
+		    curr->exclude != prev->exclude) {
+			index = 0;
+			prev = curr;
+		}
+
+		curr->event_index = index++;
 	}
 }
 
@@ -659,7 +760,9 @@ static void prepare_filter_events(struct kvm_pmu_event_filter *filter)
 	 * Sort the in-kernel list so that we can search it with bsearch.
 	 */
 	sort(&filter->events, filter->nevents, sizeof(filter->events[0]),
-	     cmp_u64, NULL);
+	     filter_event_cmp, NULL);
+
+	index_filter_events(filter);
 }
 
 int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
@@ -675,7 +778,7 @@ int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
 	    tmp.action != KVM_PMU_EVENT_DENY)
 		return -EINVAL;
 
-	if (tmp.flags != 0)
+	if (tmp.flags & ~KVM_PMU_EVENT_FLAGS_VALID_MASK)
 		return -EINVAL;
 
 	if (tmp.nevents > KVM_PMU_EVENT_FILTER_MAX_EVENTS)
@@ -693,6 +796,10 @@ int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
 	/* Ensure nevents can't be changed between the user copies. */
 	*filter = tmp;
 
+	r = -EINVAL;
+	if (!is_filter_valid(filter))
+		goto cleanup;
+
 	prepare_filter_events(filter);
 
 	mutex_lock(&kvm->lock);
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index df4f81e5c685..ffc07e4d8d71 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -206,19 +206,41 @@ bool is_vmware_backdoor_pmc(u32 pmc_idx);
 extern struct kvm_pmu_ops intel_pmu_ops;
 extern struct kvm_pmu_ops amd_pmu_ops;
 
+#define KVM_PMU_FILTER_EVENTSEL_INDEX		(GENMASK_ULL(27, 16))
+#define KVM_PMU_FILTER_EVENTSEL_EVENT		(GENMASK_ULL(39, 28))
+#define KVM_PMU_FILTER_EVENTSEL_EXCLUDE		(BIT_ULL(40))
+
+/*
+ * Internal representation by which all filter events converge (e.g. masked
+ * events, raw events).  That way there is only one way filter events
+ * behave once the filter is set.
+ *
+ * When filter events are converted into this format then sorted, the
+ * resulting list naturally ends up in two sublists.  One for the 'include
+ * list' and one for the 'exclude list'.  These sublists are further broken
+ * down into sublists ordered by their event select.  After that, the
+ * event select sublists are indexed such that a search for: exclude = n,
+ * event_select = n, and event_index = 0 will return the head of an event
+ * select sublist that can be walked to see if a match exists.
+ */
 struct kvm_pmu_filter_entry {
 	union {
 		u64 raw;
 		struct {
+			u64 mask:8;
+			u64 match:8;
+			u64 event_index:12;
 			u64 event_select:12;
-			u64 unit_mask:8;
-			u64 rsvd:44;
+			u64 exclude:1;
+			u64 rsvd:23;
 		};
 	};
 };
 
-#define KVM_PMU_ENCODE_FILTER_ENTRY(event_select, unit_mask) \
-	(((event_select) & 0xFFFULL) | \
-	(((unit_mask) & 0xFFULL) << 12))
+#define KVM_PMU_ENCODE_FILTER_ENTRY(event_select, mask, match, exclude) \
+	(((mask) & 0xFFULL) | \
+	(((match) & 0xFFULL) << 8) | \
+	(((event_select) & 0xFFFULL) << 28) | \
+	((u64)(!!(exclude)) << 40))
 
 #endif /* __KVM_X86_PMU_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 43a6a7efc6ec..0a6c5e1aca0f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4374,6 +4374,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
  	case KVM_CAP_SPLIT_IRQCHIP:
 	case KVM_CAP_IMMEDIATE_EXIT:
 	case KVM_CAP_PMU_EVENT_FILTER:
+	case KVM_CAP_PMU_EVENT_MASKED_EVENTS:
 	case KVM_CAP_GET_MSR_FEATURES:
 	case KVM_CAP_MSR_PLATFORM_INFO:
 	case KVM_CAP_EXCEPTION_PAYLOAD:
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index eed0315a77a6..685034baea9d 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1177,6 +1177,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_VM_DISABLE_NX_HUGE_PAGES 220
 #define KVM_CAP_S390_ZPCI_OP 221
 #define KVM_CAP_S390_CPU_TOPOLOGY 222
+#define KVM_CAP_PMU_EVENT_MASKED_EVENTS 223
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.37.3.968.ga6b4b080e4-goog



* [PATCH v5 5/7] selftests: kvm/x86: Add flags when creating a pmu event filter
  2022-09-20 17:45 [PATCH v5 0/7] Introduce and test masked events Aaron Lewis
                   ` (3 preceding siblings ...)
  2022-09-20 17:46 ` [PATCH v5 4/7] kvm: x86/pmu: Introduce masked events to the pmu event filter Aaron Lewis
@ 2022-09-20 17:46 ` Aaron Lewis
  2022-09-20 17:46 ` [PATCH v5 6/7] selftests: kvm/x86: Add testing for KVM_SET_PMU_EVENT_FILTER Aaron Lewis
  2022-09-20 17:46 ` [PATCH v5 7/7] selftests: kvm/x86: Test masked events Aaron Lewis
  6 siblings, 0 replies; 23+ messages in thread
From: Aaron Lewis @ 2022-09-20 17:46 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, jmattson, seanjc, Aaron Lewis

Now that the flags field can be non-zero, pass it in when creating a
pmu event filter.

This is needed in preparation for testing masked events.

No functional change intended.

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
---
 .../testing/selftests/kvm/x86_64/pmu_event_filter_test.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
index ea4e259a1e2e..bd7054a53981 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
@@ -221,14 +221,15 @@ static struct kvm_pmu_event_filter *alloc_pmu_event_filter(uint32_t nevents)
 
 
 static struct kvm_pmu_event_filter *
-create_pmu_event_filter(const uint64_t event_list[],
-			int nevents, uint32_t action)
+create_pmu_event_filter(const uint64_t event_list[], int nevents,
+			uint32_t action, uint32_t flags)
 {
 	struct kvm_pmu_event_filter *f;
 	int i;
 
 	f = alloc_pmu_event_filter(nevents);
 	f->action = action;
+	f->flags = flags;
 	for (i = 0; i < nevents; i++)
 		f->events[i] = event_list[i];
 
@@ -239,7 +240,7 @@ static struct kvm_pmu_event_filter *event_filter(uint32_t action)
 {
 	return create_pmu_event_filter(event_list,
 				       ARRAY_SIZE(event_list),
-				       action);
+				       action, 0);
 }
 
 /*
@@ -286,7 +287,7 @@ static void test_amd_deny_list(struct kvm_vcpu *vcpu)
 	struct kvm_pmu_event_filter *f;
 	uint64_t count;
 
-	f = create_pmu_event_filter(&event, 1, KVM_PMU_EVENT_DENY);
+	f = create_pmu_event_filter(&event, 1, KVM_PMU_EVENT_DENY, 0);
 	count = test_with_filter(vcpu, f);
 
 	free(f);
-- 
2.37.3.968.ga6b4b080e4-goog



* [PATCH v5 6/7] selftests: kvm/x86: Add testing for KVM_SET_PMU_EVENT_FILTER
  2022-09-20 17:45 [PATCH v5 0/7] Introduce and test masked events Aaron Lewis
                   ` (4 preceding siblings ...)
  2022-09-20 17:46 ` [PATCH v5 5/7] selftests: kvm/x86: Add flags when creating a " Aaron Lewis
@ 2022-09-20 17:46 ` Aaron Lewis
  2022-09-20 17:56   ` Jim Mattson
  2022-09-20 17:46 ` [PATCH v5 7/7] selftests: kvm/x86: Test masked events Aaron Lewis
  6 siblings, 1 reply; 23+ messages in thread
From: Aaron Lewis @ 2022-09-20 17:46 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, jmattson, seanjc, Aaron Lewis

Test that masked events are not using invalid bits, and if they are,
ensure the pmu event filter is not accepted by KVM_SET_PMU_EVENT_FILTER.
The only valid bits that can be used for masked events are set when
using KVM_PMU_EVENT_ENCODE_MASKED_ENTRY(), with one exception: if any
of the high bits (11:8) of the event select are set on Intel,
the PMU event filter will fail.

Also, because validation was not being done prior to the introduction
of masked events, only expect validation to fail when masked events
are used.  E.g. in the first test a filter event with all its bits set
is accepted by KVM_SET_PMU_EVENT_FILTER when flags = 0.
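
For reference, a sketch of which bits are considered valid (values
restated from the masked-entry layout in patch 4; the define names here
are illustrative, not part of the uapi):

  /*
   * Valid masked-event bits: event_select 11:0, mask 19:12, match 27:20,
   * exclude bit 28.  Bits 63:29 are reserved, and on Intel event_select
   * bits 11:8 must also be clear.
   */
  #define MASKED_EVENT_VALID_AMD	0x1FFFFFFFULL	/* bits 28:0 */
  #define MASKED_EVENT_VALID_INTEL	0x1FFFF0FFULL	/* bits 28:0, minus 11:8 */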

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
---
 .../kvm/x86_64/pmu_event_filter_test.c        | 36 +++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
index bd7054a53981..0750e2fa7a38 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
@@ -442,6 +442,39 @@ static bool use_amd_pmu(void)
 		 is_zen3(entry->eax));
 }
 
+static int run_filter_test(struct kvm_vcpu *vcpu, const uint64_t *events,
+			   int nevents, uint32_t flags)
+{
+	struct kvm_pmu_event_filter *f;
+	int r;
+
+	f = create_pmu_event_filter(events, nevents, KVM_PMU_EVENT_ALLOW, flags);
+	r = __vm_ioctl(vcpu->vm, KVM_SET_PMU_EVENT_FILTER, f);
+	free(f);
+
+	return r;
+}
+
+static void test_filter_ioctl(struct kvm_vcpu *vcpu)
+{
+	uint64_t e = ~0ul;
+	int r;
+
+	/*
+	 * Unfortunately having invalid bits set in event data is expected to
+	 * pass when flags == 0 (bits other than eventsel+umask).
+	 */
+	r = run_filter_test(vcpu, &e, 1, 0);
+	TEST_ASSERT(r == 0, "Valid PMU Event Filter is failing");
+
+	r = run_filter_test(vcpu, &e, 1, KVM_PMU_EVENT_FLAG_MASKED_EVENTS);
+	TEST_ASSERT(r != 0, "Invalid PMU Event Filter is expected to fail");
+
+	e = KVM_PMU_EVENT_ENCODE_MASKED_ENTRY(0xff, 0xff, 0xff, 0xf);
+	r = run_filter_test(vcpu, &e, 1, KVM_PMU_EVENT_FLAG_MASKED_EVENTS);
+	TEST_ASSERT(r == 0, "Valid PMU Event Filter is failing");
+}
+
 int main(int argc, char *argv[])
 {
 	void (*guest_code)(void);
@@ -452,6 +485,7 @@ int main(int argc, char *argv[])
 	setbuf(stdout, NULL);
 
 	TEST_REQUIRE(kvm_has_cap(KVM_CAP_PMU_EVENT_FILTER));
+	TEST_REQUIRE(kvm_has_cap(KVM_CAP_PMU_EVENT_MASKED_EVENTS));
 
 	TEST_REQUIRE(use_intel_pmu() || use_amd_pmu());
 	guest_code = use_intel_pmu() ? intel_guest_code : amd_guest_code;
@@ -472,6 +506,8 @@ int main(int argc, char *argv[])
 	test_not_member_deny_list(vcpu);
 	test_not_member_allow_list(vcpu);
 
+	test_filter_ioctl(vcpu);
+
 	kvm_vm_free(vm);
 
 	test_pmu_config_disable(guest_code);
-- 
2.37.3.968.ga6b4b080e4-goog



* [PATCH v5 7/7] selftests: kvm/x86: Test masked events
  2022-09-20 17:45 [PATCH v5 0/7] Introduce and test masked events Aaron Lewis
                   ` (5 preceding siblings ...)
  2022-09-20 17:46 ` [PATCH v5 6/7] selftests: kvm/x86: Add testing for KVM_SET_PMU_EVENT_FILTER Aaron Lewis
@ 2022-09-20 17:46 ` Aaron Lewis
  2022-09-20 17:59   ` Jim Mattson
  6 siblings, 1 reply; 23+ messages in thread
From: Aaron Lewis @ 2022-09-20 17:46 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, jmattson, seanjc, Aaron Lewis

Add testing to show that a pmu event can be filtered with a generalized
match on its unit mask.

These tests set up test cases to demonstrate various ways of filtering
a pmu event that has multiple unit mask values.  Each test does this by
setting up the filter in KVM with the masked events provided, then
enabling three pmu counters in the guest.  The test then verifies that
the counters that should be counting did count and the counters that
should be filtered did not, for both a sparse filter list and a dense
filter list.
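
A condensed restatement of that check (sketch only; struct perf_counter,
the ALLOW_* flags, and the actual TEST_ASSERT live in the test code below,
and counters_agree() is an illustrative name):

  static bool counters_agree(struct perf_counter c, uint32_t flags)
  {
	/* Each counter must be non-zero iff its ALLOW_* flag is set. */
	return !!c.loads == !!(flags & ALLOW_LOADS) &&
	       !!c.stores == !!(flags & ALLOW_STORES) &&
	       !!c.loads_stores == !!(flags & ALLOW_LOADS_STORES);
  }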

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
---
 .../kvm/x86_64/pmu_event_filter_test.c        | 342 +++++++++++++++++-
 1 file changed, 341 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
index 0750e2fa7a38..6cf11d82ad5b 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
@@ -442,6 +442,337 @@ static bool use_amd_pmu(void)
 		 is_zen3(entry->eax));
 }
 
+/*
+ * "MEM_INST_RETIRED.ALL_LOADS", "MEM_INST_RETIRED.ALL_STORES", and
+ * "MEM_INST_RETIRED.ANY" from https://perfmon-events.intel.com/
+ * supported on Intel Xeon processors:
+ *  - Sapphire Rapids, Ice Lake, Cascade Lake, Skylake.
+ */
+#define MEM_INST_RETIRED		0xD0
+#define MEM_INST_RETIRED_LOAD		EVENT(MEM_INST_RETIRED, 0x81)
+#define MEM_INST_RETIRED_STORE		EVENT(MEM_INST_RETIRED, 0x82)
+#define MEM_INST_RETIRED_LOAD_STORE	EVENT(MEM_INST_RETIRED, 0x83)
+
+static bool supports_event_mem_inst_retired(void)
+{
+	uint32_t eax, ebx, ecx, edx;
+
+	cpuid(1, &eax, &ebx, &ecx, &edx);
+	if (x86_family(eax) == 0x6) {
+		switch (x86_model(eax)) {
+		/* Sapphire Rapids */
+		case 0x8F:
+		/* Ice Lake */
+		case 0x6A:
+		/* Skylake */
+		/* Cascade Lake */
+		case 0x55:
+			return true;
+		}
+	}
+
+	return false;
+}
+
+static int num_gp_counters(void)
+{
+	const struct kvm_cpuid_entry2 *entry;
+
+	entry = kvm_get_supported_cpuid_entry(0xa);
+	union cpuid10_eax eax = { .full = entry->eax };
+
+	return eax.split.num_counters;
+}
+
+/*
+ * "LS Dispatch", from Processor Programming Reference
+ * (PPR) for AMD Family 17h Model 01h, Revision B1 Processors,
+ * Preliminary Processor Programming Reference (PPR) for AMD Family
+ * 17h Model 31h, Revision B0 Processors, and Preliminary Processor
+ * Programming Reference (PPR) for AMD Family 19h Model 01h, Revision
+ * B1 Processors Volume 1 of 2.
+ */
+#define LS_DISPATCH		0x29
+#define LS_DISPATCH_LOAD	EVENT(LS_DISPATCH, BIT(0))
+#define LS_DISPATCH_STORE	EVENT(LS_DISPATCH, BIT(1))
+#define LS_DISPATCH_LOAD_STORE	EVENT(LS_DISPATCH, BIT(2))
+
+#define INCLUDE_MASKED_ENTRY(event_select, mask, match) \
+	KVM_PMU_EVENT_ENCODE_MASKED_ENTRY(event_select, mask, match, false)
+#define EXCLUDE_MASKED_ENTRY(event_select, mask, match) \
+	KVM_PMU_EVENT_ENCODE_MASKED_ENTRY(event_select, mask, match, true)
+
+struct perf_counter {
+	union {
+		uint64_t raw;
+		struct {
+			uint64_t loads:22;
+			uint64_t stores:22;
+			uint64_t loads_stores:20;
+		};
+	};
+};
+
+static uint64_t masked_events_guest_test(uint32_t msr_base)
+{
+	uint64_t ld0, ld1, st0, st1, ls0, ls1;
+	struct perf_counter c;
+	int val;
+
+	ld0 = rdmsr(msr_base + 0);
+	st0 = rdmsr(msr_base + 1);
+	ls0 = rdmsr(msr_base + 2);
+
+	__asm__ __volatile__("movl $0, %[v];"
+			     "movl %[v], %%eax;"
+			     "incl %[v];"
+			     : [v]"+m"(val) :: "eax");
+
+	ld1 = rdmsr(msr_base + 0);
+	st1 = rdmsr(msr_base + 1);
+	ls1 = rdmsr(msr_base + 2);
+
+	c.loads = ld1 - ld0;
+	c.stores = st1 - st0;
+	c.loads_stores = ls1 - ls0;
+
+	return c.raw;
+}
+
+static void intel_masked_events_guest_code(void)
+{
+	uint64_t r;
+
+	for (;;) {
+		wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+
+		wrmsr(MSR_P6_EVNTSEL0 + 0, ARCH_PERFMON_EVENTSEL_ENABLE |
+		      ARCH_PERFMON_EVENTSEL_OS | MEM_INST_RETIRED_LOAD);
+		wrmsr(MSR_P6_EVNTSEL0 + 1, ARCH_PERFMON_EVENTSEL_ENABLE |
+		      ARCH_PERFMON_EVENTSEL_OS | MEM_INST_RETIRED_STORE);
+		wrmsr(MSR_P6_EVNTSEL0 + 2, ARCH_PERFMON_EVENTSEL_ENABLE |
+		      ARCH_PERFMON_EVENTSEL_OS | MEM_INST_RETIRED_LOAD_STORE);
+
+		wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
+
+		r = masked_events_guest_test(MSR_IA32_PMC0);
+
+		GUEST_SYNC(r);
+	}
+}
+
+static void amd_masked_events_guest_code(void)
+{
+	uint64_t r;
+
+	for (;;) {
+		wrmsr(MSR_K7_EVNTSEL0, 0);
+		wrmsr(MSR_K7_EVNTSEL1, 0);
+		wrmsr(MSR_K7_EVNTSEL2, 0);
+
+		wrmsr(MSR_K7_EVNTSEL0, ARCH_PERFMON_EVENTSEL_ENABLE |
+		      ARCH_PERFMON_EVENTSEL_OS | LS_DISPATCH_LOAD);
+		wrmsr(MSR_K7_EVNTSEL1, ARCH_PERFMON_EVENTSEL_ENABLE |
+		      ARCH_PERFMON_EVENTSEL_OS | LS_DISPATCH_STORE);
+		wrmsr(MSR_K7_EVNTSEL2, ARCH_PERFMON_EVENTSEL_ENABLE |
+		      ARCH_PERFMON_EVENTSEL_OS | LS_DISPATCH_LOAD_STORE);
+
+		r = masked_events_guest_test(MSR_K7_PERFCTR0);
+
+		GUEST_SYNC(r);
+	}
+}
+
+static struct perf_counter run_masked_events_test(struct kvm_vcpu *vcpu,
+						 const uint64_t masked_events[],
+						 const int nmasked_events)
+{
+	struct kvm_pmu_event_filter *f;
+	struct perf_counter r;
+
+	f = create_pmu_event_filter(masked_events, nmasked_events,
+				    KVM_PMU_EVENT_ALLOW,
+				    KVM_PMU_EVENT_FLAG_MASKED_EVENTS);
+	r.raw = test_with_filter(vcpu, f);
+	free(f);
+
+	return r;
+}
+
+/* Matches KVM_PMU_EVENT_FILTER_MAX_EVENTS in pmu.c */
+#define MAX_FILTER_EVENTS	300
+#define MAX_TEST_EVENTS		10
+
+#define ALLOW_LOADS		BIT(0)
+#define ALLOW_STORES		BIT(1)
+#define ALLOW_LOADS_STORES	BIT(2)
+
+struct masked_events_test {
+	uint64_t intel_events[MAX_TEST_EVENTS];
+	uint64_t intel_event_end;
+	uint64_t amd_events[MAX_TEST_EVENTS];
+	uint64_t amd_event_end;
+	const char *msg;
+	uint32_t flags;
+};
+
+/*
+ * These are the test cases for the masked events tests.
+ *
+ * For each test, the guest enables 3 PMU counters (loads, stores,
+ * loads + stores).  The filter is then set in KVM with the masked events
+ * provided.  The test then verifies that the counters agree with which
+ * ones should be counting and which ones should be filtered.
+ */
+const struct masked_events_test test_cases[] = {
+	{
+		.intel_events = {
+			INCLUDE_MASKED_ENTRY(MEM_INST_RETIRED, 0xFF, 0x81),
+		},
+		.amd_events = {
+			INCLUDE_MASKED_ENTRY(LS_DISPATCH, 0xFF, BIT(0)),
+		},
+		.msg = "Only allow loads.",
+		.flags = ALLOW_LOADS,
+	}, {
+		.intel_events = {
+			INCLUDE_MASKED_ENTRY(MEM_INST_RETIRED, 0xFF, 0x82),
+		},
+		.amd_events = {
+			INCLUDE_MASKED_ENTRY(LS_DISPATCH, 0xFF, BIT(1)),
+		},
+		.msg = "Only allow stores.",
+		.flags = ALLOW_STORES,
+	}, {
+		.intel_events = {
+			INCLUDE_MASKED_ENTRY(MEM_INST_RETIRED, 0xFF, 0x83),
+		},
+		.amd_events = {
+			INCLUDE_MASKED_ENTRY(LS_DISPATCH, 0xFF, BIT(2)),
+		},
+		.msg = "Only allow loads + stores.",
+		.flags = ALLOW_LOADS_STORES,
+	}, {
+		.intel_events = {
+			INCLUDE_MASKED_ENTRY(MEM_INST_RETIRED, 0x7C, 0),
+			EXCLUDE_MASKED_ENTRY(MEM_INST_RETIRED, 0xFF, 0x83),
+		},
+		.amd_events = {
+			INCLUDE_MASKED_ENTRY(LS_DISPATCH, ~(BIT(0) | BIT(1)), 0),
+		},
+		.msg = "Only allow loads and stores.",
+		.flags = ALLOW_LOADS | ALLOW_STORES,
+	}, {
+		.intel_events = {
+			INCLUDE_MASKED_ENTRY(MEM_INST_RETIRED, 0x7C, 0),
+			EXCLUDE_MASKED_ENTRY(MEM_INST_RETIRED, 0xFF, 0x82),
+		},
+		.amd_events = {
+			INCLUDE_MASKED_ENTRY(LS_DISPATCH, 0xF8, 0),
+			EXCLUDE_MASKED_ENTRY(LS_DISPATCH, 0xFF, BIT(1)),
+		},
+		.msg = "Only allow loads and loads + stores.",
+		.flags = ALLOW_LOADS | ALLOW_LOADS_STORES
+	}, {
+		.intel_events = {
+			INCLUDE_MASKED_ENTRY(MEM_INST_RETIRED, 0xFE, 0x82),
+		},
+		.amd_events = {
+			INCLUDE_MASKED_ENTRY(LS_DISPATCH, 0xF8, 0),
+			EXCLUDE_MASKED_ENTRY(LS_DISPATCH, 0xFF, BIT(0)),
+		},
+		.msg = "Only allow stores and loads + stores.",
+		.flags = ALLOW_STORES | ALLOW_LOADS_STORES
+	}, {
+		.intel_events = {
+			INCLUDE_MASKED_ENTRY(MEM_INST_RETIRED, 0x7C, 0),
+		},
+		.amd_events = {
+			INCLUDE_MASKED_ENTRY(LS_DISPATCH, 0xF8, 0),
+		},
+		.msg = "Only allow loads, stores, and loads + stores.",
+		.flags = ALLOW_LOADS | ALLOW_STORES | ALLOW_LOADS_STORES
+	},
+};
+
+static int append_test_events(const struct masked_events_test *test,
+			      uint64_t *events, int nevents)
+{
+	const uint64_t *evts;
+	int i;
+
+	evts = use_intel_pmu() ? test->intel_events : test->amd_events;
+	for (i = 0; i < MAX_TEST_EVENTS; i++) {
+		if (evts[i] == 0)
+			break;
+
+		events[nevents + i] = evts[i];
+	}
+
+	return nevents + i;
+}
+
+static bool bool_eq(bool a, bool b)
+{
+	return a == b;
+}
+
+static void run_masked_events_tests(struct kvm_vcpu *vcpu, uint64_t *events,
+				    int nevents)
+{
+	int ntests = ARRAY_SIZE(test_cases);
+	struct perf_counter c;
+	int i, n;
+
+	for (i = 0; i < ntests; i++) {
+		const struct masked_events_test *test = &test_cases[i];
+
+		/* Do any test case events overflow MAX_TEST_EVENTS? */
+		assert(test->intel_event_end == 0);
+		assert(test->amd_event_end == 0);
+
+		n = append_test_events(test, events, nevents);
+
+		c = run_masked_events_test(vcpu, events, n);
+		TEST_ASSERT(bool_eq(c.loads, test->flags & ALLOW_LOADS) &&
+			    bool_eq(c.stores, test->flags & ALLOW_STORES) &&
+			    bool_eq(c.loads_stores,
+				    test->flags & ALLOW_LOADS_STORES),
+			    "%s  loads: %u, stores: %u, loads + stores: %u",
+			    test->msg, c.loads, c.stores, c.loads_stores);
+	}
+}
+
+static void add_dummy_events(uint64_t *events, int nevents)
+{
+	int i;
+
+	for (i = 0; i < nevents; i++) {
+		int event_select = i % 0xFF;
+		bool exclude = ((i % 4) == 0);
+
+		if (event_select == MEM_INST_RETIRED ||
+		    event_select == LS_DISPATCH)
+			event_select++;
+
+		events[i] = KVM_PMU_EVENT_ENCODE_MASKED_ENTRY(event_select, 0,
+							      0, exclude);
+	}
+}
+
+static void test_masked_events(struct kvm_vcpu *vcpu)
+{
+	int nevents = MAX_FILTER_EVENTS - MAX_TEST_EVENTS;
+	uint64_t events[MAX_FILTER_EVENTS];
+
+	/* Run the test cases against a sparse PMU event filter. */
+	run_masked_events_tests(vcpu, events, 0);
+
+	/* Run the test cases against a dense PMU event filter. */
+	add_dummy_events(events, MAX_FILTER_EVENTS);
+	run_masked_events_tests(vcpu, events, nevents);
+}
+
 static int run_filter_test(struct kvm_vcpu *vcpu, const uint64_t *events,
 			   int nevents, uint32_t flags)
 {
@@ -478,7 +809,7 @@ static void test_filter_ioctl(struct kvm_vcpu *vcpu)
 int main(int argc, char *argv[])
 {
 	void (*guest_code)(void);
-	struct kvm_vcpu *vcpu;
+	struct kvm_vcpu *vcpu, *vcpu2 = NULL;
 	struct kvm_vm *vm;
 
 	/* Tell stdout not to buffer its content */
@@ -506,6 +837,15 @@ int main(int argc, char *argv[])
 	test_not_member_deny_list(vcpu);
 	test_not_member_allow_list(vcpu);
 
+	if (use_intel_pmu() &&
+	    supports_event_mem_inst_retired() &&
+	    num_gp_counters() >= 3)
+		vcpu2 = vm_vcpu_add(vm, 2, intel_masked_events_guest_code);
+	else if (use_amd_pmu())
+		vcpu2 = vm_vcpu_add(vm, 2, amd_masked_events_guest_code);
+
+	if (vcpu2)
+		test_masked_events(vcpu2);
 	test_filter_ioctl(vcpu);
 
 	kvm_vm_free(vm);
-- 
2.37.3.968.ga6b4b080e4-goog



* Re: [PATCH v5 6/7] selftests: kvm/x86: Add testing for KVM_SET_PMU_EVENT_FILTER
  2022-09-20 17:46 ` [PATCH v5 6/7] selftests: kvm/x86: Add testing for KVM_SET_PMU_EVENT_FILTER Aaron Lewis
@ 2022-09-20 17:56   ` Jim Mattson
  0 siblings, 0 replies; 23+ messages in thread
From: Jim Mattson @ 2022-09-20 17:56 UTC (permalink / raw)
  To: Aaron Lewis; +Cc: kvm, pbonzini, seanjc

On Tue, Sep 20, 2022 at 10:46 AM Aaron Lewis <aaronlewis@google.com> wrote:
>
> Test that masked events are not using invalid bits, and if they are,
> ensure the pmu event filter is not accepted by KVM_SET_PMU_EVENT_FILTER.
> The only valid bits that can be used for masked events are set when
> using KVM_PMU_EVENT_ENCODE_MASKED_ENTRY(), with one exception: if any
> of the high bits (11:8) of the event select are set on Intel,
> the PMU event filter will fail.
>
> Also, because validation was not being done prior to the introduction
> of masked events, only expect validation to fail when masked events
> are used.  E.g. in the first test a filter event with all its bits set
> is accepted by KVM_SET_PMU_EVENT_FILTER when flags = 0.
>
> Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>


* Re: [PATCH v5 7/7] selftests: kvm/x86: Test masked events
  2022-09-20 17:46 ` [PATCH v5 7/7] selftests: kvm/x86: Test masked events Aaron Lewis
@ 2022-09-20 17:59   ` Jim Mattson
  0 siblings, 0 replies; 23+ messages in thread
From: Jim Mattson @ 2022-09-20 17:59 UTC (permalink / raw)
  To: Aaron Lewis; +Cc: kvm, pbonzini, seanjc

On Tue, Sep 20, 2022 at 10:46 AM Aaron Lewis <aaronlewis@google.com> wrote:
>
> Add testing to show that a pmu event can be filtered with a generalized
> match on its unit mask.
>
> These tests set up test cases to demonstrate various ways of filtering
> a pmu event that has multiple unit mask values.  Each test does this by
> setting up the filter in KVM with the masked events provided, then
> enabling three pmu counters in the guest.  The test then verifies that
> the counters that should be counting did count and the counters that
> should be filtered did not, for both a sparse filter list and a dense
> filter list.
>
> Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>


* Re: [PATCH v5 1/7] kvm: x86/pmu: Correct the mask used in a pmu event filter lookup
  2022-09-20 17:45 ` [PATCH v5 1/7] kvm: x86/pmu: Correct the mask used in a pmu event filter lookup Aaron Lewis
@ 2022-10-07 23:25   ` Sean Christopherson
  2022-10-15 15:43     ` Aaron Lewis
  0 siblings, 1 reply; 23+ messages in thread
From: Sean Christopherson @ 2022-10-07 23:25 UTC (permalink / raw)
  To: Aaron Lewis; +Cc: kvm, pbonzini, jmattson

On Tue, Sep 20, 2022, Aaron Lewis wrote:
> When checking if a pmu event the guest is attempting to program should
> be filtered, only consider the event select + unit mask in that
> decision. Use an architecture specific mask to mask out all other bits,
> including bits 35:32 on Intel.  Those bits are not part of the event
> select and should not be considered in that decision.
> 
> Fixes: 66bb8a065f5a ("KVM: x86: PMU Event Filter")
> Signed-off-by: Aaron Lewis <aaronlewis@google.com>
> Reviewed-by: Jim Mattson <jmattson@google.com>
> ---
>  arch/x86/include/asm/kvm-x86-pmu-ops.h |  1 +
>  arch/x86/kvm/pmu.c                     | 15 ++++++++++++++-
>  arch/x86/kvm/pmu.h                     |  1 +
>  arch/x86/kvm/svm/pmu.c                 |  6 ++++++
>  arch/x86/kvm/vmx/pmu_intel.c           |  6 ++++++
>  5 files changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm-x86-pmu-ops.h b/arch/x86/include/asm/kvm-x86-pmu-ops.h
> index c17e3e96fc1d..e0280cc3e6e4 100644
> --- a/arch/x86/include/asm/kvm-x86-pmu-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-pmu-ops.h
> @@ -24,6 +24,7 @@ KVM_X86_PMU_OP(set_msr)
>  KVM_X86_PMU_OP(refresh)
>  KVM_X86_PMU_OP(init)
>  KVM_X86_PMU_OP(reset)
> +KVM_X86_PMU_OP(get_eventsel_event_mask)
>  KVM_X86_PMU_OP_OPTIONAL(deliver_pmi)
>  KVM_X86_PMU_OP_OPTIONAL(cleanup)
>  
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 02f9e4f245bd..98f383789579 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -247,6 +247,19 @@ static int cmp_u64(const void *pa, const void *pb)
>  	return (a > b) - (a < b);
>  }
>  
> +static inline u64 get_event_select(u64 eventsel)
> +{
> +	return eventsel & static_call(kvm_x86_pmu_get_eventsel_event_mask)();

There's no need to use a callback, both values are constant.  And with a constant,
this line becomes much shorter and this helper goes away.  More below.

> +}
> +
> +static inline u64 get_raw_event(u64 eventsel)

As a general rule, don't tag local static functions as "inline".  Unless something
_needs_ to be inline, the compiler is almost always going to be smarter, and if
something absolutely must be inlined, then __always_inline is required, as "inline"
is purely a hint.

> +{
> +	u64 event_select = get_event_select(eventsel);
> +	u64 unit_mask = eventsel & ARCH_PERFMON_EVENTSEL_UMASK;

And looking forward, I think it makes sense for kvm_pmu_ops to hold the entire
mask.  This code from patch 3 is unnecessarily complex, and bleeds vendor details
where they don't belong.  I'll follow up more in patches 3 and 4.

  static inline u16 get_event_select(u64 eventsel)
  {
        u64 e = eventsel &
                static_call(kvm_x86_pmu_get_eventsel_event_mask)();

        return (e & ARCH_PERFMON_EVENTSEL_EVENT) | ((e >> 24) & 0xF00ULL);
  }

> +
> +	return event_select | unit_mask;
> +}
> +
>  static bool check_pmu_event_filter(struct kvm_pmc *pmc)
>  {
>  	struct kvm_pmu_event_filter *filter;
> @@ -263,7 +276,7 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
>  		goto out;
>  
>  	if (pmc_is_gp(pmc)) {
> -		key = pmc->eventsel & AMD64_RAW_EVENT_MASK_NB;
> +		key = get_raw_event(pmc->eventsel);

With the above suggestion, this is simply:

		key = pmc->eventsel & kvm_pmu_ops.EVENTSEL_MASK;

And the total patch is:

---
 arch/x86/kvm/pmu.c           | 2 +-
 arch/x86/kvm/pmu.h           | 2 ++
 arch/x86/kvm/svm/pmu.c       | 2 ++
 arch/x86/kvm/vmx/pmu_intel.c | 2 ++
 4 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index d9b9a0f0db17..d0e2c7eda65b 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -273,7 +273,7 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
 		goto out;
 
 	if (pmc_is_gp(pmc)) {
-		key = pmc->eventsel & AMD64_RAW_EVENT_MASK_NB;
+		key = pmc->eventsel & kvm_pmu_ops.EVENTSEL_MASK;
 		if (bsearch(&key, filter->events, filter->nevents,
 			    sizeof(__u64), cmp_u64))
 			allow_event = filter->action == KVM_PMU_EVENT_ALLOW;
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 5cc5721f260b..45a7dd24125d 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -40,6 +40,8 @@ struct kvm_pmu_ops {
 	void (*reset)(struct kvm_vcpu *vcpu);
 	void (*deliver_pmi)(struct kvm_vcpu *vcpu);
 	void (*cleanup)(struct kvm_vcpu *vcpu);
+
+	const u64 EVENTSEL_MASK;
 };
 
 void kvm_pmu_ops_update(const struct kvm_pmu_ops *pmu_ops);
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index b68956299fa8..6ef7d1fcdbc2 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -228,4 +228,6 @@ struct kvm_pmu_ops amd_pmu_ops __initdata = {
 	.refresh = amd_pmu_refresh,
 	.init = amd_pmu_init,
 	.reset = amd_pmu_reset,
+	.EVENTSEL_MASK = AMD64_EVENTSEL_EVENT |
+			 ARCH_PERFMON_EVENTSEL_UMASK,
 };
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 25b70a85bef5..0a1c95b64ef1 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -811,4 +811,6 @@ struct kvm_pmu_ops intel_pmu_ops __initdata = {
 	.reset = intel_pmu_reset,
 	.deliver_pmi = intel_pmu_deliver_pmi,
 	.cleanup = intel_pmu_cleanup,
+	.EVENTSEL_MASK = ARCH_PERFMON_EVENTSEL_EVENT |
+			 ARCH_PERFMON_EVENTSEL_UMASK,
 };

base-commit: e18d6152ff0f41b7f01f9817372022df04e0d354
-- 


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v5 2/7] kvm: x86/pmu: Remove invalid raw events from the pmu event filter
  2022-09-20 17:45 ` [PATCH v5 2/7] kvm: x86/pmu: Remove invalid raw events from the pmu event filter Aaron Lewis
@ 2022-10-07 23:48   ` Sean Christopherson
  0 siblings, 0 replies; 23+ messages in thread
From: Sean Christopherson @ 2022-10-07 23:48 UTC (permalink / raw)
  To: Aaron Lewis; +Cc: kvm, pbonzini, jmattson

On Tue, Sep 20, 2022, Aaron Lewis wrote:
> If a raw event is invalid, i.e. bits set outside the event select +
> unit mask, the event will never match the search, so it's pointless
> to have it in the list.  Opt for a shorter list by removing invalid
> raw events.

Please use "impossible" instead of "invalid".  While I agree they are invalid,
because this is pre-existing ABI, KVM can't treat them as invalid, i.e. can't
reject them.  My initial reaction to seeing remove_invalid_raw_events() was not
exactly PG-13 :-)

Can you also call out that because this is established uABI, KVM can't outright
reject garbage?

> Signed-off-by: Aaron Lewis <aaronlewis@google.com>
> Reviewed-by: Jim Mattson <jmattson@google.com>
> ---
>  arch/x86/kvm/pmu.c | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
> 
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 98f383789579..e7d94e6b7f28 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -577,6 +577,38 @@ void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 perf_hw_id)
>  }
>  EXPORT_SYMBOL_GPL(kvm_pmu_trigger_event);
>  
> +static inline u64 get_event_filter_mask(void)
> +{
> +	u64 event_select_mask =
> +		static_call(kvm_x86_pmu_get_eventsel_event_mask)();
> +
> +	return event_select_mask | ARCH_PERFMON_EVENTSEL_UMASK;
> +}
> +
> +static inline bool is_event_valid(u64 event, u64 mask)
> +{
> +	return !(event & ~mask);
> +}

As a general theme, don't go crazy with short helpers that only ever have a single
user.  Having to jump around to find the definition interrupts the reader and
obfuscates simple things.  If the code is particularly complex/weird, then adding
a helper to abstract the concept is useful, but that's not the case here.

> +
> +static void remove_invalid_raw_events(struct kvm_pmu_event_filter *filter)

s/invalid/impossible, and drop the "raw".  Making up terminology when it's not
strictly necessary is always confusing.

> +{
> +	u64 raw_mask;
> +	int i, j;
> +
> +	if (filter->flags)

If I'm reading all this magic correctly, this is dead code because
kvm_vm_ioctl_set_pmu_event_filter() checks flags

	if (tmp.flags != 0)
		return -EINVAL;

and does the somewhat horrific thing of:

	/* Ensure nevents can't be changed between the user copies. */
	*filter = tmp;

> +		return;
> +
> +	raw_mask = get_event_filter_mask();
> +	for (i = 0, j = 0; i < filter->nevents; i++) {
> +		u64 raw_event = filter->events[i];

Meh, using a local variable is unnecessary.

> +
> +		if (is_event_valid(raw_event, raw_mask))
> +			filter->events[j++] = raw_event;

With the helpers gone, this is simply: 

		if (filter->events[i] & ~kvm_pmu_ops.EVENTSEL_MASK)
			continue;

		filter->events[j++] = filter->events[i];

> +	}
> +
> +	filter->nevents = j;
> +}
> +
>  int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
>  {
>  	struct kvm_pmu_event_filter tmp, *filter;
> @@ -608,6 +640,8 @@ int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
>  	/* Ensure nevents can't be changed between the user copies. */
>  	*filter = tmp;

Eww.  This is so gross.  But I guess it's not really that much worse than copying
only the new bits.

Can you opportunistically rewrite the comment to clarify that it guards against
_all_ TOCTOU attacks on the verified data?

	/* Restore the verified state to guard against TOCTOU attacks. */
	*filter = tmp;

As a full diff:

---
 arch/x86/kvm/pmu.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index d0e2c7eda65b..384cefbe20dd 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -574,6 +574,20 @@ void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 perf_hw_id)
 }
 EXPORT_SYMBOL_GPL(kvm_pmu_trigger_event);
 
+static void remove_impossible_events(struct kvm_pmu_event_filter *filter)
+{
+	int i, j;
+
+	for (i = 0, j = 0; i < filter->nevents; i++) {
+		if (filter->events[i] & ~kvm_pmu_ops.EVENTSEL_MASK)
+			continue;
+
+		filter->events[j++] = filter->events[i];
+	}
+
+	filter->nevents = j;
+}
+
 int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_pmu_event_filter tmp, *filter;
@@ -602,9 +616,11 @@ int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
 	if (copy_from_user(filter, argp, size))
 		goto cleanup;
 
-	/* Ensure nevents can't be changed between the user copies. */
+	/* Restore the verified state to guard against TOCTOU attacks. */
 	*filter = tmp;
 
+	remove_impossible_events(filter);
+
 	/*
 	 * Sort the in-kernel list so that we can search it with bsearch.
 	 */

base-commit: a5c25b1eacf50b851badf1c5cbb618094cbdf40f
-- 


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v5 3/7] kvm: x86/pmu: prepare the pmu event filter for masked events
  2022-09-20 17:45 ` [PATCH v5 3/7] kvm: x86/pmu: prepare the pmu event filter for masked events Aaron Lewis
@ 2022-10-08  0:08   ` Sean Christopherson
  0 siblings, 0 replies; 23+ messages in thread
From: Sean Christopherson @ 2022-10-08  0:08 UTC (permalink / raw)
  To: Aaron Lewis; +Cc: kvm, pbonzini, jmattson

On Tue, Sep 20, 2022, Aaron Lewis wrote:
> Create an internal representation for filter events to abstract the
> events userspace uses from the events the kernel uses.  That will allow
> the kernel to use a common event and a common code path between the
> different types of filter events used in userspace once masked events
> are introduced.
> 
> No functional changes intended
> 
> Signed-off-by: Aaron Lewis <aaronlewis@google.com>
> Reviewed-by: Jim Mattson <jmattson@google.com>
> ---
>  arch/x86/kvm/pmu.c | 116 ++++++++++++++++++++++++++++++++-------------
>  arch/x86/kvm/pmu.h |  16 +++++++
>  2 files changed, 98 insertions(+), 34 deletions(-)
> 
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index e7d94e6b7f28..7ce8bfafea91 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -239,6 +239,19 @@ static bool pmc_resume_counter(struct kvm_pmc *pmc)
>  	return true;
>  }

Hoisting the definition of the union here to make it easier to review.

> +
> +struct kvm_pmu_filter_entry {
> +	union {
> +		u64 raw;
> +		struct {
> +			u64 event_select:12;
> +			u64 unit_mask:8;
> +			u64 rsvd:44;
> +		};
> +	};
> +};
> +
> +#define KVM_PMU_ENCODE_FILTER_ENTRY(event_select, unit_mask) \
> +	(((event_select) & 0xFFFULL) | \
> +	(((unit_mask) & 0xFFULL) << 12))
> +
>  #endif /* __KVM_X86_PMU_H */

...

> +static inline u16 get_event_select(u64 eventsel)
> +{
> +	u64 e = eventsel &
> +		static_call(kvm_x86_pmu_get_eventsel_event_mask)();
> +
> +	return (e & ARCH_PERFMON_EVENTSEL_EVENT) | ((e >> 24) & 0xF00ULL);

This is nasty.  It bleeds vendor details and is unnecessarily complex.  It also
forces this code to care about umask (more on that in patch 4).  Maybe the filter
code ends up caring about the umask anyways, but I think we can defer that until
it's actually necessary.

Rather than pack the data into an arbitrary format, preserve the architectural
format and fill in the gaps.  Then there's no need for a separate in-kernel
layout, and no need to encode anything.

Then this patch is a rather simple refactoring:

---
 arch/x86/kvm/pmu.c | 54 +++++++++++++++++++++++++++-------------------
 1 file changed, 32 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 384cefbe20dd..882b0870735e 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -257,40 +257,50 @@ static int cmp_u64(const void *pa, const void *pb)
 	return (a > b) - (a < b);
 }
 
+static u64 *find_filter_entry(struct kvm_pmu_event_filter *filter, u64 key)
+{
+	return bsearch(&key, filter->events, filter->nevents,
+		       sizeof(filter->events[0]), cmp_u64);
+}
+
+static bool is_gp_event_allowed(struct kvm_pmu_event_filter *filter, u64 eventsel)
+{
+	if (find_filter_entry(filter, eventsel & kvm_pmu_ops.EVENTSEL_MASK))
+		return filter->action == KVM_PMU_EVENT_ALLOW;
+
+	return filter->action == KVM_PMU_EVENT_DENY;
+}
+
+static bool is_fixed_event_allowed(struct kvm_pmu_event_filter *filter, int idx)
+{
+	int fixed_idx = idx - INTEL_PMC_IDX_FIXED;
+
+	if (filter->action == KVM_PMU_EVENT_DENY &&
+	    test_bit(fixed_idx, (ulong *)&filter->fixed_counter_bitmap))
+		return false;
+	if (filter->action == KVM_PMU_EVENT_ALLOW &&
+	    !test_bit(fixed_idx, (ulong *)&filter->fixed_counter_bitmap))
+		return false;
+
+	return true;
+}
+
 static bool check_pmu_event_filter(struct kvm_pmc *pmc)
 {
 	struct kvm_pmu_event_filter *filter;
 	struct kvm *kvm = pmc->vcpu->kvm;
-	bool allow_event = true;
-	__u64 key;
-	int idx;
 
 	if (!static_call(kvm_x86_pmu_hw_event_available)(pmc))
 		return false;
 
 	filter = srcu_dereference(kvm->arch.pmu_event_filter, &kvm->srcu);
 	if (!filter)
-		goto out;
+		return true;
 
-	if (pmc_is_gp(pmc)) {
-		key = pmc->eventsel & kvm_pmu_ops.EVENTSEL_MASK;
-		if (bsearch(&key, filter->events, filter->nevents,
-			    sizeof(__u64), cmp_u64))
-			allow_event = filter->action == KVM_PMU_EVENT_ALLOW;
-		else
-			allow_event = filter->action == KVM_PMU_EVENT_DENY;
-	} else {
-		idx = pmc->idx - INTEL_PMC_IDX_FIXED;
-		if (filter->action == KVM_PMU_EVENT_DENY &&
-		    test_bit(idx, (ulong *)&filter->fixed_counter_bitmap))
-			allow_event = false;
-		if (filter->action == KVM_PMU_EVENT_ALLOW &&
-		    !test_bit(idx, (ulong *)&filter->fixed_counter_bitmap))
-			allow_event = false;
-	}
+	if (pmc_is_gp(pmc))
+		return is_gp_event_allowed(filter, pmc->eventsel);
 
-out:
-	return allow_event;
+	return is_fixed_event_allowed(filter, pmc->idx);
 }
 
 void reprogram_counter(struct kvm_pmc *pmc)

base-commit: 214272b6e5c3c64394cf52dc81bd5bf099167e5d
-- 


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v5 4/7] kvm: x86/pmu: Introduce masked events to the pmu event filter
  2022-09-20 17:46 ` [PATCH v5 4/7] kvm: x86/pmu: Introduce masked events to the pmu event filter Aaron Lewis
@ 2022-10-08  0:37   ` Sean Christopherson
  2022-10-10 15:42   ` Sean Christopherson
  1 sibling, 0 replies; 23+ messages in thread
From: Sean Christopherson @ 2022-10-08  0:37 UTC (permalink / raw)
  To: Aaron Lewis; +Cc: kvm, pbonzini, jmattson

On Tue, Sep 20, 2022, Aaron Lewis wrote:
> +To access individual components in a masked entry use:
> +::
> +  struct kvm_pmu_event_masked_entry {
> +	union {
> +		__u64 raw;
> +		struct {
> +			/* event_select bits 11:8 are only valid on AMD. */
> +			__u64 event_select:12;
> +			__u64 mask:8;
> +			__u64 match:8;
> +			__u64 exclude:1;

As suggested in patch 3, keep the architectural bits where they are and fill in
the gaps.  IIUC, all of bits 63:36 are available, i.e. there's lots of room for
expansion.  Go top down (start at 63) and cross our fingers that neither Intel
nor AMD puts stuff there.  If that does happen, then we can start mucking with
the layout, but until then, let's not make it too hard for ourselves.

> +			__u64 rsvd:35;

>  #define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */
>  #define   KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 7ce8bfafea91..b188ddb23f75 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -252,34 +252,61 @@ static inline u8 get_unit_mask(u64 eventsel)
>  	return (eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
>  }

...

> +/*
> + * Sort will order the list by exclude, then event select.  This function will
> + * then index the sublists of event selects such that when a search is done on
> + * the list, the head of the event select sublist is returned.  This simplifies
> + * the logic in filter_contains_match() when walking the list.

Unless I'm missing something, this is a complex way to solve a relatively simple
problem.  You want a list of "includes" and a list of "excludes".  Just have two
separate lists.

Actually, if we're effectively changing the ABI, why not make everyone's lives
simpler and expose that to userspace.  E.g. use one of the "pad" words to specify
the number of "include" events and effectively do this:

struct kvm_pmu_event_filter {
	__u32 action;
	__u32 nevents;
	__u32 fixed_counter_bitmap;
	__u32 flags;
	__u32 nr_include_events;
	__u64 include[nr_include_events];
	__u64 exclude[nevents - nr_include_events];
};

Then we don't need to steal a bit for "exclude" in the uABI.  The kernel code
gets a wee bit simpler.

> @@ -693,6 +796,10 @@ int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
>  	/* Ensure nevents can't be changed between the user copies. */
>  	*filter = tmp;
>  
> +	r = -EINVAL;
> +	if (!is_filter_valid(filter))

Do this in prepare_filter_events() to avoid walking the filter multiple times.
I've gotten fairly twisted around and my local copy of the code is a mess, but
something like this:

static int prepare_filter_events(struct kvm_pmu_event_filter *filter)
{
	int i, j;

	for (i = 0, j = 0; i < filter->nevents; i++) {
		if (filter->events[i] & ~kvm_pmu_ops.EVENTSEL_MASK) {
			if (!filter->mask)
				continue;

			if (<reserved bits set>)
				return -EINVAL;
		}

		filter->events[j++] = filter->events[i];
	}

	<more magic here>
}


> +		goto cleanup;
> +
>  	prepare_filter_events(filter);
>  
>  	mutex_lock(&kvm->lock);



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v5 4/7] kvm: x86/pmu: Introduce masked events to the pmu event filter
  2022-09-20 17:46 ` [PATCH v5 4/7] kvm: x86/pmu: Introduce masked events to the pmu event filter Aaron Lewis
  2022-10-08  0:37   ` Sean Christopherson
@ 2022-10-10 15:42   ` Sean Christopherson
  2022-10-15 15:43     ` Aaron Lewis
  1 sibling, 1 reply; 23+ messages in thread
From: Sean Christopherson @ 2022-10-10 15:42 UTC (permalink / raw)
  To: Aaron Lewis; +Cc: kvm, pbonzini, jmattson

On Tue, Sep 20, 2022, Aaron Lewis wrote:
>  static void convert_to_filter_events(struct kvm_pmu_event_filter *filter)
> @@ -645,7 +719,34 @@ static void convert_to_filter_events(struct kvm_pmu_event_filter *filter)
>  	for (i = 0; i < filter->nevents; i++) {
>  		u64 e = filter->events[i];
>  
> -		filter->events[i] = encode_filter_entry(e);
> +		filter->events[i] = encode_filter_entry(e, filter->flags);
> +	}
> +}
> +
> +/*
> + * Sort will order the list by exclude, then event select.  This function will
> + * then index the sublists of event selects such that when a search is done on
> + * the list, the head of the event select sublist is returned.  This simplifies
> + * the logic in filter_contains_match() when walking the list.

I'm not so sure that this is simpler overall though.  If inclusive vs. exclusive
are separate lists, then avoiding "index" would mean there's no need to convert
entries.  And IIUC, the only thing this saves in filter_contains_match() is
having to walk backwards, e.g. it's

	for (i = index; i < filter->nevents; i++) {
		if (!<eventsel event match>)
			break;

		if (is_filter_match(...))
			return true;
	}

	return false;

versus

	for (i = index; i < filter->nevents; i++) {
		if (filter_event_cmp(eventsel, filter->events[i]))
			break;

		if (is_filter_match(eventsel, filter->events[i]))
			return true;
	}

	for (i = index - 1; i >= 0; i--) {
		if (filter_event_cmp(eventsel, filter->events[i]))
			break;

		if (is_filter_match(eventsel, filter->events[i]))
			return true;
	}

	return false;

It's definitely _more_ code in filter_contains_match(), and the duplicate code is
unfortunate, but I wouldn't necessarily say it's simpler.  There's a fair bit of
complexity in understanding the indexing scheme, it's just hidden.

And I believe if the indexing is dropped, then the same filter_event_cmp() helper
can be used for sort() and bsearch(), and for bounding the walks.

And there's also no need to "encode" entries or use a second struct overlay.

We might need separate logic for the existing non-masked mechanism, but again
that's only more code, IMO it's not more complex.  E.g. I believe that the legacy
case can be handled with a dedicated:

	if (!(filter->flags & KVM_PMU_EVENT_FLAG_MASKED_EVENTS))
		return find_filter_index(..., cmp_u64) > 0;

Oh, and as a bonus of splitting include vs. exclude, the legacy case effectively
optimizes exclude since the length of the exclude array will be '0'.

If we do keep the indexing, I think we should rename "index" to "head", e.g. like
"head pages", to make it more obvious that the helper returns the head of a list.

> + */
> +static void index_filter_events(struct kvm_pmu_event_filter *filter)
> +{
> +	struct kvm_pmu_filter_entry *prev, *curr;
> +	int i, index = 0;
> +
> +	if (filter->nevents)
> +		prev = (struct kvm_pmu_filter_entry *)(filter->events);
> +
> +	for (i = 0; i < filter->nevents; i++) {
> +		curr = (struct kvm_pmu_filter_entry *)(&filter->events[i]);
> +
> +		if (curr->event_select != prev->event_select ||
> +		    curr->exclude != prev->exclude) {
> +			index = 0;
> +			prev = curr;
> +		}
> +
> +		curr->event_index = index++;
>  	}
>  }
> + * When filter events are converted into this format then sorted, the
> + * resulting list naturally ends up in two sublists.  One for the 'include
> + * list' and one for the 'exclude list'.  These sublists are further broken
> + * down into sublists ordered by their event select.  After that, the
> + * event select sublists are indexed such that a search for: exclude = n,
> + * event_select = n, and event_index = 0 will return the head of an event
> + * select sublist that can be walked to see if a match exists.
> + */
>  struct kvm_pmu_filter_entry {
>  	union {
>  		u64 raw;
>  		struct {
> +			u64 mask:8;
> +			u64 match:8;
> +			u64 event_index:12;

This is broken.  There are 2^12 possible event_select values, but event_index is
the index into the full list of events, i.e. is bounded only by nevents, and so
this needs to be stored as a 32-bit value.  E.g. if userspace creates a filter
with 2^32-2 entries for eventsel==0, then the index for eventsel==1 will be
2^32-1 even though there are only two event_select values in the entire list.

>  			u64 event_select:12;
> -			u64 unit_mask:8;
> -			u64 rsvd:44;
> +			u64 exclude:1;
> +			u64 rsvd:23;
>  		};
>  	};
>  };

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v5 1/7] kvm: x86/pmu: Correct the mask used in a pmu event filter lookup
  2022-10-07 23:25   ` Sean Christopherson
@ 2022-10-15 15:43     ` Aaron Lewis
  2022-10-17 18:20       ` Sean Christopherson
  0 siblings, 1 reply; 23+ messages in thread
From: Aaron Lewis @ 2022-10-15 15:43 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, pbonzini, jmattson

> And the total patch is:
>
> ---
>  arch/x86/kvm/pmu.c           | 2 +-
>  arch/x86/kvm/pmu.h           | 2 ++
>  arch/x86/kvm/svm/pmu.c       | 2 ++
>  arch/x86/kvm/vmx/pmu_intel.c | 2 ++
>  4 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index d9b9a0f0db17..d0e2c7eda65b 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -273,7 +273,7 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
>                 goto out;
>
>         if (pmc_is_gp(pmc)) {
> -               key = pmc->eventsel & AMD64_RAW_EVENT_MASK_NB;
> +               key = pmc->eventsel & kvm_pmu_ops.EVENTSEL_MASK;
>                 if (bsearch(&key, filter->events, filter->nevents,
>                             sizeof(__u64), cmp_u64))
>                         allow_event = filter->action == KVM_PMU_EVENT_ALLOW;
> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> index 5cc5721f260b..45a7dd24125d 100644
> --- a/arch/x86/kvm/pmu.h
> +++ b/arch/x86/kvm/pmu.h
> @@ -40,6 +40,8 @@ struct kvm_pmu_ops {
>         void (*reset)(struct kvm_vcpu *vcpu);
>         void (*deliver_pmi)(struct kvm_vcpu *vcpu);
>         void (*cleanup)(struct kvm_vcpu *vcpu);
> +
> +       const u64 EVENTSEL_MASK;

Agreed, a constant is better.  Had I realized I could do that, that
would have been my first choice.

What about calling it EVENTSEL_RAW_MASK to make it more descriptive
though?  It's a little too generic given there is EVENTSEL_UMASK and
EVENTSEL_CMASK, also there is precedent for using the term 'raw event'
for (eventsel+umask), i.e.
https://man7.org/linux/man-pages/man1/perf-record.1.html.

>  };
>
>  void kvm_pmu_ops_update(const struct kvm_pmu_ops *pmu_ops);
> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> index b68956299fa8..6ef7d1fcdbc2 100644
> --- a/arch/x86/kvm/svm/pmu.c
> +++ b/arch/x86/kvm/svm/pmu.c
> @@ -228,4 +228,6 @@ struct kvm_pmu_ops amd_pmu_ops __initdata = {
>         .refresh = amd_pmu_refresh,
>         .init = amd_pmu_init,
>         .reset = amd_pmu_reset,
> +       .EVENTSEL_MASK = AMD64_EVENTSEL_EVENT |
> +                        ARCH_PERFMON_EVENTSEL_UMASK,
>  };
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 25b70a85bef5..0a1c95b64ef1 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -811,4 +811,6 @@ struct kvm_pmu_ops intel_pmu_ops __initdata = {
>         .reset = intel_pmu_reset,
>         .deliver_pmi = intel_pmu_deliver_pmi,
>         .cleanup = intel_pmu_cleanup,
> +       .EVENTSEL_MASK = ARCH_PERFMON_EVENTSEL_EVENT |
> +                        ARCH_PERFMON_EVENTSEL_UMASK,
>  };
>
> base-commit: e18d6152ff0f41b7f01f9817372022df04e0d354
> --
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v5 4/7] kvm: x86/pmu: Introduce masked events to the pmu event filter
  2022-10-10 15:42   ` Sean Christopherson
@ 2022-10-15 15:43     ` Aaron Lewis
  2022-10-17 17:47       ` Sean Christopherson
  0 siblings, 1 reply; 23+ messages in thread
From: Aaron Lewis @ 2022-10-15 15:43 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, pbonzini, jmattson

On Mon, Oct 10, 2022 at 8:42 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Sep 20, 2022, Aaron Lewis wrote:
> >  static void convert_to_filter_events(struct kvm_pmu_event_filter *filter)
> > @@ -645,7 +719,34 @@ static void convert_to_filter_events(struct kvm_pmu_event_filter *filter)
> >       for (i = 0; i < filter->nevents; i++) {
> >               u64 e = filter->events[i];
> >
> > -             filter->events[i] = encode_filter_entry(e);
> > +             filter->events[i] = encode_filter_entry(e, filter->flags);
> > +     }
> > +}
> > +
> > +/*
> > + * Sort will order the list by exclude, then event select.  This function will
> > + * then index the sublists of event selects such that when a search is done on
> > + * the list, the head of the event select sublist is returned.  This simplifies
> > + * the logic in filter_contains_match() when walking the list.
>

Including previous comments so I can respond in one place:

> Unless I'm missing something, this is a complex way to solve a relatively simple
> problem.  You want a list of "includes" and a list of "excludes".  Just have two
> separate lists.
>
> Actually, if we're effectively changing the ABI, why not make everyone's lives
> simpler and expose that to userspace.  E.g. use one of the "pad" words to specify
> the number of "include" events and effectively do this:
>
> struct kvm_pmu_event_filter {
>         __u32 action;
>         __u32 nevents;
>         __u32 fixed_counter_bitmap;
>         __u32 flags;
>         __u32 nr_include_events;
>         __u64 include[nr_include_events];
>         __u64 exclude[]; // length == nevents - nr_include_events
> };
>
> Then we don't need to steal a bit for "exclude" in the uABI.  The kernel code
> gets a wee bit simpler.

<end of previous comments>

I'm not sure I understand the struct changes you're proposing.  Is
this what you mean?

struct kvm_pmu_event_filter {
        __u32 action;
        __u32 nevents;
        __u32 fixed_counter_bitmap;
        __u32 flags;
        __u32 nr_include_events;
        __u32 pad;
        __u64 *include; // length == nr_include_events
        __u64 exclude[]; // length == nevents - nr_include_events
};

I considered having an include list and exclude list on the filter,
but I thought that was too much detail to share with userspace.  I've
spent too much time lately trying to change ABI's or wishing they were
different. I'd prefer to share as little as possible with userspace at
this point in hopes of allowing us to change our minds or evolve this
in the future.

To that end, if we use a bit in the event to distinguish between an
include list and an exclude list the only thing we're locked into is
the type of event being introduced with masked events, which would be
easy to iterate on or change.  Then, if we sort the list like I do,
it's naturally split into an include list and an exclude list anyway,
so we essentially have the same thing in the end.  At that point if we
want to explicitly have an include list and an exclude list we could
make an internal struct for kvm_pmu_event_filter similar to what msr
filtering does, but I'm not sure that's necessary.

I think setting it up this way makes the data coming from userspace
less error prone and easier for them to work with because the lists
are being separated for them into include and exclude, and the order
they build it in doesn't matter.

Isn't having the include list pointer exposed to userspace a little
awkward?  As is I'm not sure why userspace would care about it, and
when we load the filter we'd essentially ignore it.  It feels like an
internal variable that's exposed to userspace.  Also, the naming is
confusing when working with non-masked events.  E.g. the "exclude"
list would be used in place of what was the "events" list.  That seems
less intuitive.

> I'm not so sure that this is simpler overall though.  If inclusive vs. exclusive
> are separate lists, then avoiding "index" would mean there's no need to convert
> entries.  And IIUC, the only thing this saves in filter_contains_match() is
> having to walk backwards, e.g. it's
>
>         for (i = index; i < filter->nevents; i++) {
>                 if (!<eventsel event match>)
>                         break;
>
>                 if (is_filter_match(...))
>                         return true;
>         }
>
>         return false;
>
> versus
>
>         for (i = index; i < filter->nevents; i++) {
>                 if (filter_event_cmp(eventsel, filter->events[i]))
>                         break;
>
>                 if (is_filter_match(eventsel, filter->events[i]))
>                         return true;
>         }
>
>         for (i = index - 1; i >= 0; i--) {
>                 if (filter_event_cmp(eventsel, filter->events[i]))
>                         break;
>
>                 if (is_filter_match(eventsel, filter->events[i]))
>                         return true;
>         }
>
>         return false;
>
> It's definitely _more_ code in filter_contains_match(), and the duplicate code is
> unfortunate, but I wouldn't necessarily say it's simpler.  There's a fair bit of
> complexity in understanding the indexing scheme, it's just hidden.
>

I think indexing the data is nice to have.  The only additional cost
is walking the list when a filter is loaded, then we are able to
search the head of the list and not have to do this odd walk forward
then backward business.  I'm open to dropping it if it's causing more
confusion than it's worth.  And as you pointed out, I believe not
having it would allow for the use of filter_event_cmp() during
bounding walks.

> And I believe if the indexing is dropped, then the same filter_event_cmp() helper
> can be used for sort() and bsearch(), and for bounding the walks.
>
> > And there's also no need to "encode" entries or use a second struct overlay.
>

Encoding events solves a different problem than indexing.  It converts
the events that come from userspace into a common format we can use
internally.  That allows the kernel to have a common code path, no
matter what type of event it started out as.
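
For a legacy (non-masked) entry, the conversion boils down to roughly
the following (a sketch based on the helpers quoted earlier in the
thread, not the exact patch code, and the helper name is made up):

	static u64 encode_legacy_entry(u64 event)
	{
		u16 event_select = (event & ARCH_PERFMON_EVENTSEL_EVENT) |
				   ((event >> 24) & 0xF00ULL);
		u8 unit_mask = (event & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;

		/* Pack into the internal event_select + unit_mask layout. */
		return KVM_PMU_ENCODE_FILTER_ENTRY(event_select, unit_mask);
	}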

> We might need separate logic for the existing non-masked mechanism, but again
> that's only more code, IMO it's not more complex.  E.g. I believe that the legacy
> case can be handled with a dedicated:
>
>         if (!(filter->flags & KVM_PMU_EVENT_FLAG_MASKED_EVENTS))
>                 return find_filter_index(..., cmp_u64) > 0;
>

I'd rather not drop encoding, however, if we did, sorting and
searching would both have to be special cased.  Masked events and
non-masked events are not compatible with each other that way.  If the
list is sorted for masked events, you can't do a cmp_u64 search on it
and expect to find anything.  I suppose you could if using the struct
overlay and we included match (which doubles as the unit mask for
non-masked events) in the comparer and dropped indexing.  But that
only works because the struct overlay keeps the event select
contiguous.  The fact that the architectural format has the event
select straddle the unit mask prevents that from working.
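
For reference, the architectural layout being discussed is the standard
perf_event.h one (restated here as a sketch, nothing new from this
series): the event select occupies bits 7:0 plus 35:32, with the unit
mask in between at bits 15:8.

	#define ARCH_PERFMON_EVENTSEL_EVENT	0x000000FFULL	/* bits 7:0 */
	#define ARCH_PERFMON_EVENTSEL_UMASK	0x0000FF00ULL	/* bits 15:8 */
	#define AMD64_EVENTSEL_EVENT		\
		(ARCH_PERFMON_EVENTSEL_EVENT | (0x0FULL << 32))	/* + bits 35:32 */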

> Oh, and as a bonus of splitting include vs. exclude, the legacy case effectively
> optimizes exclude since the length of the exclude array will be '0'.
>

This should apply to both implementations.  I didn't add an include
list and exclude list, but the data is laid out for it.  The main
reason I didn't do that is I didn't think there was enough of a win to
create an internal copy of kvm_pmu_event_filter (stated above), and
it'd be exposing too much of the implementation to userspace to add it
there.

> If we do keep the indexing, I think we should rename "index" to "head", e.g. like
> "head pages", to make it more obvious that the helper returns the head of a list.
>
> > + */
> > +static void index_filter_events(struct kvm_pmu_event_filter *filter)
> > +{
> > +     struct kvm_pmu_filter_entry *prev, *curr;
> > +     int i, index = 0;
> > +
> > +     if (filter->nevents)
> > +             prev = (struct kvm_pmu_filter_entry *)(filter->events);
> > +
> > +     for (i = 0; i < filter->nevents; i++) {
> > +             curr = (struct kvm_pmu_filter_entry *)(&filter->events[i]);
> > +
> > +             if (curr->event_select != prev->event_select ||
> > +                 curr->exclude != prev->exclude) {
> > +                     index = 0;
> > +                     prev = curr;
> > +             }
> > +
> > +             curr->event_index = index++;
> >       }
> >  }
> > + * When filter events are converted into this format then sorted, the
> > + * resulting list naturally ends up in two sublists.  One for the 'include
> > + * list' and one for the 'exclude list'.  These sublists are further broken
> > + * down into sublists ordered by their event select.  After that, the
> > + * event select sublists are indexed such that a search for: exclude = n,
> > + * event_select = n, and event_index = 0 will return the head of an event
> > + * select sublist that can be walked to see if a match exists.
> > + */
> >  struct kvm_pmu_filter_entry {
> >       union {
> >               u64 raw;
> >               struct {
> > +                     u64 mask:8;
> > +                     u64 match:8;
> > +                     u64 event_index:12;
>
> This is broken.  There are 2^12 possible event_select values, but event_index is
> the index into the full list of events, i.e. is bounded only by nevents, and so
> this needs to be stored as a 32-bit value.  E.g. if userspace creates a filter
> with 2^32-2 entries for eventsel==0, then the index for eventsel==1 will be
> 2^32-1 even though there are only two event_select values in the entire list.
>

nevents <= KVM_PMU_EVENT_FILTER_MAX_EVENTS (300), so there is plenty
of space in event_index to cover any valid index.  Though a moot point
if we cut it.

> >                       u64 event_select:12;
> > -                     u64 unit_mask:8;
> > -                     u64 rsvd:44;
> > +                     u64 exclude:1;
> > +                     u64 rsvd:23;
> >               };
> >       };
> >  };

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v5 4/7] kvm: x86/pmu: Introduce masked events to the pmu event filter
  2022-10-15 15:43     ` Aaron Lewis
@ 2022-10-17 17:47       ` Sean Christopherson
  2022-10-19  1:06         ` Aaron Lewis
  0 siblings, 1 reply; 23+ messages in thread
From: Sean Christopherson @ 2022-10-17 17:47 UTC (permalink / raw)
  To: Aaron Lewis; +Cc: kvm, pbonzini, jmattson

On Sat, Oct 15, 2022, Aaron Lewis wrote:
> On Mon, Oct 10, 2022 at 8:42 AM Sean Christopherson <seanjc@google.com> wrote:
> > Unless I'm missing something, this is a complex way to solve a relatively simple
> > problem.  You want a list of "includes" and a list of "excludes".  Just have two
> > separate lists.
> >
> > Actually, if we're effectively changing the ABI, why not make everyone's lives
> > simpler and expose that to userspace.  E.g. use one of the "pad" words to specify
> > the number of "include" events and effectively do this:
> >
> > struct kvm_pmu_event_filter {
> >         __u32 action;
> >         __u32 nevents;
> >         __u32 fixed_counter_bitmap;
> >         __u32 flags;
> >         __u32 nr_include_events;
> >         __u64 include[nr_include_events];
> > >         __u64 exclude[nevents - nr_include_events];
> > };
> >
> > Then we don't need to steal a bit for "exclude" in the uABI.  The kernel code
> > gets a wee bit simpler.
> 
> <end of previous comments>
> 
> I'm not sure I understand the struct changes you're proposing.  Is
> this what you mean?
> 
> struct kvm_pmu_event_filter {
>         __u32 action;
>         __u32 nevents;
>         __u32 fixed_counter_bitmap;
>         __u32 flags;
>         __u32 nr_include_events;
>         __u32 pad;
>         __u64 *include; // length == nr_include_events
>         __u64 exclude[]; // length == nevents - nr_include_events

Ya, something like that.

> };
> 
> I considered having an include list and exclude list on the filter,
> but I thought that was too much detail to share with userspace.  I've
> spent too much time lately trying to change ABIs or wishing they were
> different. I'd prefer to share as little as possible with userspace at
> this point in hopes of allowing us to change our minds or evolve this
> in the future.

I don't think a list vs. a flag shares any more or less with userspace, e.g. KVM
could very well merge the lists internally.

> To that end, if we use a bit in the event to distinguish between an
> include list and an exclude list the only thing we're locked into is
> the type of event being introduced with masked events, which would be
> easy to iterate on or change.

As above, we're not locked into an internal implementation either way.

> Then, if we sort the list like I do, it's naturally split into an include
> list and an exclude list anyway, so we essentially have the same thing in the
> end.  At that point if we want to explicitly have an include list and an
> exclude list we could make an internal struct for kvm_pmu_event_filter
> similar to what msr filtering does, but I'm not sure that's necessary.
> 
> I think setting it up this way makes the data coming from userspace
> less error prone and easier for them to work with because the lists
> are being separated for them into include and exclude, and the order
> they build it in doesn't matter.

The above said, I agree that building the separate lists would be obnoxious for
userspace.  Userspace would either need to know the length of the "includes" list
up front, or would have to build the "excludes" list separately and then fold it
in at the end.  Unless we might run out of bits, let's keep the flag approach for
the uAPI.
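
With the flag approach, userspace just sets the exclude bit on the
relevant entries and can list them in any order.  A rough sketch using
the kvm_pmu_event_masked_entry layout from the patch (the event select
and unit mask values are arbitrary examples):

	/* filter->flags must include KVM_PMU_EVENT_FLAG_MASKED_EVENTS. */
	struct kvm_pmu_event_masked_entry e = { .raw = 0 };

	e.event_select = 0xc0;	/* example event select */
	e.mask = 0xff;
	e.match = 0x01;		/* example unit mask */
	e.exclude = 1;

	filter->events[i] = e.raw;	/* position in events[] doesn't matter */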
 
> Isn't having the include list pointer exposed to userspace a little
> awkward?  As is I'm not sure why userspace would care about it, and
> when we load the filter we'd essentially ignore it.

I don't follow.  I assume userspace cares about specifying "includes", otherwise
why provide that functionality?

> It feels like an internal variable that's exposed to userspace.  Also, the
> naming is confusing when working with non-masked events.  E.g. the "exclude"
> list would be used in place of what was the "events" list.  That seems less
> intuitive.

Honestly, I find the "includes vs. excludes" terminology to be quite confusing
regardless of what API is presented to userspace.  Can't think of better names
though.

> > I'm not so sure that this is simpler overall though.  If inclusive vs. exclusive
> > are separate lists, then avoiding "index" would mean there's no need to convert
> > entries.  And IIUC, the only thing this saves in filter_contains_match() is
> > having to walk backwards, e.g. it's
> >
> >         for (i = index; i < filter->nevents; i++) {
> >                 if (!<eventsel event match>)
> >                         break;
> >
> >                 if (is_filter_match(...))
> >                         return true;
> >         }
> >
> >         return false;
> >
> > versus
> >
> >         for (i = index; i < filter->nevents; i++) {
> >                 if (filter_event_cmp(eventsel, filter->events[i]))
> >                         break;
> >
> >                 if (is_filter_match(eventsel, filter->events[i]))
> >                         return true;
> >         }
> >
> >         for (i = index - 1; i >= 0; i--) {
> >                 if (filter_event_cmp(eventsel, filter->events[i]))
> >                         break;
> >
> >                 if (is_filter_match(eventsel, filter->events[i]))
> >                         return true;
> >         }
> >
> >         return false;
> >
> > It's definitely _more_ code in filter_contains_match(), and the duplicate code is
> > unfortunate, but I wouldn't necessarily say it's simpler.  There's a fair bit of
> > complexity in understanding the indexing scheme, it's just hidden.
> >
> 
> I think indexing the data is nice to have.  The only additional cost
> is walking the list when a filter is loaded, then we are able to
> search the head of the list and not have to do this odd walk forward
> then backward business.

I'm not concerned about the setup cost to create the sub-lists, my thoughts are
purely from a "how easy is it to understand" perspective, closely followed by
"how performant is the runtime code".  IIUC, the runtime performance is basically
equivalent since the comparison function will need to mask off bits either way.

For the forward+backward, IMO it's easy to understand with a single comment above
the forward+backward walks:

	/*
	 * Entries are sorted by eventsel, walk the list in both directions to
	 * process all entries with the target eventsel.
	 */

Sorting should probably have a comment too, but critically, understanding how the
list is sorted doesn't require full understanding of how lookups are processed.
E.g. a comment like this would suffice for sorting:

	/*
	 * Sort entries by eventsel so that all entries for a given eventsel can
	 * be processed effeciently during filter.
	 */

The sub-list on the other hand requires comments to explain the sub-list concept,
document the indexing code (and its comparator?), the #define for the head bits
(and/or struct field), and the walk code that processes the sublist.

In addition to having more things to document, the comments will be spread out,
and IMO it's difficult to understand each individual piece without having a good
grasp of the overall implementation.  Using "head" instead of "index" would help
quite a bit for the lookup+walk, but I think the extra pieces will still leave
readers wondering "why?.  E.g. Why is there a separate comparator for sorting
vs. lookup?  Why is there a "head" field?  What problem do sub-lists solve?

> I'm open to dropping it if it's causing more confusion than it's worth.  And
> as you pointed out, I believe not having it would allow for the use of
> filter_event_cmp() during bounding walks.
> 
> > And I believe if the indexing is dropped, then the same filter_event_cmp() helper
> > can be used for sort() and bsearch(), and for bounding the walks.
> >
> > And here's also no need to "encode" entries or use a second struct overly.
> >
> 
> Encoding events solves a different problem than indexing.  It converts
> the events that come from userspace into a common format we can use
> internally.  That allows the kernel to have a common code path, no
> matter what type of event it started out as.

I don't see why encoding is necessary to achieve a common internal implementation
though.  To make sure we're talking about the same thing, by "encoding" I mean
having different layouts for the same data in the uAPI struct than the in-kernel
struct.  I'm perfectly ok having bits in the uAPI struct that are dropped when
building the in-kernel lists, e.g. the EXCLUDE bit.  Ditto for having bits in the
in-kernel struct that don't exist in the uAPI struct, e.g. the "head" (index) of
the sublist if we go that route.

My objection to "encoding" is moving the eventsel bits around.  AFAICT, it's not
strictly necessary, and similar to the sub-list approach, that raises the question
of "why?".  In all cases, aren't the "eventsel" bits always contained in bits 35:32
and 7:0?  The comparator can simply mask off all other bits in order to find a
filter with the specified eventsel, no?
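
E.g. a minimal sketch of such a comparator, assuming the EVENTSEL_MASK
constant from patch 1 with the unit mask bits stripped (not the code in
this series):

	static int filter_event_cmp(const void *pa, const void *pb)
	{
		u64 mask = kvm_pmu_ops.EVENTSEL_MASK & ~ARCH_PERFMON_EVENTSEL_UMASK;
		u64 a = *(u64 *)pa & mask;
		u64 b = *(u64 *)pb & mask;

		return (a > b) - (a < b);
	}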

> > We might need separate logic for the existing non-masked mechanism, but again
> > that's only more code, IMO it's not more complex.  E.g. I believe that the legacy
> > case can be handled with a dedicated:
> >
> >         if (!(filter->flags & KVM_PMU_EVENT_FLAG_MASKED_EVENTS))
> >                 return find_filter_index(..., cmp_u64) > 0;
> >
> 
> I'd rather not drop encoding, however, if we did, sorting and
> searching would both have to be special cased.  Masked events and
> non-masked events are not compatible with each other that way. If the
> list is sorted for masked events, you can't do a cmp_u64 search on it
> and expect to find anything.

I'm not sure I follow.  As above, doesn't the search for masked vs. non-masked only
differ on the comparator?

> I suppose you could if using the struct overlay and we included match (which
> doubles as the unit mask for non-masked events) in the comparer and dropped
> indexing.

Why does "match" need to be in the comparer?  I thought the sub-lists are created
for each "eventsel"?

> But that only works because the struct overlay keeps the event select
> contiguous.  The fact that the architectural format has the event select
> straddle the unit mask prevents that from working.
> 
> > Oh, and as a bonus of splitting include vs. exclude, the legacy case effectively
> > optimizes exclude since the length of the exclude array will be '0'.
> 
> This should apply to both implementations.

If everything is in a single list, doesn't the exclude lookup need to search the
entire thing to determine that there are no entries?

> I didn't add an include list and exclude list, but the data is laid out for
> it.  The main reason I didn't do that is I didn't think there was enough of a
> win to create an internal copy of kvm_pmu_event_filter (stated above), and
> it'd be exposing too much of the implementation to userspace to add it there.
>
> > If we do keep the indexing, I think we should rename "index" to "head", e.g. like
> > "head pages", to make it more obvious that the helper returns the head of a list.

...

> > >  struct kvm_pmu_filter_entry {
> > >       union {
> > >               u64 raw;
> > >               struct {
> > > +                     u64 mask:8;
> > > +                     u64 match:8;
> > > +                     u64 event_index:12;
> >
> > This is broken.  There are 2^12 possible event_select values, but event_index is
> > the index into the full list of events, i.e. is bounded only by nevents, and so
> > this needs to be stored as a 32-bit value.  E.g. if userspace creates a filter
> > with 2^32-2 entries for eventsel==0, then the index for eventsel==1 will be
> > 2^32-1 even though there are only two event_select values in the entire list.
> >
> 
> nevents <= KVM_PMU_EVENT_FILTER_MAX_EVENTS (300), so there is plenty
> of space in event_index to cover any valid index.  Though a moot point
> if we cut it.

Oooh.  In that case, won't 9 bits suffice?  Using 12 implies there's a connection
to eventsel, which is misleading and confusing.

Regardless, if we keep the indexing, there should be a BUILD_BUG_ON() to assert
that the "head" (event_index) field is larger enough to hold the max number of
events.  Not sure if there's a way to get the number of bits, i.e. might require
a separate #define to get the compile-time assert. :-/
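
E.g. a rough sketch of what that could look like (the #define name and
the 9-bit width are hypothetical):

	#define KVM_PMU_FILTER_EVENT_INDEX_BITS	9

	/* In a function context, e.g. kvm_vm_ioctl_set_pmu_event_filter(): */
	BUILD_BUG_ON(KVM_PMU_EVENT_FILTER_MAX_EVENTS >
		     (1 << KVM_PMU_FILTER_EVENT_INDEX_BITS));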

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v5 1/7] kvm: x86/pmu: Correct the mask used in a pmu event filter lookup
  2022-10-15 15:43     ` Aaron Lewis
@ 2022-10-17 18:20       ` Sean Christopherson
  2022-10-21  1:33         ` Aaron Lewis
  0 siblings, 1 reply; 23+ messages in thread
From: Sean Christopherson @ 2022-10-17 18:20 UTC (permalink / raw)
  To: Aaron Lewis; +Cc: kvm, pbonzini, jmattson

On Sat, Oct 15, 2022, Aaron Lewis wrote:
> > And the total patch is:
> >
> > ---
> >  arch/x86/kvm/pmu.c           | 2 +-
> >  arch/x86/kvm/pmu.h           | 2 ++
> >  arch/x86/kvm/svm/pmu.c       | 2 ++
> >  arch/x86/kvm/vmx/pmu_intel.c | 2 ++
> >  4 files changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> > index d9b9a0f0db17..d0e2c7eda65b 100644
> > --- a/arch/x86/kvm/pmu.c
> > +++ b/arch/x86/kvm/pmu.c
> > @@ -273,7 +273,7 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
> >                 goto out;
> >
> >         if (pmc_is_gp(pmc)) {
> > -               key = pmc->eventsel & AMD64_RAW_EVENT_MASK_NB;
> > +               key = pmc->eventsel & kvm_pmu_ops.EVENTSEL_MASK;
> >                 if (bsearch(&key, filter->events, filter->nevents,
> >                             sizeof(__u64), cmp_u64))
> >                         allow_event = filter->action == KVM_PMU_EVENT_ALLOW;
> > diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> > index 5cc5721f260b..45a7dd24125d 100644
> > --- a/arch/x86/kvm/pmu.h
> > +++ b/arch/x86/kvm/pmu.h
> > @@ -40,6 +40,8 @@ struct kvm_pmu_ops {
> >         void (*reset)(struct kvm_vcpu *vcpu);
> >         void (*deliver_pmi)(struct kvm_vcpu *vcpu);
> >         void (*cleanup)(struct kvm_vcpu *vcpu);
> > +
> > +       const u64 EVENTSEL_MASK;
> 
> Agreed, a constant is better.  Had I realized I could do that, that
> would have been my first choice.
> 
> What about calling it EVENTSEL_RAW_MASK to make it more descriptive
> though?  It's a little too generic given there is EVENTSEL_UMASK and
> EVENTSEL_CMASK, also there is precedent for using the term 'raw event'
> for (eventsel+umask), i.e.
> https://man7.org/linux/man-pages/man1/perf-record.1.html.

Hmm.  I'd prefer to avoid "raw" because that implies there's a non-raw version
that can be translated into the "raw" version.  This is kinda the opposite, where
the above field is the composite type and the individual fields are the "raw"
components.

Refresh me, as I've gotten twisted about: this mask needs to be EVENTSEL_EVENT_MASK
+ EVENTSEL_UMASK, correct?  Or phrased differently, it'll hold more than just
EVENTSEL_EVENT_MASK?

What about something completely different, e.g. FILTER_MASK?  It'll require a
comment to document, but that seems inevitable, and FILTER_MASK should be really
hard to confuse with the myriad EVENTSEL_*MASK fields.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v5 4/7] kvm: x86/pmu: Introduce masked events to the pmu event filter
  2022-10-17 17:47       ` Sean Christopherson
@ 2022-10-19  1:06         ` Aaron Lewis
  2022-10-21  1:33           ` Aaron Lewis
  0 siblings, 1 reply; 23+ messages in thread
From: Aaron Lewis @ 2022-10-19  1:06 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, pbonzini, jmattson

> > I think setting it up this way makes the data coming from userspace
> > less error prone and easier for them to work with because the lists
> > are being separated for them into include and exclude, and the order
> > they build it in doesn't matter.
>
> The above said, I agree that building the separate lists would be obnoxious for
> userspace.  Userspace would either need to know the length of the "includes" list
> up front, or would have to build the "excludes" list separately and then fold it
> in at the end.  Unless we might run out of bits, let's keep the flag approach for
> the uAPI.
>
> > Isn't having the include list pointer exposed to userspace a little
> > awkward?  As is I'm not sure why userspace would care about it, and
> > when we load the filter we'd essentially ignore it.
>
> I don't follow.  I assume userspace cares about specifying "includes", otherwise
> why provide that functionality?

Yes, userspace cares about includes, but I'm not sure they will care
about the "include" pointer because we would ignore it when loading a
filter (it provides no information we can't get from the other
fields), and it would be cumbersome for them to use, as you pointed out.
So, I was just guessing that they wouldn't use it.

>
> I'm not concerned about the setup cost to create the sub-lists, my thoughts are
> purely from a "how easy is it to understand" perspective, closely followed by
> "how performant is the runtime code".  IIUC, the runtime performance is basically
> equivalent since the comparison function will need to mask off bits either way.
>
> For the forward+backward, IMO it's easy to understand with a single comment above
> the forward+backward walks:
>
>         /*
>          * Entries are sorted by eventsel, walk the list in both directions to
>          * process all entries with the target eventsel.
>          */
>
> Sorting should probably have a comment too, but critically, understanding how the
> list is sorted doesn't require full understanding of how lookups are processed.
> E.g. a comment like this would suffice for sorting:
>
>         /*
>          * Sort entries by eventsel so that all entries for a given eventsel can
>          * be processed efficiently during filter.
>          */
>
> The sub-list on the other hand requires comments to explain the sub-list concept,
> document the indexing code (and its comparator?), the #define for the head bits
> (and/or struct field), and the walk code that processes the sublist.
>
> In addition to having more things to document, the comments will be spread out,
> and IMO it's difficult to understand each individual piece without having a good
> grasp of the overall implementation.  Using "head" instead of "index" would help
> quite a bit for the lookup+walk, but I think the extra pieces will still leave
> readers wondering "why?.  E.g. Why is there a separate comparator for sorting
> vs. lookup?  Why is there a "head" field?  What problem do sub-lists solve?
>

I'll drop indexing.

>
> I don't see why encoding is necessary to achieve a common internal implementation
> though.  To make sure we're talking about the same thing, by "encoding" I mean
> having different layouts for the same data in the uAPI struct than the in-kernel
> struct.  I'm perfectly ok having bits in the uAPI struct that are dropped when
> building the in-kernel lists, e.g. the EXCLUDE bit.  Ditto for having bits in the
> in-kernel struct that don't exist in the uAPI struct, e.g. the "head" (index) of
> the sublist if we go that route.
>
> My objection to "encoding" is moving the eventsel bits around.  AFAICT, it's not
> strictly necessary, and similar to the sub-list approach, that raises the question
> of "why?".  In all cases, aren't the "eventsel" bits always contained in bits 35:32
> and 7:0?  The comparator can simply mask off all other bits in order to find a
> filter with the specified eventsel, no?

Strictly speaking, yes... as long as you are aware of some boundaries
and work within them.

Problem:
--------

Given the event select + unit mask pairs: {0x101, 0x1}, {0x101, 0x2},
{0x102, 0x1}

If we use the architectural layout we get: 0x10011, 0x10021, 0x10012.

Sorted by filter_event_cmp(), i.e. event select, it's possible to get:
0x10012, 0x10011, 0x10021.

Then if a search for {0x101, 0x2} with cmp_u64() is done it wouldn't
find anything because 0x10021 is deemed less than 0x10011 in this
list.  That's why adding find_filter_index(..., cmp_u64) > 0 doesn't
work to special case legacy events.

Similarly, if sorted by cmp_u64() we get: 0x10021, 0x10012, 0x10011.

Then a search with filter_event_cmp() for 0x10021 would fail as well.
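
A short aside on why the mismatch breaks the lookup (sketch only, not
from the patch): bsearch() only works if the array is sorted by the same
comparator used for the search.  Entries with equal event selects can
land in any order relative to cmp_u64(), so a cmp_u64() bisection can
descend into the wrong half and miss an entry that is present:

        sort(events, nevents, sizeof(u64), filter_event_cmp, NULL);
        ...
        /* May return NULL even though the key is in the array. */
        fe = bsearch(&key, events, nevents, sizeof(u64), cmp_u64);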

Possible workaround #1:
-----------------------

If we use the internal layout (removed index).

struct kvm_pmu_filter_entry {
        union {
                u64 raw;
                struct {
                        u64 mask:8;
                        u64 match:8; // doubles as umask, i.e. encode_filter_entry()
                        u64 event_select:12;
                        u64 exclude:1;
                        u64 rsvd:35;
                };
        };
};

Because the event_select is contiguous, all of its bits are above
match, and the fields in kvm_pmu_filter_entry are ordered from most
important to least important from a sort perspective, a sort with
cmp_u64() gives us: 0x10210, 0x10120, 0x10110.

This order will work for both legacy events and masked events.  So, as
you suggested, if we wanted to add the following to
is_gp_event_allowed() we can special case legacy events and early out:

        if (!(filter->flags & KVM_PMU_EVENT_FLAG_MASKED_EVENTS))
                return find_filter_index(..., cmp_u64) > 0;

Or we can leave it as is and the masked events path will come up with
the same result.

Possible workaround #2:
-----------------------

If we use the architectural layout for masked events, e.g.:

struct kvm_pmu_event_filter_entry {
        union {
                __u64 raw;
                struct {
                        __u64 select_lo:8;
                        __u64 match:8;
                        __u64 rsvd1:16;
                        __u64 select_hi:4;
                        __u64 rsvd2:19;
                        __u64 mask:8;
                        __u64 exclude:1;
                };
        };
};

This becomes more restrictive, but for our use case I believe it will
work.  The problem with this layout in general is where the event
select ends up.  You can't put anything below 'select_lo', and
'select_lo' and 'select_hi' straddle the unit mask, so you just have
less control over the order things end up.  That said, if our goal is
to just sort by exclude + event select for masked events we can do
that with filter_event_cmp() on the layout above.  If we want to find
a legacy event we can use cmp_u64() on the layout above as well.  The
only caveat is you can't mix and match during lookup like you can in
workaround #1.  You have to sort and search with the same comparer.

If we convert legacy events to masked events I'm pretty sure we can
still have a common code path.
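
A minimal sketch of that conversion (hypothetical helper name; the
umask_mask placement matches the masked-event layout proposed later in
this thread, i.e. bits 63:56):

        /*
         * A legacy event matches only its exact unit mask, so it is
         * equivalent to a masked event with umask_mask = 0xff and
         * umask_match = the original unit mask.
         */
        static u64 legacy_to_masked(u64 event)
        {
                return event | (0xFFULL << 56);
        }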

>
> > > We might need separate logic for the existing non-masked mechanism, but again
> > > that's only more code, IMO it's not more complex.  E.g. I believe that the legacy
> > > case can be handled with a dedicated:
> > >
> > >         if (!(filter->flags & KVM_PMU_EVENT_FLAG_MASKED_EVENTS))
> > >                 return find_filter_index(..., cmp_u64) > 0;
> > >
> >
> > I'd rather not drop encoding, however, if we did, sorting and
> > searching would both have to be special cased.  Masked events and
> > non-masked events are not compatible with each other that way. If the
> > list is sorted for masked events, you can't do a cmp_u64 search on it
> > and expect to find anything.
>
> I'm not sure I follow.  As above, doesn't the search for masked vs. non-masked only
> differ on the comparator?
>
> > I suppose you could if using the struct overlay and we included match (which
> > doubles as the unit mask for non-masked events) in the comparer and dropped
> > indexing.
>
> Why does "match" need to be in the comparer?  I thought the sub-lists are created
> for each "eventsel"?
>
> > But that only works because the struct overlay keeps the event select
> > contiguous.  The fact that the architectural format has the event select
> > straddle the unit mask prevents that from working.
> >
> > > Oh, and as a bonus of splitting include vs. exclude, the legacy case effectively
> > > optimizes exclude since the length of the exclude array will be '0'.
> >
> > This should apply to both implementations.
>
> If everything is in a single list, doesn't the exclude lookup need to search the
> entire thing to determine that there are no entries?

Yes.  To do what I'm suggesting would require creating an internal
representation of the struct kvm_pmu_event_filter to track the include
and exclude portions of the list.  But that should be fairly
straightforward to set up when loading the filter, then we would be
able to search the individual lists rather than the whole thing.
Having said that, I'm not saying this is necessary, all I'm saying is
that it wouldn't be hard to do.

> ...
>
> > > >  struct kvm_pmu_filter_entry {
> > > >       union {
> > > >               u64 raw;
> > > >               struct {
> > > > +                     u64 mask:8;
> > > > +                     u64 match:8;
> > > > +                     u64 event_index:12;
> > >
> > > This is broken.  There are 2^12 possible event_select values, but event_index is
> > > the index into the full list of events, i.e. is bounded only by nevents, and so
> > > this needs to be stored as a 32-bit value.  E.g. if userspace creates a filter
> > > with 2^32-2 entries for eventsel==0, then the index for eventsel==1 will be
> > > 2^32-1 even though there are only two event_select values in the entire list.
> > >
> >
> > nevents <= KVM_PMU_EVENT_FILTER_MAX_EVENTS (300), so there is plenty
> > of space in event_index to cover any valid index.  Though a moot point
> > if we cut it.
>
> Oooh.  In that case, won't 9 bits suffice?  Using 12 implies there's a connection
> to eventsel, which is misleading and confusing.

Yes, 9 would suffice.  I gave some buffer because I liked the idea of
round numbers and I had a lot of space in the struct.  I didn't see it
implying a connection to event_select, but in hindsight I can see why
you may have thought that.

>
> Regardless, if we keep the indexing, there should be a BUILD_BUG_ON() to assert
> that the "head" (event_index) field is larger enough to hold the max number of
> events.  Not sure if there's a way to get the number of bits, i.e. might require
> a separate #define to get the compile-time assert. :-/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v5 1/7] kvm: x86/pmu: Correct the mask used in a pmu event filter lookup
  2022-10-17 18:20       ` Sean Christopherson
@ 2022-10-21  1:33         ` Aaron Lewis
  0 siblings, 0 replies; 23+ messages in thread
From: Aaron Lewis @ 2022-10-21  1:33 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, pbonzini, jmattson

On Mon, Oct 17, 2022 at 6:20 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Sat, Oct 15, 2022, Aaron Lewis wrote:
> > > And the total patch is:
> > >
> > > ---
> > >  arch/x86/kvm/pmu.c           | 2 +-
> > >  arch/x86/kvm/pmu.h           | 2 ++
> > >  arch/x86/kvm/svm/pmu.c       | 2 ++
> > >  arch/x86/kvm/vmx/pmu_intel.c | 2 ++
> > >  4 files changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> > > index d9b9a0f0db17..d0e2c7eda65b 100644
> > > --- a/arch/x86/kvm/pmu.c
> > > +++ b/arch/x86/kvm/pmu.c
> > > @@ -273,7 +273,7 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
> > >                 goto out;
> > >
> > >         if (pmc_is_gp(pmc)) {
> > > -               key = pmc->eventsel & AMD64_RAW_EVENT_MASK_NB;
> > > +               key = pmc->eventsel & kvm_pmu_ops.EVENTSEL_MASK;
> > >                 if (bsearch(&key, filter->events, filter->nevents,
> > >                             sizeof(__u64), cmp_u64))
> > >                         allow_event = filter->action == KVM_PMU_EVENT_ALLOW;
> > > diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> > > index 5cc5721f260b..45a7dd24125d 100644
> > > --- a/arch/x86/kvm/pmu.h
> > > +++ b/arch/x86/kvm/pmu.h
> > > @@ -40,6 +40,8 @@ struct kvm_pmu_ops {
> > >         void (*reset)(struct kvm_vcpu *vcpu);
> > >         void (*deliver_pmi)(struct kvm_vcpu *vcpu);
> > >         void (*cleanup)(struct kvm_vcpu *vcpu);
> > > +
> > > +       const u64 EVENTSEL_MASK;
> >
> > Agreed, a constant is better.  Had I realized I could do that, that
> > would have been my first choice.
> >
> > What about calling it EVENTSEL_RAW_MASK to make it more descriptive
> > though?  It's a little too generic given there is EVENTSEL_UMASK and
> > EVENTSEL_CMASK, also there is precedent for using the term 'raw event'
> > for (eventsel+umask), i.e.
> > https://man7.org/linux/man-pages/man1/perf-record.1.html.
>
> Hmm.  I'd prefer to avoid "raw" because that implies there's a non-raw version
> that can be translated into the "raw" version.  This is kinda the opposite, where
> the above field is the composite type and the individual fields are the "raw"
> components.
>
> Refresh me, as I've gotten twisted about: this mask needs to be EVENTSEL_EVENT_MASK
> + EVENTSEL_UMASK, correct?  Or phrased differently, it'll hold more than just
> EVENTSEL_EVENT_MASK?

I found it more useful to just have the event select.  That allowed me
to use just it or the event select + umask as needed in patch #4.

const u64 EVENTSEL_EVENT;
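
For reference, the per-vendor definitions could then presumably just
reuse the existing perf_event.h masks (a sketch; the actual hunks aren't
shown in this mail):

        /* vmx/pmu_intel.c */
        .EVENTSEL_EVENT = ARCH_PERFMON_EVENTSEL_EVENT,

        /* svm/pmu.c */
        .EVENTSEL_EVENT = AMD64_EVENTSEL_EVENT,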

>
> What about something completely different, e.g. FILTER_MASK?  It'll require a
> comment to document, but that seems inevitable, and FILTER_MASK should be really
> hard to confuse with the myriad EVENTSEL_*MASK fields.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v5 4/7] kvm: x86/pmu: Introduce masked events to the pmu event filter
  2022-10-19  1:06         ` Aaron Lewis
@ 2022-10-21  1:33           ` Aaron Lewis
  2022-10-21 17:50             ` Sean Christopherson
  0 siblings, 1 reply; 23+ messages in thread
From: Aaron Lewis @ 2022-10-21  1:33 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, pbonzini, jmattson

On Wed, Oct 19, 2022 at 1:06 AM Aaron Lewis <aaronlewis@google.com> wrote:
>
> > > I think setting it up this way makes the data coming from userspace
> > > less error prone and easier for them to work with because the lists
> > > are being separated for them into include and exclude, and the order
> > > they build it in doesn't matter.
> >
> > The above said, I agree that building the separate lists would be obnoxious for
> > userspace.  Userspace would either need to know the length of the "includes" list
> > up front, or would have to build the "excludes" list separately and then fold it
> > in at the end.  Unless we might run out of bits, lets keep the flag approach for
> > the uAPI.
> >
> > > Isn't having the include list pointer exposed to userspace a little
> > > awkward?  As is I'm not sure why userspace would care about it, and
> > > when we load the filter we'd essentially ignore it.
> >
> > I don't follow.  I assume userspace cares about specifying "includes", otherwise
> > why provide that functionality?
>
> Yes, userspace cares about includes, but I'm not sure they will care
> about the "include" pointer because we would ignore it when loading a
> filter (there's no information it provides we can get from the other
> fields) and it would be cumbersome for them to use as you pointed out.
> So, I was just guessing that they wouldn't use it.
>
> >
> > I'm not concerned about the setup cost to create the sub-lists, my thoughts are
> > purely from a "how easy is it to understand" perspective, closely followed by
> > "how performant is the runtime code".  IIUC, the runtime performance is basically
> > equivalent since the comparison function will need to mask off bits either way.
> >
> > For the forward+backward, IMO it's easy to understand with a single comment above
> > the forward+backward walks:
> >
> >         /*
> >          * Entries are sorted by eventsel, walk the list in both directions to
> >          * process all entries with the target eventsel.
> >          */
> >
> > Sorting should probably have a comment too, but critically, understanding how the
> > list is sorted doesn't require full understanding of how lookups are processed.
> > E.g. a comment like this would suffice for sorting:
> >
> >         /*
> >          * Sort entries by eventsel so that all entries for a given eventsel can
> >          * be processed efficiently during filter.
> >          */
> >
> > The sub-list on the other hand requires comments to explain the sub-list concept,
> > document the indexing code (and its comparator?), the #define for the head bits
> > (and/or struct field), and the walk code that processes the sublist.
> >
> > In addition to having more things to document, the comments will be spread out,
> > and IMO it's difficult to understand each individual piece without having a good
> > grasp of the overall implementation.  Using "head" instead of "index" would help
> > quite a bit for the lookup+walk, but I think the extra pieces will still leave
> > readers wondering "why?".  E.g. Why is there a separate comparator for sorting
> > vs. lookup?  Why is there a "head" field?  What problem do sub-lists solve?
> >
>
> I'll drop indexing.
>
> >
> > I don't see why encoding is necessary to achieve a common internal implementation
> > though.  To make sure we're talking about the same thing, by "encoding" I mean
> > having different layouts for the same data in the uAPI struct than the in-kernel
> > struct.  I'm perfectly ok having bits in the uAPI struct that are dropped when
> > building the in-kernel lists, e.g. the EXCLUDE bit.  Ditto for having bits in the
> > in-kernel struct that don't exist in the uAPI struct, e.g. the "head" (index) of
> > the sublist if we go that route.
> >
> > My objection to "encoding" is moving the eventsel bits around.  AFAICT, it's not
> > strictly necessary, and similar to the sub-list approach, that raises the question
> > of "why?".  In all cases, aren't the "eventsel" bits always contained in bits 35:32
> > and 7:0?  The comparator can simply mask off all other bits in order to find a
> > filter with the specified eventsel, no?
>
> Strictly speaking, yes... as long as you are aware of some boundaries
> and work within them.
>
> Problem:
> --------
>
> Given the event select + unit mask pairs: {0x101, 0x1}, {0x101, 0x2},
> {0x102, 0x1}
>
> If we use the architectural layout we get: 0x10011, 0x10021, 0x10012.
>
> Sorted by filter_event_cmp(), i.e. event select, it's possible to get:
> 0x10012, 0x10011, 0x10021.
>
> Then if a search for {0x101, 0x2} with cmp_u64() is done it wouldn't
> find anything because 0x10021 is deemed less than 0x10011 in this
> list.  That's why adding find_filter_index(..., cmp_u64) > 0 doesn't
> work to special case legacy events.
>
> Similarly, if sorted by cmp_u64() we get: 0x10021, 0x10012, 0x10011.
>
> Then a search with filter_event_cmp() for 0x10021 would fail as well.
>
> Possible workaround #1:
> -----------------------
>
> If we use the internal layout (removed index).
>
> struct kvm_pmu_filter_entry {
>         union {
>                 u64 raw;
>                 struct {
>                         u64 mask:8;
>                         u64 match:8; // doubles as umask, i.e. encode_filter_entry()
>                         u64 event_select:12;
>                         u64 exclude:1;
>                         u64 rsvd:35;
>                 };
>         };
> };
>
> Because the event_select is contiguous, all of its bits are above
> match, and the fields in kvm_pmu_filter_entry are ordered from most
> important to least important from a sort perspective, a sort with
> cmp_u64() gives us: 0x10210, 0x10120, 0x10110.
>
> This order will work for both legacy events and masked events.  So, as
> you suggested, if we wanted to add the following to
> is_gp_event_allowed() we can special case legacy events and early out:
>
>         if (!(filter->flags & KVM_PMU_EVENT_FLAG_MASKED_EVENTS))
>                 return find_filter_index(..., cmp_u64) > 0;
>
> Or we can leave it as is and the masked events path will come up with
> the same result.
>
> Possible workaround #2:
> -----------------------
>
> If we use the architectural layout for masked events, e.g.:
>
> struct kvm_pmu_event_filter_entry {
>         union {
>                 __u64 raw;
>                 struct {
>                         __u64 select_lo:8;
>                         __u64 match:8;
>                         __u64 rsvd1:16;
>                         __u64 select_hi:4;
>                         __u64 rsvd2:19;
>                         __u64 mask:8;
>                         __u64 exclude:1;
>                 };
>         };
> };
>
> This becomes more restrictive, but for our use case I believe it will
> work.  The problem with this layout in general is where the event
> select ends up.  You can't put anything below 'select_lo', and
> 'select_lo' and 'select_hi' straddle the unit mask, so you just have
> less control over the order things end up.  That said, if our goal is
> to just sort by exclude + event select for masked events we can do
> that with filter_event_cmp() on the layout above.  If we want to find
> a legacy event we can use cmp_u64() on the layout above as well.  The
> only caveat is you can't mix and match during lookup like you can in
> workaround #1.  You have to sort and search with the same comparer.
>
> If we convert legacy events to masked events I'm pretty sure we can
> still have a common code path.
>
> >
> > > > We might need separate logic for the existing non-masked mechanism, but again
> > > > that's only more code, IMO it's not more complex.  E.g. I believe that the legacy
> > > > case can be handled with a dedicated:
> > > >
> > > >         if (!(filter->flags & KVM_PMU_EVENT_FLAG_MASKED_EVENTS))
> > > >                 return find_filter_index(..., cmp_u64) > 0;
> > > >
> > >
> > > I'd rather not drop encoding, however, if we did, sorting and
> > > searching would both have to be special cased.  Masked events and
> > > non-masked events are not compatible with each other that way. If the
> > > list is sorted for masked events, you can't do a cmp_u64 search on it
> > > and expect to find anything.
> >
> > I'm not sure I follow.  As above, doesn't the search for masked vs. non-masked only
> > differ on the comparator?
> >
> > > I suppose you could if using the struct overlay and we included match (which
> > > doubles as the unit mask for non-masked events) in the comparer and dropped
> > > indexing.
> >
> > Why does "match" need to be in the comparer?  I thought the sub-lists are created
> > for each "eventsel"?
> >
> > > But that only works because the struct overlay keeps the event select
> > > contiguous.  The fact that the architectural format has the event select
> > > straddle the unit mask prevents that from working.
> > >
> > > > Oh, and as a bonus of splitting include vs. exclude, the legacy case effectively
> > > > optimizes exclude since the length of the exclude array will be '0'.
> > >
> > > This should apply to both implementations.
> >
> > If everything is in a single list, doesn't the exclude lookup need to search the
> > entire thing to determine that there are no entries?
>
> Yes.  To do what I'm suggesting would require creating an internal
> representation of the struct kvm_pmu_event_filter to track the include
> and exclude portions of the list.  But that should be fairly
> straightforward to set up when loading the filter, then we would be
> able to search the individual lists rather than the whole thing.
> Having said that, I'm not saying this is necessary, all I'm saying is
> that it wouldn't be hard to do.
>
> > ...
> >
> > > > >  struct kvm_pmu_filter_entry {
> > > > >       union {
> > > > >               u64 raw;
> > > > >               struct {
> > > > > +                     u64 mask:8;
> > > > > +                     u64 match:8;
> > > > > +                     u64 event_index:12;
> > > >
> > > > This is broken.  There are 2^12 possible event_select values, but event_index is
> > > > the index into the full list of events, i.e. is bounded only by nevents, and so
> > > > this needs to be stored as a 32-bit value.  E.g. if userspace creates a filter
> > > > with 2^32-2 entries for eventsel==0, then the index for eventsel==1 will be
> > > > 2^32-1 even though there are only two event_select values in the entire list.
> > > >
> > >
> > > nevents <= KVM_PMU_EVENT_FILTER_MAX_EVENTS (300), so there is plenty
> > > of space in event_index to cover any valid index.  Though a moot point
> > > if we cut it.
> >
> > Oooh.  In that case, won't 9 bits suffice?  Using 12 implies there's a connection
> > to eventsel, which is misleading and confusing.
>
> Yes, 9 would suffice.  I gave some buffer because I liked the idea of
> round numbers and I had a lot of space in the struct.  I didn't see it
> implying a connection to event_select, but in hindsight I can see why
> you may have thought that.
>
> >
> > Regardless, if we keep the indexing, there should be a BUILD_BUG_ON() to assert
> > that the "head" (event_index) field is larger enough to hold the max number of
> > events.  Not sure if there's a way to get the number of bits, i.e. might require
> > a separate #define to get the compile-time assert. :-/

Here's what I came up with.  Let me know if this is what you were thinking:

arch/x86/include/uapi/asm/kvm.h

#define KVM_PMU_EVENT_FLAG_MASKED_EVENTS BIT(0)
#define KVM_PMU_EVENT_FLAGS_VALID_MASK (KVM_PMU_EVENT_FLAG_MASKED_EVENTS)

/*
 * Masked event layout.
 * Bits   Description
 * ----   -----------
 * 7:0    event select (low bits)
 * 15:8   umask match
 * 31:16  unused
 * 35:32  event select (high bits)
 * 54:36  unused
 * 55     exclude bit
 * 63:56  umask mask
 */

#define KVM_PMU_ENCODE_MASKED_ENTRY(event_select, mask, match, exclude) \
        (((event_select) & 0xFFULL) | (((event_select) & 0xF00ULL) << 24) | \
        (((mask) & 0xFFULL) << 56) | \
        (((match) & 0xFFULL) << 8) | \
        ((__u64)(!!(exclude)) << 55))

#define KVM_PMU_MASKED_ENTRY_EVENT_SELECT \
        (GENMASK_ULL(7, 0) | GENMASK_ULL(35, 32))
#define KVM_PMU_MASKED_ENTRY_UMASK_MASK         (GENMASK_ULL(63, 56))
#define KVM_PMU_MASKED_ENTRY_UMASK_MATCH        (GENMASK_ULL(15, 8))
#define KVM_PMU_MASKED_ENTRY_EXCLUDE            (BIT_ULL(55))
#define KVM_PMU_MASKED_ENTRY_UMASK_MASK_SHIFT   (56)
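
A usage illustration (not part of the patch itself): an entry that
filters event select 0xd0 and matches any unit mask with bit 0 set,
i.e. a guest event matches when (guest_umask & 0x01) == 0x01, could be
encoded as:

        __u64 entry = KVM_PMU_ENCODE_MASKED_ENTRY(0xd0, 0x01, 0x01, false);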


arch/x86/include/asm/kvm_host.h

struct kvm_x86_pmu_event_filter {
        __u32 action;
        __u32 nevents;
        __u32 fixed_counter_bitmap;
        __u32 flags;
        __u32 nr_includes;
        __u32 nr_excludes;
        __u64 *includes;
        __u64 *excludes;
        __u64 events[];
};

- struct kvm_pmu_event_filter __rcu *pmu_event_filter;
+ struct kvm_x86_pmu_event_filter __rcu *pmu_event_filter;


arch/x86/kvm/pmu.c

static int filter_sort_cmp(const void *pa, const void *pb)
{
        u64 a = *(u64 *)pa & (KVM_PMU_MASKED_ENTRY_EVENT_SELECT |
                              KVM_PMU_MASKED_ENTRY_EXCLUDE);
        u64 b = *(u64 *)pb & (KVM_PMU_MASKED_ENTRY_EVENT_SELECT |
                              KVM_PMU_MASKED_ENTRY_EXCLUDE);

        return (a > b) - (a < b);
}

/*
 * For the event filter, searching is done on the 'includes' list and
 * 'excludes' list separately rather than on the 'events' list (which
 * has both).  As a result the exclude bit can be ignored.
 */
static int filter_event_cmp(const void *pa, const void *pb)
{
        u64 a = *(u64 *)pa & KVM_PMU_MASKED_ENTRY_EVENT_SELECT;
        u64 b = *(u64 *)pb & KVM_PMU_MASKED_ENTRY_EVENT_SELECT;

        return (a > b) - (a < b);
}

static int find_filter_index(u64 *events, u64 nevents, u64 key)
{
        u64 *fe = bsearch(&key, events, nevents, sizeof(events[0]),
                          filter_event_cmp);

        if (!fe)
                return -1;

        return fe - events;
}

static bool is_filter_entry_match(u64 filter_event, u64 umask)
{
        u64 mask = filter_event >> (KVM_PMU_MASKED_ENTRY_UMASK_MASK_SHIFT - 8);
        u64 match = filter_event & KVM_PMU_MASKED_ENTRY_UMASK_MATCH;

        BUILD_BUG_ON((KVM_PMU_ENCODE_MASKED_ENTRY(0, 0xff, 0, false) >>
                     (KVM_PMU_MASKED_ENTRY_UMASK_MASK_SHIFT - 8)) !=
                     ARCH_PERFMON_EVENTSEL_UMASK);

        return (umask & mask) == match;
}

static bool filter_contains_match(u64 *events, u64 nevents, u64 eventsel)
{
        u64 event_select = eventsel & kvm_pmu_ops.EVENTSEL_EVENT;
        u64 umask = eventsel & ARCH_PERFMON_EVENTSEL_UMASK;
        int i, index;

        index = find_filter_index(events, nevents, event_select);
        if (index < 0)
                return false;

        /*
         * Entries are sorted by the event select.  Walk the list in both
         * directions to process all entries with the targeted event select.
         */
        for (i = index; i < nevents; i++) {
                if (filter_event_cmp(&events[i], &event_select) != 0)
                        break;

                if (is_filter_entry_match(events[i], umask))
                        return true;
        }

        for (i = index - 1; i >= 0; i--) {
                if (filter_event_cmp(&events[i], &event_select) != 0)
                        break;

                if (is_filter_entry_match(events[i], umask))
                        return true;
        }

        return false;
}

static bool is_gp_event_allowed(struct kvm_x86_pmu_event_filter *filter,
                                u64 eventsel)
{
        if (filter_contains_match(filter->includes, filter->nr_includes,
                                  eventsel) &&
            !filter_contains_match(filter->excludes, filter->nr_excludes,
                                   eventsel))
                return filter->action == KVM_PMU_EVENT_ALLOW;

        return filter->action == KVM_PMU_EVENT_DENY;
}

< All the code above here is for filtering the guest eventsel. >
< All the code below here is for validating and loading the filter. >

static bool is_filter_valid(struct kvm_x86_pmu_event_filter *filter)
{
        u64 mask;
        int i;

        /* To maintain backwards compatibility only validate masked events. */
        if (filter->flags & KVM_PMU_EVENT_FLAG_MASKED_EVENTS) {
                mask = kvm_pmu_ops.EVENTSEL_EVENT |
                       KVM_PMU_MASKED_ENTRY_UMASK_MASK |
                       KVM_PMU_MASKED_ENTRY_UMASK_MATCH |
                       KVM_PMU_MASKED_ENTRY_EXCLUDE;

                for (i = 0; i < filter->nevents; i++) {
                        if (filter->events[i] & ~mask)
                                return false;
                }
        }

        return true;
}

static void prepare_filter_events(struct kvm_x86_pmu_event_filter *filter)
{
        int i, j;

        if (filter->flags & KVM_PMU_EVENT_FLAG_MASKED_EVENTS)
                return;

        for (i = 0, j = 0; i < filter->nevents; i++) {
                /*
                 * Skip events that are impossible to match against a guest
                 * event.  When filtering, only the event select + unit mask
                 * of the guest event is used.
                 */
                if (filter->events[i] & ~(kvm_pmu_ops.EVENTSEL_EVENT |
                                          ARCH_PERFMON_EVENTSEL_UMASK))
                        continue;

                /*
                 * Convert userspace events to a common in-kernel event so
                 * only one code path is needed to support both events.  For
                 * the in-kernel events use masked events because they are
                 * flexible enough to handle both cases.  To convert to masked
                 * events all that's needed is to add the umask_mask.
                 */
                filter->events[j++] =
                        filter->events[i] |
                        (0xFFULL << KVM_PMU_MASKED_ENTRY_UMASK_MASK_SHIFT);
        }

        filter->nevents = j;
}
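
To make the conversion concrete (example values only):

        /*
         * Legacy entry 0xc0 (event select 0xc0, unit mask 0x00) becomes
         * 0xff000000000000c0: umask_mask = 0xff, umask_match = 0x00, so
         * it only matches a guest unit mask of exactly 0x00.
         */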

static void setup_filter_lists(struct kvm_x86_pmu_event_filter *filter)
{
        int i;

        for (i = 0; i < filter->nevents; i++) {
                if(filter->events[i] & KVM_PMU_MASKED_ENTRY_EXCLUDE)
                        break;
        }

        filter->nr_includes = i;
        filter->nr_excludes = filter->nevents - filter->nr_includes;
        filter->includes = filter->events;
        filter->excludes = filter->events + filter->nr_includes;
}
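
As an illustration of the resulting layout (example values only), with
three include entries and one exclude entry after sorting:

        /*
         *   events      = [ incl0, incl1, incl2, excl0 ]
         *   includes    = events      (nr_includes = 3)
         *   excludes    = events + 3  (nr_excludes = 1)
         */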

 int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
 {
-       struct kvm_pmu_event_filter tmp, *filter;
+       struct kvm_pmu_event_filter __user *user_filter = argp;
+       struct kvm_x86_pmu_event_filter *filter;
+       struct kvm_pmu_event_filter tmp;
        size_t size;
        int r;

-       if (copy_from_user(&tmp, argp, sizeof(tmp)))
+       if (copy_from_user(&tmp, user_filter, sizeof(tmp)))
                return -EFAULT;

        if (tmp.action != KVM_PMU_EVENT_ALLOW &&
            tmp.action != KVM_PMU_EVENT_DENY)
                return -EINVAL;

-       if (tmp.flags != 0)
+       if (tmp.flags & ~KVM_PMU_EVENT_FLAGS_VALID_MASK)
                return -EINVAL;

        if (tmp.nevents > KVM_PMU_EVENT_FILTER_MAX_EVENTS)
                return -E2BIG;

        size = struct_size(filter, events, tmp.nevents);
-       filter = kmalloc(size, GFP_KERNEL_ACCOUNT);
+       filter = kzalloc(size, GFP_KERNEL_ACCOUNT);
        if (!filter)
                return -ENOMEM;

+       filter->action = tmp.action;
+       filter->nevents = tmp.nevents;
+       filter->fixed_counter_bitmap = tmp.fixed_counter_bitmap;
+       filter->flags = tmp.flags;
+
        r = -EFAULT;
-       if (copy_from_user(filter, argp, size))
+       if (copy_from_user(filter->events, user_filter->events,
+                          sizeof(filter->events[0]) * filter->nevents))
                goto cleanup;

-       /* Restore the verified state to guard against TOCTOU attacks. */
-       *filter = tmp;
+       r = -EINVAL;
+       if (!is_filter_valid(filter))
+               goto cleanup;

-       remove_impossible_events(filter);
+       prepare_filter_events(filter);

        /*
-        * Sort the in-kernel list so that we can search it with bsearch.
+        * Sort entries by event select so that all entries for a given
+        * event select can be processed efficiently during filtering.
         */
-       sort(&filter->events, filter->nevents, sizeof(__u64), cmp_u64, NULL);
+       sort(&filter->events, filter->nevents, sizeof(filter->events[0]),
+            filter_sort_cmp, NULL);
+
+       setup_filter_lists(filter);

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v5 4/7] kvm: x86/pmu: Introduce masked events to the pmu event filter
  2022-10-21  1:33           ` Aaron Lewis
@ 2022-10-21 17:50             ` Sean Christopherson
  0 siblings, 0 replies; 23+ messages in thread
From: Sean Christopherson @ 2022-10-21 17:50 UTC (permalink / raw)
  To: Aaron Lewis; +Cc: kvm, pbonzini, jmattson

On Fri, Oct 21, 2022, Aaron Lewis wrote:
> Here's what I came up with.  Let me know if this is what you were thinking:

Purely mechanical suggestions, but overall looks awesome!

> static int filter_sort_cmp(const void *pa, const void *pb)
> {
>         u64 a = *(u64 *)pa & (KVM_PMU_MASKED_ENTRY_EVENT_SELECT |
>                               KVM_PMU_MASKED_ENTRY_EXCLUDE);
>         u64 b = *(u64 *)pb & (KVM_PMU_MASKED_ENTRY_EVENT_SELECT |
>                               KVM_PMU_MASKED_ENTRY_EXCLUDE);
> 
>         return (a > b) - (a < b);
> }
> 
> /*
>  * For the event filter, searching is done on the 'includes' list and
>  * 'excludes' list separately rather than on the 'events' list (which
>  * has both).  As a result the exclude bit can be ignored.
>  */
> static int filter_event_cmp(const void *pa, const void *pb)
> {
>         u64 a = *(u64 *)pa & KVM_PMU_MASKED_ENTRY_EVENT_SELECT;
>         u64 b = *(u64 *)pb & KVM_PMU_MASKED_ENTRY_EVENT_SELECT;
> 
>         return (a > b) - (a < b);
> }


To dedup code slightly and make this a little more readable, what about adding a
common helper to do the compare?  That also makes it quite obvious that the only
difference is the inclusion (heh) of the EXCLUDE flag.

static int filter_cmp(const void *pa, const void *pb, u64 mask)
{
        u64 a = *(u64 *)pa & mask;
        u64 b = *(u64 *)pb & mask;

        return (a > b) - (a < b);
}

static int filter_sort_cmp(const void *pa, const void *pb)
{
        return filter_cmp(pa, pb, (KVM_PMU_MASKED_ENTRY_EVENT_SELECT |
                                   KVM_PMU_MASKED_ENTRY_EXCLUDE));
}

/*
 * For the event filter, searching is done on the 'includes' list and
 * 'excludes' list separately rather than on the 'events' list (which
 * has both).  As a result the exclude bit can be ignored.
 */
static int filter_event_cmp(const void *pa, const void *pb)
{
        return filter_cmp(pa, pb, KVM_PMU_MASKED_ENTRY_EVENT_SELECT);
}

> static bool filter_contains_match(u64 *events, u64 nevents, u64 eventsel)
> {
>         u64 event_select = eventsel & kvm_pmu_ops.EVENTSEL_EVENT;
>         u64 umask = eventsel & ARCH_PERFMON_EVENTSEL_UMASK;
>         int i, index;
> 
>         index = find_filter_index(events, nevents, event_select);
>         if (index < 0)
>                 return false;
> 
>         /*
>          * Entries are sorted by the event select.  Walk the list in both
>          * directions to process all entries with the targeted event select.
>          */
>         for (i = index; i < nevents; i++) {
>                 if (filter_event_cmp(&events[i], &event_select) != 0)

Preferred kernel style is to omit comparisons against zero, i.e. just

		if (filter_event_cmp(&events[i], &event_select))
			break;

>                         break;
> 
>                 if (is_filter_entry_match(events[i], umask))
>                         return true;
>         }
> 
>         for (i = index - 1; i >= 0; i--) {
>                 if (filter_event_cmp(&events[i], &event_select) != 0)
>                         break;
> 
>                 if (is_filter_entry_match(events[i], umask))
>                         return true;
>         }
> 
>         return false;
> }
> 
> static bool is_gp_event_allowed(struct kvm_x86_pmu_event_filter *filter,
>                                 u64 eventsel)
> {
>         if (filter_contains_match(filter->includes, filter->nr_includes,
>                                   eventsel) &&
>             !filter_contains_match(filter->excludes, filter->nr_excludes,
>                                    eventsel))
>                 return filter->action == KVM_PMU_EVENT_ALLOW;
> 
>         return filter->action == KVM_PMU_EVENT_DENY;

Might be worth using a single char for the filter param, e.g. 'f' yields:

static bool is_gp_event_allowed(struct kvm_x86_pmu_event_filter *f,
                                u64 eventsel)
{
        if (filter_contains_match(f->includes, f->nr_includes, eventsel) &&
            !filter_contains_match(f->excludes, f->nr_excludes, eventsel))
                return f->action == KVM_PMU_EVENT_ALLOW;

        return f->action == KVM_PMU_EVENT_DENY;
}

> static void setup_filter_lists(struct kvm_x86_pmu_event_filter *filter)
> {
>         int i;
> 
>         for (i = 0; i < filter->nevents; i++) {
>                 if(filter->events[i] & KVM_PMU_MASKED_ENTRY_EXCLUDE)

Space after the if

		if (filter-> ...)

>                         break;
>         }
> 
>         filter->nr_includes = i;
>         filter->nr_excludes = filter->nevents - filter->nr_includes;
>         filter->includes = filter->events;
>         filter->excludes = filter->events + filter->nr_includes;
> }
> 

...

> static void prepare_filter_events(struct kvm_x86_pmu_event_filter *filter)
> {
>         int i, j;
> 
>         if (filter->flags & KVM_PMU_EVENT_FLAG_MASKED_EVENTS)
>                 return;
> 
>         for (i = 0, j = 0; i < filter->nevents; i++) {
>                 /*
>                  * Skip events that are impossible to match against a guest
>                  * event.  When filtering, only the event select + unit mask
>                  * of the guest event is used.

This is a good place for calling out that impossible filters can't be rejected
for backwards compatibility reasons.

>                  */
>                 if (filter->events[i] & ~(kvm_pmu_ops.EVENTSEL_EVENT |
>                                           ARCH_PERFMON_EVENTSEL_UMASK))
>                         continue;
> 
>                 /*
>                  * Convert userspace events to a common in-kernel event so
>                  * only one code path is needed to support both events.  For
>                  * the in-kernel events use masked events because they are
>                  * flexible enough to handle both cases.  To convert to masked
>                  * events all that's needed is to add the umask_mask.

I think it's worth calling out this creates an "all ones" umask_mask, and that
EXCLUDE isn't supported.

>                  */
>                 filter->events[j++] =
>                         filter->events[i] |
>                         (0xFFULL << KVM_PMU_MASKED_ENTRY_UMASK_MASK_SHIFT);
>         }
> 
>         filter->nevents = j;
> }

...

> -       /* Restore the verified state to guard against TOCTOU attacks. */
> -       *filter = tmp;
> +       r = -EINVAL;
> +       if (!is_filter_valid(filter))
> +               goto cleanup;
> 
> -       remove_impossible_events(filter);
> +       prepare_filter_events(filter);
> 
>         /*
> -        * Sort the in-kernel list so that we can search it with bsearch.
> +        * Sort entries by event select so that all entries for a given
> +        * event select can be processed efficiently during filtering.
>          */
> -       sort(&filter->events, filter->nevents, sizeof(__u64), cmp_u64, NULL);
> +       sort(&filter->events, filter->nevents, sizeof(filter->events[0]),
> +            filter_sort_cmp, NULL);
> +
> +       setup_filter_lists(filter);

The sort+setup should definitely go in a single helper.  Rather than have the
helpers deal with masked vs. legacy, what about putting that logic in a top-level
helper?  Then this code is simply:

	r = prepare_filter_lists(filter);
	if (r)
		goto cleanup;

And the helper names can be more explicit, i.e. can call out that they validate
a masked filter and convert to a masked filter.

E.g. (completely untested)

static bool is_masked_filter_valid(const struct kvm_x86_pmu_event_filter *filter)
{
        u64 mask = kvm_pmu_ops.EVENTSEL_EVENT |
                   KVM_PMU_MASKED_ENTRY_UMASK_MASK |
                   KVM_PMU_MASKED_ENTRY_UMASK_MATCH |
                   KVM_PMU_MASKED_ENTRY_EXCLUDE;
        int i;

        for (i = 0; i < filter->nevents; i++) {
                if (filter->events[i] & ~mask)
                        return false;
        }

        return true;
}

static void convert_to_masked_filter(struct kvm_x86_pmu_event_filter *filter)
{
        int i, j;

        for (i = 0, j = 0; i < filter->nevents; i++) {
                /*
                 * Skip events that are impossible to match against a guest
                 * event.  When filtering, only the event select + unit mask
                 * of the guest event is used.  To maintain backwards
                 * compatibility, impossible filters can't be rejected :-(
                 */
                if (filter->events[i] & ~(kvm_pmu_ops.EVENTSEL_EVENT |
                                          ARCH_PERFMON_EVENTSEL_UMASK))
                        continue;
                /*
                 * Convert userspace events to a common in-kernel event so
                 * only one code path is needed to support both events.  For
                 * the in-kernel events use masked events because they are
                 * flexible enough to handle both cases.  To convert to masked
                 * events all that's needed is to add an "all ones" umask_mask,
                 * (unmasked filter events don't support EXCLUDE).
                 */
                filter->events[j++] = filter->events[i] |
                                      (0xFFULL << KVM_PMU_MASKED_ENTRY_UMASK_MASK_SHIFT);
	}

        filter->nevents = j;
}

static int prepare_filter_lists(struct kvm_x86_pmu_event_filter *filter)
{
        int i;

        if (!(filter->flags & KVM_PMU_EVENT_FLAG_MASKED_EVENTS))
                convert_to_masked_filter(filter);
        else if (!is_masked_filter_valid(filter))
                return -EINVAL;

        /*
         * Sort entries by event select and includes vs. excludes so that all
         * entries for a given event select can be processed efficiently during
         * filtering.  The EXCLUDE flag uses a more significant bit than the
         * event select, and so the sorted list is also effectively split into
         * includes and excludes sub-lists.
         */
        sort(&filter->events, filter->nevents, sizeof(filter->events[0]),
             filter_sort_cmp, NULL);

        /*
         * Find the first EXCLUDE event (only supported for masked events).
         * For legacy filters every entry is an include, i.e. i == nevents.
         */
        i = filter->nevents;
        if (filter->flags & KVM_PMU_EVENT_FLAG_MASKED_EVENTS) {
                for (i = 0; i < filter->nevents; i++) {
                        if (filter->events[i] & KVM_PMU_MASKED_ENTRY_EXCLUDE)
                                break;
                }
        }

        filter->nr_includes = i;
        filter->nr_excludes = filter->nevents - filter->nr_includes;
        filter->includes = filter->events;
        filter->excludes = filter->events + filter->nr_includes;

        return 0;
}

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2022-10-21 17:51 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-20 17:45 [PATCH v5 0/7] Introduce and test masked events Aaron Lewis
2022-09-20 17:45 ` [PATCH v5 1/7] kvm: x86/pmu: Correct the mask used in a pmu event filter lookup Aaron Lewis
2022-10-07 23:25   ` Sean Christopherson
2022-10-15 15:43     ` Aaron Lewis
2022-10-17 18:20       ` Sean Christopherson
2022-10-21  1:33         ` Aaron Lewis
2022-09-20 17:45 ` [PATCH v5 2/7] kvm: x86/pmu: Remove invalid raw events from the pmu event filter Aaron Lewis
2022-10-07 23:48   ` Sean Christopherson
2022-09-20 17:45 ` [PATCH v5 3/7] kvm: x86/pmu: prepare the pmu event filter for masked events Aaron Lewis
2022-10-08  0:08   ` Sean Christopherson
2022-09-20 17:46 ` [PATCH v5 4/7] kvm: x86/pmu: Introduce masked events to the pmu event filter Aaron Lewis
2022-10-08  0:37   ` Sean Christopherson
2022-10-10 15:42   ` Sean Christopherson
2022-10-15 15:43     ` Aaron Lewis
2022-10-17 17:47       ` Sean Christopherson
2022-10-19  1:06         ` Aaron Lewis
2022-10-21  1:33           ` Aaron Lewis
2022-10-21 17:50             ` Sean Christopherson
2022-09-20 17:46 ` [PATCH v5 5/7] selftests: kvm/x86: Add flags when creating a " Aaron Lewis
2022-09-20 17:46 ` [PATCH v5 6/7] selftests: kvm/x86: Add testing for KVM_SET_PMU_EVENT_FILTER Aaron Lewis
2022-09-20 17:56   ` Jim Mattson
2022-09-20 17:46 ` [PATCH v5 7/7] selftests: kvm/x86: Test masked events Aaron Lewis
2022-09-20 17:59   ` Jim Mattson
