* [PATCH V4 0/2] measure SMI cost (kernel)
@ 2017-03-29  1:45 kan.liang
  2017-03-29  1:45 ` [PATCH V4 1/2] x86/msr: expose msr_flip_bit function kan.liang
  2017-03-29  1:45 ` [PATCH V4 2/2] perf/x86: add sysfs entry to freeze counter on SMI kan.liang
  0 siblings, 2 replies; 8+ messages in thread
From: kan.liang @ 2017-03-29  1:45 UTC (permalink / raw)
  To: peterz, tglx, mingo, linux-kernel; +Cc: bp, acme, eranian, jolsa, ak, Kan Liang

From: Kan Liang <Kan.liang@intel.com>

Currently, there is no way for perf to measure the time spent in System
Management Mode (SMM).

Intel perfmon supports the FREEZE_WHILE_SMM bit in IA32_DEBUGCTL. Once set,
the PMU core counters freeze while the SMI handler runs. Free-running
counters, e.g. the APERF counter, are not affected, so the cost of SMIs
can be measured as (aperf - cycles).

A new sysfs entry /sys/devices/cpu/freeze_on_smi is introduced to set the
FREEZE_WHILE_SMM bit in IA32_DEBUGCTL.

A new --smi-cost mode in perf stat is implemented to measure the SMI cost
from the cycles and aperf results. In practice, the percentage of SMI
cycles is more useful than the absolute value, so the output shows the
percentage of SMI cycles along with the SMI#.
Users who want the actual cycle counts can pass --no-metric-only.

Here is an example output.

 Performance counter stats for 'sudo echo ':

SMI cycles%          SMI#
    0.1%              1

       0.010858678 seconds time elapsed

Changes since V1:
 - Only include kernel patch
 - New functions to set an MSR bit on one CPU or on all CPUs, used to
   replace rdmsrl_on_cpu and wrmsrl_on_cpu.
   That avoids the extra IPIs and the atomicity issue.
 - Support hotplug

Changes since V2:
 - reuse msr_info

Changes since V3:
 - Add hotplug protection
 - Thanks to Thomas's suggestion, the msr_flip_bit interface replaces
   the msr_set/clear_on_cpu(s) interfaces.
 - Serialize the entire setting of freeze_on_smi

Kan Liang (2):
  x86/msr: expose msr_flip_bit function
  perf/x86: add sysfs entry to freeze counter on SMI

 arch/x86/events/core.c           | 10 +++++++
 arch/x86/events/intel/core.c     | 60 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/events/perf_event.h     |  3 ++
 arch/x86/include/asm/msr-index.h |  2 ++
 arch/x86/include/asm/msr.h       |  1 +
 arch/x86/lib/msr.c               |  7 +++--
 6 files changed, 80 insertions(+), 3 deletions(-)

-- 
2.7.4


* [PATCH V4 1/2] x86/msr: expose msr_flip_bit function
  2017-03-29  1:45 [PATCH V4 0/2] measure SMI cost (kernel) kan.liang
@ 2017-03-29  1:45 ` kan.liang
  2017-03-29  5:47   ` Thomas Gleixner
  2017-03-29  1:45 ` [PATCH V4 2/2] perf/x86: add sysfs entry to freeze counter on SMI kan.liang
  1 sibling, 1 reply; 8+ messages in thread
From: kan.liang @ 2017-03-29  1:45 UTC (permalink / raw)
  To: peterz, tglx, mingo, linux-kernel; +Cc: bp, acme, eranian, jolsa, ak, Kan Liang

From: Kan Liang <Kan.liang@intel.com>

There is no exported kernel interface which can flip an MSR bit. A caller
has to do a read-modify-write operation on the MSR through the rd/wrmsr*
interfaces, but that method is not atomic.

__flip_bit already provides this; just rename and expose it.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kan Liang <Kan.liang@intel.com>
---
 arch/x86/include/asm/msr.h | 1 +
 arch/x86/lib/msr.c         | 7 ++++---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 898dba2..c1e3026 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -312,6 +312,7 @@ struct msr *msrs_alloc(void);
 void msrs_free(struct msr *msrs);
 int msr_set_bit(u32 msr, u8 bit);
 int msr_clear_bit(u32 msr, u8 bit);
+int msr_flip_bit(u32 msr, u8 bit, bool set);
 
 #ifdef CONFIG_SMP
 int rdmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h);
diff --git a/arch/x86/lib/msr.c b/arch/x86/lib/msr.c
index 0776425..9bda539 100644
--- a/arch/x86/lib/msr.c
+++ b/arch/x86/lib/msr.c
@@ -58,7 +58,7 @@ int msr_write(u32 msr, struct msr *m)
 	return wrmsrl_safe(msr, m->q);
 }
 
-static inline int __flip_bit(u32 msr, u8 bit, bool set)
+int msr_flip_bit(u32 msr, u8 bit, bool set)
 {
 	struct msr m, m1;
 	int err = -EINVAL;
@@ -85,6 +85,7 @@ static inline int __flip_bit(u32 msr, u8 bit, bool set)
 
 	return 1;
 }
+EXPORT_SYMBOL_GPL(msr_flip_bit);
 
 /**
  * Set @bit in a MSR @msr.
@@ -96,7 +97,7 @@ static inline int __flip_bit(u32 msr, u8 bit, bool set)
  */
 int msr_set_bit(u32 msr, u8 bit)
 {
-	return __flip_bit(msr, bit, true);
+	return msr_flip_bit(msr, bit, true);
 }
 
 /**
@@ -109,7 +110,7 @@ int msr_set_bit(u32 msr, u8 bit)
  */
 int msr_clear_bit(u32 msr, u8 bit)
 {
-	return __flip_bit(msr, bit, false);
+	return msr_flip_bit(msr, bit, false);
 }
 
 #ifdef CONFIG_TRACEPOINTS
-- 
2.7.4


* [PATCH V4 2/2] perf/x86: add sysfs entry to freeze counter on SMI
  2017-03-29  1:45 [PATCH V4 0/2] measure SMI cost (kernel) kan.liang
  2017-03-29  1:45 ` [PATCH V4 1/2] x86/msr: expose msr_flip_bit function kan.liang
@ 2017-03-29  1:45 ` kan.liang
  2017-03-29  4:18   ` Andi Kleen
                     ` (2 more replies)
  1 sibling, 3 replies; 8+ messages in thread
From: kan.liang @ 2017-03-29  1:45 UTC (permalink / raw)
  To: peterz, tglx, mingo, linux-kernel; +Cc: bp, acme, eranian, jolsa, ak, Kan Liang

From: Kan Liang <Kan.liang@intel.com>

Currently, SMIs are visible to all performance counters, because many
users want to measure everything, including SMIs. But in some cases the
SMI cycles should not be counted, for example when calculating the cost
of the SMIs themselves. So a knob is needed.

Setting the FREEZE_WHILE_SMM bit in IA32_DEBUGCTL affects all performance
counters; there is no way to freeze an individual counter on SMI. So the
per-event interfaces (e.g. ioctl or event attributes) are not suitable
for setting the FREEZE_WHILE_SMM bit.

Add a sysfs entry /sys/devices/cpu/freeze_on_smi to set the
FREEZE_WHILE_SMM bit in IA32_DEBUGCTL. When set, it freezes perfmon and
trace messages while in SMM.
The value has to be 0 or 1 and is applied to all online CPUs.
Also serialize the entire setting so multiple concurrent threads cannot
try to update it to different values.

Signed-off-by: Kan Liang <Kan.liang@intel.com>
---
 arch/x86/events/core.c           | 10 +++++++
 arch/x86/events/intel/core.c     | 60 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/events/perf_event.h     |  3 ++
 arch/x86/include/asm/msr-index.h |  2 ++
 4 files changed, 75 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 349d4d1..c16fb50 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1750,6 +1750,8 @@ ssize_t x86_event_sysfs_show(char *page, u64 config, u64 event)
 	return ret;
 }
 
+static struct attribute_group x86_pmu_attr_group;
+
 static int __init init_hw_perf_events(void)
 {
 	struct x86_pmu_quirk *quirk;
@@ -1813,6 +1815,14 @@ static int __init init_hw_perf_events(void)
 			x86_pmu_events_group.attrs = tmp;
 	}
 
+	if (x86_pmu.attrs) {
+		struct attribute **tmp;
+
+		tmp = merge_attr(x86_pmu_attr_group.attrs, x86_pmu.attrs);
+		if (!WARN_ON(!tmp))
+			x86_pmu_attr_group.attrs = tmp;
+	}
+
 	pr_info("... version:                %d\n",     x86_pmu.version);
 	pr_info("... bit width:              %d\n",     x86_pmu.cntval_bits);
 	pr_info("... generic registers:      %d\n",     x86_pmu.num_counters);
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 4244bed..f1ed7cb 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3160,6 +3160,13 @@ static int intel_pmu_cpu_prepare(int cpu)
 	return -ENOMEM;
 }
 
+static void flip_smm_bit(void *data)
+{
+	int val = *(int *)data;
+
+	msr_flip_bit(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_FREEZE_WHILE_SMM_BIT, (bool)val);
+}
+
 static void intel_pmu_cpu_starting(int cpu)
 {
 	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
@@ -3174,6 +3181,8 @@ static void intel_pmu_cpu_starting(int cpu)
 
 	cpuc->lbr_sel = NULL;
 
+	flip_smm_bit(&x86_pmu.attr_freeze_on_smi);
+
 	if (!cpuc->shared_regs)
 		return;
 
@@ -3595,6 +3604,55 @@ static struct attribute *hsw_events_attrs[] = {
 	NULL
 };
 
+static ssize_t freeze_on_smi_show(struct device *cdev,
+				  struct device_attribute *attr,
+				  char *buf)
+{
+	return sprintf(buf, "%d\n", x86_pmu.attr_freeze_on_smi);
+}
+
+static DEFINE_MUTEX(freeze_on_smi_mutex);
+
+static ssize_t freeze_on_smi_store(struct device *cdev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t count)
+{
+	unsigned long val;
+	ssize_t ret;
+
+	ret = kstrtoul(buf, 0, &val);
+	if (ret)
+		return ret;
+
+	if (val > 1)
+		return -EINVAL;
+
+	if (x86_pmu.attr_freeze_on_smi == val)
+		return count;
+
+	mutex_lock(&freeze_on_smi_mutex);
+
+	get_online_cpus();
+
+	flip_smm_bit(&val);
+	smp_call_function(flip_smm_bit, &val, 1);
+
+	put_online_cpus();
+
+	x86_pmu.attr_freeze_on_smi = val;
+
+	mutex_unlock(&freeze_on_smi_mutex);
+
+	return count;
+}
+
+static DEVICE_ATTR_RW(freeze_on_smi);
+
+static struct attribute *intel_pmu_attrs[] = {
+	&dev_attr_freeze_on_smi.attr,
+	NULL,
+};
+
 __init int intel_pmu_init(void)
 {
 	union cpuid10_edx edx;
@@ -3641,6 +3699,8 @@ __init int intel_pmu_init(void)
 
 	x86_pmu.max_pebs_events		= min_t(unsigned, MAX_PEBS_EVENTS, x86_pmu.num_counters);
 
+
+	x86_pmu.attrs			= intel_pmu_attrs;
 	/*
 	 * Quirk: v2 perfmon does not report fixed-purpose events, so
 	 * assume at least 3 events, when not running in a hypervisor:
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index bcbb1d2..110cb9b0 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -561,6 +561,9 @@ struct x86_pmu {
 	ssize_t		(*events_sysfs_show)(char *page, u64 config);
 	struct attribute **cpu_events;
 
+	int		attr_freeze_on_smi;
+	struct attribute **attrs;
+
 	/*
 	 * CPU Hotplug hooks
 	 */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index d8b5f8a..bdb00fa 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -134,6 +134,8 @@
 #define DEBUGCTLMSR_BTS_OFF_OS		(1UL <<  9)
 #define DEBUGCTLMSR_BTS_OFF_USR		(1UL << 10)
 #define DEBUGCTLMSR_FREEZE_LBRS_ON_PMI	(1UL << 11)
+#define DEBUGCTLMSR_FREEZE_WHILE_SMM_BIT	14
+#define DEBUGCTLMSR_FREEZE_WHILE_SMM	(1UL << DEBUGCTLMSR_FREEZE_WHILE_SMM_BIT)
 
 #define MSR_PEBS_FRONTEND		0x000003f7
 
-- 
2.7.4


* Re: [PATCH V4 2/2] perf/x86: add sysfs entry to freeze counter on SMI
  2017-03-29  1:45 ` [PATCH V4 2/2] perf/x86: add sysfs entry to freeze counter on SMI kan.liang
@ 2017-03-29  4:18   ` Andi Kleen
  2017-03-29  6:00   ` Thomas Gleixner
  2017-03-29  6:24   ` Ingo Molnar
  2 siblings, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2017-03-29  4:18 UTC (permalink / raw)
  To: kan.liang; +Cc: peterz, tglx, mingo, linux-kernel, bp, acme, eranian, jolsa

> +	if (x86_pmu.attr_freeze_on_smi == val)
> +		return count;
> +
> +	mutex_lock(&freeze_on_smi_mutex);

I don't think the mutex is really needed, but if it were, it would
need to include the previous check too to be atomic.

> +
> +	get_online_cpus();
> +
> +	flip_smm_bit(&val);
> +	smp_call_function(flip_smm_bit, &val, 1);
> +
> +	put_online_cpus();

This is on_each_cpu() 

> +
> +	x86_pmu.attr_freeze_on_smi = val;
> +
> +	mutex_unlock(&freeze_on_smi_mutex);


-Andi


* Re: [PATCH V4 1/2] x86/msr: expose msr_flip_bit function
  2017-03-29  1:45 ` [PATCH V4 1/2] x86/msr: expose msr_flip_bit function kan.liang
@ 2017-03-29  5:47   ` Thomas Gleixner
  2017-03-29 13:01     ` Liang, Kan
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2017-03-29  5:47 UTC (permalink / raw)
  To: Kan Liang; +Cc: peterz, mingo, linux-kernel, bp, acme, eranian, jolsa, ak

On Tue, 28 Mar 2017, kan.liang@intel.com wrote:
> From: Kan Liang <Kan.liang@intel.com>
> 
> There is no exported kernel interfaces which can flip a MSR bit. It has
> to do read-modify-write operation on the MSR through rd/wrmsr*
> interfaces. But the method is not atomic.
> 
> There is already __flip_bit support. Just rename and expose it.

This function is not atomic either. Protection has to be provided by the
caller.

> -static inline int __flip_bit(u32 msr, u8 bit, bool set)
> +int msr_flip_bit(u32 msr, u8 bit, bool set)
>  {
>  	struct msr m, m1;
>  	int err = -EINVAL;
> @@ -85,6 +85,7 @@ static inline int __flip_bit(u32 msr, u8 bit, bool set)
>  
>  	return 1;
>  }
> +EXPORT_SYMBOL_GPL(msr_flip_bit);

That export is not required. The call site is always built in.

Thanks,

	tglx


* Re: [PATCH V4 2/2] perf/x86: add sysfs entry to freeze counter on SMI
  2017-03-29  1:45 ` [PATCH V4 2/2] perf/x86: add sysfs entry to freeze counter on SMI kan.liang
  2017-03-29  4:18   ` Andi Kleen
@ 2017-03-29  6:00   ` Thomas Gleixner
  2017-03-29  6:24   ` Ingo Molnar
  2 siblings, 0 replies; 8+ messages in thread
From: Thomas Gleixner @ 2017-03-29  6:00 UTC (permalink / raw)
  To: Kan Liang; +Cc: peterz, mingo, linux-kernel, bp, acme, eranian, jolsa, ak

On Tue, 28 Mar 2017, kan.liang@intel.com wrote:
> +static void flip_smm_bit(void *data)
> +{
> +	int val = *(int *)data;
> +
> +	msr_flip_bit(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_FREEZE_WHILE_SMM_BIT, (bool)val);

I asked you before to use line breaks for lines over 80 chars. Is it that
hard?

> +static DEFINE_MUTEX(freeze_on_smi_mutex);
> +
> +static ssize_t freeze_on_smi_store(struct device *cdev,
> +				   struct device_attribute *attr,
> +				   const char *buf, size_t count)
> +{
> +	unsigned long val;
> +	ssize_t ret;
> +
> +	ret = kstrtoul(buf, 0, &val);
> +	if (ret)
> +		return ret;
> +
> +	if (val > 1)
> +		return -EINVAL;
> +
> +	if (x86_pmu.attr_freeze_on_smi == val)
> +		return count;
> +
> +	mutex_lock(&freeze_on_smi_mutex);

This wants to protect the check above as well.

> +
> +	get_online_cpus();
> +
> +	flip_smm_bit(&val);

Sigh. This still is racy against preemption and interrupts.

> +	smp_call_function(flip_smm_bit, &val, 1);

Yes, I had smp_call_function() in my example, but I'd expected that you
figure out yourself to use on_each_cpu(), which calls the function on the
calling cpu with interrupts disabled. 

> +	put_online_cpus();
> +
> +	x86_pmu.attr_freeze_on_smi = val;

Crap. So a CPU coming online between put_online_cpus() and the store will
not see it.
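
Taken together, an untested sketch of a store path that addresses all
three points (mutex taken before the early-exit check, the value
published before the cross-CPU calls so a CPU coming online sees it,
and on_each_cpu() instead of smp_call_function()) could look like:

```c
static ssize_t freeze_on_smi_store(struct device *cdev,
				   struct device_attribute *attr,
				   const char *buf, size_t count)
{
	unsigned long val;
	ssize_t ret;

	ret = kstrtoul(buf, 0, &val);
	if (ret)
		return ret;

	if (val > 1)
		return -EINVAL;

	/* take the mutex first so the early-exit check is covered too */
	mutex_lock(&freeze_on_smi_mutex);

	if (x86_pmu.attr_freeze_on_smi == val)
		goto done;

	/* publish before the IPIs so a CPU coming online sees the value */
	x86_pmu.attr_freeze_on_smi = val;

	get_online_cpus();
	/* runs on every CPU, including this one, with interrupts disabled */
	on_each_cpu(flip_smm_bit, &val, 1);
	put_online_cpus();
done:
	mutex_unlock(&freeze_on_smi_mutex);
	return count;
}
```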

Sigh,

	tglx


* Re: [PATCH V4 2/2] perf/x86: add sysfs entry to freeze counter on SMI
  2017-03-29  1:45 ` [PATCH V4 2/2] perf/x86: add sysfs entry to freeze counter on SMI kan.liang
  2017-03-29  4:18   ` Andi Kleen
  2017-03-29  6:00   ` Thomas Gleixner
@ 2017-03-29  6:24   ` Ingo Molnar
  2 siblings, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2017-03-29  6:24 UTC (permalink / raw)
  To: kan.liang; +Cc: peterz, tglx, mingo, linux-kernel, bp, acme, eranian, jolsa, ak


* kan.liang@intel.com <kan.liang@intel.com> wrote:

> +static void flip_smm_bit(void *data)
> +{
> +	int val = *(int *)data;
> +
> +	msr_flip_bit(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_FREEZE_WHILE_SMM_BIT, (bool)val);
> +}

BTW., you can probably shorten that and remove a type cast by using a more natural 
type for 'val':

static void flip_smm_bit(void *data)
{
	bool set = *(int *)data;

	msr_flip_bit(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_FREEZE_WHILE_SMM_BIT, set);
}

Also note that 'set' is the more natural local variable name here as well, as it 
matches the parameter name of the msr_flip_bit() function.

BTW. #2, "MSR_IA32_DEBUGCTLMSR" is a bit of a misnomer, why is 'MSR' mentioned 
twice? If it was 'MSR_IA32_DEBUGCTL' then it could all be:

	msr_flip_bit(MSR_IA32_DEBUGCTL, DEBUGCTL_FREEZE_WHILE_SMM_BIT, set);

plus if 'FREEZE_WHILE_SMM' is renamed to 'FREEZE_IN_SMM', we'd have:

	msr_flip_bit(MSR_IA32_DEBUGCTL, DEBUGCTL_FREEZE_IN_SMM_BIT, set);

... which, incidentally, fits into 80 cols nicely.

But that's unrelated to your patch, you just made canonical use of the existing 
nomenclature.

Thanks,

	Ingo


* RE: [PATCH V4 1/2] x86/msr: expose msr_flip_bit function
  2017-03-29  5:47   ` Thomas Gleixner
@ 2017-03-29 13:01     ` Liang, Kan
  0 siblings, 0 replies; 8+ messages in thread
From: Liang, Kan @ 2017-03-29 13:01 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: peterz, mingo, linux-kernel, bp, acme, eranian, jolsa, ak



> > -static inline int __flip_bit(u32 msr, u8 bit, bool set)
> > +int msr_flip_bit(u32 msr, u8 bit, bool set)
> >  {
> >  	struct msr m, m1;
> >  	int err = -EINVAL;
> > @@ -85,6 +85,7 @@ static inline int __flip_bit(u32 msr, u8 bit, bool
> > set)
> >
> >  	return 1;
> >  }
> > +EXPORT_SYMBOL_GPL(msr_flip_bit);
> 
> That export is not required. The call site is always built in.
> 

If so, msr_set_bit/msr_clear_bit should be enough for our requirement.
msr_flip_bit is just a duplicate interface.
I think I will drop this patch.

Thanks,
Kan

