All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86/perf: Default freeze_on_smi on for Comet Lake and later.
@ 2022-01-22  7:26 Kyle Huey
  2022-01-24 12:21 ` Peter Zijlstra
  2022-01-27  2:22 ` Andrew Cooper
  0 siblings, 2 replies; 14+ messages in thread
From: Kyle Huey @ 2022-01-22  7:26 UTC (permalink / raw)
  To: linux-kernel, Kan Liang
  Cc: linux-perf-users, H. Peter Anvin, x86, Dave Hansen,
	Borislav Petkov, Thomas Gleixner, Namhyung Kim, Jiri Olsa,
	Alexander Shishkin, Mark Rutland, Arnaldo Carvalho de Melo,
	Ingo Molnar, Peter Zijlstra, Robert O'Callahan, Keno Fischer

Beginning in Comet Lake, Intel extended the concept of privilege rings to
SMM.[0] A side effect of this is that events caused by execution of code
in SMM are now visible to performance counters with IA32_PERFEVTSELx.USR
set.

rr[1] depends on exact counts of performance events for the user space
tracee, so this change in behavior is fatal for us. It is, however, easily
corrected by setting IA32_DEBUGCTL.FREEZE_WHILE_SMM to 1 (visible in sysfs
as /sys/devices/cpu/freeze_on_smi). While we can and will tell our users to
set freeze_on_smi manually when appropriate, because observing events in
SMM is rarely useful to anyone, we propose to change the default value of
this switch.

In this patch I have assumed that all non-Atom Intel microarchitectures
starting with Comet Lake behave like this but it would be good for someone
at Intel to verify that.

[0] See the Intel white paper "Trustworthy SMM on the Intel vPro Platform"
at https://bugzilla.kernel.org/attachment.cgi?id=300300, particularly the
end of page 5.

[1] https://rr-project.org/

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
---
 arch/x86/events/intel/core.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index fd9f908debe5..9604e19c8761 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -6094,6 +6094,11 @@ __init int intel_pmu_init(void)
 			x86_pmu.commit_scheduling = intel_tfa_commit_scheduling;
 		}
 
+		if (boot_cpu_data.x86_model == INTEL_FAM6_COMETLAKE_L ||
+		    boot_cpu_data.x86_model == INTEL_FAM6_COMETLAKE) {
+			x86_pmu.attr_freeze_on_smi = 1;
+		}
+
 		pr_cont("Skylake events, ");
 		name = "skylake";
 		break;
@@ -6135,6 +6140,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.num_topdown_events = 4;
 		x86_pmu.update_topdown_event = icl_update_topdown_event;
 		x86_pmu.set_topdown_event_period = icl_set_topdown_event_period;
+		x86_pmu.attr_freeze_on_smi = 1;
 		pr_cont("Icelake events, ");
 		name = "icelake";
 		break;
@@ -6172,6 +6178,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.num_topdown_events = 8;
 		x86_pmu.update_topdown_event = icl_update_topdown_event;
 		x86_pmu.set_topdown_event_period = icl_set_topdown_event_period;
+		x86_pmu.attr_freeze_on_smi = 1;
 		pr_cont("Sapphire Rapids events, ");
 		name = "sapphire_rapids";
 		break;
@@ -6217,6 +6224,7 @@ __init int intel_pmu_init(void)
 		 * x86_pmu.rtm_abort_event.
 		 */
 		x86_pmu.rtm_abort_event = X86_CONFIG(.event=0xc9, .umask=0x04);
+		x86_pmu.attr_freeze_on_smi = 1;
 
 		td_attr = adl_hybrid_events_attrs;
 		mem_attr = adl_hybrid_mem_attrs;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/perf: Default freeze_on_smi on for Comet Lake and later.
  2022-01-22  7:26 [PATCH] x86/perf: Default freeze_on_smi on for Comet Lake and later Kyle Huey
@ 2022-01-24 12:21 ` Peter Zijlstra
  2022-01-24 16:00   ` Liang, Kan
  2022-01-27  2:22 ` Andrew Cooper
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2022-01-24 12:21 UTC (permalink / raw)
  To: Kyle Huey
  Cc: linux-kernel, Kan Liang, linux-perf-users, H. Peter Anvin, x86,
	Dave Hansen, Borislav Petkov, Thomas Gleixner, Namhyung Kim,
	Jiri Olsa, Alexander Shishkin, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Robert O'Callahan,
	Keno Fischer

On Fri, Jan 21, 2022 at 11:26:44PM -0800, Kyle Huey wrote:
> Beginning in Comet Lake, Intel extended the concept of privilege rings to
> SMM.[0] A side effect of this is that events caused by execution of code
> in SMM are now visible to performance counters with IA32_PERFEVTSELx.USR
> set.
> 
> rr[1] depends on exact counts of performance events for the user space
> tracee, so this change in behavior is fatal for us. It is, however, easily
> corrected by setting IA32_DEBUGCTL.FREEZE_WHILE_SMM to 1 (visible in sysfs
> as /sys/devices/cpu/freeze_on_smi). While we can and will tell our users to
> set freeze_on_smi manually when appropriate, because observing events in
> SMM is rarely useful to anyone, we propose to change the default value of
> this switch.
> 
> In this patch I have assumed that all non-Atom Intel microarchitectures
> starting with Comet Lake behave like this but it would be good for someone
> at Intel to verify that.
> 

Kan, can you look at that?

> [0] See the Intel white paper "Trustworthy SMM on the Intel vPro Platform"
> at https://bugzilla.kernel.org/attachment.cgi?id=300300, particularly the
> end of page 5.
> 
> [1] https://rr-project.org/
> 
> Signed-off-by: Kyle Huey <khuey@kylehuey.com>

Patch seems sensible enough; I'll go queue it up unless Kan comes back
with anything troublesome.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/perf: Default freeze_on_smi on for Comet Lake and later.
  2022-01-24 12:21 ` Peter Zijlstra
@ 2022-01-24 16:00   ` Liang, Kan
  2022-01-24 16:30     ` Peter Zijlstra
  2022-01-25  2:59     ` Kyle Huey
  0 siblings, 2 replies; 14+ messages in thread
From: Liang, Kan @ 2022-01-24 16:00 UTC (permalink / raw)
  To: Peter Zijlstra, Kyle Huey
  Cc: linux-kernel, linux-perf-users, H. Peter Anvin, x86, Dave Hansen,
	Borislav Petkov, Thomas Gleixner, Namhyung Kim, Jiri Olsa,
	Alexander Shishkin, Mark Rutland, Arnaldo Carvalho de Melo,
	Ingo Molnar, Robert O'Callahan, Keno Fischer, Andi Kleen



On 1/24/2022 7:21 AM, Peter Zijlstra wrote:
> On Fri, Jan 21, 2022 at 11:26:44PM -0800, Kyle Huey wrote:
>> Beginning in Comet Lake, Intel extended the concept of privilege rings to
>> SMM.[0] A side effect of this is that events caused by execution of code
>> in SMM are now visible to performance counters with IA32_PERFEVTSELx.USR
>> set.
>>
>> rr[1] depends on exact counts of performance events for the user space
>> tracee, so this change in behavior is fatal for us. It is, however, easily
>> corrected by setting IA32_DEBUGCTL.FREEZE_WHILE_SMM to 1 (visible in sysfs
>> as /sys/devices/cpu/freeze_on_smi). While we can and will tell our users to
>> set freeze_on_smi manually when appropriate, because observing events in
>> SMM is rarely useful to anyone, we propose to change the default value of
>> this switch.

+ Andi

 From we heard many times from sophisticated customers, they really hate 
blind spots. They want to see everything. That's why we set 
freeze_on_smi to 0 as default. I think the patch breaks the principle.

I don't think there is a way to notify all the users that the default 
kernel value will be changed. (Yes, the end user can always check the 
/sys/devices/cpu/freeze_on_smi to get the latest value. But in practice, 
no one checks it unless some errors found.) I think it may bring 
troubles to the users if they rely on the counts in SMM.

The patch only changes the default values for some platforms, not all 
platforms. The default value is not consistent among platforms anymore. 
It can bring confusion.

All in all, we have already exposed an interface for the end-users to 
change the value. If some apps, e.g., rr, doesn't want the default 
value, I think they can always change it in the app for all platforms.
We should still keep the freeze_on_smi to 0 as default, which should 
benefit more users.


>>
>> In this patch I have assumed that all non-Atom Intel microarchitectures
>> starting with Comet Lake behave like this but it would be good for someone
>> at Intel to verify that.
>>
> 
> Kan, can you look at that?
>

I'm asking internally.

Thanks,
Kan

>> [0] See the Intel white paper "Trustworthy SMM on the Intel vPro Platform"
>> at https://bugzilla.kernel.org/attachment.cgi?id=300300, particularly the
>> end of page 5.
>>
>> [1] https://rr-project.org/
>>
>> Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> 
> Patch seems sensible enough; I'll go queue it up unless Kan comes back
> with anything troublesome.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/perf: Default freeze_on_smi on for Comet Lake and later.
  2022-01-24 16:00   ` Liang, Kan
@ 2022-01-24 16:30     ` Peter Zijlstra
  2022-01-24 17:03       ` Liang, Kan
  2022-01-25  2:59     ` Kyle Huey
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2022-01-24 16:30 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Kyle Huey, linux-kernel, linux-perf-users, H. Peter Anvin, x86,
	Dave Hansen, Borislav Petkov, Thomas Gleixner, Namhyung Kim,
	Jiri Olsa, Alexander Shishkin, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Robert O'Callahan,
	Keno Fischer, Andi Kleen

On Mon, Jan 24, 2022 at 11:00:56AM -0500, Liang, Kan wrote:
> 
> 
> On 1/24/2022 7:21 AM, Peter Zijlstra wrote:
> > On Fri, Jan 21, 2022 at 11:26:44PM -0800, Kyle Huey wrote:
> > > Beginning in Comet Lake, Intel extended the concept of privilege rings to
> > > SMM.[0] A side effect of this is that events caused by execution of code
> > > in SMM are now visible to performance counters with IA32_PERFEVTSELx.USR
> > > set.
> > > 
> > > rr[1] depends on exact counts of performance events for the user space
> > > tracee, so this change in behavior is fatal for us. It is, however, easily
> > > corrected by setting IA32_DEBUGCTL.FREEZE_WHILE_SMM to 1 (visible in sysfs
> > > as /sys/devices/cpu/freeze_on_smi). While we can and will tell our users to
> > > set freeze_on_smi manually when appropriate, because observing events in
> > > SMM is rarely useful to anyone, we propose to change the default value of
> > > this switch.
> 
> + Andi
> 
> From we heard many times from sophisticated customers, they really hate
> blind spots. They want to see everything. That's why we set freeze_on_smi to
> 0 as default. I think the patch breaks the principle.

Well, USR really, as in *REALLY* should not be counting SMM. That's just
plain broken.

There's maybe an argument to include it in OS, but USR is ring-3.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/perf: Default freeze_on_smi on for Comet Lake and later.
  2022-01-24 16:30     ` Peter Zijlstra
@ 2022-01-24 17:03       ` Liang, Kan
  2022-01-24 17:16         ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Liang, Kan @ 2022-01-24 17:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kyle Huey, linux-kernel, linux-perf-users, H. Peter Anvin, x86,
	Dave Hansen, Borislav Petkov, Thomas Gleixner, Namhyung Kim,
	Jiri Olsa, Alexander Shishkin, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Robert O'Callahan,
	Keno Fischer, Andi Kleen



On 1/24/2022 11:30 AM, Peter Zijlstra wrote:
> On Mon, Jan 24, 2022 at 11:00:56AM -0500, Liang, Kan wrote:
>>
>>
>> On 1/24/2022 7:21 AM, Peter Zijlstra wrote:
>>> On Fri, Jan 21, 2022 at 11:26:44PM -0800, Kyle Huey wrote:
>>>> Beginning in Comet Lake, Intel extended the concept of privilege rings to
>>>> SMM.[0] A side effect of this is that events caused by execution of code
>>>> in SMM are now visible to performance counters with IA32_PERFEVTSELx.USR
>>>> set.
>>>>
>>>> rr[1] depends on exact counts of performance events for the user space
>>>> tracee, so this change in behavior is fatal for us. It is, however, easily
>>>> corrected by setting IA32_DEBUGCTL.FREEZE_WHILE_SMM to 1 (visible in sysfs
>>>> as /sys/devices/cpu/freeze_on_smi). While we can and will tell our users to
>>>> set freeze_on_smi manually when appropriate, because observing events in
>>>> SMM is rarely useful to anyone, we propose to change the default value of
>>>> this switch.
>>
>> + Andi
>>
>>  From we heard many times from sophisticated customers, they really hate
>> blind spots. They want to see everything. That's why we set freeze_on_smi to
>> 0 as default. I think the patch breaks the principle.
> 
> Well, USR really, as in *REALLY* should not be counting SMM. That's just
> plain broken.
> 

For the USR only case, the bit could be set.

> There's maybe an argument to include it in OS, but USR is ring-3.

But we don't have an option for the USR only case. The changing will 
impact the SYS case as well. Personally, maybe it's better to let the 
user apace app control the bit as needed rather than changing the 
default kernel value for all cases.

Thanks,
Kan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/perf: Default freeze_on_smi on for Comet Lake and later.
  2022-01-24 17:03       ` Liang, Kan
@ 2022-01-24 17:16         ` Peter Zijlstra
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2022-01-24 17:16 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Kyle Huey, linux-kernel, linux-perf-users, H. Peter Anvin, x86,
	Dave Hansen, Borislav Petkov, Thomas Gleixner, Namhyung Kim,
	Jiri Olsa, Alexander Shishkin, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Robert O'Callahan,
	Keno Fischer, Andi Kleen

On Mon, Jan 24, 2022 at 12:03:10PM -0500, Liang, Kan wrote:
> 
> 
> On 1/24/2022 11:30 AM, Peter Zijlstra wrote:
> > On Mon, Jan 24, 2022 at 11:00:56AM -0500, Liang, Kan wrote:
> > > 
> > > 
> > > On 1/24/2022 7:21 AM, Peter Zijlstra wrote:
> > > > On Fri, Jan 21, 2022 at 11:26:44PM -0800, Kyle Huey wrote:
> > > > > Beginning in Comet Lake, Intel extended the concept of privilege rings to
> > > > > SMM.[0] A side effect of this is that events caused by execution of code
> > > > > in SMM are now visible to performance counters with IA32_PERFEVTSELx.USR
> > > > > set.
> > > > > 
> > > > > rr[1] depends on exact counts of performance events for the user space
> > > > > tracee, so this change in behavior is fatal for us. It is, however, easily
> > > > > corrected by setting IA32_DEBUGCTL.FREEZE_WHILE_SMM to 1 (visible in sysfs
> > > > > as /sys/devices/cpu/freeze_on_smi). While we can and will tell our users to
> > > > > set freeze_on_smi manually when appropriate, because observing events in
> > > > > SMM is rarely useful to anyone, we propose to change the default value of
> > > > > this switch.
> > > 
> > > + Andi
> > > 
> > >  From we heard many times from sophisticated customers, they really hate
> > > blind spots. They want to see everything. That's why we set freeze_on_smi to
> > > 0 as default. I think the patch breaks the principle.
> > 
> > Well, USR really, as in *REALLY* should not be counting SMM. That's just
> > plain broken.
> > 
> 
> For the USR only case, the bit could be set.
> 
> > There's maybe an argument to include it in OS, but USR is ring-3.
> 
> But we don't have an option for the USR only case. The changing will impact
> the SYS case as well. Personally, maybe it's better to let the user apace
> app control the bit as needed rather than changing the default kernel value
> for all cases.

The bit is system wide, you can't sanely control it per counter. The
change that made USR include SMM is insane and broken.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/perf: Default freeze_on_smi on for Comet Lake and later.
  2022-01-24 16:00   ` Liang, Kan
  2022-01-24 16:30     ` Peter Zijlstra
@ 2022-01-25  2:59     ` Kyle Huey
  2022-01-25 13:57       ` Liang, Kan
  1 sibling, 1 reply; 14+ messages in thread
From: Kyle Huey @ 2022-01-25  2:59 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Peter Zijlstra, open list, linux-perf-users, H. Peter Anvin,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Dave Hansen, Borislav Petkov, Thomas Gleixner, Namhyung Kim,
	Jiri Olsa, Alexander Shishkin, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Robert O'Callahan,
	Keno Fischer, Andi Kleen

On Mon, Jan 24, 2022 at 8:01 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 1/24/2022 7:21 AM, Peter Zijlstra wrote:
> > On Fri, Jan 21, 2022 at 11:26:44PM -0800, Kyle Huey wrote:
> >> Beginning in Comet Lake, Intel extended the concept of privilege rings to
> >> SMM.[0] A side effect of this is that events caused by execution of code
> >> in SMM are now visible to performance counters with IA32_PERFEVTSELx.USR
> >> set.
> >>
> >> rr[1] depends on exact counts of performance events for the user space
> >> tracee, so this change in behavior is fatal for us. It is, however, easily
> >> corrected by setting IA32_DEBUGCTL.FREEZE_WHILE_SMM to 1 (visible in sysfs
> >> as /sys/devices/cpu/freeze_on_smi). While we can and will tell our users to
> >> set freeze_on_smi manually when appropriate, because observing events in
> >> SMM is rarely useful to anyone, we propose to change the default value of
> >> this switch.
>
> + Andi
>
>  From we heard many times from sophisticated customers, they really hate
> blind spots. They want to see everything. That's why we set
> freeze_on_smi to 0 as default. I think the patch breaks the principle.

The default kernel settings for perf events prioritize preventing
information leaks to less privileged code. perf_event_paranoid
defaults to 2, preventing unprivileged users from observing kernel
space. If "sophisticated customers" want to see everything they have
already needed privileges (or an explicit opt-in through decreasing
perf_event_paranoid) for some time.

The current situation on Comet Lake+ where an unprivileged user
*cannot* observe kernel code due to security concerns but
simultaneously *must* observe SMM code seems rather absurd.

> I don't think there is a way to notify all the users that the default
> kernel value will be changed. (Yes, the end user can always check the
> /sys/devices/cpu/freeze_on_smi to get the latest value. But in practice,
> no one checks it unless some errors found.) I think it may bring
> troubles to the users if they rely on the counts in SMM.

Unfortunately the new hardware has already changed the behavior
without notifying users, no matter what we do here.

> The patch only changes the default values for some platforms, not all
> platforms. The default value is not consistent among platforms anymore.
> It can bring confusion.

I don't personally object to changing freeze_on_smi for all platforms
:) I was merely trying to limit the changes.

> All in all, we have already exposed an interface for the end-users to
> change the value. If some apps, e.g., rr, doesn't want the default
> value, I think they can always change it in the app for all platforms.
> We should still keep the freeze_on_smi to 0 as default, which should
> benefit more users.

I think "people who want to just do userspace profiling like they did
before can just change the value" is an unsatisfying answer,
especially because freeze_on_smi requires root to change.


- Kyle

>
> >>
> >> In this patch I have assumed that all non-Atom Intel microarchitectures
> >> starting with Comet Lake behave like this but it would be good for someone
> >> at Intel to verify that.
> >>
> >
> > Kan, can you look at that?
> >
>
> I'm asking internally.
>
> Thanks,
> Kan
>
> >> [0] See the Intel white paper "Trustworthy SMM on the Intel vPro Platform"
> >> at https://bugzilla.kernel.org/attachment.cgi?id=300300, particularly the
> >> end of page 5.
> >>
> >> [1] https://rr-project.org/
> >>
> >> Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> >
> > Patch seems sensible enough; I'll go queue it up unless Kan comes back
> > with anything troublesome.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/perf: Default freeze_on_smi on for Comet Lake and later.
  2022-01-25  2:59     ` Kyle Huey
@ 2022-01-25 13:57       ` Liang, Kan
  2022-01-26 14:04         ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Liang, Kan @ 2022-01-25 13:57 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Peter Zijlstra, open list, linux-perf-users, H. Peter Anvin,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Dave Hansen, Borislav Petkov, Thomas Gleixner, Namhyung Kim,
	Jiri Olsa, Alexander Shishkin, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Robert O'Callahan,
	Keno Fischer, Andi Kleen



On 1/24/2022 9:59 PM, Kyle Huey wrote:
> On Mon, Jan 24, 2022 at 8:01 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>
>>
>>
>> On 1/24/2022 7:21 AM, Peter Zijlstra wrote:
>>> On Fri, Jan 21, 2022 at 11:26:44PM -0800, Kyle Huey wrote:
>>>> Beginning in Comet Lake, Intel extended the concept of privilege rings to
>>>> SMM.[0] A side effect of this is that events caused by execution of code
>>>> in SMM are now visible to performance counters with IA32_PERFEVTSELx.USR
>>>> set.
>>>>
>>>> rr[1] depends on exact counts of performance events for the user space
>>>> tracee, so this change in behavior is fatal for us. It is, however, easily
>>>> corrected by setting IA32_DEBUGCTL.FREEZE_WHILE_SMM to 1 (visible in sysfs
>>>> as /sys/devices/cpu/freeze_on_smi). While we can and will tell our users to
>>>> set freeze_on_smi manually when appropriate, because observing events in
>>>> SMM is rarely useful to anyone, we propose to change the default value of
>>>> this switch.
>>
>> + Andi
>>
>>   From we heard many times from sophisticated customers, they really hate
>> blind spots. They want to see everything. That's why we set
>> freeze_on_smi to 0 as default. I think the patch breaks the principle.
> 
> The default kernel settings for perf events prioritize preventing
> information leaks to less privileged code. perf_event_paranoid
> defaults to 2, preventing unprivileged users from observing kernel
> space. If "sophisticated customers" want to see everything they have
> already needed privileges (or an explicit opt-in through decreasing
> perf_event_paranoid) for some time.
> 
> The current situation on Comet Lake+ where an unprivileged user
> *cannot* observe kernel code due to security concerns but
> simultaneously *must* observe SMM code seems rather absurd.
>

I see. I was thought the unprivileged user can observe the SMM code on 
the previous platforms. The CML+ change only makes part of the SMM code 
CPL0. Seems I'm wrong. The change looks like changing the previous CPL0 
code to CPL3 code. If so, yes, I think we should prevent the information 
leaks for the unprivileged user.

>> I don't think there is a way to notify all the users that the default
>> kernel value will be changed. (Yes, the end user can always check the
>> /sys/devices/cpu/freeze_on_smi to get the latest value. But in practice,
>> no one checks it unless some errors found.) I think it may bring
>> troubles to the users if they rely on the counts in SMM.
> 
> Unfortunately the new hardware has already changed the behavior
> without notifying users, no matter what we do here.
> 
>> The patch only changes the default values for some platforms, not all
>> platforms. The default value is not consistent among platforms anymore.
>> It can bring confusion.
> 
> I don't personally object to changing freeze_on_smi for all platforms
> :) I was merely trying to limit the changes.


Changing it to all platforms seems a too big hammer. I agree we should 
limit it to the impacted platforms.

I've contacted the author of the white paper. I was told that the change 
is for the client vPro platforms. They are not sure whether it impacts 
Server platform or Atom platforms. I'm still working on it. I will let 
you and Peter know once I get more information.

Thanks,
Kan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/perf: Default freeze_on_smi on for Comet Lake and later.
  2022-01-25 13:57       ` Liang, Kan
@ 2022-01-26 14:04         ` Peter Zijlstra
  2022-01-26 14:58           ` Liang, Kan
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2022-01-26 14:04 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Kyle Huey, open list, linux-perf-users, H. Peter Anvin,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Dave Hansen, Borislav Petkov, Thomas Gleixner, Namhyung Kim,
	Jiri Olsa, Alexander Shishkin, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Robert O'Callahan,
	Keno Fischer, Andi Kleen

On Tue, Jan 25, 2022 at 08:57:09AM -0500, Liang, Kan wrote:
> I see. I was thought the unprivileged user can observe the SMM code on the
> previous platforms. The CML+ change only makes part of the SMM code CPL0.
> Seems I'm wrong. The change looks like changing the previous CPL0 code to
> CPL3 code. If so, yes, I think we should prevent the information leaks for
> the unprivileged user.

Right.

> Changing it to all platforms seems a too big hammer. I agree we should limit
> it to the impacted platforms.
> 
> I've contacted the author of the white paper. I was told that the change is
> for the client vPro platforms. They are not sure whether it impacts Server
> platform or Atom platforms. I'm still working on it. I will let you and
> Peter know once I get more information.

For now I've updated the patch as per the below. I'm tempted to simply
apply it as is and let it be.

Having different defaults for vPro vs !vPro chips seems more confusing
than not.

We should also very much get this change reverted for future chips.

--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -6094,6 +6094,16 @@ __init int intel_pmu_init(void)
 			x86_pmu.commit_scheduling = intel_tfa_commit_scheduling;
 		}
 
+		if (boot_cpu_data.x86_model == INTEL_FAM6_COMETLAKE_L ||
+		    boot_cpu_data.x86_model == INTEL_FAM6_COMETLAKE) {
+			/*
+			 * For some idiotic reason SMM is visible to USR
+			 * counters. Since this is a privilege issue, default
+			 * disable counters in SMM for these chips.
+			 */
+			x86_pmu.attr_freeze_on_smi = 1;
+		}
+
 		pr_cont("Skylake events, ");
 		name = "skylake";
 		break;
@@ -6135,6 +6145,8 @@ __init int intel_pmu_init(void)
 		x86_pmu.num_topdown_events = 4;
 		x86_pmu.update_topdown_event = icl_update_topdown_event;
 		x86_pmu.set_topdown_event_period = icl_set_topdown_event_period;
+		/* SMM visible in USR, see above */
+		x86_pmu.attr_freeze_on_smi = 1;
 		pr_cont("Icelake events, ");
 		name = "icelake";
 		break;
@@ -6172,6 +6184,8 @@ __init int intel_pmu_init(void)
 		x86_pmu.num_topdown_events = 8;
 		x86_pmu.update_topdown_event = icl_update_topdown_event;
 		x86_pmu.set_topdown_event_period = icl_set_topdown_event_period;
+		/* SMM visible in USR, see above */
+		x86_pmu.attr_freeze_on_smi = 1;
 		pr_cont("Sapphire Rapids events, ");
 		name = "sapphire_rapids";
 		break;
@@ -6217,6 +6231,8 @@ __init int intel_pmu_init(void)
 		 * x86_pmu.rtm_abort_event.
 		 */
 		x86_pmu.rtm_abort_event = X86_CONFIG(.event=0xc9, .umask=0x04);
+		/* SMM visible in USR, see above */
+		x86_pmu.attr_freeze_on_smi = 1;
 
 		td_attr = adl_hybrid_events_attrs;
 		mem_attr = adl_hybrid_mem_attrs;

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/perf: Default freeze_on_smi on for Comet Lake and later.
  2022-01-26 14:04         ` Peter Zijlstra
@ 2022-01-26 14:58           ` Liang, Kan
  0 siblings, 0 replies; 14+ messages in thread
From: Liang, Kan @ 2022-01-26 14:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kyle Huey, open list, linux-perf-users, H. Peter Anvin,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Dave Hansen, Borislav Petkov, Thomas Gleixner, Namhyung Kim,
	Jiri Olsa, Alexander Shishkin, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Robert O'Callahan,
	Keno Fischer, Andi Kleen



On 1/26/2022 9:04 AM, Peter Zijlstra wrote:
> On Tue, Jan 25, 2022 at 08:57:09AM -0500, Liang, Kan wrote:
>> I see. I was thought the unprivileged user can observe the SMM code on the
>> previous platforms. The CML+ change only makes part of the SMM code CPL0.
>> Seems I'm wrong. The change looks like changing the previous CPL0 code to
>> CPL3 code. If so, yes, I think we should prevent the information leaks for
>> the unprivileged user.
> 
> Right.
> 
>> Changing it to all platforms seems a too big hammer. I agree we should limit
>> it to the impacted platforms.
>>
>> I've contacted the author of the white paper. I was told that the change is
>> for the client vPro platforms. They are not sure whether it impacts Server
>> platform or Atom platforms. I'm still working on it. I will let you and
>> Peter know once I get more information.
> 
> For now I've updated the patch as per the below. I'm tempted to simply
> apply it as is and let it be.
> 
> Having different defaults for vPro vs !vPro chips seems more confusing
> than not.
>

But I think it should make sense to have different defaults for client 
vs server, or big core vs atom.

I'm still working on it and trying to figure out the affected 
generations. (Hope I can find the right contacts in the next few days.)

> We should also very much get this change reverted for future chips.
>

Yes. I will discuss it with the contacts once I get the name.

Thanks,
Kan

> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -6094,6 +6094,16 @@ __init int intel_pmu_init(void)
>   			x86_pmu.commit_scheduling = intel_tfa_commit_scheduling;
>   		}
>   
> +		if (boot_cpu_data.x86_model == INTEL_FAM6_COMETLAKE_L ||
> +		    boot_cpu_data.x86_model == INTEL_FAM6_COMETLAKE) {
> +			/*
> +			 * For some idiotic reason SMM is visible to USR
> +			 * counters. Since this is a privilege issue, default
> +			 * disable counters in SMM for these chips.
> +			 */
> +			x86_pmu.attr_freeze_on_smi = 1;
> +		}
> +
>   		pr_cont("Skylake events, ");
>   		name = "skylake";
>   		break;
> @@ -6135,6 +6145,8 @@ __init int intel_pmu_init(void)
>   		x86_pmu.num_topdown_events = 4;
>   		x86_pmu.update_topdown_event = icl_update_topdown_event;
>   		x86_pmu.set_topdown_event_period = icl_set_topdown_event_period;
> +		/* SMM visible in USR, see above */
> +		x86_pmu.attr_freeze_on_smi = 1;
>   		pr_cont("Icelake events, ");
>   		name = "icelake";
>   		break;
> @@ -6172,6 +6184,8 @@ __init int intel_pmu_init(void)
>   		x86_pmu.num_topdown_events = 8;
>   		x86_pmu.update_topdown_event = icl_update_topdown_event;
>   		x86_pmu.set_topdown_event_period = icl_set_topdown_event_period;
> +		/* SMM visible in USR, see above */
> +		x86_pmu.attr_freeze_on_smi = 1;
>   		pr_cont("Sapphire Rapids events, ");
>   		name = "sapphire_rapids";
>   		break;
> @@ -6217,6 +6231,8 @@ __init int intel_pmu_init(void)
>   		 * x86_pmu.rtm_abort_event.
>   		 */
>   		x86_pmu.rtm_abort_event = X86_CONFIG(.event=0xc9, .umask=0x04);
> +		/* SMM visible in USR, see above */
> +		x86_pmu.attr_freeze_on_smi = 1;
>   
>   		td_attr = adl_hybrid_events_attrs;
>   		mem_attr = adl_hybrid_mem_attrs;

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/perf: Default freeze_on_smi on for Comet Lake and later.
  2022-01-22  7:26 [PATCH] x86/perf: Default freeze_on_smi on for Comet Lake and later Kyle Huey
  2022-01-24 12:21 ` Peter Zijlstra
@ 2022-01-27  2:22 ` Andrew Cooper
  2022-01-27 11:31   ` Peter Zijlstra
  1 sibling, 1 reply; 14+ messages in thread
From: Andrew Cooper @ 2022-01-27  2:22 UTC (permalink / raw)
  To: Kyle Huey, linux-kernel, Kan Liang
  Cc: linux-perf-users, H. Peter Anvin, x86, Dave Hansen,
	Borislav Petkov, Thomas Gleixner, Namhyung Kim, Jiri Olsa,
	Alexander Shishkin, Mark Rutland, Arnaldo Carvalho de Melo,
	Ingo Molnar, Peter Zijlstra, Robert O'Callahan, Keno Fischer,
	Andrew Cooper

On 22/01/2022 07:26, Kyle Huey wrote:
> Beginning in Comet Lake, Intel extended the concept of privilege rings to
> SMM.[0]

SMM has always has full access to all 4 rings of protection.

Blame anyone who uses the terms "Ring -1/-2", because they are horribly
misleading terms and show a fundamental misunderstanding of how this works.

On entry to SMM, the processors is in Real Mode, which is CPL0.  Pretty
much every handler switches to Protected Mode as soon as possible
(because programming for Unreal Mode is evil).  UEFI systems with 64bit
firmware will set up pagetables and switch into Long mode.

But from a code organisation point of view, SMM has traditionally been a
RWX free-for-all with more-than-kernel privileges and all the input
handling gotchas/etc.  It's no wonder that SMM is a fertile source of
security issues.

> A side effect of this is that events caused by execution of code
> in SMM are now visible to performance counters with IA32_PERFEVTSELx.USR
> set.

What is new in Comet Lake is a kernel running in SMM CPL0 which sets up
usermode to run the main logic.

However, if e.g. an enterprising Coreboot developer were to decide that
this CPL3 SMM plan might be a good idea, older CPUs would start
manifesting the same behaviour.

> rr[1] depends on exact counts of performance events for the user space
> tracee, so this change in behavior is fatal for us. It is, however, easily
> corrected by setting IA32_DEBUGCTL.FREEZE_WHILE_SMM to 1 (visible in sysfs
> as /sys/devices/cpu/freeze_on_smi). While we can and will tell our users to
> set freeze_on_smi manually when appropriate, because observing events in
> SMM is rarely useful to anyone, we propose to change the default value of
> this switch.

Frankly, it is an error that FREEZE_WHILE_SMM is under the kernels
control, and not SMM's control.  After all, it's SMM handling all the
UEFI secrets/etc.

Linux ought to set FREEZE_WHILE_SMM unilaterally, because most kernel
profiling probably won't want interference from SMM.  Root can always
disable FREEZE_WHILE_SMM if profiling is really wanted.

I'm not sure if anything can be done on pre-FREEZE_WHILE_SMM CPUs.  Nor
AMD CPUs which are also gaining CPL3 SMM logic, and don't appear to have
any equivalent functionality.

~Andrew

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/perf: Default freeze_on_smi on for Comet Lake and later.
  2022-01-27  2:22 ` Andrew Cooper
@ 2022-01-27 11:31   ` Peter Zijlstra
  2022-01-27 11:56     ` Andrew Cooper
  2022-02-03 14:33     ` [tip: perf/urgent] x86/perf: Default set FREEZE_ON_SMI for all tip-bot2 for Peter Zijlstra
  0 siblings, 2 replies; 14+ messages in thread
From: Peter Zijlstra @ 2022-01-27 11:31 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kyle Huey, linux-kernel, Kan Liang, linux-perf-users,
	H. Peter Anvin, x86, Dave Hansen, Borislav Petkov,
	Thomas Gleixner, Namhyung Kim, Jiri Olsa, Alexander Shishkin,
	Mark Rutland, Arnaldo Carvalho de Melo, Ingo Molnar,
	Robert O'Callahan, Keno Fischer

On Thu, Jan 27, 2022 at 02:22:23AM +0000, Andrew Cooper wrote:

> Frankly, it is an error that FREEZE_WHILE_SMM is under the kernels
> control, and not SMM's control.  After all, it's SMM handling all the
> UEFI secrets/etc.
> 
> Linux ought to set FREEZE_WHILE_SMM unilaterally, because most kernel
> profiling probably won't want interference from SMM.  Root can always
> disable FREEZE_WHILE_SMM if profiling is really wanted.
> 
> I'm not sure if anything can be done on pre-FREEZE_WHILE_SMM CPUs.  Nor
> AMD CPUs which are also gaining CPL3 SMM logic, and don't appear to have
> any equivalent functionality.

Which suggests something like this?

---
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index c91434056c29..5874fa088630 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4703,6 +4703,19 @@ static __initconst const struct x86_pmu intel_pmu = {
 	.lbr_read		= intel_pmu_lbr_read_64,
 	.lbr_save		= intel_pmu_lbr_save,
 	.lbr_restore		= intel_pmu_lbr_restore,
+
+	/*
+	 * SMM has access to all 4 rings and while traditionally SMM code only
+	 * ran in CPL0, newer firmware is starting to make use of CPL3 in SMM.
+	 *
+	 * Since the EVENTSEL.{USR,OS} CPL filtering makes no distinction
+	 * between SMM or not, this results in what should be pure userspace
+	 * counters including SMM data.
+	 *
+	 * This is a clear privilege issue, therefore globally disable
+	 * counting SMM by default.
+	 */
+	.attr_freeze_on_smi	= 1,
 };
 
 static __init void intel_clovertown_quirk(void)

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/perf: Default freeze_on_smi on for Comet Lake and later.
  2022-01-27 11:31   ` Peter Zijlstra
@ 2022-01-27 11:56     ` Andrew Cooper
  2022-02-03 14:33     ` [tip: perf/urgent] x86/perf: Default set FREEZE_ON_SMI for all tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 14+ messages in thread
From: Andrew Cooper @ 2022-01-27 11:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kyle Huey, linux-kernel, Kan Liang, linux-perf-users,
	H. Peter Anvin, x86, Dave Hansen, Borislav Petkov,
	Thomas Gleixner, Namhyung Kim, Jiri Olsa, Alexander Shishkin,
	Mark Rutland, Arnaldo Carvalho de Melo, Ingo Molnar,
	Robert O'Callahan, Keno Fischer, Andrew Cooper

On 27/01/2022 11:31, Peter Zijlstra wrote:
> On Thu, Jan 27, 2022 at 02:22:23AM +0000, Andrew Cooper wrote:
>
>> Frankly, it is an error that FREEZE_WHILE_SMM is under the kernels
>> control, and not SMM's control.  After all, it's SMM handling all the
>> UEFI secrets/etc.
>>
>> Linux ought to set FREEZE_WHILE_SMM unilaterally, because most kernel
>> profiling probably won't want interference from SMM.  Root can always
>> disable FREEZE_WHILE_SMM if profiling is really wanted.
>>
>> I'm not sure if anything can be done on pre-FREEZE_WHILE_SMM CPUs.  Nor
>> AMD CPUs which are also gaining CPL3 SMM logic, and don't appear to have
>> any equivalent functionality.
> Which suggests something like this?
>
> ---
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index c91434056c29..5874fa088630 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -4703,6 +4703,19 @@ static __initconst const struct x86_pmu intel_pmu = {
>  	.lbr_read		= intel_pmu_lbr_read_64,
>  	.lbr_save		= intel_pmu_lbr_save,
>  	.lbr_restore		= intel_pmu_lbr_restore,
> +
> +	/*
> +	 * SMM has access to all 4 rings and while traditionally SMM code only
> +	 * ran in CPL0, newer firmware is starting to make use of CPL3 in SMM.

"2021-era firmware" ?

> +	 *
> +	 * Since the EVENTSEL.{USR,OS} CPL filtering makes no distinction
> +	 * between SMM or not, this results in what should be pure userspace
> +	 * counters including SMM data.
> +	 *
> +	 * This is a clear privilege issue, therefore globally disable
> +	 * counting SMM by default.
> +	 */
> +	.attr_freeze_on_smi	= 1,
>  };
>  
>  static __init void intel_clovertown_quirk(void)

I'd say so, yes.

Elsewhere on this thread, there has been claims of "everyone like seeing
SMM samples" as an argument for the current default, but I don't buy this.

NMIs are blocked in SMM, so PMIs will be attributed to an arbitrary
wrong instruction boundary, and the entire SMM Handler's worth of
counter changes will disappear into thin air, appearing as over-counting
as far as the profiling target is concerned.

When it comes to SMIs, the two interesting cases are:
1) Am I deterministically triggering SMIs when I oughtn't to be, and
2) If I did race with an SMI, should I discard everything and start again.

Both require paying attention to MSR_SMI_COUNT, even if the answer to 2)
is probably not when freeze_on_smi is active.

~Andrew

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [tip: perf/urgent] x86/perf: Default set FREEZE_ON_SMI for all
  2022-01-27 11:31   ` Peter Zijlstra
  2022-01-27 11:56     ` Andrew Cooper
@ 2022-02-03 14:33     ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 14+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2022-02-03 14:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kyle Huey, Andrew Cooper, Peter Zijlstra (Intel),
	stable, x86, linux-kernel

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     a01994f5e5c79d3a35e5e8cf4252c7f2147323c3
Gitweb:        https://git.kernel.org/tip/a01994f5e5c79d3a35e5e8cf4252c7f2147323c3
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Thu, 27 Jan 2022 12:32:51 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 02 Feb 2022 13:11:39 +01:00

x86/perf: Default set FREEZE_ON_SMI for all

Kyle reported that rr[0] has started to malfunction on Comet Lake and
later CPUs due to EFI starting to make use of CPL3 [1] and the PMU
event filtering not distinguishing between regular CPL3 and SMM CPL3.

Since this is a privilege violation, default disable SMM visibility
where possible.

Administrators wanting to observe SMM cycles can easily change this
using the sysfs attribute while regular users don't have access to
this file.

[0] https://rr-project.org/

[1] See the Intel white paper "Trustworthy SMM on the Intel vPro Platform"
at https://bugzilla.kernel.org/attachment.cgi?id=300300, particularly the
end of page 5.

Reported-by: Kyle Huey <me@kylehuey.com>
Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@kernel.org
Link: https://lkml.kernel.org/r/YfKChjX61OW4CkYm@hirez.programming.kicks-ass.net
---
 arch/x86/events/intel/core.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index c914340..a3c7ca8 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4703,6 +4703,19 @@ static __initconst const struct x86_pmu intel_pmu = {
 	.lbr_read		= intel_pmu_lbr_read_64,
 	.lbr_save		= intel_pmu_lbr_save,
 	.lbr_restore		= intel_pmu_lbr_restore,
+
+	/*
+	 * SMM has access to all 4 rings and while traditionally SMM code only
+	 * ran in CPL0, 2021-era firmware is starting to make use of CPL3 in SMM.
+	 *
+	 * Since the EVENTSEL.{USR,OS} CPL filtering makes no distinction
+	 * between SMM or not, this results in what should be pure userspace
+	 * counters including SMM data.
+	 *
+	 * This is a clear privilege issue, therefore globally disable
+	 * counting SMM by default.
+	 */
+	.attr_freeze_on_smi	= 1,
 };
 
 static __init void intel_clovertown_quirk(void)

^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-02-03 14:34 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-22  7:26 [PATCH] x86/perf: Default freeze_on_smi on for Comet Lake and later Kyle Huey
2022-01-24 12:21 ` Peter Zijlstra
2022-01-24 16:00   ` Liang, Kan
2022-01-24 16:30     ` Peter Zijlstra
2022-01-24 17:03       ` Liang, Kan
2022-01-24 17:16         ` Peter Zijlstra
2022-01-25  2:59     ` Kyle Huey
2022-01-25 13:57       ` Liang, Kan
2022-01-26 14:04         ` Peter Zijlstra
2022-01-26 14:58           ` Liang, Kan
2022-01-27  2:22 ` Andrew Cooper
2022-01-27 11:31   ` Peter Zijlstra
2022-01-27 11:56     ` Andrew Cooper
2022-02-03 14:33     ` [tip: perf/urgent] x86/perf: Default set FREEZE_ON_SMI for all tip-bot2 for Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.