All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume
@ 2013-03-15 13:26 Stephane Eranian
  2013-03-15 17:01 ` Linus Torvalds
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Stephane Eranian @ 2013-03-15 13:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: torvalds, peterz, mingo, tglx, akpm, acme, jolsa, ak


This patch fixes a kernel crash when using precise sampling (PEBS)
after a suspend/resume. Turns out the CPU notifier code is not invoked
on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
by the kernel and keeps it power-on/resume value of 0 causing any PEBS
measurement to crash when running on CPU0.

The workaround is to add a hook in the actual resume code to restore
the DS Area MSR value. It is invoked for all CPUS. So for all but CPU0,
the DS_AREA will be restored twice but this is harmless.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Stephane Eranian <eranian@google.com>
---

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 826054a..0e9bdd3 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -729,3 +729,11 @@ void intel_ds_init(void)
 		}
 	}
 }
+
+void perf_restore_debug_store(void)
+{
+	if (!x86_pmu.bts && !x86_pmu.pebs)
+		return;
+
+	init_debug_store_on_cpu(smp_processor_id());
+}
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 120cee1..3c68768 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -11,6 +11,7 @@
 #include <linux/suspend.h>
 #include <linux/export.h>
 #include <linux/smp.h>
+#include <linux/perf_event.h>
 
 #include <asm/pgtable.h>
 #include <asm/proto.h>
@@ -228,6 +229,7 @@ static void __restore_processor_state(struct saved_context *ctxt)
 	do_fpu_end();
 	x86_platform.restore_sched_clock_state();
 	mtrr_bp_restore();
+	perf_restore_debug_store();
 }
 
 /* Needed by apm.c */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8737e1c..746be3b 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -742,6 +742,7 @@ extern void perf_event_enable(struct perf_event *event);
 extern void perf_event_disable(struct perf_event *event);
 extern int __perf_event_disable(void *info);
 extern void perf_event_task_tick(void);
+extern void perf_restore_debug_store(void);
 #else
 static inline void
 perf_event_task_sched_in(struct task_struct *prev,
@@ -781,6 +782,7 @@ static inline void perf_event_enable(struct perf_event *event)		{ }
 static inline void perf_event_disable(struct perf_event *event)		{ }
 static inline int __perf_event_disable(void *info)			{ return -1; }
 static inline void perf_event_task_tick(void)				{ }
+static inline void perf_restore_debug_store(void)			{ }
 #endif
 
 #define perf_output_put(handle, x) perf_output_copy((handle), &(x), sizeof(x))

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume
  2013-03-15 13:26 [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume Stephane Eranian
@ 2013-03-15 17:01 ` Linus Torvalds
  2013-03-15 20:49   ` Thomas Gleixner
  2013-03-15 20:31 ` Greg KH
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2013-03-15 17:01 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar,
	Thomas Gleixner, Andrew Morton, Arnaldo Carvalho de Melo,
	Jiri Olsa, Andi Kleen

On Fri, Mar 15, 2013 at 6:26 AM, Stephane Eranian <eranian@google.com> wrote:
>
> This patch fixes a kernel crash when using precise sampling (PEBS)
> after a suspend/resume.

Yup, works. Applied.

Can we please get rid of the crazy CPU notifier crap from the perf
code, and do this like we do most other wrmsr's etc? Doing

    git grep "case CPU_" arch/x86/kernel/cpu

shows that the perf layer seems to be full of this kind of BS. This is
all CPU state, it should be initialized by the regular CPU
initialization code, not hooked up with some random callbacks.

                  Linus

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume
  2013-03-15 13:26 [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume Stephane Eranian
  2013-03-15 17:01 ` Linus Torvalds
@ 2013-03-15 20:31 ` Greg KH
  2013-03-15 20:49   ` Stephane Eranian
  2013-03-15 20:53   ` Shuah Khan
  2013-03-16 16:11 ` Parag Warudkar
  2013-03-17 22:49 ` [patch for-] perf,x86: fix link failure for non-Intel configs David Rientjes
  3 siblings, 2 replies; 14+ messages in thread
From: Greg KH @ 2013-03-15 20:31 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: linux-kernel, torvalds, peterz, mingo, tglx, akpm, acme, jolsa, ak

On Fri, Mar 15, 2013 at 02:26:07PM +0100, Stephane Eranian wrote:
> 
> This patch fixes a kernel crash when using precise sampling (PEBS)
> after a suspend/resume. Turns out the CPU notifier code is not invoked
> on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
> by the kernel and keeps it power-on/resume value of 0 causing any PEBS
> measurement to crash when running on CPU0.
> 
> The workaround is to add a hook in the actual resume code to restore
> the DS Area MSR value. It is invoked for all CPUS. So for all but CPU0,
> the DS_AREA will be restored twice but this is harmless.
> 
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Stephane Eranian <eranian@google.com>
> ---

Is this needed for the 3.8 or older kernels as well?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume
  2013-03-15 20:31 ` Greg KH
@ 2013-03-15 20:49   ` Stephane Eranian
  2013-03-15 20:56     ` Greg KH
  2013-03-15 20:53   ` Shuah Khan
  1 sibling, 1 reply; 14+ messages in thread
From: Stephane Eranian @ 2013-03-15 20:49 UTC (permalink / raw)
  To: Greg KH
  Cc: LKML, Linus Torvalds, Peter Zijlstra, mingo, Thomas Gleixner,
	Andrew Morton, Arnaldo Carvalho de Melo, Jiri Olsa, ak

On Fri, Mar 15, 2013 at 9:31 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Fri, Mar 15, 2013 at 02:26:07PM +0100, Stephane Eranian wrote:
> >
> > This patch fixes a kernel crash when using precise sampling (PEBS)
> > after a suspend/resume. Turns out the CPU notifier code is not invoked
> > on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
> > by the kernel and keeps it power-on/resume value of 0 causing any PEBS
> > measurement to crash when running on CPU0.
> >
> > The workaround is to add a hook in the actual resume code to restore
> > the DS Area MSR value. It is invoked for all CPUS. So for all but CPU0,
> > the DS_AREA will be restored twice but this is harmless.
> >
> > Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> > Signed-off-by: Stephane Eranian <eranian@google.com>
> > ---
>
> Is this needed for the 3.8 or older kernels as well?
>
I suspect so. If CPU0 is not covered by the cpu notifiers
then yes, the patch is needed.

>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume
  2013-03-15 17:01 ` Linus Torvalds
@ 2013-03-15 20:49   ` Thomas Gleixner
  0 siblings, 0 replies; 14+ messages in thread
From: Thomas Gleixner @ 2013-03-15 20:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Stephane Eranian, Linux Kernel Mailing List, Peter Zijlstra,
	Ingo Molnar, Andrew Morton, Arnaldo Carvalho de Melo, Jiri Olsa,
	Andi Kleen

On Fri, 15 Mar 2013, Linus Torvalds wrote:

> On Fri, Mar 15, 2013 at 6:26 AM, Stephane Eranian <eranian@google.com> wrote:
> >
> > This patch fixes a kernel crash when using precise sampling (PEBS)
> > after a suspend/resume.
> 
> Yup, works. Applied.
> 
> Can we please get rid of the crazy CPU notifier crap from the perf
> code, and do this like we do most other wrmsr's etc? Doing
> 
>     git grep "case CPU_" arch/x86/kernel/cpu
> 
> shows that the perf layer seems to be full of this kind of BS. This is
> all CPU state, it should be initialized by the regular CPU
> initialization code, not hooked up with some random callbacks.

It's on my list ....

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume
  2013-03-15 20:31 ` Greg KH
  2013-03-15 20:49   ` Stephane Eranian
@ 2013-03-15 20:53   ` Shuah Khan
  2013-03-15 20:56     ` Stephane Eranian
  1 sibling, 1 reply; 14+ messages in thread
From: Shuah Khan @ 2013-03-15 20:53 UTC (permalink / raw)
  To: Greg KH
  Cc: Stephane Eranian, linux-kernel, torvalds, peterz, mingo, tglx,
	akpm, acme, jolsa, ak

On Fri, Mar 15, 2013 at 2:31 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Fri, Mar 15, 2013 at 02:26:07PM +0100, Stephane Eranian wrote:
>>
>> This patch fixes a kernel crash when using precise sampling (PEBS)
>> after a suspend/resume. Turns out the CPU notifier code is not invoked
>> on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
>> by the kernel and keeps it power-on/resume value of 0 causing any PEBS
>> measurement to crash when running on CPU0.
>>
>> The workaround is to add a hook in the actual resume code to restore
>> the DS Area MSR value. It is invoked for all CPUS. So for all but CPU0,
>> the DS_AREA will be restored twice but this is harmless.
>>
>> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Signed-off-by: Stephane Eranian <eranian@google.com>
>> ---
>
> Is this needed for the 3.8 or older kernels as well?
>
> thanks,
>
> greg k-h

Just about to ask the same question. Patch applies to 3.8, 3.4, 3.2
and 3.5. But needs some massaging for 3.0. I have the kernels built,
haven't started testing yet.

-- Shuah

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume
  2013-03-15 20:53   ` Shuah Khan
@ 2013-03-15 20:56     ` Stephane Eranian
  2013-03-16  0:45       ` Shuah Khan
  0 siblings, 1 reply; 14+ messages in thread
From: Stephane Eranian @ 2013-03-15 20:56 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Greg KH, LKML, Linus Torvalds, Peter Zijlstra, mingo,
	Thomas Gleixner, Andrew Morton, Arnaldo Carvalho de Melo,
	Jiri Olsa, ak

On Fri, Mar 15, 2013 at 9:53 PM, Shuah Khan <shuahkhan@gmail.com> wrote:
> On Fri, Mar 15, 2013 at 2:31 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
>> On Fri, Mar 15, 2013 at 02:26:07PM +0100, Stephane Eranian wrote:
>>>
>>> This patch fixes a kernel crash when using precise sampling (PEBS)
>>> after a suspend/resume. Turns out the CPU notifier code is not invoked
>>> on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
>>> by the kernel and keeps it power-on/resume value of 0 causing any PEBS
>>> measurement to crash when running on CPU0.
>>>
>>> The workaround is to add a hook in the actual resume code to restore
>>> the DS Area MSR value. It is invoked for all CPUS. So for all but CPU0,
>>> the DS_AREA will be restored twice but this is harmless.
>>>
>>> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
>>> Signed-off-by: Stephane Eranian <eranian@google.com>
>>> ---
>>
>> Is this needed for the 3.8 or older kernels as well?
>>
>> thanks,
>>
>> greg k-h
>
> Just about to ask the same question. Patch applies to 3.8, 3.4, 3.2
> and 3.5. But needs some massaging for 3.0. I have the kernels built,
> haven't started testing yet.
>
Testing the patch is easy:

# echo mem >/sys/power/state
Then press the power button again, when you get control again, type:

$ taskset -c 0 perf record -e cycles:pp my_test_program

Note that this problem impacts only Intel processors after Core 2
(PEBS enabled).

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume
  2013-03-15 20:49   ` Stephane Eranian
@ 2013-03-15 20:56     ` Greg KH
  0 siblings, 0 replies; 14+ messages in thread
From: Greg KH @ 2013-03-15 20:56 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: LKML, Linus Torvalds, Peter Zijlstra, mingo, Thomas Gleixner,
	Andrew Morton, Arnaldo Carvalho de Melo, Jiri Olsa, ak

On Fri, Mar 15, 2013 at 09:49:00PM +0100, Stephane Eranian wrote:
> On Fri, Mar 15, 2013 at 9:31 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Fri, Mar 15, 2013 at 02:26:07PM +0100, Stephane Eranian wrote:
> > >
> > > This patch fixes a kernel crash when using precise sampling (PEBS)
> > > after a suspend/resume. Turns out the CPU notifier code is not invoked
> > > on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
> > > by the kernel and keeps it power-on/resume value of 0 causing any PEBS
> > > measurement to crash when running on CPU0.
> > >
> > > The workaround is to add a hook in the actual resume code to restore
> > > the DS Area MSR value. It is invoked for all CPUS. So for all but CPU0,
> > > the DS_AREA will be restored twice but this is harmless.
> > >
> > > Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> > > Signed-off-by: Stephane Eranian <eranian@google.com>
> > > ---
> >
> > Is this needed for the 3.8 or older kernels as well?
> >
> I suspect so. If CPU0 is not covered by the cpu notifiers
> then yes, the patch is needed.

Ok, thanks, I've queued it up now there.

greg k-h

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume
  2013-03-15 20:56     ` Stephane Eranian
@ 2013-03-16  0:45       ` Shuah Khan
  0 siblings, 0 replies; 14+ messages in thread
From: Shuah Khan @ 2013-03-16  0:45 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Greg KH, LKML, Linus Torvalds, Peter Zijlstra, mingo,
	Thomas Gleixner, Andrew Morton, Arnaldo Carvalho de Melo,
	Jiri Olsa, ak

On Fri, Mar 15, 2013 at 2:56 PM, Stephane Eranian <eranian@google.com> wrote:
> On Fri, Mar 15, 2013 at 9:53 PM, Shuah Khan <shuahkhan@gmail.com> wrote:
>> On Fri, Mar 15, 2013 at 2:31 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
>>> On Fri, Mar 15, 2013 at 02:26:07PM +0100, Stephane Eranian wrote:
>>>>
>>>> This patch fixes a kernel crash when using precise sampling (PEBS)
>>>> after a suspend/resume. Turns out the CPU notifier code is not invoked
>>>> on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
>>>> by the kernel and keeps it power-on/resume value of 0 causing any PEBS
>>>> measurement to crash when running on CPU0.
>>>>
>>>> The workaround is to add a hook in the actual resume code to restore
>>>> the DS Area MSR value. It is invoked for all CPUS. So for all but CPU0,
>>>> the DS_AREA will be restored twice but this is harmless.
>>>>
>>>> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
>>>> Signed-off-by: Stephane Eranian <eranian@google.com>
>>>> ---
>>>
>>> Is this needed for the 3.8 or older kernels as well?
>>>
>>> thanks,
>>>
>>> greg k-h
>>
>> Just about to ask the same question. Patch applies to 3.8, 3.4, 3.2
>> and 3.5. But needs some massaging for 3.0. I have the kernels built,
>> haven't started testing yet.
>>
> Testing the patch is easy:
>
> # echo mem >/sys/power/state
> Then press the power button again, when you get control again, type:
>
> $ taskset -c 0 perf record -e cycles:pp my_test_program
>
> Note that this problem impacts only Intel processors after Core 2
> (PEBS enabled).

Thanks. Reproduced the problem on 3.8.3, 3.4.36, and 3.0.69. Tested
the patch on 3.4 and 3.8 and the problem is fixed. I had to re-cut the
patch for 3.0. Sending it to stable tagged for 3.0

Thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume
  2013-03-15 13:26 [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume Stephane Eranian
  2013-03-15 17:01 ` Linus Torvalds
  2013-03-15 20:31 ` Greg KH
@ 2013-03-16 16:11 ` Parag Warudkar
  2013-03-16 17:56   ` Linus Torvalds
  2013-03-17 22:49 ` [patch for-] perf,x86: fix link failure for non-Intel configs David Rientjes
  3 siblings, 1 reply; 14+ messages in thread
From: Parag Warudkar @ 2013-03-16 16:11 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: linux-kernel, torvalds, peterz, mingo, tglx, akpm, acme, jolsa, ak

On Fri, Mar 15, 2013 at 9:26 AM, Stephane Eranian <eranian@google.com> wrote:
>
> This patch fixes a kernel crash when using precise sampling (PEBS)
> after a suspend/resume. Turns out the CPU notifier code is not invoked
> on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
> by the kernel and keeps it power-on/resume value of 0 causing any PEBS
> measurement to crash when running on CPU0.
>
> The workaround is to add a hook in the actual resume code to restore
> the DS Area MSR value. It is invoked for all CPUS. So for all but CPU0,
> the DS_AREA will be restored twice but this is harmless.
>
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Stephane Eranian <eranian@google.com>
> ---
>
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
> index 826054a..0e9bdd3 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
> @@ -729,3 +729,11 @@ void intel_ds_init(void)
>                 }
>         }
>  }
> +
> +void perf_restore_debug_store(void)
> +{
> +       if (!x86_pmu.bts && !x86_pmu.pebs)
> +               return;
> +
> +       init_debug_store_on_cpu(smp_processor_id());
> +}

This seems to trigger a WARN_ON during suspend/resume.

[ 1479.919313] WARNING: at kernel/smp.c:244
smp_call_function_single+0x11b/0x170()
[ 1479.919314] Hardware name: iMac12,1
[ 1479.919347] Modules linked in: nfsd auth_rpcgss nfs_acl lockd
sunrpc ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle bridge stp llc
be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3
mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core
iscsi_tcp libiscsi_tcp libiscsi rfcomm bnep scsi_transport_iscsi
nls_utf8 hfsplus arc4 ath9k ath9k_common ath9k_hw ath mac80211
snd_hda_codec_hdmi snd_hda_codec_cirrus snd_hda_intel snd_hda_codec
snd_hwdep vhost_net uvcvideo cfg80211 btusb tun macvtap macvlan
snd_seq snd_seq_device snd_pcm coretemp bluetooth tg3 snd_page_alloc
kvm_intel videobuf2_vmalloc videobuf2_memops snd_timer videobuf2_core
snd crc32c_intel
[ 1479.919361]  kvm iTCO_wdt iTCO_vendor_support videodev rfkill ptp
media ghash_clmulni_intel joydev soundcore lpc_ich pcspkr applesmc
input_polldev mfd_core i2c_i801 microcode pps_core apple_bl
binfmt_misc uinput hid_logitech_dj usb_storage radeon i915 ttm
i2c_algo_bit drm_kms_helper drm firewire_ohci firewire_core crc_itu_t
i2c_core video
[ 1479.919364] Pid: 3246, comm: pm-suspend Not tainted 3.9.0-rc2+ #2
[ 1479.919364] Call Trace:
[ 1479.919370]  [<ffffffff8105e9df>] warn_slowpath_common+0x7f/0xc0
[ 1479.919374]  [<ffffffff8131ed40>] ? __rdmsr_on_cpu+0x50/0x50
[ 1479.919376]  [<ffffffff8105ea3a>] warn_slowpath_null+0x1a/0x20
[ 1479.919377]  [<ffffffff810be0eb>] smp_call_function_single+0x11b/0x170
[ 1479.919379]  [<ffffffff8131efd3>] wrmsr_on_cpu+0x43/0x50
[ 1479.919382]  [<ffffffff81028d59>] init_debug_store_on_cpu+0x39/0x40
[ 1479.919384]  [<ffffffff81029731>] perf_restore_debug_store+0x21/0x30
[ 1479.919387]  [<ffffffff815363a5>] restore_processor_state+0x225/0x230
[ 1479.919390]  [<ffffffff81036da7>] acpi_suspend_lowlevel+0xf7/0x120
[ 1479.919393]  [<ffffffff8136075b>] acpi_suspend_enter+0x3b/0xb7
[ 1479.919395]  [<ffffffff810a84ef>] suspend_devices_and_enter+0x37f/0x430
[ 1479.919397]  [<ffffffff810a873a>] pm_suspend+0x19a/0x230
[ 1479.919399]  [<ffffffff810a7577>] state_store+0x87/0xf0
[ 1479.919402]  [<ffffffff812fba7f>] kobj_attr_store+0xf/0x20
[ 1479.919405]  [<ffffffff81210078>] sysfs_write_file+0xd8/0x150
[ 1479.919408]  [<ffffffff8119bccc>] vfs_write+0xac/0x180
[ 1479.919410]  [<ffffffff8119c012>] sys_write+0x52/0xa0
[ 1479.919412]  [<ffffffff8166369e>] ? do_page_fault+0xe/0x10
[ 1479.919414]  [<ffffffff81667cd9>] system_call_fastpath+0x16/0x1b
[ 1479.919415] ---[ end trace 2af7ebe5ffee87a9 ]---
[ 1479.919416] ACPI: Low-level resume complete

--Parag

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume
  2013-03-16 16:11 ` Parag Warudkar
@ 2013-03-16 17:56   ` Linus Torvalds
  2013-03-16 22:06     ` Parag Warudkar
  0 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2013-03-16 17:56 UTC (permalink / raw)
  To: Parag Warudkar
  Cc: Stephane Eranian, Linux Kernel Mailing List, Peter Zijlstra,
	Ingo Molnar, Thomas Gleixner, Andrew Morton,
	Arnaldo Carvalho de Melo, Jiri Olsa, Andi Kleen

[-- Attachment #1: Type: text/plain, Size: 413 bytes --]

On Sat, Mar 16, 2013 at 9:11 AM, Parag Warudkar <parag.lkml@gmail.com> wrote:
>
> This seems to trigger a WARN_ON during suspend/resume.

Ugh, yes. It's practically harmless, but it's ugly and technically
wrong (we're using wrmsr_on_cpu() on our current cpu, but in a context
where using it on anything else would be horribly broken).

I think the attached patch should fix it. UNTESTED!

                  Linus

[-- Attachment #2: patch.diff --]
[-- Type: application/octet-stream, Size: 635 bytes --]

 arch/x86/kernel/cpu/perf_event_intel_ds.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 0e9bdd3cb01e..b05a575d56f4 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -732,8 +732,10 @@ void intel_ds_init(void)
 
 void perf_restore_debug_store(void)
 {
+	struct debug_store *ds = __this_cpu_read(cpu_hw_events.ds);
+
 	if (!x86_pmu.bts && !x86_pmu.pebs)
 		return;
 
-	init_debug_store_on_cpu(smp_processor_id());
+	wrmsrl(MSR_IA32_DS_AREA, (unsigned long)ds);
 }

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume
  2013-03-16 17:56   ` Linus Torvalds
@ 2013-03-16 22:06     ` Parag Warudkar
  0 siblings, 0 replies; 14+ messages in thread
From: Parag Warudkar @ 2013-03-16 22:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Stephane Eranian, Linux Kernel Mailing List, Peter Zijlstra,
	Ingo Molnar, Thomas Gleixner, Andrew Morton,
	Arnaldo Carvalho de Melo, Jiri Olsa, Andi Kleen

On Sat, Mar 16, 2013 at 1:56 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sat, Mar 16, 2013 at 9:11 AM, Parag Warudkar <parag.lkml@gmail.com> wrote:
>>
>> This seems to trigger a WARN_ON during suspend/resume.
>
> Ugh, yes. It's practically harmless, but it's ugly and technically
> wrong (we're using wrmsr_on_cpu() on our current cpu, but in a context
> where using it on anything else would be horribly broken).
>
> I think the attached patch should fix it. UNTESTED!

Applied and that seems to have suppressed the warning.

Unrelated to this now it dies in intel_pstate_timer_func doing what
seems to be a divide by zero.

Hopefully I will be able to capture the oops on camera next time.

Thanks,
Parag

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [patch for-] perf,x86: fix link failure for non-Intel configs
  2013-03-15 13:26 [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume Stephane Eranian
                   ` (2 preceding siblings ...)
  2013-03-16 16:11 ` Parag Warudkar
@ 2013-03-17 22:49 ` David Rientjes
  2013-03-18  0:12   ` Stephane Eranian
  3 siblings, 1 reply; 14+ messages in thread
From: David Rientjes @ 2013-03-17 22:49 UTC (permalink / raw)
  To: Stephane Eranian, Linus Torvalds
  Cc: peterz, Ingo Molnar, tglx, Andrew Morton, acme, jolsa,
	Andi Kleen, linux-kernel

Commit 1d9d8639c063 ("perf,x86: fix kernel crash with PEBS/BTS after 
suspend/resume") introduces a link failure since 
perf_restore_debug_store() is only defined for CONFIG_CPU_SUP_INTEL:

	arch/x86/power/built-in.o: In function `restore_processor_state':
	(.text+0x45c): undefined reference to `perf_restore_debug_store'

Fix it by defining the dummy function appropriately.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 include/linux/perf_event.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -758,7 +758,6 @@ extern void perf_event_enable(struct perf_event *event);
 extern void perf_event_disable(struct perf_event *event);
 extern int __perf_event_disable(void *info);
 extern void perf_event_task_tick(void);
-extern void perf_restore_debug_store(void);
 #else
 static inline void
 perf_event_task_sched_in(struct task_struct *prev,
@@ -798,6 +797,11 @@ static inline void perf_event_enable(struct perf_event *event)		{ }
 static inline void perf_event_disable(struct perf_event *event)		{ }
 static inline int __perf_event_disable(void *info)			{ return -1; }
 static inline void perf_event_task_tick(void)				{ }
+#endif
+
+#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
+extern void perf_restore_debug_store(void);
+#else
 static inline void perf_restore_debug_store(void)			{ }
 #endif
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch for-] perf,x86: fix link failure for non-Intel configs
  2013-03-17 22:49 ` [patch for-] perf,x86: fix link failure for non-Intel configs David Rientjes
@ 2013-03-18  0:12   ` Stephane Eranian
  0 siblings, 0 replies; 14+ messages in thread
From: Stephane Eranian @ 2013-03-18  0:12 UTC (permalink / raw)
  To: David Rientjes
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Andrew Morton, Arnaldo Carvalho de Melo, Jiri Olsa, Andi Kleen,
	LKML

On Sun, Mar 17, 2013 at 11:49 PM, David Rientjes <rientjes@google.com> wrote:
> Commit 1d9d8639c063 ("perf,x86: fix kernel crash with PEBS/BTS after
> suspend/resume") introduces a link failure since
> perf_restore_debug_store() is only defined for CONFIG_CPU_SUP_INTEL:
>
>         arch/x86/power/built-in.o: In function `restore_processor_state':
>         (.text+0x45c): undefined reference to `perf_restore_debug_store'
>
> Fix it by defining the dummy function appropriately.
>
> Signed-off-by: David Rientjes <rientjes@google.com>

Yeah, missed that. Too bad we have to pull SUP_INTEL into
a generic header file. But the alternative would be to use a weak()
function in core.c which is not nicer, in my opinion.

Acked-by: Stephane Eranian <eranian@google.com>

> ---
>  include/linux/perf_event.h | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -758,7 +758,6 @@ extern void perf_event_enable(struct perf_event *event);
>  extern void perf_event_disable(struct perf_event *event);
>  extern int __perf_event_disable(void *info);
>  extern void perf_event_task_tick(void);
> -extern void perf_restore_debug_store(void);
>  #else
>  static inline void
>  perf_event_task_sched_in(struct task_struct *prev,
> @@ -798,6 +797,11 @@ static inline void perf_event_enable(struct perf_event *event)             { }
>  static inline void perf_event_disable(struct perf_event *event)                { }
>  static inline int __perf_event_disable(void *info)                     { return -1; }
>  static inline void perf_event_task_tick(void)                          { }
> +#endif
> +
> +#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
> +extern void perf_restore_debug_store(void);
> +#else
>  static inline void perf_restore_debug_store(void)                      { }
>  #endif
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2013-03-18  0:12 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-15 13:26 [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume Stephane Eranian
2013-03-15 17:01 ` Linus Torvalds
2013-03-15 20:49   ` Thomas Gleixner
2013-03-15 20:31 ` Greg KH
2013-03-15 20:49   ` Stephane Eranian
2013-03-15 20:56     ` Greg KH
2013-03-15 20:53   ` Shuah Khan
2013-03-15 20:56     ` Stephane Eranian
2013-03-16  0:45       ` Shuah Khan
2013-03-16 16:11 ` Parag Warudkar
2013-03-16 17:56   ` Linus Torvalds
2013-03-16 22:06     ` Parag Warudkar
2013-03-17 22:49 ` [patch for-] perf,x86: fix link failure for non-Intel configs David Rientjes
2013-03-18  0:12   ` Stephane Eranian

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.