linux-kernel.vger.kernel.org archive mirror
* perf: perf_fuzzer triggers GPF in perf_prepare_sample
@ 2018-12-04 15:54 Vince Weaver
  2018-12-05 12:45 ` Jiri Olsa
  0 siblings, 1 reply; 9+ messages in thread
From: Vince Weaver @ 2018-12-04 15:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim

Hello,

I was able to trigger another oops with the perf_fuzzer with current git.

This is 4.20-rc5 after the fix for the very similar oops I previously 
reported got committed.

It seems to be pointing to the same location in the source as 
before, I guess maybe triggered a different way?

Unfortunately this crash is not easily reproducible like the last one was.

kernel/events/core.c:6393

if (sample_type & PERF_SAMPLE_CALLCHAIN) {
                int size = 1;

                if (!(sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY))
                        data->callchain = perf_callchain(event, regs);

>>>>>>>>>       size += data->callchain->nr;

                header->size += size * sizeof(u64);
        }


Vince

[45050.698745] general protection fault: 0000 [#1] SMP PTI
[45050.698745] CPU: 5 PID: 13475 Comm: perf_fuzzer Tainted: G        W         4.20.0-rc5 #124
[45050.698746] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[45050.698746] RIP: 0010:perf_prepare_sample+0x82/0x4a0
[45050.698746] Code: 06 4c 89 ea 4c 89 e6 e8 3c 54 ff ff 40 f6 c5 01 0f 85 28 01 00 00 40 f6 c5 20 74 1c 48 85 ed 0f 89 04 01 00 00 49 8b 44 24 70 <48> 8b 00 8d 04 c5 08 00 00 00 66 01 43 06 f7 c5 00 04 00 00 74 41
[45050.698747] RSP: 0000:ffffc900206bfb00 EFLAGS: 00010082
[45050.698747] RAX: dead000000000200 RBX: ffffc900206bfb58 RCX: 000000000000001f
[45050.698747] RDX: 0000000000000000 RSI: 0000000025bbf56f RDI: 0000000000000000
[45050.698748] RBP: 8000000000000275 R08: 0000000000000002 R09: 00000000000215c0
[45050.698748] R10: 00008b25b2e2f5c8 R11: 0000000000000000 R12: ffffc900206bfc40
[45050.698748] R13: ffff8880cf6d7800 R14: ffffc900206bfb98 R15: ffff88811ab4f420
[45050.698748] FS:  00007fab66133500(0000) GS:ffff88811ab40000(0000) knlGS:0000000000000000
[45050.698749] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[45050.698749] CR2: 00007fab66133480 CR3: 00000000811aa004 CR4: 00000000001607e0
[45050.698749] DR0: 0000000000000000 DR1: 000000008e8e8000 DR2: 0000000000000000
[45050.698749] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[45050.698750] Call Trace:
[45050.698750]  intel_pmu_drain_bts_buffer+0x151/0x220
[45050.698750]  ? mem_cgroup_commit_charge+0x7a/0x510
[45050.698750]  ? wp_page_copy+0x39e/0x650
[45050.698750]  ? reuse_swap_page+0x129/0x340
[45050.698751]  ? _raw_spin_unlock+0xa/0x10
[45050.698751]  ? do_wp_page+0x30f/0x4d0
[45050.698751]  ? finish_mkwrite_fault+0x140/0x140
[45050.698751]  ? __handle_mm_fault+0xb22/0x12c0
[45050.698751]  intel_pmu_handle_irq+0x6d/0x160
[45050.698752]  perf_event_nmi_handler+0x2d/0x50
[45050.698752]  nmi_handle+0x63/0x110
[45050.698752]  default_do_nmi+0x4e/0x100
[45050.698752]  do_nmi+0x112/0x170
[45050.698752]  nmi+0x8b/0xd4
[45050.698753] RIP: 0033:0x558a6a6366c3
[45050.698753] Code: 01 d0 48 c1 e0 06 48 89 c2 48 8d 05 cf 93 23 00 48 8b 04 02 48 85 c0 74 11 8b 45 f8 3b 45 f4 75 05 8b 45 fc eb 16 83 45 f8 01 <83> 45 fc 01 81 7d fc 9f 86 01 00 7e 96 b8 ff ff ff ff c9 c3 55 48
[45050.698753] RSP: 002b:00007ffc9f521660 EFLAGS: 00000246
[45050.698754] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000030
[45050.698754] RDX: 000000000000e740 RSI: 00007ffc9f521634 RDI: 00007fab6612c740
[45050.698754] RBP: 00007ffc9f521670 R08: 00007fab6612c1f0 R09: 00007fab6612c240
[45050.698754] R10: 00007fab661337d0 R11: 0000000000000246 R12: 0000558a6a6364c0
[45050.698755] R13: 00007ffc9f523ad0 R14: 0000000000000000 R15: 0000000000000000
[45050.698755] Modules linked in: intel_rapl x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel coretemp tpm_tis snd_hda_codec snd_hda_core kvm_intel tpm_tis_core i915 snd_hwdep kvm tpm snd_pcm rng_core wmi_bmof mei_me sg iosf_mbi irqbypass drm_kms_helper evdev crct10dif_pclmul drm mei iTCO_wdt i2c_algo_bit iTCO_vendor_support snd_timer pcc_cpufreq crc32_pclmul ghash_clmulni_intel aesni_intel snd video aes_x86_64 crypto_simd cryptd glue_helper soundcore pcspkr wmi button binfmt_misc ip_tables x_tables autofs4 sr_mod sd_mod cdrom ahci libahci ehci_pci xhci_pci libata xhci_hcd ehci_hcd lpc_ich mfd_core crc32c_intel scsi_mod e1000e i2c_i801 usbcore usb_common fan thermal
[45051.027024] ---[ end trace 9565944010fbdf23 ]---
[45051.027024] RIP: 0010:perf_prepare_sample+0x82/0x4a0
[45051.027025] Code: 06 4c 89 ea 4c 89 e6 e8 3c 54 ff ff 40 f6 c5 01 0f 85 28 01 00 00 40 f6 c5 20 74 1c 48 85 ed 0f 89 04 01 00 00 49 8b 44 24 70 <48> 8b 00 8d 04 c5 08 00 00 00 66 01 43 06 f7 c5 00 04 00 00 74 41
[45051.027025] RSP: 0000:ffffc900206bfb00 EFLAGS: 00010082
[45051.027025] RAX: dead000000000200 RBX: ffffc900206bfb58 RCX: 000000000000001f
[45051.027025] RDX: 0000000000000000 RSI: 0000000025bbf56f RDI: 0000000000000000
[45051.027026] RBP: 8000000000000275 R08: 0000000000000002 R09: 00000000000215c0
[45051.027026] R10: 00008b25b2e2f5c8 R11: 0000000000000000 R12: ffffc900206bfc40
[45051.027026] R13: ffff8880cf6d7800 R14: ffffc900206bfb98 R15: ffff88811ab4f420
[45051.027027] FS:  00007fab66133500(0000) GS:ffff88811ab40000(0000) knlGS:0000000000000000
[45051.027027] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[45051.027027] CR2: 00007fab66133480 CR3: 00000000811aa004 CR4: 00000000001607e0
[45051.027027] DR0: 0000000000000000 DR1: 000000008e8e8000 DR2: 0000000000000000
[45051.027027] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[45051.027028] Kernel panic - not syncing: Fatal exception in interrupt
[45051.027051] Kernel Offset: disabled
[45051.149441] ---[ end Kernel panic - not syncing: Fatal exception in interrupt]---
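
A note for readers decoding the dump above: the Code: bytes test bit 0, bit 5 and the sign bit of RBP just before the faulting load, which suggests (this is an inference from the disassembly, not something stated in the report) that RBP holds sample_type. The sketch below decodes such a value using the PERF_SAMPLE_* bit positions from the perf_event UAPI header. It is plain userspace C, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bit positions follow include/uapi/linux/perf_event.h (4.20 era). */
#define SAMPLE_CALLCHAIN       (1ULL << 5)
#define SAMPLE_CALLCHAIN_EARLY (1ULL << 63) /* __PERF_SAMPLE_CALLCHAIN_EARLY */

/* Names for bits 0..19, in bit order. */
static const char *const sample_bit_names[] = {
	"IP", "TID", "TIME", "ADDR", "READ", "CALLCHAIN", "ID", "CPU",
	"PERIOD", "STREAM_ID", "RAW", "BRANCH_STACK", "REGS_USER",
	"STACK_USER", "WEIGHT", "DATA_SRC", "IDENTIFIER", "TRANSACTION",
	"REGS_INTR", "PHYS_ADDR",
};

/* Hypothetical helper: print each recognised bit, return how many were set. */
static int decode_sample_type(uint64_t sample_type)
{
	size_t n = sizeof(sample_bit_names) / sizeof(sample_bit_names[0]);
	int known = 0;

	for (size_t i = 0; i < n; i++) {
		if (sample_type & (1ULL << i)) {
			printf("PERF_SAMPLE_%s\n", sample_bit_names[i]);
			known++;
		}
	}
	if (sample_type & SAMPLE_CALLCHAIN_EARLY) {
		printf("__PERF_SAMPLE_CALLCHAIN_EARLY\n");
		known++;
	}
	return known;
}

/*
 * True when the quoted snippet would skip perf_callchain() and use
 * whatever data->callchain already contains.
 */
static bool callchain_supplied_by_caller(uint64_t sample_type)
{
	return (sample_type & SAMPLE_CALLCHAIN) &&
	       (sample_type & SAMPLE_CALLCHAIN_EARLY);
}
```

Under this reading, the RBP value here (0x8000000000000275) and the one in the later oops in this thread (0x80000000000bb068) both have CALLCHAIN and the internal "early" bit set, so data->callchain would be dereferenced without having been filled in by perf_callchain().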


* Re: perf: perf_fuzzer triggers GPF in perf_prepare_sample
  2018-12-04 15:54 perf: perf_fuzzer triggers GPF in perf_prepare_sample Vince Weaver
@ 2018-12-05 12:45 ` Jiri Olsa
  2018-12-05 16:38   ` Jiri Olsa
  0 siblings, 1 reply; 9+ messages in thread
From: Jiri Olsa @ 2018-12-05 12:45 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Namhyung Kim

On Tue, Dec 04, 2018 at 10:54:55AM -0500, Vince Weaver wrote:
> Hello,
> 
> I was able to trigger another oops with the perf_fuzzer with current git.
> 
> This is 4.20-rc5 after the fix for the very similar oops I previously 
> reported got committed.
> 
> It seems to be pointing to the same location in the source as 
> before, I guess maybe triggered a different way?

nice.. yep, looks the same

> 
> Unfortunately this crash is not easily reproducible like the last one was.

will check

jirka

> 
> kernel/events/core.c:6393
> 
> if (sample_type & PERF_SAMPLE_CALLCHAIN) {
>                 int size = 1;
> 
>                 if (!(sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY))
>                         data->callchain = perf_callchain(event, regs);
> 
> >>>>>>>>>       size += data->callchain->nr;
> 
>                 header->size += size * sizeof(u64);
>         }
> 
> 
> Vince
> 
> [45050.698745] general protection fault: 0000 [#1] SMP PTI
> [45050.698745] CPU: 5 PID: 13475 Comm: perf_fuzzer Tainted: G        W         4.20.0-rc5 #124
> [45050.698746] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> [45050.698746] RIP: 0010:perf_prepare_sample+0x82/0x4a0
> [45050.698746] Code: 06 4c 89 ea 4c 89 e6 e8 3c 54 ff ff 40 f6 c5 01 0f 85 28 01 00 00 40 f6 c5 20 74 1c 48 85 ed 0f 89 04 01 00 00 49 8b 44 24 70 <48> 8b 00 8d 04 c5 08 00 00 00 66 01 43 06 f7 c5 00 04 00 00 74 41
> [45050.698747] RSP: 0000:ffffc900206bfb00 EFLAGS: 00010082
> [45050.698747] RAX: dead000000000200 RBX: ffffc900206bfb58 RCX: 000000000000001f
> [45050.698747] RDX: 0000000000000000 RSI: 0000000025bbf56f RDI: 0000000000000000
> [45050.698748] RBP: 8000000000000275 R08: 0000000000000002 R09: 00000000000215c0
> [45050.698748] R10: 00008b25b2e2f5c8 R11: 0000000000000000 R12: ffffc900206bfc40
> [45050.698748] R13: ffff8880cf6d7800 R14: ffffc900206bfb98 R15: ffff88811ab4f420
> [45050.698748] FS:  00007fab66133500(0000) GS:ffff88811ab40000(0000) knlGS:0000000000000000
> [45050.698749] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [45050.698749] CR2: 00007fab66133480 CR3: 00000000811aa004 CR4: 00000000001607e0
> [45050.698749] DR0: 0000000000000000 DR1: 000000008e8e8000 DR2: 0000000000000000
> [45050.698749] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> [45050.698750] Call Trace:
> [45050.698750]  intel_pmu_drain_bts_buffer+0x151/0x220
> [45050.698750]  ? mem_cgroup_commit_charge+0x7a/0x510
> [45050.698750]  ? wp_page_copy+0x39e/0x650
> [45050.698750]  ? reuse_swap_page+0x129/0x340
> [45050.698751]  ? _raw_spin_unlock+0xa/0x10
> [45050.698751]  ? do_wp_page+0x30f/0x4d0
> [45050.698751]  ? finish_mkwrite_fault+0x140/0x140
> [45050.698751]  ? __handle_mm_fault+0xb22/0x12c0
> [45050.698751]  intel_pmu_handle_irq+0x6d/0x160
> [45050.698752]  perf_event_nmi_handler+0x2d/0x50
> [45050.698752]  nmi_handle+0x63/0x110
> [45050.698752]  default_do_nmi+0x4e/0x100
> [45050.698752]  do_nmi+0x112/0x170
> [45050.698752]  nmi+0x8b/0xd4
> [45050.698753] RIP: 0033:0x558a6a6366c3
> [45050.698753] Code: 01 d0 48 c1 e0 06 48 89 c2 48 8d 05 cf 93 23 00 48 8b 04 02 48 85 c0 74 11 8b 45 f8 3b 45 f4 75 05 8b 45 fc eb 16 83 45 f8 01 <83> 45 fc 01 81 7d fc 9f 86 01 00 7e 96 b8 ff ff ff ff c9 c3 55 48
> [45050.698753] RSP: 002b:00007ffc9f521660 EFLAGS: 00000246
> [45050.698754] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000030
> [45050.698754] RDX: 000000000000e740 RSI: 00007ffc9f521634 RDI: 00007fab6612c740
> [45050.698754] RBP: 00007ffc9f521670 R08: 00007fab6612c1f0 R09: 00007fab6612c240
> [45050.698754] R10: 00007fab661337d0 R11: 0000000000000246 R12: 0000558a6a6364c0
> [45050.698755] R13: 00007ffc9f523ad0 R14: 0000000000000000 R15: 0000000000000000
> [45050.698755] Modules linked in: intel_rapl x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel coretemp tpm_tis snd_hda_codec snd_hda_core kvm_intel tpm_tis_core i915 snd_hwdep kvm tpm snd_pcm rng_core wmi_bmof mei_me sg iosf_mbi irqbypass drm_kms_helper evdev crct10dif_pclmul drm mei iTCO_wdt i2c_algo_bit iTCO_vendor_support snd_timer pcc_cpufreq crc32_pclmul ghash_clmulni_intel aesni_intel snd video aes_x86_64 crypto_simd cryptd glue_helper soundcore pcspkr wmi button binfmt_misc ip_tables x_tables autofs4 sr_mod sd_mod cdrom ahci libahci ehci_pci xhci_pci libata xhci_hcd ehci_hcd lpc_ich mfd_core crc32c_intel scsi_mod e1000e i2c_i801 usbcore usb_common fan thermal
> [45051.027024] ---[ end trace 9565944010fbdf23 ]---
> [45051.027024] RIP: 0010:perf_prepare_sample+0x82/0x4a0
> [45051.027025] Code: 06 4c 89 ea 4c 89 e6 e8 3c 54 ff ff 40 f6 c5 01 0f 85 28 01 00 00 40 f6 c5 20 74 1c 48 85 ed 0f 89 04 01 00 00 49 8b 44 24 70 <48> 8b 00 8d 04 c5 08 00 00 00 66 01 43 06 f7 c5 00 04 00 00 74 41
> [45051.027025] RSP: 0000:ffffc900206bfb00 EFLAGS: 00010082
> [45051.027025] RAX: dead000000000200 RBX: ffffc900206bfb58 RCX: 000000000000001f
> [45051.027025] RDX: 0000000000000000 RSI: 0000000025bbf56f RDI: 0000000000000000
> [45051.027026] RBP: 8000000000000275 R08: 0000000000000002 R09: 00000000000215c0
> [45051.027026] R10: 00008b25b2e2f5c8 R11: 0000000000000000 R12: ffffc900206bfc40
> [45051.027026] R13: ffff8880cf6d7800 R14: ffffc900206bfb98 R15: ffff88811ab4f420
> [45051.027027] FS:  00007fab66133500(0000) GS:ffff88811ab40000(0000) knlGS:0000000000000000
> [45051.027027] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [45051.027027] CR2: 00007fab66133480 CR3: 00000000811aa004 CR4: 00000000001607e0
> [45051.027027] DR0: 0000000000000000 DR1: 000000008e8e8000 DR2: 0000000000000000
> [45051.027027] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> [45051.027028] Kernel panic - not syncing: Fatal exception in interrupt
> [45051.027051] Kernel Offset: disabled
> [45051.149441] ---[ end Kernel panic - not syncing: Fatal exception in interrupt]---


* Re: perf: perf_fuzzer triggers GPF in perf_prepare_sample
  2018-12-05 12:45 ` Jiri Olsa
@ 2018-12-05 16:38   ` Jiri Olsa
  2018-12-05 17:11     ` Vince Weaver
  0 siblings, 1 reply; 9+ messages in thread
From: Jiri Olsa @ 2018-12-05 16:38 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Namhyung Kim

On Wed, Dec 05, 2018 at 01:45:38PM +0100, Jiri Olsa wrote:
> On Tue, Dec 04, 2018 at 10:54:55AM -0500, Vince Weaver wrote:
> > Hello,
> > 
> > I was able to trigger another oops with the perf_fuzzer with current git.
> > 
> > This is 4.20-rc5 after the fix for the very similar oops I previously 
> > reported got committed.
> > 
> > It seems to be pointing to the same location in the source as 
> > before, I guess maybe triggered a different way?
> 
> nice.. yep, looks the same
> 
> > 
> > Unfortunately this crash is not easily reproducible like the last one was.
> 
> will check

what model are you hitting this on?

jirka


* Re: perf: perf_fuzzer triggers GPF in perf_prepare_sample
  2018-12-05 16:38   ` Jiri Olsa
@ 2018-12-05 17:11     ` Vince Weaver
  2018-12-05 18:33       ` Jiri Olsa
  0 siblings, 1 reply; 9+ messages in thread
From: Vince Weaver @ 2018-12-05 17:11 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Namhyung Kim

On Wed, 5 Dec 2018, Jiri Olsa wrote:

> On Wed, Dec 05, 2018 at 01:45:38PM +0100, Jiri Olsa wrote:
> > On Tue, Dec 04, 2018 at 10:54:55AM -0500, Vince Weaver wrote:
> > > Hello,
> > > 
> > > I was able to trigger another oops with the perf_fuzzer with current git.
> > > 
> > > This is 4.20-rc5 after the fix for the very similar oops I previously 
> > > reported got committed.
> > > 
> > > It seems to be pointing to the same location in the source as 
> > > before, I guess maybe triggered a different way?
> > 
> > nice.. yep, looks the same
> > 
> > > 
> > > Unfortunately this crash is not easily reproducible like the last one was.
> > 
> > will check
> 
> what model are you hitting this on?

Haswell.  6/60/3.

While I can't deterministically trigger this, the fuzzer usually hits it
within an hour or two.  Are there any debug or printk messages I can
add that would help figure out what's going on?

Vince




* Re: perf: perf_fuzzer triggers GPF in perf_prepare_sample
  2018-12-05 17:11     ` Vince Weaver
@ 2018-12-05 18:33       ` Jiri Olsa
  2018-12-06 15:35         ` Vince Weaver
  0 siblings, 1 reply; 9+ messages in thread
From: Jiri Olsa @ 2018-12-05 18:33 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Namhyung Kim,
	Andi Kleen

On Wed, Dec 05, 2018 at 12:11:19PM -0500, Vince Weaver wrote:
> On Wed, 5 Dec 2018, Jiri Olsa wrote:
> 
> > On Wed, Dec 05, 2018 at 01:45:38PM +0100, Jiri Olsa wrote:
> > > On Tue, Dec 04, 2018 at 10:54:55AM -0500, Vince Weaver wrote:
> > > > Hello,
> > > > 
> > > > I was able to trigger another oops with the perf_fuzzer with current git.
> > > > 
> > > > This is 4.20-rc5 after the fix for the very similar oops I previously 
> > > > reported got committed.
> > > > 
> > > > It seems to be pointing to the same location in the source as 
> > > > before, I guess maybe triggered a different way?
> > > 
> > > nice.. yep, looks the same
> > > 
> > > > 
> > > > Unfortunately this crash is not easily reproducible like the last one was.
> > > 
> > > will check
> > 
> > what model are you hitting this on?
> 
> Haswell.  6/60/3.
> 
> While I can't deterministically trigger this, the fuzzer usually hits it
> within an hour or two.  Are there any debug or printk messages I can
> add that would help figure out what's going on?

I can't see how we could end up with that config other than
through some corruption.. the only way I can see is that we touch
the cpuc->events array without checking its active_mask bit

but that does not explain why the crash happened in the same
place as before

jirka


---
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index ecc3e34ca955..9a2fd5a68d87 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2404,7 +2404,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 	struct cpu_hw_events *cpuc;
 	int loops;
 	u64 status;
-	int handled;
+	int handled = 0;
 	int pmu_enabled;
 
 	cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -2423,8 +2423,10 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 	intel_bts_disable_local();
 	cpuc->enabled = 0;
 	__intel_pmu_disable_all();
-	handled = intel_pmu_drain_bts_buffer();
-	handled += intel_bts_interrupt();
+	if (test_bit(INTEL_PMC_IDX_FIXED_BTS, cpuc->active_mask)) {
+		handled += intel_pmu_drain_bts_buffer();
+		handled += intel_bts_interrupt();
+	}
 	status = intel_pmu_get_status();
 	if (!status)
 		goto done;
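
The idea behind the test_bit() guard in the patch above, checking a slot's bit in the active mask before touching that slot of the events array, can be illustrated outside the kernel. The following is a minimal userspace sketch only: the struct, the helper names and the slot index are all hypothetical stand-ins, not the kernel's cpu_hw_events or INTEL_PMC_IDX_FIXED_BTS definitions.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLOTS     64
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS    ((MAX_SLOTS + BITS_PER_WORD - 1) / BITS_PER_WORD)
#define DEMO_BTS_SLOT 48 /* hypothetical stand-in for the BTS index */

struct demo_event {
	int id;
};

/* Hypothetical per-CPU state: one mask bit per slot of the events array. */
struct demo_cpu_events {
	unsigned long active_mask[MASK_WORDS];
	struct demo_event *events[MAX_SLOTS];
};

static bool slot_is_active(const struct demo_cpu_events *c, unsigned int slot)
{
	return (c->active_mask[slot / BITS_PER_WORD] >>
		(slot % BITS_PER_WORD)) & 1UL;
}

static void slot_set_active(struct demo_cpu_events *c, unsigned int slot)
{
	c->active_mask[slot / BITS_PER_WORD] |= 1UL << (slot % BITS_PER_WORD);
}

/*
 * Returns 1 if the slot was handled, 0 if it was skipped. A pointer left
 * behind in events[slot] is ignored unless the mask bit is also set.
 */
static int drain_slot(struct demo_cpu_events *c, unsigned int slot)
{
	if (!slot_is_active(c, slot))
		return 0;
	return c->events[slot] != NULL;
}
```

With the mask bit clear, drain_slot() returns 0 even when events[slot] still holds a stale pointer, which is the situation the guard is meant to rule out.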


* Re: perf: perf_fuzzer triggers GPF in perf_prepare_sample
  2018-12-05 18:33       ` Jiri Olsa
@ 2018-12-06 15:35         ` Vince Weaver
  2018-12-06 15:44           ` Jiri Olsa
  0 siblings, 1 reply; 9+ messages in thread
From: Vince Weaver @ 2018-12-06 15:35 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Vince Weaver, linux-kernel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Namhyung Kim,
	Andi Kleen

On Wed, 5 Dec 2018, Jiri Olsa wrote:

> On Wed, Dec 05, 2018 at 12:11:19PM -0500, Vince Weaver wrote:
> > On Wed, 5 Dec 2018, Jiri Olsa wrote:
> > 
> > > On Wed, Dec 05, 2018 at 01:45:38PM +0100, Jiri Olsa wrote:
> > > > On Tue, Dec 04, 2018 at 10:54:55AM -0500, Vince Weaver wrote:
> > > > > Hello,
> > > > > 
> > > > > I was able to trigger another oops with the perf_fuzzer with current git.
> > > > > 
> > > > > This is 4.20-rc5 after the fix for the very similar oops I previously 
> > > > > reported got committed.
> > > > > 
> > > > > It seems to be pointing to the same location in the source as 
> > > > > before, I guess maybe triggered a different way?
> > > > 
> > > > nice.. yep, looks the same
> > > > 
> > > > > 
> > > > > Unfortunately this crash is not easily reproducible like the last one was.
> > > > 
> > > > will check
> > > 
> > > what model are you hitting this on?
> > 
> > Haswell.  6/60/3.
> > 
> > While I can't deterministically trigger this, the fuzzer usually hits it
> > within an hour or two.  Are there any debug or printk messages I can
> > add that would help figure out what's going on?
> 
> I can't see how we could end up with that config other than
> through some corruption.. the only way I can see is that we touch
> the cpuc->events array without checking its active_mask bit
> 
> but that does not explain why the crash happened in the same
> place as before

Maybe it is a corruption issue.  I had applied my own debug patch that 
would dump some info if data->callchain was NULL.

But my debug code didn't trigger this time because it looks like 
data->callchain was "1" rather than "0".

[27764.840179] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
[27764.840179] PGD 0 P4D 0 
[27764.840180] Oops: 0000 [#1] SMP PTI
[27764.840180] CPU: 1 PID: 18687 Comm: perf_fuzzer Tainted: G        W         4.20.0-rc5+ #125
[27764.840180] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014

Vince
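
A check that only compares against NULL misses the other two values seen in this thread: 1 (reported by the kernel as a NULL pointer dereference because it falls in the first page) and 0xdead000000000200 (RAX in the first report, a non-canonical address, hence the general protection fault rather than a page fault). A broader plausibility test could look roughly like the sketch below. It is userspace C with hypothetical names, and it assumes 4 KiB pages and 48-bit virtual addresses; it is not the debug patch mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

#define ASSUMED_PAGE_SIZE 4096ULL /* assumption: 4 KiB pages */

enum ptr_class {
	PTR_NULL,         /* exactly zero */
	PTR_LOW,          /* non-zero but inside the first page */
	PTR_NONCANONICAL, /* would raise #GP on x86-64 */
	PTR_PLAUSIBLE,    /* passes these cheap checks, may still be bad */
};

/* With 48-bit addressing, bits 63..47 must be all zeros or all ones. */
static bool is_canonical48(uint64_t addr)
{
	uint64_t top = addr >> 47;

	return top == 0 || top == 0x1ffff;
}

/* Hypothetical helper: classify a raw pointer value before dereferencing. */
static enum ptr_class classify_pointer(uint64_t addr)
{
	if (addr == 0)
		return PTR_NULL;
	if (addr < ASSUMED_PAGE_SIZE)
		return PTR_LOW;
	if (!is_canonical48(addr))
		return PTR_NONCANONICAL;
	return PTR_PLAUSIBLE;
}
```

A debug hook that dumps state whenever the result is anything other than PTR_PLAUSIBLE would have fired for 0, 1 and the 0xdead... value alike.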


* Re: perf: perf_fuzzer triggers GPF in perf_prepare_sample
  2018-12-06 15:35         ` Vince Weaver
@ 2018-12-06 15:44           ` Jiri Olsa
  2018-12-09  2:08             ` Vince Weaver
  0 siblings, 1 reply; 9+ messages in thread
From: Jiri Olsa @ 2018-12-06 15:44 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Namhyung Kim,
	Andi Kleen

On Thu, Dec 06, 2018 at 10:35:28AM -0500, Vince Weaver wrote:
> On Wed, 5 Dec 2018, Jiri Olsa wrote:
> 
> > On Wed, Dec 05, 2018 at 12:11:19PM -0500, Vince Weaver wrote:
> > > On Wed, 5 Dec 2018, Jiri Olsa wrote:
> > > 
> > > > On Wed, Dec 05, 2018 at 01:45:38PM +0100, Jiri Olsa wrote:
> > > > > On Tue, Dec 04, 2018 at 10:54:55AM -0500, Vince Weaver wrote:
> > > > > > Hello,
> > > > > > 
> > > > > > I was able to trigger another oops with the perf_fuzzer with current git.
> > > > > > 
> > > > > > This is 4.20-rc5 after the fix for the very similar oops I previously 
> > > > > > reported got committed.
> > > > > > 
> > > > > > It seems to be pointing to the same location in the source as 
> > > > > > before, I guess maybe triggered a different way?
> > > > > 
> > > > > nice.. yep, looks the same
> > > > > 
> > > > > > 
> > > > > > Unfortunately this crash is not easily reproducible like the last one was.
> > > > > 
> > > > > will check
> > > > 
> > > > what model are you hitting this on?
> > > 
> > > Haswell.  6/60/3.
> > > 
> > > While I can't deterministically trigger this, the fuzzer usually hits it
> > > within an hour or two.  Are there any debug or printk messages I can
> > > add that would help figure out what's going on?
> > 
> > I can't see how we could end up with that config other than
> > through some corruption.. the only way I can see is that we touch
> > the cpuc->events array without checking its active_mask bit
> > 
> > but that does not explain why the crash happened in the same
> > place as before
> 
> Maybe it is a corruption issue.  I had applied my own debug patch that 
> would dump some info if data->callchain was NULL.
> 
> But my debug code didn't trigger this time because it looks like 
> data->callchain was "1" rather than "0".
> 
> [27764.840179] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> [27764.840179] PGD 0 P4D 0 
> [27764.840180] Oops: 0000 [#1] SMP PTI
> [27764.840180] CPU: 1 PID: 18687 Comm: perf_fuzzer Tainted: G        W         4.20.0-rc5+ #125
> [27764.840180] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014

actually, could you try that patch from my previous email?

thanks,
jirka


* Re: perf: perf_fuzzer triggers GPF in perf_prepare_sample
  2018-12-06 15:44           ` Jiri Olsa
@ 2018-12-09  2:08             ` Vince Weaver
  2018-12-09 11:55               ` Jiri Olsa
  0 siblings, 1 reply; 9+ messages in thread
From: Vince Weaver @ 2018-12-09  2:08 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Namhyung Kim,
	Andi Kleen

On Thu, 6 Dec 2018, Jiri Olsa wrote:

> On Thu, Dec 06, 2018 at 10:35:28AM -0500, Vince Weaver wrote:
> > On Wed, 5 Dec 2018, Jiri Olsa wrote:
> > Maybe it is a corruption issue.  I had applied my own debug patch that 
> > would dump some info if data->callchain was NULL.
> > 
> > But my debug code didn't trigger this time because it looks like 
> > data->callchain was "1" rather than "0".
> > 
> > [27764.840179] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> > [27764.840179] PGD 0 P4D 0 
> > [27764.840180] Oops: 0000 [#1] SMP PTI
> > [27764.840180] CPU: 1 PID: 18687 Comm: perf_fuzzer Tainted: G        W         4.20.0-rc5+ #125
> > [27764.840180] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> 
> actually, could you try that patch from my previous email?
> 
still crashes with your patch (see below)

I've also been able to replicate this crash on a skylake machine in 
addition to the haswell machine.

Vince

[28269.147232] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[28269.155628] PGD 0 P4D 0 
[28269.158360] Oops: 0000 [#1] SMP PTI
[28269.162087] CPU: 0 PID: 1189 Comm: perf_fuzzer Tainted: G        W         4.20.0-rc5+ #128
[28269.171011] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[28269.178935] RIP: 0010:perf_prepare_sample+0x82/0x4a0
[28269.184239] Code: 06 4c 89 ea 4c 89 e6 e8 3c 54 ff ff 40 f6 c5 01 0f 85 28 01 00 00 40 f6 c5 20 74 1c 48 85 ed 0f 89 04 01 00 00 49 8b 44 24 70 <48> 8b 00 8d 04 c5 08 00 00 00 66 01 43 06 f7 c5 00 04 00 00 74 41
[28269.204249] RSP: 0000:ffffc9000aca7a40 EFLAGS: 00010082
[28269.209832] RAX: 0000000000000000 RBX: ffffc9000aca7a98 RCX: ffffc9000aca7ad8
[28269.217484] RDX: 0000000000000000 RSI: ffffc9000aca7b80 RDI: ffffc9000aca7a9e
[28269.225129] RBP: 80000000000bb068 R08: 0000000000000002 R09: 00000000000215c0
[28269.232760] R10: ffff8880ce552000 R11: 0000000000000000 R12: ffffc9000aca7b80
[28269.240380] R13: ffff88803696c800 R14: ffffc9000aca7ad8 R15: ffffe8ffffc06300
[28269.248014] FS:  00007f5927fe7500(0000) GS:ffff88811aa00000(0000) knlGS:0000000000000000
[28269.256606] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[28269.262739] CR2: 0000000000000000 CR3: 0000000116d98001 CR4: 00000000001607f0
[28269.270349] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[28269.277968] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[28269.285639] Call Trace:
[28269.288266]  intel_pmu_drain_bts_buffer+0x151/0x220
[28269.293476]  ? radix_tree_delete_item+0x69/0xc0
[28269.298378]  x86_pmu_stop+0x3b/0x90
[28269.302113]  x86_pmu_del+0x57/0x160
[28269.305840]  event_sched_out.isra.106+0x81/0x170
[28269.310780]  group_sched_out.part.108+0x51/0xc0
[28269.315634]  ctx_sched_out+0xf8/0x220
[28269.319551]  __perf_event_task_sched_out+0x18d/0x3f0
[28269.324866]  ? pick_next_task_fair+0x60a/0x660
[28269.329639]  __schedule+0x4b9/0x820
[28269.333367]  ? kill_pid_info+0x34/0x50
[28269.337360]  schedule+0x28/0x80
[28269.340725]  exit_to_usermode_loop+0x4e/0xc0
[28269.345272]  prepare_exit_to_usermode+0x53/0x80
[28269.350109]  retint_user+0x8/0x8
[28269.353541] RIP: 0033:0x56154980b6c3
[28269.357346] Code: 01 d0 48 c1 e0 06 48 89 c2 48 8d 05 cf 93 23 00 48 8b 04 02 48 85 c0 74 11 8b 45 f8 3b 45 f4 75 05 8b 45 fc eb 16 83 45 f8 01 <83> 45 fc 01 81 7d fc 9f 86 01 00 7e 96 b8 ff ff ff ff c9 c3 55 48
[28269.377462] RSP: 002b:00007ffc6a1540a0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[28269.385562] RAX: 0000000000000000 RBX: 000000000000000c RCX: 000000000000003c
[28269.393182] RDX: 0000000000b895c0 RSI: 00007ffc6a154074 RDI: 00007f5927fe0740
[28269.400835] RBP: 00007ffc6a1540b0 R08: 00007f5927fe01f0 R09: 00007f5927fe0240
[28269.408452] R10: 0000000000000000 R11: 0000000000000246 R12: 000056154980b4c0
[28269.416080] R13: 00007ffc6a156510 R14: 0000000000000000 R15: 0000000000000000
[28269.423723] Modules linked in: snd_hda_codec_hdmi intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm i915 irqbypass crct10dif_pclmul crc32_pclmul iosf_mbi ghash_clmulni_intel drm_kms_helper aesni_intel snd_hda_codec_realtek aes_x86_64 crypto_simd drm cryptd snd_hda_codec_generic i2c_algo_bit snd_hda_intel evdev glue_helper snd_hda_codec snd_hda_core iTCO_wdt mei_me mei wmi_bmof tpm_tis snd_hwdep tpm_tis_core pcc_cpufreq pcspkr iTCO_vendor_support snd_pcm tpm sg rng_core button snd_timer video snd soundcore wmi binfmt_misc ip_tables x_tables autofs4 sr_mod sd_mod cdrom ahci xhci_pci ehci_pci libahci xhci_hcd ehci_hcd libata usbcore lpc_ich mfd_core e1000e scsi_mod i2c_i801 crc32c_intel usb_common fan thermal
[28269.492702] CR2: 0000000000000000
[28269.496246] ---[ end trace 6775846bfda0f18b ]---
[28269.501186] RIP: 0010:perf_prepare_sample+0x82/0x4a0
[28269.506482] Code: 06 4c 89 ea 4c 89 e6 e8 3c 54 ff ff 40 f6 c5 01 0f 85 28 01 00 00 40 f6 c5 20 74 1c 48 85 ed 0f 89 04 01 00 00 49 8b 44 24 70 <48> 8b 00 8d 04 c5 08 00 00 00 66 01 43 06 f7 c5 00 04 00 00 74 41
[28269.526587] RSP: 0000:ffffc9000aca7a40 EFLAGS: 00010082
[28269.532176] RAX: 0000000000000000 RBX: ffffc9000aca7a98 RCX: ffffc9000aca7ad8
[28269.539805] RDX: 0000000000000000 RSI: ffffc9000aca7b80 RDI: ffffc9000aca7a9e
[28269.547450] RBP: 80000000000bb068 R08: 0000000000000002 R09: 00000000000215c0
[28269.555075] R10: ffff8880ce552000 R11: 0000000000000000 R12: ffffc9000aca7b80
[28269.562694] R13: ffff88803696c800 R14: ffffc9000aca7ad8 R15: ffffe8ffffc06300
[28269.570329] FS:  00007f5927fe7500(0000) GS:ffff88811aa00000(0000) knlGS:0000000000000000
[28269.578960] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[28269.585123] CR2: 0000000000000000 CR3: 0000000116d98001 CR4: 00000000001607f0
[28269.592740] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[28269.600358] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600


* Re: perf: perf_fuzzer triggers GPF in perf_prepare_sample
  2018-12-09  2:08             ` Vince Weaver
@ 2018-12-09 11:55               ` Jiri Olsa
  0 siblings, 0 replies; 9+ messages in thread
From: Jiri Olsa @ 2018-12-09 11:55 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Namhyung Kim,
	Andi Kleen

On Sat, Dec 08, 2018 at 09:08:28PM -0500, Vince Weaver wrote:
> On Thu, 6 Dec 2018, Jiri Olsa wrote:
> 
> > On Thu, Dec 06, 2018 at 10:35:28AM -0500, Vince Weaver wrote:
> > > On Wed, 5 Dec 2018, Jiri Olsa wrote:
> > > Maybe it is a corruption issue.  I had applied my own debug patch that 
> > > would dump some info if data->callchain was NULL.
> > > 
> > > But my debug code didn't trigger this time because it looks like 
> > > data->callchain was "1" rather than "0".
> > > 
> > > [27764.840179] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> > > [27764.840179] PGD 0 P4D 0 
> > > [27764.840180] Oops: 0000 [#1] SMP PTI
> > > [27764.840180] CPU: 1 PID: 18687 Comm: perf_fuzzer Tainted: G        W         4.20.0-rc5+ #125
> > > [27764.840180] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> > 
> > actually, could you try that patch from my previous email?
> > 
> still crashes with your patch (see below)
> 
> I've also been able to replicate this crash on a skylake machine in 
> addition to the haswell machine.
> 
> Vince
> 
> [28269.147232] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> [28269.155628] PGD 0 P4D 0 
> [28269.158360] Oops: 0000 [#1] SMP PTI
> [28269.162087] CPU: 0 PID: 1189 Comm: perf_fuzzer Tainted: G        W         4.20.0-rc5+ #128
> [28269.171011] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> [28269.178935] RIP: 0010:perf_prepare_sample+0x82/0x4a0
> [28269.184239] Code: 06 4c 89 ea 4c 89 e6 e8 3c 54 ff ff 40 f6 c5 01 0f 85 28 01 00 00 40 f6 c5 20 74 1c 48 85 ed 0f 89 04 01 00 00 49 8b 44 24 70 <48> 8b 00 8d 04 c5 08 00 00 00 66 01 43 06 f7 c5 00 04 00 00 74 41
> [28269.204249] RSP: 0000:ffffc9000aca7a40 EFLAGS: 00010082
> [28269.209832] RAX: 0000000000000000 RBX: ffffc9000aca7a98 RCX: ffffc9000aca7ad8
> [28269.217484] RDX: 0000000000000000 RSI: ffffc9000aca7b80 RDI: ffffc9000aca7a9e
> [28269.225129] RBP: 80000000000bb068 R08: 0000000000000002 R09: 00000000000215c0
> [28269.232760] R10: ffff8880ce552000 R11: 0000000000000000 R12: ffffc9000aca7b80
> [28269.240380] R13: ffff88803696c800 R14: ffffc9000aca7ad8 R15: ffffe8ffffc06300
> [28269.248014] FS:  00007f5927fe7500(0000) GS:ffff88811aa00000(0000) knlGS:0000000000000000
> [28269.256606] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [28269.262739] CR2: 0000000000000000 CR3: 0000000116d98001 CR4: 00000000001607f0
> [28269.270349] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [28269.277968] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> [28269.285639] Call Trace:
> [28269.288266]  intel_pmu_drain_bts_buffer+0x151/0x220
> [28269.293476]  ? radix_tree_delete_item+0x69/0xc0
> [28269.298378]  x86_pmu_stop+0x3b/0x90
> [28269.302113]  x86_pmu_del+0x57/0x160

nice, at least it's in a different callstack context, that might help

thanks,
jirka

