From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754010AbaIYQiN (ORCPT ); Thu, 25 Sep 2014 12:38:13 -0400
Received: from mail-lb0-f179.google.com ([209.85.217.179]:35581 "EHLO
	mail-lb0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751981AbaIYQiK (ORCPT );
	Thu, 25 Sep 2014 12:38:10 -0400
MIME-Version: 1.0
In-Reply-To:
References: <20140908185115.GI6758@twins.programming.kicks-ass.net>
	<20140910083136.GP6758@twins.programming.kicks-ass.net>
	<541059C9.1040200@oracle.com>
	<20140910143306.GD4783@worktop.ger.corp.intel.com>
Date: Thu, 25 Sep 2014 09:38:08 -0700
Message-ID:
Subject: Re: perf: perf_fuzzer triggers instant reboot
From: Cong Wang
To: Vince Weaver
Cc: Peter Zijlstra, Sasha Levin, "linux-kernel@vger.kernel.org",
	Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Sep 24, 2014 at 9:59 PM, Vince Weaver wrote:
>
> So I noticed Cong Wang's patch (3577af70a2ce4853d58e57d832e687d739281479)
>    perf: Fix a race condition in perf_remove_from_context()
>
> and that sounds a lot like the weird fork()/memory-corruption bug that the
> fuzzer has been triggering.
>
> So I applied that patch alone on top of the 3.17-rc4 kernel that I could
> reproducibly reboot... and with the patch I can't trigger the problem
> anymore.
>
> Now that just might mean the patch pushed the code around enough so my
> test doesn't trigger, but there is hope that maybe this fixes things.

I read this as it fixes your crash as well?

>
> Cong Wang, do you have more info on how you came across this bug? And how
> you tracked down the problem?

Sure, as I said in the changelog, it is a soft lockup which was triggered
on dozens of machines here, it is actually pretty straightforward:

[5108912.562963] BUG: soft lockup - CPU#7 stuck for 22s! [perf:13856]
[5108912.563173] Modules linked in: netconsole configfs ipv6 bonding dm_multipath video sbs sbshc hed acpi_pad acpi_memhotplug acpi_ipmi parport_pc lp parport tcp_diag inet_diag ipmi_si ipmi_devintf ipmi_msghandler dell_rbu igb dcdbas shpchp i2c_i801 i2c_core iTCO_wdt i7core_edac edac_core iTCO_vendor_support ioatdma dca microcode
[5108912.563198] CPU 7
[5108912.563199] Modules linked in: netconsole configfs ipv6 bonding dm_multipath video sbs sbshc hed acpi_pad acpi_memhotplug acpi_ipmi parport_pc lp parport tcp_diag inet_diag ipmi_si ipmi_devintf ipmi_msghandler dell_rbu igb dcdbas shpchp i2c_i801 i2c_core iTCO_wdt i7core_edac edac_core iTCO_vendor_support ioatdma dca microcode
[5108912.563216]
[5108912.563219] Pid: 13856, comm: perf Not tainted 3.4.78 #1 Dell Inc. C6100 /0D61XP
[5108912.563222] RIP: 0010:[] [] perf_remove_from_context+0x8d/0xb4
[5108912.563233] RSP: 0018:ffff8809ea39bd48 EFLAGS: 00000202
[5108912.563235] RAX: 000000000000006d RBX: ffffffff810d6dcc RCX: 0000000000000000
[5108912.563237] RDX: ffff88123fc8006d RSI: ffffffff810d6dcc RDI: ffff8808f8541c0c
[5108912.563239] RBP: ffff8809ea39bd88 R08: 0000000000000001 R09: 0000000000000000
[5108912.563241] R10: ffff8809abf95610 R11: ffff880a3a6c331c R12: 0000000000000000
[5108912.563243] R13: 00000000000000ef R14: 0000000000000001 R15: 0000000000000000
[5108912.563245] FS: 0000000000000000(0000) GS:ffff88123fc20000(0000) knlGS:0000000000000000
[5108912.563248] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[5108912.563250] CR2: 00007f692e787180 CR3: 0000000001a0b000 CR4: 00000000000007e0
[5108912.563252] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[5108912.563254] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[5108912.563256] Process perf (pid: 13856, threadinfo ffff8809ea39a000, task ffff880a2167c470)
[5108912.563258] Stack:
[5108912.563260] ffff880a3a6c32c0 ffff88048c7c3820 ffff8809ea39bd88 ffff88048c7c3800
[5108912.563265] ffff88048c7c3800 ffff8808f8541c00 ffff8808f8541c10 ffff88091b56c000
[5108912.563269] ffff8809ea39bdb8 ffffffff810d9bbe ffff8809ea39bdb8 ffff880a2167c470
[5108912.563273] Call Trace:
[5108912.563278] [] perf_event_release_kernel+0x77/0x91
[5108912.563282] [] put_event+0x7e/0x86
[5108912.563285] [] perf_release+0x10/0x14
[5108912.563291] [] __fput+0xfe/0x1f6
[5108912.563294] [] fput+0x1a/0x1c
[5108912.563297] [] filp_close+0x72/0x7d
[5108912.563303] [] put_files_struct+0x6c/0xc3
[5108912.563306] [] exit_files+0x41/0x46
[5108912.563309] [] do_exit+0x292/0x3b6
[5108912.563312] [] do_group_exit+0x7d/0xa5
[5108912.563315] [] sys_exit_group+0x17/0x1b
[5108912.563320] [] system_call_fastpath+0x16/0x1b