* kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
From: CAI Qian @ 2016-10-10 15:37 UTC
To: Rob Herring; +Cc: linux-kernel, Greg Kroah-Hartman

Not sure if anyone reported this before. With this kernel config, it is 100% kernel panic so far with today's
mainline master HEAD.

http://people.redhat.com/qcai/tmp/config-kasan-remove

[ 36.318420] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[ 36.325626] software IO TLB [mem 0x71c7d000-0x75c7d000] (64MB) mapped at [ffff880071c7d000-ffff880075c7cfff]
[ 36.339108] Intel CQM monitoring enabled
[ 36.343507] Intel MBM enabled
[ 36.358713] RAPL PMU: API unit is 2^-32 Joules, 4 fixed counters, 655360 ms ovfl timer
[ 36.367563] RAPL PMU: hw unit of domain pp0-core 2^-14 Joules
[ 36.373984] RAPL PMU: hw unit of domain package 2^-14 Joules
[ 36.380308] RAPL PMU: hw unit of domain dram 2^-14 Joules
[ 36.386337] RAPL PMU: hw unit of domain pp1-gpu 2^-14 Joules
[ 36.410064] kasan: CONFIG_KASAN_INLINE enabled
[ 36.415042] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ 36.423111] general protection fault: 0000 [#1] PREEMPT SMP KASAN
[ 36.429911] Modules linked in:
[ 36.433331] CPU: 48 PID: 1 Comm: swapper/0 Not tainted 4.8.0remove+ #4
[ 36.440616] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[ 36.451974] task: ffff880e524d0000 task.stack: ffff880852880000
[ 36.458578] RIP: 0010:[<ffffffff81ea08c0>] [<ffffffff81ea08c0>] device_del+0x80/0x700
[ 36.467431] RSP: 0000:ffff880852887938 EFLAGS: 00010246
[ 36.473357] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 1ffff10109e6f101
[ 36.481319] RDX: dffffc0000000000 RSI: 000000000000000b RDI: 0000000000000000
[ 36.489281] RBP: ffff8808528879e8 R08: 0000000000000001 R09: 0000000000000000
[ 36.497243] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880e501b4b00
[ 36.505208] R13: ffff880e31988480 R14: 0000000000000001 R15: ffff880e31988480
[ 36.513171] FS: 0000000000000000(0000) GS:ffff88085ec80000(0000) knlGS:0000000000000000
[ 36.522201] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 36.528613] CR2: 0000000000000000 CR3: 0000000002e0a000 CR4: 00000000003406e0
[ 36.536576] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 36.544537] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 36.552499] Stack:
[ 36.554742] 1ffff1010a510f28 1ffff1010a510f2c ffffffff82d3abe4 ffffffff81a6d060
[ 36.563037] 0000000000000296 0000000041b58ab3 ffffffff82d48cc5 ffffffff81ea0840
[ 36.571329] ffffffff828a3040 ffff880800000000 ffff880852887980 ffffffff82f0ba20
[ 36.579624] Call Trace:
[ 36.582355] [<ffffffff81a6d060>] ? idr_mark_full+0xc0/0xc0
[ 36.588573] [<ffffffff81ea0840>] ? cleanup_glue_dir+0xe0/0xe0
[ 36.595086] [<ffffffff814c228d>] perf_pmu_unregister+0x18d/0x530
[ 36.601890] [<ffffffff826f8811>] ? _raw_spin_unlock+0x31/0x50
[ 36.608393] [<ffffffff8103c54e>] ? uncore_pcibus_to_physid+0x10e/0x1c0
[ 36.615766] [<ffffffff810418ee>] uncore_pci_remove+0x24e/0x440
[ 36.622375] [<ffffffff81b91662>] pci_device_remove+0xa2/0x1e0
[ 36.628888] [<ffffffff81eadd01>] driver_probe_device+0x171/0xd50
[ 36.635688] [<ffffffff81eae8e0>] ? driver_probe_device+0xd50/0xd50
[ 36.642685] [<ffffffff81eaea79>] __driver_attach+0x199/0x1e0
[ 36.649097] [<ffffffff81ea7fc6>] bus_for_each_dev+0x126/0x1e0
[ 36.655607] [<ffffffff81ea7ea0>] ? subsys_dev_iter_exit+0x10/0x10
[ 36.662508] [<ffffffff812103ae>] ? preempt_count_sub+0x5e/0xe0
[ 36.669105] [<ffffffff81eacc1d>] driver_attach+0x3d/0x50
[ 36.675129] [<ffffffff81eabd84>] bus_add_driver+0x554/0x790
[ 36.681444] [<ffffffff81eb067c>] driver_register+0x18c/0x3b0
[ 36.687861] [<ffffffff812b3212>] ? __raw_spin_lock_init+0x32/0x100
[ 36.694854] [<ffffffff81b8bbea>] __pci_register_driver+0x13a/0x1e0
[ 36.701853] [<ffffffff83492467>] intel_uncore_init+0x465/0x54f
[ 36.708459] [<ffffffff83492002>] ? uncore_type_init+0x4d6/0x4d6
[ 36.715165] [<ffffffff81002299>] do_one_initcall+0xa9/0x240
[ 36.721473] [<ffffffff810021f0>] ? initcall_blacklisted+0x180/0x180
[ 36.728568] [<ffffffff811f5a10>] ? parse_args+0x520/0x990
[ 36.734692] [<ffffffff811d5bc2>] ? __usermodehelper_set_disable_depth+0x42/0x50
[ 36.742948] [<ffffffff83485d1f>] kernel_init_freeable+0x540/0x610
[ 36.749845] [<ffffffff834857df>] ? start_kernel+0x70d/0x70d
[ 36.756161] [<ffffffff826f88ad>] ? _raw_spin_unlock_irq+0x3d/0x60
[ 36.763060] [<ffffffff8120eb19>] ? finish_task_switch+0x189/0x6c0
[ 36.769957] [<ffffffff8120eaeb>] ? finish_task_switch+0x15b/0x6c0
[ 36.776857] [<ffffffff826e0060>] ? rest_init+0x160/0x160
[ 36.782875] [<ffffffff826e0073>] kernel_init+0x13/0x120
[ 36.788802] [<ffffffff826e0060>] ? rest_init+0x160/0x160
[ 36.794826] [<ffffffff826f93ba>] ret_from_fork+0x2a/0x40
[ 36.800851] Code: 81 c7 00 f1 f1 f1 f1 c7 40 04 00 07 f4 f4 c7 40 08 f3 f3 f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 89 f8 48 c1 e8 03 <80> 3c 10 00 0f 85 1a 06 00 00 48 8b 03 48 89 85 68 ff ff ff 48
[ 36.822549] RIP [<ffffffff81ea08c0>] device_del+0x80/0x700
[ 36.828778] RSP <ffff880852887938>
[ 36.832743] ---[ end trace f3cec3a0c6cb2258 ]---
[ 36.838054] Kernel panic - not syncing: Fatal exception
[ 36.843967] ---[ end Kernel panic - not syncing: Fatal exception
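
Note on the register dump in the report above: the bytes around the faulting instruction in the Code: line (48 89 f8, 48 c1 e8 03, <80> 3c 10 00) decode to mov %rdi,%rax; shr $3,%rax; cmpb $0x0,(%rax,%rdx,1), which looks like an inline KASAN shadow check on the pointer held in RDI. The sketch below redoes that address arithmetic in user-space C. The shift of 3 and the offset 0xdffffc0000000000 are read off the dump (the shr and RDX) and are assumed to be this build's KASAN parameters; the function names are illustrative and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed from the dump: shift taken from "shr $3", offset taken from RDX. */
#define SHADOW_SCALE_SHIFT 3
#define SHADOW_OFFSET 0xdffffc0000000000ULL

/* Address of the shadow byte an inline check would read for addr. */
uint64_t shadow_addr(uint64_t addr)
{
    return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}

/* With 48-bit virtual addresses, bits 63..47 must be all zeros or all ones. */
bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47; /* the 17 uppermost bits */
    return top == 0 || top == 0x1ffffULL;
}

/* Plug in values from the register dump. */
void check_dump_values(void)
{
    /* RDI = 0: the shadow address is the bare offset, which is non-canonical. */
    assert(shadow_addr(0) == SHADOW_OFFSET);
    assert(!is_canonical_48(shadow_addr(0)));
    /* R13 = ffff880e31988480, a plausible valid pointer, maps to a canonical address. */
    assert(is_canonical_48(shadow_addr(0xffff880e31988480ULL)));
}
```

Under these assumptions a NULL pointer never reaches an ordinary page fault: the shadow lookup itself touches a non-canonical address, which the CPU reports as a general protection fault. That fits the "GPF could be caused by NULL-ptr deref" hint KASAN prints just before the oops.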
* Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
From: Rob Herring @ 2016-10-10 17:09 UTC
To: CAI Qian; +Cc: linux-kernel, Greg Kroah-Hartman

On Mon, Oct 10, 2016 at 10:37 AM, CAI Qian <caiqian@redhat.com> wrote:
> Not sure if anyone reported this before. With this kernel config, it is 100% kernel panic so far with today's
> mainline master HEAD.

Looks like it is catching what it is supposed to. Though looking
through the code, I haven't found where the problem is. Does bind and
unbind for this normally work?

> http://people.redhat.com/qcai/tmp/config-kasan-remove
>
> [...]
* Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
From: CAI Qian @ 2016-10-10 18:25 UTC
To: Rob Herring; +Cc: linux-kernel, Greg Kroah-Hartman

----- Original Message -----
> From: "Rob Herring" <robh@kernel.org>
> To: "CAI Qian" <caiqian@redhat.com>
> Cc: "linux-kernel" <linux-kernel@vger.kernel.org>, "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>
> Sent: Monday, October 10, 2016 1:09:43 PM
> Subject: Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
>
> On Mon, Oct 10, 2016 at 10:37 AM, CAI Qian <caiqian@redhat.com> wrote:
> > Not sure if anyone reported this before. With this kernel config, it is
> > 100% kernel panic so far with today's
> > mainline master HEAD.
>
> Looks like it is catching what it is supposed to. Though looking
> through the code, I haven't found where the problem is. Does bind and
> unbind for this normally work?

I am not sure. It just panics at boot. If you tell me which debugging steps
you want run, I can help test them.

CAI Qian
* Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
From: Greg Kroah-Hartman @ 2016-10-10 17:20 UTC
To: CAI Qian; +Cc: Rob Herring, linux-kernel

On Mon, Oct 10, 2016 at 11:37:27AM -0400, CAI Qian wrote:
> Not sure if anyone reported this before. With this kernel config, it is 100% kernel panic so far with today's
> mainline master HEAD.
>
> http://people.redhat.com/qcai/tmp/config-kasan-remove

Oh, it breaks things with kasan disabled as well :)

See Laszlo's bug report from a few hours ago; Rob is on it...
* Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
From: Rob Herring @ 2016-10-10 18:15 UTC
To: Greg Kroah-Hartman; +Cc: CAI Qian, linux-kernel

On Mon, Oct 10, 2016 at 12:20 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Mon, Oct 10, 2016 at 11:37:27AM -0400, CAI Qian wrote:
>> Not sure if anyone reported this before. With this kernel config, it is 100% kernel panic so far with today's
>> mainline master HEAD.
>>
>> http://people.redhat.com/qcai/tmp/config-kasan-remove
>
> Oh it breaks things with kasan disabled as well :)
>
> See Laszlo's bug report already a few hours ago, Rob is on it...

I think this one is different though. It has a remove() hook.

Rob
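
For context, here is a rough user-space model of what CONFIG_DEBUG_TEST_DRIVER_REMOVE exercises. Per its Kconfig help text, the driver core tests remove functions by calling probe, remove, probe, so a driver's remove() hook runs at boot even though nothing was unbound by hand. Every name below (toy_device, toy_driver, toy_bind_with_remove_test and the sample hooks) is hypothetical, and this is not the driver core's actual code; it only mirrors the ordering visible in the backtraces, where pci_device_remove is called from driver_probe_device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature device: "child" stands for something probe() may register. */
struct toy_device {
    void *child;   /* may legitimately still be NULL when remove() runs */
    int probes;
    int removes;
};

/* Hypothetical miniature driver with the two hooks under discussion. */
struct toy_driver {
    int  (*probe)(struct toy_device *dev);
    void (*remove)(struct toy_device *dev);
};

/* Bind, immediately unbind once, then bind again: probe, remove, probe. */
int toy_bind_with_remove_test(struct toy_driver *drv, struct toy_device *dev)
{
    bool test_remove;
    int ret;

    assert(drv != NULL && drv->probe != NULL && dev != NULL);
    test_remove = (drv->remove != NULL); /* only drivers with a remove hook are tested */

again:
    ret = drv->probe(dev);
    if (ret)
        return ret;
    if (test_remove) {
        test_remove = false;
        drv->remove(dev); /* runs right after the first successful probe */
        goto again;
    }
    return 0;
}

/* Sample hooks: probe succeeds without creating the child object. */
int toy_probe(struct toy_device *dev)
{
    dev->probes++;
    return 0;
}

/* A remove hook that tolerates a child that was never created. */
void toy_remove(struct toy_device *dev)
{
    dev->removes++;
    if (dev->child == NULL)
        return; /* nothing registered, nothing to tear down */
    dev->child = NULL;
}
```

In this model a remove hook that tears down dev->child without the NULL check would fault in the same way as the trace, on the very first bind. Whether that is what happens inside perf_pmu_unregister is not established in this thread.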
* Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
From: CAI Qian @ 2016-10-10 18:22 UTC
To: Rob Herring; +Cc: Greg Kroah-Hartman, linux-kernel

----- Original Message -----
> From: "Rob Herring" <robh@kernel.org>
> To: "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>
> Cc: "CAI Qian" <caiqian@redhat.com>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Sent: Monday, October 10, 2016 2:15:29 PM
> Subject: Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
>
> On Mon, Oct 10, 2016 at 12:20 PM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Mon, Oct 10, 2016 at 11:37:27AM -0400, CAI Qian wrote:
> >> Not sure if anyone reported this before. With this kernel config, it is
> >> 100% kernel panic so far with today's
> >> mainline master HEAD.
> >>
> >> http://people.redhat.com/qcai/tmp/config-kasan-remove
> >
> > Oh it breaks things with kasan disabled as well :)
> >
> > See Laszlo's bug report already a few hours ago, Rob is on it...
>
> I think this one is different though. It has a remove() hook.

FYI, this can also be reproduced without kasan.

CAI Qian
* Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
From: Rob Herring @ 2016-10-10 19:34 UTC
To: CAI Qian; +Cc: Greg Kroah-Hartman, linux-kernel

On Mon, Oct 10, 2016 at 1:22 PM, CAI Qian <caiqian@redhat.com> wrote:
>
>
> ----- Original Message -----
>> From: "Rob Herring" <robh@kernel.org>
>> To: "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>
>> Cc: "CAI Qian" <caiqian@redhat.com>, "linux-kernel" <linux-kernel@vger.kernel.org>
>> Sent: Monday, October 10, 2016 2:15:29 PM
>> Subject: Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
>>
>> On Mon, Oct 10, 2016 at 12:20 PM, Greg Kroah-Hartman
>> <gregkh@linuxfoundation.org> wrote:
>> > On Mon, Oct 10, 2016 at 11:37:27AM -0400, CAI Qian wrote:
>> >> Not sure if anyone reported this before. With this kernel config, it is
>> >> 100% kernel panic so far with today's
>> >> mainline master HEAD.
>> >>
>> >> http://people.redhat.com/qcai/tmp/config-kasan-remove
>> >
>> > Oh it breaks things with kasan disabled as well :)
>> >
>> > See Laszlo's bug report already a few hours ago, Rob is on it...
>>
>> I think this one is different though. It has a remove() hook.
> FYI, this can also be reproduced without kasan.

Is the backtrace the same in that case?

Rob
* Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
From: CAI Qian @ 2016-10-10 20:09 UTC
To: Rob Herring; +Cc: Greg Kroah-Hartman, linux-kernel

> Is the backtrace the same in that case?

Very close. I saw "intel" in it, so here is the list of the intel modules on the system.

# lsmod | grep intel
intel_rapl             20480  0
intel_powerclamp       16384  0
kvm_intel             208896  0
kvm                   630784  1 kvm_intel
ghash_clmulni_intel    16384  0
aesni_intel           167936  0
lrw                    16384  1 aesni_intel
glue_helper            16384  1 aesni_intel
ablk_helper            16384  1 aesni_intel
cryptd                 24576  3 ablk_helper,ghash_clmulni_intel,aesni_intel
crc32c_intel           24576  1

[ 17.884926] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 17.893700] IP: [<ffffffff81546ff7>] device_del+0x17/0x280
[ 17.899848] PGD 0
[ 17.902109] Oops: 0000 [#1] PREEMPT SMP
[ 17.906394] Modules linked in:
[ 17.909823] CPU: 68 PID: 1 Comm: swapper/0 Not tainted 4.8.0-remove-nokasan+ #5
[ 17.917985] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[ 17.929347] task: ffff8810556c8000 task.stack: ffffc90000078000
[ 17.935955] RIP: 0010:[<ffffffff81546ff7>] [<ffffffff81546ff7>] device_del+0x17/0x280
[ 17.944811] RSP: 0000:ffffc9000007bc00 EFLAGS: 00010286
[ 17.950742] RAX: 0000000000000000 RBX: ffff88085c8e3c00 RCX: 0000000000000001
[ 17.958708] RDX: ffff881059d60000 RSI: 000000000000000b RDI: 0000000000000000
[ 17.966675] RBP: ffffc9000007bc38 R08: 00000000d38c0f63 R09: 0000000000000000
[ 17.974640] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 17.982606] R13: ffff881054099000 R14: 0000000000000001 R15: 0000000000000000
[ 17.990574] FS: 0000000000000000(0000) GS:ffff88105e400000(0000) knlGS:0000000000000000
[ 17.999606] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 18.006022] CR2: 0000000000000000 CR3: 0000000001c06000 CR4: 00000000003406e0
[ 18.013989] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 18.021954] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 18.029919] Stack:
[ 18.032163] 0000000000000000 00000000dd652bd0 ffff88085c8e3c00 ffff88085c8e3c00
[ 18.040475] ffff88085c8e3400 ffff881054099000 0000000000000001 ffffc9000007bc58
[ 18.048788] ffffffff811c9680 ffff88085c8e3c00 ffff88085c8e3400 ffffc9000007bc88
[ 18.057090] Call Trace:
[ 18.059819] [<ffffffff811c9680>] perf_pmu_unregister+0x90/0x150
[ 18.066529] [<ffffffff81017678>] uncore_pci_remove+0xc8/0x160
[ 18.073044] [<ffffffff814428c9>] pci_device_remove+0x39/0xc0
[ 18.079468] [<ffffffff8154bf4e>] driver_probe_device+0xbe/0x4d0
[ 18.086176] [<ffffffff8154c443>] __driver_attach+0xe3/0xf0
[ 18.092399] [<ffffffff8154c360>] ? driver_probe_device+0x4d0/0x4d0
[ 18.099400] [<ffffffff81549b43>] bus_for_each_dev+0x73/0xc0
[ 18.105722] [<ffffffff8154b7de>] driver_attach+0x1e/0x20
[ 18.111752] [<ffffffff8154b290>] bus_add_driver+0x200/0x270
[ 18.118078] [<ffffffff8154d160>] driver_register+0x60/0xe0
[ 18.124303] [<ffffffff81440ee0>] __pci_register_driver+0x60/0x70
[ 18.131117] [<ffffffff81f1e6e1>] intel_uncore_init+0x277/0x2df
[ 18.137728] [<ffffffff81f1e46a>] ? uncore_type_init+0x15f/0x15f
[ 18.144441] [<ffffffff81002190>] do_one_initcall+0x50/0x190
[ 18.150768] [<ffffffff810c5bf1>] ? parse_args+0x2d1/0x490
[ 18.156894] [<ffffffff81f19243>] kernel_init_freeable+0x1ff/0x29e
[ 18.163801] [<ffffffff817dd840>] ? rest_init+0x140/0x140
[ 18.169831] [<ffffffff817dd84e>] kernel_init+0xe/0x100
[ 18.175668] [<ffffffff817e957a>] ret_from_fork+0x2a/0x40
[ 18.181695] Code: e8 cf d4 29 00 5b 5d c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 49 89 fc 48 83 ec 18 <4c> 8b 2f 65 48 8b 04 25 28 00 00 00 48 89 45 d8 31 c0 48 8b 87
[ 18.203631] RIP [<ffffffff81546ff7>] device_del+0x17/0x280
[ 18.209867] RSP <ffffc9000007bc00>
[ 18.213759] CR2: 0000000000000000
[ 18.217548] ---[ end trace 91188545987fc9d9 ]---
[ 18.222706] Kernel panic - not syncing: Fatal exception
[ 18.228692] ---[ end Kernel panic - not syncing: Fatal exception
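
A note on the non-KASAN oops above: the faulting bytes <4c> 8b 2f decode to mov (%rdi),%r13, and RDI is 0, so the first load through the pointer that device_del received (presumably its argument, since the prologue has only just copied RDI to R12) reads address 0, which is the value CR2 reports. The sketch below shows that arithmetic with a stand-in struct. The layout of mock_device, with a pointer member at offset 0, is an assumption made for illustration and is not taken from the kernel's struct device definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: only the fact that a member sits at offset 0 matters. */
struct mock_device {
    struct mock_device *parent;
    void *priv;
};

/* Address a member load would touch, computed without dereferencing base. */
uintptr_t member_load_addr(const void *base, size_t offset)
{
    return (uintptr_t)base + offset;
}

/* Address touched by reading dev->parent. */
uintptr_t parent_load_addr(const struct mock_device *dev)
{
    assert(offsetof(struct mock_device, parent) == 0);
    return member_load_addr(dev, offsetof(struct mock_device, parent));
}
```

With dev == NULL, parent_load_addr() yields 0, matching CR2: 0000000000000000 in the dump. A member further into the struct would instead fault at a small non-zero address.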
* [4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
From: CAI Qian @ 2016-10-19 14:45 UTC
To: Rob Herring, Jiri Olsa, Peter Zijlstra, Kan Liang
Cc: Greg Kroah-Hartman, linux-kernel, Ingo Molnar

It turns out this is only reproducible when intel_uncore is compiled as a builtin, i.e.,
not as a module. It can still be reproduced with yesterday's mainline.

Here is some information about the system,

Intel Platform: Grantley-R Wildcat Pass CPU: Broadwell-EP, B0.
Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz

[ 66.349263] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[ 66.356672] software IO TLB [mem 0x71c7d000-0x75c7d000] (64MB) mapped at [ffff880071c7d000-ffff880075c7cfff]
[ 66.369911] Intel CQM monitoring enabled
[ 66.374445] Intel MBM enabled
[ 66.385708] RAPL PMU: API unit is 2^-32 Joules, 4 fixed counters, 655360 ms ovfl timer
[ 66.394564] RAPL PMU: hw unit of domain pp0-core 2^-14 Joules
[ 66.400991] RAPL PMU: hw unit of domain package 2^-14 Joules
[ 66.407317] RAPL PMU: hw unit of domain dram 2^-14 Joules
[ 66.413358] RAPL PMU: hw unit of domain pp1-gpu 2^-14 Joules
[ 66.434040] ================================================================================
[ 66.443462] UBSAN: Undefined behaviour in drivers/base/core.c:1251:17
[ 66.450653] member access within null pointer of type 'struct device'
[ 66.457845] CPU: 68 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc1-lockfix+ #48
[ 66.465809] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[ 66.477168] ffff880847aff798 ffffffff81d370b4 0000000041b58ab3 ffffffff83348dcf
[ 66.485469] ffffffff81d36ff4 ffff880847aff7c0 ffff880847aff770 ffff880e3f9d8000
[ 66.493770] ffffffff82ff8a00 ffffffff8309c5c0 00000000000004e3 000000009091f309
[ 66.502073] Call Trace:
[ 66.504811] [<ffffffff81d370b4>] dump_stack+0xc0/0x12c
[ 66.510644] [<ffffffff81d36ff4>] ? _atomic_dec_and_lock+0xc4/0xc4
[ 66.517548] [<ffffffff81e5ac85>] ubsan_epilogue+0xd/0x8a
[ 66.523574] [<ffffffff81e5ae68>] __ubsan_handle_type_mismatch+0x166/0x434
[ 66.531253] [<ffffffff813294dd>] ? get_lock_stats+0x1d/0x120
[ 66.537667] [<ffffffff81e5ad02>] ? ubsan_epilogue+0x8a/0x8a
[ 66.543985] [<ffffffff82241acc>] device_del+0x6fc/0x860
[ 66.549917] [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
[ 66.557494] [<ffffffff822413d0>] ? cleanup_glue_dir+0x140/0x140
[ 66.564202] [<ffffffff8160a6f2>] perf_pmu_unregister+0x142/0x6d0
[ 66.571006] [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
[ 66.577619] [<ffffffff810559f7>] uncore_pmu_unregister+0x67/0xd0
[ 66.584422] [<ffffffff8105ae6c>] uncore_pci_remove+0x32c/0x510
[ 66.591025] [<ffffffff81ec8392>] pci_device_remove+0xb2/0x240
[ 66.597539] [<ffffffff8224fe76>] driver_probe_device+0x146/0xfc0
[ 66.604340] [<ffffffff82250cf0>] ? driver_probe_device+0xfc0/0xfc0
[ 66.611334] [<ffffffff82250ea5>] __driver_attach+0x1b5/0x230
[ 66.617749] [<ffffffff82248e60>] bus_for_each_dev+0x130/0x200
[ 66.624264] [<ffffffff81353300>] ? do_raw_spin_trylock+0x110/0x110
[ 66.631258] [<ffffffff82248d30>] ? subsys_dev_iter_init+0x100/0x100
[ 66.638349] [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
[ 66.644959] [<ffffffff8224eaa2>] driver_attach+0x42/0x70
[ 66.650976] [<ffffffff8224d846>] bus_add_driver+0x406/0x870
[ 66.657292] [<ffffffff822535b9>] driver_register+0x1a9/0x3d0
[ 66.663704] [<ffffffff81352942>] ? __raw_spin_lock_init+0x32/0x120
[ 66.670700] [<ffffffff81ec2a1d>] __pci_register_driver+0x1ad/0x2b0
[ 66.677694] [<ffffffff81ec2870>] ? pci_pm_runtime_idle+0x180/0x180
[ 66.684694] [<ffffffff858f57b5>] intel_uncore_init+0x58d/0x64c
[ 66.691300] [<ffffffff858ed56d>] ? amd_iommu_pc_init+0x16/0x344
[ 66.698006] [<ffffffff858f5228>] ? uncore_type_init+0x5cb/0x5cb
[ 66.704710] [<ffffffff81000587>] do_one_initcall+0xb7/0x2a0
[ 66.711025] [<ffffffff810004d0>] ? initcall_blacklisted+0x1a0/0x1a0
[ 66.718116] [<ffffffff8132687d>] ? up_write+0x7d/0x120
[ 66.723949] [<ffffffff81326800>] ? up_read+0x40/0x40
[ 66.729587] [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
[ 66.737165] [<ffffffff8130db04>] ? __wake_up+0x44/0x50
[ 66.743000] [<ffffffff858e71b9>] kernel_init_freeable+0x68a/0x768
[ 66.749900] [<ffffffff858e6b2f>] ? start_kernel+0x751/0x751
[ 66.756219] [<ffffffff81075ec0>] ? compat_start_thread+0xa0/0xa0
[ 66.763013] [<ffffffff82c704c0>] ? rest_init+0x190/0x190
[ 66.769039] [<ffffffff82c704d3>] kernel_init+0x13/0x140
[ 66.774967] [<ffffffff82c704c0>] ? rest_init+0x190/0x190
[ 66.780993] [<ffffffff82c8b0d7>] ret_from_fork+0x27/0x40
[ 66.787019] ================================================================================
[ 66.796479] kasan: CONFIG_KASAN_INLINE enabled
[ 66.801450] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ 66.809525] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[ 66.817878] Modules linked in:
[ 66.821295] CPU: 68 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc1-lockfix+ #48
[ 66.829260] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[ 66.840618] task: ffff880e3f9d8000 task.stack: ffff880847af8000
[ 66.847225] RIP: 0010:[<ffffffff82241466>] [<ffffffff82241466>] device_del+0x96/0x860
[ 66.856076] RSP: 0000:ffff880847aff868 EFLAGS: 00010246
[ 66.862002] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 66.869967] RDX: 0000000000000000 RSI: ffffffff82ea0cc0 RDI: ffffed0108f5ff06
[ 66.877931] RBP: ffff880847aff920 R08: ffff880e3f9d8000 R09: 0000000000000007
[ 66.885894] R10: 0000000000000000 R11: 0000000000000006 R12: ffff880844094930
[ 66.893859] R13: 0000000000000001 R14: ffff880844094800 R15: ffff880844095258
[ 66.901824] FS: 0000000000000000(0000) GS:ffff880e54e00000(0000) knlGS:0000000000000000
[ 66.910853] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 66.917265] CR2: 0000000000000000 CR3: 000000000360a000 CR4: 00000000003406e0
[ 66.925228] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 66.933191] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 66.941154] Stack:
[ 66.943396] ffffffff82c8a5d2 ffff881077f705c0 1ffff10108f5ff13 ffff880847aff920
[ 66.951698] 0000000000000000 ffffffff86d346c8 0000000041b58ab3 ffffffff8338e870
[ 66.959997] ffffffff822413d0 ffff880e00000044 ffffffff00000000 ffff880847aff8c0
[ 66.968296] Call Trace:
[ 66.971025] [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
[ 66.978603] [<ffffffff822413d0>] ? cleanup_glue_dir+0x140/0x140
[ 66.985309] [<ffffffff8160a6f2>] perf_pmu_unregister+0x142/0x6d0
[ 66.992111] [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
[ 66.998720] [<ffffffff810559f7>] uncore_pmu_unregister+0x67/0xd0
[ 67.005523] [<ffffffff8105ae6c>] uncore_pci_remove+0x32c/0x510
[ 67.012131] [<ffffffff81ec8392>] pci_device_remove+0xb2/0x240
[ 67.018641] [<ffffffff8224fe76>] driver_probe_device+0x146/0xfc0
[ 67.025442] [<ffffffff82250cf0>] ? driver_probe_device+0xfc0/0xfc0
[ 67.032437] [<ffffffff82250ea5>] __driver_attach+0x1b5/0x230
[ 67.038852] [<ffffffff82248e60>] bus_for_each_dev+0x130/0x200
[ 67.045361] [<ffffffff81353300>] ? do_raw_spin_trylock+0x110/0x110
[ 67.052357] [<ffffffff82248d30>] ? subsys_dev_iter_init+0x100/0x100
[ 67.059450] [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
[ 67.066056] [<ffffffff8224eaa2>] driver_attach+0x42/0x70
[ 67.072081] [<ffffffff8224d846>] bus_add_driver+0x406/0x870
[ 67.078397] [<ffffffff822535b9>] driver_register+0x1a9/0x3d0
[ 67.084809] [<ffffffff81352942>] ? __raw_spin_lock_init+0x32/0x120
[ 67.091803] [<ffffffff81ec2a1d>] __pci_register_driver+0x1ad/0x2b0
[ 67.098798] [<ffffffff81ec2870>] ? pci_pm_runtime_idle+0x180/0x180
[ 67.105792] [<ffffffff858f57b5>] intel_uncore_init+0x58d/0x64c
[ 67.112399] [<ffffffff858ed56d>] ? amd_iommu_pc_init+0x16/0x344
[ 67.119103] [<ffffffff858f5228>] ? uncore_type_init+0x5cb/0x5cb
[ 67.125806] [<ffffffff81000587>] do_one_initcall+0xb7/0x2a0
[ 67.132124] [<ffffffff810004d0>] ? initcall_blacklisted+0x1a0/0x1a0
[ 67.139215] [<ffffffff8132687d>] ? up_write+0x7d/0x120
[ 67.145046] [<ffffffff81326800>] ? up_read+0x40/0x40
[ 67.150684] [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
[ 67.158262] [<ffffffff8130db04>] ? __wake_up+0x44/0x50
[ 67.164094] [<ffffffff858e71b9>] kernel_init_freeable+0x68a/0x768
[ 67.170992] [<ffffffff858e6b2f>] ? start_kernel+0x751/0x751
[ 67.177310] [<ffffffff81075ec0>] ? compat_start_thread+0xa0/0xa0
[ 67.184111] [<ffffffff82c704c0>] ? rest_init+0x190/0x190
[ 67.190137] [<ffffffff82c704d3>] kernel_init+0x13/0x140
[ 67.196064] [<ffffffff82c704c0>] ? rest_init+0x190/0x190
[ 67.202090] [<ffffffff82c8b0d7>] ret_from_fork+0x27/0x40
[ 67.208115] Code: f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 85 ff 0f 84 69 06 00 00 48 89 da 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 41 06 00 00 48 8b 03 48 89 85 68 ff ff ff 48
[ 67.229872] RIP [<ffffffff82241466>] device_del+0x96/0x860
[ 67.236101] RSP <ffff880847aff868>
[ 67.240059] ---[ end trace 69358e866a1e3f6c ]---
[ 67.245377] Kernel panic - not syncing: Fatal exception
[ 67.251271] ---[ end Kernel panic - not syncing: Fatal exception
* Re: [4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic 2016-10-19 14:45 ` [4.9-rc1+] intel_uncore builtin " CAI Qian @ 2016-10-19 19:19 ` Jiri Olsa 2016-10-19 20:18 ` CAI Qian 2016-10-20 5:39 ` Peter Zijlstra 0 siblings, 2 replies; 18+ messages in thread From: Jiri Olsa @ 2016-10-19 19:19 UTC (permalink / raw) To: CAI Qian Cc: Rob Herring, Peter Zijlstra, Kan Liang, Greg Kroah-Hartman, linux-kernel, Ingo Molnar On Wed, Oct 19, 2016 at 10:45:31AM -0400, CAI Qian wrote: > It turns out this can only be reproducible when compiled intel_uncore as a builtin, i.e., > not compiled it as a module. The can still be reproduced in the yesterday's mainline. > > Here is some information about the system, > > Intel Platform: Grantley-R Wildcat Pass CPU: Broadwell-EP, B0. > Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz > > [ 66.349263] PCI-DMA: Using software bounce buffering for IO (SWIOTLB) > [ 66.356672] software IO TLB [mem 0x71c7d000-0x75c7d000] (64MB) mapped at [ffff880071c7d000-ffff880075c7cfff] > [ 66.369911] Intel CQM monitoring enabled > [ 66.374445] Intel MBM enabled > [ 66.385708] RAPL PMU: API unit is 2^-32 Joules, 4 fixed counters, 655360 ms ovfl timer > [ 66.394564] RAPL PMU: hw unit of domain pp0-core 2^-14 Joules > [ 66.400991] RAPL PMU: hw unit of domain package 2^-14 Joules > [ 66.407317] RAPL PMU: hw unit of domain dram 2^-14 Joules > [ 66.413358] RAPL PMU: hw unit of domain pp1-gpu 2^-14 Joules > [ 66.434040] ================================================================================ > [ 66.443462] UBSAN: Undefined behaviour in drivers/base/core.c:1251:17 > [ 66.450653] member access within null pointer of type 'struct device' > [ 66.457845] CPU: 68 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc1-lockfix+ #48 > [ 66.465809] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015 > [ 66.477168] ffff880847aff798 ffffffff81d370b4 0000000041b58ab3 ffffffff83348dcf > [ 66.485469] 
ffffffff81d36ff4 ffff880847aff7c0 ffff880847aff770 ffff880e3f9d8000 > [ 66.493770] ffffffff82ff8a00 ffffffff8309c5c0 00000000000004e3 000000009091f309 > [ 66.502073] Call Trace: > [ 66.504811] [<ffffffff81d370b4>] dump_stack+0xc0/0x12c > [ 66.510644] [<ffffffff81d36ff4>] ? _atomic_dec_and_lock+0xc4/0xc4 > [ 66.517548] [<ffffffff81e5ac85>] ubsan_epilogue+0xd/0x8a > [ 66.523574] [<ffffffff81e5ae68>] __ubsan_handle_type_mismatch+0x166/0x434 > [ 66.531253] [<ffffffff813294dd>] ? get_lock_stats+0x1d/0x120 > [ 66.537667] [<ffffffff81e5ad02>] ? ubsan_epilogue+0x8a/0x8a > [ 66.543985] [<ffffffff82241acc>] device_del+0x6fc/0x860 > [ 66.549917] [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70 > [ 66.557494] [<ffffffff822413d0>] ? cleanup_glue_dir+0x140/0x140 > [ 66.564202] [<ffffffff8160a6f2>] perf_pmu_unregister+0x142/0x6d0 > [ 66.571006] [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0 > [ 66.577619] [<ffffffff810559f7>] uncore_pmu_unregister+0x67/0xd0 > [ 66.584422] [<ffffffff8105ae6c>] uncore_pci_remove+0x32c/0x510 > [ 66.591025] [<ffffffff81ec8392>] pci_device_remove+0xb2/0x240 > [ 66.597539] [<ffffffff8224fe76>] driver_probe_device+0x146/0xfc0 > [ 66.604340] [<ffffffff82250cf0>] ? driver_probe_device+0xfc0/0xfc0 > [ 66.611334] [<ffffffff82250ea5>] __driver_attach+0x1b5/0x230 > [ 66.617749] [<ffffffff82248e60>] bus_for_each_dev+0x130/0x200 > [ 66.624264] [<ffffffff81353300>] ? do_raw_spin_trylock+0x110/0x110 > [ 66.631258] [<ffffffff82248d30>] ? subsys_dev_iter_init+0x100/0x100 > [ 66.638349] [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0 > [ 66.644959] [<ffffffff8224eaa2>] driver_attach+0x42/0x70 > [ 66.650976] [<ffffffff8224d846>] bus_add_driver+0x406/0x870 > [ 66.657292] [<ffffffff822535b9>] driver_register+0x1a9/0x3d0 > [ 66.663704] [<ffffffff81352942>] ? __raw_spin_lock_init+0x32/0x120 > [ 66.670700] [<ffffffff81ec2a1d>] __pci_register_driver+0x1ad/0x2b0 > [ 66.677694] [<ffffffff81ec2870>] ? 
pci_pm_runtime_idle+0x180/0x180 > [ 66.684694] [<ffffffff858f57b5>] intel_uncore_init+0x58d/0x64c > [ 66.691300] [<ffffffff858ed56d>] ? amd_iommu_pc_init+0x16/0x344 > [ 66.698006] [<ffffffff858f5228>] ? uncore_type_init+0x5cb/0x5cb > [ 66.704710] [<ffffffff81000587>] do_one_initcall+0xb7/0x2a0 > [ 66.711025] [<ffffffff810004d0>] ? initcall_blacklisted+0x1a0/0x1a0 > [ 66.718116] [<ffffffff8132687d>] ? up_write+0x7d/0x120 > [ 66.723949] [<ffffffff81326800>] ? up_read+0x40/0x40 > [ 66.729587] [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70 > [ 66.737165] [<ffffffff8130db04>] ? __wake_up+0x44/0x50 > [ 66.743000] [<ffffffff858e71b9>] kernel_init_freeable+0x68a/0x768 > [ 66.749900] [<ffffffff858e6b2f>] ? start_kernel+0x751/0x751 > [ 66.756219] [<ffffffff81075ec0>] ? compat_start_thread+0xa0/0xa0 > [ 66.763013] [<ffffffff82c704c0>] ? rest_init+0x190/0x190 > [ 66.769039] [<ffffffff82c704d3>] kernel_init+0x13/0x140 > [ 66.774967] [<ffffffff82c704c0>] ? rest_init+0x190/0x190 > [ 66.780993] [<ffffffff82c8b0d7>] ret_from_fork+0x27/0x40 > [ 66.787019] ================================================================================ > [ 66.796479] kasan: CONFIG_KASAN_INLINE enabled > [ 66.801450] kasan: GPF could be caused by NULL-ptr deref or user memory access > [ 66.809525] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN > [ 66.817878] Modules linked in: > [ 66.821295] CPU: 68 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc1-lockfix+ #48 > [ 66.829260] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015 > [ 66.840618] task: ffff880e3f9d8000 task.stack: ffff880847af8000 > [ 66.847225] RIP: 0010:[<ffffffff82241466>] [<ffffffff82241466>] device_del+0x96/0x860 > [ 66.856076] RSP: 0000:ffff880847aff868 EFLAGS: 00010246 > [ 66.862002] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000 > [ 66.869967] RDX: 0000000000000000 RSI: ffffffff82ea0cc0 RDI: ffffed0108f5ff06 > [ 66.877931] RBP: 
ffff880847aff920 R08: ffff880e3f9d8000 R09: 0000000000000007 > [ 66.885894] R10: 0000000000000000 R11: 0000000000000006 R12: ffff880844094930 > [ 66.893859] R13: 0000000000000001 R14: ffff880844094800 R15: ffff880844095258 > [ 66.901824] FS: 0000000000000000(0000) GS:ffff880e54e00000(0000) knlGS:0000000000000000 > [ 66.910853] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 66.917265] CR2: 0000000000000000 CR3: 000000000360a000 CR4: 00000000003406e0 > [ 66.925228] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 66.933191] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [ 66.941154] Stack: > [ 66.943396] ffffffff82c8a5d2 ffff881077f705c0 1ffff10108f5ff13 ffff880847aff920 > [ 66.951698] 0000000000000000 ffffffff86d346c8 0000000041b58ab3 ffffffff8338e870 > [ 66.959997] ffffffff822413d0 ffff880e00000044 ffffffff00000000 ffff880847aff8c0 > [ 66.968296] Call Trace: > [ 66.971025] [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70 > [ 66.978603] [<ffffffff822413d0>] ? cleanup_glue_dir+0x140/0x140 > [ 66.985309] [<ffffffff8160a6f2>] perf_pmu_unregister+0x142/0x6d0 > [ 66.992111] [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0 > [ 66.998720] [<ffffffff810559f7>] uncore_pmu_unregister+0x67/0xd0 > [ 67.005523] [<ffffffff8105ae6c>] uncore_pci_remove+0x32c/0x510 > [ 67.012131] [<ffffffff81ec8392>] pci_device_remove+0xb2/0x240 > [ 67.018641] [<ffffffff8224fe76>] driver_probe_device+0x146/0xfc0 > [ 67.025442] [<ffffffff82250cf0>] ? driver_probe_device+0xfc0/0xfc0 > [ 67.032437] [<ffffffff82250ea5>] __driver_attach+0x1b5/0x230 > [ 67.038852] [<ffffffff82248e60>] bus_for_each_dev+0x130/0x200 > [ 67.045361] [<ffffffff81353300>] ? do_raw_spin_trylock+0x110/0x110 > [ 67.052357] [<ffffffff82248d30>] ? subsys_dev_iter_init+0x100/0x100 > [ 67.059450] [<ffffffff81278cae>] ? 
preempt_count_sub+0x5e/0xe0 > [ 67.066056] [<ffffffff8224eaa2>] driver_attach+0x42/0x70 > [ 67.072081] [<ffffffff8224d846>] bus_add_driver+0x406/0x870 > [ 67.078397] [<ffffffff822535b9>] driver_register+0x1a9/0x3d0 > [ 67.084809] [<ffffffff81352942>] ? __raw_spin_lock_init+0x32/0x120 > [ 67.091803] [<ffffffff81ec2a1d>] __pci_register_driver+0x1ad/0x2b0 > [ 67.098798] [<ffffffff81ec2870>] ? pci_pm_runtime_idle+0x180/0x180 > [ 67.105792] [<ffffffff858f57b5>] intel_uncore_init+0x58d/0x64c > [ 67.112399] [<ffffffff858ed56d>] ? amd_iommu_pc_init+0x16/0x344 > [ 67.119103] [<ffffffff858f5228>] ? uncore_type_init+0x5cb/0x5cb > [ 67.125806] [<ffffffff81000587>] do_one_initcall+0xb7/0x2a0 > [ 67.132124] [<ffffffff810004d0>] ? initcall_blacklisted+0x1a0/0x1a0 > [ 67.139215] [<ffffffff8132687d>] ? up_write+0x7d/0x120 > [ 67.145046] [<ffffffff81326800>] ? up_read+0x40/0x40 > [ 67.150684] [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70 > [ 67.158262] [<ffffffff8130db04>] ? __wake_up+0x44/0x50 > [ 67.164094] [<ffffffff858e71b9>] kernel_init_freeable+0x68a/0x768 > [ 67.170992] [<ffffffff858e6b2f>] ? start_kernel+0x751/0x751 > [ 67.177310] [<ffffffff81075ec0>] ? compat_start_thread+0xa0/0xa0 > [ 67.184111] [<ffffffff82c704c0>] ? rest_init+0x190/0x190 > [ 67.190137] [<ffffffff82c704d3>] kernel_init+0x13/0x140 > [ 67.196064] [<ffffffff82c704c0>] ? 
rest_init+0x190/0x190
> [ 67.202090] [<ffffffff82c8b0d7>] ret_from_fork+0x27/0x40
> [ 67.208115] Code: f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 85 ff 0f 84 69 06 00 00 48 89 da 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 41 06 00 00 48 8b 03 48 89 85 68 ff ff ff 48
> [ 67.229872] RIP [<ffffffff82241466>] device_del+0x96/0x860
> [ 67.236101] RSP <ffff880847aff868>
> [ 67.240059] ---[ end trace 69358e866a1e3f6c ]---
> [ 67.245377] Kernel panic - not syncing: Fatal exception
> [ 67.251271] ---[ end Kernel panic - not syncing: Fatal exception

I think the reason here is that we presume pmu devices are always added,
but we add them only if pmu_bus_running (in perf_event_sysfs_init)
is set which might happen after uncore initcall

attached patch fixes the issue for me

jirka


---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index c6e47e97b33f..c2099b799d16 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8871,8 +8871,10 @@ void perf_pmu_unregister(struct pmu *pmu)
 		idr_remove(&pmu_idr, pmu->type);
 	if (pmu->nr_addr_filters)
 		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
-	device_del(pmu->dev);
-	put_device(pmu->dev);
+	if (pmu_bus_running) {
+		device_del(pmu->dev);
+		put_device(pmu->dev);
+	}
 	free_pmu_context(pmu);
 }
 EXPORT_SYMBOL_GPL(perf_pmu_unregister);

^ permalink raw reply related	[flat|nested] 18+ messages in thread
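The diagnosis above can be modelled in user space. What follows is a minimal sketch, not kernel code: `toy_pmu`, `toy_device`, `toy_bus_running`, `toy_pmu_register()` and `toy_pmu_unregister()` are hypothetical stand-ins for `struct pmu`, `struct device`, `pmu_bus_running`, `perf_pmu_register()` and `perf_pmu_unregister()`. It assumes only what the message states, namely that a PMU registered before the bus is running has no device, and it deliberately leaves out the later pass in `perf_event_sysfs_init()` that adds devices for PMUs registered earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel objects; not the real definitions. */
struct toy_device {
	int registered;
};

struct toy_pmu {
	struct toy_device *dev; /* stays NULL when no device was added */
};

/* Plays the role of pmu_bus_running. */
static bool toy_bus_running;

/* Registration creates the device only if the bus is already running. */
static int toy_pmu_register(struct toy_pmu *pmu)
{
	pmu->dev = NULL;
	if (toy_bus_running) {
		pmu->dev = calloc(1, sizeof(*pmu->dev));
		if (!pmu->dev)
			return -1;
		pmu->dev->registered = 1;
	}
	return 0;
}

/*
 * Guarded teardown in the spirit of the patch above: the device is only
 * touched when the flag says one was added. Returns 1 if a device was
 * removed and 0 if there was nothing to remove. The sketch assumes the
 * flag does not change between register and unregister.
 */
static int toy_pmu_unregister(struct toy_pmu *pmu)
{
	if (!toy_bus_running)
		return 0;

	assert(pmu->dev != NULL); /* the unguarded kernel path oopsed here */
	pmu->dev->registered = 0;
	free(pmu->dev);
	pmu->dev = NULL;
	return 1;
}
```

Registering and unregistering while `toy_bus_running` is false never dereferences `dev`, which corresponds to the builtin intel_uncore case with CONFIG_DEBUG_TEST_DRIVER_REMOVE in the report.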
* Re: [4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-19 19:19 ` Jiri Olsa
@ 2016-10-19 20:18 ` CAI Qian
  2016-10-20  5:39 ` Peter Zijlstra
  1 sibling, 0 replies; 18+ messages in thread
From: CAI Qian @ 2016-10-19 20:18 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Rob Herring, Peter Zijlstra, Kan Liang, Greg Kroah-Hartman,
	linux-kernel, Ingo Molnar

> I think the reason here is that we presume pmu devices are always added,
> but we add them only if pmu_bus_running (in perf_event_sysfs_init)
> is set which might happen after uncore initcall
>
> attached patch fixes the issue for me
Tested-by: CAI Qian <caiqian@redhat.com>

>
> jirka
>
>
> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index c6e47e97b33f..c2099b799d16 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8871,8 +8871,10 @@ void perf_pmu_unregister(struct pmu *pmu)
>  		idr_remove(&pmu_idr, pmu->type);
>  	if (pmu->nr_addr_filters)
>  		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> -	device_del(pmu->dev);
> -	put_device(pmu->dev);
> +	if (pmu_bus_running) {
> +		device_del(pmu->dev);
> +		put_device(pmu->dev);
> +	}
>  	free_pmu_context(pmu);
>  }
>  EXPORT_SYMBOL_GPL(perf_pmu_unregister);
>

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-19 19:19 ` Jiri Olsa
  2016-10-19 20:18 ` CAI Qian
@ 2016-10-20  5:39 ` Peter Zijlstra
  2016-10-20  8:58 ` Jiri Olsa
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2016-10-20 5:39 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: CAI Qian, Rob Herring, Kan Liang, Greg Kroah-Hartman,
	linux-kernel, Ingo Molnar

On Wed, Oct 19, 2016 at 09:19:43PM +0200, Jiri Olsa wrote:
> I think the reason here is that we presume pmu devices are always added,
> but we add them only if pmu_bus_running (in perf_event_sysfs_init)
> is set which might happen after uncore initcall
>
> attached patch fixes the issue for me

Right, we never expected to be unloaded before userspace runs.

Strictly speaking we should only read pmu_bus_running while holding
pmus_lock, that way we're serialized against perf_event_sysfs_init()
flipping it while we're being removed etc..

With the current setup the introduced race is harmless, but who knows
what other crazy these device people will come up with ;-)

> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index c6e47e97b33f..c2099b799d16 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8871,8 +8871,10 @@ void perf_pmu_unregister(struct pmu *pmu)
>  		idr_remove(&pmu_idr, pmu->type);
>  	if (pmu->nr_addr_filters)
>  		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> -	device_del(pmu->dev);
> -	put_device(pmu->dev);
> +	if (pmu_bus_running) {
> +		device_del(pmu->dev);
> +		put_device(pmu->dev);
> +	}
>  	free_pmu_context(pmu);
>  }
>  EXPORT_SYMBOL_GPL(perf_pmu_unregister);

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-20  5:39 ` Peter Zijlstra
@ 2016-10-20  8:58 ` Jiri Olsa
  2016-10-20  9:04 ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Jiri Olsa @ 2016-10-20 8:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: CAI Qian, Rob Herring, Kan Liang, Greg Kroah-Hartman,
	linux-kernel, Ingo Molnar

On Thu, Oct 20, 2016 at 07:39:44AM +0200, Peter Zijlstra wrote:
> On Wed, Oct 19, 2016 at 09:19:43PM +0200, Jiri Olsa wrote:
> > I think the reason here is that we presume pmu devices are always added,
> > but we add them only if pmu_bus_running (in perf_event_sysfs_init)
> > is set which might happen after uncore initcall
> >
> > attached patch fixes the issue for me
>
> Right, we never expected to be unloaded before userspace runs.
>
> Strictly speaking we should only read pmu_bus_running while holding
> pmus_lock, that way we're serialized against perf_event_sysfs_init()
> flipping it while we're being removed etc..
>
> With the current setup the introduced race is harmless, but who knows
> what other crazy these device people will come up with ;-)
>

right, did not think of that ;-)

also I did not notice the device_remove_file call for pmu->nr_addr_filters,
and we could save one lock/unlock call later.. I'm testing the attached patch now

thanks,
jirka


---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index c6e47e97b33f..224dffbc3b9b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8581,24 +8581,24 @@ static void update_pmu_context(struct pmu *pmu, struct pmu *old_pmu)
 	}
 }
 
+/*
+ * The pmus_lock lock must be taken.
+ */
 static void free_pmu_context(struct pmu *pmu)
 {
 	struct pmu *i;
 
-	mutex_lock(&pmus_lock);
 	/*
 	 * Like a real lame refcount.
 	 */
 	list_for_each_entry(i, &pmus, entry) {
 		if (i->pmu_cpu_context == pmu->pmu_cpu_context) {
 			update_pmu_context(i, pmu);
-			goto out;
+			return;
 		}
 	}
 
 	free_percpu(pmu->pmu_cpu_context);
-out:
-	mutex_unlock(&pmus_lock);
 }
 
 /*
@@ -8869,11 +8869,15 @@ void perf_pmu_unregister(struct pmu *pmu)
 	free_percpu(pmu->pmu_disable_count);
 	if (pmu->type >= PERF_TYPE_MAX)
 		idr_remove(&pmu_idr, pmu->type);
-	if (pmu->nr_addr_filters)
-		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
-	device_del(pmu->dev);
-	put_device(pmu->dev);
+	mutex_lock(&pmus_lock);
+	if (pmu_bus_running) {
+		if (pmu->nr_addr_filters)
+			device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
+		device_del(pmu->dev);
+		put_device(pmu->dev);
+	}
 	free_pmu_context(pmu);
+	mutex_unlock(&pmus_lock);
 }
 EXPORT_SYMBOL_GPL(perf_pmu_unregister);

^ permalink raw reply related	[flat|nested] 18+ messages in thread
* Re: [4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-20  8:58 ` Jiri Olsa
@ 2016-10-20  9:04 ` Peter Zijlstra
  2016-10-20  9:42 ` Jiri Olsa
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2016-10-20 9:04 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: CAI Qian, Rob Herring, Kan Liang, Greg Kroah-Hartman,
	linux-kernel, Ingo Molnar

On Thu, Oct 20, 2016 at 10:58:03AM +0200, Jiri Olsa wrote:

> @@ -8869,11 +8869,15 @@ void perf_pmu_unregister(struct pmu *pmu)
>  	free_percpu(pmu->pmu_disable_count);
>  	if (pmu->type >= PERF_TYPE_MAX)
>  		idr_remove(&pmu_idr, pmu->type);
> -	if (pmu->nr_addr_filters)
> -		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> -	device_del(pmu->dev);
> -	put_device(pmu->dev);
> +	mutex_lock(&pmus_lock);
> +	if (pmu_bus_running) {
> +		if (pmu->nr_addr_filters)
> +			device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> +		device_del(pmu->dev);
> +		put_device(pmu->dev);
> +	}
>  	free_pmu_context(pmu);
> +	mutex_unlock(&pmus_lock);
>  }
>  EXPORT_SYMBOL_GPL(perf_pmu_unregister);

I think that is still racy..


unregister:				sysfs_init:

	mutex_lock(&pmus_lock);
	list_del_rcu(&pmu->entry);
	mutex_unlock(&pmus_lock);

	synchronize_*rcu();

					mutex_lock(&pmus_lock);
					list_for_each_entry(pmu, &pmus, entry) {
						/* add device muck */
						/* will _NOT_ see our PMU */
					}
					pmu_bus_running = 1;
					mutex_unlock(&pmus_lock);

	mutex_lock(&pmus_lock);
	if (pmu_bus_running) {
		device_del() /* OOPS */


What you want is to read pmu_bus_running in the same pmus_lock section
as we do the list_del, and then use that local copy later.

^ permalink raw reply	[flat|nested] 18+ messages in thread
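The interleaving described above can be replayed deterministically in user space. This is a minimal sketch under stated assumptions, not the kernel implementation: `model_pmu`, `model_lock`, `model_bus_running`, `model_sysfs_init()`, `model_unregister_begin()` and `model_unregister_finish()` are hypothetical names that only mirror `struct pmu`, `pmus_lock`, `pmu_bus_running`, `perf_event_sysfs_init()` and the two halves of `perf_pmu_unregister()`. A pthread mutex stands in for the kernel mutex, and the RCU synchronization between the two halves is omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; the names only mirror the kernel ones. */
struct model_pmu {
	bool on_list;    /* still linked on the "pmus" list */
	bool has_device; /* a device has been added for this PMU */
};

static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;
static bool model_bus_running;

/* sysfs_init side: add devices for PMUs still listed, then set the flag. */
static void model_sysfs_init(struct model_pmu *pmus, size_t n)
{
	pthread_mutex_lock(&model_lock);
	for (size_t i = 0; i < n; i++) {
		if (pmus[i].on_list)
			pmus[i].has_device = true;
	}
	model_bus_running = true;
	pthread_mutex_unlock(&model_lock);
}

/*
 * First half of unregister: unlink the PMU and snapshot the flag inside
 * the same critical section, so both observe one consistent state.
 */
static bool model_unregister_begin(struct model_pmu *pmu)
{
	bool remove_device;

	pthread_mutex_lock(&model_lock);
	remove_device = model_bus_running;
	pmu->on_list = false;
	pthread_mutex_unlock(&model_lock);

	return remove_device;
}

/*
 * Second half: act on the snapshot, not on the live flag.
 * Returns true when a device was actually removed.
 */
static bool model_unregister_finish(struct model_pmu *pmu, bool remove_device)
{
	if (!remove_device)
		return false;

	assert(pmu->has_device); /* in the kernel this would be the oops */
	pmu->has_device = false;
	return true;
}
```

Calling `model_unregister_begin()`, then `model_sysfs_init()`, then `model_unregister_finish()` reproduces the problematic ordering: the live flag is set by the time the second half runs, but the snapshot is still false, so the missing device is never touched.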
* Re: [4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-20  9:04 ` Peter Zijlstra
@ 2016-10-20  9:42 ` Jiri Olsa
  2016-10-20 11:10 ` [PATCH] perf: Protect pmu device removal with pmu_bus_running check " Jiri Olsa
  0 siblings, 1 reply; 18+ messages in thread
From: Jiri Olsa @ 2016-10-20 9:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: CAI Qian, Rob Herring, Kan Liang, Greg Kroah-Hartman,
	linux-kernel, Ingo Molnar

On Thu, Oct 20, 2016 at 11:04:16AM +0200, Peter Zijlstra wrote:
> On Thu, Oct 20, 2016 at 10:58:03AM +0200, Jiri Olsa wrote:
>
> > @@ -8869,11 +8869,15 @@ void perf_pmu_unregister(struct pmu *pmu)
> >  	free_percpu(pmu->pmu_disable_count);
> >  	if (pmu->type >= PERF_TYPE_MAX)
> >  		idr_remove(&pmu_idr, pmu->type);
> > -	if (pmu->nr_addr_filters)
> > -		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> > -	device_del(pmu->dev);
> > -	put_device(pmu->dev);
> > +	mutex_lock(&pmus_lock);
> > +	if (pmu_bus_running) {
> > +		if (pmu->nr_addr_filters)
> > +			device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> > +		device_del(pmu->dev);
> > +		put_device(pmu->dev);
> > +	}
> >  	free_pmu_context(pmu);
> > +	mutex_unlock(&pmus_lock);
> >  }
> >  EXPORT_SYMBOL_GPL(perf_pmu_unregister);
>
> I think that is still racy..
>
>
> unregister:				sysfs_init:
>
> 	mutex_lock(&pmus_lock);
> 	list_del_rcu(&pmu->entry);
> 	mutex_unlock(&pmus_lock);
>
> 	synchronize_*rcu();
>
> 					mutex_lock(&pmus_lock);
> 					list_for_each_entry(pmu, &pmus, entry) {
> 						/* add device muck */

ah, I thought this part would add the device back.. but it's
already out of the pmu list.. right :-\

thanks,
jirka

> 						/* will _NOT_ see our PMU */
> 					}
> 					pmu_bus_running = 1;
> 					mutex_unlock(&pmus_lock);
>
> 	mutex_lock(&pmus_lock);
> 	if (pmu_bus_running) {
> 		device_del() /* OOPS */
>
>
> What you want is to read pmu_bus_running in the same pmus_lock section
> as we do the list_del, and then use that local copy later.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* [PATCH] perf: Protect pmu device removal with pmu_bus_running check CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-20  9:42 ` Jiri Olsa
@ 2016-10-20 11:10 ` Jiri Olsa
  2016-10-20 14:30 ` CAI Qian
  2016-10-28 10:10 ` [tip:perf/urgent] perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y " tip-bot for Jiri Olsa
  0 siblings, 2 replies; 18+ messages in thread
From: Jiri Olsa @ 2016-10-20 11:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: CAI Qian, Rob Herring, Kan Liang, Greg Kroah-Hartman,
	linux-kernel, Ingo Molnar

On Thu, Oct 20, 2016 at 11:42:59AM +0200, Jiri Olsa wrote:
> On Thu, Oct 20, 2016 at 11:04:16AM +0200, Peter Zijlstra wrote:
> > On Thu, Oct 20, 2016 at 10:58:03AM +0200, Jiri Olsa wrote:
> >
> > > @@ -8869,11 +8869,15 @@ void perf_pmu_unregister(struct pmu *pmu)
> > >  	free_percpu(pmu->pmu_disable_count);
> > >  	if (pmu->type >= PERF_TYPE_MAX)
> > >  		idr_remove(&pmu_idr, pmu->type);
> > > -	if (pmu->nr_addr_filters)
> > > -		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> > > -	device_del(pmu->dev);
> > > -	put_device(pmu->dev);
> > > +	mutex_lock(&pmus_lock);
> > > +	if (pmu_bus_running) {
> > > +		if (pmu->nr_addr_filters)
> > > +			device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> > > +		device_del(pmu->dev);
> > > +		put_device(pmu->dev);
> > > +	}
> > >  	free_pmu_context(pmu);
> > > +	mutex_unlock(&pmus_lock);
> > >  }
> > >  EXPORT_SYMBOL_GPL(perf_pmu_unregister);
> >
> > I think that is still racy..
> >
> >
> > unregister:				sysfs_init:
> >
> > 	mutex_lock(&pmus_lock);
> > 	list_del_rcu(&pmu->entry);
> > 	mutex_unlock(&pmus_lock);
> >
> > 	synchronize_*rcu();
> >
> > 					mutex_lock(&pmus_lock);
> > 					list_for_each_entry(pmu, &pmus, entry) {
> > 						/* add device muck */
>
> ah, I thought this part would add the device back.. but it's
> already out of the pmu list.. right :-\

attached fix,

thanks
jirka


---
CAI Qian reported crash [1] in uncore device removal related
to CONFIG_DEBUG_TEST_DRIVER_REMOVE option.

The reason for crash is that perf_pmu_unregister tries to remove
pmu device which is not added at this point. We add pmu devices
only after pmu_bus is registered which happens in perf_event_sysfs_init
init call and sets pmu_bus_running flag.

The fix is to get the pmu_bus_running flag state at the point
the pmu is taken out of the pmus list and remove the device
later only if it's set.

[1] https://marc.info/?l=linux-kernel&m=147688837328451

Reported-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/events/core.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index c6e47e97b33f..a5d2e62faf7e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8855,7 +8855,10 @@ EXPORT_SYMBOL_GPL(perf_pmu_register);
 
 void perf_pmu_unregister(struct pmu *pmu)
 {
+	int remove_device;
+
 	mutex_lock(&pmus_lock);
+	remove_device = pmu_bus_running;
 	list_del_rcu(&pmu->entry);
 	mutex_unlock(&pmus_lock);
 
@@ -8869,10 +8872,12 @@ void perf_pmu_unregister(struct pmu *pmu)
 	free_percpu(pmu->pmu_disable_count);
 	if (pmu->type >= PERF_TYPE_MAX)
 		idr_remove(&pmu_idr, pmu->type);
-	if (pmu->nr_addr_filters)
-		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
-	device_del(pmu->dev);
-	put_device(pmu->dev);
+	if (remove_device) {
+		if (pmu->nr_addr_filters)
+			device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
+		device_del(pmu->dev);
+		put_device(pmu->dev);
+	}
 	free_pmu_context(pmu);
 }
 EXPORT_SYMBOL_GPL(perf_pmu_unregister);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread
* Re: [PATCH] perf: Protect pmu device removal with pmu_bus_running check CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-20 11:10 ` [PATCH] perf: Protect pmu device removal with pmu_bus_running check " Jiri Olsa
@ 2016-10-20 14:30 ` CAI Qian
  2016-10-28 10:10 ` [tip:perf/urgent] perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y " tip-bot for Jiri Olsa
  1 sibling, 0 replies; 18+ messages in thread
From: CAI Qian @ 2016-10-20 14:30 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Rob Herring, Kan Liang, Greg Kroah-Hartman,
	linux-kernel, Ingo Molnar

> CAI Qian reported crash [1] in uncore device removal related
> to CONFIG_DEBUG_TEST_DRIVER_REMOVE option.
>
> The reason for crash is that perf_pmu_unregister tries to remove
> pmu device which is not added at this point. We add pmu devices
> only after pmu_bus is registered which happens in perf_event_sysfs_init
> init call and sets pmu_bus_running flag.
>
> The fix is to get the pmu_bus_running flag state at the point
> the pmu is taken out of the pmus list and remove the device
> later only if it's set.
>
> [1] https://marc.info/?l=linux-kernel&m=147688837328451
>
> Reported-by: CAI Qian <caiqian@redhat.com>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: CAI Qian <caiqian@redhat.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread
* [tip:perf/urgent] perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y kernel panic
  2016-10-20 11:10 ` [PATCH] perf: Protect pmu device removal with pmu_bus_running check " Jiri Olsa
  2016-10-20 14:30 ` CAI Qian
@ 2016-10-28 10:10 ` tip-bot for Jiri Olsa
  1 sibling, 0 replies; 18+ messages in thread
From: tip-bot for Jiri Olsa @ 2016-10-28 10:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, mingo, jolsa, kan.liang, caiqian, gregkh, tglx, robh,
	torvalds, alexander.shishkin, acme, linux-kernel, jolsa, hpa

Commit-ID:  0933840acf7b65d6d30a5b6089d882afea57aca3
Gitweb:     http://git.kernel.org/tip/0933840acf7b65d6d30a5b6089d882afea57aca3
Author:     Jiri Olsa <jolsa@redhat.com>
AuthorDate: Thu, 20 Oct 2016 13:10:11 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 28 Oct 2016 11:06:25 +0200

perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y kernel panic

CAI Qian reported a crash in the PMU uncore device removal code,
enabled by the CONFIG_DEBUG_TEST_DRIVER_REMOVE=y option:

  https://marc.info/?l=linux-kernel&m=147688837328451

The reason for the crash is that perf_pmu_unregister() tries to remove
a PMU device which is not added at this point. We add PMU devices
only after pmu_bus is registered, which happens in the
perf_event_sysfs_init() call and sets the 'pmu_bus_running' flag.

The fix is to get the 'pmu_bus_running' flag state at the point
the PMU is taken out of the PMU list and remove the device
later only if it's set.

Reported-by: CAI Qian <caiqian@redhat.com>
Tested-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161020111011.GA13361@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index c6e47e9..a5d2e62 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8855,7 +8855,10 @@ EXPORT_SYMBOL_GPL(perf_pmu_register);
 
 void perf_pmu_unregister(struct pmu *pmu)
 {
+	int remove_device;
+
 	mutex_lock(&pmus_lock);
+	remove_device = pmu_bus_running;
 	list_del_rcu(&pmu->entry);
 	mutex_unlock(&pmus_lock);
 
@@ -8869,10 +8872,12 @@ void perf_pmu_unregister(struct pmu *pmu)
 	free_percpu(pmu->pmu_disable_count);
 	if (pmu->type >= PERF_TYPE_MAX)
 		idr_remove(&pmu_idr, pmu->type);
-	if (pmu->nr_addr_filters)
-		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
-	device_del(pmu->dev);
-	put_device(pmu->dev);
+	if (remove_device) {
+		if (pmu->nr_addr_filters)
+			device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
+		device_del(pmu->dev);
+		put_device(pmu->dev);
+	}
 	free_pmu_context(pmu);
 }
 EXPORT_SYMBOL_GPL(perf_pmu_unregister);

^ permalink raw reply related	[flat|nested] 18+ messages in thread