* kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
       [not found] <907882571.66590.1476113724660.JavaMail.zimbra@redhat.com>
@ 2016-10-10 15:37 ` CAI Qian
  2016-10-10 17:09   ` Rob Herring
  2016-10-10 17:20   ` Greg Kroah-Hartman
  0 siblings, 2 replies; 18+ messages in thread
From: CAI Qian @ 2016-10-10 15:37 UTC
  To: Rob Herring; +Cc: linux-kernel, Greg Kroah-Hartman

Not sure if anyone has reported this before. With this kernel config, the kernel panics on every boot so far
with today's mainline master HEAD.

http://people.redhat.com/qcai/tmp/config-kasan-remove

[   36.318420] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[   36.325626] software IO TLB [mem 0x71c7d000-0x75c7d000] (64MB) mapped at [ffff880071c7d000-ffff880075c7cfff]
[   36.339108] Intel CQM monitoring enabled
[   36.343507] Intel MBM enabled
[   36.358713] RAPL PMU: API unit is 2^-32 Joules, 4 fixed counters, 655360 ms ovfl timer
[   36.367563] RAPL PMU: hw unit of domain pp0-core 2^-14 Joules
[   36.373984] RAPL PMU: hw unit of domain package 2^-14 Joules
[   36.380308] RAPL PMU: hw unit of domain dram 2^-14 Joules
[   36.386337] RAPL PMU: hw unit of domain pp1-gpu 2^-14 Joules
[   36.410064] kasan: CONFIG_KASAN_INLINE enabled
[   36.415042] kasan: GPF could be caused by NULL-ptr deref or user memory access
[   36.423111] general protection fault: 0000 [#1] PREEMPT SMP KASAN
[   36.429911] Modules linked in:
[   36.433331] CPU: 48 PID: 1 Comm: swapper/0 Not tainted 4.8.0remove+ #4
[   36.440616] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[   36.451974] task: ffff880e524d0000 task.stack: ffff880852880000
[   36.458578] RIP: 0010:[<ffffffff81ea08c0>]  [<ffffffff81ea08c0>] device_del+0x80/0x700
[   36.467431] RSP: 0000:ffff880852887938  EFLAGS: 00010246
[   36.473357] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 1ffff10109e6f101
[   36.481319] RDX: dffffc0000000000 RSI: 000000000000000b RDI: 0000000000000000
[   36.489281] RBP: ffff8808528879e8 R08: 0000000000000001 R09: 0000000000000000
[   36.497243] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880e501b4b00
[   36.505208] R13: ffff880e31988480 R14: 0000000000000001 R15: ffff880e31988480
[   36.513171] FS:  0000000000000000(0000) GS:ffff88085ec80000(0000) knlGS:0000000000000000
[   36.522201] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   36.528613] CR2: 0000000000000000 CR3: 0000000002e0a000 CR4: 00000000003406e0
[   36.536576] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   36.544537] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   36.552499] Stack:
[   36.554742]  1ffff1010a510f28 1ffff1010a510f2c ffffffff82d3abe4 ffffffff81a6d060
[   36.563037]  0000000000000296 0000000041b58ab3 ffffffff82d48cc5 ffffffff81ea0840
[   36.571329]  ffffffff828a3040 ffff880800000000 ffff880852887980 ffffffff82f0ba20
[   36.579624] Call Trace:
[   36.582355]  [<ffffffff81a6d060>] ? idr_mark_full+0xc0/0xc0
[   36.588573]  [<ffffffff81ea0840>] ? cleanup_glue_dir+0xe0/0xe0
[   36.595086]  [<ffffffff814c228d>] perf_pmu_unregister+0x18d/0x530
[   36.601890]  [<ffffffff826f8811>] ? _raw_spin_unlock+0x31/0x50
[   36.608393]  [<ffffffff8103c54e>] ? uncore_pcibus_to_physid+0x10e/0x1c0
[   36.615766]  [<ffffffff810418ee>] uncore_pci_remove+0x24e/0x440
[   36.622375]  [<ffffffff81b91662>] pci_device_remove+0xa2/0x1e0
[   36.628888]  [<ffffffff81eadd01>] driver_probe_device+0x171/0xd50
[   36.635688]  [<ffffffff81eae8e0>] ? driver_probe_device+0xd50/0xd50
[   36.642685]  [<ffffffff81eaea79>] __driver_attach+0x199/0x1e0
[   36.649097]  [<ffffffff81ea7fc6>] bus_for_each_dev+0x126/0x1e0
[   36.655607]  [<ffffffff81ea7ea0>] ? subsys_dev_iter_exit+0x10/0x10
[   36.662508]  [<ffffffff812103ae>] ? preempt_count_sub+0x5e/0xe0
[   36.669105]  [<ffffffff81eacc1d>] driver_attach+0x3d/0x50
[   36.675129]  [<ffffffff81eabd84>] bus_add_driver+0x554/0x790
[   36.681444]  [<ffffffff81eb067c>] driver_register+0x18c/0x3b0
[   36.687861]  [<ffffffff812b3212>] ? __raw_spin_lock_init+0x32/0x100
[   36.694854]  [<ffffffff81b8bbea>] __pci_register_driver+0x13a/0x1e0
[   36.701853]  [<ffffffff83492467>] intel_uncore_init+0x465/0x54f
[   36.708459]  [<ffffffff83492002>] ? uncore_type_init+0x4d6/0x4d6
[   36.715165]  [<ffffffff81002299>] do_one_initcall+0xa9/0x240
[   36.721473]  [<ffffffff810021f0>] ? initcall_blacklisted+0x180/0x180
[   36.728568]  [<ffffffff811f5a10>] ? parse_args+0x520/0x990
[   36.734692]  [<ffffffff811d5bc2>] ? __usermodehelper_set_disable_depth+0x42/0x50
[   36.742948]  [<ffffffff83485d1f>] kernel_init_freeable+0x540/0x610
[   36.749845]  [<ffffffff834857df>] ? start_kernel+0x70d/0x70d
[   36.756161]  [<ffffffff826f88ad>] ? _raw_spin_unlock_irq+0x3d/0x60
[   36.763060]  [<ffffffff8120eb19>] ? finish_task_switch+0x189/0x6c0
[   36.769957]  [<ffffffff8120eaeb>] ? finish_task_switch+0x15b/0x6c0
[   36.776857]  [<ffffffff826e0060>] ? rest_init+0x160/0x160
[   36.782875]  [<ffffffff826e0073>] kernel_init+0x13/0x120
[   36.788802]  [<ffffffff826e0060>] ? rest_init+0x160/0x160
[   36.794826]  [<ffffffff826f93ba>] ret_from_fork+0x2a/0x40
[   36.800851] Code: 81 c7 00 f1 f1 f1 f1 c7 40 04 00 07 f4 f4 c7 40 08 f3 f3 f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 89 f8 48 c1 e8 03 <80> 3c 10 00 0f 85 1a 06 00 00 48 8b 03 48 89 85 68 ff ff ff 48 
[   36.822549] RIP  [<ffffffff81ea08c0>] device_del+0x80/0x700
[   36.828778]  RSP <ffff880852887938>
[   36.832743] ---[ end trace f3cec3a0c6cb2258 ]---
[   36.838054] Kernel panic - not syncing: Fatal exception
[   36.843967] ---[ end Kernel panic - not syncing: Fatal exception
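
A note on reading the fault above: with CONFIG_KASAN_INLINE the compiler emits the shadow-memory check inline before each access, so a NULL pointer tends to fault on the shadow lookup (a general protection fault on a non-canonical address) instead of producing the usual page fault at address 0. The Python sketch below redoes that arithmetic in userspace. The shift of 3 and the offset 0xdffffc0000000000 are the x86_64 KASAN values, and the offset matches RDX in the register dump; the function names and the 48-bit canonical check are assumptions made for illustration.

```python
# Userspace sketch of the x86_64 KASAN shadow-address arithmetic.
# Function names are illustrative, not kernel APIs.

KASAN_SHADOW_SCALE_SHIFT = 3               # one shadow byte covers 8 bytes
KASAN_SHADOW_OFFSET = 0xDFFFFC0000000000   # same value as RDX in the dump
MASK64 = (1 << 64) - 1


def shadow_addr(addr: int) -> int:
    """Return the shadow byte address that an inline check would read."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits-1) are all zeros or all ones."""
    top = addr >> (va_bits - 1)
    return top == 0 or top == (1 << (64 - va_bits + 1)) - 1


if __name__ == "__main__":
    # 0 is the NULL pointer; the second value is the task pointer from the log.
    for label, addr in (("NULL", 0), ("task", 0xFFFF880E524D0000)):
        s = shadow_addr(addr)
        print(f"{label}: addr={addr:#018x} shadow={s:#018x} "
              f"canonical={is_canonical(s)}")
```

For address 0 the shadow address is the offset itself, which is not canonical with 48-bit virtual addresses. That is consistent with the GPF and with KASAN's hint that a NULL-ptr deref may be the cause.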


* Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-10 15:37 ` kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic CAI Qian
@ 2016-10-10 17:09   ` Rob Herring
  2016-10-10 18:25     ` CAI Qian
  2016-10-10 17:20   ` Greg Kroah-Hartman
  1 sibling, 1 reply; 18+ messages in thread
From: Rob Herring @ 2016-10-10 17:09 UTC
  To: CAI Qian; +Cc: linux-kernel, Greg Kroah-Hartman

On Mon, Oct 10, 2016 at 10:37 AM, CAI Qian <caiqian@redhat.com> wrote:
> Not sure if anyone reported this before. With this kernel config, it is 100% kernel panic so far with today's
> mainline master HEAD.

Looks like it is catching what it is supposed to, though looking
through the code, I haven't found where the problem is. Do bind and
unbind normally work for this driver?

> http://people.redhat.com/qcai/tmp/config-kasan-remove
>
> [   36.318420] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> [   36.325626] software IO TLB [mem 0x71c7d000-0x75c7d000] (64MB) mapped at [ffff880071c7d000-ffff880075c7cfff]
> [   36.339108] Intel CQM monitoring enabled
> [   36.343507] Intel MBM enabled
> [   36.358713] RAPL PMU: API unit is 2^-32 Joules, 4 fixed counters, 655360 ms ovfl timer
> [   36.367563] RAPL PMU: hw unit of domain pp0-core 2^-14 Joules
> [   36.373984] RAPL PMU: hw unit of domain package 2^-14 Joules
> [   36.380308] RAPL PMU: hw unit of domain dram 2^-14 Joules
> [   36.386337] RAPL PMU: hw unit of domain pp1-gpu 2^-14 Joules
> [   36.410064] kasan: CONFIG_KASAN_INLINE enabled
> [   36.415042] kasan: GPF could be caused by NULL-ptr deref or user memory access
> [   36.423111] general protection fault: 0000 [#1] PREEMPT SMP KASAN
> [   36.429911] Modules linked in:
> [   36.433331] CPU: 48 PID: 1 Comm: swapper/0 Not tainted 4.8.0remove+ #4
> [   36.440616] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
> [   36.451974] task: ffff880e524d0000 task.stack: ffff880852880000
> [   36.458578] RIP: 0010:[<ffffffff81ea08c0>]  [<ffffffff81ea08c0>] device_del+0x80/0x700
> [   36.467431] RSP: 0000:ffff880852887938  EFLAGS: 00010246
> [   36.473357] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 1ffff10109e6f101
> [   36.481319] RDX: dffffc0000000000 RSI: 000000000000000b RDI: 0000000000000000
> [   36.489281] RBP: ffff8808528879e8 R08: 0000000000000001 R09: 0000000000000000
> [   36.497243] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880e501b4b00
> [   36.505208] R13: ffff880e31988480 R14: 0000000000000001 R15: ffff880e31988480
> [   36.513171] FS:  0000000000000000(0000) GS:ffff88085ec80000(0000) knlGS:0000000000000000
> [   36.522201] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   36.528613] CR2: 0000000000000000 CR3: 0000000002e0a000 CR4: 00000000003406e0
> [   36.536576] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   36.544537] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   36.552499] Stack:
> [   36.554742]  1ffff1010a510f28 1ffff1010a510f2c ffffffff82d3abe4 ffffffff81a6d060
> [   36.563037]  0000000000000296 0000000041b58ab3 ffffffff82d48cc5 ffffffff81ea0840
> [   36.571329]  ffffffff828a3040 ffff880800000000 ffff880852887980 ffffffff82f0ba20
> [   36.579624] Call Trace:
> [   36.582355]  [<ffffffff81a6d060>] ? idr_mark_full+0xc0/0xc0
> [   36.588573]  [<ffffffff81ea0840>] ? cleanup_glue_dir+0xe0/0xe0
> [   36.595086]  [<ffffffff814c228d>] perf_pmu_unregister+0x18d/0x530
> [   36.601890]  [<ffffffff826f8811>] ? _raw_spin_unlock+0x31/0x50
> [   36.608393]  [<ffffffff8103c54e>] ? uncore_pcibus_to_physid+0x10e/0x1c0
> [   36.615766]  [<ffffffff810418ee>] uncore_pci_remove+0x24e/0x440
> [   36.622375]  [<ffffffff81b91662>] pci_device_remove+0xa2/0x1e0
> [   36.628888]  [<ffffffff81eadd01>] driver_probe_device+0x171/0xd50
> [   36.635688]  [<ffffffff81eae8e0>] ? driver_probe_device+0xd50/0xd50
> [   36.642685]  [<ffffffff81eaea79>] __driver_attach+0x199/0x1e0
> [   36.649097]  [<ffffffff81ea7fc6>] bus_for_each_dev+0x126/0x1e0
> [   36.655607]  [<ffffffff81ea7ea0>] ? subsys_dev_iter_exit+0x10/0x10
> [   36.662508]  [<ffffffff812103ae>] ? preempt_count_sub+0x5e/0xe0
> [   36.669105]  [<ffffffff81eacc1d>] driver_attach+0x3d/0x50
> [   36.675129]  [<ffffffff81eabd84>] bus_add_driver+0x554/0x790
> [   36.681444]  [<ffffffff81eb067c>] driver_register+0x18c/0x3b0
> [   36.687861]  [<ffffffff812b3212>] ? __raw_spin_lock_init+0x32/0x100
> [   36.694854]  [<ffffffff81b8bbea>] __pci_register_driver+0x13a/0x1e0
> [   36.701853]  [<ffffffff83492467>] intel_uncore_init+0x465/0x54f
> [   36.708459]  [<ffffffff83492002>] ? uncore_type_init+0x4d6/0x4d6
> [   36.715165]  [<ffffffff81002299>] do_one_initcall+0xa9/0x240
> [   36.721473]  [<ffffffff810021f0>] ? initcall_blacklisted+0x180/0x180
> [   36.728568]  [<ffffffff811f5a10>] ? parse_args+0x520/0x990
> [   36.734692]  [<ffffffff811d5bc2>] ? __usermodehelper_set_disable_depth+0x42/0x50
> [   36.742948]  [<ffffffff83485d1f>] kernel_init_freeable+0x540/0x610
> [   36.749845]  [<ffffffff834857df>] ? start_kernel+0x70d/0x70d
> [   36.756161]  [<ffffffff826f88ad>] ? _raw_spin_unlock_irq+0x3d/0x60
> [   36.763060]  [<ffffffff8120eb19>] ? finish_task_switch+0x189/0x6c0
> [   36.769957]  [<ffffffff8120eaeb>] ? finish_task_switch+0x15b/0x6c0
> [   36.776857]  [<ffffffff826e0060>] ? rest_init+0x160/0x160
> [   36.782875]  [<ffffffff826e0073>] kernel_init+0x13/0x120
> [   36.788802]  [<ffffffff826e0060>] ? rest_init+0x160/0x160
> [   36.794826]  [<ffffffff826f93ba>] ret_from_fork+0x2a/0x40
> [   36.800851] Code: 81 c7 00 f1 f1 f1 f1 c7 40 04 00 07 f4 f4 c7 40 08 f3 f3 f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 89 f8 48 c1 e8 03 <80> 3c 10 00 0f 85 1a 06 00 00 48 8b 03 48 89 85 68 ff ff ff 48
> [   36.822549] RIP  [<ffffffff81ea08c0>] device_del+0x80/0x700
> [   36.828778]  RSP <ffff880852887938>
> [   36.832743] ---[ end trace f3cec3a0c6cb2258 ]---
> [   36.838054] Kernel panic - not syncing: Fatal exception
> [   36.843967] ---[ end Kernel panic - not syncing: Fatal exception


* Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-10 15:37 ` kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic CAI Qian
  2016-10-10 17:09   ` Rob Herring
@ 2016-10-10 17:20   ` Greg Kroah-Hartman
  2016-10-10 18:15     ` Rob Herring
  1 sibling, 1 reply; 18+ messages in thread
From: Greg Kroah-Hartman @ 2016-10-10 17:20 UTC
  To: CAI Qian; +Cc: Rob Herring, linux-kernel

On Mon, Oct 10, 2016 at 11:37:27AM -0400, CAI Qian wrote:
> Not sure if anyone reported this before. With this kernel config, it is 100% kernel panic so far with today's
> mainline master HEAD.
> 
> http://people.redhat.com/qcai/tmp/config-kasan-remove

Oh, it breaks things with kasan disabled as well :)

See Laszlo's bug report from a few hours ago; Rob is on it...


* Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-10 17:20   ` Greg Kroah-Hartman
@ 2016-10-10 18:15     ` Rob Herring
  2016-10-10 18:22       ` CAI Qian
  2016-10-19 14:45       ` [4.9-rc1+] intel_uncore builtin " CAI Qian
  0 siblings, 2 replies; 18+ messages in thread
From: Rob Herring @ 2016-10-10 18:15 UTC
  To: Greg Kroah-Hartman; +Cc: CAI Qian, linux-kernel

On Mon, Oct 10, 2016 at 12:20 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Mon, Oct 10, 2016 at 11:37:27AM -0400, CAI Qian wrote:
>> Not sure if anyone reported this before. With this kernel config, it is 100% kernel panic so far with today's
>> mainline master HEAD.
>>
>> http://people.redhat.com/qcai/tmp/config-kasan-remove
>
> Oh it breaks things with kasan disabled as well :)
>
> See Laszlo's bug report already a few hours ago, Rob is on it...

I think this one is different though. It has a remove() hook.

Rob
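
For context on the remove() remark: CONFIG_DEBUG_TEST_DRIVER_REMOVE asks the driver core to call probe, then remove, then probe again on each bind, which is why pci_device_remove shows up underneath driver_probe_device in the trace. The toy Python model below walks through that sequence. Every class and function name in it is hypothetical and it does not mirror the real driver-core code; it only illustrates how a remove() hook that tears down something probe() never set up would fail. Why the pointer reaching device_del is NULL is not established at this point in the thread.

```python
# Toy model of the probe -> remove -> probe test sequence.
# All names are hypothetical; this is not the kernel's implementation.

class ToyDriver:
    """Driver whose remove() assumes probe() created a device object."""

    def __init__(self, creates_device: bool):
        self.creates_device = creates_device
        self.dev = None
        self.calls = []

    def probe(self) -> None:
        self.calls.append("probe")
        if self.creates_device:
            self.dev = {"name": "toy"}

    def remove(self) -> None:
        self.calls.append("remove")
        if self.dev is None:
            # Stand-in for the NULL pointer dereference in the oops.
            raise RuntimeError("remove() used a device that was never created")
        self.dev = None


def bind(driver: ToyDriver, test_remove: bool) -> list:
    """Bind a driver, optionally exercising remove() in between two probes."""
    driver.probe()
    if test_remove:
        driver.remove()
        driver.probe()
    return driver.calls


if __name__ == "__main__":
    print(bind(ToyDriver(creates_device=True), test_remove=True))
    try:
        bind(ToyDriver(creates_device=False), test_remove=True)
    except RuntimeError as err:
        print("failure:", err)
```

Without the test option the second driver binds fine, because its remove() is never run; the latent problem only shows once the remove path is exercised.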


* Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-10 18:15     ` Rob Herring
@ 2016-10-10 18:22       ` CAI Qian
  2016-10-10 19:34         ` Rob Herring
  2016-10-19 14:45       ` [4.9-rc1+] intel_uncore builtin " CAI Qian
  1 sibling, 1 reply; 18+ messages in thread
From: CAI Qian @ 2016-10-10 18:22 UTC
  To: Rob Herring; +Cc: Greg Kroah-Hartman, linux-kernel



----- Original Message -----
> From: "Rob Herring" <robh@kernel.org>
> To: "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>
> Cc: "CAI Qian" <caiqian@redhat.com>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Sent: Monday, October 10, 2016 2:15:29 PM
> Subject: Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
> 
> On Mon, Oct 10, 2016 at 12:20 PM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Mon, Oct 10, 2016 at 11:37:27AM -0400, CAI Qian wrote:
> >> Not sure if anyone reported this before. With this kernel config, it is
> >> 100% kernel panic so far with today's
> >> mainline master HEAD.
> >>
> >> http://people.redhat.com/qcai/tmp/config-kasan-remove
> >
> > Oh it breaks things with kasan disabled as well :)
> >
> > See Laszlo's bug report already a few hours ago, Rob is on it...
> 
> I think this one is different though. It has a remove() hook.
FYI, this can also be reproduced without kasan.
    CAI Qian


* Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-10 17:09   ` Rob Herring
@ 2016-10-10 18:25     ` CAI Qian
  0 siblings, 0 replies; 18+ messages in thread
From: CAI Qian @ 2016-10-10 18:25 UTC
  To: Rob Herring; +Cc: linux-kernel, Greg Kroah-Hartman



----- Original Message -----
> From: "Rob Herring" <robh@kernel.org>
> To: "CAI Qian" <caiqian@redhat.com>
> Cc: "linux-kernel" <linux-kernel@vger.kernel.org>, "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>
> Sent: Monday, October 10, 2016 1:09:43 PM
> Subject: Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
> 
> On Mon, Oct 10, 2016 at 10:37 AM, CAI Qian <caiqian@redhat.com> wrote:
> > Not sure if anyone reported this before. With this kernel config, it is
> > 100% kernel panic so far with today's
> > mainline master HEAD.
> 
> Looks like it is catching what it is supposed to. Though looking
> through the code, I haven't found where the problem is. Does bind and
> unbind for this normally work?
I am not sure. It just panics at boot. If you can tell me which debugging steps
you want run, I can help test it out.
   CAI Qian


* Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-10 18:22       ` CAI Qian
@ 2016-10-10 19:34         ` Rob Herring
  2016-10-10 20:09           ` CAI Qian
  0 siblings, 1 reply; 18+ messages in thread
From: Rob Herring @ 2016-10-10 19:34 UTC
  To: CAI Qian; +Cc: Greg Kroah-Hartman, linux-kernel

On Mon, Oct 10, 2016 at 1:22 PM, CAI Qian <caiqian@redhat.com> wrote:
>
>
> ----- Original Message -----
>> From: "Rob Herring" <robh@kernel.org>
>> To: "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>
>> Cc: "CAI Qian" <caiqian@redhat.com>, "linux-kernel" <linux-kernel@vger.kernel.org>
>> Sent: Monday, October 10, 2016 2:15:29 PM
>> Subject: Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
>>
>> On Mon, Oct 10, 2016 at 12:20 PM, Greg Kroah-Hartman
>> <gregkh@linuxfoundation.org> wrote:
>> > On Mon, Oct 10, 2016 at 11:37:27AM -0400, CAI Qian wrote:
>> >> Not sure if anyone reported this before. With this kernel config, it is
>> >> 100% kernel panic so far with today's
>> >> mainline master HEAD.
>> >>
>> >> http://people.redhat.com/qcai/tmp/config-kasan-remove
>> >
>> > Oh it breaks things with kasan disabled as well :)
>> >
>> > See Laszlo's bug report already a few hours ago, Rob is on it...
>>
>> I think this one is different though. It has a remove() hook.
> FYI, this can also be reproduced without kasan.

Is the backtrace the same in that case?

Rob


* Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-10 19:34         ` Rob Herring
@ 2016-10-10 20:09           ` CAI Qian
  0 siblings, 0 replies; 18+ messages in thread
From: CAI Qian @ 2016-10-10 20:09 UTC
  To: Rob Herring; +Cc: Greg Kroah-Hartman, linux-kernel


> Is the backtrace the same in that case?
Very close. I saw "intel" in there, so here is the list of those modules on the system.

# lsmod | grep intel
intel_rapl             20480  0 
intel_powerclamp       16384  0 
kvm_intel             208896  0 
kvm                   630784  1 kvm_intel
ghash_clmulni_intel    16384  0 
aesni_intel           167936  0 
lrw                    16384  1 aesni_intel
glue_helper            16384  1 aesni_intel
ablk_helper            16384  1 aesni_intel
cryptd                 24576  3 ablk_helper,ghash_clmulni_intel,aesni_intel
crc32c_intel           24576  1

[   17.884926] BUG: unable to handle kernel NULL pointer dereference at           (null)
[   17.893700] IP: [<ffffffff81546ff7>] device_del+0x17/0x280
[   17.899848] PGD 0 
[   17.902109] Oops: 0000 [#1] PREEMPT SMP
[   17.906394] Modules linked in:
[   17.909823] CPU: 68 PID: 1 Comm: swapper/0 Not tainted 4.8.0-remove-nokasan+ #5
[   17.917985] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[   17.929347] task: ffff8810556c8000 task.stack: ffffc90000078000
[   17.935955] RIP: 0010:[<ffffffff81546ff7>]  [<ffffffff81546ff7>] device_del+0x17/0x280
[   17.944811] RSP: 0000:ffffc9000007bc00  EFLAGS: 00010286
[   17.950742] RAX: 0000000000000000 RBX: ffff88085c8e3c00 RCX: 0000000000000001
[   17.958708] RDX: ffff881059d60000 RSI: 000000000000000b RDI: 0000000000000000
[   17.966675] RBP: ffffc9000007bc38 R08: 00000000d38c0f63 R09: 0000000000000000
[   17.974640] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   17.982606] R13: ffff881054099000 R14: 0000000000000001 R15: 0000000000000000
[   17.990574] FS:  0000000000000000(0000) GS:ffff88105e400000(0000) knlGS:0000000000000000
[   17.999606] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   18.006022] CR2: 0000000000000000 CR3: 0000000001c06000 CR4: 00000000003406e0
[   18.013989] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   18.021954] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   18.029919] Stack:
[   18.032163]  0000000000000000 00000000dd652bd0 ffff88085c8e3c00 ffff88085c8e3c00
[   18.040475]  ffff88085c8e3400 ffff881054099000 0000000000000001 ffffc9000007bc58
[   18.048788]  ffffffff811c9680 ffff88085c8e3c00 ffff88085c8e3400 ffffc9000007bc88
[   18.057090] Call Trace:
[   18.059819]  [<ffffffff811c9680>] perf_pmu_unregister+0x90/0x150
[   18.066529]  [<ffffffff81017678>] uncore_pci_remove+0xc8/0x160
[   18.073044]  [<ffffffff814428c9>] pci_device_remove+0x39/0xc0
[   18.079468]  [<ffffffff8154bf4e>] driver_probe_device+0xbe/0x4d0
[   18.086176]  [<ffffffff8154c443>] __driver_attach+0xe3/0xf0
[   18.092399]  [<ffffffff8154c360>] ? driver_probe_device+0x4d0/0x4d0
[   18.099400]  [<ffffffff81549b43>] bus_for_each_dev+0x73/0xc0
[   18.105722]  [<ffffffff8154b7de>] driver_attach+0x1e/0x20
[   18.111752]  [<ffffffff8154b290>] bus_add_driver+0x200/0x270
[   18.118078]  [<ffffffff8154d160>] driver_register+0x60/0xe0
[   18.124303]  [<ffffffff81440ee0>] __pci_register_driver+0x60/0x70
[   18.131117]  [<ffffffff81f1e6e1>] intel_uncore_init+0x277/0x2df
[   18.137728]  [<ffffffff81f1e46a>] ? uncore_type_init+0x15f/0x15f
[   18.144441]  [<ffffffff81002190>] do_one_initcall+0x50/0x190
[   18.150768]  [<ffffffff810c5bf1>] ? parse_args+0x2d1/0x490
[   18.156894]  [<ffffffff81f19243>] kernel_init_freeable+0x1ff/0x29e
[   18.163801]  [<ffffffff817dd840>] ? rest_init+0x140/0x140
[   18.169831]  [<ffffffff817dd84e>] kernel_init+0xe/0x100
[   18.175668]  [<ffffffff817e957a>] ret_from_fork+0x2a/0x40
[   18.181695] Code: e8 cf d4 29 00 5b 5d c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 49 89 fc 48 83 ec 18 <4c> 8b 2f 65 48 8b 04 25 28 00 00 00 48 89 45 d8 31 c0 48 8b 87 
[   18.203631] RIP  [<ffffffff81546ff7>] device_del+0x17/0x280
[   18.209867]  RSP <ffffc9000007bc00>
[   18.213759] CR2: 0000000000000000
[   18.217548] ---[ end trace 91188545987fc9d9 ]---
[   18.222706] Kernel panic - not syncing: Fatal exception
[   18.228692] ---[ end Kernel panic - not syncing: Fatal exception
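
Each frame in the traces above has the form symbol+0xoffset/0xsize: the offset into the function and the function's total size. A `?` before the symbol marks an address found on the stack that the unwinder could not confirm as part of the call chain. The helper below is a minimal sketch for pulling those fields out of a pasted log, assuming the bracketed `[<addr>]` frame format shown in this thread; the function and key names are made up for illustration, and RIP lines are deliberately skipped.

```python
import re

# Matches "[<addr>] symbol+0xoff/0xsize", with an optional "? " marker.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+(?P<guess>\?\s+)?"
    r"(?P<sym>[A-Za-z0-9_.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(log_text: str) -> list:
    """Return one dict per call-trace frame found in log_text."""
    frames = []
    for line in log_text.splitlines():
        if "RIP" in line:
            continue  # the faulting instruction is reported separately
        m = FRAME_RE.search(line)
        if m is None:
            continue
        frames.append({
            "address": int(m.group("addr"), 16),
            "symbol": m.group("sym"),
            "offset": int(m.group("off"), 16),
            "size": int(m.group("size"), 16),
            "reliable": m.group("guess") is None,
        })
    return frames


if __name__ == "__main__":
    sample = (
        "[   18.059819]  [<ffffffff811c9680>] perf_pmu_unregister+0x90/0x150\n"
        "[   18.092399]  [<ffffffff8154c360>] ? driver_probe_device+0x4d0/0x4d0\n"
        "[   18.203631] RIP  [<ffffffff81546ff7>] device_del+0x17/0x280\n"
    )
    for frame in parse_frames(sample):
        if frame["reliable"]:
            print(frame["symbol"], hex(frame["offset"]))
```

Filtering on the reliable frames makes it easier to compare the kasan and non-kasan traces, since the `?` entries differ between the two builds.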


* [4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-10 18:15     ` Rob Herring
  2016-10-10 18:22       ` CAI Qian
@ 2016-10-19 14:45       ` CAI Qian
  2016-10-19 19:19         ` Jiri Olsa
  1 sibling, 1 reply; 18+ messages in thread
From: CAI Qian @ 2016-10-19 14:45 UTC
  To: Rob Herring, Jiri Olsa, Peter Zijlstra, Kan Liang
  Cc: Greg Kroah-Hartman, linux-kernel, Ingo Molnar

It turns out this is only reproducible when intel_uncore is compiled as a builtin, i.e.,
not as a module. It can still be reproduced with yesterday's mainline.

Here is some information about the system:

Intel Platform: Grantley-R Wildcat Pass CPU: Broadwell-EP, B0.
Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz

[   66.349263] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[   66.356672] software IO TLB [mem 0x71c7d000-0x75c7d000] (64MB) mapped at [ffff880071c7d000-ffff880075c7cfff]
[   66.369911] Intel CQM monitoring enabled
[   66.374445] Intel MBM enabled
[   66.385708] RAPL PMU: API unit is 2^-32 Joules, 4 fixed counters, 655360 ms ovfl timer
[   66.394564] RAPL PMU: hw unit of domain pp0-core 2^-14 Joules
[   66.400991] RAPL PMU: hw unit of domain package 2^-14 Joules
[   66.407317] RAPL PMU: hw unit of domain dram 2^-14 Joules
[   66.413358] RAPL PMU: hw unit of domain pp1-gpu 2^-14 Joules
[   66.434040] ================================================================================
[   66.443462] UBSAN: Undefined behaviour in drivers/base/core.c:1251:17
[   66.450653] member access within null pointer of type 'struct device'
[   66.457845] CPU: 68 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc1-lockfix+ #48
[   66.465809] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[   66.477168]  ffff880847aff798 ffffffff81d370b4 0000000041b58ab3 ffffffff83348dcf
[   66.485469]  ffffffff81d36ff4 ffff880847aff7c0 ffff880847aff770 ffff880e3f9d8000
[   66.493770]  ffffffff82ff8a00 ffffffff8309c5c0 00000000000004e3 000000009091f309
[   66.502073] Call Trace:
[   66.504811]  [<ffffffff81d370b4>] dump_stack+0xc0/0x12c
[   66.510644]  [<ffffffff81d36ff4>] ? _atomic_dec_and_lock+0xc4/0xc4
[   66.517548]  [<ffffffff81e5ac85>] ubsan_epilogue+0xd/0x8a
[   66.523574]  [<ffffffff81e5ae68>] __ubsan_handle_type_mismatch+0x166/0x434
[   66.531253]  [<ffffffff813294dd>] ? get_lock_stats+0x1d/0x120
[   66.537667]  [<ffffffff81e5ad02>] ? ubsan_epilogue+0x8a/0x8a
[   66.543985]  [<ffffffff82241acc>] device_del+0x6fc/0x860
[   66.549917]  [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
[   66.557494]  [<ffffffff822413d0>] ? cleanup_glue_dir+0x140/0x140
[   66.564202]  [<ffffffff8160a6f2>] perf_pmu_unregister+0x142/0x6d0
[   66.571006]  [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
[   66.577619]  [<ffffffff810559f7>] uncore_pmu_unregister+0x67/0xd0
[   66.584422]  [<ffffffff8105ae6c>] uncore_pci_remove+0x32c/0x510
[   66.591025]  [<ffffffff81ec8392>] pci_device_remove+0xb2/0x240
[   66.597539]  [<ffffffff8224fe76>] driver_probe_device+0x146/0xfc0
[   66.604340]  [<ffffffff82250cf0>] ? driver_probe_device+0xfc0/0xfc0
[   66.611334]  [<ffffffff82250ea5>] __driver_attach+0x1b5/0x230
[   66.617749]  [<ffffffff82248e60>] bus_for_each_dev+0x130/0x200
[   66.624264]  [<ffffffff81353300>] ? do_raw_spin_trylock+0x110/0x110
[   66.631258]  [<ffffffff82248d30>] ? subsys_dev_iter_init+0x100/0x100
[   66.638349]  [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
[   66.644959]  [<ffffffff8224eaa2>] driver_attach+0x42/0x70
[   66.650976]  [<ffffffff8224d846>] bus_add_driver+0x406/0x870
[   66.657292]  [<ffffffff822535b9>] driver_register+0x1a9/0x3d0
[   66.663704]  [<ffffffff81352942>] ? __raw_spin_lock_init+0x32/0x120
[   66.670700]  [<ffffffff81ec2a1d>] __pci_register_driver+0x1ad/0x2b0
[   66.677694]  [<ffffffff81ec2870>] ? pci_pm_runtime_idle+0x180/0x180
[   66.684694]  [<ffffffff858f57b5>] intel_uncore_init+0x58d/0x64c
[   66.691300]  [<ffffffff858ed56d>] ? amd_iommu_pc_init+0x16/0x344
[   66.698006]  [<ffffffff858f5228>] ? uncore_type_init+0x5cb/0x5cb
[   66.704710]  [<ffffffff81000587>] do_one_initcall+0xb7/0x2a0
[   66.711025]  [<ffffffff810004d0>] ? initcall_blacklisted+0x1a0/0x1a0
[   66.718116]  [<ffffffff8132687d>] ? up_write+0x7d/0x120
[   66.723949]  [<ffffffff81326800>] ? up_read+0x40/0x40
[   66.729587]  [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
[   66.737165]  [<ffffffff8130db04>] ? __wake_up+0x44/0x50
[   66.743000]  [<ffffffff858e71b9>] kernel_init_freeable+0x68a/0x768
[   66.749900]  [<ffffffff858e6b2f>] ? start_kernel+0x751/0x751
[   66.756219]  [<ffffffff81075ec0>] ? compat_start_thread+0xa0/0xa0
[   66.763013]  [<ffffffff82c704c0>] ? rest_init+0x190/0x190
[   66.769039]  [<ffffffff82c704d3>] kernel_init+0x13/0x140
[   66.774967]  [<ffffffff82c704c0>] ? rest_init+0x190/0x190
[   66.780993]  [<ffffffff82c8b0d7>] ret_from_fork+0x27/0x40
[   66.787019] ================================================================================
[   66.796479] kasan: CONFIG_KASAN_INLINE enabled
[   66.801450] kasan: GPF could be caused by NULL-ptr deref or user memory access
[   66.809525] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[   66.817878] Modules linked in:
[   66.821295] CPU: 68 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc1-lockfix+ #48
[   66.829260] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[   66.840618] task: ffff880e3f9d8000 task.stack: ffff880847af8000
[   66.847225] RIP: 0010:[<ffffffff82241466>]  [<ffffffff82241466>] device_del+0x96/0x860
[   66.856076] RSP: 0000:ffff880847aff868  EFLAGS: 00010246
[   66.862002] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   66.869967] RDX: 0000000000000000 RSI: ffffffff82ea0cc0 RDI: ffffed0108f5ff06
[   66.877931] RBP: ffff880847aff920 R08: ffff880e3f9d8000 R09: 0000000000000007
[   66.885894] R10: 0000000000000000 R11: 0000000000000006 R12: ffff880844094930
[   66.893859] R13: 0000000000000001 R14: ffff880844094800 R15: ffff880844095258
[   66.901824] FS:  0000000000000000(0000) GS:ffff880e54e00000(0000) knlGS:0000000000000000
[   66.910853] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   66.917265] CR2: 0000000000000000 CR3: 000000000360a000 CR4: 00000000003406e0
[   66.925228] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   66.933191] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   66.941154] Stack:
[   66.943396]  ffffffff82c8a5d2 ffff881077f705c0 1ffff10108f5ff13 ffff880847aff920
[   66.951698]  0000000000000000 ffffffff86d346c8 0000000041b58ab3 ffffffff8338e870
[   66.959997]  ffffffff822413d0 ffff880e00000044 ffffffff00000000 ffff880847aff8c0
[   66.968296] Call Trace:
[   66.971025]  [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
[   66.978603]  [<ffffffff822413d0>] ? cleanup_glue_dir+0x140/0x140
[   66.985309]  [<ffffffff8160a6f2>] perf_pmu_unregister+0x142/0x6d0
[   66.992111]  [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
[   66.998720]  [<ffffffff810559f7>] uncore_pmu_unregister+0x67/0xd0
[   67.005523]  [<ffffffff8105ae6c>] uncore_pci_remove+0x32c/0x510
[   67.012131]  [<ffffffff81ec8392>] pci_device_remove+0xb2/0x240
[   67.018641]  [<ffffffff8224fe76>] driver_probe_device+0x146/0xfc0
[   67.025442]  [<ffffffff82250cf0>] ? driver_probe_device+0xfc0/0xfc0
[   67.032437]  [<ffffffff82250ea5>] __driver_attach+0x1b5/0x230
[   67.038852]  [<ffffffff82248e60>] bus_for_each_dev+0x130/0x200
[   67.045361]  [<ffffffff81353300>] ? do_raw_spin_trylock+0x110/0x110
[   67.052357]  [<ffffffff82248d30>] ? subsys_dev_iter_init+0x100/0x100
[   67.059450]  [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
[   67.066056]  [<ffffffff8224eaa2>] driver_attach+0x42/0x70
[   67.072081]  [<ffffffff8224d846>] bus_add_driver+0x406/0x870
[   67.078397]  [<ffffffff822535b9>] driver_register+0x1a9/0x3d0
[   67.084809]  [<ffffffff81352942>] ? __raw_spin_lock_init+0x32/0x120
[   67.091803]  [<ffffffff81ec2a1d>] __pci_register_driver+0x1ad/0x2b0
[   67.098798]  [<ffffffff81ec2870>] ? pci_pm_runtime_idle+0x180/0x180
[   67.105792]  [<ffffffff858f57b5>] intel_uncore_init+0x58d/0x64c
[   67.112399]  [<ffffffff858ed56d>] ? amd_iommu_pc_init+0x16/0x344
[   67.119103]  [<ffffffff858f5228>] ? uncore_type_init+0x5cb/0x5cb
[   67.125806]  [<ffffffff81000587>] do_one_initcall+0xb7/0x2a0
[   67.132124]  [<ffffffff810004d0>] ? initcall_blacklisted+0x1a0/0x1a0
[   67.139215]  [<ffffffff8132687d>] ? up_write+0x7d/0x120
[   67.145046]  [<ffffffff81326800>] ? up_read+0x40/0x40
[   67.150684]  [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
[   67.158262]  [<ffffffff8130db04>] ? __wake_up+0x44/0x50
[   67.164094]  [<ffffffff858e71b9>] kernel_init_freeable+0x68a/0x768
[   67.170992]  [<ffffffff858e6b2f>] ? start_kernel+0x751/0x751
[   67.177310]  [<ffffffff81075ec0>] ? compat_start_thread+0xa0/0xa0
[   67.184111]  [<ffffffff82c704c0>] ? rest_init+0x190/0x190
[   67.190137]  [<ffffffff82c704d3>] kernel_init+0x13/0x140
[   67.196064]  [<ffffffff82c704c0>] ? rest_init+0x190/0x190
[   67.202090]  [<ffffffff82c8b0d7>] ret_from_fork+0x27/0x40
[   67.208115] Code: f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 85 ff 0f 84 69 06 00 00 48 89 da 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 41 06 00 00 48 8b 03 48 89 85 68 ff ff ff 48 
[   67.229872] RIP  [<ffffffff82241466>] device_del+0x96/0x860
[   67.236101]  RSP <ffff880847aff868>
[   67.240059] ---[ end trace 69358e866a1e3f6c ]---
[   67.245377] Kernel panic - not syncing: Fatal exception
[   67.251271] ---[ end Kernel panic - not syncing: Fatal exception


----- Original Message -----
> From: "Rob Herring" <robh@kernel.org>
> To: "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>
> Cc: "CAI Qian" <caiqian@redhat.com>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Sent: Monday, October 10, 2016 2:15:29 PM
> Subject: Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
> 
> On Mon, Oct 10, 2016 at 12:20 PM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Mon, Oct 10, 2016 at 11:37:27AM -0400, CAI Qian wrote:
> >> Not sure if anyone reported this before. With this kernel config, it is
> >> 100% kernel panic so far with today's
> >> mainline master HEAD.
> >>
> >> http://people.redhat.com/qcai/tmp/config-kasan-remove
> >
> > Oh it breaks things with kasan disabled as well :)
> >
> > See Laszlo's bug report already a few hours ago, Rob is on it...
> 
> I think this one is different though. It has a remove() hook.
> 
> Rob
>


* Re: [4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-19 14:45       ` [4.9-rc1+] intel_uncore builtin " CAI Qian
@ 2016-10-19 19:19         ` Jiri Olsa
  2016-10-19 20:18           ` CAI Qian
  2016-10-20  5:39           ` Peter Zijlstra
  0 siblings, 2 replies; 18+ messages in thread
From: Jiri Olsa @ 2016-10-19 19:19 UTC (permalink / raw)
  To: CAI Qian
  Cc: Rob Herring, Peter Zijlstra, Kan Liang, Greg Kroah-Hartman,
	linux-kernel, Ingo Molnar

On Wed, Oct 19, 2016 at 10:45:31AM -0400, CAI Qian wrote:
> It turns out this is only reproducible when intel_uncore is compiled as a builtin, i.e.,
> not as a module. It can still be reproduced with yesterday's mainline.
> 
> Here is some information about the system,
> 
> Intel Platform: Grantley-R Wildcat Pass CPU: Broadwell-EP, B0.
> Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> 
> [   66.349263] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> [   66.356672] software IO TLB [mem 0x71c7d000-0x75c7d000] (64MB) mapped at [ffff880071c7d000-ffff880075c7cfff]
> [   66.369911] Intel CQM monitoring enabled
> [   66.374445] Intel MBM enabled
> [   66.385708] RAPL PMU: API unit is 2^-32 Joules, 4 fixed counters, 655360 ms ovfl timer
> [   66.394564] RAPL PMU: hw unit of domain pp0-core 2^-14 Joules
> [   66.400991] RAPL PMU: hw unit of domain package 2^-14 Joules
> [   66.407317] RAPL PMU: hw unit of domain dram 2^-14 Joules
> [   66.413358] RAPL PMU: hw unit of domain pp1-gpu 2^-14 Joules
> [   66.434040] ================================================================================
> [   66.443462] UBSAN: Undefined behaviour in drivers/base/core.c:1251:17
> [   66.450653] member access within null pointer of type 'struct device'
> [   66.457845] CPU: 68 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc1-lockfix+ #48
> [   66.465809] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
> [   66.477168]  ffff880847aff798 ffffffff81d370b4 0000000041b58ab3 ffffffff83348dcf
> [   66.485469]  ffffffff81d36ff4 ffff880847aff7c0 ffff880847aff770 ffff880e3f9d8000
> [   66.493770]  ffffffff82ff8a00 ffffffff8309c5c0 00000000000004e3 000000009091f309
> [   66.502073] Call Trace:
> [   66.504811]  [<ffffffff81d370b4>] dump_stack+0xc0/0x12c
> [   66.510644]  [<ffffffff81d36ff4>] ? _atomic_dec_and_lock+0xc4/0xc4
> [   66.517548]  [<ffffffff81e5ac85>] ubsan_epilogue+0xd/0x8a
> [   66.523574]  [<ffffffff81e5ae68>] __ubsan_handle_type_mismatch+0x166/0x434
> [   66.531253]  [<ffffffff813294dd>] ? get_lock_stats+0x1d/0x120
> [   66.537667]  [<ffffffff81e5ad02>] ? ubsan_epilogue+0x8a/0x8a
> [   66.543985]  [<ffffffff82241acc>] device_del+0x6fc/0x860
> [   66.549917]  [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
> [   66.557494]  [<ffffffff822413d0>] ? cleanup_glue_dir+0x140/0x140
> [   66.564202]  [<ffffffff8160a6f2>] perf_pmu_unregister+0x142/0x6d0
> [   66.571006]  [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
> [   66.577619]  [<ffffffff810559f7>] uncore_pmu_unregister+0x67/0xd0
> [   66.584422]  [<ffffffff8105ae6c>] uncore_pci_remove+0x32c/0x510
> [   66.591025]  [<ffffffff81ec8392>] pci_device_remove+0xb2/0x240
> [   66.597539]  [<ffffffff8224fe76>] driver_probe_device+0x146/0xfc0
> [   66.604340]  [<ffffffff82250cf0>] ? driver_probe_device+0xfc0/0xfc0
> [   66.611334]  [<ffffffff82250ea5>] __driver_attach+0x1b5/0x230
> [   66.617749]  [<ffffffff82248e60>] bus_for_each_dev+0x130/0x200
> [   66.624264]  [<ffffffff81353300>] ? do_raw_spin_trylock+0x110/0x110
> [   66.631258]  [<ffffffff82248d30>] ? subsys_dev_iter_init+0x100/0x100
> [   66.638349]  [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
> [   66.644959]  [<ffffffff8224eaa2>] driver_attach+0x42/0x70
> [   66.650976]  [<ffffffff8224d846>] bus_add_driver+0x406/0x870
> [   66.657292]  [<ffffffff822535b9>] driver_register+0x1a9/0x3d0
> [   66.663704]  [<ffffffff81352942>] ? __raw_spin_lock_init+0x32/0x120
> [   66.670700]  [<ffffffff81ec2a1d>] __pci_register_driver+0x1ad/0x2b0
> [   66.677694]  [<ffffffff81ec2870>] ? pci_pm_runtime_idle+0x180/0x180
> [   66.684694]  [<ffffffff858f57b5>] intel_uncore_init+0x58d/0x64c
> [   66.691300]  [<ffffffff858ed56d>] ? amd_iommu_pc_init+0x16/0x344
> [   66.698006]  [<ffffffff858f5228>] ? uncore_type_init+0x5cb/0x5cb
> [   66.704710]  [<ffffffff81000587>] do_one_initcall+0xb7/0x2a0
> [   66.711025]  [<ffffffff810004d0>] ? initcall_blacklisted+0x1a0/0x1a0
> [   66.718116]  [<ffffffff8132687d>] ? up_write+0x7d/0x120
> [   66.723949]  [<ffffffff81326800>] ? up_read+0x40/0x40
> [   66.729587]  [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
> [   66.737165]  [<ffffffff8130db04>] ? __wake_up+0x44/0x50
> [   66.743000]  [<ffffffff858e71b9>] kernel_init_freeable+0x68a/0x768
> [   66.749900]  [<ffffffff858e6b2f>] ? start_kernel+0x751/0x751
> [   66.756219]  [<ffffffff81075ec0>] ? compat_start_thread+0xa0/0xa0
> [   66.763013]  [<ffffffff82c704c0>] ? rest_init+0x190/0x190
> [   66.769039]  [<ffffffff82c704d3>] kernel_init+0x13/0x140
> [   66.774967]  [<ffffffff82c704c0>] ? rest_init+0x190/0x190
> [   66.780993]  [<ffffffff82c8b0d7>] ret_from_fork+0x27/0x40
> [   66.787019] ================================================================================
> [   66.796479] kasan: CONFIG_KASAN_INLINE enabled
> [   66.801450] kasan: GPF could be caused by NULL-ptr deref or user memory access
> [   66.809525] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
> [   66.817878] Modules linked in:
> [   66.821295] CPU: 68 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc1-lockfix+ #48
> [   66.829260] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
> [   66.840618] task: ffff880e3f9d8000 task.stack: ffff880847af8000
> [   66.847225] RIP: 0010:[<ffffffff82241466>]  [<ffffffff82241466>] device_del+0x96/0x860
> [   66.856076] RSP: 0000:ffff880847aff868  EFLAGS: 00010246
> [   66.862002] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [   66.869967] RDX: 0000000000000000 RSI: ffffffff82ea0cc0 RDI: ffffed0108f5ff06
> [   66.877931] RBP: ffff880847aff920 R08: ffff880e3f9d8000 R09: 0000000000000007
> [   66.885894] R10: 0000000000000000 R11: 0000000000000006 R12: ffff880844094930
> [   66.893859] R13: 0000000000000001 R14: ffff880844094800 R15: ffff880844095258
> [   66.901824] FS:  0000000000000000(0000) GS:ffff880e54e00000(0000) knlGS:0000000000000000
> [   66.910853] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   66.917265] CR2: 0000000000000000 CR3: 000000000360a000 CR4: 00000000003406e0
> [   66.925228] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   66.933191] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   66.941154] Stack:
> [   66.943396]  ffffffff82c8a5d2 ffff881077f705c0 1ffff10108f5ff13 ffff880847aff920
> [   66.951698]  0000000000000000 ffffffff86d346c8 0000000041b58ab3 ffffffff8338e870
> [   66.959997]  ffffffff822413d0 ffff880e00000044 ffffffff00000000 ffff880847aff8c0
> [   66.968296] Call Trace:
> [   66.971025]  [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
> [   66.978603]  [<ffffffff822413d0>] ? cleanup_glue_dir+0x140/0x140
> [   66.985309]  [<ffffffff8160a6f2>] perf_pmu_unregister+0x142/0x6d0
> [   66.992111]  [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
> [   66.998720]  [<ffffffff810559f7>] uncore_pmu_unregister+0x67/0xd0
> [   67.005523]  [<ffffffff8105ae6c>] uncore_pci_remove+0x32c/0x510
> [   67.012131]  [<ffffffff81ec8392>] pci_device_remove+0xb2/0x240
> [   67.018641]  [<ffffffff8224fe76>] driver_probe_device+0x146/0xfc0
> [   67.025442]  [<ffffffff82250cf0>] ? driver_probe_device+0xfc0/0xfc0
> [   67.032437]  [<ffffffff82250ea5>] __driver_attach+0x1b5/0x230
> [   67.038852]  [<ffffffff82248e60>] bus_for_each_dev+0x130/0x200
> [   67.045361]  [<ffffffff81353300>] ? do_raw_spin_trylock+0x110/0x110
> [   67.052357]  [<ffffffff82248d30>] ? subsys_dev_iter_init+0x100/0x100
> [   67.059450]  [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
> [   67.066056]  [<ffffffff8224eaa2>] driver_attach+0x42/0x70
> [   67.072081]  [<ffffffff8224d846>] bus_add_driver+0x406/0x870
> [   67.078397]  [<ffffffff822535b9>] driver_register+0x1a9/0x3d0
> [   67.084809]  [<ffffffff81352942>] ? __raw_spin_lock_init+0x32/0x120
> [   67.091803]  [<ffffffff81ec2a1d>] __pci_register_driver+0x1ad/0x2b0
> [   67.098798]  [<ffffffff81ec2870>] ? pci_pm_runtime_idle+0x180/0x180
> [   67.105792]  [<ffffffff858f57b5>] intel_uncore_init+0x58d/0x64c
> [   67.112399]  [<ffffffff858ed56d>] ? amd_iommu_pc_init+0x16/0x344
> [   67.119103]  [<ffffffff858f5228>] ? uncore_type_init+0x5cb/0x5cb
> [   67.125806]  [<ffffffff81000587>] do_one_initcall+0xb7/0x2a0
> [   67.132124]  [<ffffffff810004d0>] ? initcall_blacklisted+0x1a0/0x1a0
> [   67.139215]  [<ffffffff8132687d>] ? up_write+0x7d/0x120
> [   67.145046]  [<ffffffff81326800>] ? up_read+0x40/0x40
> [   67.150684]  [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
> [   67.158262]  [<ffffffff8130db04>] ? __wake_up+0x44/0x50
> [   67.164094]  [<ffffffff858e71b9>] kernel_init_freeable+0x68a/0x768
> [   67.170992]  [<ffffffff858e6b2f>] ? start_kernel+0x751/0x751
> [   67.177310]  [<ffffffff81075ec0>] ? compat_start_thread+0xa0/0xa0
> [   67.184111]  [<ffffffff82c704c0>] ? rest_init+0x190/0x190
> [   67.190137]  [<ffffffff82c704d3>] kernel_init+0x13/0x140
> [   67.196064]  [<ffffffff82c704c0>] ? rest_init+0x190/0x190
> [   67.202090]  [<ffffffff82c8b0d7>] ret_from_fork+0x27/0x40
> [   67.208115] Code: f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 85 ff 0f 84 69 06 00 00 48 89 da 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 41 06 00 00 48 8b 03 48 89 85 68 ff ff ff 48 
> [   67.229872] RIP  [<ffffffff82241466>] device_del+0x96/0x860
> [   67.236101]  RSP <ffff880847aff868>
> [   67.240059] ---[ end trace 69358e866a1e3f6c ]---
> [   67.245377] Kernel panic - not syncing: Fatal exception
> [   67.251271] ---[ end Kernel panic - not syncing: Fatal exception

I think the reason here is that we presume pmu devices are always added,
but we add them only if pmu_bus_running (in perf_event_sysfs_init)
is set, which might happen after the uncore initcall

attached patch fixes the issue for me

jirka


---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index c6e47e97b33f..c2099b799d16 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8871,8 +8871,10 @@ void perf_pmu_unregister(struct pmu *pmu)
 		idr_remove(&pmu_idr, pmu->type);
 	if (pmu->nr_addr_filters)
 		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
-	device_del(pmu->dev);
-	put_device(pmu->dev);
+	if (pmu_bus_running) {
+		device_del(pmu->dev);
+		put_device(pmu->dev);
+	}
 	free_pmu_context(pmu);
 }
 EXPORT_SYMBOL_GPL(perf_pmu_unregister);
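
A rough user-space model of the ordering problem described above, as a
sketch only. The struct layouts and the model_* helpers are hypothetical
stand-ins, not kernel code. The assumption is that registration creates
the device only when pmu_bus_running is already set, so a PMU that is
registered and then removed before perf_event_sysfs_init() runs still
has a NULL dev:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; these are not the kernel's types. */
struct device {
	const char *name;
};

struct pmu {
	const char *name;
	struct device *dev;	/* stays NULL until the bus is running */
};

/* Models the flag that perf_event_sysfs_init() sets. */
static bool pmu_bus_running;

/* Model of device creation for one PMU. */
static void model_pmu_dev_alloc(struct pmu *pmu)
{
	pmu->dev = malloc(sizeof(*pmu->dev));
	if (pmu->dev)
		pmu->dev->name = pmu->name;
}

/* Model of registration: the device is skipped while the bus is down. */
static void model_pmu_register(struct pmu *pmu, const char *name)
{
	assert(pmu != NULL);
	pmu->name = name;
	pmu->dev = NULL;
	if (pmu_bus_running)
		model_pmu_dev_alloc(pmu);
}

/*
 * Model of the unpatched unregister path. Returns true if it would
 * have passed a NULL device to device_del(), i.e. the oops case.
 */
static bool model_unregister_would_oops(struct pmu *pmu)
{
	bool oops = (pmu->dev == NULL);

	free(pmu->dev);		/* free(NULL) is a no-op */
	pmu->dev = NULL;
	return oops;
}
```

Registering and unregistering a PMU while pmu_bus_running is still false
makes model_unregister_would_oops() report true, which corresponds to the
NULL 'struct device' in the trace. Doing the same after the flag is set
reports false.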


* Re: [4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-19 19:19         ` Jiri Olsa
@ 2016-10-19 20:18           ` CAI Qian
  2016-10-20  5:39           ` Peter Zijlstra
  1 sibling, 0 replies; 18+ messages in thread
From: CAI Qian @ 2016-10-19 20:18 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Rob Herring, Peter Zijlstra, Kan Liang, Greg Kroah-Hartman,
	linux-kernel, Ingo Molnar


> I think the reason here is that we presume pmu devices are always added,
> but we add them only if pmu_bus_running (in perf_event_sysfs_init)
> is set, which might happen after the uncore initcall
> 
> attached patch fixes the issue for me
Tested-by: CAI Qian <caiqian@redhat.com>
> 
> jirka
> 
> 
> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index c6e47e97b33f..c2099b799d16 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8871,8 +8871,10 @@ void perf_pmu_unregister(struct pmu *pmu)
>  		idr_remove(&pmu_idr, pmu->type);
>  	if (pmu->nr_addr_filters)
>  		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> -	device_del(pmu->dev);
> -	put_device(pmu->dev);
> +	if (pmu_bus_running) {
> +		device_del(pmu->dev);
> +		put_device(pmu->dev);
> +	}
>  	free_pmu_context(pmu);
>  }
>  EXPORT_SYMBOL_GPL(perf_pmu_unregister);
> 


* Re: [4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-19 19:19         ` Jiri Olsa
  2016-10-19 20:18           ` CAI Qian
@ 2016-10-20  5:39           ` Peter Zijlstra
  2016-10-20  8:58             ` Jiri Olsa
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2016-10-20  5:39 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: CAI Qian, Rob Herring, Kan Liang, Greg Kroah-Hartman,
	linux-kernel, Ingo Molnar

On Wed, Oct 19, 2016 at 09:19:43PM +0200, Jiri Olsa wrote:
> I think the reason here is that we presume pmu devices are always added,
> but we add them only if pmu_bus_running (in perf_event_sysfs_init)
> is set, which might happen after the uncore initcall
> 
> attached patch fixes the issue for me

Right, we never expected to be unloaded before userspace runs.

Strictly speaking we should only read pmu_bus_running while holding
pmus_lock, that way we're serialized against perf_event_sysfs_init()
flipping it while we're being removed etc..

With the current setup the introduced race is harmless, but who knows
what other crazy these device people will come up with ;-)

> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index c6e47e97b33f..c2099b799d16 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8871,8 +8871,10 @@ void perf_pmu_unregister(struct pmu *pmu)
>  		idr_remove(&pmu_idr, pmu->type);
>  	if (pmu->nr_addr_filters)
>  		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> -	device_del(pmu->dev);
> -	put_device(pmu->dev);
> +	if (pmu_bus_running) {
> +		device_del(pmu->dev);
> +		put_device(pmu->dev);
> +	}
>  	free_pmu_context(pmu);
>  }
>  EXPORT_SYMBOL_GPL(perf_pmu_unregister);


* Re: [4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-20  5:39           ` Peter Zijlstra
@ 2016-10-20  8:58             ` Jiri Olsa
  2016-10-20  9:04               ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Jiri Olsa @ 2016-10-20  8:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: CAI Qian, Rob Herring, Kan Liang, Greg Kroah-Hartman,
	linux-kernel, Ingo Molnar

On Thu, Oct 20, 2016 at 07:39:44AM +0200, Peter Zijlstra wrote:
> On Wed, Oct 19, 2016 at 09:19:43PM +0200, Jiri Olsa wrote:
> > I think the reason here is that we presume pmu devices are always added,
> > but we add them only if pmu_bus_running (in perf_event_sysfs_init)
> > is set, which might happen after the uncore initcall
> > 
> > attached patch fixes the issue for me
> 
> Right, we never expected to be unloaded before userspace runs.
> 
> Strictly speaking we should only read pmu_bus_running while holding
> pmus_lock, that way we're serialized against perf_event_sysfs_init()
> flipping it while we're being removed etc..
> 
> With the current setup the introduced race is harmless, but who knows
> what other crazy these device people will come up with ;-)
> 

right, did not think of that ;-)

also I did not notice the device_remove_file call for pmu->nr_addr_filters
and we could save one lock/unlock call later.. I'm testing the attached patch
now

thanks,
jirka


---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index c6e47e97b33f..224dffbc3b9b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8581,24 +8581,24 @@ static void update_pmu_context(struct pmu *pmu, struct pmu *old_pmu)
 	}
 }
 
+/*
+ * The pmus_lock lock must be taken.
+ */
 static void free_pmu_context(struct pmu *pmu)
 {
 	struct pmu *i;
 
-	mutex_lock(&pmus_lock);
 	/*
 	 * Like a real lame refcount.
 	 */
 	list_for_each_entry(i, &pmus, entry) {
 		if (i->pmu_cpu_context == pmu->pmu_cpu_context) {
 			update_pmu_context(i, pmu);
-			goto out;
+			return;
 		}
 	}
 
 	free_percpu(pmu->pmu_cpu_context);
-out:
-	mutex_unlock(&pmus_lock);
 }
 
 /*
@@ -8869,11 +8869,15 @@ void perf_pmu_unregister(struct pmu *pmu)
 	free_percpu(pmu->pmu_disable_count);
 	if (pmu->type >= PERF_TYPE_MAX)
 		idr_remove(&pmu_idr, pmu->type);
-	if (pmu->nr_addr_filters)
-		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
-	device_del(pmu->dev);
-	put_device(pmu->dev);
+	mutex_lock(&pmus_lock);
+	if (pmu_bus_running) {
+		if (pmu->nr_addr_filters)
+			device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
+		device_del(pmu->dev);
+		put_device(pmu->dev);
+	}
 	free_pmu_context(pmu);
+	mutex_unlock(&pmus_lock);
 }
 EXPORT_SYMBOL_GPL(perf_pmu_unregister);
 


* Re: [4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-20  8:58             ` Jiri Olsa
@ 2016-10-20  9:04               ` Peter Zijlstra
  2016-10-20  9:42                 ` Jiri Olsa
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2016-10-20  9:04 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: CAI Qian, Rob Herring, Kan Liang, Greg Kroah-Hartman,
	linux-kernel, Ingo Molnar

On Thu, Oct 20, 2016 at 10:58:03AM +0200, Jiri Olsa wrote:

> @@ -8869,11 +8869,15 @@ void perf_pmu_unregister(struct pmu *pmu)
>  	free_percpu(pmu->pmu_disable_count);
>  	if (pmu->type >= PERF_TYPE_MAX)
>  		idr_remove(&pmu_idr, pmu->type);
> -	if (pmu->nr_addr_filters)
> -		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> -	device_del(pmu->dev);
> -	put_device(pmu->dev);
> +	mutex_lock(&pmus_lock);
> +	if (pmu_bus_running) {
> +		if (pmu->nr_addr_filters)
> +			device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> +		device_del(pmu->dev);
> +		put_device(pmu->dev);
> +	}
>  	free_pmu_context(pmu);
> +	mutex_unlock(&pmus_lock);
>  }
>  EXPORT_SYMBOL_GPL(perf_pmu_unregister);

I think that is still racy..


unregister:		sysfs_init:

mutex_lock(&pmus_lock);
list_del_rcu(&pmu->entry);
mutex_unlock(&pmus_lock);

synchronize_*rcu();

			mutex_lock(&pmus_lock);
			list_for_each_entry(pmu, &pmus, entry) {
				/* add device muck */
				/* will _NOT_ see our PMU */
			}
			pmu_bus_running = 1;
			mutex_unlock(&pmus_lock);

mutex_lock(&pmus_lock);
if (pmu_bus_running) {
	device_del() /* OOPS */


What you want is to read pmu_bus_running in the same pmus_lock section
as we do the list_del, and then use that local copy later.
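
The interleaving above can be replayed in a single-threaded user-space
model, as a sketch only. Everything here (struct pmu's fields, the pmus[]
array, the model_* helpers) is a hypothetical stand-in, and each function
stands for one pmus_lock critical section rather than taking a real lock.
It compares re-reading the global flag late against using a copy taken
in the same section as the list removal:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PMUS 4

/* Hypothetical user-space stand-ins; none of this is kernel code. */
struct pmu {
	bool has_dev;	/* models pmu->dev != NULL */
	bool on_list;	/* models membership in the pmus list */
};

static struct pmu *pmus[MAX_PMUS];
static bool pmu_bus_running;

/* Model of registration: a device exists only if the bus is up. */
static void model_register(struct pmu *pmu, int slot)
{
	assert(slot >= 0 && slot < MAX_PMUS);
	pmu->on_list = true;
	pmu->has_dev = pmu_bus_running;
	pmus[slot] = pmu;
}

/*
 * First critical section of unregister: unlink the PMU and snapshot
 * the flag in the same section.
 */
static bool model_unregister_begin(struct pmu *pmu, int slot)
{
	bool remove_device = pmu_bus_running;

	pmu->on_list = false;
	pmus[slot] = NULL;
	return remove_device;
}

/*
 * Models the sysfs init call: add devices for PMUs still on the list,
 * then flip the flag. An already unlinked PMU is never seen here.
 */
static void model_sysfs_init(void)
{
	for (int i = 0; i < MAX_PMUS; i++)
		if (pmus[i] && pmus[i]->on_list)
			pmus[i]->has_dev = true;
	pmu_bus_running = true;
}

/*
 * Late half of unregister. Returns true if it would call device_del()
 * on a PMU that never got a device, i.e. the oops case.
 */
static bool model_unregister_finish(const struct pmu *pmu, bool remove_device)
{
	return remove_device && !pmu->has_dev;
}
```

With the order register, model_unregister_begin(), model_sysfs_init(),
passing the current value of pmu_bus_running to model_unregister_finish()
hits the oops case, while passing the value returned by
model_unregister_begin() does not.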


* Re: [4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-20  9:04               ` Peter Zijlstra
@ 2016-10-20  9:42                 ` Jiri Olsa
  2016-10-20 11:10                   ` [PATCH] perf: Protect pmu device removal with pmu_bus_running check " Jiri Olsa
  0 siblings, 1 reply; 18+ messages in thread
From: Jiri Olsa @ 2016-10-20  9:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: CAI Qian, Rob Herring, Kan Liang, Greg Kroah-Hartman,
	linux-kernel, Ingo Molnar

On Thu, Oct 20, 2016 at 11:04:16AM +0200, Peter Zijlstra wrote:
> On Thu, Oct 20, 2016 at 10:58:03AM +0200, Jiri Olsa wrote:
> 
> > @@ -8869,11 +8869,15 @@ void perf_pmu_unregister(struct pmu *pmu)
> >  	free_percpu(pmu->pmu_disable_count);
> >  	if (pmu->type >= PERF_TYPE_MAX)
> >  		idr_remove(&pmu_idr, pmu->type);
> > -	if (pmu->nr_addr_filters)
> > -		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> > -	device_del(pmu->dev);
> > -	put_device(pmu->dev);
> > +	mutex_lock(&pmus_lock);
> > +	if (pmu_bus_running) {
> > +		if (pmu->nr_addr_filters)
> > +			device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> > +		device_del(pmu->dev);
> > +		put_device(pmu->dev);
> > +	}
> >  	free_pmu_context(pmu);
> > +	mutex_unlock(&pmus_lock);
> >  }
> >  EXPORT_SYMBOL_GPL(perf_pmu_unregister);
> 
> I think that is still racy..
> 
> 
> unregister:		sysfs_init:
> 
> mutex_lock(&pmus_lock);
> list_del_rcu(&pmu->entry);
> mutex_unlock(&pmus_lock);
> 
> synchronize_*rcu();
> 
> 			mutex_lock(&pmus_lock);
> 			list_for_each_entry(pmu, &pmus, entry) {
> 				/* add device muck */

ah, I thought this part would add the device back.. but it's
already out of the pmu list.. right :-\

thanks,
jirka

> 				/* will _NOT_ see our PMU */
> 			}
> 			pmu_bus_running = 1;
> 			mutex_unlock(&pmus_lock);
> 
> mutex_lock(&pmus_lock);
> if (pmu_bus_running) {
> 	device_del() /* OOPS */
> 
> 
> What you want is to read pmu_bus_running in the same pmus_lock section
> as we do the list_del, and then use that local copy later.


* [PATCH] perf: Protect pmu device removal with pmu_bus_running check CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-20  9:42                 ` Jiri Olsa
@ 2016-10-20 11:10                   ` Jiri Olsa
  2016-10-20 14:30                     ` CAI Qian
  2016-10-28 10:10                     ` [tip:perf/urgent] perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y " tip-bot for Jiri Olsa
  0 siblings, 2 replies; 18+ messages in thread
From: Jiri Olsa @ 2016-10-20 11:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: CAI Qian, Rob Herring, Kan Liang, Greg Kroah-Hartman,
	linux-kernel, Ingo Molnar

On Thu, Oct 20, 2016 at 11:42:59AM +0200, Jiri Olsa wrote:
> On Thu, Oct 20, 2016 at 11:04:16AM +0200, Peter Zijlstra wrote:
> > On Thu, Oct 20, 2016 at 10:58:03AM +0200, Jiri Olsa wrote:
> > 
> > > @@ -8869,11 +8869,15 @@ void perf_pmu_unregister(struct pmu *pmu)
> > >  	free_percpu(pmu->pmu_disable_count);
> > >  	if (pmu->type >= PERF_TYPE_MAX)
> > >  		idr_remove(&pmu_idr, pmu->type);
> > > -	if (pmu->nr_addr_filters)
> > > -		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> > > -	device_del(pmu->dev);
> > > -	put_device(pmu->dev);
> > > +	mutex_lock(&pmus_lock);
> > > +	if (pmu_bus_running) {
> > > +		if (pmu->nr_addr_filters)
> > > +			device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> > > +		device_del(pmu->dev);
> > > +		put_device(pmu->dev);
> > > +	}
> > >  	free_pmu_context(pmu);
> > > +	mutex_unlock(&pmus_lock);
> > >  }
> > >  EXPORT_SYMBOL_GPL(perf_pmu_unregister);
> > 
> > I think that is still racy..
> > 
> > 
> > unregister:		sysfs_init:
> > 
> > mutex_lock(&pmus_lock);
> > list_del_rcu(&pmu->entry);
> > mutex_unlock(&pmus_lock);
> > 
> > synchronize_*rcu();
> > 
> > 			mutex_lock(&pmus_lock);
> > 			list_for_each_entry(pmu, &pmus, entry) {
> > 				/* add device muck */
> 
> ah, I thought this part would add the device back.. but it's
> already out of the pmu list.. right :-\

attached fix, thanks

jirka


---
CAI Qian reported a crash [1] in uncore device removal related
to the CONFIG_DEBUG_TEST_DRIVER_REMOVE option.

The reason for the crash is that perf_pmu_unregister tries to remove
a pmu device which is not added at this point. We add pmu devices
only after pmu_bus is registered, which happens in the perf_event_sysfs_init
init call and sets the pmu_bus_running flag.

The fix is to get the pmu_bus_running flag state at the point
the pmu is taken out of the pmus list and remove the device
later only if it's set.

[1] https://marc.info/?l=linux-kernel&m=147688837328451

Reported-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/events/core.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index c6e47e97b33f..a5d2e62faf7e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8855,7 +8855,10 @@ EXPORT_SYMBOL_GPL(perf_pmu_register);
 
 void perf_pmu_unregister(struct pmu *pmu)
 {
+	int remove_device;
+
 	mutex_lock(&pmus_lock);
+	remove_device = pmu_bus_running;
 	list_del_rcu(&pmu->entry);
 	mutex_unlock(&pmus_lock);
 
@@ -8869,10 +8872,12 @@ void perf_pmu_unregister(struct pmu *pmu)
 	free_percpu(pmu->pmu_disable_count);
 	if (pmu->type >= PERF_TYPE_MAX)
 		idr_remove(&pmu_idr, pmu->type);
-	if (pmu->nr_addr_filters)
-		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
-	device_del(pmu->dev);
-	put_device(pmu->dev);
+	if (remove_device) {
+		if (pmu->nr_addr_filters)
+			device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
+		device_del(pmu->dev);
+		put_device(pmu->dev);
+	}
 	free_pmu_context(pmu);
 }
 EXPORT_SYMBOL_GPL(perf_pmu_unregister);
-- 
2.7.4


* Re: [PATCH] perf: Protect pmu device removal with pmu_bus_running check CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-20 11:10                   ` [PATCH] perf: Protect pmu device removal with pmu_bus_running check " Jiri Olsa
@ 2016-10-20 14:30                     ` CAI Qian
  2016-10-28 10:10                     ` [tip:perf/urgent] perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y " tip-bot for Jiri Olsa
  1 sibling, 0 replies; 18+ messages in thread
From: CAI Qian @ 2016-10-20 14:30 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Rob Herring, Kan Liang, Greg Kroah-Hartman,
	linux-kernel, Ingo Molnar


> CAI Qian reported a crash [1] in uncore device removal related
> to the CONFIG_DEBUG_TEST_DRIVER_REMOVE option.
> 
> The reason for the crash is that perf_pmu_unregister tries to remove
> a pmu device which is not added at this point. We add pmu devices
> only after pmu_bus is registered, which happens in the perf_event_sysfs_init
> init call and sets the pmu_bus_running flag.
> 
> The fix is to get the pmu_bus_running flag state at the point
> the pmu is taken out of the pmus list and remove the device
> later only if it's set.
> 
> [1] https://marc.info/?l=linux-kernel&m=147688837328451
> 
> Reported-by: CAI Qian <caiqian@redhat.com>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>

Tested-by: CAI Qian <caiqian@redhat.com>


* [tip:perf/urgent] perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y kernel panic
  2016-10-20 11:10                   ` [PATCH] perf: Protect pmu device removal with pmu_bus_running check " Jiri Olsa
  2016-10-20 14:30                     ` CAI Qian
@ 2016-10-28 10:10                     ` tip-bot for Jiri Olsa
  1 sibling, 0 replies; 18+ messages in thread
From: tip-bot for Jiri Olsa @ 2016-10-28 10:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, mingo, jolsa, kan.liang, caiqian, gregkh, tglx, robh,
	torvalds, alexander.shishkin, acme, linux-kernel, jolsa, hpa

Commit-ID:  0933840acf7b65d6d30a5b6089d882afea57aca3
Gitweb:     http://git.kernel.org/tip/0933840acf7b65d6d30a5b6089d882afea57aca3
Author:     Jiri Olsa <jolsa@redhat.com>
AuthorDate: Thu, 20 Oct 2016 13:10:11 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 28 Oct 2016 11:06:25 +0200

perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y kernel panic

CAI Qian reported a crash in the PMU uncore device removal code,
enabled by the CONFIG_DEBUG_TEST_DRIVER_REMOVE=y option:

  https://marc.info/?l=linux-kernel&m=147688837328451

The reason for the crash is that perf_pmu_unregister() tries to remove
a PMU device which is not added at this point. We add PMU devices
only after pmu_bus is registered, which happens in the
perf_event_sysfs_init() call and sets the 'pmu_bus_running' flag.

The fix is to get the 'pmu_bus_running' flag state at the point
the PMU is taken out of the PMU list and remove the device
later only if it's set.

Reported-by: CAI Qian <caiqian@redhat.com>
Tested-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161020111011.GA13361@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index c6e47e9..a5d2e62 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8855,7 +8855,10 @@ EXPORT_SYMBOL_GPL(perf_pmu_register);
 
 void perf_pmu_unregister(struct pmu *pmu)
 {
+	int remove_device;
+
 	mutex_lock(&pmus_lock);
+	remove_device = pmu_bus_running;
 	list_del_rcu(&pmu->entry);
 	mutex_unlock(&pmus_lock);
 
@@ -8869,10 +8872,12 @@ void perf_pmu_unregister(struct pmu *pmu)
 	free_percpu(pmu->pmu_disable_count);
 	if (pmu->type >= PERF_TYPE_MAX)
 		idr_remove(&pmu_idr, pmu->type);
-	if (pmu->nr_addr_filters)
-		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
-	device_del(pmu->dev);
-	put_device(pmu->dev);
+	if (remove_device) {
+		if (pmu->nr_addr_filters)
+			device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
+		device_del(pmu->dev);
+		put_device(pmu->dev);
+	}
 	free_pmu_context(pmu);
 }
 EXPORT_SYMBOL_GPL(perf_pmu_unregister);

