kasan inline + CONFIG_DEBUG_TEST_DRIVER

linux-kernel.vger.kernel.org archive mirror

* kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
       [not found] <907882571.66590.1476113724660.JavaMail.zimbra@redhat.com>
@ 2016-10-10 15:37 ` CAI Qian
  2016-10-10 17:09   ` Rob Herring
  2016-10-10 17:20   ` Greg Kroah-Hartman
  0 siblings, 2 replies; 18+ messages in thread
From: CAI Qian @ 2016-10-10 15:37 UTC (permalink / raw)
  To: Rob Herring; +Cc: linux-kernel, Greg Kroah-Hartman

Not sure if anyone has reported this before. With this kernel config, today's
mainline master HEAD has panicked on every boot so far.

http://people.redhat.com/qcai/tmp/config-kasan-remove

[   36.318420] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[   36.325626] software IO TLB [mem 0x71c7d000-0x75c7d000] (64MB) mapped at [ffff880071c7d000-ffff880075c7cfff]
[   36.339108] Intel CQM monitoring enabled
[   36.343507] Intel MBM enabled
[   36.358713] RAPL PMU: API unit is 2^-32 Joules, 4 fixed counters, 655360 ms ovfl timer
[   36.367563] RAPL PMU: hw unit of domain pp0-core 2^-14 Joules
[   36.373984] RAPL PMU: hw unit of domain package 2^-14 Joules
[   36.380308] RAPL PMU: hw unit of domain dram 2^-14 Joules
[   36.386337] RAPL PMU: hw unit of domain pp1-gpu 2^-14 Joules
[   36.410064] kasan: CONFIG_KASAN_INLINE enabled
[   36.415042] kasan: GPF could be caused by NULL-ptr deref or user memory access
[   36.423111] general protection fault: 0000 [#1] PREEMPT SMP KASAN
[   36.429911] Modules linked in:
[   36.433331] CPU: 48 PID: 1 Comm: swapper/0 Not tainted 4.8.0remove+ #4
[   36.440616] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[   36.451974] task: ffff880e524d0000 task.stack: ffff880852880000
[   36.458578] RIP: 0010:[<ffffffff81ea08c0>]  [<ffffffff81ea08c0>] device_del+0x80/0x700
[   36.467431] RSP: 0000:ffff880852887938  EFLAGS: 00010246
[   36.473357] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 1ffff10109e6f101
[   36.481319] RDX: dffffc0000000000 RSI: 000000000000000b RDI: 0000000000000000
[   36.489281] RBP: ffff8808528879e8 R08: 0000000000000001 R09: 0000000000000000
[   36.497243] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880e501b4b00
[   36.505208] R13: ffff880e31988480 R14: 0000000000000001 R15: ffff880e31988480
[   36.513171] FS:  0000000000000000(0000) GS:ffff88085ec80000(0000) knlGS:0000000000000000
[   36.522201] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   36.528613] CR2: 0000000000000000 CR3: 0000000002e0a000 CR4: 00000000003406e0
[   36.536576] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   36.544537] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   36.552499] Stack:
[   36.554742]  1ffff1010a510f28 1ffff1010a510f2c ffffffff82d3abe4 ffffffff81a6d060
[   36.563037]  0000000000000296 0000000041b58ab3 ffffffff82d48cc5 ffffffff81ea0840
[   36.571329]  ffffffff828a3040 ffff880800000000 ffff880852887980 ffffffff82f0ba20
[   36.579624] Call Trace:
[   36.582355]  [<ffffffff81a6d060>] ? idr_mark_full+0xc0/0xc0
[   36.588573]  [<ffffffff81ea0840>] ? cleanup_glue_dir+0xe0/0xe0
[   36.595086]  [<ffffffff814c228d>] perf_pmu_unregister+0x18d/0x530
[   36.601890]  [<ffffffff826f8811>] ? _raw_spin_unlock+0x31/0x50
[   36.608393]  [<ffffffff8103c54e>] ? uncore_pcibus_to_physid+0x10e/0x1c0
[   36.615766]  [<ffffffff810418ee>] uncore_pci_remove+0x24e/0x440
[   36.622375]  [<ffffffff81b91662>] pci_device_remove+0xa2/0x1e0
[   36.628888]  [<ffffffff81eadd01>] driver_probe_device+0x171/0xd50
[   36.635688]  [<ffffffff81eae8e0>] ? driver_probe_device+0xd50/0xd50
[   36.642685]  [<ffffffff81eaea79>] __driver_attach+0x199/0x1e0
[   36.649097]  [<ffffffff81ea7fc6>] bus_for_each_dev+0x126/0x1e0
[   36.655607]  [<ffffffff81ea7ea0>] ? subsys_dev_iter_exit+0x10/0x10
[   36.662508]  [<ffffffff812103ae>] ? preempt_count_sub+0x5e/0xe0
[   36.669105]  [<ffffffff81eacc1d>] driver_attach+0x3d/0x50
[   36.675129]  [<ffffffff81eabd84>] bus_add_driver+0x554/0x790
[   36.681444]  [<ffffffff81eb067c>] driver_register+0x18c/0x3b0
[   36.687861]  [<ffffffff812b3212>] ? __raw_spin_lock_init+0x32/0x100
[   36.694854]  [<ffffffff81b8bbea>] __pci_register_driver+0x13a/0x1e0
[   36.701853]  [<ffffffff83492467>] intel_uncore_init+0x465/0x54f
[   36.708459]  [<ffffffff83492002>] ? uncore_type_init+0x4d6/0x4d6
[   36.715165]  [<ffffffff81002299>] do_one_initcall+0xa9/0x240
[   36.721473]  [<ffffffff810021f0>] ? initcall_blacklisted+0x180/0x180
[   36.728568]  [<ffffffff811f5a10>] ? parse_args+0x520/0x990
[   36.734692]  [<ffffffff811d5bc2>] ? __usermodehelper_set_disable_depth+0x42/0x50
[   36.742948]  [<ffffffff83485d1f>] kernel_init_freeable+0x540/0x610
[   36.749845]  [<ffffffff834857df>] ? start_kernel+0x70d/0x70d
[   36.756161]  [<ffffffff826f88ad>] ? _raw_spin_unlock_irq+0x3d/0x60
[   36.763060]  [<ffffffff8120eb19>] ? finish_task_switch+0x189/0x6c0
[   36.769957]  [<ffffffff8120eaeb>] ? finish_task_switch+0x15b/0x6c0
[   36.776857]  [<ffffffff826e0060>] ? rest_init+0x160/0x160
[   36.782875]  [<ffffffff826e0073>] kernel_init+0x13/0x120
[   36.788802]  [<ffffffff826e0060>] ? rest_init+0x160/0x160
[   36.794826]  [<ffffffff826f93ba>] ret_from_fork+0x2a/0x40
[   36.800851] Code: 81 c7 00 f1 f1 f1 f1 c7 40 04 00 07 f4 f4 c7 40 08 f3 f3 f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 89 f8 48 c1 e8 03 <80> 3c 10 00 0f 85 1a 06 00 00 48 8b 03 48 89 85 68 ff ff ff 48 
[   36.822549] RIP  [<ffffffff81ea08c0>] device_del+0x80/0x700
[   36.828778]  RSP <ffff880852887938>
[   36.832743] ---[ end trace f3cec3a0c6cb2258 ]---
[   36.838054] Kernel panic - not syncing: Fatal exception
[   36.843967] ---[ end Kernel panic - not syncing: Fatal exception
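
A note for readers decoding the oops above. With CONFIG_KASAN_INLINE the compiler emits a shadow-memory check before each instrumented access. The bytes around the `<80>` marker in the Code: line appear to decode to `mov %rdi,%rax; shr $0x3,%rax; cmpb $0x0,(%rax,%rdx,1)`, and RDX holds dffffc0000000000, which matches the x86-64 KASAN shadow offset. The sketch below reproduces that arithmetic for RDI = 0. It is an illustration only: the function names are made up for this note, and it assumes 48-bit virtual addresses. It suggests why the report is a general protection fault rather than a page fault: the shadow address computed for a NULL pointer is non-canonical.

```python
# Values taken from the register dump above; names are illustrative.
KASAN_SHADOW_OFFSET = 0xdffffc0000000000  # matches RDX at the fault
KASAN_SHADOW_SCALE_SHIFT = 3              # one shadow byte covers 8 bytes
MASK64 = (1 << 64) - 1


def kasan_shadow_addr(addr):
    """Shadow byte address that the inline check reads for addr."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


def is_canonical(addr):
    """Assuming 48-bit virtual addresses: bits 63..47 must all be equal."""
    top = addr >> 47
    return top == 0 or top == 0x1ffff


shadow = kasan_shadow_addr(0x0)  # RDI was 0 when the check ran
print(hex(shadow), is_canonical(shadow))  # 0xdffffc0000000000 False
```

For comparison, a valid kernel pointer such as R12 (ffff880e501b4b00) maps to ffffed01ca036960, which is canonical, so the same check would not fault on it.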


* Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-10 15:37 ` kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic CAI Qian
@ 2016-10-10 17:09   ` Rob Herring
  2016-10-10 18:25     ` CAI Qian
  2016-10-10 17:20   ` Greg Kroah-Hartman
  1 sibling, 1 reply; 18+ messages in thread
From: Rob Herring @ 2016-10-10 17:09 UTC (permalink / raw)
  To: CAI Qian; +Cc: linux-kernel, Greg Kroah-Hartman

On Mon, Oct 10, 2016 at 10:37 AM, CAI Qian <caiqian@redhat.com> wrote:
> Not sure if anyone reported this before. With this kernel config, it is 100% kernel panic so far with today's
> mainline master HEAD.

Looks like it is catching what it is supposed to. Though looking
through the code, I haven't found where the problem is. Does bind and
unbind for this normally work?

> http://people.redhat.com/qcai/tmp/config-kasan-remove
>
> [   36.318420] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> [   36.325626] software IO TLB [mem 0x71c7d000-0x75c7d000] (64MB) mapped at [ffff880071c7d000-ffff880075c7cfff]
> [   36.339108] Intel CQM monitoring enabled
> [   36.343507] Intel MBM enabled
> [   36.358713] RAPL PMU: API unit is 2^-32 Joules, 4 fixed counters, 655360 ms ovfl timer
> [   36.367563] RAPL PMU: hw unit of domain pp0-core 2^-14 Joules
> [   36.373984] RAPL PMU: hw unit of domain package 2^-14 Joules
> [   36.380308] RAPL PMU: hw unit of domain dram 2^-14 Joules
> [   36.386337] RAPL PMU: hw unit of domain pp1-gpu 2^-14 Joules
> [   36.410064] kasan: CONFIG_KASAN_INLINE enabled
> [   36.415042] kasan: GPF could be caused by NULL-ptr deref or user memory access
> [   36.423111] general protection fault: 0000 [#1] PREEMPT SMP KASAN
> [   36.429911] Modules linked in:
> [   36.433331] CPU: 48 PID: 1 Comm: swapper/0 Not tainted 4.8.0remove+ #4
> [   36.440616] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
> [   36.451974] task: ffff880e524d0000 task.stack: ffff880852880000
> [   36.458578] RIP: 0010:[<ffffffff81ea08c0>]  [<ffffffff81ea08c0>] device_del+0x80/0x700
> [   36.467431] RSP: 0000:ffff880852887938  EFLAGS: 00010246
> [   36.473357] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 1ffff10109e6f101
> [   36.481319] RDX: dffffc0000000000 RSI: 000000000000000b RDI: 0000000000000000
> [   36.489281] RBP: ffff8808528879e8 R08: 0000000000000001 R09: 0000000000000000
> [   36.497243] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880e501b4b00
> [   36.505208] R13: ffff880e31988480 R14: 0000000000000001 R15: ffff880e31988480
> [   36.513171] FS:  0000000000000000(0000) GS:ffff88085ec80000(0000) knlGS:0000000000000000
> [   36.522201] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   36.528613] CR2: 0000000000000000 CR3: 0000000002e0a000 CR4: 00000000003406e0
> [   36.536576] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   36.544537] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   36.552499] Stack:
> [   36.554742]  1ffff1010a510f28 1ffff1010a510f2c ffffffff82d3abe4 ffffffff81a6d060
> [   36.563037]  0000000000000296 0000000041b58ab3 ffffffff82d48cc5 ffffffff81ea0840
> [   36.571329]  ffffffff828a3040 ffff880800000000 ffff880852887980 ffffffff82f0ba20
> [   36.579624] Call Trace:
> [   36.582355]  [<ffffffff81a6d060>] ? idr_mark_full+0xc0/0xc0
> [   36.588573]  [<ffffffff81ea0840>] ? cleanup_glue_dir+0xe0/0xe0
> [   36.595086]  [<ffffffff814c228d>] perf_pmu_unregister+0x18d/0x530
> [   36.601890]  [<ffffffff826f8811>] ? _raw_spin_unlock+0x31/0x50
> [   36.608393]  [<ffffffff8103c54e>] ? uncore_pcibus_to_physid+0x10e/0x1c0
> [   36.615766]  [<ffffffff810418ee>] uncore_pci_remove+0x24e/0x440
> [   36.622375]  [<ffffffff81b91662>] pci_device_remove+0xa2/0x1e0
> [   36.628888]  [<ffffffff81eadd01>] driver_probe_device+0x171/0xd50
> [   36.635688]  [<ffffffff81eae8e0>] ? driver_probe_device+0xd50/0xd50
> [   36.642685]  [<ffffffff81eaea79>] __driver_attach+0x199/0x1e0
> [   36.649097]  [<ffffffff81ea7fc6>] bus_for_each_dev+0x126/0x1e0
> [   36.655607]  [<ffffffff81ea7ea0>] ? subsys_dev_iter_exit+0x10/0x10
> [   36.662508]  [<ffffffff812103ae>] ? preempt_count_sub+0x5e/0xe0
> [   36.669105]  [<ffffffff81eacc1d>] driver_attach+0x3d/0x50
> [   36.675129]  [<ffffffff81eabd84>] bus_add_driver+0x554/0x790
> [   36.681444]  [<ffffffff81eb067c>] driver_register+0x18c/0x3b0
> [   36.687861]  [<ffffffff812b3212>] ? __raw_spin_lock_init+0x32/0x100
> [   36.694854]  [<ffffffff81b8bbea>] __pci_register_driver+0x13a/0x1e0
> [   36.701853]  [<ffffffff83492467>] intel_uncore_init+0x465/0x54f
> [   36.708459]  [<ffffffff83492002>] ? uncore_type_init+0x4d6/0x4d6
> [   36.715165]  [<ffffffff81002299>] do_one_initcall+0xa9/0x240
> [   36.721473]  [<ffffffff810021f0>] ? initcall_blacklisted+0x180/0x180
> [   36.728568]  [<ffffffff811f5a10>] ? parse_args+0x520/0x990
> [   36.734692]  [<ffffffff811d5bc2>] ? __usermodehelper_set_disable_depth+0x42/0x50
> [   36.742948]  [<ffffffff83485d1f>] kernel_init_freeable+0x540/0x610
> [   36.749845]  [<ffffffff834857df>] ? start_kernel+0x70d/0x70d
> [   36.756161]  [<ffffffff826f88ad>] ? _raw_spin_unlock_irq+0x3d/0x60
> [   36.763060]  [<ffffffff8120eb19>] ? finish_task_switch+0x189/0x6c0
> [   36.769957]  [<ffffffff8120eaeb>] ? finish_task_switch+0x15b/0x6c0
> [   36.776857]  [<ffffffff826e0060>] ? rest_init+0x160/0x160
> [   36.782875]  [<ffffffff826e0073>] kernel_init+0x13/0x120
> [   36.788802]  [<ffffffff826e0060>] ? rest_init+0x160/0x160
> [   36.794826]  [<ffffffff826f93ba>] ret_from_fork+0x2a/0x40
> [   36.800851] Code: 81 c7 00 f1 f1 f1 f1 c7 40 04 00 07 f4 f4 c7 40 08 f3 f3 f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 89 f8 48 c1 e8 03 <80> 3c 10 00 0f 85 1a 06 00 00 48 8b 03 48 89 85 68 ff ff ff 48
> [   36.822549] RIP  [<ffffffff81ea08c0>] device_del+0x80/0x700
> [   36.828778]  RSP <ffff880852887938>
> [   36.832743] ---[ end trace f3cec3a0c6cb2258 ]---
> [   36.838054] Kernel panic - not syncing: Fatal exception
> [   36.843967] ---[ end Kernel panic - not syncing: Fatal exception


* Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-10 17:09   ` Rob Herring
@ 2016-10-10 18:25     ` CAI Qian
  0 siblings, 0 replies; 18+ messages in thread
From: CAI Qian @ 2016-10-10 18:25 UTC (permalink / raw)
  To: Rob Herring; +Cc: linux-kernel, Greg Kroah-Hartman



----- Original Message -----
> From: "Rob Herring" <robh@kernel.org>
> To: "CAI Qian" <caiqian@redhat.com>
> Cc: "linux-kernel" <linux-kernel@vger.kernel.org>, "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>
> Sent: Monday, October 10, 2016 1:09:43 PM
> Subject: Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
> 
> On Mon, Oct 10, 2016 at 10:37 AM, CAI Qian <caiqian@redhat.com> wrote:
> > Not sure if anyone reported this before. With this kernel config, it is
> > 100% kernel panic so far with today's
> > mainline master HEAD.
> 
> Looks like it is catching what it is supposed to. Though looking
> through the code, I haven't found where the problem is. Does bind and
> unbind for this normally work?
I am not sure. It just panics at boot. If you can tell me which debugging steps
you want me to run, I can help test it out.
   CAI Qian


* Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-10 15:37 ` kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic CAI Qian
  2016-10-10 17:09   ` Rob Herring
@ 2016-10-10 17:20   ` Greg Kroah-Hartman
  2016-10-10 18:15     ` Rob Herring
  1 sibling, 1 reply; 18+ messages in thread
From: Greg Kroah-Hartman @ 2016-10-10 17:20 UTC (permalink / raw)
  To: CAI Qian; +Cc: Rob Herring, linux-kernel

On Mon, Oct 10, 2016 at 11:37:27AM -0400, CAI Qian wrote:
> Not sure if anyone reported this before. With this kernel config, it is 100% kernel panic so far with today's
> mainline master HEAD.
> 
> http://people.redhat.com/qcai/tmp/config-kasan-remove

Oh it breaks things with kasan disabled as well :)

See Laszlo's bug report from a few hours ago; Rob is on it...


* Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-10 17:20   ` Greg Kroah-Hartman
@ 2016-10-10 18:15     ` Rob Herring
  2016-10-10 18:22       ` CAI Qian
  2016-10-19 14:45       ` [4.9-rc1+] intel_uncore builtin " CAI Qian
  0 siblings, 2 replies; 18+ messages in thread
From: Rob Herring @ 2016-10-10 18:15 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: CAI Qian, linux-kernel

On Mon, Oct 10, 2016 at 12:20 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Mon, Oct 10, 2016 at 11:37:27AM -0400, CAI Qian wrote:
>> Not sure if anyone reported this before. With this kernel config, it is 100% kernel panic so far with today's
>> mainline master HEAD.
>>
>> http://people.redhat.com/qcai/tmp/config-kasan-remove
>
> Oh it breaks things with kasan disabled as well :)
>
> See Laszlo's bug report already a few hours ago, Rob is on it...

I think this one is different though. It has a remove() hook.

Rob
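
For context on the remove() hook: CONFIG_DEBUG_TEST_DRIVER_REMOVE makes the driver core exercise each driver as probe, remove, probe, which is why the original call trace shows driver_probe_device calling pci_device_remove and, through uncore_pci_remove and perf_pmu_unregister, device_del with a zero first argument (RDI). The toy model below is a sketch in Python, not kernel code. Every class and function name in it is hypothetical, and it does not claim to explain why the device pointer is NULL, which is not established at this point in the thread. It only shows how an unconditional teardown turns a missing device into a crash during the test remove step, and how a guard (an assumption for illustration, not a fix taken from this thread) would avoid it.

```python
class Device:
    """Stand-in for struct device; only tracks whether it is registered."""

    def __init__(self):
        self.registered = True


class Pmu:
    """Stand-in for a PMU whose device may or may not have been created."""

    def __init__(self, create_dev):
        self.dev = Device() if create_dev else None


def device_del(dev):
    # Dereferences dev unconditionally; None models the NULL pointer.
    dev.registered = False


def pmu_unregister(pmu, check_dev):
    # check_dev=True is a hypothetical guard, not the kernel's behaviour.
    if check_dev and pmu.dev is None:
        return
    device_del(pmu.dev)


def probe_with_test_remove(create_dev, check_dev):
    """Model the probe -> remove -> probe sequence of the debug option."""
    pmu = Pmu(create_dev)            # first probe
    pmu_unregister(pmu, check_dev)   # test remove
    return Pmu(create_dev)           # second probe


try:
    probe_with_test_remove(create_dev=False, check_dev=False)
except AttributeError as exc:
    print("crash in modelled device_del():", exc)
```

With `check_dev=True` the same sequence completes, which is the kind of behaviour one would expect from a remove path that tolerates a device that was never created.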


* Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-10 18:15     ` Rob Herring
@ 2016-10-10 18:22       ` CAI Qian
  2016-10-10 19:34         ` Rob Herring
  2016-10-19 14:45       ` [4.9-rc1+] intel_uncore builtin " CAI Qian
  1 sibling, 1 reply; 18+ messages in thread
From: CAI Qian @ 2016-10-10 18:22 UTC (permalink / raw)
  To: Rob Herring; +Cc: Greg Kroah-Hartman, linux-kernel



----- Original Message -----
> From: "Rob Herring" <robh@kernel.org>
> To: "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>
> Cc: "CAI Qian" <caiqian@redhat.com>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Sent: Monday, October 10, 2016 2:15:29 PM
> Subject: Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
> 
> On Mon, Oct 10, 2016 at 12:20 PM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Mon, Oct 10, 2016 at 11:37:27AM -0400, CAI Qian wrote:
> >> Not sure if anyone reported this before. With this kernel config, it is
> >> 100% kernel panic so far with today's
> >> mainline master HEAD.
> >>
> >> http://people.redhat.com/qcai/tmp/config-kasan-remove
> >
> > Oh it breaks things with kasan disabled as well :)
> >
> > See Laszlo's bug report already a few hours ago, Rob is on it...
> 
> I think this one is different though. It has a remove() hook.
FYI, this can also be reproduced without kasan.
    CAI Qian


* Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-10 18:22       ` CAI Qian
@ 2016-10-10 19:34         ` Rob Herring
  2016-10-10 20:09           ` CAI Qian
  0 siblings, 1 reply; 18+ messages in thread
From: Rob Herring @ 2016-10-10 19:34 UTC (permalink / raw)
  To: CAI Qian; +Cc: Greg Kroah-Hartman, linux-kernel

On Mon, Oct 10, 2016 at 1:22 PM, CAI Qian <caiqian@redhat.com> wrote:
>
>
> ----- Original Message -----
>> From: "Rob Herring" <robh@kernel.org>
>> To: "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>
>> Cc: "CAI Qian" <caiqian@redhat.com>, "linux-kernel" <linux-kernel@vger.kernel.org>
>> Sent: Monday, October 10, 2016 2:15:29 PM
>> Subject: Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
>>
>> On Mon, Oct 10, 2016 at 12:20 PM, Greg Kroah-Hartman
>> <gregkh@linuxfoundation.org> wrote:
>> > On Mon, Oct 10, 2016 at 11:37:27AM -0400, CAI Qian wrote:
>> >> Not sure if anyone reported this before. With this kernel config, it is
>> >> 100% kernel panic so far with today's
>> >> mainline master HEAD.
>> >>
>> >> http://people.redhat.com/qcai/tmp/config-kasan-remove
>> >
>> > Oh it breaks things with kasan disabled as well :)
>> >
>> > See Laszlo's bug report already a few hours ago, Rob is on it...
>>
>> I think this one is different though. It has a remove() hook.
> FYI, this can also be reproduced without kasan.

Is the backtrace the same in that case?

Rob


* Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-10 19:34         ` Rob Herring
@ 2016-10-10 20:09           ` CAI Qian
  0 siblings, 0 replies; 18+ messages in thread
From: CAI Qian @ 2016-10-10 20:09 UTC (permalink / raw)
  To: Rob Herring; +Cc: Greg Kroah-Hartman, linux-kernel


> Is the backtrace the same in that case?
Very close. I saw "intel" there, and here is the list of those modules on the system.

# lsmod | grep intel
intel_rapl             20480  0 
intel_powerclamp       16384  0 
kvm_intel             208896  0 
kvm                   630784  1 kvm_intel
ghash_clmulni_intel    16384  0 
aesni_intel           167936  0 
lrw                    16384  1 aesni_intel
glue_helper            16384  1 aesni_intel
ablk_helper            16384  1 aesni_intel
cryptd                 24576  3 ablk_helper,ghash_clmulni_intel,aesni_intel
crc32c_intel           24576  1

[   17.884926] BUG: unable to handle kernel NULL pointer dereference at           (null)
[   17.893700] IP: [<ffffffff81546ff7>] device_del+0x17/0x280
[   17.899848] PGD 0 
[   17.902109] Oops: 0000 [#1] PREEMPT SMP
[   17.906394] Modules linked in:
[   17.909823] CPU: 68 PID: 1 Comm: swapper/0 Not tainted 4.8.0-remove-nokasan+ #5
[   17.917985] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[   17.929347] task: ffff8810556c8000 task.stack: ffffc90000078000
[   17.935955] RIP: 0010:[<ffffffff81546ff7>]  [<ffffffff81546ff7>] device_del+0x17/0x280
[   17.944811] RSP: 0000:ffffc9000007bc00  EFLAGS: 00010286
[   17.950742] RAX: 0000000000000000 RBX: ffff88085c8e3c00 RCX: 0000000000000001
[   17.958708] RDX: ffff881059d60000 RSI: 000000000000000b RDI: 0000000000000000
[   17.966675] RBP: ffffc9000007bc38 R08: 00000000d38c0f63 R09: 0000000000000000
[   17.974640] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   17.982606] R13: ffff881054099000 R14: 0000000000000001 R15: 0000000000000000
[   17.990574] FS:  0000000000000000(0000) GS:ffff88105e400000(0000) knlGS:0000000000000000
[   17.999606] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   18.006022] CR2: 0000000000000000 CR3: 0000000001c06000 CR4: 00000000003406e0
[   18.013989] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   18.021954] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   18.029919] Stack:
[   18.032163]  0000000000000000 00000000dd652bd0 ffff88085c8e3c00 ffff88085c8e3c00
[   18.040475]  ffff88085c8e3400 ffff881054099000 0000000000000001 ffffc9000007bc58
[   18.048788]  ffffffff811c9680 ffff88085c8e3c00 ffff88085c8e3400 ffffc9000007bc88
[   18.057090] Call Trace:
[   18.059819]  [<ffffffff811c9680>] perf_pmu_unregister+0x90/0x150
[   18.066529]  [<ffffffff81017678>] uncore_pci_remove+0xc8/0x160
[   18.073044]  [<ffffffff814428c9>] pci_device_remove+0x39/0xc0
[   18.079468]  [<ffffffff8154bf4e>] driver_probe_device+0xbe/0x4d0
[   18.086176]  [<ffffffff8154c443>] __driver_attach+0xe3/0xf0
[   18.092399]  [<ffffffff8154c360>] ? driver_probe_device+0x4d0/0x4d0
[   18.099400]  [<ffffffff81549b43>] bus_for_each_dev+0x73/0xc0
[   18.105722]  [<ffffffff8154b7de>] driver_attach+0x1e/0x20
[   18.111752]  [<ffffffff8154b290>] bus_add_driver+0x200/0x270
[   18.118078]  [<ffffffff8154d160>] driver_register+0x60/0xe0
[   18.124303]  [<ffffffff81440ee0>] __pci_register_driver+0x60/0x70
[   18.131117]  [<ffffffff81f1e6e1>] intel_uncore_init+0x277/0x2df
[   18.137728]  [<ffffffff81f1e46a>] ? uncore_type_init+0x15f/0x15f
[   18.144441]  [<ffffffff81002190>] do_one_initcall+0x50/0x190
[   18.150768]  [<ffffffff810c5bf1>] ? parse_args+0x2d1/0x490
[   18.156894]  [<ffffffff81f19243>] kernel_init_freeable+0x1ff/0x29e
[   18.163801]  [<ffffffff817dd840>] ? rest_init+0x140/0x140
[   18.169831]  [<ffffffff817dd84e>] kernel_init+0xe/0x100
[   18.175668]  [<ffffffff817e957a>] ret_from_fork+0x2a/0x40
[   18.181695] Code: e8 cf d4 29 00 5b 5d c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 49 89 fc 48 83 ec 18 <4c> 8b 2f 65 48 8b 04 25 28 00 00 00 48 89 45 d8 31 c0 48 8b 87 
[   18.203631] RIP  [<ffffffff81546ff7>] device_del+0x17/0x280
[   18.209867]  RSP <ffffc9000007bc00>
[   18.213759] CR2: 0000000000000000
[   18.217548] ---[ end trace 91188545987fc9d9 ]---
[   18.222706] Kernel panic - not syncing: Fatal exception
[   18.228692] ---[ end Kernel panic - not syncing: Fatal exception
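
A note on reading the Code: line above. The byte in angle brackets is the first byte of the instruction at RIP. Here the marked bytes `4c 8b 2f` decode to `mov (%rdi),%r13`, a load through the first argument of device_del, and RDI is 0, which agrees with CR2 = 0 and the "NULL pointer dereference" message. The helper below is a small Python sketch (the function name is made up for this note) that locates the marked byte in such a line, so the bytes from that point on can be handed to a disassembler.

```python
def faulting_bytes(code_line):
    """Return (index of the <..> byte, bytes from that byte onward)."""
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            rest = [tok.strip("<>")] + tokens[i + 1:]
            return i, bytes(int(b, 16) for b in rest)
    raise ValueError("no <..> marker found in Code: line")


# Excerpt of the Code: line from the oops above.
code = "49 89 fc 48 83 ec 18 <4c> 8b 2f 65 48 8b 04 25 28 00 00 00"
idx, tail = faulting_bytes(code)
print(idx, tail[:3].hex(" "))  # 7 4c 8b 2f
```

The preceding bytes `49 89 fc` decode to `mov %rdi,%r12`, which is consistent with R12 also being 0 in the register dump.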


* [4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-10 18:15     ` Rob Herring
  2016-10-10 18:22       ` CAI Qian
@ 2016-10-19 14:45       ` CAI Qian
  2016-10-19 19:19         ` Jiri Olsa
  1 sibling, 1 reply; 18+ messages in thread
From: CAI Qian @ 2016-10-19 14:45 UTC (permalink / raw)
  To: Rob Herring, Jiri Olsa, Peter Zijlstra, Kan Liang
  Cc: Greg Kroah-Hartman, linux-kernel, Ingo Molnar

It turns out this is only reproducible when intel_uncore is compiled as a builtin, i.e.,
not as a module. It can still be reproduced with yesterday's mainline.

Here is some information about the system:

Intel Platform: Grantley-R Wildcat Pass CPU: Broadwell-EP, B0.
Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz

[   66.349263] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[   66.356672] software IO TLB [mem 0x71c7d000-0x75c7d000] (64MB) mapped at [ffff880071c7d000-ffff880075c7cfff]
[   66.369911] Intel CQM monitoring enabled
[   66.374445] Intel MBM enabled
[   66.385708] RAPL PMU: API unit is 2^-32 Joules, 4 fixed counters, 655360 ms ovfl timer
[   66.394564] RAPL PMU: hw unit of domain pp0-core 2^-14 Joules
[   66.400991] RAPL PMU: hw unit of domain package 2^-14 Joules
[   66.407317] RAPL PMU: hw unit of domain dram 2^-14 Joules
[   66.413358] RAPL PMU: hw unit of domain pp1-gpu 2^-14 Joules
[   66.434040] ================================================================================
[   66.443462] UBSAN: Undefined behaviour in drivers/base/core.c:1251:17
[   66.450653] member access within null pointer of type 'struct device'
[   66.457845] CPU: 68 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc1-lockfix+ #48
[   66.465809] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[   66.477168]  ffff880847aff798 ffffffff81d370b4 0000000041b58ab3 ffffffff83348dcf
[   66.485469]  ffffffff81d36ff4 ffff880847aff7c0 ffff880847aff770 ffff880e3f9d8000
[   66.493770]  ffffffff82ff8a00 ffffffff8309c5c0 00000000000004e3 000000009091f309
[   66.502073] Call Trace:
[   66.504811]  [<ffffffff81d370b4>] dump_stack+0xc0/0x12c
[   66.510644]  [<ffffffff81d36ff4>] ? _atomic_dec_and_lock+0xc4/0xc4
[   66.517548]  [<ffffffff81e5ac85>] ubsan_epilogue+0xd/0x8a
[   66.523574]  [<ffffffff81e5ae68>] __ubsan_handle_type_mismatch+0x166/0x434
[   66.531253]  [<ffffffff813294dd>] ? get_lock_stats+0x1d/0x120
[   66.537667]  [<ffffffff81e5ad02>] ? ubsan_epilogue+0x8a/0x8a
[   66.543985]  [<ffffffff82241acc>] device_del+0x6fc/0x860
[   66.549917]  [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
[   66.557494]  [<ffffffff822413d0>] ? cleanup_glue_dir+0x140/0x140
[   66.564202]  [<ffffffff8160a6f2>] perf_pmu_unregister+0x142/0x6d0
[   66.571006]  [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
[   66.577619]  [<ffffffff810559f7>] uncore_pmu_unregister+0x67/0xd0
[   66.584422]  [<ffffffff8105ae6c>] uncore_pci_remove+0x32c/0x510
[   66.591025]  [<ffffffff81ec8392>] pci_device_remove+0xb2/0x240
[   66.597539]  [<ffffffff8224fe76>] driver_probe_device+0x146/0xfc0
[   66.604340]  [<ffffffff82250cf0>] ? driver_probe_device+0xfc0/0xfc0
[   66.611334]  [<ffffffff82250ea5>] __driver_attach+0x1b5/0x230
[   66.617749]  [<ffffffff82248e60>] bus_for_each_dev+0x130/0x200
[   66.624264]  [<ffffffff81353300>] ? do_raw_spin_trylock+0x110/0x110
[   66.631258]  [<ffffffff82248d30>] ? subsys_dev_iter_init+0x100/0x100
[   66.638349]  [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
[   66.644959]  [<ffffffff8224eaa2>] driver_attach+0x42/0x70
[   66.650976]  [<ffffffff8224d846>] bus_add_driver+0x406/0x870
[   66.657292]  [<ffffffff822535b9>] driver_register+0x1a9/0x3d0
[   66.663704]  [<ffffffff81352942>] ? __raw_spin_lock_init+0x32/0x120
[   66.670700]  [<ffffffff81ec2a1d>] __pci_register_driver+0x1ad/0x2b0
[   66.677694]  [<ffffffff81ec2870>] ? pci_pm_runtime_idle+0x180/0x180
[   66.684694]  [<ffffffff858f57b5>] intel_uncore_init+0x58d/0x64c
[   66.691300]  [<ffffffff858ed56d>] ? amd_iommu_pc_init+0x16/0x344
[   66.698006]  [<ffffffff858f5228>] ? uncore_type_init+0x5cb/0x5cb
[   66.704710]  [<ffffffff81000587>] do_one_initcall+0xb7/0x2a0
[   66.711025]  [<ffffffff810004d0>] ? initcall_blacklisted+0x1a0/0x1a0
[   66.718116]  [<ffffffff8132687d>] ? up_write+0x7d/0x120
[   66.723949]  [<ffffffff81326800>] ? up_read+0x40/0x40
[   66.729587]  [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
[   66.737165]  [<ffffffff8130db04>] ? __wake_up+0x44/0x50
[   66.743000]  [<ffffffff858e71b9>] kernel_init_freeable+0x68a/0x768
[   66.749900]  [<ffffffff858e6b2f>] ? start_kernel+0x751/0x751
[   66.756219]  [<ffffffff81075ec0>] ? compat_start_thread+0xa0/0xa0
[   66.763013]  [<ffffffff82c704c0>] ? rest_init+0x190/0x190
[   66.769039]  [<ffffffff82c704d3>] kernel_init+0x13/0x140
[   66.774967]  [<ffffffff82c704c0>] ? rest_init+0x190/0x190
[   66.780993]  [<ffffffff82c8b0d7>] ret_from_fork+0x27/0x40
[   66.787019] ================================================================================
[   66.796479] kasan: CONFIG_KASAN_INLINE enabled
[   66.801450] kasan: GPF could be caused by NULL-ptr deref or user memory access
[   66.809525] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[   66.817878] Modules linked in:
[   66.821295] CPU: 68 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc1-lockfix+ #48
[   66.829260] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[   66.840618] task: ffff880e3f9d8000 task.stack: ffff880847af8000
[   66.847225] RIP: 0010:[<ffffffff82241466>]  [<ffffffff82241466>] device_del+0x96/0x860
[   66.856076] RSP: 0000:ffff880847aff868  EFLAGS: 00010246
[   66.862002] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   66.869967] RDX: 0000000000000000 RSI: ffffffff82ea0cc0 RDI: ffffed0108f5ff06
[   66.877931] RBP: ffff880847aff920 R08: ffff880e3f9d8000 R09: 0000000000000007
[   66.885894] R10: 0000000000000000 R11: 0000000000000006 R12: ffff880844094930
[   66.893859] R13: 0000000000000001 R14: ffff880844094800 R15: ffff880844095258
[   66.901824] FS:  0000000000000000(0000) GS:ffff880e54e00000(0000) knlGS:0000000000000000
[   66.910853] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   66.917265] CR2: 0000000000000000 CR3: 000000000360a000 CR4: 00000000003406e0
[   66.925228] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   66.933191] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   66.941154] Stack:
[   66.943396]  ffffffff82c8a5d2 ffff881077f705c0 1ffff10108f5ff13 ffff880847aff920
[   66.951698]  0000000000000000 ffffffff86d346c8 0000000041b58ab3 ffffffff8338e870
[   66.959997]  ffffffff822413d0 ffff880e00000044 ffffffff00000000 ffff880847aff8c0
[   66.968296] Call Trace:
[   66.971025]  [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
[   66.978603]  [<ffffffff822413d0>] ? cleanup_glue_dir+0x140/0x140
[   66.985309]  [<ffffffff8160a6f2>] perf_pmu_unregister+0x142/0x6d0
[   66.992111]  [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
[   66.998720]  [<ffffffff810559f7>] uncore_pmu_unregister+0x67/0xd0
[   67.005523]  [<ffffffff8105ae6c>] uncore_pci_remove+0x32c/0x510
[   67.012131]  [<ffffffff81ec8392>] pci_device_remove+0xb2/0x240
[   67.018641]  [<ffffffff8224fe76>] driver_probe_device+0x146/0xfc0
[   67.025442]  [<ffffffff82250cf0>] ? driver_probe_device+0xfc0/0xfc0
[   67.032437]  [<ffffffff82250ea5>] __driver_attach+0x1b5/0x230
[   67.038852]  [<ffffffff82248e60>] bus_for_each_dev+0x130/0x200
[   67.045361]  [<ffffffff81353300>] ? do_raw_spin_trylock+0x110/0x110
[   67.052357]  [<ffffffff82248d30>] ? subsys_dev_iter_init+0x100/0x100
[   67.059450]  [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
[   67.066056]  [<ffffffff8224eaa2>] driver_attach+0x42/0x70
[   67.072081]  [<ffffffff8224d846>] bus_add_driver+0x406/0x870
[   67.078397]  [<ffffffff822535b9>] driver_register+0x1a9/0x3d0
[   67.084809]  [<ffffffff81352942>] ? __raw_spin_lock_init+0x32/0x120
[   67.091803]  [<ffffffff81ec2a1d>] __pci_register_driver+0x1ad/0x2b0
[   67.098798]  [<ffffffff81ec2870>] ? pci_pm_runtime_idle+0x180/0x180
[   67.105792]  [<ffffffff858f57b5>] intel_uncore_init+0x58d/0x64c
[   67.112399]  [<ffffffff858ed56d>] ? amd_iommu_pc_init+0x16/0x344
[   67.119103]  [<ffffffff858f5228>] ? uncore_type_init+0x5cb/0x5cb
[   67.125806]  [<ffffffff81000587>] do_one_initcall+0xb7/0x2a0
[   67.132124]  [<ffffffff810004d0>] ? initcall_blacklisted+0x1a0/0x1a0
[   67.139215]  [<ffffffff8132687d>] ? up_write+0x7d/0x120
[   67.145046]  [<ffffffff81326800>] ? up_read+0x40/0x40
[   67.150684]  [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
[   67.158262]  [<ffffffff8130db04>] ? __wake_up+0x44/0x50
[   67.164094]  [<ffffffff858e71b9>] kernel_init_freeable+0x68a/0x768
[   67.170992]  [<ffffffff858e6b2f>] ? start_kernel+0x751/0x751
[   67.177310]  [<ffffffff81075ec0>] ? compat_start_thread+0xa0/0xa0
[   67.184111]  [<ffffffff82c704c0>] ? rest_init+0x190/0x190
[   67.190137]  [<ffffffff82c704d3>] kernel_init+0x13/0x140
[   67.196064]  [<ffffffff82c704c0>] ? rest_init+0x190/0x190
[   67.202090]  [<ffffffff82c8b0d7>] ret_from_fork+0x27/0x40
[   67.208115] Code: f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 85 ff 0f 84 69 06 00 00 48 89 da 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 41 06 00 00 48 8b 03 48 89 85 68 ff ff ff 48 
[   67.229872] RIP  [<ffffffff82241466>] device_del+0x96/0x860
[   67.236101]  RSP <ffff880847aff868>
[   67.240059] ---[ end trace 69358e866a1e3f6c ]---
[   67.245377] Kernel panic - not syncing: Fatal exception
[   67.251271] ---[ end Kernel panic - not syncing: Fatal exception


----- Original Message -----
> From: "Rob Herring" <robh@kernel.org>
> To: "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>
> Cc: "CAI Qian" <caiqian@redhat.com>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Sent: Monday, October 10, 2016 2:15:29 PM
> Subject: Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
> 
> On Mon, Oct 10, 2016 at 12:20 PM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Mon, Oct 10, 2016 at 11:37:27AM -0400, CAI Qian wrote:
> >> Not sure if anyone reported this before. With this kernel config, it is
> >> 100% kernel panic so far with today's
> >> mainline master HEAD.
> >>
> >> http://people.redhat.com/qcai/tmp/config-kasan-remove
> >
> > Oh it breaks things with kasan disabled as well :)
> >
> > See Laszlo's bug report already a few hours ago, Rob is on it...
> 
> I think this one is different though. It has a remove() hook.
> 
> Rob
>


* Re: [4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-19 14:45       ` [4.9-rc1+] intel_uncore builtin " CAI Qian
@ 2016-10-19 19:19         ` Jiri Olsa
  2016-10-19 20:18           ` CAI Qian
  2016-10-20  5:39           ` Peter Zijlstra
  0 siblings, 2 replies; 18+ messages in thread
From: Jiri Olsa @ 2016-10-19 19:19 UTC (permalink / raw)
  To: CAI Qian
  Cc: Rob Herring, Peter Zijlstra, Kan Liang, Greg Kroah-Hartman,
	linux-kernel, Ingo Molnar

On Wed, Oct 19, 2016 at 10:45:31AM -0400, CAI Qian wrote:
> It turns out this is only reproducible when intel_uncore is compiled as a builtin, i.e.,
> not as a module. It can still be reproduced with yesterday's mainline.
> 
> Here is some information about the system,
> 
> Intel Platform: Grantley-R Wildcat Pass CPU: Broadwell-EP, B0.
> Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> 
> [   66.349263] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> [   66.356672] software IO TLB [mem 0x71c7d000-0x75c7d000] (64MB) mapped at [ffff880071c7d000-ffff880075c7cfff]
> [   66.369911] Intel CQM monitoring enabled
> [   66.374445] Intel MBM enabled
> [   66.385708] RAPL PMU: API unit is 2^-32 Joules, 4 fixed counters, 655360 ms ovfl timer
> [   66.394564] RAPL PMU: hw unit of domain pp0-core 2^-14 Joules
> [   66.400991] RAPL PMU: hw unit of domain package 2^-14 Joules
> [   66.407317] RAPL PMU: hw unit of domain dram 2^-14 Joules
> [   66.413358] RAPL PMU: hw unit of domain pp1-gpu 2^-14 Joules
> [   66.434040] ================================================================================
> [   66.443462] UBSAN: Undefined behaviour in drivers/base/core.c:1251:17
> [   66.450653] member access within null pointer of type 'struct device'
> [   66.457845] CPU: 68 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc1-lockfix+ #48
> [   66.465809] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
> [   66.477168]  ffff880847aff798 ffffffff81d370b4 0000000041b58ab3 ffffffff83348dcf
> [   66.485469]  ffffffff81d36ff4 ffff880847aff7c0 ffff880847aff770 ffff880e3f9d8000
> [   66.493770]  ffffffff82ff8a00 ffffffff8309c5c0 00000000000004e3 000000009091f309
> [   66.502073] Call Trace:
> [   66.504811]  [<ffffffff81d370b4>] dump_stack+0xc0/0x12c
> [   66.510644]  [<ffffffff81d36ff4>] ? _atomic_dec_and_lock+0xc4/0xc4
> [   66.517548]  [<ffffffff81e5ac85>] ubsan_epilogue+0xd/0x8a
> [   66.523574]  [<ffffffff81e5ae68>] __ubsan_handle_type_mismatch+0x166/0x434
> [   66.531253]  [<ffffffff813294dd>] ? get_lock_stats+0x1d/0x120
> [   66.537667]  [<ffffffff81e5ad02>] ? ubsan_epilogue+0x8a/0x8a
> [   66.543985]  [<ffffffff82241acc>] device_del+0x6fc/0x860
> [   66.549917]  [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
> [   66.557494]  [<ffffffff822413d0>] ? cleanup_glue_dir+0x140/0x140
> [   66.564202]  [<ffffffff8160a6f2>] perf_pmu_unregister+0x142/0x6d0
> [   66.571006]  [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
> [   66.577619]  [<ffffffff810559f7>] uncore_pmu_unregister+0x67/0xd0
> [   66.584422]  [<ffffffff8105ae6c>] uncore_pci_remove+0x32c/0x510
> [   66.591025]  [<ffffffff81ec8392>] pci_device_remove+0xb2/0x240
> [   66.597539]  [<ffffffff8224fe76>] driver_probe_device+0x146/0xfc0
> [   66.604340]  [<ffffffff82250cf0>] ? driver_probe_device+0xfc0/0xfc0
> [   66.611334]  [<ffffffff82250ea5>] __driver_attach+0x1b5/0x230
> [   66.617749]  [<ffffffff82248e60>] bus_for_each_dev+0x130/0x200
> [   66.624264]  [<ffffffff81353300>] ? do_raw_spin_trylock+0x110/0x110
> [   66.631258]  [<ffffffff82248d30>] ? subsys_dev_iter_init+0x100/0x100
> [   66.638349]  [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
> [   66.644959]  [<ffffffff8224eaa2>] driver_attach+0x42/0x70
> [   66.650976]  [<ffffffff8224d846>] bus_add_driver+0x406/0x870
> [   66.657292]  [<ffffffff822535b9>] driver_register+0x1a9/0x3d0
> [   66.663704]  [<ffffffff81352942>] ? __raw_spin_lock_init+0x32/0x120
> [   66.670700]  [<ffffffff81ec2a1d>] __pci_register_driver+0x1ad/0x2b0
> [   66.677694]  [<ffffffff81ec2870>] ? pci_pm_runtime_idle+0x180/0x180
> [   66.684694]  [<ffffffff858f57b5>] intel_uncore_init+0x58d/0x64c
> [   66.691300]  [<ffffffff858ed56d>] ? amd_iommu_pc_init+0x16/0x344
> [   66.698006]  [<ffffffff858f5228>] ? uncore_type_init+0x5cb/0x5cb
> [   66.704710]  [<ffffffff81000587>] do_one_initcall+0xb7/0x2a0
> [   66.711025]  [<ffffffff810004d0>] ? initcall_blacklisted+0x1a0/0x1a0
> [   66.718116]  [<ffffffff8132687d>] ? up_write+0x7d/0x120
> [   66.723949]  [<ffffffff81326800>] ? up_read+0x40/0x40
> [   66.729587]  [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
> [   66.737165]  [<ffffffff8130db04>] ? __wake_up+0x44/0x50
> [   66.743000]  [<ffffffff858e71b9>] kernel_init_freeable+0x68a/0x768
> [   66.749900]  [<ffffffff858e6b2f>] ? start_kernel+0x751/0x751
> [   66.756219]  [<ffffffff81075ec0>] ? compat_start_thread+0xa0/0xa0
> [   66.763013]  [<ffffffff82c704c0>] ? rest_init+0x190/0x190
> [   66.769039]  [<ffffffff82c704d3>] kernel_init+0x13/0x140
> [   66.774967]  [<ffffffff82c704c0>] ? rest_init+0x190/0x190
> [   66.780993]  [<ffffffff82c8b0d7>] ret_from_fork+0x27/0x40
> [   66.787019] ================================================================================
> [   66.796479] kasan: CONFIG_KASAN_INLINE enabled
> [   66.801450] kasan: GPF could be caused by NULL-ptr deref or user memory access
> [   66.809525] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
> [   66.817878] Modules linked in:
> [   66.821295] CPU: 68 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc1-lockfix+ #48
> [   66.829260] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
> [   66.840618] task: ffff880e3f9d8000 task.stack: ffff880847af8000
> [   66.847225] RIP: 0010:[<ffffffff82241466>]  [<ffffffff82241466>] device_del+0x96/0x860
> [   66.856076] RSP: 0000:ffff880847aff868  EFLAGS: 00010246
> [   66.862002] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [   66.869967] RDX: 0000000000000000 RSI: ffffffff82ea0cc0 RDI: ffffed0108f5ff06
> [   66.877931] RBP: ffff880847aff920 R08: ffff880e3f9d8000 R09: 0000000000000007
> [   66.885894] R10: 0000000000000000 R11: 0000000000000006 R12: ffff880844094930
> [   66.893859] R13: 0000000000000001 R14: ffff880844094800 R15: ffff880844095258
> [   66.901824] FS:  0000000000000000(0000) GS:ffff880e54e00000(0000) knlGS:0000000000000000
> [   66.910853] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   66.917265] CR2: 0000000000000000 CR3: 000000000360a000 CR4: 00000000003406e0
> [   66.925228] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   66.933191] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   66.941154] Stack:
> [   66.943396]  ffffffff82c8a5d2 ffff881077f705c0 1ffff10108f5ff13 ffff880847aff920
> [   66.951698]  0000000000000000 ffffffff86d346c8 0000000041b58ab3 ffffffff8338e870
> [   66.959997]  ffffffff822413d0 ffff880e00000044 ffffffff00000000 ffff880847aff8c0
> [   66.968296] Call Trace:
> [   66.971025]  [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
> [   66.978603]  [<ffffffff822413d0>] ? cleanup_glue_dir+0x140/0x140
> [   66.985309]  [<ffffffff8160a6f2>] perf_pmu_unregister+0x142/0x6d0
> [   66.992111]  [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
> [   66.998720]  [<ffffffff810559f7>] uncore_pmu_unregister+0x67/0xd0
> [   67.005523]  [<ffffffff8105ae6c>] uncore_pci_remove+0x32c/0x510
> [   67.012131]  [<ffffffff81ec8392>] pci_device_remove+0xb2/0x240
> [   67.018641]  [<ffffffff8224fe76>] driver_probe_device+0x146/0xfc0
> [   67.025442]  [<ffffffff82250cf0>] ? driver_probe_device+0xfc0/0xfc0
> [   67.032437]  [<ffffffff82250ea5>] __driver_attach+0x1b5/0x230
> [   67.038852]  [<ffffffff82248e60>] bus_for_each_dev+0x130/0x200
> [   67.045361]  [<ffffffff81353300>] ? do_raw_spin_trylock+0x110/0x110
> [   67.052357]  [<ffffffff82248d30>] ? subsys_dev_iter_init+0x100/0x100
> [   67.059450]  [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
> [   67.066056]  [<ffffffff8224eaa2>] driver_attach+0x42/0x70
> [   67.072081]  [<ffffffff8224d846>] bus_add_driver+0x406/0x870
> [   67.078397]  [<ffffffff822535b9>] driver_register+0x1a9/0x3d0
> [   67.084809]  [<ffffffff81352942>] ? __raw_spin_lock_init+0x32/0x120
> [   67.091803]  [<ffffffff81ec2a1d>] __pci_register_driver+0x1ad/0x2b0
> [   67.098798]  [<ffffffff81ec2870>] ? pci_pm_runtime_idle+0x180/0x180
> [   67.105792]  [<ffffffff858f57b5>] intel_uncore_init+0x58d/0x64c
> [   67.112399]  [<ffffffff858ed56d>] ? amd_iommu_pc_init+0x16/0x344
> [   67.119103]  [<ffffffff858f5228>] ? uncore_type_init+0x5cb/0x5cb
> [   67.125806]  [<ffffffff81000587>] do_one_initcall+0xb7/0x2a0
> [   67.132124]  [<ffffffff810004d0>] ? initcall_blacklisted+0x1a0/0x1a0
> [   67.139215]  [<ffffffff8132687d>] ? up_write+0x7d/0x120
> [   67.145046]  [<ffffffff81326800>] ? up_read+0x40/0x40
> [   67.150684]  [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
> [   67.158262]  [<ffffffff8130db04>] ? __wake_up+0x44/0x50
> [   67.164094]  [<ffffffff858e71b9>] kernel_init_freeable+0x68a/0x768
> [   67.170992]  [<ffffffff858e6b2f>] ? start_kernel+0x751/0x751
> [   67.177310]  [<ffffffff81075ec0>] ? compat_start_thread+0xa0/0xa0
> [   67.184111]  [<ffffffff82c704c0>] ? rest_init+0x190/0x190
> [   67.190137]  [<ffffffff82c704d3>] kernel_init+0x13/0x140
> [   67.196064]  [<ffffffff82c704c0>] ? rest_init+0x190/0x190
> [   67.202090]  [<ffffffff82c8b0d7>] ret_from_fork+0x27/0x40
> [   67.208115] Code: f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 85 ff 0f 84 69 06 00 00 48 89 da 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 41 06 00 00 48 8b 03 48 89 85 68 ff ff ff 48 
> [   67.229872] RIP  [<ffffffff82241466>] device_del+0x96/0x860
> [   67.236101]  RSP <ffff880847aff868>
> [   67.240059] ---[ end trace 69358e866a1e3f6c ]---
> [   67.245377] Kernel panic - not syncing: Fatal exception
> [   67.251271] ---[ end Kernel panic - not syncing: Fatal exception

I think the reason here is that we presume pmu devices are always added,
but we add them only if pmu_bus_running (in perf_event_sysfs_init)
is set which might happen after uncore initcall
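
The ordering described above can be modelled in userspace. This is only a
minimal single-threaded sketch under stated assumptions: every model_* name
is hypothetical and is not kernel API, locking is ignored, and the guard in
the unregister path mirrors the idea of checking the flag before touching
the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects discussed above. */
struct model_device {
	bool added;
};

struct model_pmu {
	struct model_device *dev;	/* stays NULL until the bus is running */
	struct model_device storage;	/* backing object for dev in this model */
};

static bool model_bus_running;		/* models pmu_bus_running */

/* Registration only creates the device once the bus is up. */
static void model_pmu_register(struct model_pmu *pmu)
{
	pmu->dev = NULL;
	if (model_bus_running) {
		pmu->storage.added = true;
		pmu->dev = &pmu->storage;
	}
}

/* Returns true if a device was actually removed. */
static bool model_pmu_unregister(struct model_pmu *pmu)
{
	if (!model_bus_running)
		return false;		/* nothing was added, nothing to delete */

	assert(pmu->dev != NULL);	/* holds only while the flag cannot change */
	pmu->dev->added = false;
	pmu->dev = NULL;
	return true;
}
```

Registering and unregistering while model_bus_running is false leaves dev
NULL throughout, which corresponds to the NULL 'struct device' that the
UBSAN report complains about when the removal is not guarded.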

attached patch fixes the issue for me

jirka


---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index c6e47e97b33f..c2099b799d16 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8871,8 +8871,10 @@ void perf_pmu_unregister(struct pmu *pmu)
 		idr_remove(&pmu_idr, pmu->type);
 	if (pmu->nr_addr_filters)
 		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
-	device_del(pmu->dev);
-	put_device(pmu->dev);
+	if (pmu_bus_running) {
+		device_del(pmu->dev);
+		put_device(pmu->dev);
+	}
 	free_pmu_context(pmu);
 }
 EXPORT_SYMBOL_GPL(perf_pmu_unregister);


* Re: [4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-19 19:19         ` Jiri Olsa
@ 2016-10-19 20:18           ` CAI Qian
  2016-10-20  5:39           ` Peter Zijlstra
  1 sibling, 0 replies; 18+ messages in thread
From: CAI Qian @ 2016-10-19 20:18 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Rob Herring, Peter Zijlstra, Kan Liang, Greg Kroah-Hartman,
	linux-kernel, Ingo Molnar


> I think the reason here is that we presume pmu devices are always added,
> but we add them only if pmu_bus_running (in perf_event_sysfs_init)
> is set which might happen after uncore initcall
> 
> attached patch fixes the issue for me
Tested-by: CAI Qian <caiqian@redhat.com>
> 
> jirka
> 
> 
> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index c6e47e97b33f..c2099b799d16 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8871,8 +8871,10 @@ void perf_pmu_unregister(struct pmu *pmu)
>  		idr_remove(&pmu_idr, pmu->type);
>  	if (pmu->nr_addr_filters)
>  		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> -	device_del(pmu->dev);
> -	put_device(pmu->dev);
> +	if (pmu_bus_running) {
> +		device_del(pmu->dev);
> +		put_device(pmu->dev);
> +	}
>  	free_pmu_context(pmu);
>  }
>  EXPORT_SYMBOL_GPL(perf_pmu_unregister);
> 


* Re: [4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-19 19:19         ` Jiri Olsa
  2016-10-19 20:18           ` CAI Qian
@ 2016-10-20  5:39           ` Peter Zijlstra
  2016-10-20  8:58             ` Jiri Olsa
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2016-10-20  5:39 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: CAI Qian, Rob Herring, Kan Liang, Greg Kroah-Hartman,
	linux-kernel, Ingo Molnar

On Wed, Oct 19, 2016 at 09:19:43PM +0200, Jiri Olsa wrote:
> I think the reason here is that we presume pmu devices are always added,
> but we add them only if pmu_bus_running (in perf_event_sysfs_init)
> is set which might happen after uncore initcall
> 
> attached patch fixes the issue for me

Right, we never expected to be unloaded before userspace runs.

Strictly speaking we should only read pmu_bus_running while holding
pmus_lock, that way we're serialized against perf_event_sysfs_init()
flipping it while we're being removed etc..

With the current setup the introduced race is harmless, but who knows
what other crazy these device people will come up with ;-)

> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index c6e47e97b33f..c2099b799d16 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8871,8 +8871,10 @@ void perf_pmu_unregister(struct pmu *pmu)
>  		idr_remove(&pmu_idr, pmu->type);
>  	if (pmu->nr_addr_filters)
>  		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> -	device_del(pmu->dev);
> -	put_device(pmu->dev);
> +	if (pmu_bus_running) {
> +		device_del(pmu->dev);
> +		put_device(pmu->dev);
> +	}
>  	free_pmu_context(pmu);
>  }
>  EXPORT_SYMBOL_GPL(perf_pmu_unregister);


* Re: [4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-20  5:39           ` Peter Zijlstra
@ 2016-10-20  8:58             ` Jiri Olsa
  2016-10-20  9:04               ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Jiri Olsa @ 2016-10-20  8:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: CAI Qian, Rob Herring, Kan Liang, Greg Kroah-Hartman,
	linux-kernel, Ingo Molnar

On Thu, Oct 20, 2016 at 07:39:44AM +0200, Peter Zijlstra wrote:
> On Wed, Oct 19, 2016 at 09:19:43PM +0200, Jiri Olsa wrote:
> > I think the reason here is that we presume pmu devices are always added,
> > but we add them only if pmu_bus_running (in perf_event_sysfs_init)
> > is set which might happen after uncore initcall
> > 
> > attached patch fixes the issue for me
> 
> Right, we never expected to be unloaded before userspace runs.
> 
> Strictly speaking we should only read pmu_bus_running while holding
> pmus_lock, that way we're serialized against perf_event_sysfs_init()
> flipping it while we're being removed etc..
> 
> With the current setup the introduced race is harmless, but who knows
> what other crazy these device people will come up with ;-)
> 

right, did not think of that ;-)

also I did not notice the device_remove_file call for pmu->nr_addr_filters,
and we could save one lock/unlock call later.. I'm testing the attached patch
now

thanks,
jirka


---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index c6e47e97b33f..224dffbc3b9b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8581,24 +8581,24 @@ static void update_pmu_context(struct pmu *pmu, struct pmu *old_pmu)
 	}
 }
 
+/*
+ * The pmus_lock lock must be taken.
+ */
 static void free_pmu_context(struct pmu *pmu)
 {
 	struct pmu *i;
 
-	mutex_lock(&pmus_lock);
 	/*
 	 * Like a real lame refcount.
 	 */
 	list_for_each_entry(i, &pmus, entry) {
 		if (i->pmu_cpu_context == pmu->pmu_cpu_context) {
 			update_pmu_context(i, pmu);
-			goto out;
+			return;
 		}
 	}
 
 	free_percpu(pmu->pmu_cpu_context);
-out:
-	mutex_unlock(&pmus_lock);
 }
 
 /*
@@ -8869,11 +8869,15 @@ void perf_pmu_unregister(struct pmu *pmu)
 	free_percpu(pmu->pmu_disable_count);
 	if (pmu->type >= PERF_TYPE_MAX)
 		idr_remove(&pmu_idr, pmu->type);
-	if (pmu->nr_addr_filters)
-		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
-	device_del(pmu->dev);
-	put_device(pmu->dev);
+	mutex_lock(&pmus_lock);
+	if (pmu_bus_running) {
+		if (pmu->nr_addr_filters)
+			device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
+		device_del(pmu->dev);
+		put_device(pmu->dev);
+	}
 	free_pmu_context(pmu);
+	mutex_unlock(&pmus_lock);
 }
 EXPORT_SYMBOL_GPL(perf_pmu_unregister);
 


* Re: [4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-20  8:58             ` Jiri Olsa
@ 2016-10-20  9:04               ` Peter Zijlstra
  2016-10-20  9:42                 ` Jiri Olsa
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2016-10-20  9:04 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: CAI Qian, Rob Herring, Kan Liang, Greg Kroah-Hartman,
	linux-kernel, Ingo Molnar

On Thu, Oct 20, 2016 at 10:58:03AM +0200, Jiri Olsa wrote:

> @@ -8869,11 +8869,15 @@ void perf_pmu_unregister(struct pmu *pmu)
>  	free_percpu(pmu->pmu_disable_count);
>  	if (pmu->type >= PERF_TYPE_MAX)
>  		idr_remove(&pmu_idr, pmu->type);
> -	if (pmu->nr_addr_filters)
> -		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> -	device_del(pmu->dev);
> -	put_device(pmu->dev);
> +	mutex_lock(&pmus_lock);
> +	if (pmu_bus_running) {
> +		if (pmu->nr_addr_filters)
> +			device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> +		device_del(pmu->dev);
> +		put_device(pmu->dev);
> +	}
>  	free_pmu_context(pmu);
> +	mutex_unlock(&pmus_lock);
>  }
>  EXPORT_SYMBOL_GPL(perf_pmu_unregister);

I think that is still racy..


unregister:		sysfs_init:

mutex_lock(&pmus_lock);
list_del_rcu(&pmu->entry);
mutex_unlock(&pmus_lock);

synchronize_*rcu();

			mutex_lock(&pmus_lock);
			list_for_each_entry(pmu, &pmus, entry) {
				/* add device muck */
				/* will _NOT_ see our PMU */
			}
			pmu_bus_running = 1;
			mutex_unlock(&pmus_lock);

mutex_lock(&pmus_lock);
if (pmu_bus_running) {
	device_del() /* OOPS */


What you want is to read pmu_bus_running in the same pmus_lock section
as we do the list_del, and then use that local copy later.
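
The interleaving drawn above, and the suggested local copy, can be replayed
sequentially in userspace. This is a hedged sketch rather than the kernel
code: the model_* names are hypothetical, no real lock is taken, and the
comments only mark where pmus_lock would be held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; names are illustrative, not kernel API. */
struct model_pmu {
	bool on_list;		/* models membership in the pmus list */
	bool dev_added;		/* models a successfully added pmu->dev */
};

static bool model_bus_running;	/* models pmu_bus_running */

/* First half of unregister: would run with pmus_lock held. */
static bool model_unregister_begin(struct model_pmu *pmu)
{
	/* Snapshot the flag in the same section as the list removal. */
	bool remove_device = model_bus_running;

	pmu->on_list = false;
	return remove_device;
}

/* Models perf_event_sysfs_init(): would also run with pmus_lock held. */
static void model_sysfs_init(struct model_pmu *pmus[], size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (pmus[i]->on_list)
			pmus[i]->dev_added = true;	/* only listed PMUs get a device */
	model_bus_running = true;
}

/* Second half of unregister: acts on the snapshot, not on the global flag. */
static bool model_unregister_finish(struct model_pmu *pmu, bool remove_device)
{
	if (!remove_device)
		return false;		/* device was never added; skip removal */

	assert(pmu->dev_added);		/* the snapshot implies the device exists */
	pmu->dev_added = false;
	return true;
}
```

Calling model_unregister_begin(), then model_sysfs_init(), then
model_unregister_finish() reproduces the diagram: the global flag is set by
the time the last step runs, but the snapshot is still false, so the removal
of a device that was never added is skipped.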


* Re: [4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-20  9:04               ` Peter Zijlstra
@ 2016-10-20  9:42                 ` Jiri Olsa
  2016-10-20 11:10                   ` [PATCH] perf: Protect pmu device removal with pmu_bus_running check " Jiri Olsa
  0 siblings, 1 reply; 18+ messages in thread
From: Jiri Olsa @ 2016-10-20  9:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: CAI Qian, Rob Herring, Kan Liang, Greg Kroah-Hartman,
	linux-kernel, Ingo Molnar

On Thu, Oct 20, 2016 at 11:04:16AM +0200, Peter Zijlstra wrote:
> On Thu, Oct 20, 2016 at 10:58:03AM +0200, Jiri Olsa wrote:
> 
> > @@ -8869,11 +8869,15 @@ void perf_pmu_unregister(struct pmu *pmu)
> >  	free_percpu(pmu->pmu_disable_count);
> >  	if (pmu->type >= PERF_TYPE_MAX)
> >  		idr_remove(&pmu_idr, pmu->type);
> > -	if (pmu->nr_addr_filters)
> > -		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> > -	device_del(pmu->dev);
> > -	put_device(pmu->dev);
> > +	mutex_lock(&pmus_lock);
> > +	if (pmu_bus_running) {
> > +		if (pmu->nr_addr_filters)
> > +			device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> > +		device_del(pmu->dev);
> > +		put_device(pmu->dev);
> > +	}
> >  	free_pmu_context(pmu);
> > +	mutex_unlock(&pmus_lock);
> >  }
> >  EXPORT_SYMBOL_GPL(perf_pmu_unregister);
> 
> I think that is still racy..
> 
> 
> unregister:		sysfs_init:
> 
> mutex_lock(&pmus_lock);
> list_del_rcu(&pmu->entry);
> mutex_unlock(&pmus_lock);
> 
> synchronize_*rcu();
> 
> 			mutex_lock(&pmus_lock);
> 			list_for_each_entry(pmu, &pmus, entry) {
> 				/* add device muck */

ah, I thought this part would add the device back.. but it's
already out of the pmu list.. right :-\

thanks,
jirka

> 				/* will _NOT_ see our PMU */
> 			}
> 			pmu_bus_running = 1;
> 			mutex_unlock(&pmus_lock);
> 
> mutex_lock(&pmus_lock);
> if (pmu_bus_running) {
> 	device_del() /* OOPS */
> 
> 
> What you want is to read pmu_bus_running in the same pmus_lock section
> as we do the list_del, and then use that local copy later.


* [PATCH] perf: Protect pmu device removal with pmu_bus_running check CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-20  9:42                 ` Jiri Olsa
@ 2016-10-20 11:10                   ` Jiri Olsa
  2016-10-20 14:30                     ` CAI Qian
  2016-10-28 10:10                     ` [tip:perf/urgent] perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y " tip-bot for Jiri Olsa
  0 siblings, 2 replies; 18+ messages in thread
From: Jiri Olsa @ 2016-10-20 11:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: CAI Qian, Rob Herring, Kan Liang, Greg Kroah-Hartman,
	linux-kernel, Ingo Molnar

On Thu, Oct 20, 2016 at 11:42:59AM +0200, Jiri Olsa wrote:
> On Thu, Oct 20, 2016 at 11:04:16AM +0200, Peter Zijlstra wrote:
> > On Thu, Oct 20, 2016 at 10:58:03AM +0200, Jiri Olsa wrote:
> > 
> > > @@ -8869,11 +8869,15 @@ void perf_pmu_unregister(struct pmu *pmu)
> > >  	free_percpu(pmu->pmu_disable_count);
> > >  	if (pmu->type >= PERF_TYPE_MAX)
> > >  		idr_remove(&pmu_idr, pmu->type);
> > > -	if (pmu->nr_addr_filters)
> > > -		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> > > -	device_del(pmu->dev);
> > > -	put_device(pmu->dev);
> > > +	mutex_lock(&pmus_lock);
> > > +	if (pmu_bus_running) {
> > > +		if (pmu->nr_addr_filters)
> > > +			device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> > > +		device_del(pmu->dev);
> > > +		put_device(pmu->dev);
> > > +	}
> > >  	free_pmu_context(pmu);
> > > +	mutex_unlock(&pmus_lock);
> > >  }
> > >  EXPORT_SYMBOL_GPL(perf_pmu_unregister);
> > 
> > I think that is still racy..
> > 
> > 
> > unregister:		sysfs_init:
> > 
> > mutex_lock(&pmus_lock);
> > list_del_rcu(&pmu->entry);
> > mutex_unlock(&pmus_lock);
> > 
> > synchronize_*rcu();
> > 
> > 			mutex_lock(&pmus_lock);
> > 			list_for_each_entry(pmu, &pmus, entry) {
> > 				/* add device muck */
> 
> ah, I thought this part would add the device back.. but it's
> already out of the pmu list.. right :-\

attached fix, thanks

jirka


---
CAI Qian reported crash [1] in uncore device removal related
to CONFIG_DEBUG_TEST_DRIVER_REMOVE option.

The reason for crash is that perf_pmu_unregister tries to remove
pmu device which is not added at this point. We add pmu devices
only after pmu_bus is registered which happens in perf_event_sysfs_init
init call and sets pmu_bus_running flag.

The fix is to get the pmu_bus_running flag state at the point
the pmu is taken out of the pmus list and remove the device
later only if it's set.

[1] https://marc.info/?l=linux-kernel&m=147688837328451

Reported-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/events/core.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index c6e47e97b33f..a5d2e62faf7e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8855,7 +8855,10 @@ EXPORT_SYMBOL_GPL(perf_pmu_register);
 
 void perf_pmu_unregister(struct pmu *pmu)
 {
+	int remove_device;
+
 	mutex_lock(&pmus_lock);
+	remove_device = pmu_bus_running;
 	list_del_rcu(&pmu->entry);
 	mutex_unlock(&pmus_lock);
 
@@ -8869,10 +8872,12 @@ void perf_pmu_unregister(struct pmu *pmu)
 	free_percpu(pmu->pmu_disable_count);
 	if (pmu->type >= PERF_TYPE_MAX)
 		idr_remove(&pmu_idr, pmu->type);
-	if (pmu->nr_addr_filters)
-		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
-	device_del(pmu->dev);
-	put_device(pmu->dev);
+	if (remove_device) {
+		if (pmu->nr_addr_filters)
+			device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
+		device_del(pmu->dev);
+		put_device(pmu->dev);
+	}
 	free_pmu_context(pmu);
 }
 EXPORT_SYMBOL_GPL(perf_pmu_unregister);
-- 
2.7.4


* Re: [PATCH] perf: Protect pmu device removal with pmu_bus_running check CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
  2016-10-20 11:10                   ` [PATCH] perf: Protect pmu device removal with pmu_bus_running check " Jiri Olsa
@ 2016-10-20 14:30                     ` CAI Qian
  2016-10-28 10:10                     ` [tip:perf/urgent] perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y " tip-bot for Jiri Olsa
  1 sibling, 0 replies; 18+ messages in thread
From: CAI Qian @ 2016-10-20 14:30 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Rob Herring, Kan Liang, Greg Kroah-Hartman,
	linux-kernel, Ingo Molnar


> CAI Qian reported crash [1] in uncore device removal related
> to CONFIG_DEBUG_TEST_DRIVER_REMOVE option.
> 
> The reason for crash is that perf_pmu_unregister tries to remove
> pmu device which is not added at this point. We add pmu devices
> only after pmu_bus is registered which happens in perf_event_sysfs_init
> init call and sets pmu_bus_running flag.
> 
> The fix is to get the pmu_bus_running flag state at the point
> the pmu is taken out of the pmus list and remove the device
> later only if it's set.
> 
> [1] https://marc.info/?l=linux-kernel&m=147688837328451
> 
> Reported-by: CAI Qian <caiqian@redhat.com>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>

Tested-by: CAI Qian <caiqian@redhat.com>


* [tip:perf/urgent] perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y kernel panic
  2016-10-20 11:10                   ` [PATCH] perf: Protect pmu device removal with pmu_bus_running check " Jiri Olsa
  2016-10-20 14:30                     ` CAI Qian
@ 2016-10-28 10:10                     ` tip-bot for Jiri Olsa
  1 sibling, 0 replies; 18+ messages in thread
From: tip-bot for Jiri Olsa @ 2016-10-28 10:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, mingo, jolsa, kan.liang, caiqian, gregkh, tglx, robh,
	torvalds, alexander.shishkin, acme, linux-kernel, jolsa, hpa

Commit-ID:  0933840acf7b65d6d30a5b6089d882afea57aca3
Gitweb:     http://git.kernel.org/tip/0933840acf7b65d6d30a5b6089d882afea57aca3
Author:     Jiri Olsa <jolsa@redhat.com>
AuthorDate: Thu, 20 Oct 2016 13:10:11 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 28 Oct 2016 11:06:25 +0200

perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y kernel panic

CAI Qian reported a crash in the PMU uncore device removal code,
enabled by the CONFIG_DEBUG_TEST_DRIVER_REMOVE=y option:

  https://marc.info/?l=linux-kernel&m=147688837328451

The reason for the crash is that perf_pmu_unregister() tries to remove
a PMU device which is not added at this point. We add PMU devices
only after pmu_bus is registered, which happens in the
perf_event_sysfs_init() call and sets the 'pmu_bus_running' flag.

The fix is to get the 'pmu_bus_running' flag state at the point
the PMU is taken out of the PMU list and remove the device
later only if it's set.

Reported-by: CAI Qian <caiqian@redhat.com>
Tested-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161020111011.GA13361@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index c6e47e9..a5d2e62 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8855,7 +8855,10 @@ EXPORT_SYMBOL_GPL(perf_pmu_register);
 
 void perf_pmu_unregister(struct pmu *pmu)
 {
+	int remove_device;
+
 	mutex_lock(&pmus_lock);
+	remove_device = pmu_bus_running;
 	list_del_rcu(&pmu->entry);
 	mutex_unlock(&pmus_lock);
 
@@ -8869,10 +8872,12 @@ void perf_pmu_unregister(struct pmu *pmu)
 	free_percpu(pmu->pmu_disable_count);
 	if (pmu->type >= PERF_TYPE_MAX)
 		idr_remove(&pmu_idr, pmu->type);
-	if (pmu->nr_addr_filters)
-		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
-	device_del(pmu->dev);
-	put_device(pmu->dev);
+	if (remove_device) {
+		if (pmu->nr_addr_filters)
+			device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
+		device_del(pmu->dev);
+		put_device(pmu->dev);
+	}
 	free_pmu_context(pmu);
 }
 EXPORT_SYMBOL_GPL(perf_pmu_unregister);


end of thread, other threads:[~2016-10-28 10:10 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <907882571.66590.1476113724660.JavaMail.zimbra@redhat.com>
2016-10-10 15:37 ` kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic CAI Qian
2016-10-10 17:09   ` Rob Herring
2016-10-10 18:25     ` CAI Qian
2016-10-10 17:20   ` Greg Kroah-Hartman
2016-10-10 18:15     ` Rob Herring
2016-10-10 18:22       ` CAI Qian
2016-10-10 19:34         ` Rob Herring
2016-10-10 20:09           ` CAI Qian
2016-10-19 14:45       ` [4.9-rc1+] intel_uncore builtin " CAI Qian
2016-10-19 19:19         ` Jiri Olsa
2016-10-19 20:18           ` CAI Qian
2016-10-20  5:39           ` Peter Zijlstra
2016-10-20  8:58             ` Jiri Olsa
2016-10-20  9:04               ` Peter Zijlstra
2016-10-20  9:42                 ` Jiri Olsa
2016-10-20 11:10                   ` [PATCH] perf: Protect pmu device removal with pmu_bus_running check " Jiri Olsa
2016-10-20 14:30                     ` CAI Qian
2016-10-28 10:10                     ` [tip:perf/urgent] perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y " tip-bot for Jiri Olsa
