linux-kernel.vger.kernel.org archive mirror
From: Jiri Olsa <jolsa@redhat.com>
To: CAI Qian <caiqian@redhat.com>
Cc: Rob Herring <robh@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Kan Liang <kan.liang@intel.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: [4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
Date: Wed, 19 Oct 2016 21:19:43 +0200
Message-ID: <20161019191943.GA7951@krava>
In-Reply-To: <1035662571.647973.1476888331396.JavaMail.zimbra@redhat.com>

On Wed, Oct 19, 2016 at 10:45:31AM -0400, CAI Qian wrote:
> It turns out this is only reproducible when intel_uncore is compiled as a builtin, i.e.,
> not as a module. It can still be reproduced on yesterday's mainline.
> 
> Here is some information about the system:
> 
> Intel Platform: Grantley-R Wildcat Pass CPU: Broadwell-EP, B0.
> Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> 
> [   66.349263] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> [   66.356672] software IO TLB [mem 0x71c7d000-0x75c7d000] (64MB) mapped at [ffff880071c7d000-ffff880075c7cfff]
> [   66.369911] Intel CQM monitoring enabled
> [   66.374445] Intel MBM enabled
> [   66.385708] RAPL PMU: API unit is 2^-32 Joules, 4 fixed counters, 655360 ms ovfl timer
> [   66.394564] RAPL PMU: hw unit of domain pp0-core 2^-14 Joules
> [   66.400991] RAPL PMU: hw unit of domain package 2^-14 Joules
> [   66.407317] RAPL PMU: hw unit of domain dram 2^-14 Joules
> [   66.413358] RAPL PMU: hw unit of domain pp1-gpu 2^-14 Joules
> [   66.434040] ================================================================================
> [   66.443462] UBSAN: Undefined behaviour in drivers/base/core.c:1251:17
> [   66.450653] member access within null pointer of type 'struct device'
> [   66.457845] CPU: 68 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc1-lockfix+ #48
> [   66.465809] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
> [   66.477168]  ffff880847aff798 ffffffff81d370b4 0000000041b58ab3 ffffffff83348dcf
> [   66.485469]  ffffffff81d36ff4 ffff880847aff7c0 ffff880847aff770 ffff880e3f9d8000
> [   66.493770]  ffffffff82ff8a00 ffffffff8309c5c0 00000000000004e3 000000009091f309
> [   66.502073] Call Trace:
> [   66.504811]  [<ffffffff81d370b4>] dump_stack+0xc0/0x12c
> [   66.510644]  [<ffffffff81d36ff4>] ? _atomic_dec_and_lock+0xc4/0xc4
> [   66.517548]  [<ffffffff81e5ac85>] ubsan_epilogue+0xd/0x8a
> [   66.523574]  [<ffffffff81e5ae68>] __ubsan_handle_type_mismatch+0x166/0x434
> [   66.531253]  [<ffffffff813294dd>] ? get_lock_stats+0x1d/0x120
> [   66.537667]  [<ffffffff81e5ad02>] ? ubsan_epilogue+0x8a/0x8a
> [   66.543985]  [<ffffffff82241acc>] device_del+0x6fc/0x860
> [   66.549917]  [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
> [   66.557494]  [<ffffffff822413d0>] ? cleanup_glue_dir+0x140/0x140
> [   66.564202]  [<ffffffff8160a6f2>] perf_pmu_unregister+0x142/0x6d0
> [   66.571006]  [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
> [   66.577619]  [<ffffffff810559f7>] uncore_pmu_unregister+0x67/0xd0
> [   66.584422]  [<ffffffff8105ae6c>] uncore_pci_remove+0x32c/0x510
> [   66.591025]  [<ffffffff81ec8392>] pci_device_remove+0xb2/0x240
> [   66.597539]  [<ffffffff8224fe76>] driver_probe_device+0x146/0xfc0
> [   66.604340]  [<ffffffff82250cf0>] ? driver_probe_device+0xfc0/0xfc0
> [   66.611334]  [<ffffffff82250ea5>] __driver_attach+0x1b5/0x230
> [   66.617749]  [<ffffffff82248e60>] bus_for_each_dev+0x130/0x200
> [   66.624264]  [<ffffffff81353300>] ? do_raw_spin_trylock+0x110/0x110
> [   66.631258]  [<ffffffff82248d30>] ? subsys_dev_iter_init+0x100/0x100
> [   66.638349]  [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
> [   66.644959]  [<ffffffff8224eaa2>] driver_attach+0x42/0x70
> [   66.650976]  [<ffffffff8224d846>] bus_add_driver+0x406/0x870
> [   66.657292]  [<ffffffff822535b9>] driver_register+0x1a9/0x3d0
> [   66.663704]  [<ffffffff81352942>] ? __raw_spin_lock_init+0x32/0x120
> [   66.670700]  [<ffffffff81ec2a1d>] __pci_register_driver+0x1ad/0x2b0
> [   66.677694]  [<ffffffff81ec2870>] ? pci_pm_runtime_idle+0x180/0x180
> [   66.684694]  [<ffffffff858f57b5>] intel_uncore_init+0x58d/0x64c
> [   66.691300]  [<ffffffff858ed56d>] ? amd_iommu_pc_init+0x16/0x344
> [   66.698006]  [<ffffffff858f5228>] ? uncore_type_init+0x5cb/0x5cb
> [   66.704710]  [<ffffffff81000587>] do_one_initcall+0xb7/0x2a0
> [   66.711025]  [<ffffffff810004d0>] ? initcall_blacklisted+0x1a0/0x1a0
> [   66.718116]  [<ffffffff8132687d>] ? up_write+0x7d/0x120
> [   66.723949]  [<ffffffff81326800>] ? up_read+0x40/0x40
> [   66.729587]  [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
> [   66.737165]  [<ffffffff8130db04>] ? __wake_up+0x44/0x50
> [   66.743000]  [<ffffffff858e71b9>] kernel_init_freeable+0x68a/0x768
> [   66.749900]  [<ffffffff858e6b2f>] ? start_kernel+0x751/0x751
> [   66.756219]  [<ffffffff81075ec0>] ? compat_start_thread+0xa0/0xa0
> [   66.763013]  [<ffffffff82c704c0>] ? rest_init+0x190/0x190
> [   66.769039]  [<ffffffff82c704d3>] kernel_init+0x13/0x140
> [   66.774967]  [<ffffffff82c704c0>] ? rest_init+0x190/0x190
> [   66.780993]  [<ffffffff82c8b0d7>] ret_from_fork+0x27/0x40
> [   66.787019] ================================================================================
> [   66.796479] kasan: CONFIG_KASAN_INLINE enabled
> [   66.801450] kasan: GPF could be caused by NULL-ptr deref or user memory access
> [   66.809525] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
> [   66.817878] Modules linked in:
> [   66.821295] CPU: 68 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc1-lockfix+ #48
> [   66.829260] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
> [   66.840618] task: ffff880e3f9d8000 task.stack: ffff880847af8000
> [   66.847225] RIP: 0010:[<ffffffff82241466>]  [<ffffffff82241466>] device_del+0x96/0x860
> [   66.856076] RSP: 0000:ffff880847aff868  EFLAGS: 00010246
> [   66.862002] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [   66.869967] RDX: 0000000000000000 RSI: ffffffff82ea0cc0 RDI: ffffed0108f5ff06
> [   66.877931] RBP: ffff880847aff920 R08: ffff880e3f9d8000 R09: 0000000000000007
> [   66.885894] R10: 0000000000000000 R11: 0000000000000006 R12: ffff880844094930
> [   66.893859] R13: 0000000000000001 R14: ffff880844094800 R15: ffff880844095258
> [   66.901824] FS:  0000000000000000(0000) GS:ffff880e54e00000(0000) knlGS:0000000000000000
> [   66.910853] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   66.917265] CR2: 0000000000000000 CR3: 000000000360a000 CR4: 00000000003406e0
> [   66.925228] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   66.933191] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   66.941154] Stack:
> [   66.943396]  ffffffff82c8a5d2 ffff881077f705c0 1ffff10108f5ff13 ffff880847aff920
> [   66.951698]  0000000000000000 ffffffff86d346c8 0000000041b58ab3 ffffffff8338e870
> [   66.959997]  ffffffff822413d0 ffff880e00000044 ffffffff00000000 ffff880847aff8c0
> [   66.968296] Call Trace:
> [   66.971025]  [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
> [   66.978603]  [<ffffffff822413d0>] ? cleanup_glue_dir+0x140/0x140
> [   66.985309]  [<ffffffff8160a6f2>] perf_pmu_unregister+0x142/0x6d0
> [   66.992111]  [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
> [   66.998720]  [<ffffffff810559f7>] uncore_pmu_unregister+0x67/0xd0
> [   67.005523]  [<ffffffff8105ae6c>] uncore_pci_remove+0x32c/0x510
> [   67.012131]  [<ffffffff81ec8392>] pci_device_remove+0xb2/0x240
> [   67.018641]  [<ffffffff8224fe76>] driver_probe_device+0x146/0xfc0
> [   67.025442]  [<ffffffff82250cf0>] ? driver_probe_device+0xfc0/0xfc0
> [   67.032437]  [<ffffffff82250ea5>] __driver_attach+0x1b5/0x230
> [   67.038852]  [<ffffffff82248e60>] bus_for_each_dev+0x130/0x200
> [   67.045361]  [<ffffffff81353300>] ? do_raw_spin_trylock+0x110/0x110
> [   67.052357]  [<ffffffff82248d30>] ? subsys_dev_iter_init+0x100/0x100
> [   67.059450]  [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
> [   67.066056]  [<ffffffff8224eaa2>] driver_attach+0x42/0x70
> [   67.072081]  [<ffffffff8224d846>] bus_add_driver+0x406/0x870
> [   67.078397]  [<ffffffff822535b9>] driver_register+0x1a9/0x3d0
> [   67.084809]  [<ffffffff81352942>] ? __raw_spin_lock_init+0x32/0x120
> [   67.091803]  [<ffffffff81ec2a1d>] __pci_register_driver+0x1ad/0x2b0
> [   67.098798]  [<ffffffff81ec2870>] ? pci_pm_runtime_idle+0x180/0x180
> [   67.105792]  [<ffffffff858f57b5>] intel_uncore_init+0x58d/0x64c
> [   67.112399]  [<ffffffff858ed56d>] ? amd_iommu_pc_init+0x16/0x344
> [   67.119103]  [<ffffffff858f5228>] ? uncore_type_init+0x5cb/0x5cb
> [   67.125806]  [<ffffffff81000587>] do_one_initcall+0xb7/0x2a0
> [   67.132124]  [<ffffffff810004d0>] ? initcall_blacklisted+0x1a0/0x1a0
> [   67.139215]  [<ffffffff8132687d>] ? up_write+0x7d/0x120
> [   67.145046]  [<ffffffff81326800>] ? up_read+0x40/0x40
> [   67.150684]  [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
> [   67.158262]  [<ffffffff8130db04>] ? __wake_up+0x44/0x50
> [   67.164094]  [<ffffffff858e71b9>] kernel_init_freeable+0x68a/0x768
> [   67.170992]  [<ffffffff858e6b2f>] ? start_kernel+0x751/0x751
> [   67.177310]  [<ffffffff81075ec0>] ? compat_start_thread+0xa0/0xa0
> [   67.184111]  [<ffffffff82c704c0>] ? rest_init+0x190/0x190
> [   67.190137]  [<ffffffff82c704d3>] kernel_init+0x13/0x140
> [   67.196064]  [<ffffffff82c704c0>] ? rest_init+0x190/0x190
> [   67.202090]  [<ffffffff82c8b0d7>] ret_from_fork+0x27/0x40
> [   67.208115] Code: f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 85 ff 0f 84 69 06 00 00 48 89 da 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 41 06 00 00 48 8b 03 48 89 85 68 ff ff ff 48 
> [   67.229872] RIP  [<ffffffff82241466>] device_del+0x96/0x860
> [   67.236101]  RSP <ffff880847aff868>
> [   67.240059] ---[ end trace 69358e866a1e3f6c ]---
> [   67.245377] Kernel panic - not syncing: Fatal exception
> [   67.251271] ---[ end Kernel panic - not syncing: Fatal exception

I think the reason is that perf_pmu_unregister presumes the PMU device
has always been added, but perf_pmu_register adds it only if
pmu_bus_running is set. That flag is set in perf_event_sysfs_init,
which might run after the uncore initcall.
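The ordering problem can be modelled in a few lines of user-space C. This is a hypothetical sketch, not kernel code: model_pmu, model_device, model_bus_running, model_pmu_register(), model_sysfs_init() and model_pmu_unregister_unguarded() are made-up stand-ins for struct pmu, struct device, pmu_bus_running, perf_pmu_register(), perf_event_sysfs_init() and the pre-patch perf_pmu_unregister(). It assumes device allocation never fails, and it reports the NULL device instead of dereferencing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct device and struct pmu (not kernel code). */
struct model_device { bool added; };
struct model_pmu { struct model_device *dev; };

/* Plays the role of pmu_bus_running in kernel/events/core.c. */
static bool model_bus_running;

static struct model_device *model_dev_alloc(void)
{
	struct model_device *dev = calloc(1, sizeof(*dev));

	assert(dev != NULL); /* the model assumes allocation never fails */
	dev->added = true;
	return dev;
}

/* Like perf_pmu_register(): a device is created only if the bus runs. */
static void model_pmu_register(struct model_pmu *pmu)
{
	pmu->dev = model_bus_running ? model_dev_alloc() : NULL;
}

/* Like perf_event_sysfs_init(): create devices for PMUs that are already
 * registered, then mark the bus as running. */
static void model_sysfs_init(struct model_pmu *pmus[], int n)
{
	for (int i = 0; i < n; i++)
		if (pmus[i]->dev == NULL)
			pmus[i]->dev = model_dev_alloc();
	model_bus_running = true;
}

/* Pre-patch unregister: always tears down pmu->dev. Returns false where
 * the kernel would have called device_del(NULL) and faulted. */
static bool model_pmu_unregister_unguarded(struct model_pmu *pmu)
{
	if (pmu->dev == NULL)
		return false;
	free(pmu->dev);
	pmu->dev = NULL;
	return true;
}

/* Builtin: probe, then the debug remove, both before the sysfs initcall. */
bool model_builtin_order(void)
{
	struct model_pmu uncore;

	model_bus_running = false;
	model_pmu_register(&uncore);          /* bus not running: dev == NULL */
	return model_pmu_unregister_unguarded(&uncore);
}

/* Module: loaded after all initcalls, so the sysfs init has already run. */
bool model_module_order(void)
{
	struct model_pmu uncore;

	model_bus_running = false;
	model_sysfs_init(NULL, 0);            /* no PMUs registered yet */
	model_pmu_register(&uncore);          /* bus running: dev allocated */
	return model_pmu_unregister_unguarded(&uncore);
}
```

model_builtin_order() returns false: the PMU is registered and then removed (as CONFIG_DEBUG_TEST_DRIVER_REMOVE does right after a successful probe) before the sysfs initcall has created any device. model_module_order() returns true, because a module is only loaded after all initcalls have run.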

The attached patch fixes the issue for me.

jirka


---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index c6e47e97b33f..c2099b799d16 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8871,8 +8871,10 @@ void perf_pmu_unregister(struct pmu *pmu)
 		idr_remove(&pmu_idr, pmu->type);
 	if (pmu->nr_addr_filters)
 		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
-	device_del(pmu->dev);
-	put_device(pmu->dev);
+	if (pmu_bus_running) {
+		device_del(pmu->dev);
+		put_device(pmu->dev);
+	}
 	free_pmu_context(pmu);
 }
 EXPORT_SYMBOL_GPL(perf_pmu_unregister);
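A user-space sketch of the patched logic, under stated assumptions: all model_* names are hypothetical stand-ins for the kernel objects, and device allocation is assumed never to fail. model_pmu_unregister() only releases the device when model_bus_running is set, and model_run() places the sysfs initcall before registration, between registration and removal, or after removal. The assert() in model_dev_release() stands in for the device_del(NULL) fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct device and struct pmu (not kernel code). */
struct model_device { int placeholder; };
struct model_pmu { struct model_device *dev; };

static bool model_bus_running;   /* plays the role of pmu_bus_running */
static int model_devices_live;   /* devices allocated but not yet released */

static struct model_device *model_dev_alloc(void)
{
	struct model_device *dev = calloc(1, sizeof(*dev));

	assert(dev != NULL); /* the model assumes allocation never fails */
	model_devices_live++;
	return dev;
}

/* Stands in for device_del() + put_device(); NULL would fault in the kernel. */
static void model_dev_release(struct model_device *dev)
{
	assert(dev != NULL);
	free(dev);
	model_devices_live--;
}

static void model_pmu_register(struct model_pmu *pmu)
{
	pmu->dev = model_bus_running ? model_dev_alloc() : NULL;
}

static void model_sysfs_init(struct model_pmu *pmus[], int n)
{
	for (int i = 0; i < n; i++)
		if (pmus[i]->dev == NULL)
			pmus[i]->dev = model_dev_alloc();
	model_bus_running = true;
}

/* Patched unregister: a device can only exist once the bus is running. */
static void model_pmu_unregister(struct model_pmu *pmu)
{
	if (model_bus_running) {
		model_dev_release(pmu->dev);
		pmu->dev = NULL;
	}
}

/* Place the sysfs initcall before register (0), between register and
 * unregister (1), or after unregister (2). Returns devices leaked. */
int model_run(int sysfs_step)
{
	struct model_pmu pmu = { NULL };
	struct model_pmu *registered[] = { &pmu };

	model_bus_running = false;
	model_devices_live = 0;

	if (sysfs_step == 0)
		model_sysfs_init(registered, 0);  /* PMU not registered yet */
	model_pmu_register(&pmu);
	if (sysfs_step == 1)
		model_sysfs_init(registered, 1);  /* PMU is registered */
	model_pmu_unregister(&pmu);
	if (sysfs_step == 2)
		model_sysfs_init(registered, 0);  /* PMU already gone */
	return model_devices_live;
}
```

All three orderings end with no live devices and never trip the assert. The guard relies on the invariant that a device exists whenever the bus is running, which holds in this model only because allocation is assumed to succeed.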


Thread overview: 18+ messages
2016-10-10 15:37 ` kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic CAI Qian
2016-10-10 17:09   ` Rob Herring
2016-10-10 18:25     ` CAI Qian
2016-10-10 17:20   ` Greg Kroah-Hartman
2016-10-10 18:15     ` Rob Herring
2016-10-10 18:22       ` CAI Qian
2016-10-10 19:34         ` Rob Herring
2016-10-10 20:09           ` CAI Qian
2016-10-19 14:45       ` [4.9-rc1+] intel_uncore builtin " CAI Qian
2016-10-19 19:19         ` Jiri Olsa [this message]
2016-10-19 20:18           ` CAI Qian
2016-10-20  5:39           ` Peter Zijlstra
2016-10-20  8:58             ` Jiri Olsa
2016-10-20  9:04               ` Peter Zijlstra
2016-10-20  9:42                 ` Jiri Olsa
2016-10-20 11:10                   ` [PATCH] perf: Protect pmu device removal with pmu_bus_running check " Jiri Olsa
2016-10-20 14:30                     ` CAI Qian
2016-10-28 10:10                     ` [tip:perf/urgent] perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y " tip-bot for Jiri Olsa
