From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1422724AbcFMHcO (ORCPT ); Mon, 13 Jun 2016 03:32:14 -0400 Received: from mx1.redhat.com ([209.132.183.28]:48063 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1422700AbcFMHcL (ORCPT ); Mon, 13 Jun 2016 03:32:11 -0400 Date: Mon, 13 Jun 2016 02:32:09 -0500 From: Josh Poimboeuf To: Ingo Molnar Cc: Ingo Molnar , Peter Zijlstra , linux-kernel@vger.kernel.org, Mel Gorman , Matt Fleming , Srikar Dronamraju Subject: [PATCH v2] sched/debug: fix deadlock when enabling sched events Message-ID: <20160613073209.gdvdybiruljbkn3p@treble> References: <50a0f7f2a5f8d69e0440c3430f06be7ccbe59903.1465322027.git.jpoimboe@redhat.com> <20160608075612.GA9369@gmail.com> <20160608131237.3tixvrtpqvfv5kmg@treble> <20160608131559.GA15843@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20160608131559.GA15843@gmail.com> User-Agent: Mutt/1.6.0.1 (2016-04-01) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.27]); Mon, 13 Jun 2016 07:32:11 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jun 08, 2016 at 03:15:59PM +0200, Ingo Molnar wrote: > > * Josh Poimboeuf wrote: > > > On Wed, Jun 08, 2016 at 09:56:12AM +0200, Ingo Molnar wrote: > > > > > > * Josh Poimboeuf wrote: > > > > > > > --- a/kernel/sched/fair.c > > > > +++ b/kernel/sched/fair.c > > > > @@ -789,6 +789,13 @@ static void update_curr_fair(struct rq *rq) > > > > update_curr(cfs_rq_of(&rq->curr->se)); > > > > } > > > > > > > > +void trace_sched_stat_register(void) > > > > +{ > > > > +#ifdef CONFIG_SCHEDSTATS > > > > + force_schedstat_enabled(); > > > > +#endif > > > > +} > > > > > > I think it would be cleaner to provide an empty force_schedstat_enabled() > > > definition in sched.h, on !CONFIG_SCHEDSTATS. > > > > Yes, agreed. > > > > > But it might make sense to further decouple schedstats from tracing? > > > > Looking at how we could do that: > > > > - The sched_stat_wait tracepoint is dependent on the wait_start schedstat. > > Can we maintain wait_start unconditionally? Having config dependent tracepoints > sucks. > > > - The sched_stat_sleep tracepoint is dependent on the sleep_start > > schedstat. > > Ditto. > > > - The sched_stat_iowait and sched_stat_blocked tracepoints are dependent > > on the block_start schedstat. > > Ditto. > > > We could move those three schedstats values out of sched_statistics and > > into sched_entity, and then always update them regardless of > > CONFIG_SCHEDSTATS. > > > > That would ensure these tracepoints always work. But then again it > > would add a little bit of runtime overhead. > > > > I don't have a strong opinion either way. Thoughts? > > Please do a before/after 'size vmlinux' dump of a defconfig build with tracing > disabled. I.e. how much code overhead does the updating of those variables add? When trying to decouple the sched_stat_* tracepoints from schedstats, I see a significant (~12%) performance degradation in pipetest for the CONFIG_SCHEDSTATS case. Still looking into it... In the meantime, here's a straightforward fix for the deadlock. --- From: Josh Poimboeuf Subject: [PATCH v2] sched/debug: fix deadlock when enabling sched events I see a hang when enabling sched events: echo 1 > /sys/kernel/debug/tracing/events/sched/enable The printk buffer shows: BUG: spinlock recursion on CPU#1, swapper/1/0 lock: 0xffff88007d5d8c00, .magic: dead4ead, .owner: swapper/1/0, .owner_cpu: 1 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc2+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014 0000000000000086 3a8e170cf69ad23f ffff88007d403898 ffffffff8143d663 ffff88007c535480 ffff88007d5d8c00 ffff88007d4038b8 ffffffff81115948 ffff88007d5d8c00 ffff88007d5d8c00 ffff88007d4038e8 ffffffff81115aea Call Trace: [] dump_stack+0x85/0xc2 [] spin_dump+0x78/0xc0 [] do_raw_spin_lock+0x11a/0x150 [] _raw_spin_lock+0x61/0x80 [] ? try_to_wake_up+0x256/0x4e0 [] try_to_wake_up+0x256/0x4e0 [] ? _raw_spin_unlock_irqrestore+0x4a/0x80 [] wake_up_process+0x15/0x20 [] insert_work+0x84/0xc0 [] __queue_work+0x18f/0x660 [] queue_work_on+0x46/0x90 [] drm_fb_helper_dirty.isra.11+0xcb/0xe0 [drm_kms_helper] [] drm_fb_helper_sys_imageblit+0x30/0x40 [drm_kms_helper] [] soft_cursor+0x1ad/0x230 [] bit_cursor+0x649/0x680 [] ? update_attr.isra.2+0x90/0x90 [] fbcon_cursor+0x14a/0x1c0 [] hide_cursor+0x28/0x90 [] vt_console_print+0x3bf/0x3f0 [] call_console_drivers.constprop.24+0x183/0x200 [] console_unlock+0x3d4/0x610 [] vprintk_emit+0x3c5/0x610 [] vprintk_default+0x29/0x40 [] printk+0x57/0x73 [] enqueue_entity+0xc2e/0xc70 [] enqueue_task_fair+0x59/0xab0 [] ? kvm_sched_clock_read+0x9/0x20 [] ? sched_clock+0x9/0x10 [] activate_task+0x5c/0xa0 [] ttwu_do_activate+0x54/0xb0 [] sched_ttwu_pending+0x7a/0xb0 [] scheduler_ipi+0x61/0x170 [] smp_trace_reschedule_interrupt+0x4f/0x2a0 [] trace_reschedule_interrupt+0x96/0xa0 [] ? native_safe_halt+0x6/0x10 [] ? trace_hardirqs_on+0xd/0x10 [] default_idle+0x20/0x1a0 [] arch_cpu_idle+0xf/0x20 [] default_idle_call+0x2f/0x50 [] cpu_startup_entry+0x37e/0x450 [] start_secondary+0x160/0x1a0 Note the hang only occurs when echoing the above from a physical serial console, not from an ssh session. The bug is caused by a deadlock where the task is trying to grab the rq lock twice because printk()'s aren't safe in sched code. Fixes: cb2517653fcc ("sched/debug: Make schedstats a runtime tunable that is disabled by default") Cc: stable@vger.kernel.org Signed-off-by: Josh Poimboeuf --- kernel/sched/fair.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index c6dd8ba..52d9b37 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3220,7 +3220,7 @@ static inline void check_schedstat_required(void) trace_sched_stat_iowait_enabled() || trace_sched_stat_blocked_enabled() || trace_sched_stat_runtime_enabled()) { - pr_warn_once("Scheduler tracepoints stat_sleep, stat_iowait, " + printk_deferred_once("Scheduler tracepoints stat_sleep, stat_iowait, " "stat_blocked and stat_runtime require the " "kernel parameter schedstats=enabled or " "kernel.sched_schedstats=1\n"); -- 2.4.11