From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755680AbcBBQbM (ORCPT ); Tue, 2 Feb 2016 11:31:12 -0500 Received: from parrot.pmhahn.de ([88.198.50.102]:35148 "EHLO parrot.pmhahn.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750794AbcBBQbJ (ORCPT ); Tue, 2 Feb 2016 11:31:09 -0500 X-Greylist: delayed 320 seconds by postgrey-1.27 at vger.kernel.org; Tue, 02 Feb 2016 11:31:09 EST To: Sasha Levin , Hannes Frederic Sowa , Rainer Weikusat , Andrey Vagin , Aaron Conole , "David S. Miller" , linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman From: Philipp Hahn Subject: Bug 4.1.16: self-detected stall in net/unix/? X-Enigmail-Draft-Status: N1110 Message-ID: <56B0D88A.1020609@pmhahn.de> Date: Tue, 2 Feb 2016 17:25:46 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.5.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, we recently updated our kernel to 4.1.16 + patch for "unix: properly account for FDs passed over unix sockets" and have since then self-detected stalls triggered by the Samba daemon: > Feb 1 09:03:14 dcs1 kernel: [ 1152.840007] INFO: rcu_sched self-detected stall on CPU { 3} (t=162780 jiffies g=47565 c=47564 q=1055670) > Feb 1 09:03:14 dcs1 kernel: [ 1152.840007] Task dump for CPU 3: > Feb 1 09:03:14 dcs1 kernel: [ 1152.840007] smbd R running task 0 5938 1 0x0000000c > Feb 1 09:03:14 dcs1 kernel: [ 1152.840007] 0000000000000004 ffffffff81851340 ffffffff810d3c84 000000000000b9cd > Feb 1 09:03:14 dcs1 kernel: [ 1152.840007] ffff8801bfd97100 ffffffff81851340 ffffffff81851340 ffffffff818f6c60 > Feb 1 09:03:14 dcs1 kernel: [ 1152.840007] ffffffff810d7659 0000000000000000 0000000000000000 00001e847fc2f700 > Feb 1 09:03:14 dcs1 kernel: [ 1152.840007] Call Trace: > Feb 1 09:03:14 dcs1 kernel: [ 1152.840007] [] ? rcu_dump_cpu_stacks+0x84/0xc0 > Feb 1 09:03:14 dcs1 kernel: [ 1152.840007] [] ? rcu_check_callbacks+0x449/0x740 > Feb 1 09:03:14 dcs1 kernel: [ 1152.840007] [] ? tick_sched_do_timer+0x40/0x40 > Feb 1 09:03:14 dcs1 kernel: [ 1152.840007] [] ? update_process_times+0x34/0x70 > Feb 1 09:03:14 dcs1 kernel: [ 1152.840007] [] ? tick_sched_handle.isra.12+0x2c/0x70 > Feb 1 09:03:14 dcs1 kernel: [ 1152.840007] [] ? tick_sched_timer+0x49/0x80 > Feb 1 09:03:14 dcs1 kernel: [ 1152.840007] [] ? __run_hrtimer+0x6d/0x1b0 > Feb 1 09:03:14 dcs1 kernel: [ 1152.840007] [] ? hrtimer_interrupt+0xed/0x210 > Feb 1 09:03:14 dcs1 kernel: [ 1152.840007] [] ? smp_apic_timer_interrupt+0x39/0x50 > Feb 1 09:03:14 dcs1 kernel: [ 1152.840007] [] ? apic_timer_interrupt+0x6e/0x80 > Feb 1 09:03:14 dcs1 kernel: [ 1152.840007] [] ? _raw_spin_lock+0x35/0x50 > Feb 1 09:03:14 dcs1 kernel: [ 1152.840007] [] ? unix_dgram_connect+0x93/0x200 > Feb 1 09:03:14 dcs1 kernel: [ 1152.840007] [] ? SYSC_connect+0xe8/0x100 > Feb 1 09:03:14 dcs1 kernel: [ 1152.840007] [] ? system_call_fast_compare_end+0xc/0x6b > Feb 1 11:48:13 ucs22f kernel: [307999.162254] INFO: rcu_sched self-detected stall on CPU { 0} (t=5250 jiffies g=6586733 c=6586732 q=6757) > Feb 1 11:48:13 ucs22f kernel: [307999.162264] Task dump for CPU 0: > Feb 1 11:48:13 ucs22f kernel: [307999.162267] smbd R running 0 4615 4609 0x00000008 > Feb 1 11:48:13 ucs22f kernel: [307999.162272] 00200082 f5863b90 c10b3fe9 c1682cc0 c1682cc0 c1682cc0 f79d2b00 f 5863bdc > Feb 1 11:48:13 ucs22f kernel: [307999.162276] c10b722d c15bd400 00001482 0064816d 0064816c 00001a65 f79cd840 c 108b166 > Feb 1 11:48:13 ucs22f kernel: [307999.162280] 00000001 f5863bb8 f5863bb8 00000000 c1682cc0 f6808cf0 00001a65 f 6808cf0 > Feb 1 11:48:13 ucs22f kernel: [307999.162285] Call Trace: > Feb 1 11:48:13 ucs22f kernel: [307999.162296] [] ? rcu_dump_cpu_stacks+0x79/0xc0 > Feb 1 11:48:13 ucs22f kernel: [307999.162300] [] ? rcu_check_callbacks+0x3cd/0x630 > Feb 1 11:48:13 ucs22f kernel: [307999.162304] [] ? account_process_tick+0x66/0x160 > Feb 1 11:48:13 ucs22f kernel: [307999.162307] [] ? update_process_times+0x2f/0x60 > Feb 1 11:48:13 ucs22f kernel: [307999.162310] [] ? tick_sched_handle.isra.12+0x2d/0x60 > Feb 1 11:48:13 ucs22f kernel: [307999.162328] [] ? tick_sched_timer+0x40/0x80 > Feb 1 11:48:13 ucs22f kernel: [307999.162331] [] ? __remove_hrtimer+0x40/0xa0 > Feb 1 11:48:13 ucs22f kernel: [307999.162334] [] ? __run_hrtimer+0x6f/0x190 > Feb 1 11:48:13 ucs22f kernel: [307999.162337] [] ? tick_sched_do_timer+0x30/0x30 > Feb 1 11:48:13 ucs22f kernel: [307999.162339] [] ? hrtimer_interrupt+0xef/0x260 > Feb 1 11:48:13 ucs22f kernel: [307999.162343] [] ? getname_kernel+0x2d/0x100 > Feb 1 11:48:13 ucs22f kernel: [307999.162348] [] ? local_apic_timer_interrupt+0x2f/0x60 > Feb 1 11:48:13 ucs22f kernel: [307999.162353] [] ? smp_apic_timer_interrupt+0x33/0x50 > Feb 1 11:48:13 ucs22f kernel: [307999.162355] [] ? apic_timer_interrupt+0x34/0x3c > Feb 1 11:48:13 ucs22f kernel: [307999.162358] [] ? _raw_spin_lock+0x51/0x70 > Feb 1 11:48:13 ucs22f kernel: [307999.162362] [] ? unix_state_double_lock+0x25/0x60 > Feb 1 11:48:13 ucs22f kernel: [307999.162365] [] ? unix_dgram_connect+0x90/0x1f0 > Feb 1 11:48:13 ucs22f kernel: [307999.162369] [] ? SYSC_connect+0xc7/0xe0 > Feb 1 11:48:13 ucs22f kernel: [307999.162371] [] ? sock_map_fd+0x41/0x60 > Feb 1 11:48:13 ucs22f kernel: [307999.162374] [] ? SYSC_socketcall+0x1b4/0xa20 > Feb 1 11:48:13 ucs22f kernel: [307999.162376] [] ? ktime_get+0x50/0x100 > Feb 1 11:48:13 ucs22f kernel: [307999.162379] [] ? lapic_next_event+0x1b/0x20 > Feb 1 11:48:13 ucs22f kernel: [307999.162381] [] ? clockevents_program_event+0x9d/0x140 > Feb 1 11:48:13 ucs22f kernel: [307999.162385] [] ? list_del+0x8/0x20 > Feb 1 11:48:13 ucs22f kernel: [307999.162388] [] ? remove_wait_queue+0x27/0x40 > Feb 1 11:48:13 ucs22f kernel: [307999.162392] [] ? inotify_read+0x295/0x340 > Feb 1 11:48:13 ucs22f kernel: [307999.162396] [] ? handle_irq_event_percpu+0xa6/0x1a0 > Feb 1 11:48:13 ucs22f kernel: [307999.162399] [] ? set_close_on_exec+0x2f/0x60 > Feb 1 11:48:13 ucs22f kernel: [307999.162402] [] ? do_fcntl+0x2f4/0x4e0 > Feb 1 11:48:13 ucs22f kernel: [307999.162405] [] ? commit_creds+0xff/0x1f0 > Feb 1 11:48:13 ucs22f kernel: [307999.162407] [] ? SyS_fcntl64+0x60/0x100 > Feb 1 11:48:13 ucs22f kernel: [307999.162409] [] ? SyS_socketcall+0x13/0x20 > Feb 1 11:48:13 ucs22f kernel: [307999.162412] [] ? sysenter_do_call+0x12/0x12 We have not yet been able to reproduce the hang, but going back to our previous kernel 4.1.12 makes the problem go away. Is this a known issue or do you have an idea where to look? What information should I collect next time it happens? (Can unix_diag.ko with `ss` help?) What other kernel configs should I enable do debug this dead-lock? Thanks in advance Philipp