From: Philipp Hahn <pmhahn@pmhahn.de>
To: Sasha Levin <sasha.levin@oracle.com>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>,
	Rainer Weikusat <rweikusat@mobileactivedefense.com>,
	Andrey Vagin <avagin@openvz.org>,
	Aaron Conole <aconole@bytheb.org>,
	"David S. Miller" <davem@davemloft.net>,
	linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Bug 4.1.16: self-detected stall in net/unix/?
Date: Tue, 2 Feb 2016 17:25:46 +0100
Message-ID: <56B0D88A.1020609@pmhahn.de>

Hi,

we recently updated our kernel to 4.1.16 plus the patch "unix: properly
account for FDs passed over unix sockets" and have since then been seeing
RCU self-detected stalls triggered by the Samba daemon on two hosts:

> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007] INFO: rcu_sched self-detected stall on CPU { 3}  (t=162780 jiffies g=47565 c=47564 q=1055670)
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007] Task dump for CPU 3:
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007] smbd            R  running task        0  5938      1 0x0000000c
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  0000000000000004 ffffffff81851340 ffffffff810d3c84 000000000000b9cd
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  ffff8801bfd97100 ffffffff81851340 ffffffff81851340 ffffffff818f6c60
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  ffffffff810d7659 0000000000000000 0000000000000000 00001e847fc2f700
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007] Call Trace:
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  <IRQ>  [<ffffffff810d3c84>] ? rcu_dump_cpu_stacks+0x84/0xc0
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  [<ffffffff810d7659>] ? rcu_check_callbacks+0x449/0x740
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  [<ffffffff810ec7c0>] ? tick_sched_do_timer+0x40/0x40
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  [<ffffffff810dcc54>] ? update_process_times+0x34/0x70
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  [<ffffffff810ec45c>] ? tick_sched_handle.isra.12+0x2c/0x70
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  [<ffffffff810ec809>] ? tick_sched_timer+0x49/0x80
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  [<ffffffff810dd57d>] ? __run_hrtimer+0x6d/0x1b0
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  [<ffffffff810ddd4d>] ? hrtimer_interrupt+0xed/0x210
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  [<ffffffff815a0ed9>] ? smp_apic_timer_interrupt+0x39/0x50
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  [<ffffffff8159ef7e>] ? apic_timer_interrupt+0x6e/0x80

> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  <EOI>  [<ffffffff8159de85>] ? _raw_spin_lock+0x35/0x50
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  [<ffffffff8153b343>] ? unix_dgram_connect+0x93/0x200
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  [<ffffffff8147f248>] ? SYSC_connect+0xe8/0x100
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  [<ffffffff8159e0f2>] ? system_call_fast_compare_end+0xc/0x6b


> Feb  1 11:48:13 ucs22f kernel: [307999.162254] INFO: rcu_sched self-detected stall on CPU { 0}  (t=5250 jiffies g=6586733 c=6586732 q=6757)
> Feb  1 11:48:13 ucs22f kernel: [307999.162264] Task dump for CPU 0:
> Feb  1 11:48:13 ucs22f kernel: [307999.162267] smbd            R running      0  4615   4609 0x00000008
> Feb  1 11:48:13 ucs22f kernel: [307999.162272]  00200082 f5863b90 c10b3fe9 c1682cc0 c1682cc0 c1682cc0 f79d2b00 f5863bdc
> Feb  1 11:48:13 ucs22f kernel: [307999.162276]  c10b722d c15bd400 00001482 0064816d 0064816c 00001a65 f79cd840 c108b166
> Feb  1 11:48:13 ucs22f kernel: [307999.162280]  00000001 f5863bb8 f5863bb8 00000000 c1682cc0 f6808cf0 00001a65 f6808cf0
> Feb  1 11:48:13 ucs22f kernel: [307999.162285] Call Trace:
> Feb  1 11:48:13 ucs22f kernel: [307999.162296]  [<c10b3fe9>] ? rcu_dump_cpu_stacks+0x79/0xc0
> Feb  1 11:48:13 ucs22f kernel: [307999.162300]  [<c10b722d>] ? rcu_check_callbacks+0x3cd/0x630
> Feb  1 11:48:13 ucs22f kernel: [307999.162304]  [<c108b166>] ? account_process_tick+0x66/0x160
> Feb  1 11:48:13 ucs22f kernel: [307999.162307]  [<c10bbe4f>] ? update_process_times+0x2f/0x60
> Feb  1 11:48:13 ucs22f kernel: [307999.162310]  [<c10cbf9d>] ? tick_sched_handle.isra.12+0x2d/0x60
> Feb  1 11:48:13 ucs22f kernel: [307999.162328]  [<c10cc210>] ? tick_sched_timer+0x40/0x80
> Feb  1 11:48:13 ucs22f kernel: [307999.162331]  [<c10bc6b0>] ? __remove_hrtimer+0x40/0xa0
> Feb  1 11:48:13 ucs22f kernel: [307999.162334]  [<c10bc97f>] ? __run_hrtimer+0x6f/0x190
> Feb  1 11:48:13 ucs22f kernel: [307999.162337]  [<c10cc1d0>] ? tick_sched_do_timer+0x30/0x30
> Feb  1 11:48:13 ucs22f kernel: [307999.162339]  [<c10bd15f>] ? hrtimer_interrupt+0xef/0x260
> Feb  1 11:48:13 ucs22f kernel: [307999.162343]  [<c119ae3d>] ? getname_kernel+0x2d/0x100
> Feb  1 11:48:13 ucs22f kernel: [307999.162348]  [<c1046f7f>] ? local_apic_timer_interrupt+0x2f/0x60
> Feb  1 11:48:13 ucs22f kernel: [307999.162353]  [<c14e4543>] ? smp_apic_timer_interrupt+0x33/0x50
> Feb  1 11:48:13 ucs22f kernel: [307999.162355]  [<c14e3c7c>] ? apic_timer_interrupt+0x34/0x3c

> Feb  1 11:48:13 ucs22f kernel: [307999.162358]  [<c14e2dc1>] ? _raw_spin_lock+0x51/0x70
> Feb  1 11:48:13 ucs22f kernel: [307999.162362]  [<c148c075>] ? unix_state_double_lock+0x25/0x60
> Feb  1 11:48:13 ucs22f kernel: [307999.162365]  [<c148de10>] ? unix_dgram_connect+0x90/0x1f0
> Feb  1 11:48:13 ucs22f kernel: [307999.162369]  [<c13e4267>] ? SYSC_connect+0xc7/0xe0
> Feb  1 11:48:13 ucs22f kernel: [307999.162371]  [<c13e2931>] ? sock_map_fd+0x41/0x60
> Feb  1 11:48:13 ucs22f kernel: [307999.162374]  [<c13e5014>] ? SYSC_socketcall+0x1b4/0xa20
> Feb  1 11:48:13 ucs22f kernel: [307999.162376]  [<c10c2940>] ? ktime_get+0x50/0x100
> Feb  1 11:48:13 ucs22f kernel: [307999.162379]  [<c10466db>] ? lapic_next_event+0x1b/0x20
> Feb  1 11:48:13 ucs22f kernel: [307999.162381]  [<c10ca0ed>] ? clockevents_program_event+0x9d/0x140
> Feb  1 11:48:13 ucs22f kernel: [307999.162385]  [<c129e068>] ? list_del+0x8/0x20
> Feb  1 11:48:13 ucs22f kernel: [307999.162388]  [<c1097ef7>] ? remove_wait_queue+0x27/0x40
> Feb  1 11:48:13 ucs22f kernel: [307999.162392]  [<c11c8795>] ? inotify_read+0x295/0x340
> Feb  1 11:48:13 ucs22f kernel: [307999.162396]  [<c10acc76>] ? handle_irq_event_percpu+0xa6/0x1a0
> Feb  1 11:48:13 ucs22f kernel: [307999.162399]  [<c11a786f>] ? set_close_on_exec+0x2f/0x60
> Feb  1 11:48:13 ucs22f kernel: [307999.162402]  [<c119d084>] ? do_fcntl+0x2f4/0x4e0
> Feb  1 11:48:13 ucs22f kernel: [307999.162405]  [<c107d6df>] ? commit_creds+0xff/0x1f0
> Feb  1 11:48:13 ucs22f kernel: [307999.162407]  [<c119d380>] ? SyS_fcntl64+0x60/0x100
> Feb  1 11:48:13 ucs22f kernel: [307999.162409]  [<c13e5953>] ? SyS_socketcall+0x13/0x20
> Feb  1 11:48:13 ucs22f kernel: [307999.162412]  [<c14e30db>] ? sysenter_do_call+0x12/0x12

We have not yet been able to reproduce the hang, but going back to our
previous kernel 4.1.12 makes the problem go away.

Is this a known issue or do you have an idea where to look?
What information should I collect next time it happens?

(Can unix_diag.ko with `ss` help?)
What other kernel config options should I enable to debug this deadlock?

Thanks in advance
Philipp
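
For readers following the traces: both stalls sit in connect() on an
AF_UNIX datagram socket (SYSC_connect -> unix_dgram_connect, spinning in
_raw_spin_lock). The sketch below only exercises that system call path
from user space. It is not a reproducer, since the hang has not been
reproduced, and it assumes nothing about what smbd actually does. The
function name and the socket path are hypothetical and exist only for
this illustration.

```python
import os
import socket
import tempfile


def exercise_dgram_connect(payload: bytes = b"ping") -> bytes:
    """Connect one AF_UNIX datagram socket to another and pass one datagram."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # Hypothetical socket path, used only for this illustration.
        path = os.path.join(tmpdir, "peer.sock")
        receiver = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        sender = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        try:
            receiver.bind(path)
            # connect() on a SOCK_DGRAM unix socket is the call that shows
            # up as SYSC_connect -> unix_dgram_connect in the traces above.
            sender.connect(path)
            sender.send(payload)
            return receiver.recv(len(payload))
        finally:
            sender.close()
            receiver.close()


if __name__ == "__main__":
    print(exercise_dgram_connect())
```

On an unaffected kernel the call returns immediately with the payload it
sent; running it in a loop from several processes is one possible way to
put load on the same code path while lock debugging is enabled.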
