From: Eric Dumazet <eric.dumazet@gmail.com>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Miller <davem@davemloft.net>, Thomas Graf <tgraf@suug.ch>,
netdev <netdev@vger.kernel.org>
Subject: Re: netlink & rhashtable status
Date: Wed, 13 May 2015 21:13:38 -0700
Message-ID: <1431576818.27831.36.camel@edumazet-glaptop2.roam.corp.google.com>
In-Reply-To: <1431575890.27831.34.camel@edumazet-glaptop2.roam.corp.google.com>
On Wed, 2015-05-13 at 20:58 -0700, Eric Dumazet wrote:
> On Thu, 2015-05-14 at 11:34 +0800, Herbert Xu wrote:
> > On Wed, May 13, 2015 at 08:17:43PM -0700, Eric Dumazet wrote:
> > >
> > > The initial bug report was on 3.18 for sure.
> > >
> > > (The tester had to leave the program running for ~8 hours to hit the
> > > problem, on an 8 vCPU VM)
> > >
> > > I can reproduce the bug quite easily (in a few seconds) on 4.0.3. I did
> > > not spend a lot of time trying 3.18, but it seems a bit harder.
> >
> > No, what I'm asking is: on 3.18, was it permanent? I can imagine
> > there being a lookup bug in 3.18 that triggers during a rehash
> > but I cannot find any permanent corruption issues.
>
> Let me try to reproduce this on 3.18.13. I'll give you an update.

OK, I reproduced a hang after a few minutes.

Out of my 200 processes, one is stuck in the recvmsg() system call:

lpaa23:~# ps aux|grep addrinfo
root     33416  0.0  0.0   3692   376 pts/0    S+   21:09   0:00 /bin/bash ./getaddrinfo_many.sh
root     33417  0.0  0.0   3692   376 pts/0    S+   21:09   0:00 /bin/bash ./getaddrinfo_many.sh
root     33418  0.0  0.0   3744  2108 pts/0    S+   21:09   0:00 /bin/bash ./getaddrinfo_many.sh
root     33428  0.0  0.0   3696  1752 pts/0    S+   21:09   0:00 /bin/bash ./getaddrinfo_many.sh
root     33431  0.0  0.0   1172     4 pts/0    S+   21:09   0:00 ./getaddrinfo 500
root     34102  0.0  0.0   2600  1312 pts/1    S+   21:11   0:00 grep addrinfo
root     40236  0.0  0.0   3692  2920 pts/0    S+   21:09   0:00 /bin/bash ./getaddrinfo_many.sh

lpaa23:~# strace -p 33431
Process 33431 attached
recvmsg(3, ^CProcess 33431 detached
<detached ...>

lpaa23:~# lsof -p 33431
COMMAND   PID   USER FD  TYPE    DEVICE SIZE/OFF NODE     NAME
getaddrin 33431 root cwd DIR     8,1    12288    16394    /root
getaddrin 33431 root rtd DIR     8,1    4096     2        /
getaddrin 33431 root txt REG     8,1    978477   87       /root/getaddrinfo
getaddrin 33431 root 0r  CHR     1,3    0t0      2521     /dev/null
getaddrin 33431 root 1w  REG     8,1    0        6919     /root/5.out
getaddrin 33431 root 2w  REG     8,1    0        6919     /root/5.out
getaddrin 33431 root 3u  netlink        0t0      57052903 ROUTE

lpaa23:~# cat /proc/net/netlink
sk               Eth Pid   Groups   Rmem Wmem Dump Locks Drops Inode
ffff881f6d8b8000 0   33431 00000000 0    0    0    2     0     57052903
ffff881fe1d98400 0   0     00000000 0    0    0    2     0     3
ffff881f6d8b8000 0   33431 00000000 0    0    0    2     0     57052903
ffff881fe1066400 8   0     00000000 0    0    0    2     0     13355
ffff881fe1066400 8   0     00000000 0    0    0    2     0     13355
ffff883fe1204800 9   0     00000000 0    0    0    2     0     2056
ffff883fe1204800 9   0     00000000 0    0    0    2     0     2056
ffff883feecf6400 10  0     00000000 0    0    0    2     0     9602
ffff883fe1208000 11  0     00000000 0    0    0    2     0     2051
ffff883fe1208000 11  0     00000000 0    0    0    2     0     2051
ffff881fe0f4ac00 16  0     00000000 0    0    0    2     0     2054
ffff881fe0f4ac00 16  0     00000000 0    0    0    2     0     2054

So it looks like we lost an skb or something....
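
Several rows in the /proc/net/netlink listing above appear twice, including the one for the stuck process (pid 33431, inode 57052903). The message does not say whether that reflects the hash table itself or only the /proc walk, so the following is just a minimal sketch for spotting such repeats in a dump. The function name `duplicated_sockets` and the `SAMPLE` constant are hypothetical helpers introduced for illustration; the rows are copied from the listing above.

```python
from collections import Counter

# Rows copied from the /proc/net/netlink listing above (header included).
SAMPLE = """\
sk Eth Pid Groups Rmem Wmem Dump Locks Drops Inode
ffff881f6d8b8000 0 33431 00000000 0 0 0 2 0 57052903
ffff881fe1d98400 0 0 00000000 0 0 0 2 0 3
ffff881f6d8b8000 0 33431 00000000 0 0 0 2 0 57052903
ffff881fe1066400 8 0 00000000 0 0 0 2 0 13355
ffff881fe1066400 8 0 00000000 0 0 0 2 0 13355
ffff883fe1204800 9 0 00000000 0 0 0 2 0 2056
ffff883fe1204800 9 0 00000000 0 0 0 2 0 2056
ffff883feecf6400 10 0 00000000 0 0 0 2 0 9602
ffff883fe1208000 11 0 00000000 0 0 0 2 0 2051
ffff883fe1208000 11 0 00000000 0 0 0 2 0 2051
ffff881fe0f4ac00 16 0 00000000 0 0 0 2 0 2054
ffff881fe0f4ac00 16 0 00000000 0 0 0 2 0 2054
"""


def duplicated_sockets(text):
    """Return {(sk, inode): count} for sockets listed more than once."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a 10-column data row.
        if len(fields) != 10 or fields[0] == "sk":
            continue
        counts[(fields[0], fields[-1])] += 1
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    for (sk, inode), n in duplicated_sockets(SAMPLE).items():
        print(f"{sk} inode={inode} listed {n} times")
```

To check a live system instead of the pasted sample, the same function can be fed the contents of /proc/net/netlink, assuming the column layout matches the one shown here.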