From: Eric Dumazet <eric.dumazet@gmail.com>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Miller <davem@davemloft.net>, Thomas Graf <tgraf@suug.ch>,
	netdev <netdev@vger.kernel.org>
Subject: Re: netlink & rhashtable status
Date: Wed, 13 May 2015 21:13:38 -0700
Message-ID: <1431576818.27831.36.camel@edumazet-glaptop2.roam.corp.google.com>
In-Reply-To: <1431575890.27831.34.camel@edumazet-glaptop2.roam.corp.google.com>

On Wed, 2015-05-13 at 20:58 -0700, Eric Dumazet wrote:
> On Thu, 2015-05-14 at 11:34 +0800, Herbert Xu wrote:
> > On Wed, May 13, 2015 at 08:17:43PM -0700, Eric Dumazet wrote:
> > >
> > > The initial bug report was on 3.18 for sure.
> > > 
> > > (The tester had to leave the program running for ~8 hours to hit the
> > > problem, on an 8 vCPU VM.)
> > > 
> > > I can reproduce the bug quite easily (in a few seconds) on 4.0.3. I did
> > > not spend much time trying 3.18, but it seems a bit harder there.
> > 
> > No, what I'm asking is: on 3.18, was it permanent? I can imagine
> > there being a lookup bug in 3.18 that triggers during a rehash,
> > but I cannot find any permanent corruption issues.
> 
> Let me try to reproduce this on 3.18.13. I'll give you an update.

OK, I reproduced a hang on 3.18.13 after a few minutes.

Out of my 200 processes, one is stuck in the recvmsg() system
call:

lpaa23:~# ps aux|grep addrinfo
root     33416  0.0  0.0   3692   376 pts/0    S+   21:09   0:00 /bin/bash ./getaddrinfo_many.sh
root     33417  0.0  0.0   3692   376 pts/0    S+   21:09   0:00 /bin/bash ./getaddrinfo_many.sh
root     33418  0.0  0.0   3744  2108 pts/0    S+   21:09   0:00 /bin/bash ./getaddrinfo_many.sh
root     33428  0.0  0.0   3696  1752 pts/0    S+   21:09   0:00 /bin/bash ./getaddrinfo_many.sh
root     33431  0.0  0.0   1172     4 pts/0    S+   21:09   0:00 ./getaddrinfo 500
root     34102  0.0  0.0   2600  1312 pts/1    S+   21:11   0:00 grep addrinfo
root     40236  0.0  0.0   3692  2920 pts/0    S+   21:09   0:00 /bin/bash ./getaddrinfo_many.sh
lpaa23:~# strace -p 33431
Process 33431 attached
recvmsg(3, ^CProcess 33431 detached
 <detached ...>

lpaa23:~# lsof -p 33431
COMMAND     PID USER   FD      TYPE DEVICE SIZE/OFF     NODE NAME
getaddrin 33431 root  cwd       DIR    8,1    12288    16394 /root
getaddrin 33431 root  rtd       DIR    8,1     4096        2 /
getaddrin 33431 root  txt       REG    8,1   978477       87 /root/getaddrinfo
getaddrin 33431 root    0r      CHR    1,3      0t0     2521 /dev/null
getaddrin 33431 root    1w      REG    8,1        0     6919 /root/5.out
getaddrin 33431 root    2w      REG    8,1        0     6919 /root/5.out
getaddrin 33431 root    3u  netlink             0t0 57052903 ROUTE

lpaa23:~# cat /proc/net/netlink 
sk       Eth Pid    Groups   Rmem     Wmem     Dump     Locks     Drops     Inode
ffff881f6d8b8000 0   33431  00000000 0        0        0 2        0        57052903
ffff881fe1d98400 0   0      00000000 0        0        0 2        0        3       
ffff881f6d8b8000 0   33431  00000000 0        0        0 2        0        57052903
ffff881fe1066400 8   0      00000000 0        0        0 2        0        13355   
ffff881fe1066400 8   0      00000000 0        0        0 2        0        13355   
ffff883fe1204800 9   0      00000000 0        0        0 2        0        2056    
ffff883fe1204800 9   0      00000000 0        0        0 2        0        2056    
ffff883feecf6400 10  0      00000000 0        0        0 2        0        9602    
ffff883fe1208000 11  0      00000000 0        0        0 2        0        2051    
ffff883fe1208000 11  0      00000000 0        0        0 2        0        2051    
ffff881fe0f4ac00 16  0      00000000 0        0        0 2        0        2054    
ffff881fe0f4ac00 16  0      00000000 0        0        0 2        0        2054    

So it looks like we lost an skb or something...
