All of lore.kernel.org
 help / color / mirror / Atom feed
* netlink & rhashtable status
@ 2015-05-13  5:30 Eric Dumazet
  2015-05-13  5:40 ` Herbert Xu
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Dumazet @ 2015-05-13  5:30 UTC (permalink / raw)
  To: David Miller, Thomas Graf, Herbert Xu; +Cc: netdev


linux-3.17, 3.18, 3.19 and 4.0.2 have issues with netlink/rhashtable,
making things like getaddrinfo() not working after a while :

socket(PF_NETLINK, SOCK_RAW, NETLINK_ROUTE) = 3
bind(3, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
getsockname(3, {sa_family=AF_NETLINK, pid=24658, groups=00000000}, [12]) = 0
sendto(3, "\24\0\0\0\26\0\1\3)\320RU\0\0\0\0\0\0\0\0", 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = -1 ECONNREFUSED (Connection refused)


lpaa23:~# tc qd
Cannot send dump request: Connection refused
lpaa23:~# ip ro ls
Cannot send dump request: Connection refused


Before the problem we have following visible sockets.
 
lpaa23:~# cat /proc/net/netlink 
sk       Eth Pid    Groups   Rmem     Wmem     Dump     Locks     Drops     Inode
ffff883ff23e7400 0   0      00000000 0        0        0 2        0        3       
ffff883fff0f5800 4   0      00000000 0        0        0 2        0        13314   
ffff881ff0f41800 8   0      00000000 0        0        0 2        0        11308   
ffff883ff0902400 9   0      00000000 0        0        0 2        0        10241   
ffff883ff167c800 10  0      00000000 0        0        0 2        0        9719    
ffff881ff13cfc00 11  0      00000000 0        0        0 2        0        2051    
ffff881fde863800 15  7460   00000001 0        0        0 2        0        509     
ffff883fff055000 15  4294963199 00000001 0        0        0 2        0        30754   
ffff881ff2204c00 15  0      00000000 0        0        0 2        0        5       
ffff881ff13cf400 16  0      00000000 0        0        0 2        0        2054    
ffff881ff0a0f000 20  0      00000000 0        0        0 2        0        2084    


When problem happens we have : (note the first socket disappeared)

lpaa23:~# cat /proc/net/netlink 
sk       Eth Pid    Groups   Rmem     Wmem     Dump     Locks     Drops     Inode
ffff883fff0f5800 4   0      00000000 0        0        0 2        0        13314   
ffff881ff0f41800 8   0      00000000 0        0        0 2        0        11308   
ffff883ff0902400 9   0      00000000 0        0        0 2        0        10241   
ffff883ff167c800 10  0      00000000 0        0        0 2        0        9719    
ffff881ff13cfc00 11  0      00000000 0        0        0 2        0        2051    
ffff881fde863800 15  7460   00000001 0        0        0 2        0        509     
ffff883fff055000 15  4294963199 00000001 0        0        0 2        0        30754   
ffff881ff2204c00 15  0      00000000 0        0        0 2        0        5       
ffff881ff13cf400 16  0      00000000 0        0        0 2        0        2054    
ffff881ff0a0f000 20  0      00000000 0        0        0 2        0        2084    


We probably need to find out what fixes are needed for stable kernels,
if not already done, sorry if I missed something.

Thanks

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink & rhashtable status
  2015-05-13  5:30 netlink & rhashtable status Eric Dumazet
@ 2015-05-13  5:40 ` Herbert Xu
  2015-05-13  6:15   ` Eric Dumazet
  0 siblings, 1 reply; 48+ messages in thread
From: Herbert Xu @ 2015-05-13  5:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Thomas Graf, netdev

On Tue, May 12, 2015 at 10:30:36PM -0700, Eric Dumazet wrote:
> 
> linux-3.17, 3.18, 3.19 and 4.0.2 have issues with netlink/rhashtable,
> making things like getaddrinfo() not working after a while :

Do you use namespaces?

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink & rhashtable status
  2015-05-13  5:40 ` Herbert Xu
@ 2015-05-13  6:15   ` Eric Dumazet
  2015-05-13  6:20     ` Herbert Xu
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Dumazet @ 2015-05-13  6:15 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, Thomas Graf, netdev

On Wed, 2015-05-13 at 13:40 +0800, Herbert Xu wrote:
> On Tue, May 12, 2015 at 10:30:36PM -0700, Eric Dumazet wrote:
> > 
> > linux-3.17, 3.18, 3.19 and 4.0.2 have issues with netlink/rhashtable,
> > making things like getaddrinfo() not working after a while :
> 
> Do you use namespaces?
> 
> Cheers,

No.

Trick is to start about 200 threads using getaddrinfo()

This was spotted using netperf, but you probably can use something like
this program.

Thanks.



#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

struct addrinfo *resolve_host(const char *hostname, const char *port,
                              int family)
{
  struct addrinfo hints;
  struct addrinfo *ai;
  int count;
  int error;

  memset(&hints, 0, sizeof(hints));
  hints.ai_family = family;
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_protocol = IPPROTO_TCP;
  hints.ai_flags = AI_CANONNAME | AI_ADDRCONFIG;
  count = 0;
  do {
    error = getaddrinfo(hostname, port, &hints, &ai);
    count += 1;
    if (error == EAI_AGAIN) {
      sleep(1);
    }
  } while ((error == EAI_AGAIN) && (count <= 5));

  if (error) {
    printf("error\n");
    return (NULL);
  }

  return (ai);
}

int main(int argc, char **argv)
{
  const char host[] = "127.0.0.1";
  const char *port = NULL;
  const int family = 0;
  unsigned long iterations, i;

  if (argc != 2) {
    printf("Expected one arg.\n");
    return -1;
  }
  iterations = atol(argv[1]);
  for (i = 0; i < iterations; ++i) {
    resolve_host(host, port, family);
  }
  return 0;
}

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink & rhashtable status
  2015-05-13  6:15   ` Eric Dumazet
@ 2015-05-13  6:20     ` Herbert Xu
  2015-05-13 13:04       ` Eric Dumazet
  0 siblings, 1 reply; 48+ messages in thread
From: Herbert Xu @ 2015-05-13  6:20 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Thomas Graf, netdev

On Tue, May 12, 2015 at 11:15:40PM -0700, Eric Dumazet wrote:
> 
> Trick is to start about 200 threads using getaddrinfo()

When it loses the kernel socket, is it permanent or intermittent?

I'm trying to figure out whether it's the hashtable reader missing
an entry that's there or whether the hashtable has been corrupted
and an entry is gone forever.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink & rhashtable status
  2015-05-13  6:20     ` Herbert Xu
@ 2015-05-13 13:04       ` Eric Dumazet
  2015-05-13 16:18         ` Eric Dumazet
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Dumazet @ 2015-05-13 13:04 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, Thomas Graf, netdev

On Wed, 2015-05-13 at 14:20 +0800, Herbert Xu wrote:
> On Tue, May 12, 2015 at 11:15:40PM -0700, Eric Dumazet wrote:
> > 
> > Trick is to start about 200 threads using getaddrinfo()
> 
> When it loses the kernel socket, is it permanent or intermittent?
> 
> I'm trying to figure out whether it's the hashtable reader missing
> an entry that's there or whether the hashtable has been corrupted
> and an entry is gone forever.
> 
> Cheers,

This is permanent. We have to reboot the host.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink & rhashtable status
  2015-05-13 13:04       ` Eric Dumazet
@ 2015-05-13 16:18         ` Eric Dumazet
  2015-05-13 16:35           ` David Miller
  2015-05-14  2:53           ` Herbert Xu
  0 siblings, 2 replies; 48+ messages in thread
From: Eric Dumazet @ 2015-05-13 16:18 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, Thomas Graf, netdev

On Wed, 2015-05-13 at 06:04 -0700, Eric Dumazet wrote:
> On Wed, 2015-05-13 at 14:20 +0800, Herbert Xu wrote:
> > On Tue, May 12, 2015 at 11:15:40PM -0700, Eric Dumazet wrote:
> > > 
> > > Trick is to start about 200 threads using getaddrinfo()
> > 
> > When it loses the kernel socket, is it permanent or intermittent?
> > 
> > I'm trying to figure out whether it's the hashtable reader missing
> > an entry that's there or whether the hashtable has been corrupted
> > and an entry is gone forever.
> > 
> > Cheers,
> 
> This is permanent. We have to reboot the host.
> 

For 4.0.3 I replaced the two rhashtable files by current Linus version,
and problem is gone, so the fix is not in net/netlink

 include/linux/rhashtable.h |   10 
 lib/rhashtable.c           |  582 ++++++++++++-----------------------
 2 files changed, 215 insertions(+), 377 deletions(-)

I did a bisection but ended to 

393619474ec0 rhashtable: Fix read-side crash during rehash

And simply backporting it does not solve the problem

$ git log --no-merges --oneline v4.0.3..master include/linux/rhashtable.h lib/rhashtable.c
1d8dc3d3c8f1 rhashtable: don't attempt to grow when at max_size
a87b9ebf1709 rhashtable: Do not schedule more than one rehash if we can't grow further
e2307ed6cbe7 rhashtable: Schedule async resize when sync realloc fails
49f7b33e63fe rhashtable: provide len to obj_hashfn
6b6f302ceda7 rhashtable: Add rhashtable_free_and_destroy()
b5e2c150ac91 rhashtable: Disable automatic shrinking by default
ac833bddb591 rhashtable: Mark internal/private inline functions as such
299e5c32a37a rhashtable: Use 'unsigned int' consistently
58be8a583d8d rhashtable: Extend RCU read lock into rhashtable_insert_rehash()
27ed44a5d6d8 rhashtable: Add comment on choice of elasticity value
ba7c95ea3870 rhashtable: Fix sleeping inside RCU critical section in walk_stop
ccd57b1bd324 rhashtable: Add immediate rehash during insertion
b9ecfdaa1090 rhashtable: Allow GFP_ATOMIC bucket table allocation
b824478b2145 rhashtable: Add multiple rehash support
18093d1c0d1e rhashtable: Shrink to fit
31ccde2dacea rhashtable: Allow hashfn to be unset
de91b25c8011 rhashtable: Eliminate unnecessary branch in rht_key_hashfn
d88252f9bb74 rhashtable: Add barrier to ensure we see new tables in walker
6626af692692 rhashtable: Fix undeclared EEXIST build error on ia64
dc0ee268d850 rhashtable: Rip out obsolete out-of-line interface
02fd97c3d4a8 rhashtable: Allow hash/comparison functions to be inlined
488fb86ee91d rhashtable: Make rhashtable_init params argument const
a998f712f77e rhashtable: Round up/down min/max_size to ensure we respect limit
e2e21c1c5808 rhashtable: Remove max_shift and min_shift
c2e213cff701 rhashtable: Introduce max_size/min_size
6aebd940840a rhashtable: Remove shift from bucket_table
617011e7d555 rhashtable: Avoid calculating hash again to unlock
db4374f48a6c rhashtable: Annotate RCU locking of walkers
565e86404e4c rhashtable: Fix rhashtable_remove failures
963ecbd41a10 rhashtable: Fix use-after-free in rhashtable_walk_stop
c4db8848af6a rhashtable: Move future_tbl into struct bucket_table
63d512d0cffc rhashtable: Add rehash counter to bucket_table
9d901bc05153 rhashtable: Free bucket tables asynchronously after rehash
5269b53da4d4 rhashtable: Move seed init into bucket_table_alloc
8f2484bdb55d rhashtable: Use SINGLE_DEPTH_NESTING
eddee5ba34eb rhashtable: Fix walker behaviour during rehash
393619474ec0 rhashtable: Fix read-side crash during rehash
a5b6846f9e1a rhashtable: kill ht->shift atomic operations
9497df88ab55 rhashtable: Fix reader/rehash race
ec9f71c59e00 rhashtable: Remove obj_raw_hashfn
cffaa9cb9224 rhashtable: Remove key length argument to key_hashfn
eca849333017 rhashtable: Use head_hashfn instead of obj_raw_hashfn
8d2b18793d16 rhashtable: Move masking back into key_hashfn
84ed82b74dcb rhashtable: Add annotation to nested lock
aa34a6cb0478 rhashtable: Add arbitrary rehash function
988dfbd795cf rhashtable: Move hash_rnd into bucket_table

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink & rhashtable status
  2015-05-13 16:18         ` Eric Dumazet
@ 2015-05-13 16:35           ` David Miller
  2015-05-14  2:55             ` Herbert Xu
  2015-05-14  2:53           ` Herbert Xu
  1 sibling, 1 reply; 48+ messages in thread
From: David Miller @ 2015-05-13 16:35 UTC (permalink / raw)
  To: eric.dumazet; +Cc: herbert, tgraf, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 13 May 2015 09:18:04 -0700

> On Wed, 2015-05-13 at 06:04 -0700, Eric Dumazet wrote:
>> On Wed, 2015-05-13 at 14:20 +0800, Herbert Xu wrote:
>> > On Tue, May 12, 2015 at 11:15:40PM -0700, Eric Dumazet wrote:
>> > > 
>> > > Trick is to start about 200 threads using getaddrinfo()
>> > 
>> > When it loses the kernel socket, is it permanent or intermittent?
>> > 
>> > I'm trying to figure out whether it's the hashtable reader missing
>> > an entry that's there or whether the hashtable has been corrupted
>> > and an entry is gone forever.
>> > 
>> > Cheers,
>> 
>> This is permanent. We have to reboot the host.
>> 
> 
> For 4.0.3 I replaced the two rhashtable files by current Linus version,
> and problem is gone, so the fix is not in net/netlink
> 
>  include/linux/rhashtable.h |   10 
>  lib/rhashtable.c           |  582 ++++++++++++-----------------------
>  2 files changed, 215 insertions(+), 377 deletions(-)
> 
> I did a bisection but ended to 
> 
> 393619474ec0 rhashtable: Fix read-side crash during rehash
> 
> And simply backporting it does not solve the problem

Backporting all of the rhashtable bits is going to be really painful
and potentially quite risky.  However, if someone is confident enough,
I'm willing to entertain this idea.

Alternatively, we could consider reverting the rhashtable conversion
of netlink in the interim.  It might be the safest solution for
-stable.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink & rhashtable status
  2015-05-13 16:18         ` Eric Dumazet
  2015-05-13 16:35           ` David Miller
@ 2015-05-14  2:53           ` Herbert Xu
  2015-05-14  3:17             ` Eric Dumazet
  1 sibling, 1 reply; 48+ messages in thread
From: Herbert Xu @ 2015-05-14  2:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Thomas Graf, netdev

On Wed, May 13, 2015 at 09:18:04AM -0700, Eric Dumazet wrote:
>
> For 4.0.3 I replaced the two rhashtable files by current Linus version,
> and problem is gone, so the fix is not in net/netlink

Are you sure the same bug happens in 3.17 too? I can certainly see
this happening after the dynamic rehashing code went in.

But in 3.17 it was doing everything under one single lock so I can't
really see how things can go wrong.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink & rhashtable status
  2015-05-13 16:35           ` David Miller
@ 2015-05-14  2:55             ` Herbert Xu
  0 siblings, 0 replies; 48+ messages in thread
From: Herbert Xu @ 2015-05-14  2:55 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, tgraf, netdev

On Wed, May 13, 2015 at 12:35:20PM -0400, David Miller wrote:
>
> Alternatively, we could consider reverting the rhashtable conversion
> of netlink in the interim.  It might be the safest solution for
> -stable.

The most suspicous change is the dynamic rehashing, i.e., allowing
insertions/removals during a rehash.  But then Eric says that the
same thing happens under 3.17, where every opertion is under a
single mutex.

So I'm at a loss as to what is wrong and if we can't figure it out
then yes I agree that reverting the netlink rhashtable conversion is
probably the safest option.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink & rhashtable status
  2015-05-14  2:53           ` Herbert Xu
@ 2015-05-14  3:17             ` Eric Dumazet
  2015-05-14  3:34               ` Herbert Xu
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Dumazet @ 2015-05-14  3:17 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, Thomas Graf, netdev

On Thu, 2015-05-14 at 10:53 +0800, Herbert Xu wrote:
> On Wed, May 13, 2015 at 09:18:04AM -0700, Eric Dumazet wrote:
> >
> > For 4.0.3 I replaced the two rhashtable files by current Linus version,
> > and problem is gone, so the fix is not in net/netlink
> 
> Are you sure the same bug happens in 3.17 too? I can certainly see
> this happening after the dynamic rehashing code went in.
> 
> But in 3.17 it was doing everything under one single lock so I can't
> really see how things can go wrong.

The initial bug report was on 3.18 for sure.

(Tester had to leave the program run ~8 hours to get the problem, on a 8
vCPU VM)

I can reproduce the bug quite easily (in a few seconds) on 4.0.3, I did
not spent lot of time trying 3.18, but it seems a bit harder.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink & rhashtable status
  2015-05-14  3:17             ` Eric Dumazet
@ 2015-05-14  3:34               ` Herbert Xu
  2015-05-14  3:58                 ` Eric Dumazet
  0 siblings, 1 reply; 48+ messages in thread
From: Herbert Xu @ 2015-05-14  3:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Thomas Graf, netdev

On Wed, May 13, 2015 at 08:17:43PM -0700, Eric Dumazet wrote:
>
> The initial bug report was on 3.18 for sure.
> 
> (Tester had to leave the program run ~8 hours to get the problem, on a 8
> vCPU VM)
> 
> I can reproduce the bug quite easily (in a few seconds) on 4.0.3, I did
> not spent lot of time trying 3.18, but it seems a bit harder.

No what I'm asking is on 3.18 was it permanent? I can imagine
there being a lookup bug in 3.18 that triggers during a rehash
but I cannot find any permanent corruption issues.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink & rhashtable status
  2015-05-14  3:34               ` Herbert Xu
@ 2015-05-14  3:58                 ` Eric Dumazet
  2015-05-14  4:13                   ` Eric Dumazet
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Dumazet @ 2015-05-14  3:58 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, Thomas Graf, netdev

On Thu, 2015-05-14 at 11:34 +0800, Herbert Xu wrote:
> On Wed, May 13, 2015 at 08:17:43PM -0700, Eric Dumazet wrote:
> >
> > The initial bug report was on 3.18 for sure.
> > 
> > (Tester had to leave the program run ~8 hours to get the problem, on a 8
> > vCPU VM)
> > 
> > I can reproduce the bug quite easily (in a few seconds) on 4.0.3, I did
> > not spent lot of time trying 3.18, but it seems a bit harder.
> 
> No what I'm asking is on 3.18 was it permanent? I can imagine
> there being a lookup bug in 3.18 that triggers during a rehash
> but I cannot find any permanent corruption issues.

Let me try to reproduce this on 3.18.13. I'll give you an update.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink & rhashtable status
  2015-05-14  3:58                 ` Eric Dumazet
@ 2015-05-14  4:13                   ` Eric Dumazet
  2015-05-14  4:16                     ` Herbert Xu
  2015-05-14  4:17                     ` Eric Dumazet
  0 siblings, 2 replies; 48+ messages in thread
From: Eric Dumazet @ 2015-05-14  4:13 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, Thomas Graf, netdev

On Wed, 2015-05-13 at 20:58 -0700, Eric Dumazet wrote:
> On Thu, 2015-05-14 at 11:34 +0800, Herbert Xu wrote:
> > On Wed, May 13, 2015 at 08:17:43PM -0700, Eric Dumazet wrote:
> > >
> > > The initial bug report was on 3.18 for sure.
> > > 
> > > (Tester had to leave the program run ~8 hours to get the problem, on a 8
> > > vCPU VM)
> > > 
> > > I can reproduce the bug quite easily (in a few seconds) on 4.0.3, I did
> > > not spent lot of time trying 3.18, but it seems a bit harder.
> > 
> > No what I'm asking is on 3.18 was it permanent? I can imagine
> > there being a lookup bug in 3.18 that triggers during a rehash
> > but I cannot find any permanent corruption issues.
> 
> Let me try to reproduce this on 3.18.13. I'll give you an update.

OK I reproduced a hang after few minutes :

Out of my 200 processes, one of them is stuck in the recvmsg() system
call :

lpaa23:~# ps aux|grep addrinfo
root     33416  0.0  0.0   3692   376 pts/0    S+   21:09   0:00 /bin/bash ./getaddrinfo_many.sh
root     33417  0.0  0.0   3692   376 pts/0    S+   21:09   0:00 /bin/bash ./getaddrinfo_many.sh
root     33418  0.0  0.0   3744  2108 pts/0    S+   21:09   0:00 /bin/bash ./getaddrinfo_many.sh
root     33428  0.0  0.0   3696  1752 pts/0    S+   21:09   0:00 /bin/bash ./getaddrinfo_many.sh
root     33431  0.0  0.0   1172     4 pts/0    S+   21:09   0:00 ./getaddrinfo 500
root     34102  0.0  0.0   2600  1312 pts/1    S+   21:11   0:00 grep addrinfo
root     40236  0.0  0.0   3692  2920 pts/0    S+   21:09   0:00 /bin/bash ./getaddrinfo_many.sh
lpaa23:~# strace -p 33431
Process 33431 attached
recvmsg(3, ^CProcess 33431 detached
 <detached ...>

lpaa23:~# lsof -p 33431
COMMAND     PID USER   FD      TYPE DEVICE SIZE/OFF     NODE NAME
getaddrin 33431 root  cwd       DIR    8,1    12288    16394 /root
getaddrin 33431 root  rtd       DIR    8,1     4096        2 /
getaddrin 33431 root  txt       REG    8,1   978477       87 /root/getaddrinfo
getaddrin 33431 root    0r      CHR    1,3      0t0     2521 /dev/null
getaddrin 33431 root    1w      REG    8,1        0     6919 /root/5.out
getaddrin 33431 root    2w      REG    8,1        0     6919 /root/5.out
getaddrin 33431 root    3u  netlink             0t0 57052903 ROUTE

lpaa23:~# cat /proc/net/netlink 
sk       Eth Pid    Groups   Rmem     Wmem     Dump     Locks     Drops     Inode
ffff881f6d8b8000 0   33431  00000000 0        0        0 2        0        57052903
ffff881fe1d98400 0   0      00000000 0        0        0 2        0        3       
ffff881f6d8b8000 0   33431  00000000 0        0        0 2        0        57052903
ffff881fe1066400 8   0      00000000 0        0        0 2        0        13355   
ffff881fe1066400 8   0      00000000 0        0        0 2        0        13355   
ffff883fe1204800 9   0      00000000 0        0        0 2        0        2056    
ffff883fe1204800 9   0      00000000 0        0        0 2        0        2056    
ffff883feecf6400 10  0      00000000 0        0        0 2        0        9602    
ffff883fe1208000 11  0      00000000 0        0        0 2        0        2051    
ffff883fe1208000 11  0      00000000 0        0        0 2        0        2051    
ffff881fe0f4ac00 16  0      00000000 0        0        0 2        0        2054    
ffff881fe0f4ac00 16  0      00000000 0        0        0 2        0        2054    

So it looks like we lost an skb or something....

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink & rhashtable status
  2015-05-14  4:13                   ` Eric Dumazet
@ 2015-05-14  4:16                     ` Herbert Xu
  2015-05-14  4:21                       ` Herbert Xu
  2015-05-14  4:17                     ` Eric Dumazet
  1 sibling, 1 reply; 48+ messages in thread
From: Herbert Xu @ 2015-05-14  4:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Thomas Graf, netdev

On Wed, May 13, 2015 at 09:13:38PM -0700, Eric Dumazet wrote:
>
> So it looks like we lost an skb or something....

OK that sounds reasonable.  So my plan is to disable dynamic
rehashing and then hunt down this lookup bug.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink & rhashtable status
  2015-05-14  4:13                   ` Eric Dumazet
  2015-05-14  4:16                     ` Herbert Xu
@ 2015-05-14  4:17                     ` Eric Dumazet
  1 sibling, 0 replies; 48+ messages in thread
From: Eric Dumazet @ 2015-05-14  4:17 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, Thomas Graf, netdev

On Wed, 2015-05-13 at 21:13 -0700, Eric Dumazet wrote:
> On Wed, 2015-05-13 at 20:58 -0700, Eric Dumazet wrote:
> > On Thu, 2015-05-14 at 11:34 +0800, Herbert Xu wrote:
> > > On Wed, May 13, 2015 at 08:17:43PM -0700, Eric Dumazet wrote:
> > > >
> > > > The initial bug report was on 3.18 for sure.
> > > > 
> > > > (Tester had to leave the program run ~8 hours to get the problem, on a 8
> > > > vCPU VM)
> > > > 
> > > > I can reproduce the bug quite easily (in a few seconds) on 4.0.3, I did
> > > > not spent lot of time trying 3.18, but it seems a bit harder.
> > > 
> > > No what I'm asking is on 3.18 was it permanent? I can imagine
> > > there being a lookup bug in 3.18 that triggers during a rehash
> > > but I cannot find any permanent corruption issues.
> > 
> > Let me try to reproduce this on 3.18.13. I'll give you an update.
> 
> OK I reproduced a hang after few minutes :
> 
> Out of my 200 processes, one of them is stuck in the recvmsg() system
> call :
> 
> lpaa23:~# ps aux|grep addrinfo
> root     33416  0.0  0.0   3692   376 pts/0    S+   21:09   0:00 /bin/bash ./getaddrinfo_many.sh
> root     33417  0.0  0.0   3692   376 pts/0    S+   21:09   0:00 /bin/bash ./getaddrinfo_many.sh
> root     33418  0.0  0.0   3744  2108 pts/0    S+   21:09   0:00 /bin/bash ./getaddrinfo_many.sh
> root     33428  0.0  0.0   3696  1752 pts/0    S+   21:09   0:00 /bin/bash ./getaddrinfo_many.sh
> root     33431  0.0  0.0   1172     4 pts/0    S+   21:09   0:00 ./getaddrinfo 500
> root     34102  0.0  0.0   2600  1312 pts/1    S+   21:11   0:00 grep addrinfo
> root     40236  0.0  0.0   3692  2920 pts/0    S+   21:09   0:00 /bin/bash ./getaddrinfo_many.sh
> lpaa23:~# strace -p 33431
> Process 33431 attached
> recvmsg(3, ^CProcess 33431 detached
>  <detached ...>
> 
> lpaa23:~# lsof -p 33431
> COMMAND     PID USER   FD      TYPE DEVICE SIZE/OFF     NODE NAME
> getaddrin 33431 root  cwd       DIR    8,1    12288    16394 /root
> getaddrin 33431 root  rtd       DIR    8,1     4096        2 /
> getaddrin 33431 root  txt       REG    8,1   978477       87 /root/getaddrinfo
> getaddrin 33431 root    0r      CHR    1,3      0t0     2521 /dev/null
> getaddrin 33431 root    1w      REG    8,1        0     6919 /root/5.out
> getaddrin 33431 root    2w      REG    8,1        0     6919 /root/5.out
> getaddrin 33431 root    3u  netlink             0t0 57052903 ROUTE
> 
> lpaa23:~# cat /proc/net/netlink 
> sk       Eth Pid    Groups   Rmem     Wmem     Dump     Locks     Drops     Inode
> ffff881f6d8b8000 0   33431  00000000 0        0        0 2        0        57052903
> ffff881fe1d98400 0   0      00000000 0        0        0 2        0        3       
> ffff881f6d8b8000 0   33431  00000000 0        0        0 2        0        57052903
> ffff881fe1066400 8   0      00000000 0        0        0 2        0        13355   
> ffff881fe1066400 8   0      00000000 0        0        0 2        0        13355   
> ffff883fe1204800 9   0      00000000 0        0        0 2        0        2056    
> ffff883fe1204800 9   0      00000000 0        0        0 2        0        2056    
> ffff883feecf6400 10  0      00000000 0        0        0 2        0        9602    
> ffff883fe1208000 11  0      00000000 0        0        0 2        0        2051    
> ffff883fe1208000 11  0      00000000 0        0        0 2        0        2051    
> ffff881fe0f4ac00 16  0      00000000 0        0        0 2        0        2054    
> ffff881fe0f4ac00 16  0      00000000 0        0        0 2        0        2054    
> 
> So it looks like we lost an skb or something....
> 

Ah, the user socket is listed twice in /proc/net/netlink !

It is permanent until I kill task :

lpaa23:~# grep ffff881f6d8b8000 /proc/net/netlink
ffff881f6d8b8000 0   33431  00000000 0        0        0 2        0        57052903
ffff881f6d8b8000 0   33431  00000000 0        0        0 2        0        57052903


After kill, I got another hang after 20 seconds.

And we can again see a socket twice in /proc/net/netlink


lpaa23:~# cat /proc/net/netlink 
sk       Eth Pid    Groups   Rmem     Wmem     Dump     Locks     Drops     Inode
ffff881fcac36c00 0   47169  00000000 0        0        0 2        0        59270869
ffff881fe1d98400 0   0      00000000 0        0        0 2        0        3       
ffff881fcac36c00 0   47169  00000000 0        0        0 2        0        59270869
ffff881fe1066400 8   0      00000000 0        0        0 2        0        13355   
ffff881fe1066400 8   0      00000000 0        0        0 2        0        13355   
ffff883fe1204800 9   0      00000000 0        0        0 2        0        2056    
ffff883fe1204800 9   0      00000000 0        0        0 2        0        2056    
ffff883feecf6400 10  0      00000000 0        0        0 2        0        9602    
ffff883fe1208000 11  0      00000000 0        0        0 2        0        2051    
ffff883fe1208000 11  0      00000000 0        0        0 2        0        2051    
ffff881fe0f4ac00 16  0      00000000 0        0        0 2        0        2054    
ffff881fe0f4ac00 16  0      00000000 0        0        0 2        0        2054    

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink & rhashtable status
  2015-05-14  4:16                     ` Herbert Xu
@ 2015-05-14  4:21                       ` Herbert Xu
  2015-05-14  4:38                         ` Eric Dumazet
                                           ` (3 more replies)
  0 siblings, 4 replies; 48+ messages in thread
From: Herbert Xu @ 2015-05-14  4:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Thomas Graf, netdev

On Thu, May 14, 2015 at 12:16:28PM +0800, Herbert Xu wrote:
> On Wed, May 13, 2015 at 09:13:38PM -0700, Eric Dumazet wrote:
> >
> > So it looks like we lost an skb or something....
> 
> OK that sounds reasonable.  So my plan is to disable dynamic
> rehashing and then hunt down this lookup bug.

Oh wait this isn't even a lookup failure since that should return
ECONNREFUSED.  Could it be that this hang is a separate bug that's
not related to rhashtable?

If that was the case then we simply need to get rid of dynamic
rehashing.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink & rhashtable status
  2015-05-14  4:21                       ` Herbert Xu
@ 2015-05-14  4:38                         ` Eric Dumazet
  2015-05-14  5:03                           ` Herbert Xu
  2015-05-14  5:56                         ` Red Hat INTERNAL-ONLY kernel discussion list <rhkernel-list@redhat.com> Herbert Xu
                                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 48+ messages in thread
From: Eric Dumazet @ 2015-05-14  4:38 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, Thomas Graf, netdev

On Thu, 2015-05-14 at 12:21 +0800, Herbert Xu wrote:
> On Thu, May 14, 2015 at 12:16:28PM +0800, Herbert Xu wrote:
> > On Wed, May 13, 2015 at 09:13:38PM -0700, Eric Dumazet wrote:
> > >
> > > So it looks like we lost an skb or something....
> > 
> > OK that sounds reasonable.  So my plan is to disable dynamic
> > rehashing and then hunt down this lookup bug.
> 
> Oh wait this isn't even a lookup failure since that should return
> ECONNREFUSED.  Could it be that this hang is a separate bug that's
> not related to rhashtable?
> 
> If that was the case then we simply need to get rid of dynamic
> rehashing.

Well, /proc/net/netlink consistently show same socket twice when I get a
hang :

At this moment I have more than one process blocked :

lpaa23:~# ps aux|grep addrinfo
root     10597  0.0  0.0   3696   376 pts/0    S    21:20   0:00 /bin/bash ./getaddrinfo_many.sh
root     10601  0.0  0.0   1172     4 pts/0    S    21:20   0:00 ./getaddrinfo 500
root     11449  0.0  0.0   3700   384 pts/0    S    21:17   0:00 /bin/bash ./getaddrinfo_many.sh
root     11454  0.0  0.0   1172     4 pts/0    S    21:17   0:00 ./getaddrinfo 500
root     21424  0.0  0.0   3696   376 pts/0    S+   21:30   0:00 /bin/bash ./getaddrinfo_many.sh
root     21425  0.0  0.0   3696   376 pts/0    S+   21:30   0:00 /bin/bash ./getaddrinfo_many.sh
root     21426  0.0  0.0   3744  2236 pts/0    S+   21:30   0:00 /bin/bash ./getaddrinfo_many.sh
root     21470  0.0  0.0   3704   384 pts/0    S+   21:30   0:00 /bin/bash ./getaddrinfo_many.sh
root     21476  0.0  0.0   1172     4 pts/0    S+   21:30   0:00 ./getaddrinfo 500
root     22241  0.0  0.0   2604  1280 pts/1    S+   21:36   0:00 grep addrinfo
root     37231  0.0  0.0   3696   376 pts/0    S    21:19   0:00 /bin/bash ./getaddrinfo_many.sh
root     37235  0.0  0.0   1172     4 pts/0    S    21:19   0:00 ./getaddrinfo 500
root     48499  0.0  0.0   3696  2804 pts/0    S+   21:28   0:00 /bin/bash ./getaddrinfo_many.sh

And only one of the socket is listed twice (ffff881f6eceb000)

Apparently this is the one _after_ kernel socket.

Does it ring a bell ?

lpaa23:~# cat /proc/net/netlink 
sk       Eth Pid    Groups   Rmem     Wmem     Dump     Locks     Drops     Inode
ffff881f6eceb000 0   11454  00000000 0        0        0 2        0        61386380
ffff881fe08aa400 0   10601  00000000 0        0        0 2        0        69235237
ffff881fd3c80c00 0   37235  00000000 0        0        0 2        0        65612209
ffff881fd5356400 0   21476  00000000 0        0        0 2        0        116743320
ffff881fe1d98400 0   0      00000000 0        0        0 2        0        3       
ffff881f6eceb000 0   11454  00000000 0        0        0 2        0        61386380     << double >>
ffff881fe1066400 8   0      00000000 0        0        0 2        0        13355   
ffff881fe1066400 8   0      00000000 0        0        0 2        0        13355   
ffff883fe1204800 9   0      00000000 0        0        0 2        0        2056    
ffff883fe1204800 9   0      00000000 0        0        0 2        0        2056    
ffff883feecf6400 10  0      00000000 0        0        0 2        0        9602    
ffff883fe1208000 11  0      00000000 0        0        0 2        0        2051    
ffff883fe1208000 11  0      00000000 0        0        0 2        0        2051    
ffff881fe0f4ac00 16  0      00000000 0        0        0 2        0        2054    
ffff881fe0f4ac00 16  0      00000000 0        0        0 2        0        2054    

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink & rhashtable status
  2015-05-14  4:38                         ` Eric Dumazet
@ 2015-05-14  5:03                           ` Herbert Xu
  0 siblings, 0 replies; 48+ messages in thread
From: Herbert Xu @ 2015-05-14  5:03 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Thomas Graf, netdev

On Wed, May 13, 2015 at 09:38:41PM -0700, Eric Dumazet wrote:
>
> And only one of the socket is listed twice (ffff881f6eceb000)
> 
> Apparently this is the one _after_ kernel socket.
> 
> Does it ring a bell ?

No that's just a walker bug.  In netlink_seq_next when we come
around the loop after ++i, we fail to refresh the ht pointer so
we walk the previous table again.

This bug was introduced in 3.18:

commit 78fd1d0ab072d4d9b5f0b7c14a1516665170b565
Author: Thomas Graf <tgraf@suug.ch>
Date:   Tue Oct 21 22:05:38 2014 +0200

    netlink: Re-add locking to netlink_lookup() and seq walker

This should have zero impact on functionality though.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Red Hat INTERNAL-ONLY kernel discussion list <rhkernel-list@redhat.com>
  2015-05-14  4:21                       ` Herbert Xu
  2015-05-14  4:38                         ` Eric Dumazet
@ 2015-05-14  5:56                         ` Herbert Xu
  2015-05-14  5:58                         ` netlink: Disable insertions/removals during rehash Herbert Xu
  2015-06-26 10:44                         ` netlink & rhashtable status Konstantin Khlebnikov
  3 siblings, 0 replies; 48+ messages in thread
From: Herbert Xu @ 2015-05-14  5:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Thomas Graf, netdev, Ying Xue

On Thu, May 14, 2015 at 12:21:51PM +0800, Herbert Xu wrote:
> 
> If that was the case then we simply need to get rid of dynamic
> rehashing.

Here is a totally untested patch against 4.0.

---8<---
The current rhashtable rehash code is buggy and can't deal with
parallel insertions/removals without corrupting the hash table.

This patch disables it by partially reverting
c5adde9468b0714a051eac7f9666f23eb10b61f7 ("netlink: eliminate
nl_sk_hash_lock").

This patch also removes a bogus socket lock introduced by that
very same patch.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 05919bf..322fe03 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1052,7 +1052,7 @@ static int netlink_insert(struct sock *sk, u32 portid)
 	struct netlink_table *table = &nl_table[sk->sk_protocol];
 	int err;
 
-	lock_sock(sk);
+	mutex_lock(&table->hash.mutex);
 
 	err = -EBUSY;
 	if (nlk_sk(sk)->portid)
@@ -1073,7 +1073,7 @@ static int netlink_insert(struct sock *sk, u32 portid)
 	}
 
 err:
-	release_sock(sk);
+	mutex_unlock(&table->hash.mutex);
 	return err;
 }
 
@@ -1082,10 +1082,12 @@ static void netlink_remove(struct sock *sk)
 	struct netlink_table *table;
 
 	table = &nl_table[sk->sk_protocol];
+	mutex_lock(&table->hash.mutex);
 	if (rhashtable_remove(&table->hash, &nlk_sk(sk)->node)) {
 		WARN_ON(atomic_read(&sk->sk_refcnt) == 1);
 		__sock_put(sk);
 	}
+	mutex_unlock(&table->hash.mutex);
 
 	netlink_table_grab();
 	if (nlk_sk(sk)->subscriptions) {

-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* netlink: Disable insertions/removals during rehash
  2015-05-14  4:21                       ` Herbert Xu
  2015-05-14  4:38                         ` Eric Dumazet
  2015-05-14  5:56                         ` Red Hat INTERNAL-ONLY kernel discussion list <rhkernel-list@redhat.com> Herbert Xu
@ 2015-05-14  5:58                         ` Herbert Xu
  2015-05-14  6:02                           ` netlink: Kill bogus lock_sock in netlink_insert Herbert Xu
                                             ` (2 more replies)
  2015-06-26 10:44                         ` netlink & rhashtable status Konstantin Khlebnikov
  3 siblings, 3 replies; 48+ messages in thread
From: Herbert Xu @ 2015-05-14  5:58 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Thomas Graf, netdev

Resend with fixed subject.

On Thu, May 14, 2015 at 12:21:51PM +0800, Herbert Xu wrote:
> 
> If that was the case then we simply need to get rid of dynamic
> rehashing.

Here is a totally untested patch against 4.0.

---8<---
The current rhashtable rehash code is buggy and can't deal with
parallel insertions/removals without corrupting the hash table.

This patch disables it by partially reverting
c5adde9468b0714a051eac7f9666f23eb10b61f7 ("netlink: eliminate
nl_sk_hash_lock").

This patch also removes a bogus socket lock introduced by that
very same patch.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 05919bf..322fe03 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1052,7 +1052,7 @@ static int netlink_insert(struct sock *sk, u32 portid)
 	struct netlink_table *table = &nl_table[sk->sk_protocol];
 	int err;
 
-	lock_sock(sk);
+	mutex_lock(&table->hash.mutex);
 
 	err = -EBUSY;
 	if (nlk_sk(sk)->portid)
@@ -1073,7 +1073,7 @@ static int netlink_insert(struct sock *sk, u32 portid)
 	}
 
 err:
-	release_sock(sk);
+	mutex_unlock(&table->hash.mutex);
 	return err;
 }
 
@@ -1082,10 +1082,12 @@ static void netlink_remove(struct sock *sk)
 	struct netlink_table *table;
 
 	table = &nl_table[sk->sk_protocol];
+	mutex_lock(&table->hash.mutex);
 	if (rhashtable_remove(&table->hash, &nlk_sk(sk)->node)) {
 		WARN_ON(atomic_read(&sk->sk_refcnt) == 1);
 		__sock_put(sk);
 	}
+	mutex_unlock(&table->hash.mutex);
 
 	netlink_table_grab();
 	if (nlk_sk(sk)->subscriptions) {

-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* netlink: Kill bogus lock_sock in netlink_insert
  2015-05-14  5:58                         ` netlink: Disable insertions/removals during rehash Herbert Xu
@ 2015-05-14  6:02                           ` Herbert Xu
  2015-05-15 16:49                             ` David Miller
  2015-05-15 17:02                             ` David Miller
  2015-05-14 14:37                           ` netlink: Disable insertions/removals during rehash Eric Dumazet
  2015-05-15 17:02                           ` David Miller
  2 siblings, 2 replies; 48+ messages in thread
From: Herbert Xu @ 2015-05-14  6:02 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Thomas Graf, netdev, Ying Xue

On Thu, May 14, 2015 at 01:58:24PM +0800, Herbert Xu wrote:
> 
> This patch also removes a bogus socket lock introduced by that
> very same patch.

This bogus socket lock is still there in the current tree.

---8<---
The commit c5adde9468b0714a051eac7f9666f23eb10b61f7 ("netlink:
eliminate nl_sk_hash_lock") added a lock_sock to netlink_insert
with no justifications whatsoever.

This patch kills it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index ec4adbd..9460561 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1063,8 +1063,6 @@ static int netlink_insert(struct sock *sk, u32 portid)
 	struct netlink_table *table = &nl_table[sk->sk_protocol];
 	int err;
 
-	lock_sock(sk);
-
 	err = -EBUSY;
 	if (nlk_sk(sk)->portid)
 		goto err;
@@ -1085,7 +1083,6 @@ static int netlink_insert(struct sock *sk, u32 portid)
 	}
 
 err:
-	release_sock(sk);
 	return err;
 }
 
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: netlink: Disable insertions/removals during rehash
  2015-05-14  5:58                         ` netlink: Disable insertions/removals during rehash Herbert Xu
  2015-05-14  6:02                           ` netlink: Kill bogus lock_sock in netlink_insert Herbert Xu
@ 2015-05-14 14:37                           ` Eric Dumazet
  2015-05-15  0:06                             ` Herbert Xu
  2015-05-15 17:02                           ` David Miller
  2 siblings, 1 reply; 48+ messages in thread
From: Eric Dumazet @ 2015-05-14 14:37 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, Thomas Graf, netdev

On Thu, 2015-05-14 at 13:58 +0800, Herbert Xu wrote:
> Resend with fixed subject.
> 
> On Thu, May 14, 2015 at 12:21:51PM +0800, Herbert Xu wrote:
> > 
> > If that was the case then we simply need to get rid of dynamic
> > rehashing.
> 
> Here is a totally untested patch against 4.0.

This solves the corruption thanks Herbert.

But wasn't rhashtable meant to be faster ? ;)

Before rhashtable (3.15) :

$ ./getaddrinfo_many.sh 
TPS: 143884.892086  Seconds Elapsed: 1  Current time: Thu May 14 07:36:09 PDT 2015
TPS: 144300.1443  Seconds Elapsed: 2  Current time: Thu May 14 07:36:10 PDT 2015
TPS: 143884.892086  Seconds Elapsed: 2  Current time: Thu May 14 07:36:10 PDT 2015
TPS: 143472.022956  Seconds Elapsed: 3  Current time: Thu May 14 07:36:11 PDT 2015
TPS: 144300.1443  Seconds Elapsed: 4  Current time: Thu May 14 07:36:12 PDT 2015
TPS: 144300.1443  Seconds Elapsed: 4  Current time: Thu May 14 07:36:12 PDT 2015
TPS: 144508.67052  Seconds Elapsed: 5  Current time: Thu May 14 07:36:13 PDT 2015
TPS: 144300.1443  Seconds Elapsed: 6  Current time: Thu May 14 07:36:14 PDT 2015
TPS: 143884.892086  Seconds Elapsed: 7  Current time: Thu May 14 07:36:15 PDT 2015
TPS: 143884.892086  Seconds Elapsed: 7  Current time: Thu May 14 07:36:15 PDT 2015
TPS: 143884.892086  Seconds Elapsed: 8  Current time: Thu May 14 07:36:16 PDT 2015
TPS: 144300.1443  Seconds Elapsed: 9  Current time: Thu May 14 07:36:17 PDT 2015
TPS: 144092.21902  Seconds Elapsed: 9  Current time: Thu May 14 07:36:17 PDT 2015

With rhashtable (4.0.3 + your fix)

$ ./getaddrinfo_many.sh 
TPS: 80450.5229284  Seconds Elapsed: 1  Current time: Thu May 14 06:51:39 PDT 2015
TPS: 85106.3829787  Seconds Elapsed: 2  Current time: Thu May 14 06:51:40 PDT 2015
TPS: 85763.2933105  Seconds Elapsed: 4  Current time: Thu May 14 06:51:42 PDT 2015
TPS: 84530.8537616  Seconds Elapsed: 5  Current time: Thu May 14 06:51:43 PDT 2015
TPS: 85689.8029135  Seconds Elapsed: 6  Current time: Thu May 14 06:51:44 PDT 2015
TPS: 82712.9859388  Seconds Elapsed: 7  Current time: Thu May 14 06:51:45 PDT 2015
TPS: 83472.4540902  Seconds Elapsed: 8  Current time: Thu May 14 06:51:46 PDT 2015
TPS: 86505.1903114  Seconds Elapsed: 10  Current time: Thu May 14 06:51:48 PDT 2015

Current tree is still slower than 3.15 :(

$ ./getaddrinfo_many.sh 
TPS: 123152.70936  Seconds Elapsed: 0  Current time: Thu May 14 07:36:52 PDT 2015
TPS: 123304.562269  Seconds Elapsed: 1  Current time: Thu May 14 07:36:53 PDT 2015
TPS: 123915.737299  Seconds Elapsed: 2  Current time: Thu May 14 07:36:54 PDT 2015
TPS: 123456.790123  Seconds Elapsed: 3  Current time: Thu May 14 07:36:55 PDT 2015
TPS: 124843.945069  Seconds Elapsed: 4  Current time: Thu May 14 07:36:56 PDT 2015
TPS: 122850.12285  Seconds Elapsed: 4  Current time: Thu May 14 07:36:56 PDT 2015
TPS: 123001.230012  Seconds Elapsed: 5  Current time: Thu May 14 07:36:57 PDT 2015
TPS: 123001.230012  Seconds Elapsed: 6  Current time: Thu May 14 07:36:58 PDT 2015
TPS: 122549.019608  Seconds Elapsed: 7  Current time: Thu May 14 07:36:59 PDT 2015
TPS: 123762.376238  Seconds Elapsed: 8  Current time: Thu May 14 07:37:00 PDT 2015
TPS: 123304.562269  Seconds Elapsed: 9  Current time: Thu May 14 07:37:01 PDT 2015
TPS: 123152.70936  Seconds Elapsed: 9  Current time: Thu May 14 07:37:01 PDT 2015
TPS: 123001.230012  Seconds Elapsed: 10  Current time: Thu May 14 07:37:02 PDT 2015

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink: Disable insertions/removals during rehash
  2015-05-14 14:37                           ` netlink: Disable insertions/removals during rehash Eric Dumazet
@ 2015-05-15  0:06                             ` Herbert Xu
  2015-05-20 23:53                               ` Thomas Graf
  0 siblings, 1 reply; 48+ messages in thread
From: Herbert Xu @ 2015-05-15  0:06 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Thomas Graf, netdev

On Thu, May 14, 2015 at 07:37:56AM -0700, Eric Dumazet wrote:
>
> This solves the corruption thanks Herbert.

Great.

> But wasn't rhashtable meant to be faster ? ;)

Is it, that's news to me :)

> Current tree is still slower than 3.15 :(
> 
> $ ./getaddrinfo_many.sh 
> TPS: 123152.70936  Seconds Elapsed: 0  Current time: Thu May 14 07:36:52 PDT 2015
> TPS: 123304.562269  Seconds Elapsed: 1  Current time: Thu May 14 07:36:53 PDT 2015
> TPS: 123915.737299  Seconds Elapsed: 2  Current time: Thu May 14 07:36:54 PDT 2015
> TPS: 123456.790123  Seconds Elapsed: 3  Current time: Thu May 14 07:36:55 PDT 2015
> TPS: 124843.945069  Seconds Elapsed: 4  Current time: Thu May 14 07:36:56 PDT 2015
> TPS: 122850.12285  Seconds Elapsed: 4  Current time: Thu May 14 07:36:56 PDT 2015
> TPS: 123001.230012  Seconds Elapsed: 5  Current time: Thu May 14 07:36:57 PDT 2015
> TPS: 123001.230012  Seconds Elapsed: 6  Current time: Thu May 14 07:36:58 PDT 2015
> TPS: 122549.019608  Seconds Elapsed: 7  Current time: Thu May 14 07:36:59 PDT 2015
> TPS: 123762.376238  Seconds Elapsed: 8  Current time: Thu May 14 07:37:00 PDT 2015
> TPS: 123304.562269  Seconds Elapsed: 9  Current time: Thu May 14 07:37:01 PDT 2015
> TPS: 123152.70936  Seconds Elapsed: 9  Current time: Thu May 14 07:37:01 PDT 2015
> TPS: 123001.230012  Seconds Elapsed: 10  Current time: Thu May 14 07:37:02 PDT 2015

OK I'll try to see if we can make this faster.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink: Kill bogus lock_sock in netlink_insert
  2015-05-14  6:02                           ` netlink: Kill bogus lock_sock in netlink_insert Herbert Xu
@ 2015-05-15 16:49                             ` David Miller
  2015-05-15 18:01                               ` Eric Dumazet
  2015-05-15 17:02                             ` David Miller
  1 sibling, 1 reply; 48+ messages in thread
From: David Miller @ 2015-05-15 16:49 UTC (permalink / raw)
  To: herbert; +Cc: eric.dumazet, tgraf, netdev, ying.xue

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 14 May 2015 14:02:30 +0800

> On Thu, May 14, 2015 at 01:58:24PM +0800, Herbert Xu wrote:
>> 
>> This patch also removes a bogus socket lock introduced by that
>> very same patch.
> 
> This bogus socket lock is still there in the current tree.
> 
> ---8<---
> The commit c5adde9468b0714a051eac7f9666f23eb10b61f7 ("netlink:
> eliminate nl_sk_hash_lock") added a lock_sock to netlink_insert
> with no justifications whatsoever.
> 
> This patch kills it.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Eric, does this improve your benchmark?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink: Kill bogus lock_sock in netlink_insert
  2015-05-14  6:02                           ` netlink: Kill bogus lock_sock in netlink_insert Herbert Xu
  2015-05-15 16:49                             ` David Miller
@ 2015-05-15 17:02                             ` David Miller
  2015-05-16 12:32                               ` Herbert Xu
  1 sibling, 1 reply; 48+ messages in thread
From: David Miller @ 2015-05-15 17:02 UTC (permalink / raw)
  To: herbert; +Cc: eric.dumazet, tgraf, netdev, ying.xue

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 14 May 2015 14:02:30 +0800

> The commit c5adde9468b0714a051eac7f9666f23eb10b61f7 ("netlink:
> eliminate nl_sk_hash_lock") added a lock_sock to netlink_insert
> with no justifications whatsoever.
> 
> This patch kills it.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Actually, Herbert, I think this lock_sock() is needed.

Otherwise nothing protects nlk_sk(sk)->portid, upon which we
perform a non-atomic test-and-set operation here.

If you remove the lock_sock(), two parallel bind/inserts are
possible on the same socket, potentially resulting in socket
state corruption.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink: Disable insertions/removals during rehash
  2015-05-14  5:58                         ` netlink: Disable insertions/removals during rehash Herbert Xu
  2015-05-14  6:02                           ` netlink: Kill bogus lock_sock in netlink_insert Herbert Xu
  2015-05-14 14:37                           ` netlink: Disable insertions/removals during rehash Eric Dumazet
@ 2015-05-15 17:02                           ` David Miller
  2015-05-16 13:16                             ` Herbert Xu
  2 siblings, 1 reply; 48+ messages in thread
From: David Miller @ 2015-05-15 17:02 UTC (permalink / raw)
  To: herbert; +Cc: eric.dumazet, tgraf, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 14 May 2015 13:58:24 +0800

> The current rhashtable rehash code is buggy and can't deal with
> parallel insertions/removals without corrupting the hash table.
> 
> This patch disables it by partially reverting
> c5adde9468b0714a051eac7f9666f23eb10b61f7 ("netlink: eliminate
> nl_sk_hash_lock").
> 
> This patch also removes a bogus socket lock introduced by that
> very same patch.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Herbert, if you agree with me in the other thread that the lock_sock()
or something like it has to remain, you'll need to respin this.

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink: Kill bogus lock_sock in netlink_insert
  2015-05-15 16:49                             ` David Miller
@ 2015-05-15 18:01                               ` Eric Dumazet
  2015-05-16 16:50                                 ` Eric Dumazet
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Dumazet @ 2015-05-15 18:01 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, tgraf, netdev, ying.xue

On Fri, 2015-05-15 at 12:49 -0400, David Miller wrote:
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Thu, 14 May 2015 14:02:30 +0800
> 
> > On Thu, May 14, 2015 at 01:58:24PM +0800, Herbert Xu wrote:
> >> 
> >> This patch also removes a bogus socket lock introduced by that
> >> very same patch.
> > 
> > This bogus socket lock is still there in the current tree.
> > 
> > ---8<---
> > The commit c5adde9468b0714a051eac7f9666f23eb10b61f7 ("netlink:
> > eliminate nl_sk_hash_lock") added a lock_sock to netlink_insert
> > with no justifications whatsoever.
> > 
> > This patch kills it.
> > 
> > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> Eric, does this improve your benchmark?

I don't think so, but I can test this in about one hour.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink: Kill bogus lock_sock in netlink_insert
  2015-05-15 17:02                             ` David Miller
@ 2015-05-16 12:32                               ` Herbert Xu
  2015-05-16 13:40                                 ` [net] netlink: Make autobind rover an atomic_t Herbert Xu
  0 siblings, 1 reply; 48+ messages in thread
From: Herbert Xu @ 2015-05-16 12:32 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, tgraf, netdev, ying.xue

On Fri, May 15, 2015 at 01:02:19PM -0400, David Miller wrote:
>
> Actually, Herbert, I think this lock_sock() is needed.
> 
> Otherwise nothing protects nlk_sk(sk)->portid, upon which we
> perform a non-atomic test-and-set operation here.
> 
> If you remove the lock_sock(), two parallel bind/inserts are
> possible on the same socket, potentially resulting in socket
> state corruption.

You're quite right.  I forgot about that case.

However, the code as is still buggy because it fails to reset
portid when our auto-allocated portid collides with somebody
else, resulting in a bogus bind failure.

I'll fix that up too in a new patch.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink: Disable insertions/removals during rehash
  2015-05-15 17:02                           ` David Miller
@ 2015-05-16 13:16                             ` Herbert Xu
  2015-05-16 21:10                               ` David Miller
  0 siblings, 1 reply; 48+ messages in thread
From: Herbert Xu @ 2015-05-16 13:16 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, tgraf, netdev

On Fri, May 15, 2015 at 01:02:57PM -0400, David Miller wrote:
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Thu, 14 May 2015 13:58:24 +0800
> 
> > The current rhashtable rehash code is buggy and can't deal with
> > parallel insertions/removals without corrupting the hash table.
> > 
> > This patch disables it by partially reverting
> > c5adde9468b0714a051eac7f9666f23eb10b61f7 ("netlink: eliminate
> > nl_sk_hash_lock").
> > 
> > This patch also removes a bogus socket lock introduced by that
> > very same patch.
> > 
> > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> Herbert, if you agree with me in the other thread that the lock_sock()
> or something like it has to remain, you'll need to respin this.

Actually I think this one is OK because I'm replacing it with the
hash table mutex which is just like the previous global lock except
that it is per-family.  As you cannot change the family on a netlink
socket this should be good enough.

But the changelog message is wrong so here is an updated version.

Cheers,

---8<---
The current rhashtable rehash code is buggy and can't deal with
parallel insertions/removals without corrupting the hash table.

This patch disables it by partially reverting
c5adde9468b0714a051eac7f9666f23eb10b61f7 ("netlink: eliminate
nl_sk_hash_lock").

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 05919bf..322fe03 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1052,7 +1052,7 @@ static int netlink_insert(struct sock *sk, u32 portid)
 	struct netlink_table *table = &nl_table[sk->sk_protocol];
 	int err;
 
-	lock_sock(sk);
+	mutex_lock(&table->hash.mutex);
 
 	err = -EBUSY;
 	if (nlk_sk(sk)->portid)
@@ -1073,7 +1073,7 @@ static int netlink_insert(struct sock *sk, u32 portid)
 	}
 
 err:
-	release_sock(sk);
+	mutex_unlock(&table->hash.mutex);
 	return err;
 }
 
@@ -1082,10 +1082,12 @@ static void netlink_remove(struct sock *sk)
 	struct netlink_table *table;
 
 	table = &nl_table[sk->sk_protocol];
+	mutex_lock(&table->hash.mutex);
 	if (rhashtable_remove(&table->hash, &nlk_sk(sk)->node)) {
 		WARN_ON(atomic_read(&sk->sk_refcnt) == 1);
 		__sock_put(sk);
 	}
+	mutex_unlock(&table->hash.mutex);
 
 	netlink_table_grab();
 	if (nlk_sk(sk)->subscriptions) {

-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [net] netlink: Make autobind rover an atomic_t
  2015-05-16 12:32                               ` Herbert Xu
@ 2015-05-16 13:40                                 ` Herbert Xu
  2015-05-16 13:50                                   ` [net] netlink: Reset portid after netlink_insert failure Herbert Xu
  2015-05-16 21:08                                   ` [net] netlink: Make autobind rover an atomic_t David Miller
  0 siblings, 2 replies; 48+ messages in thread
From: Herbert Xu @ 2015-05-16 13:40 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, tgraf, netdev, ying.xue

The commit 21e4902aea80ef35afc00ee8d2abdea4f519b7f7 ("netlink:
Lockless lookup with RCU grace period in socket release") removed
the locks around the autobind rover without making the rover itself
safe for use by multiple threads.

This patch converts rover to an atomic_t to make it at least
somewhat safe to use locklessly.  The tricky bit is when the
rover wraps around.  This patch simply deals with it by blindly
doing an atomic_set.  So if many threads encounter the wraparound
simultaneously then they'll all step on each other's toes and
all try to bind to -4097.  But this should eventually sort itself
out as they loop around and try the atomic_dec_return after the
last thread does an atomic_set.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index ec4adbd..6ffce5b 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1292,24 +1292,27 @@ static int netlink_release(struct socket *sock)
 static int netlink_autobind(struct socket *sock)
 {
 	struct sock *sk = sock->sk;
+	struct sock *sk2;
 	struct net *net = sock_net(sk);
 	struct netlink_table *table = &nl_table[sk->sk_protocol];
 	s32 portid = task_tgid_vnr(current);
 	int err;
-	static s32 rover = -4097;
+	static atomic_t rover = ATOMIC_INIT(-4096);
 
 retry:
 	cond_resched();
 	rcu_read_lock();
-	if (__netlink_lookup(table, portid, net)) {
+	sk2 = __netlink_lookup(table, portid, net);
+	rcu_read_unlock();
+	if (sk2) {
 		/* Bind collision, search negative portid values. */
-		portid = rover--;
-		if (rover > -4097)
-			rover = -4097;
-		rcu_read_unlock();
+		portid = atomic_dec_return(&rover);
+		if (unlikely(portid > -4097)) {
+			atomic_set(&rover, -4097);
+			portid = -4097;
+		}
 		goto retry;
 	}
-	rcu_read_unlock();
 
 	err = netlink_insert(sk, portid);
 	if (err == -EADDRINUSE)
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [net] netlink: Reset portid after netlink_insert failure
  2015-05-16 13:40                                 ` [net] netlink: Make autobind rover an atomic_t Herbert Xu
@ 2015-05-16 13:50                                   ` Herbert Xu
  2015-05-16 21:09                                     ` David Miller
  2015-05-16 21:08                                   ` [net] netlink: Make autobind rover an atomic_t David Miller
  1 sibling, 1 reply; 48+ messages in thread
From: Herbert Xu @ 2015-05-16 13:50 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, tgraf, netdev, ying.xue

The commit c5adde9468b0714a051eac7f9666f23eb10b61f7 ("netlink:
eliminate nl_sk_hash_lock") breaks the autobind retry mechanism
because it doesn't reset portid after a failed netlink_insert.

This means that should autobind fail the first time around, then
the socket will be stuck in limbo as it can never be bound again
since it already has a non-zero portid.

Fixes: c5adde9468b0 ("netlink: eliminate nl_sk_hash_lock") 
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 6ffce5b..d2a4438 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1081,6 +1081,7 @@ static int netlink_insert(struct sock *sk, u32 portid)
 	if (err) {
 		if (err == -EEXIST)
 			err = -EADDRINUSE;
+		nlk_sk(sk)->portid = 0;
 		sock_put(sk);
 	}
 
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: netlink: Kill bogus lock_sock in netlink_insert
  2015-05-15 18:01                               ` Eric Dumazet
@ 2015-05-16 16:50                                 ` Eric Dumazet
  2015-05-16 20:58                                   ` David Miller
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Dumazet @ 2015-05-16 16:50 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, tgraf, netdev, ying.xue

On Fri, 2015-05-15 at 11:01 -0700, Eric Dumazet wrote:
> On Fri, 2015-05-15 at 12:49 -0400, David Miller wrote:
> > From: Herbert Xu <herbert@gondor.apana.org.au>
> > Date: Thu, 14 May 2015 14:02:30 +0800
> > 
> > > On Thu, May 14, 2015 at 01:58:24PM +0800, Herbert Xu wrote:
> > >> 
> > >> This patch also removes a bogus socket lock introduced by that
> > >> very same patch.
> > > 
> > > This bogus socket lock is still there in the current tree.
> > > 
> > > ---8<---
> > > The commit c5adde9468b0714a051eac7f9666f23eb10b61f7 ("netlink:
> > > eliminate nl_sk_hash_lock") added a lock_sock to netlink_insert
> > > with no justifications whatsoever.
> > > 
> > > This patch kills it.
> > > 
> > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> > 
> > Eric, does this improve your benchmark?
> 
> I don't think so, but I can test this in about one hour.

Maybe the difference between 3.15 and 4.1 is not coming from
rhashtable/netlink but other core stuff, like mutex code.

The workload is largely dominated by mutex acquisition anyway.

    88.42%  [kernel]          [k] osq_lock                      
     1.91%  [kernel]          [k] mutex_spin_on_owner.isra.4    
     0.46%  [kernel]          [k] mutex_optimistic_spin         
     0.33%  [kernel]          [k] _raw_spin_lock                
     0.32%  [kernel]          [k] inet_dump_ifaddr              
     0.31%  [kernel]          [k] inet6_dump_addr               
     0.23%  [kernel]          [k] mutex_unlock                  
     0.23%  [kernel]          [k] _raw_read_lock_bh             
     0.21%  [kernel]          [k] _raw_spin_lock_irqsave        
     0.20%  perf              [.] 0x0000000000046273            

-  88.49%      getaddrinfo  [kernel.kallsyms]    [k] osq_lock                                                                                                         ◆
   - osq_lock                                                                                                                                                         ▒
      - 100.00% mutex_optimistic_spin                                                                                                                                 ▒
           __mutex_lock_slowpath                                                                                                                                      ▒
         - mutex_lock                                                                                                                                                 ▒
            - 45.48% netlink_dump                                                                                                                                     ▒
               + 87.27% netlink_recvmsg                                                                                                                               ▒
               + 12.73% __netlink_dump_start                                                                                                                          ▒
            - 38.93% rtnl_lock                                                                                                                                        ▒
               + 52.52% rtnetlink_rcv                                                                                                                                 ▒
               + 47.48% rtnetlink_rcv_msg                                                                                                                             ▒
            - 15.59% __netlink_dump_start                                                                                                                             ▒
                 rtnetlink_rcv_msg                                                                                                                                    ▒
                 netlink_rcv_skb                                                                                                                                      ▒
                 rtnetlink_rcv                                                                                                                                        ▒
                 netlink_unicast                                                                                                                                      ▒
                 netlink_sendmsg                                                                                                                                      ▒
                 sock_sendmsg                                                                                                                                         ▒
                 SYSC_sendto                                                                                                                                          ▒
                 sys_sendto                                                                                                                                           ▒
                 system_call_fastpath                                                                                                                                 ▒
               + __sendto_nocancel                                                                                                                                    ▒
+   1.96%      getaddrinfo  [kernel.kallsyms]    [k] mutex_spin_on_owner.isra.4     

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink: Kill bogus lock_sock in netlink_insert
  2015-05-16 16:50                                 ` Eric Dumazet
@ 2015-05-16 20:58                                   ` David Miller
  0 siblings, 0 replies; 48+ messages in thread
From: David Miller @ 2015-05-16 20:58 UTC (permalink / raw)
  To: eric.dumazet; +Cc: herbert, tgraf, netdev, ying.xue

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 16 May 2015 09:50:11 -0700

> Maybe the difference between 3.15 and 4.1 is not coming from
> rhashtable/netlink but other core stuff, like mutex code.
> 
> The workload is largely dominated by mutex acquisition anyway.

Hmmm, the only commit that sticks out for me in that time-frame is:

commit 90631822c5d307b5410500806e8ac3e63928aa3e
Author: Jason Low <jason.low2@hp.com>
Date:   Mon Jul 14 10:27:49 2014 -0700

    locking/spinlocks/mcs: Convert osq lock to atomic_t to reduce overhead
    

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [net] netlink: Make autobind rover an atomic_t
  2015-05-16 13:40                                 ` [net] netlink: Make autobind rover an atomic_t Herbert Xu
  2015-05-16 13:50                                   ` [net] netlink: Reset portid after netlink_insert failure Herbert Xu
@ 2015-05-16 21:08                                   ` David Miller
  2015-05-17  2:45                                     ` [net-next] netlink: Use random autobind rover Herbert Xu
  1 sibling, 1 reply; 48+ messages in thread
From: David Miller @ 2015-05-16 21:08 UTC (permalink / raw)
  To: herbert; +Cc: eric.dumazet, tgraf, netdev, ying.xue

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sat, 16 May 2015 21:40:07 +0800

> The commit 21e4902aea80ef35afc00ee8d2abdea4f519b7f7 ("netlink:
> Lockless lookup with RCU grace period in socket release") removed
> the locks around the autobind rover without making the rover itself
> safe for use by multiple threads.
> 
> This patch converts rover to an atomic_t to make it at least
> somewhat safe to use locklessly.  The tricky bit is when the
> rover wraps around.  This patch simply deals with it by blindly
> doing an atomic_set.  So if many threads encounter the wraparound
> simultaneously then they'll all step on each other's toes and
> all try to bind to -4097.  But this should eventually sort itself
> out as they loop around and try the atomic_dec_return after the
> last thread does an atomic_set.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

As far as I can tell, this ought to be fine as-is.

Everyone synchronizes on the netlink_insert().

And the rover is just a heuristic to find a free negative
portid quickly.  If the cpus walk on top of eachother, it
will sort itself out in the end.

There is one part of your patch we certainly do need, and
that's the correction of 'portid' when rover rolls over.

Something like the following.

What do you think?

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index dbe8859..bd26e69 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1305,7 +1305,7 @@ retry:
 		/* Bind collision, search negative portid values. */
 		portid = rover--;
 		if (rover > -4097)
-			rover = -4097;
+			portid = rover = -4097;
 		rcu_read_unlock();
 		goto retry;
 	}

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [net] netlink: Reset portid after netlink_insert failure
  2015-05-16 13:50                                   ` [net] netlink: Reset portid after netlink_insert failure Herbert Xu
@ 2015-05-16 21:09                                     ` David Miller
  0 siblings, 0 replies; 48+ messages in thread
From: David Miller @ 2015-05-16 21:09 UTC (permalink / raw)
  To: herbert; +Cc: eric.dumazet, tgraf, netdev, ying.xue

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sat, 16 May 2015 21:50:28 +0800

> The commit c5adde9468b0714a051eac7f9666f23eb10b61f7 ("netlink:
> eliminate nl_sk_hash_lock") breaks the autobind retry mechanism
> because it doesn't reset portid after a failed netlink_insert.
> 
> This means that should autobind fail the first time around, then
> the socket will be stuck in limbo as it can never be bound again
> since it already has a non-zero portid.
> 
> Fixes: c5adde9468b0 ("netlink: eliminate nl_sk_hash_lock") 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Looks good, applied and queued up for -stable.

Thanks!

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink: Disable insertions/removals during rehash
  2015-05-16 13:16                             ` Herbert Xu
@ 2015-05-16 21:10                               ` David Miller
  2015-06-04 16:27                                 ` Guenter Roeck
  0 siblings, 1 reply; 48+ messages in thread
From: David Miller @ 2015-05-16 21:10 UTC (permalink / raw)
  To: herbert; +Cc: eric.dumazet, tgraf, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sat, 16 May 2015 21:16:28 +0800

> On Fri, May 15, 2015 at 01:02:57PM -0400, David Miller wrote:
>> From: Herbert Xu <herbert@gondor.apana.org.au>
>> Date: Thu, 14 May 2015 13:58:24 +0800
>> 
>> > The current rhashtable rehash code is buggy and can't deal with
>> > parallel insertions/removals without corrupting the hash table.
>> > 
>> > This patch disables it by partially reverting
>> > c5adde9468b0714a051eac7f9666f23eb10b61f7 ("netlink: eliminate
>> > nl_sk_hash_lock").
>> > 
>> > This patch also removes a bogus socket lock introduced by that
>> > very same patch.
>> > 
>> > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>> 
>> Herbert, if you agree with me in the other thread that the lock_sock()
>> or something like it has to remain, you'll need to respin this.
> 
> Actually I think this one is OK because I'm replacing it with the
> hash table mutex which is just like the previous global lock except
> that it is per-family.  As you cannot change the family on a netlink
> socket this should be good enough.
> 
> But the changelog message is wrong so here is an updated version.
> 
> Cheers,
> 
> ---8<---
> The current rhashtable rehash code is buggy and can't deal with
> parallel insertions/removals without corrupting the hash table.
> 
> This patch disables it by partially reverting
> c5adde9468b0714a051eac7f9666f23eb10b61f7 ("netlink: eliminate
> nl_sk_hash_lock").
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Ok, I've queued this up for -stable, thanks Herbert.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [net-next] netlink: Use random autobind rover
  2015-05-16 21:08                                   ` [net] netlink: Make autobind rover an atomic_t David Miller
@ 2015-05-17  2:45                                     ` Herbert Xu
  2015-05-18  3:44                                       ` David Miller
  0 siblings, 1 reply; 48+ messages in thread
From: Herbert Xu @ 2015-05-17  2:45 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, tgraf, netdev, ying.xue

On Sat, May 16, 2015 at 05:08:19PM -0400, David Miller wrote:
>
> As far as I can tell, this ought to be fine as-is.

OK what about dropping the global rover altogether? Let's just
start randomly like UDP.

---8<---
Currently we use a global rover to select a port ID that is unique.
This used to work consistently when it was protected with a global
lock.  However as we're now lockless, the global rover can exhibit
pathological behaviour should multiple threads all stomp on it at
the same time.

Granted this will eventually resolve itself but the process is
suboptimal.

This patch replaces the global rover with a pseudorandom starting
point to avoid this issue.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index ec4adbd..716c6d2 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1296,20 +1296,24 @@ static int netlink_autobind(struct socket *sock)
 	struct netlink_table *table = &nl_table[sk->sk_protocol];
 	s32 portid = task_tgid_vnr(current);
 	int err;
-	static s32 rover = -4097;
+	s32 rover = -4096;
+	bool ok;
 
 retry:
 	cond_resched();
 	rcu_read_lock();
-	if (__netlink_lookup(table, portid, net)) {
+	ok = !__netlink_lookup(table, portid, net);
+	rcu_read_unlock();
+	if (!ok) {
 		/* Bind collision, search negative portid values. */
-		portid = rover--;
-		if (rover > -4097)
+		if (rover == -4096)
+			/* rover will be in range [S32_MIN, -4097] */
+			rover = S32_MIN + prandom_u32_max(-4096 - S32_MIN);
+		else if (rover >= -4096)
 			rover = -4097;
-		rcu_read_unlock();
+		portid = rover--;
 		goto retry;
 	}
-	rcu_read_unlock();
 
 	err = netlink_insert(sk, portid);
 	if (err == -EADDRINUSE)
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [net-next] netlink: Use random autobind rover
  2015-05-17  2:45                                     ` [net-next] netlink: Use random autobind rover Herbert Xu
@ 2015-05-18  3:44                                       ` David Miller
  0 siblings, 0 replies; 48+ messages in thread
From: David Miller @ 2015-05-18  3:44 UTC (permalink / raw)
  To: herbert; +Cc: eric.dumazet, tgraf, netdev, ying.xue

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sun, 17 May 2015 10:45:34 +0800

> On Sat, May 16, 2015 at 05:08:19PM -0400, David Miller wrote:
>>
>> As far as I can tell, this ought to be fine as-is.
> 
> OK what about dropping the global rover altogether? Let's just
> start randomly like UDP.
> 
> ---8<---
> Currently we use a global rover to select a port ID that is unique.
> This used to work consistently when it was protected with a global
> lock.  However as we're now lockless, the global rover can exhibit
> pathological behaviour should multiple threads all stomp on it at
> the same time.
> 
> Granted this will eventually resolve itself but the process is
> suboptimal.
> 
> This patch replaces the global rover with a pseudorandom starting
> point to avoid this issue.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Yep, this is fine, applied.

Thanks!

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink: Disable insertions/removals during rehash
  2015-05-15  0:06                             ` Herbert Xu
@ 2015-05-20 23:53                               ` Thomas Graf
  2015-05-21  0:31                                 ` Eric Dumazet
  0 siblings, 1 reply; 48+ messages in thread
From: Thomas Graf @ 2015-05-20 23:53 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Eric Dumazet, David Miller, netdev

On 05/15/15 at 08:06am, Herbert Xu wrote:
> On Thu, May 14, 2015 at 07:37:56AM -0700, Eric Dumazet wrote:
> >
> > This solves the corruption thanks Herbert.
> 
> Great.
> 
> > But wasn't rhashtable meant to be faster ? ;)
> 
> Is it, that's news to me :)

Eric, can you share the scripts you used to test this?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink: Disable insertions/removals during rehash
  2015-05-20 23:53                               ` Thomas Graf
@ 2015-05-21  0:31                                 ` Eric Dumazet
  0 siblings, 0 replies; 48+ messages in thread
From: Eric Dumazet @ 2015-05-21  0:31 UTC (permalink / raw)
  To: Thomas Graf; +Cc: Herbert Xu, David Miller, netdev

On Thu, 2015-05-21 at 01:53 +0200, Thomas Graf wrote:
> On 05/15/15 at 08:06am, Herbert Xu wrote:
> > On Thu, May 14, 2015 at 07:37:56AM -0700, Eric Dumazet wrote:
> > >
> > > This solves the corruption thanks Herbert.
> > 
> > Great.
> > 
> > > But wasn't rhashtable meant to be faster ? ;)
> > 
> > Is it, that's news to me :)
> 
> Eric, can you share the scripts you used to test this?

I believe I posted the getaddrinfo C prog already.

Here is the shell script :

# cat getaddrinfo_many.sh 
#!/bin/bash

TRANSACTIONS_PER_JOB=500
JOBS=200

run_netperf() {
  for ((i=0; i<$JOBS; i++)); do
    ./getaddrinfo $TRANSACTIONS_PER_JOB &> $i.out && rm $i.out &
  done
  wait
  return 0
}

run_once() {
  seconds=$(TIMEFORMAT="%R";(time run_netperf) 2>&1)
  total_transactions=$(($TRANSACTIONS_PER_JOB * $JOBS))
  transactions_per_second=$(echo "print $total_transactions / $seconds" | python)
  echo $transactions_per_second
  return 0
}

start_time=$(date +%s)
while true; do
  result=$(run_once)
  current_time=$(date +%s)
  elapsed=$(($current_time - $start_time))
  echo -n "TPS: $result  Seconds Elapsed: $elapsed  Current time: "
  date
done

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink: Disable insertions/removals during rehash
  2015-05-16 21:10                               ` David Miller
@ 2015-06-04 16:27                                 ` Guenter Roeck
  2015-06-04 18:59                                   ` David Miller
  2015-06-05  3:52                                   ` Herbert Xu
  0 siblings, 2 replies; 48+ messages in thread
From: Guenter Roeck @ 2015-06-04 16:27 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, eric.dumazet, tgraf, netdev

On Sat, May 16, 2015 at 05:10:19PM -0400, David Miller wrote:
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Sat, 16 May 2015 21:16:28 +0800
> 
> > On Fri, May 15, 2015 at 01:02:57PM -0400, David Miller wrote:
> >> From: Herbert Xu <herbert@gondor.apana.org.au>
> >> Date: Thu, 14 May 2015 13:58:24 +0800
> >> 
> >> > The current rhashtable rehash code is buggy and can't deal with
> >> > parallel insertions/removals without corrupting the hash table.
> >> > 
> >> > This patch disables it by partially reverting
> >> > c5adde9468b0714a051eac7f9666f23eb10b61f7 ("netlink: eliminate
> >> > nl_sk_hash_lock").
> >> > 
> >> > This patch also removes a bogus socket lock introduced by that
> >> > very same patch.
> >> > 
> >> > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> >> 
> >> Herbert, if you agree with me in the other thread that the lock_sock()
> >> or something like it has to remain, you'll need to respin this.
> > 
> > Actually I think this one is OK because I'm replacing it with the
> > hash table mutex which is just like the previous global lock except
> > that it is per-family.  As you cannot change the family on a netlink
> > socket this should be good enough.
> > 
> > But the changelog message is wrong so here is an updated version.
> > 
> > Cheers,
> > 
> > ---8<---
> > The current rhashtable rehash code is buggy and can't deal with
> > parallel insertions/removals without corrupting the hash table.
> > 
> > This patch disables it by partially reverting
> > c5adde9468b0714a051eac7f9666f23eb10b61f7 ("netlink: eliminate
> > nl_sk_hash_lock").
> > 
> > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> Ok, I've queued this up for -stable, thanks Herbert.
> 
Hi David,

sorry for bothering you - I don't see this patch in any of your trees,
and it is marked as "changes requested" in patchwork. Did I look
at the wrong places, do you still plan to apply the patch as-is,
or do you expect some changes ?

As side info, I have been trying to track down the getaddrinfo
hang problem observed by others, which we see in 3.19.4 and 4.0.4.

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink: Disable insertions/removals during rehash
  2015-06-04 16:27                                 ` Guenter Roeck
@ 2015-06-04 18:59                                   ` David Miller
  2015-06-04 20:44                                     ` Eric Dumazet
  2015-06-04 20:58                                     ` Guenter Roeck
  2015-06-05  3:52                                   ` Herbert Xu
  1 sibling, 2 replies; 48+ messages in thread
From: David Miller @ 2015-06-04 18:59 UTC (permalink / raw)
  To: linux; +Cc: herbert, eric.dumazet, tgraf, netdev

From: Guenter Roeck <linux@roeck-us.net>
Date: Thu, 4 Jun 2015 09:27:27 -0700

> sorry for bothering you - I don't see this patch in any of your trees,
> and it is marked as "changes requested" in patchwork. Did I look
> at the wrong places, do you still plan to apply the patch as-is,
> or do you expect some changes ?
> 
> As side info, I have been trying to track down the getaddrinfo
> hang problem observed by others, which we see in 3.19.4 and 4.0.4.

Changes-requested, amazingly, means that someone gave you feedback
and changes along with a fresh resubmission is expected of you.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink: Disable insertions/removals during rehash
  2015-06-04 18:59                                   ` David Miller
@ 2015-06-04 20:44                                     ` Eric Dumazet
  2015-06-04 20:58                                     ` Guenter Roeck
  1 sibling, 0 replies; 48+ messages in thread
From: Eric Dumazet @ 2015-06-04 20:44 UTC (permalink / raw)
  To: David Miller; +Cc: linux, herbert, tgraf, netdev

On Thu, 2015-06-04 at 11:59 -0700, David Miller wrote:
> From: Guenter Roeck <linux@roeck-us.net>
> Date: Thu, 4 Jun 2015 09:27:27 -0700
> 
> > sorry for bothering you - I don't see this patch in any of your trees,
> > and it is marked as "changes requested" in patchwork. Did I look
> > at the wrong places, do you still plan to apply the patch as-is,
> > or do you expect some changes ?
> > 
> > As side info, I have been trying to track down the getaddrinfo
> > hang problem observed by others, which we see in 3.19.4 and 4.0.4.
> 
> Changes-requested, amazingly, means that someone gave you feedback
> and changes along with a fresh resubmission is expected of you.

Arg.

I hope Herbert can sort out this soon enough, I thought patch was
already queued.

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink: Disable insertions/removals during rehash
  2015-06-04 18:59                                   ` David Miller
  2015-06-04 20:44                                     ` Eric Dumazet
@ 2015-06-04 20:58                                     ` Guenter Roeck
  1 sibling, 0 replies; 48+ messages in thread
From: Guenter Roeck @ 2015-06-04 20:58 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, eric.dumazet, tgraf, netdev

On 06/04/2015 11:59 AM, David Miller wrote:
> From: Guenter Roeck <linux@roeck-us.net>
> Date: Thu, 4 Jun 2015 09:27:27 -0700
>
>> sorry for bothering you - I don't see this patch in any of your trees,
>> and it is marked as "changes requested" in patchwork. Did I look
>> at the wrong places, do you still plan to apply the patch as-is,
>> or do you expect some changes ?
>>
>> As side info, I have been trying to track down the getaddrinfo
>> hang problem observed by others, which we see in 3.19.4 and 4.0.4.
>
> Changes-requested, amazingly, means that someone gave you feedback
> and changes along with a fresh resubmission is expected of you.
>

Yes, I understand. My problem is/was that your last e-mail on the subject
suggested that you would apply the patch, while at the same time patchwork
suggests that you expect to see changes. Sorry that I was - and still
am - unable to correlate those two pieces of information.

Guenter

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink: Disable insertions/removals during rehash
  2015-06-04 16:27                                 ` Guenter Roeck
  2015-06-04 18:59                                   ` David Miller
@ 2015-06-05  3:52                                   ` Herbert Xu
  2015-06-05  5:27                                     ` Guenter Roeck
  1 sibling, 1 reply; 48+ messages in thread
From: Herbert Xu @ 2015-06-05  3:52 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: David Miller, eric.dumazet, tgraf, netdev

On Thu, Jun 04, 2015 at 09:27:27AM -0700, Guenter Roeck wrote:
>
> sorry for bothering you - I don't see this patch in any of your trees,
> and it is marked as "changes requested" in patchwork. Did I look
> at the wrong places, do you still plan to apply the patch as-is,
> or do you expect some changes ?

I just looked up the patchwork entry and it actually says "not
applicable" which is correct:

https://patchwork.ozlabs.org/patch/473041/

Because the patch only applies to stable and is not needed in either
net or net-next.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink: Disable insertions/removals during rehash
  2015-06-05  3:52                                   ` Herbert Xu
@ 2015-06-05  5:27                                     ` Guenter Roeck
  0 siblings, 0 replies; 48+ messages in thread
From: Guenter Roeck @ 2015-06-05  5:27 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, eric.dumazet, tgraf, netdev

On 06/04/2015 08:52 PM, Herbert Xu wrote:
> On Thu, Jun 04, 2015 at 09:27:27AM -0700, Guenter Roeck wrote:
>>
>> sorry for bothering you - I don't see this patch in any of your trees,
>> and it is marked as "changes requested" in patchwork. Did I look
>> at the wrong places, do you still plan to apply the patch as-is,
>> or do you expect some changes ?
>
> I just looked up the patchwork entry and it actually says "not
> applicable" which is correct:
>
> https://patchwork.ozlabs.org/patch/473041/
>
> Because the patch only applies to stable and is not needed in either
> net or net-next.
>

Ah, so I was looking at the wrong branch after all.

Thanks a lot for the clarification,

Guenter

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink & rhashtable status
  2015-05-14  4:21                       ` Herbert Xu
                                           ` (2 preceding siblings ...)
  2015-05-14  5:58                         ` netlink: Disable insertions/removals during rehash Herbert Xu
@ 2015-06-26 10:44                         ` Konstantin Khlebnikov
  2015-06-27  7:09                           ` Herbert Xu
  3 siblings, 1 reply; 48+ messages in thread
From: Konstantin Khlebnikov @ 2015-06-26 10:44 UTC (permalink / raw)
  To: Herbert Xu, Eric Dumazet; +Cc: David Miller, Thomas Graf, netdev

On 14.05.2015 07:21, Herbert Xu wrote:
> On Thu, May 14, 2015 at 12:16:28PM +0800, Herbert Xu wrote:
>> On Wed, May 13, 2015 at 09:13:38PM -0700, Eric Dumazet wrote:
>>>
>>> So it looks like we lost an skb or something....
>>
>> OK that sounds reasonable.  So my plan is to disable dynamic
>> rehashing and then hunt down this lookup bug.
>
> Oh wait this isn't even a lookup failure since that should return
> ECONNREFUSED.  Could it be that this hang is a separate bug that's
> not related to rhashtable?

Hang in getaddrinfo is a bug in libc: function make_request in
sysdeps/unix/sysv/linux/check_pf.c ignores NLMSG_ERROR
(as well as messsages with nlmh->nlmsg_pid != pid)

It hangs forever in case of any error or netlink pid collision.
And I've seen ECONNREFUSED in message buffer when connected to hang
process with gdb.


I've found race in v3.18 in __netlink_lookup: rhashtable_hashfn
computes hash using one table and following rhashtable_lookup_compare
dereferences ht->tbl once again and could see different table.

patch follows...

>
> If that was the case then we simply need to get rid of dynamic
> rehashing.
>
> Cheers,
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: netlink & rhashtable status
  2015-06-26 10:44                         ` netlink & rhashtable status Konstantin Khlebnikov
@ 2015-06-27  7:09                           ` Herbert Xu
  0 siblings, 0 replies; 48+ messages in thread
From: Herbert Xu @ 2015-06-27  7:09 UTC (permalink / raw)
  To: Konstantin Khlebnikov; +Cc: Eric Dumazet, David Miller, Thomas Graf, netdev

On Fri, Jun 26, 2015 at 01:44:04PM +0300, Konstantin Khlebnikov wrote:
>
> I've found race in v3.18 in __netlink_lookup: rhashtable_hashfn
> computes hash using one table and following rhashtable_lookup_compare
> dereferences ht->tbl once again and could see different table.
> 
> patch follows...

Sounds like this is it.  Thanks for digging into this!
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2015-06-27  7:09 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-13  5:30 netlink & rhashtable status Eric Dumazet
2015-05-13  5:40 ` Herbert Xu
2015-05-13  6:15   ` Eric Dumazet
2015-05-13  6:20     ` Herbert Xu
2015-05-13 13:04       ` Eric Dumazet
2015-05-13 16:18         ` Eric Dumazet
2015-05-13 16:35           ` David Miller
2015-05-14  2:55             ` Herbert Xu
2015-05-14  2:53           ` Herbert Xu
2015-05-14  3:17             ` Eric Dumazet
2015-05-14  3:34               ` Herbert Xu
2015-05-14  3:58                 ` Eric Dumazet
2015-05-14  4:13                   ` Eric Dumazet
2015-05-14  4:16                     ` Herbert Xu
2015-05-14  4:21                       ` Herbert Xu
2015-05-14  4:38                         ` Eric Dumazet
2015-05-14  5:03                           ` Herbert Xu
2015-05-14  5:56                         ` Red Hat INTERNAL-ONLY kernel discussion list <rhkernel-list@redhat.com> Herbert Xu
2015-05-14  5:58                         ` netlink: Disable insertions/removals during rehash Herbert Xu
2015-05-14  6:02                           ` netlink: Kill bogus lock_sock in netlink_insert Herbert Xu
2015-05-15 16:49                             ` David Miller
2015-05-15 18:01                               ` Eric Dumazet
2015-05-16 16:50                                 ` Eric Dumazet
2015-05-16 20:58                                   ` David Miller
2015-05-15 17:02                             ` David Miller
2015-05-16 12:32                               ` Herbert Xu
2015-05-16 13:40                                 ` [net] netlink: Make autobind rover an atomic_t Herbert Xu
2015-05-16 13:50                                   ` [net] netlink: Reset portid after netlink_insert failure Herbert Xu
2015-05-16 21:09                                     ` David Miller
2015-05-16 21:08                                   ` [net] netlink: Make autobind rover an atomic_t David Miller
2015-05-17  2:45                                     ` [net-next] netlink: Use random autobind rover Herbert Xu
2015-05-18  3:44                                       ` David Miller
2015-05-14 14:37                           ` netlink: Disable insertions/removals during rehash Eric Dumazet
2015-05-15  0:06                             ` Herbert Xu
2015-05-20 23:53                               ` Thomas Graf
2015-05-21  0:31                                 ` Eric Dumazet
2015-05-15 17:02                           ` David Miller
2015-05-16 13:16                             ` Herbert Xu
2015-05-16 21:10                               ` David Miller
2015-06-04 16:27                                 ` Guenter Roeck
2015-06-04 18:59                                   ` David Miller
2015-06-04 20:44                                     ` Eric Dumazet
2015-06-04 20:58                                     ` Guenter Roeck
2015-06-05  3:52                                   ` Herbert Xu
2015-06-05  5:27                                     ` Guenter Roeck
2015-06-26 10:44                         ` netlink & rhashtable status Konstantin Khlebnikov
2015-06-27  7:09                           ` Herbert Xu
2015-05-14  4:17                     ` Eric Dumazet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.