netdev.vger.kernel.org archive mirror
* /proc/net/sockstat invalid memory accounting or memory leak in latest kernels?
@ 2014-07-17 10:52 Denys Fedoryshchenko
  2014-07-17 11:51 ` Eric Dumazet
  0 siblings, 1 reply; 25+ messages in thread
From: Denys Fedoryshchenko @ 2014-07-17 10:52 UTC
  To: netdev; +Cc: kaber, davem

Hi

I noticed a TCP transfer rate slowdown after a few days of operation on
kernel 3.15.3. After some digging I found this:

balancer-backup ~ # cat /proc/net/sockstat
sockets: used 118236
TCP: inuse 122958 orphan 4986 tw 108010 alloc 123179 mem 1955339
UDP: inuse 1 mem 0
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 1 memory 2

After shutting down the program:
balancer-backup ~ # cat /proc/net/sockstat
sockets: used 47
TCP: inuse 10552 orphan 10547 tw 142645 alloc 10552 mem 1877061
UDP: inuse 0 mem 0
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

sysctl settings:
net.ipv4.tcp_mem = 1767103      2045612 3068412

I recently restarted the process, and the mem value did not change
(although closing its sockets should release all of their memory). The
value also looks incorrect, because at the same time:
balancer-backup ~ # cat /proc/meminfo
MemTotal:       32939492 kB
MemFree:        29876564 kB

Meanwhile, 1955339 * 4096 should be around 8 GB.
Is this just an accounting issue, or is it a real memory leak?
What other information can I provide to troubleshoot this properly?
I will also upgrade to 3.15.5 now, to see whether the issue persists there.
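
For reference, the arithmetic above can be checked with a short sketch. It assumes a 4096-byte page size (as the message does) and uses only the figures quoted above; the variable names are illustrative.

```python
PAGE_SIZE = 4096  # bytes; assumed, matches the 4096 used in the message

tcp_mem_pages = 1955339   # "mem" on the TCP line of /proc/net/sockstat
mem_total_kb = 32939492   # MemTotal from /proc/meminfo
mem_free_kb = 29876564    # MemFree from /proc/meminfo

# Memory the TCP counter claims, versus everything that is not free.
tcp_bytes = tcp_mem_pages * PAGE_SIZE
used_bytes = (mem_total_kb - mem_free_kb) * 1024

print(f"TCP mem:            {tcp_bytes / 1e9:.2f} GB ({tcp_bytes / 2**30:.2f} GiB)")
print(f"MemTotal - MemFree: {used_bytes / 2**30:.2f} GiB")
print("TCP mem exceeds all non-free memory:", tcp_bytes > used_bytes)
```

The TCP counter corresponds to about 8.0 GB, which is more than all memory that is not free (about 2.9 GiB), so the two figures cannot both describe allocated pages.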

I also noticed several warnings:
[1116634.378936] ------------[ cut here ]------------
[1116634.379169] WARNING: CPU: 0 PID: 28350 at net/core/stream.c:201 sk_stream_kill_queues+0xff/0x104()
[1116634.379606] Modules linked in: microcode xt_tcpudp xt_mark iptable_mangle ip_tables x_tables 8021q garp stp mrp llc
[1116634.380069] CPU: 0 PID: 28350 Comm: haproxy Tainted: G        W     3.15.3-build-0007 #2
[1116634.380492] Hardware name: Dell Inc. PowerEdge R710/0HYPX2, BIOS 2.0.11 02/26/2010
[1116634.380921]  0000000000000000 ffff880778393db0 ffffffff8160042b 0000000000000000
[1116634.381352]  ffff880778393de8 ffffffff810b4e03 ffffffff81584db2 ffff8807e4df9380
[1116634.381780]  ffff8807e4df94c8 0000000000000007 ffff8807e4df93f0 ffff880778393df8
[1116634.382212] Call Trace:
[1116634.382440]  [<ffffffff8160042b>] dump_stack+0x45/0x56
[1116634.382659]  [<ffffffff810b4e03>] warn_slowpath_common+0x75/0x8e
[1116634.382871]  [<ffffffff81584db2>] ? sk_stream_kill_queues+0xff/0x104
[1116634.383087]  [<ffffffff810b4ebb>] warn_slowpath_null+0x15/0x17
[1116634.383308]  [<ffffffff81584db2>] sk_stream_kill_queues+0xff/0x104
[1116634.383522]  [<ffffffff815bf066>] inet_csk_destroy_sock+0x77/0xb7
[1116634.383741]  [<ffffffff815c31c1>] tcp_close+0x287/0x37a
[1116634.383953]  [<ffffffff815e03d5>] inet_release+0x6f/0x76
[1116634.384167]  [<ffffffff81578bfe>] sock_release+0x1a/0x79
[1116634.384379]  [<ffffffff81578c6a>] sock_close+0xd/0x11
[1116634.384600]  [<ffffffff8115750e>] __fput+0xdc/0x18d
[1116634.384826]  [<ffffffff811575eb>] ____fput+0x9/0xb
[1116634.385052]  [<ffffffff810ca5e2>] task_work_run+0x78/0x8e
[1116634.385276]  [<ffffffff81002880>] do_notify_resume+0x52/0x60
[1116634.385504]  [<ffffffff81606970>] int_signal+0x12/0x17
[1116634.385728] ---[ end trace fb11499084e23ab6 ]---
[1116634.386531] ------------[ cut here ]------------
[1116634.386792] WARNING: CPU: 0 PID: 28350 at net/ipv4/af_inet.c:153 inet_sock_destruct+0x160/0x189()
[1116634.387264] Modules linked in: microcode xt_tcpudp xt_mark iptable_mangle ip_tables x_tables 8021q garp stp mrp llc
[1116634.387781] CPU: 0 PID: 28350 Comm: haproxy Tainted: G        W     3.15.3-build-0007 #2
[1116634.388236] Hardware name: Dell Inc. PowerEdge R710/0HYPX2, BIOS 2.0.11 02/26/2010
[1116634.388680]  0000000000000000 ffff880778393d98 ffffffff8160042b 0000000000000000
[1116634.389157]  ffff880778393dd0 ffffffff810b4e03 ffffffff815e00d7 ffff8807e4df9380
[1116634.389602]  ffff8807e4df94c8 0000000000000007 ffff8807e4df93f0 ffff880778393de0
[1116634.390046] Call Trace:
[1116634.390270]  [<ffffffff8160042b>] dump_stack+0x45/0x56
[1116634.390499]  [<ffffffff810b4e03>] warn_slowpath_common+0x75/0x8e
[1116634.390742]  [<ffffffff815e00d7>] ? inet_sock_destruct+0x160/0x189
[1116634.390979]  [<ffffffff810b4ebb>] warn_slowpath_null+0x15/0x17
[1116634.391221]  [<ffffffff815e00d7>] inet_sock_destruct+0x160/0x189
[1116634.391457]  [<ffffffff8157ca3a>] __sk_free+0x18/0xd5
[1116634.391688]  [<ffffffff8157cb0a>] sk_free+0x13/0x15
[1116634.391927]  [<ffffffff815c32a8>] tcp_close+0x36e/0x37a
[1116634.392141]  [<ffffffff815e03d5>] inet_release+0x6f/0x76
[1116634.392484]  [<ffffffff81578bfe>] sock_release+0x1a/0x79
[1116634.392774]  [<ffffffff81578c6a>] sock_close+0xd/0x11
[1116634.392996]  [<ffffffff8115750e>] __fput+0xdc/0x18d
[1116634.393226]  [<ffffffff811575eb>] ____fput+0x9/0xb
[1116634.393450]  [<ffffffff810ca5e2>] task_work_run+0x78/0x8e
[1116634.393696]  [<ffffffff81002880>] do_notify_resume+0x52/0x60
[1116634.393935]  [<ffffffff81606970>] int_signal+0x12/0x17
[1116634.394159] ---[ end trace fb11499084e23ab7 ]---

P.S. After restarting the server and about 5 minutes of operation:
sockets: used 109439
TCP: inuse 110642 orphan 1372 tw 98904 alloc 110768 mem 215254
UDP: inuse 1 mem 0
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0


* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels?
  2014-07-17 10:52 /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? Denys Fedoryshchenko
@ 2014-07-17 11:51 ` Eric Dumazet
  2014-07-17 12:13   ` Denys Fedoryshchenko
  2014-07-17 13:12   ` Denys Fedoryshchenko
  0 siblings, 2 replies; 25+ messages in thread
From: Eric Dumazet @ 2014-07-17 11:51 UTC
  To: Denys Fedoryshchenko; +Cc: netdev, kaber, davem

On Thu, 2014-07-17 at 13:52 +0300, Denys Fedoryshchenko wrote:
> Hi
> 
> I noticed TCP transfer rate slowdown after few days of operation on 
> kernel 3.15.3, after some digging found out this:

Which previous version were you using without this problem?

> 
> balancer-backup ~ # cat /proc/net/sockstat
> sockets: used 118236
> TCP: inuse 122958 orphan 4986 tw 108010 alloc 123179 mem 1955339
> UDP: inuse 1 mem 0
> UDPLITE: inuse 0
> RAW: inuse 0
> FRAG: inuse 1 memory 2
> 
> after shutting down program
> balancer-backup ~ # cat /proc/net/sockstat
> sockets: used 47
> TCP: inuse 10552 orphan 10547 tw 142645 alloc 10552 mem 1877061
> UDP: inuse 0 mem 0
> UDPLITE: inuse 0
> RAW: inuse 0
> FRAG: inuse 0 memory 0
> 
> sysctl settings:
> net.ipv4.tcp_mem = 1767103      2045612 3068412
> 
> I restarted recently process, and mem value didnt changed (while because 
> it is sockets should release all memory), also it looks incorrect, 
> because at same time:
> balancer-backup ~ # cat /proc/meminfo
> MemTotal:       32939492 kB
> MemFree:        29876564 kB
> 
> While 1955339 * 4096 should be around 8GB.
> Probably it is just accounting issue or is it real memory leak?
> What other info i can provide to troubleshoot this info more properly?
> I will upgrade to 3.15.5 also now, to see if issue persist there.

I see nothing really wrong in your report.

It looks like you have a lot of sockets around after shutdown of the
program. Each FIN-WAIT socket might consume a lot of buffers in its
write queue, unless you use/force SO_LINGER or something.

If you try the following command, you might see how many sockets have
outstanding data.

ss -amn
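
As a rough illustration of how that output could be summarised, here is a minimal Python sketch that reads the `skmem:(...)` lines printed by `ss -m` and counts sockets with data still queued for transmit. The field letters follow the ss(8) man page (`w` is memory queued for sending, `f` is forward-allocated memory, `tb` is the send buffer limit). The helper names are hypothetical, and the sample lines are in the format that appears later in this thread.

```python
import re

SKMEM_RE = re.compile(r"skmem:\(([^)]*)\)")

def parse_skmem(line):
    """Return e.g. {'r': 0, 'rb': 8192, ...} for a skmem:(...) line, else None."""
    match = SKMEM_RE.search(line)
    if match is None:
        return None
    fields = {}
    for item in match.group(1).split(","):
        key = item.rstrip("0123456789")  # letters before the number, e.g. "rb"
        fields[key] = int(item[len(key):])
    return fields

def summarize_write_queues(ss_output):
    """Return (sockets_with_queued_data, total_w_bytes) over all skmem lines."""
    parsed = [parse_skmem(line) for line in ss_output.splitlines()]
    queued = [p["w"] for p in parsed if p and p.get("w", 0) > 0]
    return len(queued), sum(queued)

sample = """\
tcp   FIN-WAIT-1 0      4545   157.56.194.74:443   172.17.37.4:52473
          skmem:(r0,rb8192,t0,tb8192,f3072,w9216,o0,bl0)
tcp   FIN-WAIT-2 0      0      173.194.39.38:443   172.17.64.228:52013
          skmem:(r0,rb8192,t0,tb8192,f0,w0,o0,bl0)
"""
print(summarize_write_queues(sample))
```

On a live system the same function could be fed the captured text of `ss -amn` instead of the sample string.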


* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels?
  2014-07-17 11:51 ` Eric Dumazet
@ 2014-07-17 12:13   ` Denys Fedoryshchenko
  2014-07-17 13:12   ` Denys Fedoryshchenko
  1 sibling, 0 replies; 25+ messages in thread
From: Denys Fedoryshchenko @ 2014-07-17 12:13 UTC
  To: Eric Dumazet; +Cc: netdev, kaber, davem

On 2014-07-17 14:51, Eric Dumazet wrote:
> On Thu, 2014-07-17 at 13:52 +0300, Denys Fedoryshchenko wrote:
>> Hi
>> 
>> I noticed TCP transfer rate slowdown after few days of operation on
>> kernel 3.15.3, after some digging found out this:
> 
> What was previous version you were using without this problem ?
These are fresh installations. I just noticed that units under heavy
load (200-300k sockets at peak time) exhaust memory completely after a
few days. In one case this made the unit completely unreachable: when I
try to ssh there, the TCP connection is established, but no ssh banner
is shown from the server side.
Because the unit is remote and has no KVM, I can't see what is really
wrong there; it seems I should try netconsole.
On other units it just causes TCP performance degradation.
> 
>> 
>> balancer-backup ~ # cat /proc/net/sockstat
>> sockets: used 118236
>> TCP: inuse 122958 orphan 4986 tw 108010 alloc 123179 mem 1955339
>> UDP: inuse 1 mem 0
>> UDPLITE: inuse 0
>> RAW: inuse 0
>> FRAG: inuse 1 memory 2
>> 
>> after shutting down program
>> balancer-backup ~ # cat /proc/net/sockstat
>> sockets: used 47
>> TCP: inuse 10552 orphan 10547 tw 142645 alloc 10552 mem 1877061
>> UDP: inuse 0 mem 0
>> UDPLITE: inuse 0
>> RAW: inuse 0
>> FRAG: inuse 0 memory 0
>> 
>> sysctl settings:
>> net.ipv4.tcp_mem = 1767103      2045612 3068412
>> 
>> I restarted recently process, and mem value didnt changed (while 
>> because
>> it is sockets should release all memory), also it looks incorrect,
>> because at same time:
>> balancer-backup ~ # cat /proc/meminfo
>> MemTotal:       32939492 kB
>> MemFree:        29876564 kB
>> 
>> While 1955339 * 4096 should be around 8GB.
>> Probably it is just accounting issue or is it real memory leak?
>> What other info i can provide to troubleshoot this info more properly?
>> I will upgrade to 3.15.5 also now, to see if issue persist there.
> 
> I see nothing really wrong in your report.
> 
> It looks like you have a lot of sockets around after shutdown of the
> program. Each FIN-WAIT socket might consume a lot of buffers in its
> write queue, unless you use/force SO_LINGER or something.
> 
> If you try the following command, you might see how many sockets have
> outstanding data.
> 
> ss -amn
I will check that next time, thanks a lot!


* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels?
  2014-07-17 11:51 ` Eric Dumazet
  2014-07-17 12:13   ` Denys Fedoryshchenko
@ 2014-07-17 13:12   ` Denys Fedoryshchenko
  2014-07-17 13:42     ` Eric Dumazet
  1 sibling, 1 reply; 25+ messages in thread
From: Denys Fedoryshchenko @ 2014-07-17 13:12 UTC
  To: Eric Dumazet; +Cc: netdev, kaber, davem

Well, it does not seem normal. But I am not sure: does the mem value
mean that this memory is allocated and not released?
If these are allocated pages, MemFree in /proc/meminfo should show a
different value.
I just shut down the balancer process, bypassed all traffic, and waited
for all (ok, almost all) sockets to close. This particular server had
300k connections before the test.

sockets: used 78
TCP: inuse 22 orphan 17 tw 2 alloc 22 mem 1048593
UDP: inuse 1 mem 0
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

MemTotal:       32664776 kB
MemFree:        32462456 kB
MemAvailable:   32449704 kB

ss output:

Netid State      Recv-Q Send-Q Local Address:Port   Peer Address:Port
nl    UNCONN     0      0      0:0                  *
nl    UNCONN     0      0      4:0                  *
nl    UNCONN     0      0      10:0                 *
nl    UNCONN     0      0      11:0                 *
nl    UNCONN     0      0      15:0                 *
nl    UNCONN     0      0      16:0                 *
nl    UNCONN     0      0      18:0                 *
nl    UNCONN     0      0      21:0                 *
tcp   UNCONN     0      0      *:49484              *:*
          skmem:(r0,rb2304,t0,tb212992,f0,w0,o0,bl0)
tcp   LISTEN     0      1280   *:8880               *:*
          skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp   LISTEN     0      20     *:22                 *:*
          skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp   LISTEN     0      1      *:23                 *:*
          skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp   LISTEN     0      1280   *:443                *:*
          skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp   FIN-WAIT-1 0      4545   157.56.194.74:443    172.17.37.4:52473
          skmem:(r0,rb8192,t0,tb8192,f3072,w9216,o0,bl0)
tcp   ESTAB      0      3008   10.0.22.22:22        194.146.153.22:59156
          skmem:(r0,rb359040,t0,tb18432,f768,w11520,o0,bl0)
tcp   FIN-WAIT-1 0      1      54.225.250.234:443   172.17.32.251:35969
          skmem:(r0,rb8192,t0,tb8192,f2816,w1280,o0,bl0)
tcp   FIN-WAIT-2 0      0      206.53.159.203:443   172.17.76.10:54688
tcp   FIN-WAIT-2 0      0      173.194.39.40:443    172.17.64.221:39482
tcp   FIN-WAIT-1 0      1      69.171.235.48:443    172.17.32.251:52731
          skmem:(r0,rb8192,t0,tb8192,f4294966016,w1280,o0,bl0)
tcp   FIN-WAIT-2 0      0      173.194.39.38:443    172.17.64.228:52013
          skmem:(r0,rb8192,t0,tb8192,f0,w0,o0,bl0)
tcp   FIN-WAIT-1 0      4645   2.16.216.121:443     172.17.40.215:51985
          skmem:(r0,rb8192,t0,tb8192,f3072,w9216,o0,bl0)
tcp   FIN-WAIT-2 0      0      173.192.231.39:443   172.17.10.41:56736
          skmem:(r0,rb8192,t0,tb8192,f0,w0,o0,bl0)
tcp   FIN-WAIT-2 0      0      173.194.70.95:443    172.17.64.221:59055
tcp   FIN-WAIT-2 0      0      46.33.74.129:443     172.17.74.223:53207
          skmem:(r0,rb8192,t0,tb8192,f0,w0,o0,bl0)
tcp   FIN-WAIT-2 0      0      173.194.70.99:443    172.17.43.95:57045
          skmem:(r0,rb8192,t0,tb8192,f0,w0,o0,bl0)
tcp   FIN-WAIT-2 0      0      198.11.217.194:443   172.17.75.46:63384
          skmem:(r0,rb8192,t0,tb8192,f0,w0,o0,bl0)
tcp   FIN-WAIT-1 0      3601   31.13.81.33:443      172.17.14.192:54971
          skmem:(r0,rb8192,t0,tb8192,f1280,w6912,o0,bl0)
tcp   FIN-WAIT-1 0      4561   174.36.208.130:443   172.17.45.227:56934
          skmem:(r0,rb8192,t0,tb8192,f3072,w9216,o0,bl0)
tcp   FIN-WAIT-2 0      0      93.186.18.171:443    172.17.38.62:57496
tcp   FIN-WAIT-2 0      0      173.194.70.104:443   172.17.70.233:42575
          skmem:(r0,rb8192,t0,tb8192,f0,w0,o0,bl0)
tcp   LAST-ACK   0      3752   31.13.93.33:443      172.17.46.52:6869
          skmem:(r0,rb8192,t0,tb8192,f3072,w9216,o0,bl0)




* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels?
  2014-07-17 13:12   ` Denys Fedoryshchenko
@ 2014-07-17 13:42     ` Eric Dumazet
  2014-07-18  8:12       ` Denys Fedoryshchenko
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2014-07-17 13:42 UTC
  To: Denys Fedoryshchenko; +Cc: netdev, kaber, davem

On Thu, 2014-07-17 at 16:12 +0300, Denys Fedoryshchenko wrote:
> Well, it seems not normal. But i am not sure, is mem value means this 
> memory is allocated and not released?

mem means : sockets have forward_alloc reservations. They can use this
amount of memory without worrying.
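
To put the counter next to the socket counts, here is a small Python sketch that parses `/proc/net/sockstat`-style text. The function name is hypothetical, and the sample is the output quoted further down in this message.

```python
def parse_sockstat(text):
    """Parse /proc/net/sockstat-style text into {protocol: {field: int}}."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, _, rest = line.partition(":")
        tokens = rest.split()
        # Fields come as "key value" pairs; a trailing key without a value is dropped.
        stats[name.strip()] = {k: int(v) for k, v in zip(tokens[0::2], tokens[1::2])}
    return stats

sample = """\
sockets: used 78
TCP: inuse 22 orphan 17 tw 2 alloc 22 mem 1048593
UDP: inuse 1 mem 0
"""

tcp = parse_sockstat(sample)["TCP"]
print(f"TCP mem: {tcp['mem']} pages across {tcp['alloc']} allocated sockets")
print(f"average pages per allocated socket: {tcp['mem'] / tcp['alloc']:.0f}")
```

With only 22 allocated sockets, a counter above one million pages is hard to reconcile with the small per-socket buffer limits shown in the ss output.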

> If it is allocated pages, MemFree in /proc/meminfo should be different 
> value.

It's not allocated pages.

> I just shut down balancer process now and bypassed all traffic, and 
> waited all(ok, almost all) sockets to close, this particular server had 
> 300k connections before test.
> 
> sockets: used 78
> TCP: inuse 22 orphan 17 tw 2 alloc 22 mem 1048593
> UDP: inuse 1 mem 0
> UDPLITE: inuse 0
> RAW: inuse 0
> FRAG: inuse 0 memory 0
> 
> MemTotal:       32664776 kB
> MemFree:        32462456 kB
> MemAvailable:   32449704 kB
> 
> ss output:
> 
> Netid State      Recv-Q Send-Q Local Address:Port   Peer Address:Port
> nl    UNCONN     0      0      0:0                  *
> nl    UNCONN     0      0      4:0                  *
> nl    UNCONN     0      0      10:0                 *
> nl    UNCONN     0      0      11:0                 *
> nl    UNCONN     0      0      15:0                 *
> nl    UNCONN     0      0      16:0                 *
> nl    UNCONN     0      0      18:0                 *
> nl    UNCONN     0      0      21:0                 *
> tcp   UNCONN     0      0      *:49484              *:*
>           skmem:(r0,rb2304,t0,tb212992,f0,w0,o0,bl0)
> tcp   LISTEN     0      1280   *:8880               *:*
>           skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
> tcp   LISTEN     0      20     *:22                 *:*
>           skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
> tcp   LISTEN     0      1      *:23                 *:*
>           skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
> tcp   LISTEN     0      1280   *:443                *:*
>           skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
> tcp   FIN-WAIT-1 0      4545   157.56.194.74:443    172.17.37.4:52473
>           skmem:(r0,rb8192,t0,tb8192,f3072,w9216,o0,bl0)
> tcp   ESTAB      0      3008   10.0.22.22:22        194.146.153.22:59156
>           skmem:(r0,rb359040,t0,tb18432,f768,w11520,o0,bl0)
> tcp   FIN-WAIT-1 0      1      54.225.250.234:443   172.17.32.251:35969
>           skmem:(r0,rb8192,t0,tb8192,f2816,w1280,o0,bl0)
> tcp   FIN-WAIT-2 0      0      206.53.159.203:443   172.17.76.10:54688
> tcp   FIN-WAIT-2 0      0      173.194.39.40:443    172.17.64.221:39482
> tcp   FIN-WAIT-1 0      1      69.171.235.48:443    172.17.32.251:52731
>           skmem:(r0,rb8192,t0,tb8192,f4294966016,w1280,o0,bl0)

This looks bad : 

f4294966016 : 0xFFFFFB00

There is a bug of some sort, that leads to overflows.
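
To make the overflow concrete, here is a short Python sketch that reinterprets the printed value as a signed number. It assumes the forward-allocation counter is a signed 32-bit integer that ss prints as unsigned; that is an assumption for illustration, not something stated in this thread.

```python
def as_signed32(value):
    """Reinterpret an unsigned 32-bit number as a signed two's-complement int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

f = 4294966016  # the "f" field from the skmem line quoted above
print(hex(f), as_signed32(f))
```

Read this way, the value is -1280, and the same socket reports w1280, so the forward allocation appears to have gone negative by exactly the amount queued for writing.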


* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels?
  2014-07-17 13:42     ` Eric Dumazet
@ 2014-07-18  8:12       ` Denys Fedoryshchenko
  2014-07-18  8:41         ` Eric Dumazet
  0 siblings, 1 reply; 25+ messages in thread
From: Denys Fedoryshchenko @ 2014-07-18  8:12 UTC
  To: Eric Dumazet; +Cc: netdev, kaber, davem

More updates. I found out how to see whether TCP is in memory pressure
mode; I guess that when it is active, speed might degrade. Almost all
sockets are now released. Is it normal that TCP pressure is still
"yes"?

balancer-backup ~ # cat /proc/net/protocols
protocol  size sockets  memory press maxhdr  slab module     cl co di ac io in de sh ss gs se re sp bi br ha uh gp em
PACKET    1216      0      -1   NI       0   no   kernel      n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
ALG        624      0       0   NI       0   no   kernel      n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
UNIX       832      1      -1   NI       0   yes  kernel      n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
UDP-Lite   800      0      -1   NI       0   yes  kernel      y  y  y  n  y  y  y  n  y  y  y  y  y  n  y  y  y  y  n
PING       760      0      -1   NI       0   yes  kernel      y  y  y  n  n  y  n  n  y  y  y  y  n  y  y  y  y  y  n
RAW        768      0      -1   NI       0   yes  kernel      y  y  y  n  y  y  y  n  y  y  y  y  n  y  y  y  y  n  n
UDP        800      1       0   NI       0   yes  kernel      y  y  y  n  y  n  y  n  y  y  y  y  y  n  y  y  y  y  n
TCP       1608     24  1823119   yes    208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
NETLINK    992      0      -1   NI       0   no   kernel      n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
balancer-backup ~ # cat /proc/net/sockstat
sockets: used 51
TCP: inuse 21 orphan 13 tw 2 alloc 21 mem 1823120
UDP: inuse 1 mem 0
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory

Netid  State      Recv-Q Send-Q     Local Address:Port       Peer 
Address:Port
nl     UNCONN     0      0                      0:0                      
*
nl     UNCONN     0      0                      4:0                      
*
nl     UNCONN     0      0                     10:0                      
*
nl     UNCONN     0      0                     11:0                      
*
nl     UNCONN     0      0                     15:0                      
*
nl     UNCONN     0      0                     16:0                      
*
nl     UNCONN     0      0                     18:0                      
*
nl     UNCONN     0      0                     21:0                      
*
u_str  LISTEN     0      0      /tmp/dropbear-b661327d/auth-ac4c35f5-6 
398981316                 * 0
tcp    UNCONN     0      0                      *:21084                 
*:*
          skmem:(r0,rb2304,t0,tb212992,f0,w0,o0,bl0)
tcp    LISTEN     0      10000                  *:80                    
*:*
          skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN     0      10000                  *:8880                  
*:*
          skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN     0      9                      *:8080                  
*:*
          skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN     0      20                     *:22                    
*:*
          skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN     0      1                      *:23                    
*:*
          skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN     0      10000                  *:443                   
*:*
          skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    ESTAB      0      176          212.22.93.7:22       
109.110.116.35:59925
          skmem:(r0,rb359040,t0,tb46080,f3584,w4608,o0,bl0)
tcp    FIN-WAIT-1 0      27545     173.194.19.109:80         
185.4.252.44:61532
          skmem:(r0,rb359040,t0,tb46080,f0,w45056,o0,bl0)
tcp    SYN-SENT   0      1            212.22.93.7:63910     
54.225.188.51:80
          skmem:(r0,rb87380,t0,tb16384,f4294966016,w1280,o0,bl0)
tcp    FIN-WAIT-1 0      28333     151.249.89.224:80         
185.4.252.27:56892
          skmem:(r0,rb359040,t0,tb46080,f3072,w46080,o0,bl0)
tcp    FIN-WAIT-1 0      28009         2.16.61.24:80        
185.4.252.139:52608
          skmem:(r0,rb359040,t0,tb46080,f0,w45056,o0,bl0)
tcp    FIN-WAIT-1 0      21753      173.194.19.46:80        
185.4.252.214:51174
          skmem:(r0,rb359040,t0,tb46080,f1024,w35840,o0,bl0)
tcp    FIN-WAIT-1 0      28437      87.248.223.79:80        
185.4.253.107:60924
          skmem:(r0,rb359040,t0,tb46080,f3072,w46080,o0,bl0)
tcp    FIN-WAIT-1 0      46194       212.25.73.21:80        
185.4.253.107:50242
          skmem:(r0,rb359040,t0,tb78336,f0,w73728,o0,bl0)
tcp    FIN-WAIT-1 0      65001     173.194.35.119:443        
185.4.252.58:50040
          skmem:(r0,rb333440,t0,tb119808,f3584,w115200,o0,bl0)
tcp    SYN-SENT   0      1            212.22.93.7:27514     
54.225.188.51:80
          skmem:(r0,rb87380,t0,tb16384,f4294966016,w1280,o0,bl0)
tcp    FIN-WAIT-1 0      3863        198.23.87.67:443       
185.4.253.115:1693
          skmem:(r0,rb333440,t0,tb46080,f3072,w9216,o0,bl0)
tcp    SYN-SENT   0      1            212.22.93.7:24487     
54.225.188.51:80
          skmem:(r0,rb87380,t0,tb16384,f4294966016,w1280,o0,bl0)
tcp    FIN-WAIT-2 0      0         54.225.251.189:443        
185.4.253.19:49317

tcp    FIN-WAIT-2 0      0         184.173.136.67:443       
185.4.252.214:56383
          skmem:(r0,rb333440,t0,tb46080,f0,w0,o0,bl0)
tcp    SYN-SENT   0      1            212.22.93.7:7830      
54.225.188.51:80
          skmem:(r0,rb87380,t0,tb16384,f4294966016,w1280,o0,bl0)
tcp    FIN-WAIT-1 0      6042       23.200.86.144:80        
185.4.252.139:54631
          skmem:(r0,rb359040,t0,tb46080,f1792,w10496,o0,bl0)
tcp    FIN-WAIT-1 0      28961     87.248.223.141:80         
185.4.253.18:61083
          skmem:(r0,rb359040,t0,tb46080,f3072,w46080,o0,bl0)


There were many like this one (with overflow?):
tcp    SYN-SENT   0      1            212.22.93.7:7830      
54.225.188.51:80
          skmem:(r0,rb87380,t0,tb16384,f4294966016,w1280,o0,bl0)


After 5-10 minutes they disappeared, but TCP is still in memory 
pressure mode:
balancer-backup ~ # ss -amn
Netid  State      Recv-Q Send-Q                                          
                    Local Address:Port                                    
                            Peer Address:Port
nl     UNCONN     0      0                                               
                                0:0                                       
                                        *
nl     UNCONN     0      0                                               
                                4:0                                       
                                        *
nl     UNCONN     0      0                                               
                               10:0                                       
                                        *
nl     UNCONN     0      0                                               
                               11:0                                       
                                        *
nl     UNCONN     0      0                                               
                               15:0                                       
                                        *
nl     UNCONN     0      0                                               
                               16:0                                       
                                        *
nl     UNCONN     0      0                                               
                               18:0                                       
                                        *
nl     UNCONN     0      0                                               
                               21:0                                       
                                        *
u_str  LISTEN     0      0                                         
/tmp/dropbear-b661327d/auth-ac4c35f5-6 398981316                         
                                                 * 0
tcp    UNCONN     0      0                                               
                                *:56684                                   
                                       *:*
          skmem:(r0,rb2304,t0,tb212992,f0,w0,o0,bl0)
tcp    LISTEN     0      10000                                           
                                *:80                                      
                                       *:*
          skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN     0      10000                                           
                                *:8880                                    
                                       *:*
          skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN     0      9                                               
                                *:8080                                    
                                       *:*
          skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN     0      20                                              
                                *:22                                      
                                       *:*
          skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN     0      1                                               
                                *:23                                      
                                       *:*
          skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN     0      10000                                           
                                *:443                                     
                                       *:*
          skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    ESTAB      0      2688                                            
                      212.22.93.7:22                                      
                          109.110.116.35:59925
          skmem:(r0,rb359040,t2304,tb46080,f256,w16128,o0,bl0)
tcp    FIN-WAIT-1 0      21753                                           
                    173.194.19.46:80                                      
                           185.4.252.214:51174
          skmem:(r0,rb359040,t0,tb46080,f1024,w35840,o0,bl0)
tcp    FIN-WAIT-1 0      46194                                           
                     212.25.73.21:80                                      
                           185.4.253.107:50242
          skmem:(r0,rb359040,t0,tb78336,f0,w73728,o0,bl0)
tcp    FIN-WAIT-1 0      65001                                           
                   173.194.35.119:443                                     
                            185.4.252.58:50040
          skmem:(r0,rb333440,t0,tb119808,f3584,w115200,o0,bl0)
tcp    FIN-WAIT-1 0      3863                                            
                     198.23.87.67:443                                     
                           185.4.253.115:1693
          skmem:(r0,rb333440,t0,tb46080,f3072,w9216,o0,bl0)

balancer-backup ~ # cat /proc/net/protocols
protocol  size sockets  memory press maxhdr  slab module     cl co di ac io in de sh ss gs se re sp bi br ha uh gp em
PACKET    1216      0      -1   NI       0   no   kernel      n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
ALG        624      0       0   NI       0   no   kernel      n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
UNIX       832      1      -1   NI       0   yes  kernel      n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
UDP-Lite   800      0      -1   NI       0   yes  kernel      y  y  y  n  y  y  y  n  y  y  y  y  y  n  y  y  y  y  n
PING       760      0      -1   NI       0   yes  kernel      y  y  y  n  n  y  n  n  y  y  y  y  n  y  y  y  y  y  n
RAW        768      0      -1   NI       0   yes  kernel      y  y  y  n  y  y  y  n  y  y  y  y  n  y  y  y  y  n  n
UDP        800      1       0   NI       0   yes  kernel      y  y  y  n  y  n  y  n  y  y  y  y  y  n  y  y  y  y  n
TCP       1608     15  1823049   yes    208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
NETLINK    992      0      -1   NI       0   no   kernel      n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n

Also, two warnings appeared on the latest stable kernel (well, almost 
latest: 3.15.6 was released very recently, I have 3.15.5).

[73563.552667] ------------[ cut here ]------------
[73563.552889] WARNING: CPU: 3 PID: 0 at net/core/stream.c:201 
sk_stream_kill_queues+0xff/0x104()
[73563.553308] Modules linked in: iptable_filter tcp_diag udp_diag 
inet_diag unix_diag microcode xt_tcpudp xt_mark iptable_mangle ip_tables 
x_tables 8021q garp stp mrp llc
[73563.553975] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 
3.15.5-build-0007 #1
[73563.554190] Hardware name: Dell Inc. PowerEdge R710/0HYPX2, BIOS 
2.0.11 02/26/2010
[73563.559607]  0000000000000000 ffff88080f263a38 ffffffff8160054b 
0000000000000000
[73563.560029]  ffff88080f263a70 ffffffff810b4e03 ffffffff81584ed6 
ffff880093b9d480
[73563.560451]  ffff880093b9d5c8 ffff8806dcecb0a8 0000000000000000 
ffff88080f263a80
[73563.560874] Call Trace:
[73563.561080]  <IRQ>  [<ffffffff8160054b>] dump_stack+0x45/0x56
[73563.561304]  [<ffffffff810b4e03>] warn_slowpath_common+0x75/0x8e
[73563.561517]  [<ffffffff81584ed6>] ? sk_stream_kill_queues+0xff/0x104
[73563.561730]  [<ffffffff810b4ebb>] warn_slowpath_null+0x15/0x17
[73563.561943]  [<ffffffff81584ed6>] sk_stream_kill_queues+0xff/0x104
[73563.562158]  [<ffffffff815bf18a>] inet_csk_destroy_sock+0x77/0xb7
[73563.562372]  [<ffffffff815c0555>] tcp_done+0x78/0x7d
[73563.562583]  [<ffffffff815c8d2c>] tcp_reset+0x53/0x56
[73563.562795]  [<ffffffff815c8e96>] tcp_validate_incoming+0x167/0x261
[73563.563012]  [<ffffffff815c9d7c>] tcp_rcv_state_process+0x613/0xaa3
[73563.563228]  [<ffffffff815cfae3>] tcp_v4_do_rcv+0x291/0x2db
[73563.563440]  [<ffffffff815b531c>] ? inet_del_offload+0x38/0x38
[73563.563653]  [<ffffffff815d2197>] tcp_v4_rcv+0x329/0x617
[73563.563868]  [<ffffffff815b06ff>] ? nf_hook_slow+0x6a/0xfb
[73563.564080]  [<ffffffff815b53d1>] ip_local_deliver_finish+0xb5/0x125
[73563.564293]  [<ffffffff815b584f>] ip_local_deliver+0x6d/0x71
[73563.564505]  [<ffffffff815b56b3>] ip_rcv_finish+0x272/0x2b4
[73563.564717]  [<ffffffff815b5aea>] ip_rcv+0x297/0x305
[73563.564929]  [<ffffffff8158b3c3>] 
__netif_receive_skb_core+0x55b/0x5a4
[73563.565142]  [<ffffffff8158b41f>] __netif_receive_skb+0x13/0x55
[73563.565353]  [<ffffffff8158b49c>] 
netif_receive_skb_internal+0x3b/0x72
[73563.565565]  [<ffffffff8158ba45>] napi_gro_receive+0x39/0x7b
[73563.565779]  [<ffffffff813b2bd2>] bnx2_poll_work+0xf8a/0x10be
[73563.565990]  [<ffffffff815c13d1>] ? tcp_sendmsg+0x435/0xb18
[73563.566201]  [<ffffffff813b2d32>] bnx2_poll_msix+0x2c/0xae
[73563.566413]  [<ffffffff8158c133>] net_rx_action+0xc9/0x162
[73563.566628]  [<ffffffff810b846a>] __do_softirq+0xd2/0x1c3
[73563.566841]  [<ffffffff810b86b2>] irq_exit+0x3a/0x81
[73563.567059]  [<ffffffff81004021>] do_IRQ+0xb4/0xcd
[73563.567275]  [<ffffffff816061aa>] common_interrupt+0x6a/0x6a
[73563.567490]  <EOI>  [<ffffffff81559e73>] ? 
cpuidle_enter_state+0x4d/0x9e
[73563.567713]  [<ffffffff81559ee6>] cpuidle_enter+0x12/0x14
[73563.567933]  [<ffffffff810e27f7>] cpu_startup_entry+0x163/0x204
[73563.568150]  [<ffffffff810239af>] start_secondary+0x1b4/0x1b9
[73563.568366] ---[ end trace 098a1058f104bc5a ]---
[73563.568577] ------------[ cut here ]------------
[73563.568791] WARNING: CPU: 3 PID: 0 at net/ipv4/af_inet.c:153 
inet_sock_destruct+0x160/0x189()
[73563.569211] Modules linked in: iptable_filter tcp_diag udp_diag 
inet_diag unix_diag microcode xt_tcpudp xt_mark iptable_mangle ip_tables 
x_tables 8021q garp stp mrp llc
[73563.569878] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W     
3.15.5-build-0007 #1
[73563.570300] Hardware name: Dell Inc. PowerEdge R710/0HYPX2, BIOS 
2.0.11 02/26/2010
[73563.570719]  0000000000000000 ffff88080f263b38 ffffffff8160054b 
0000000000000000
[73563.571145]  ffff88080f263b70 ffffffff810b4e03 ffffffff815e01fb 
ffff880093b9d480
[73563.571573]  ffff880093b9d5c8 ffff880093b9d4d0 0000000000000000 
ffff88080f263b80
[73563.572000] Call Trace:
[73563.572210]  <IRQ>  [<ffffffff8160054b>] dump_stack+0x45/0x56
[73563.572430]  [<ffffffff810b4e03>] warn_slowpath_common+0x75/0x8e
[73563.572646]  [<ffffffff815e01fb>] ? inet_sock_destruct+0x160/0x189
[73563.572862]  [<ffffffff810b4ebb>] warn_slowpath_null+0x15/0x17
[73563.573076]  [<ffffffff815e01fb>] inet_sock_destruct+0x160/0x189
[73563.573292]  [<ffffffff8157cb5e>] __sk_free+0x18/0xd5
[73563.573504]  [<ffffffff8157cc2e>] sk_free+0x13/0x15
[73563.573719]  [<ffffffff815d2464>] tcp_v4_rcv+0x5f6/0x617
[73563.573933]  [<ffffffff815b06ff>] ? nf_hook_slow+0x6a/0xfb
[73563.574147]  [<ffffffff815b53d1>] ip_local_deliver_finish+0xb5/0x125
[73563.574362]  [<ffffffff815b584f>] ip_local_deliver+0x6d/0x71
[73563.574576]  [<ffffffff815b56b3>] ip_rcv_finish+0x272/0x2b4
[73563.574789]  [<ffffffff815b5aea>] ip_rcv+0x297/0x305
[73563.575002]  [<ffffffff8158b3c3>] 
__netif_receive_skb_core+0x55b/0x5a4
[73563.575218]  [<ffffffff8158b41f>] __netif_receive_skb+0x13/0x55
[73563.575432]  [<ffffffff8158b49c>] 
netif_receive_skb_internal+0x3b/0x72
[73563.575650]  [<ffffffff8158ba45>] napi_gro_receive+0x39/0x7b
[73563.575866]  [<ffffffff813b2bd2>] bnx2_poll_work+0xf8a/0x10be
[73563.576083]  [<ffffffff815c13d1>] ? tcp_sendmsg+0x435/0xb18
[73563.576298]  [<ffffffff813b2d32>] bnx2_poll_msix+0x2c/0xae
[73563.576512]  [<ffffffff8158c133>] net_rx_action+0xc9/0x162
[73563.576729]  [<ffffffff810b846a>] __do_softirq+0xd2/0x1c3
[73563.576944]  [<ffffffff810b86b2>] irq_exit+0x3a/0x81
[73563.577159]  [<ffffffff81004021>] do_IRQ+0xb4/0xcd
[73563.577377]  [<ffffffff816061aa>] common_interrupt+0x6a/0x6a
[73563.577591]  <EOI>  [<ffffffff81559e73>] ? 
cpuidle_enter_state+0x4d/0x9e
[73563.577814]  [<ffffffff81559ee6>] cpuidle_enter+0x12/0x14
[73563.578029]  [<ffffffff810e27f7>] cpu_startup_entry+0x163/0x204
[73563.578246]  [<ffffffff810239af>] start_secondary+0x1b4/0x1b9
[73563.578460] ---[ end trace 098a1058f104bc5b ]---

On 2014-07-17 16:42, Eric Dumazet wrote:
> On Thu, 2014-07-17 at 16:12 +0300, Denys Fedoryshchenko wrote:
>> Well, it seems not normal. But i am not sure, is mem value means this
>> memory is allocated and not released?
> 
> mem means : sockets have forward_alloc reservations. They can use this
> amount of memory without worrying.
> 
>> If it is allocated pages, MemFree in /proc/memstat should be different
>> value.
> 
> Its not allocated pages.
> 
>> I just shut down balancer process now and bypassed all traffic, and
>> waited all(ok, almost all) sockets to close, this particular server 
>> had
>> 300k connections before test.
>> 
>> sockets: used 78
>> TCP: inuse 22 orphan 17 tw 2 alloc 22 mem 1048593
>> UDP: inuse 1 mem 0
>> UDPLITE: inuse 0
>> RAW: inuse 0
>> FRAG: inuse 0 memory 0
>> 
>> MemTotal:       32664776 kB
>> MemFree:        32462456 kB
>> MemAvailable:   32449704 kB
>> 
>> ss output:
>> 
>> Netid State      Recv-Q Send-Q
>>                                                            Local
>> Address:Port
>>                                            Peer Address:Port
>> nl    UNCONN     0      0
>> 
>> 0:0
>>                                                  *
>> nl    UNCONN     0      0
>> 
>> 4:0
>>                                                  *
>> nl    UNCONN     0      0
>> 
>> 10:0
>>                                                   *
>> nl    UNCONN     0      0
>> 
>> 11:0
>>                                                   *
>> nl    UNCONN     0      0
>> 
>> 15:0
>>                                                   *
>> nl    UNCONN     0      0
>> 
>> 16:0
>>                                                   *
>> nl    UNCONN     0      0
>> 
>> 18:0
>>                                                   *
>> nl    UNCONN     0      0
>> 
>> 21:0
>>                                                   *
>> tcp   UNCONN     0      0
>> 
>> *:49484
>>                                                 *:*
>>           skmem:(r0,rb2304,t0,tb212992,f0,w0,o0,bl0)
>> tcp   LISTEN     0      1280
>> 
>> *:8880
>>                                                 *:*
>>           skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
>> tcp   LISTEN     0      20
>> 
>> *:22
>>                                                 *:*
>>           skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
>> tcp   LISTEN     0      1
>> 
>> *:23
>>                                                 *:*
>>           skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
>> tcp   LISTEN     0      1280
>> 
>> *:443
>>                                                 *:*
>>           skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
>> tcp   FIN-WAIT-1 0      4545
>> 
>> 157.56.194.74:443
>>                                                   172.17.37.4:52473
>>           skmem:(r0,rb8192,t0,tb8192,f3072,w9216,o0,bl0)
>> tcp   ESTAB      0      3008
>> 
>> 10.0.22.22:22
>>                                             194.146.153.22:59156
>>           skmem:(r0,rb359040,t0,tb18432,f768,w11520,o0,bl0)
>> tcp   FIN-WAIT-1 0      1
>> 
>> 54.225.250.234:443
>>                                                  172.17.32.251:35969
>>           skmem:(r0,rb8192,t0,tb8192,f2816,w1280,o0,bl0)
>> tcp   FIN-WAIT-2 0      0
>> 
>> 206.53.159.203:443
>>                                                   172.17.76.10:54688
>> 
>> tcp   FIN-WAIT-2 0      0
>> 
>> 173.194.39.40:443
>>                                                 172.17.64.221:39482
>> 
>> tcp   FIN-WAIT-1 0      1
>> 
>> 69.171.235.48:443
>>                                                 172.17.32.251:52731
>>           skmem:(r0,rb8192,t0,tb8192,f4294966016,w1280,o0,bl0)
> 
> This looks bad :
> 
> f4294966016 : 0xFFFFFB00
> 
> There is a bug of some sort, that leads to overflows.


* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels?
  2014-07-18  8:12       ` Denys Fedoryshchenko
@ 2014-07-18  8:41         ` Eric Dumazet
  2014-09-23 20:14           ` Semen Slepov
       [not found]           ` <1405676214.10255.96.camel@edumazet-glaptop2.roam.corp.google.com>
  0 siblings, 2 replies; 25+ messages in thread
From: Eric Dumazet @ 2014-07-18  8:41 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: netdev, kaber, davem

On Fri, 2014-07-18 at 11:12 +0300, Denys Fedoryshchenko wrote:
> More updates. Now i found out how to see if tcp are under memory 
> pressure mode, i guess when it is active, speed might degrade. Now 
> almost all sockets are released, is it normal that tcp pressure still 
> "yes" ?
> 
> balancer-backup ~ # cat /proc/net/protocols
> protocol  size sockets  memory press maxhdr  slab module     cl co di ac 
> io in de sh ss gs se re sp bi br ha uh gp em
> PACKET    1216      0      -1   NI       0   no   kernel      n  n  n  n 
>   n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
> ALG        624      0       0   NI       0   no   kernel      n  n  n  n 
>   n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
> UNIX       832      1      -1   NI       0   yes  kernel      n  n  n  n 
>   n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
> UDP-Lite   800      0      -1   NI       0   yes  kernel      y  y  y  n 
>   y  y  y  n  y  y  y  y  y  n  y  y  y  y  n
> PING       760      0      -1   NI       0   yes  kernel      y  y  y  n 
>   n  y  n  n  y  y  y  y  n  y  y  y  y  y  n
> RAW        768      0      -1   NI       0   yes  kernel      y  y  y  n 
>   y  y  y  n  y  y  y  y  n  y  y  y  y  n  n
> UDP        800      1       0   NI       0   yes  kernel      y  y  y  n 
>   y  n  y  n  y  y  y  y  y  n  y  y  y  y  n
> TCP       1608     24  1823119   yes    208   yes  kernel      y  y  y  
> y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
> NETLINK    992      0      -1   NI       0   no   kernel      n  n  n  n 
>   n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
> balancer-backup ~ # cat /proc/net/sockstat
> sockets: used 51
> TCP: inuse 21 orphan 13 tw 2 alloc 21 mem 1823120
> UDP: inuse 1 mem 0
> UDPLITE: inuse 0
> RAW: inuse 0
> FRAG: inuse 0 memory


TCP believes it is under pressure because of an accounting bug.

If you change /proc/sys/net/ipv4/tcp_mem   (3rd value), it might say it
is not under pressure.

I am taking a look, thanks again for the report.
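
To make the thresholds concrete, here is a simplified model of the tcp_mem hysteresis as described in the kernel's ip-sysctl documentation (pressure mode is entered above the second value and left once usage falls back to the first). It is a sketch and not the kernel implementation; the struct and function names are hypothetical, and 4096-byte pages are assumed.

```c
#include <stdbool.h>

/* The three tcp_mem values, all in pages. */
struct tcp_mem_limits {
    long low;       /* at or below this, pressure mode is left */
    long pressure;  /* above this, pressure mode is entered */
    long high;      /* upper limit for all TCP sockets */
};

/* Return the next pressure state for a given accounted page count.
 * Between low and pressure the previous state is kept (hysteresis). */
static bool tcp_pressure_next(const struct tcp_mem_limits *lim,
                              long allocated_pages, bool was_under_pressure)
{
    if (allocated_pages > lim->pressure)
        return true;
    if (allocated_pages <= lim->low)
        return false;
    return was_under_pressure;
}

/* Convert accounted pages to bytes, assuming 4096-byte pages. */
static long long pages_to_bytes(long pages)
{
    return (long long)pages * 4096LL;
}
```

With the tcp_mem values from the first message (1767103 2045612 3068412) and mem 1823120, this model keeps whatever state it was already in, because 1823120 lies between the first two thresholds. The same 1823120 pages correspond to 7467499520 bytes.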


* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels?
  2014-07-18  8:41         ` Eric Dumazet
@ 2014-09-23 20:14           ` Semen Slepov
       [not found]           ` <1405676214.10255.96.camel@edumazet-glaptop2.roam.corp.google.com>
  1 sibling, 0 replies; 25+ messages in thread
From: Semen Slepov @ 2014-09-23 20:14 UTC (permalink / raw)
  To: netdev

Eric Dumazet <eric.dumazet <at> gmail.com> writes:

> 
> 
> tcp believes its under pressure because of some bug on accounting.
> 
> If you change /proc/sys/net/ipv4/tcp_mem   (3rd value), it might say it
> is not under pressure.
> 
> I am taking a look, thanks again for the report.
> 
> 

I still have this accounting bug on 3.15.7. Is there a fix in later kernel
releases (or a patch)?


* /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
       [not found]                         ` <8f138d88e3b850130eb9bb15e3e27069@visp.net.lb>
@ 2014-11-12 17:07                           ` Denys Fedoryshchenko
  2014-11-12 17:27                             ` Eric Dumazet
  0 siblings, 1 reply; 25+ messages in thread
From: Denys Fedoryshchenko @ 2014-11-12 17:07 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Hi


I've been able to trace the invalid memory value shown for sk_forward_alloc. 
I am not sure if it is related to the bug where the whole TCP stack is wrecked 
on high-load servers (I even had such a crash on a server running plain 
torrents, and not that many of them).
At first these values appeared in ss output as f4294966016, because ss prints 
the field as an unsigned integer, while in the kernel it is a signed integer. 
Should I provide a patch for iproute2?
After changing the value to the correct type, here is what I got:
          skmem:(r0,rb359040,t0,tb46080,f-1280,w1280,o0,bl0)
          skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
          skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
          skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
          skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
          skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
          skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
          skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
          skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
          skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
          skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
          skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
          skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
          skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
          skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
          skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
          skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
          skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
          skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
          skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)

So, as it is a signed integer, it actually appears a lot as -1280 bytes.
After placing several WARN_ON_ONCE checks in all functions involving 
sk_forward_alloc where a negative value may appear, I traced it down at 
least to tcp_connect_queue_skb().
[   13.094561] WARNING: CPU: 4 PID: 2855 at include/net/sock.h:1476 
tcp_connect_queue_skb+0x9f/0xd0()
After adding some debug values, it was confirmed:


skb_header_release(skb);
__tcp_add_write_queue_tail(sk, skb);
sk->sk_wmem_queued += skb->truesize;
/* sk->sk_forward_alloc is often 0 here and skb->truesize is 1280,
 * so the result is -1280 */
sk_mem_charge(sk, skb->truesize);
tp->write_seq = tcb->end_seq;
tp->packets_out += tcp_skb_pcount(skb);

Maybe it should not be subtracted here? Or maybe the logic is inverted all 
over the code?
Please help :)
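
The arithmetic described above can be reproduced in isolation. This is a toy sketch only: struct toy_sock and the toy_* functions are hypothetical stand-ins that mirror just the add and subtract on sk_forward_alloc, and they leave out everything else the real helpers do (such as the sk_has_account() check).

```c
#include <assert.h>

/* Toy stand-in for the single struct sock field discussed here. */
struct toy_sock {
    int sk_forward_alloc;   /* signed, so it can drop below zero */
};

/* Charging consumes part of the forward allocation. */
static void toy_mem_charge(struct toy_sock *sk, int size)
{
    assert(size >= 0);
    sk->sk_forward_alloc -= size;
}

/* Uncharging gives the same amount back. */
static void toy_mem_uncharge(struct toy_sock *sk, int size)
{
    assert(size >= 0);
    sk->sk_forward_alloc += size;
}
```

Starting from sk_forward_alloc == 0, toy_mem_charge(&sk, 1280) leaves -1280, which prints as 4294966016 when shown as an unsigned 32-bit value. A matching toy_mem_uncharge(&sk, 1280) brings it back to 0.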


* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
  2014-11-12 17:07                           ` /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug) Denys Fedoryshchenko
@ 2014-11-12 17:27                             ` Eric Dumazet
  2014-11-16  8:54                               ` Denys Fedoryshchenko
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2014-11-12 17:27 UTC (permalink / raw)
  To: Denys Fedoryshchenko, Neal Cardwell, Yuchung Cheng; +Cc: netdev

On Wed, 2014-11-12 at 19:07 +0200, Denys Fedoryshchenko wrote:
> Hi
> 
> 
> I've been able to trace invalid memory shown for sk_forward_alloc. Not 
> sure if it is related to bug when all tcp stack are wrecked on highload 
> servers (i had even such crash on server with plain torrents, not that 
> much of them).
> First they were appearing in ss output as f4294966016, but in tc it 
> shows as unsigned integer, while in kernel it is signed integer. Should 
> i provide patch for iproute2?
> After changing value to correct one, here is what is got:
>           skmem:(r0,rb359040,t0,tb46080,f-1280,w1280,o0,bl0)
>           skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>           skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>           skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>           skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
>           skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>           skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
>           skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>           skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
>           skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>           skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>           skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>           skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>           skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
>           skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
>           skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
>           skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
>           skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>           skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>           skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
> 
> So as it is signed integer, it is actually appears a lot as -1280 bytes.
> After placing several WARN_ON_ONCE on fall functions involving 
> sk_forward_alloc, where negative value may
> appear i traced down at least to tcp_connect_queue_skb().
> [   13.094561] WARNING: CPU: 4 PID: 2855 at include/net/sock.h:1476 
> tcp_connect_queue_skb+0x9f/0xd0()
> After adding some debug values, it was confirmed:
> 
> 
> skb_header_release(skb);
> __tcp_add_write_queue_tail(sk, skb);
> sk->sk_wmem_queued += skb->truesize;
> sk_mem_charge(sk, skb->truesize); <<<< sk->sk_forward_alloc often is 0, 
> and skb->truesize is 1280, so -1280 is result
> tp->write_seq = tcb->end_seq;
> tp->packets_out += tcp_skb_pcount(skb);
> 
> Maybe it should not be subtracted here? Or maybe logic are inverted all 
> over the code?
> Please help :)
> 

Thanks a lot for the report, I am CCing other experts ;)

AFAIK, a negative forward alloc should not be a problem, as long as the
incursion is bounded.


* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
  2014-11-12 17:27                             ` Eric Dumazet
@ 2014-11-16  8:54                               ` Denys Fedoryshchenko
  2014-11-16 18:11                                 ` Eric Dumazet
  0 siblings, 1 reply; 25+ messages in thread
From: Denys Fedoryshchenko @ 2014-11-16  8:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Neal Cardwell, Yuchung Cheng, netdev

Here are the latest findings from when servers go crazy because of invalid 
TCP memory accounting.
First of all, I upgraded the kernel to the latest version, 3.17.3, and also 
added a patch from the upcoming kernel,
"12) Don't call sock_kfree_s() with NULL pointers, this function also 
has the side effect of adjusting
the socket memory usage.  From Cong Wang.", but it didn't help.

I added printk_ratelimited to places where suspicious values might 
appear, and got some more information.
The first kind is not very suspicious; I have no idea if it is a problem:
[ 1413.031622] sk ffff8817184d8680 sk_mem_charge negative -10752 by 4352
[ 1413.032027] sk ffff8817184d8680 sk_mem_charge negative -15104 by 4352
[ 1415.768465] sk ffff881666842d80 sk_mem_charge negative -9984 by 4352
[ 1415.768868] sk ffff881666842d80 sk_mem_charge negative -14336 by 4352
[ 1415.769268] sk ffff881666842d80 sk_mem_charge negative -18688 by 4352
[ 1415.769681] sk ffff881666842d80 sk_mem_charge negative -9088 by 4352
[ 1418.933799] sk ffff8816dd640000 sk_mem_charge negative -9984 by 4352
[ 1418.934205] sk ffff8816dd640000 sk_mem_charge negative -14336 by 4352
[ 1418.934604] sk ffff8816dd640000 sk_mem_charge negative -18688 by 4352
[ 1427.131310] sk ffff881731801a00 sk_mem_charge negative -11776 by 4352
[ 1428.564640] sk ffff881731801a00 sk_mem_charge negative -11008 by 4352
[ 1429.134279] sk ffff881731801a00 sk_mem_charge negative -11776 by 4352
[ 1429.134691] sk ffff881731801a00 sk_mem_charge negative -16128 by 4352
[ 1430.666541] sk ffff881731801a00 sk_mem_charge negative -10496 by 4352
[ 1431.395099] sk ffff881731801a00 sk_mem_charge negative -12032 by 4352
[ 1431.395506] sk ffff881731801a00 sk_mem_charge negative -16384 by 4352
[ 1431.877862] sk ffff881731801a00 sk_mem_charge negative -11648 by 4352
The second message is always linked with crashes: it comes from
sk_mem_uncharge() when sk_forward_alloc is negative. The patch that
prints the message for sk_mem_uncharge() in sock.h is very simple:

  static inline void sk_mem_uncharge(struct sock *sk, int size)
@@ -1480,6 +1485,8 @@
         if (!sk_has_account(sk))
                 return;
         sk->sk_forward_alloc += size;
+       if (sk->sk_forward_alloc < -8192)
+           printk_ratelimited(KERN_WARNING "sk %p sk_mem_uncharge negative %d by %d\n", sk, sk->sk_forward_alloc, size);
  }
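
To make the direction of the accounting explicit, here is a minimal
user-space sketch. It is not kernel code: struct sock_model and the
model_* helpers are hypothetical stand-ins that keep only the sign
convention visible in this thread (charging subtracts from
sk_forward_alloc, uncharging adds to it).

```c
#include <assert.h>

/* Hypothetical stand-in for the single field of struct sock used here. */
struct sock_model {
    int forward_alloc; /* mirrors sk->sk_forward_alloc, a signed int */
};

/* Charging consumes previously reserved bytes: the counter goes down. */
void model_charge(struct sock_model *sk, int size)
{
    assert(size >= 0);
    sk->forward_alloc -= size;
}

/* Uncharging gives bytes back: the counter goes up. */
void model_uncharge(struct sock_model *sk, int size)
{
    assert(size >= 0);
    sk->forward_alloc += size;
}

/* Charge a buffer on a socket that has nothing reserved yet. */
int charge_unreserved(int truesize)
{
    struct sock_model sk = { 0 };

    model_charge(&sk, truesize);
    return sk.forward_alloc;
}

/* A matched charge/uncharge pair leaves the counter where it started. */
int charge_then_uncharge(int truesize)
{
    struct sock_model sk = { 0 };

    model_charge(&sk, truesize);
    model_uncharge(&sk, truesize);
    return sk.forward_alloc;
}
```

In this model charge_unreserved(1280) gives -1280, the same figure ss
reports. Because an uncharge can only raise the counter, a value close
to INT_MIN printed right after an uncharge cannot have been produced by
that uncharge from a sane starting point.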


This is what I usually get before the box reboots:
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.363437] sk ffff88155e904100 
sk_mem_uncharge negative -2147482496 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.363837] sk ffff88155e904100 
sk_mem_uncharge negative -2147480192 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.364232] sk ffff88155e904100 
sk_mem_uncharge negative -2147477888 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.364627] sk ffff88155e904100 
sk_mem_uncharge negative -2147475584 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.365022] sk ffff88155e904100 
sk_mem_uncharge negative -2147473280 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.365416] sk ffff88155e904100 
sk_mem_uncharge negative -2147470976 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.365811] sk ffff88155e904100 
sk_mem_uncharge negative -2147468672 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.366214] sk ffff88155e904100 
sk_mem_uncharge negative -2147466368 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.366611] sk ffff88155e904100 
sk_mem_uncharge negative -2147464064 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.367004] sk ffff88155e904100 
sk_mem_uncharge negative -2147461760 by 2304
Nov 15 13:20:58 g1 user.warn kernel: [ 2007.363601] sk_mem_uncharge: 
576170 callbacks suppressed
Nov 15 13:20:58 g1 user.warn kernel: [ 2007.364001] sk ffff88155e904100 
sk_mem_uncharge negative -1517794432 by 2304
.....
until
...
Nov 15 13:31:58 g1 user.warn kernel: [ 2666.393667] sk ffff88170141ad80 
sk_mem_uncharge negative -169088 by 2944
Nov 15 13:31:58 g1 user.warn kernel: [ 2666.394064] sk ffff88170141ad80 
sk_mem_uncharge negative -166144 by 2944
Nov 15 13:31:58 g1 user.warn kernel: [ 2666.394472] sk ffff88170141ad80 
sk_mem_uncharge negative -163200 by 2944
Nov 15 13:31:58 g1 user.warn kernel: [ 2666.394871] sk ffff88170141ad80 
sk_mem_uncharge negative -158208 by 4992
<reboot>

Or:
Nov 15 14:55:01 g1 user.warn kernel: [ 1965.758129] sk ffff8815f8014e00 
sk_mem_uncharge negative -2147483520 by 2304
Nov 15 14:55:01 g1 user.warn kernel: [ 1965.758536] sk ffff8815f8014e00 
sk_mem_uncharge negative -2147481216 by 2304
Nov 15 14:55:01 g1 user.warn kernel: [ 1965.758935] sk ffff8815f8014e00 
sk_mem_uncharge negative -2147478912 by 2304
Nov 15 14:55:01 g1 user.warn kernel: [ 1965.759332] sk ffff8815f8014e00 
sk_mem_uncharge negative -2147476608 by 2304
Nov 15 14:55:01 g1 user.warn kernel: [ 1965.759728] sk ffff8815f8014e00 
sk_mem_uncharge negative -2147474304 by 2304
Nov 15 14:55:01 g1 user.warn kernel: [ 1965.760124] sk ffff8815f8014e00 
sk_mem_uncharge negative -2147472000 by 2304
Nov 15 14:55:01 g1 user.warn kernel: [ 1965.760518] sk ffff8815f8014e00 
sk_mem_uncharge negative -2147469696 by 2304
...
Nov 15 15:03:27 g1 user.warn kernel: [ 2472.170857] sk ffff88163f429a00 
sk_mem_uncharge negative -277169024 by 896
Nov 15 15:03:30 g1 user.warn kernel: [ 2474.611109] sk ffff88163f429a00 
sk_mem_uncharge negative -277164928 by 896
Nov 15 15:03:30 g1 user.warn kernel: [ 2474.611511] sk ffff88163f429a00 
sk_mem_uncharge negative -277164032 by 896
Nov 15 15:03:30 g1 user.warn kernel: [ 2474.706237] sk ffff88163f429a00 
sk_mem_uncharge negative -277159936 by 2304
Nov 15 15:03:34 g1 user.warn kernel: [ 2478.889110] sk ffff88163f429a00 
sk_mem_uncharge negative -277155840 by 2304
Nov 15 15:03:34 g1 user.warn kernel: [ 2478.911516] sk ffff88163f429a00 
sk_mem_uncharge negative -277147648 by 1536
Nov 15 15:03:35 g1 user.warn kernel: [ 2479.320851] sk ffff88163f429a00 
sk_mem_uncharge negative -277143552 by 2304
Nov 15 15:04:33 g1 user.warn kernel: [ 2537.408184] sk ffff88163f429a00 
sk_mem_uncharge negative -277139456 by 2304
Nov 15 15:04:33 g1 user.warn kernel: [ 2537.409790] sk ffff88163f429a00 
sk_mem_uncharge negative -277135360 by 896
(here I can see two sks in this state at the same time)


Or

Nov 15 23:37:19 g1 user.warn kernel: [14137.049570] sk ffff8816b3312700 
sk_mem_uncharge negative -2147481856 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.049973] sk ffff8816b3312700 
sk_mem_uncharge negative -2147479552 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.050372] sk ffff8816b3312700 
sk_mem_uncharge negative -2147477248 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.050770] sk ffff8816b3312700 
sk_mem_uncharge negative -2147474944 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.051170] sk ffff8816b3312700 
sk_mem_uncharge negative -2147472640 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.051572] sk ffff8816b3312700 
sk_mem_uncharge negative -2147470336 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.051971] sk ffff8816b3312700 
sk_mem_uncharge negative -2147468032 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.052371] sk ffff8816b3312700 
sk_mem_uncharge negative -2147465728 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.052771] sk ffff8816b3312700 
sk_mem_uncharge negative -2147463424 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.053169] sk ffff8816b3312700 
sk_mem_uncharge negative -2147461120 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.526984] sk_mem_uncharge: 
1083519 callbacks suppressed
Nov 15 23:37:32 g1 user.warn kernel: [14150.527384] sk ffff8816b3312700 
sk_mem_uncharge negative -2147483392 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.527782] sk ffff8816b3312700 
sk_mem_uncharge negative -2147481088 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.528179] sk ffff8816b3312700 
sk_mem_uncharge negative -2147478784 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.528576] sk ffff8816b3312700 
sk_mem_uncharge negative -2147476480 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.528975] sk ffff8816b3312700 
sk_mem_uncharge negative -2147474176 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.529370] sk ffff8816b3312700 
sk_mem_uncharge negative -2147471872 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.529770] sk ffff8816b3312700 
sk_mem_uncharge negative -2147469568 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.530167] sk ffff8816b3312700 
sk_mem_uncharge negative -2147467264 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.530572] sk ffff8816b3312700 
sk_mem_uncharge negative -2147464960 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.530973] sk ffff8816b3312700 
sk_mem_uncharge negative -2147462656 by 2304
<here it reboots much earlier, at this point>

I will try to get more info about what kind of socket it is.

On 2014-11-12 19:27, Eric Dumazet wrote:
> On Wed, 2014-11-12 at 19:07 +0200, Denys Fedoryshchenko wrote:
>> Hi
>> 
>> 
>> I've been able to trace invalid memory shown for sk_forward_alloc. Not
>> sure if it is related to bug when all tcp stack are wrecked on 
>> highload
>> servers (i had even such crash on server with plain torrents, not that
>> much of them).
>> First they were appearing in ss output as f4294966016, but in tc it
>> shows as unsigned integer, while in kernel it is signed integer. 
>> Should
>> i provide patch for iproute2?
>> After changing value to correct one, here is what is got:
>>           skmem:(r0,rb359040,t0,tb46080,f-1280,w1280,o0,bl0)
>>           skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>>           skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>>           skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>>           skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
>>           skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>>           skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
>>           skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>>           skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
>>           skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>>           skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>>           skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>>           skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>>           skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
>>           skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
>>           skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
>>           skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
>>           skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>>           skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>>           skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
>> 
>> So as it is signed integer, it is actually appears a lot as -1280 
>> bytes.
>> After placing several WARN_ON_ONCE on fall functions involving
>> sk_forward_alloc, where negative value may
>> appear i traced down at least to tcp_connect_queue_skb().
>> [   13.094561] WARNING: CPU: 4 PID: 2855 at include/net/sock.h:1476
>> tcp_connect_queue_skb+0x9f/0xd0()
>> After adding some debug values, it was confirmed:
>> 
>> 
>> skb_header_release(skb);
>> __tcp_add_write_queue_tail(sk, skb);
>> sk->sk_wmem_queued += skb->truesize;
>> sk_mem_charge(sk, skb->truesize); <<<< sk->sk_forward_alloc often is 
>> 0,
>> and skb->truesize is 1280, so -1280 is result
>> tp->write_seq = tcb->end_seq;
>> tp->packets_out += tcp_skb_pcount(skb);
>> 
>> Maybe it should not be subtracted here? Or maybe logic are inverted 
>> all
>> over the code?
>> Please help :)
>> 
> 
> Thanks a lot for the report, I am CCing other experts ;)
> 
> AFAIK, a negative forward alloc should not be a problem, a long as the
> incursion is bound.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
  2014-11-16  8:54                               ` Denys Fedoryshchenko
@ 2014-11-16 18:11                                 ` Eric Dumazet
  2014-11-16 19:05                                   ` Denys Fedoryshchenko
  2014-11-16 19:10                                   ` /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug) Eric Dumazet
  0 siblings, 2 replies; 25+ messages in thread
From: Eric Dumazet @ 2014-11-16 18:11 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: Neal Cardwell, Yuchung Cheng, netdev

On Sun, 2014-11-16 at 10:54 +0200, Denys Fedoryshchenko wrote:
> As latest findings, when servers are going crazy because of tcp memory 
> invalid accounting.
> First of all i upgraded kernel to latest version 3.17.3 and added also 
> patch from upcoming kernel,
> "12) Don't call sock_kfree_s() with NULL pointers, this function also 
> has the side effect of adjusting
> the socket memory usage.  From Cong Wang.", but it didnt helped.
> 
> I added printk_ratelimited to places where suspicious values might 
> appear, and got some more information.
> First, is not very suspicious, no idea if it is a problem:
> [ 1413.031622] sk ffff8817184d8680 sk_mem_charge negative -10752 by 4352
> [ 1413.032027] sk ffff8817184d8680 sk_mem_charge negative -15104 by 4352
> [ 1415.768465] sk ffff881666842d80 sk_mem_charge negative -9984 by 4352
> [ 1415.768868] sk ffff881666842d80 sk_mem_charge negative -14336 by 4352
> [ 1415.769268] sk ffff881666842d80 sk_mem_charge negative -18688 by 4352
> [ 1415.769681] sk ffff881666842d80 sk_mem_charge negative -9088 by 4352
> [ 1418.933799] sk ffff8816dd640000 sk_mem_charge negative -9984 by 4352
> [ 1418.934205] sk ffff8816dd640000 sk_mem_charge negative -14336 by 4352
> [ 1418.934604] sk ffff8816dd640000 sk_mem_charge negative -18688 by 4352
> [ 1427.131310] sk ffff881731801a00 sk_mem_charge negative -11776 by 4352
> [ 1428.564640] sk ffff881731801a00 sk_mem_charge negative -11008 by 4352
> [ 1429.134279] sk ffff881731801a00 sk_mem_charge negative -11776 by 4352
> [ 1429.134691] sk ffff881731801a00 sk_mem_charge negative -16128 by 4352
> [ 1430.666541] sk ffff881731801a00 sk_mem_charge negative -10496 by 4352
> [ 1431.395099] sk ffff881731801a00 sk_mem_charge negative -12032 by 4352
> [ 1431.395506] sk ffff881731801a00 sk_mem_charge negative -16384 by 4352
> [ 1431.877862] sk ffff881731801a00 sk_mem_charge negative -11648 by 4352
> Second is always linked with crashes, it is sk_mem_uncharge and 
> sk_forward_alloc goes negative. Patch to show message
> for sk_mem_uncharge in sock.h is very simple:
> 
>   static inline void sk_mem_uncharge(struct sock *sk, int size)
> @@ -1480,6 +1485,8 @@
>          if (!sk_has_account(sk))
>                  return;
>          sk->sk_forward_alloc += size;
> +       if (sk->sk_forward_alloc < -8192)
> +           printk_ratelimited(KERN_WARNING"sk %p sk_mem_uncharge 
> negative %d by %d\n", sk, sk->sk_forward_alloc, size);
>   }
> 


Could you describe your hardware setup and networking setup ?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
  2014-11-16 18:11                                 ` Eric Dumazet
@ 2014-11-16 19:05                                   ` Denys Fedoryshchenko
  2014-11-16 20:16                                     ` Eric Dumazet
  2014-11-16 19:10                                   ` /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug) Eric Dumazet
  1 sibling, 1 reply; 25+ messages in thread
From: Denys Fedoryshchenko @ 2014-11-16 19:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Neal Cardwell, Yuchung Cheng, netdev

On 2014-11-16 20:11, Eric Dumazet wrote:
> On Sun, 2014-11-16 at 10:54 +0200, Denys Fedoryshchenko wrote:
>> As latest findings, when servers are going crazy because of tcp memory
>> invalid accounting.
>> First of all i upgraded kernel to latest version 3.17.3 and added also
>> patch from upcoming kernel,
>> "12) Don't call sock_kfree_s() with NULL pointers, this function also
>> has the side effect of adjusting
>> the socket memory usage.  From Cong Wang.", but it didnt helped.
>> 
>> I added printk_ratelimited to places where suspicious values might
>> appear, and got some more information.
>> First, is not very suspicious, no idea if it is a problem:
>> [ 1413.031622] sk ffff8817184d8680 sk_mem_charge negative -10752 by 
>> 4352
>> [ 1413.032027] sk ffff8817184d8680 sk_mem_charge negative -15104 by 
>> 4352
>> [ 1415.768465] sk ffff881666842d80 sk_mem_charge negative -9984 by 
>> 4352
>> [ 1415.768868] sk ffff881666842d80 sk_mem_charge negative -14336 by 
>> 4352
>> [ 1415.769268] sk ffff881666842d80 sk_mem_charge negative -18688 by 
>> 4352
>> [ 1415.769681] sk ffff881666842d80 sk_mem_charge negative -9088 by 
>> 4352
>> [ 1418.933799] sk ffff8816dd640000 sk_mem_charge negative -9984 by 
>> 4352
>> [ 1418.934205] sk ffff8816dd640000 sk_mem_charge negative -14336 by 
>> 4352
>> [ 1418.934604] sk ffff8816dd640000 sk_mem_charge negative -18688 by 
>> 4352
>> [ 1427.131310] sk ffff881731801a00 sk_mem_charge negative -11776 by 
>> 4352
>> [ 1428.564640] sk ffff881731801a00 sk_mem_charge negative -11008 by 
>> 4352
>> [ 1429.134279] sk ffff881731801a00 sk_mem_charge negative -11776 by 
>> 4352
>> [ 1429.134691] sk ffff881731801a00 sk_mem_charge negative -16128 by 
>> 4352
>> [ 1430.666541] sk ffff881731801a00 sk_mem_charge negative -10496 by 
>> 4352
>> [ 1431.395099] sk ffff881731801a00 sk_mem_charge negative -12032 by 
>> 4352
>> [ 1431.395506] sk ffff881731801a00 sk_mem_charge negative -16384 by 
>> 4352
>> [ 1431.877862] sk ffff881731801a00 sk_mem_charge negative -11648 by 
>> 4352
>> Second is always linked with crashes, it is sk_mem_uncharge and
>> sk_forward_alloc goes negative. Patch to show message
>> for sk_mem_uncharge in sock.h is very simple:
>> 
>>   static inline void sk_mem_uncharge(struct sock *sk, int size)
>> @@ -1480,6 +1485,8 @@
>>          if (!sk_has_account(sk))
>>                  return;
>>          sk->sk_forward_alloc += size;
>> +       if (sk->sk_forward_alloc < -8192)
>> +           printk_ratelimited(KERN_WARNING"sk %p sk_mem_uncharge
>> negative %d by %d\n", sk, sk->sk_forward_alloc, size);
>>   }
>> 
> 
> 
> Could you describe your hardware setup and networking setup ?
This problem is happening on multiple different units that I am using
as HTTPS balancers, and all of them are very different (except that
they all have Intel CPUs, and even those are of different generations
and models). The problem seems to happen on all of them and doesn't
seem to depend on hardware (networking: igb, e1000e, Broadcom stuff,
all affected). But in case it is important:
S2600GZ motherboard, one E5-2620 Xeon
networking - onboard igb, 2 ports used
100GB RAM
This particular one has bonding (but it seems to crash with or without
it).

The system is custom, running from USB flash, a busybox+glibc based
setup; a similar OS works for other purposes (NAT, PPPoE termination)
without any issues.

What is common between failing units:

I am using a haproxy-based HTTPS balancer (also, as I remember, haproxy
does a lot of setsockopt stuff), which is handling right now:
     454444 connections established
Bandwidth passing through is around 1Gbps.

I'm disabling tso/gso/gro on all interfaces.

The way I am forwarding transparent traffic to haproxy:
iptables -t mangle -A PREROUTING -p tcp --sport 443 -j MARK --set-mark 0x1
iptables -t mangle -A PREROUTING -p tcp --dport 443 -j MARK --set-mark 0x1
ip rule add fwmark 0x1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100

"Typical" setup is

backend ssl_passthru
         mode tcp
         option transparent
         source 0.0.0.0 usesrc clientip

frontend ssl-in
         mode tcp
         bind    :443 transparent
         default_backend ssl_passthru
         option tcp-smart-accept

I hope I didn't miss anything important. I can provide remote ssh
access to it.
I will keep sending info, in the hope that some of it will give an idea
of what I should patch or test.

P.S. I just got an idea: values near -2147483648 hint that an integer
overflow is happening somewhere, from a very large positive value to a
negative one. I will now try to set triggers for that as well.
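
The overflow idea can be checked against the logged numbers with a
small user-space sketch. It is only an illustration: wrap_add32() and
replay_uncharges() are hypothetical helpers, and the starting value is
simply INT32_MAX - 1151, chosen so that one uncharge of 2304 crosses
the limit. The addition is done in unsigned arithmetic because signed
overflow is undefined in ISO C.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit addition with explicit two's-complement wraparound. */
int32_t wrap_add32(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a + (uint32_t)b);
}

/* Apply `count` uncharges of `size` bytes to a counter starting at `start`. */
int32_t replay_uncharges(int32_t start, int32_t size, int count)
{
    assert(count >= 0);
    while (count-- > 0)
        start = wrap_add32(start, size);
    return start;
}
```

Starting from INT32_MAX - 1151, one uncharge of 2304 gives -2147482496
and a second gives -2147480192, the first two values in the 13:20:53
log. That is consistent with a counter that kept growing through
unmatched uncharges until it wrapped, although it does not prove it.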

If required, I can provide an image of such a system. I am not sure
whether you are interested in this problem or whether it can be
reproduced on a synthetic setup, but as I remember this memory leak
also happened to me once on a normal server with torrents (I left some
image unattended for 2 weeks, with a lot of requests, and it crashed at
the end), so it might also affect other use cases.
I am now trying to limit socket buffers, to see if that decreases the
frequency of crashes.
I also tried putting "canary" values inside the structure, next to
sk_forward_alloc, to see if any sort of memory corruption is occurring
on sk_forward_alloc, but there seems to be no corruption.
I will also try going back to stable kernel 3.2.64 to see if that fixes
the problem, but testing sometimes takes almost 1 day, depending on
luck.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
  2014-11-16 18:11                                 ` Eric Dumazet
  2014-11-16 19:05                                   ` Denys Fedoryshchenko
@ 2014-11-16 19:10                                   ` Eric Dumazet
  1 sibling, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2014-11-16 19:10 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: Neal Cardwell, Yuchung Cheng, netdev

On Sun, 2014-11-16 at 10:11 -0800, Eric Dumazet wrote:

> Could you describe your hardware setup and networking setup ?

Hmmm, this looks like a bug in TCP fast open.

I am cooking a patch.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
  2014-11-16 19:05                                   ` Denys Fedoryshchenko
@ 2014-11-16 20:16                                     ` Eric Dumazet
  2014-11-17  5:01                                       ` Eric Dumazet
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2014-11-16 20:16 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: Neal Cardwell, Yuchung Cheng, netdev

On Sun, 2014-11-16 at 21:05 +0200, Denys Fedoryshchenko wrote:
> On 2014-11-16 20:11, Eric Dumazet wrote:
> > On Sun, 2014-11-16 at 10:54 +0200, Denys Fedoryshchenko wrote:
> >> As latest findings, when servers are going crazy because of tcp memory
> >> invalid accounting.
> >> First of all i upgraded kernel to latest version 3.17.3 and added also
> >> patch from upcoming kernel,
> >> "12) Don't call sock_kfree_s() with NULL pointers, this function also
> >> has the side effect of adjusting
> >> the socket memory usage.  From Cong Wang.", but it didnt helped.
> >> 
> >> I added printk_ratelimited to places where suspicious values might
> >> appear, and got some more information.
> >> First, is not very suspicious, no idea if it is a problem:
> >> [ 1413.031622] sk ffff8817184d8680 sk_mem_charge negative -10752 by 
> >> 4352
> >> [ 1413.032027] sk ffff8817184d8680 sk_mem_charge negative -15104 by 
> >> 4352
> >> [ 1415.768465] sk ffff881666842d80 sk_mem_charge negative -9984 by 
> >> 4352
> >> [ 1415.768868] sk ffff881666842d80 sk_mem_charge negative -14336 by 
> >> 4352
> >> [ 1415.769268] sk ffff881666842d80 sk_mem_charge negative -18688 by 
> >> 4352
> >> [ 1415.769681] sk ffff881666842d80 sk_mem_charge negative -9088 by 
> >> 4352
> >> [ 1418.933799] sk ffff8816dd640000 sk_mem_charge negative -9984 by 
> >> 4352
> >> [ 1418.934205] sk ffff8816dd640000 sk_mem_charge negative -14336 by 
> >> 4352
> >> [ 1418.934604] sk ffff8816dd640000 sk_mem_charge negative -18688 by 
> >> 4352
> >> [ 1427.131310] sk ffff881731801a00 sk_mem_charge negative -11776 by 
> >> 4352
> >> [ 1428.564640] sk ffff881731801a00 sk_mem_charge negative -11008 by 
> >> 4352
> >> [ 1429.134279] sk ffff881731801a00 sk_mem_charge negative -11776 by 
> >> 4352
> >> [ 1429.134691] sk ffff881731801a00 sk_mem_charge negative -16128 by 
> >> 4352
> >> [ 1430.666541] sk ffff881731801a00 sk_mem_charge negative -10496 by 
> >> 4352
> >> [ 1431.395099] sk ffff881731801a00 sk_mem_charge negative -12032 by 
> >> 4352
> >> [ 1431.395506] sk ffff881731801a00 sk_mem_charge negative -16384 by 
> >> 4352
> >> [ 1431.877862] sk ffff881731801a00 sk_mem_charge negative -11648 by 
> >> 4352
> >> Second is always linked with crashes, it is sk_mem_uncharge and
> >> sk_forward_alloc goes negative. Patch to show message
> >> for sk_mem_uncharge in sock.h is very simple:
> >> 
> >>   static inline void sk_mem_uncharge(struct sock *sk, int size)
> >> @@ -1480,6 +1485,8 @@
> >>          if (!sk_has_account(sk))
> >>                  return;
> >>          sk->sk_forward_alloc += size;
> >> +       if (sk->sk_forward_alloc < -8192)
> >> +           printk_ratelimited(KERN_WARNING"sk %p sk_mem_uncharge
> >> negative %d by %d\n", sk, sk->sk_forward_alloc, size);
> >>   }
> >> 
> > 
> > 
> > Could you describe your hardware setup and networking setup ?
> This problem are happening on multiple different units that i am using 
> as https balancers, and all of them very different (except it is all 
> Intel CPU's, but even in that - different generations and models). Such 
> problem seems happens on all of them, and seems doesn't depend on 
> hardware (networking - igb, e1000e, broadcom stuff - all affected). But 
> if it is important:
> S2600GZ motherboard, one E5-2620 Xeon
> networking - onboard igb, 2 ports used
> 100GB RAM
> This particular one has bonding (but it seems crashes with or without 
> it).
> 
> System are custom, running on USB flash, busybox+glibc based setup, 
> similar OS working for other purposes for NAT, PPPoE termination without 
> any issues.
> 
> What is common between failing units:
> 
> I am using haproxy-based HTTPS balancer(Also as i remember haproxy doing 
> a lot of setsockopt stuff), that is handling right now:
>      454444 connections established
> Bandwidth passing thru is around 1Gbps.
> 
> I'm disabling tso/gso/gro on all interfaces.
> 
> The way i am forwarding transparent traffic to haproxy:
> iptables -t mangle -A PREROUTING -p tcp --sport 443 -j MARK --set-mark 
> 0x1
> iptables -t mangle -A PREROUTING -p tcp --dport 443 -j MARK --set-mark 
> 0x1
> ip rule add fwmark 0x1 lookup 100
> ip route add local 0.0.0.0/0 dev lo table 100
> 
> "Typical" setup is
> 
> backend ssl_passthru
>          mode tcp
>          option transparent
>          source 0.0.0.0 usesrc clientip
> 
> frontend ssl-in
>          mode tcp
>          bind    :443 transparent
>          default_backend ssl_passthru
>          option tcp-smart-accept
> 
> I hope i didnt missed something important. I can provide remote ssh 
> access to it.
> I will keep sending info, just with hope that some of info maybe will 
> give idea, what i should patch or test.
> 
> P.S. Just got an idea now, that -2147483648 hinting that somewhere is 
> happening integer overflow from very large positive value, to negative. 
> I will try to set triggers also to that now.
> 
> If required i can provide image with such system. I am not sure you are 
> interested in this problem and if it can be reproduced on synthetic 
> setup, but as i remember this memory leak happened with me once also on 
> normal server with torrents (i left some image unattended for 2 weeks, 
> with a lot of requests, and it crashed at the end), so it might affect 
> also other use cases.
> I am trying to limit now socket buffers, to see if it will decrease 
> frequency of crashes.
> Also i tried to put "canary" values inside structure, near 
> sk_forward_alloc , to see if there is any sort of memory corruption 
> occuring on sk_forward_alloc, but seems there is no corruption.
> I will try also going back to stable kernels 3.2.64, to see if it will 
> fix this problem, but testing takes sometimes almost 1 day, depends on 
> luck.

Thanks Denys !

Could you try following patch ?

Thanks !

 net/ipv4/tcp_output.c |   33 +++++++++++----------------------
 1 file changed, 11 insertions(+), 22 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a3d453b94747..877eb4aa05a6 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2998,7 +2998,7 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct tcp_fastopen_request *fo = tp->fastopen_req;
-	int syn_loss = 0, space, i, err = 0, iovlen = fo->data->msg_iovlen;
+	int syn_loss = 0, space, err = 0;
 	struct sk_buff *syn_data = NULL, *data;
 	unsigned long last_syn_loss = 0;
 
@@ -3031,25 +3031,17 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 	/* limit to order-0 allocations */
 	space = min_t(size_t, space, SKB_MAX_HEAD(MAX_TCP_HEADER));
 
-	syn_data = skb_copy_expand(syn, MAX_TCP_HEADER, space,
-				   sk->sk_allocation);
-	if (syn_data == NULL)
+	syn_data = sk_stream_alloc_skb(sk, space, sk->sk_allocation);
+	if (!syn_data)
 		goto fallback;
 
-	for (i = 0; i < iovlen && syn_data->len < space; ++i) {
-		struct iovec *iov = &fo->data->msg_iov[i];
-		unsigned char __user *from = iov->iov_base;
-		int len = iov->iov_len;
-
-		if (syn_data->len + len > space)
-			len = space - syn_data->len;
-		else if (i + 1 == iovlen)
-			/* No more data pending in inet_wait_for_connect() */
-			fo->data = NULL;
+	memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
+	if (memcpy_fromiovec(skb_put(syn_data, space), fo->data->msg_iov, space))
+		goto fallback;
 
-		if (skb_add_data(syn_data, from, len))
-			goto fallback;
-	}
+	/* No more data pending in inet_wait_for_connect() */
+	if (space == fo->size)
+		fo->data = NULL;
 
 	/* Queue a data-only packet after the regular SYN for retransmission */
 	data = pskb_copy(syn_data, sk->sk_allocation);
@@ -3101,13 +3093,10 @@ int tcp_connect(struct sock *sk)
 		return 0;
 	}
 
-	buff = alloc_skb_fclone(MAX_TCP_HEADER + 15, sk->sk_allocation);
-	if (unlikely(buff == NULL))
+	buff = sk_stream_alloc_skb(sk, 0, sk->sk_allocation);
+	if (unlikely(!buff))
 		return -ENOBUFS;
 
-	/* Reserve space for headers. */
-	skb_reserve(buff, MAX_TCP_HEADER);
-
 	tcp_init_nondata_skb(buff, tp->write_seq++, TCPHDR_SYN);
 	tp->retrans_stamp = tcp_time_stamp;
 	tcp_connect_queue_skb(sk, buff);

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
  2014-11-16 20:16                                     ` Eric Dumazet
@ 2014-11-17  5:01                                       ` Eric Dumazet
  2014-11-17 10:22                                         ` Denys Fedoryshchenko
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2014-11-17  5:01 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: Neal Cardwell, Yuchung Cheng, netdev

On Sun, 2014-11-16 at 12:16 -0800, Eric Dumazet wrote:

> Thanks Denys !
> 
> Could you try following patch ?
> 
> Thanks !

Hmm.... I have an updated patch, sorry.

(A memcpy_fromiovec() has to be memcpy_fromiovecend() )

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a3d453b94747..c2bbfcd9c0db 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2998,7 +2998,7 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct tcp_fastopen_request *fo = tp->fastopen_req;
-	int syn_loss = 0, space, i, err = 0, iovlen = fo->data->msg_iovlen;
+	int syn_loss = 0, space, err = 0;
 	struct sk_buff *syn_data = NULL, *data;
 	unsigned long last_syn_loss = 0;
 
@@ -3031,25 +3031,19 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 	/* limit to order-0 allocations */
 	space = min_t(size_t, space, SKB_MAX_HEAD(MAX_TCP_HEADER));
 
-	syn_data = skb_copy_expand(syn, MAX_TCP_HEADER, space,
-				   sk->sk_allocation);
-	if (syn_data == NULL)
+	syn_data = sk_stream_alloc_skb(sk, space, sk->sk_allocation);
+	if (!syn_data)
 		goto fallback;
 
-	for (i = 0; i < iovlen && syn_data->len < space; ++i) {
-		struct iovec *iov = &fo->data->msg_iov[i];
-		unsigned char __user *from = iov->iov_base;
-		int len = iov->iov_len;
-
-		if (syn_data->len + len > space)
-			len = space - syn_data->len;
-		else if (i + 1 == iovlen)
-			/* No more data pending in inet_wait_for_connect() */
-			fo->data = NULL;
+	syn_data->ip_summed = CHECKSUM_PARTIAL;
+	memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
+	if (memcpy_fromiovecend(skb_put(syn_data, space),
+				fo->data->msg_iov, 0, space))
+		goto fallback;
 
-		if (skb_add_data(syn_data, from, len))
-			goto fallback;
-	}
+	/* No more data pending in inet_wait_for_connect() */
+	if (space == fo->size)
+		fo->data = NULL;
 
 	/* Queue a data-only packet after the regular SYN for retransmission */
 	data = pskb_copy(syn_data, sk->sk_allocation);
@@ -3101,13 +3095,10 @@ int tcp_connect(struct sock *sk)
 		return 0;
 	}
 
-	buff = alloc_skb_fclone(MAX_TCP_HEADER + 15, sk->sk_allocation);
-	if (unlikely(buff == NULL))
+	buff = sk_stream_alloc_skb(sk, 0, sk->sk_allocation);
+	if (unlikely(!buff))
 		return -ENOBUFS;
 
-	/* Reserve space for headers. */
-	skb_reserve(buff, MAX_TCP_HEADER);
-
 	tcp_init_nondata_skb(buff, tp->write_seq++, TCPHDR_SYN);
 	tp->retrans_stamp = tcp_time_stamp;
 	tcp_connect_queue_skb(sk, buff);

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
  2014-11-17  5:01                                       ` Eric Dumazet
@ 2014-11-17 10:22                                         ` Denys Fedoryshchenko
  2014-11-17 15:33                                           ` Denys Fedoryshchenko
  0 siblings, 1 reply; 25+ messages in thread
From: Denys Fedoryshchenko @ 2014-11-17 10:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Neal Cardwell, Yuchung Cheng, netdev

On 2014-11-17 07:01, Eric Dumazet wrote:
> On Sun, 2014-11-16 at 12:16 -0800, Eric Dumazet wrote:
> 
>> Thanks Denys !
>> 
>> Could you try following patch ?
>> 
>> Thanks !
> 
> Hmm.... I have an updated patch, sorry.
> 
> (A memcpy_fromiovec() has to be memcpy_fromiovecend() )
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index a3d453b94747..c2bbfcd9c0db 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2998,7 +2998,7 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
>  {
>  	struct tcp_sock *tp = tcp_sk(sk);
>  	struct tcp_fastopen_request *fo = tp->fastopen_req;
> -	int syn_loss = 0, space, i, err = 0, iovlen = fo->data->msg_iovlen;
> +	int syn_loss = 0, space, err = 0;
>  	struct sk_buff *syn_data = NULL, *data;
>  	unsigned long last_syn_loss = 0;
> 
> @@ -3031,25 +3031,19 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
>  	/* limit to order-0 allocations */
>  	space = min_t(size_t, space, SKB_MAX_HEAD(MAX_TCP_HEADER));
> 
> -	syn_data = skb_copy_expand(syn, MAX_TCP_HEADER, space,
> -				   sk->sk_allocation);
> -	if (syn_data == NULL)
> +	syn_data = sk_stream_alloc_skb(sk, space, sk->sk_allocation);
> +	if (!syn_data)
>  		goto fallback;
> 
> -	for (i = 0; i < iovlen && syn_data->len < space; ++i) {
> -		struct iovec *iov = &fo->data->msg_iov[i];
> -		unsigned char __user *from = iov->iov_base;
> -		int len = iov->iov_len;
> -
> -		if (syn_data->len + len > space)
> -			len = space - syn_data->len;
> -		else if (i + 1 == iovlen)
> -			/* No more data pending in inet_wait_for_connect() */
> -			fo->data = NULL;
> +	syn_data->ip_summed = CHECKSUM_PARTIAL;
> +	memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
> +	if (memcpy_fromiovecend(skb_put(syn_data, space),
> +				fo->data->msg_iov, 0, space))
> +		goto fallback;
> 
> -		if (skb_add_data(syn_data, from, len))
> -			goto fallback;
> -	}
> +	/* No more data pending in inet_wait_for_connect() */
> +	if (space == fo->size)
> +		fo->data = NULL;
> 
>  	/* Queue a data-only packet after the regular SYN for retransmission */
>  	data = pskb_copy(syn_data, sk->sk_allocation);
> @@ -3101,13 +3095,10 @@ int tcp_connect(struct sock *sk)
>  		return 0;
>  	}
> 
> -	buff = alloc_skb_fclone(MAX_TCP_HEADER + 15, sk->sk_allocation);
> -	if (unlikely(buff == NULL))
> +	buff = sk_stream_alloc_skb(sk, 0, sk->sk_allocation);
> +	if (unlikely(!buff))
>  		return -ENOBUFS;
> 
> -	/* Reserve space for headers. */
> -	skb_reserve(buff, MAX_TCP_HEADER);
> -
>  	tcp_init_nondata_skb(buff, tp->write_seq++, TCPHDR_SYN);
>  	tp->retrans_stamp = tcp_time_stamp;
>  	tcp_connect_queue_skb(sk, buff);

I installed the patch, but I will have to wait a while (usually at least
24 hours) to be sure it is stable.

Thanks a lot!

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
  2014-11-17 10:22                                         ` Denys Fedoryshchenko
@ 2014-11-17 15:33                                           ` Denys Fedoryshchenko
       [not found]                                             ` <1416240651.5102.10.camel@edumazet-glaptop2.roam.corp.google.com>
  0 siblings, 1 reply; 25+ messages in thread
From: Denys Fedoryshchenko @ 2014-11-17 15:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Neal Cardwell, Yuchung Cheng, netdev

On 2014-11-17 12:22, Denys Fedoryshchenko wrote:
>> Hmm.... I have an updated patch, sorry.
>> 
...
> 
> Installed patch, but will have to wait a while (usually at least
> 24hours), to be sure if it is stable.
> 
> Thanks a lot!
I tried the updated patch. It seems to crash the same way after a while,
and on the second test I noticed the same value overflow.

While debugging, I added an alert for sk_forward_alloc > 1147483648 and
noticed that on some sockets this value keeps increasing until it
overflows.
I can provide logs if that is of interest.

I will try setting the fastopen sysctl to zero, to see whether it changes
anything.
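
The kind of check described above can be sketched in user space as follows. This is only an illustrative model, not the debug patch that was actually used: `struct model_sock`, `fwd_alloc_suspicious()` and `fwd_alloc_add()` are hypothetical names, and the only values taken from the message are the 1147483648 threshold and the fact that the counter is a signed integer that can overflow.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Threshold quoted in the message above. */
#define FWD_ALLOC_ALERT 1147483648LL

/* Hypothetical stand-in for a socket; only the counter is modeled. */
struct model_sock {
	int forward_alloc;	/* signed counter, as assumed in this sketch */
};

/* Returns true when the counter has grown past the alert threshold. */
bool fwd_alloc_suspicious(const struct model_sock *sk)
{
	assert(sk != NULL);
	return (long long)sk->forward_alloc > FWD_ALLOC_ALERT;
}

/*
 * Adds delta to the counter. Returns false and leaves the counter
 * untouched if the result would not fit in an int, i.e. the point at
 * which a plain "+=" would overflow.
 */
bool fwd_alloc_add(struct model_sock *sk, int delta)
{
	long long next;

	assert(sk != NULL);
	next = (long long)sk->forward_alloc + delta;
	if (next > INT_MAX || next < INT_MIN)
		return false;
	sk->forward_alloc = (int)next;
	return true;
}
```

A counter that only ever grows trips the alert long before it wraps, which is why a threshold a little above one billion is a practical early warning.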

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH net-next] tcp: make connect() mem charging friendly
       [not found]                                                                             ` <1416266755.18588.12.camel@edumazet-glaptop2.roam.corp.google.com>
@ 2014-11-17 23:49                                                                               ` Eric Dumazet
  2014-11-18  7:06                                                                                 ` [PATCH v2 " Eric Dumazet
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2014-11-17 23:49 UTC (permalink / raw)
  To: Denys Fedoryshchenko, David Miller; +Cc: netdev, Neal Cardwell, Yuchung Cheng

From: Eric Dumazet <edumazet@google.com>

While working on sk_forward_alloc problems reported by Denys
Fedoryshchenko, we found that tcp connect() (and fastopen) do not call
sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so
sk_forward_alloc is negative while connect is in progress.

We can fix this by calling regular sk_stream_alloc_skb() both for the
SYN packet (in tcp_connect()) and the syn_data packet in
tcp_send_syn_data()

Then, tcp_send_syn_data() can avoid copying syn_data as we simply
can manipulate syn_data->cb[] to remove SYN flag (and increment seq)

Instead of open coding memcpy_fromiovecend(), simply use this helper.

This leaves in socket write queue clean fast clone skbs.

This was tested against our fastopen packetdrill tests.

Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_output.c |   66 +++++++++++++++-------------------------
 1 file changed, 26 insertions(+), 40 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index eb73a1dccf56..377e15d0ba96 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3011,9 +3011,9 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct tcp_fastopen_request *fo = tp->fastopen_req;
-	int syn_loss = 0, space, i, err = 0, iovlen = fo->data->msg_iovlen;
-	struct sk_buff *syn_data = NULL, *data;
+	int syn_loss = 0, space, err = 0;
 	unsigned long last_syn_loss = 0;
+	struct sk_buff *syn_data;
 
 	tp->rx_opt.mss_clamp = tp->advmss;  /* If MSS is not cached */
 	tcp_fastopen_cache_get(sk, &tp->rx_opt.mss_clamp, &fo->cookie,
@@ -3044,48 +3044,38 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 	/* limit to order-0 allocations */
 	space = min_t(size_t, space, SKB_MAX_HEAD(MAX_TCP_HEADER));
 
-	syn_data = skb_copy_expand(syn, MAX_TCP_HEADER, space,
-				   sk->sk_allocation);
-	if (syn_data == NULL)
+	syn_data = sk_stream_alloc_skb(sk, space, sk->sk_allocation);
+	if (!syn_data)
+		goto fallback;
+	syn_data->ip_summed = CHECKSUM_PARTIAL;
+	memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
+	if (memcpy_fromiovecend(skb_put(syn_data, space),
+				fo->data->msg_iov, 0, space))
 		goto fallback;
 
-	for (i = 0; i < iovlen && syn_data->len < space; ++i) {
-		struct iovec *iov = &fo->data->msg_iov[i];
-		unsigned char __user *from = iov->iov_base;
-		int len = iov->iov_len;
+	/* No more data pending in inet_wait_for_connect() */
+	if (space == fo->size)
+		fo->data = NULL;
+	fo->copied = space;
 
-		if (syn_data->len + len > space)
-			len = space - syn_data->len;
-		else if (i + 1 == iovlen)
-			/* No more data pending in inet_wait_for_connect() */
-			fo->data = NULL;
+	tcp_connect_queue_skb(sk, syn_data);
 
-		if (skb_add_data(syn_data, from, len))
-			goto fallback;
-	}
+	err = tcp_transmit_skb(sk, syn_data, 1, sk->sk_allocation);
 
-	/* Queue a data-only packet after the regular SYN for retransmission */
-	data = pskb_copy(syn_data, sk->sk_allocation);
-	if (data == NULL)
-		goto fallback;
-	TCP_SKB_CB(data)->seq++;
-	TCP_SKB_CB(data)->tcp_flags &= ~TCPHDR_SYN;
-	TCP_SKB_CB(data)->tcp_flags = (TCPHDR_ACK|TCPHDR_PSH);
-	tcp_connect_queue_skb(sk, data);
-	fo->copied = data->len;
-
-	/* syn_data is about to be sent, we need to take current time stamps
-	 * for the packets that are in write queue : SYN packet and DATA
-	 */
-	skb_mstamp_get(&syn->skb_mstamp);
-	data->skb_mstamp = syn->skb_mstamp;
+	syn->skb_mstamp = syn_data->skb_mstamp;
 
-	if (tcp_transmit_skb(sk, syn_data, 0, sk->sk_allocation) == 0) {
+	/* Now full SYN+DATA was cloned and sent (or not),
+	 * remove the SYN from the original skb (syn_data)
+	 * we keep in write queue in case of a retransmit, as we
+	 * also have the SYN packet (with no data) in the same queue.
+	 */
+	TCP_SKB_CB(syn_data)->seq++;
+	TCP_SKB_CB(syn_data)->tcp_flags = TCPHDR_ACK | TCPHDR_PSH;
+	if (!err) {
 		tp->syn_data = (fo->copied > 0);
 		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPORIGDATASENT);
 		goto done;
 	}
-	syn_data = NULL;
 
 fallback:
 	/* Send a regular SYN with Fast Open cookie request option */
@@ -3094,7 +3084,6 @@ fallback:
 	err = tcp_transmit_skb(sk, syn, 1, sk->sk_allocation);
 	if (err)
 		tp->syn_fastopen = 0;
-	kfree_skb(syn_data);
 done:
 	fo->cookie.len = -1;  /* Exclude Fast Open option for SYN retries */
 	return err;
@@ -3114,13 +3103,10 @@ int tcp_connect(struct sock *sk)
 		return 0;
 	}
 
-	buff = alloc_skb_fclone(MAX_TCP_HEADER + 15, sk->sk_allocation);
-	if (unlikely(buff == NULL))
+	buff = sk_stream_alloc_skb(sk, 0, sk->sk_allocation);
+	if (unlikely(!buff))
 		return -ENOBUFS;
 
-	/* Reserve space for headers. */
-	skb_reserve(buff, MAX_TCP_HEADER);
-
 	tcp_init_nondata_skb(buff, tp->write_seq++, TCPHDR_SYN);
 	tp->retrans_stamp = tcp_time_stamp;
 	tcp_connect_queue_skb(sk, buff);

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 net-next] tcp: make connect() mem charging friendly
  2014-11-17 23:49                                                                               ` [PATCH net-next] tcp: make connect() mem charging friendly Eric Dumazet
@ 2014-11-18  7:06                                                                                 ` Eric Dumazet
  2014-11-19  6:10                                                                                   ` Yuchung Cheng
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2014-11-18  7:06 UTC (permalink / raw)
  To: Denys Fedoryshchenko, David Miller; +Cc: netdev, Neal Cardwell, Yuchung Cheng

From: Eric Dumazet <edumazet@google.com>

While working on sk_forward_alloc problems reported by Denys
Fedoryshchenko, we found that tcp connect() (and fastopen) do not call
sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so
sk_forward_alloc is negative while connect is in progress.

We can fix this by calling regular sk_stream_alloc_skb() both for the
SYN packet (in tcp_connect()) and the syn_data packet in
tcp_send_syn_data()

Then, tcp_send_syn_data() can avoid copying syn_data as we simply
can manipulate syn_data->cb[] to remove SYN flag (and increment seq)

Instead of open coding memcpy_fromiovecend(), simply use this helper.

This leaves in socket write queue clean fast clone skbs.

This was tested against our fastopen packetdrill tests.

Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
v2: added a kfree_skb(syn_data) if memcpy_fromiovecend() fails,
    as spotted by Yuchung.

 net/ipv4/tcp_output.c |   68 ++++++++++++++++------------------------
 1 file changed, 28 insertions(+), 40 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index eb73a1dccf56b823a45c0ca034e40dc50fc48068..f5bd4bd3f7e669b3fd48a843d55e7313a30a3409 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3011,9 +3011,9 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct tcp_fastopen_request *fo = tp->fastopen_req;
-	int syn_loss = 0, space, i, err = 0, iovlen = fo->data->msg_iovlen;
-	struct sk_buff *syn_data = NULL, *data;
+	int syn_loss = 0, space, err = 0;
 	unsigned long last_syn_loss = 0;
+	struct sk_buff *syn_data;
 
 	tp->rx_opt.mss_clamp = tp->advmss;  /* If MSS is not cached */
 	tcp_fastopen_cache_get(sk, &tp->rx_opt.mss_clamp, &fo->cookie,
@@ -3044,48 +3044,40 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 	/* limit to order-0 allocations */
 	space = min_t(size_t, space, SKB_MAX_HEAD(MAX_TCP_HEADER));
 
-	syn_data = skb_copy_expand(syn, MAX_TCP_HEADER, space,
-				   sk->sk_allocation);
-	if (syn_data == NULL)
+	syn_data = sk_stream_alloc_skb(sk, space, sk->sk_allocation);
+	if (!syn_data)
 		goto fallback;
+	syn_data->ip_summed = CHECKSUM_PARTIAL;
+	memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
+	if (unlikely(memcpy_fromiovecend(skb_put(syn_data, space),
+					 fo->data->msg_iov, 0, space))) {
+		kfree_skb(syn_data);
+		goto fallback;
+	}
 
-	for (i = 0; i < iovlen && syn_data->len < space; ++i) {
-		struct iovec *iov = &fo->data->msg_iov[i];
-		unsigned char __user *from = iov->iov_base;
-		int len = iov->iov_len;
+	/* No more data pending in inet_wait_for_connect() */
+	if (space == fo->size)
+		fo->data = NULL;
+	fo->copied = space;
 
-		if (syn_data->len + len > space)
-			len = space - syn_data->len;
-		else if (i + 1 == iovlen)
-			/* No more data pending in inet_wait_for_connect() */
-			fo->data = NULL;
+	tcp_connect_queue_skb(sk, syn_data);
 
-		if (skb_add_data(syn_data, from, len))
-			goto fallback;
-	}
+	err = tcp_transmit_skb(sk, syn_data, 1, sk->sk_allocation);
 
-	/* Queue a data-only packet after the regular SYN for retransmission */
-	data = pskb_copy(syn_data, sk->sk_allocation);
-	if (data == NULL)
-		goto fallback;
-	TCP_SKB_CB(data)->seq++;
-	TCP_SKB_CB(data)->tcp_flags &= ~TCPHDR_SYN;
-	TCP_SKB_CB(data)->tcp_flags = (TCPHDR_ACK|TCPHDR_PSH);
-	tcp_connect_queue_skb(sk, data);
-	fo->copied = data->len;
-
-	/* syn_data is about to be sent, we need to take current time stamps
-	 * for the packets that are in write queue : SYN packet and DATA
-	 */
-	skb_mstamp_get(&syn->skb_mstamp);
-	data->skb_mstamp = syn->skb_mstamp;
+	syn->skb_mstamp = syn_data->skb_mstamp;
 
-	if (tcp_transmit_skb(sk, syn_data, 0, sk->sk_allocation) == 0) {
+	/* Now full SYN+DATA was cloned and sent (or not),
+	 * remove the SYN from the original skb (syn_data)
+	 * we keep in write queue in case of a retransmit, as we
+	 * also have the SYN packet (with no data) in the same queue.
+	 */
+	TCP_SKB_CB(syn_data)->seq++;
+	TCP_SKB_CB(syn_data)->tcp_flags = TCPHDR_ACK | TCPHDR_PSH;
+	if (!err) {
 		tp->syn_data = (fo->copied > 0);
 		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPORIGDATASENT);
 		goto done;
 	}
-	syn_data = NULL;
 
 fallback:
 	/* Send a regular SYN with Fast Open cookie request option */
@@ -3094,7 +3086,6 @@ fallback:
 	err = tcp_transmit_skb(sk, syn, 1, sk->sk_allocation);
 	if (err)
 		tp->syn_fastopen = 0;
-	kfree_skb(syn_data);
 done:
 	fo->cookie.len = -1;  /* Exclude Fast Open option for SYN retries */
 	return err;
@@ -3114,13 +3105,10 @@ int tcp_connect(struct sock *sk)
 		return 0;
 	}
 
-	buff = alloc_skb_fclone(MAX_TCP_HEADER + 15, sk->sk_allocation);
-	if (unlikely(buff == NULL))
+	buff = sk_stream_alloc_skb(sk, 0, sk->sk_allocation);
+	if (unlikely(!buff))
 		return -ENOBUFS;
 
-	/* Reserve space for headers. */
-	skb_reserve(buff, MAX_TCP_HEADER);
-
 	tcp_init_nondata_skb(buff, tp->write_seq++, TCPHDR_SYN);
 	tp->retrans_stamp = tcp_time_stamp;
 	tcp_connect_queue_skb(sk, buff);

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 net-next] tcp: make connect() mem charging friendly
  2014-11-18  7:06                                                                                 ` [PATCH v2 " Eric Dumazet
@ 2014-11-19  6:10                                                                                   ` Yuchung Cheng
  2014-11-19 19:57                                                                                     ` David Miller
  0 siblings, 1 reply; 25+ messages in thread
From: Yuchung Cheng @ 2014-11-19  6:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Denys Fedoryshchenko, David Miller, netdev, Neal Cardwell

On Tue, Nov 18, 2014 at 3:06 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> While working on sk_forward_alloc problems reported by Denys
> Fedoryshchenko, we found that tcp connect() (and fastopen) do not call
> sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so
> sk_forward_alloc is negative while connect is in progress.
>
> We can fix this by calling regular sk_stream_alloc_skb() both for the
> SYN packet (in tcp_connect()) and the syn_data packet in
> tcp_send_syn_data()
>
> Then, tcp_send_syn_data() can avoid copying syn_data as we simply
> can manipulate syn_data->cb[] to remove SYN flag (and increment seq)
>
> Instead of open coding memcpy_fromiovecend(), simply use this helper.
>
> This leaves in socket write queue clean fast clone skbs.
>
> This was tested against our fastopen packetdrill tests.
>
> Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>

Thanks! this simplifies the code a lot.

> ---
> v2: added a kfree_skb(syn_data) if memcpy_fromiovecend() fails,
>     as spotted by Yuchung.
>
>  net/ipv4/tcp_output.c |   68 ++++++++++++++++------------------------
>  1 file changed, 28 insertions(+), 40 deletions(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index eb73a1dccf56b823a45c0ca034e40dc50fc48068..f5bd4bd3f7e669b3fd48a843d55e7313a30a3409 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -3011,9 +3011,9 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
>  {
>         struct tcp_sock *tp = tcp_sk(sk);
>         struct tcp_fastopen_request *fo = tp->fastopen_req;
> -       int syn_loss = 0, space, i, err = 0, iovlen = fo->data->msg_iovlen;
> -       struct sk_buff *syn_data = NULL, *data;
> +       int syn_loss = 0, space, err = 0;
>         unsigned long last_syn_loss = 0;
> +       struct sk_buff *syn_data;
>
>         tp->rx_opt.mss_clamp = tp->advmss;  /* If MSS is not cached */
>         tcp_fastopen_cache_get(sk, &tp->rx_opt.mss_clamp, &fo->cookie,
> @@ -3044,48 +3044,40 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
>         /* limit to order-0 allocations */
>         space = min_t(size_t, space, SKB_MAX_HEAD(MAX_TCP_HEADER));
>
> -       syn_data = skb_copy_expand(syn, MAX_TCP_HEADER, space,
> -                                  sk->sk_allocation);
> -       if (syn_data == NULL)
> +       syn_data = sk_stream_alloc_skb(sk, space, sk->sk_allocation);
> +       if (!syn_data)
>                 goto fallback;
> +       syn_data->ip_summed = CHECKSUM_PARTIAL;
> +       memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
> +       if (unlikely(memcpy_fromiovecend(skb_put(syn_data, space),
> +                                        fo->data->msg_iov, 0, space))) {
> +               kfree_skb(syn_data);
> +               goto fallback;
> +       }
>
> -       for (i = 0; i < iovlen && syn_data->len < space; ++i) {
> -               struct iovec *iov = &fo->data->msg_iov[i];
> -               unsigned char __user *from = iov->iov_base;
> -               int len = iov->iov_len;
> +       /* No more data pending in inet_wait_for_connect() */
> +       if (space == fo->size)
> +               fo->data = NULL;
> +       fo->copied = space;
>
> -               if (syn_data->len + len > space)
> -                       len = space - syn_data->len;
> -               else if (i + 1 == iovlen)
> -                       /* No more data pending in inet_wait_for_connect() */
> -                       fo->data = NULL;
> +       tcp_connect_queue_skb(sk, syn_data);
>
> -               if (skb_add_data(syn_data, from, len))
> -                       goto fallback;
> -       }
> +       err = tcp_transmit_skb(sk, syn_data, 1, sk->sk_allocation);
>
> -       /* Queue a data-only packet after the regular SYN for retransmission */
> -       data = pskb_copy(syn_data, sk->sk_allocation);
> -       if (data == NULL)
> -               goto fallback;
> -       TCP_SKB_CB(data)->seq++;
> -       TCP_SKB_CB(data)->tcp_flags &= ~TCPHDR_SYN;
> -       TCP_SKB_CB(data)->tcp_flags = (TCPHDR_ACK|TCPHDR_PSH);
> -       tcp_connect_queue_skb(sk, data);
> -       fo->copied = data->len;
> -
> -       /* syn_data is about to be sent, we need to take current time stamps
> -        * for the packets that are in write queue : SYN packet and DATA
> -        */
> -       skb_mstamp_get(&syn->skb_mstamp);
> -       data->skb_mstamp = syn->skb_mstamp;
> +       syn->skb_mstamp = syn_data->skb_mstamp;
>
> -       if (tcp_transmit_skb(sk, syn_data, 0, sk->sk_allocation) == 0) {
> +       /* Now full SYN+DATA was cloned and sent (or not),
> +        * remove the SYN from the original skb (syn_data)
> +        * we keep in write queue in case of a retransmit, as we
> +        * also have the SYN packet (with no data) in the same queue.
> +        */
> +       TCP_SKB_CB(syn_data)->seq++;
> +       TCP_SKB_CB(syn_data)->tcp_flags = TCPHDR_ACK | TCPHDR_PSH;
> +       if (!err) {
>                 tp->syn_data = (fo->copied > 0);
>                 NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPORIGDATASENT);
>                 goto done;
>         }
> -       syn_data = NULL;
>
>  fallback:
>         /* Send a regular SYN with Fast Open cookie request option */
> @@ -3094,7 +3086,6 @@ fallback:
>         err = tcp_transmit_skb(sk, syn, 1, sk->sk_allocation);
>         if (err)
>                 tp->syn_fastopen = 0;
> -       kfree_skb(syn_data);
>  done:
>         fo->cookie.len = -1;  /* Exclude Fast Open option for SYN retries */
>         return err;
> @@ -3114,13 +3105,10 @@ int tcp_connect(struct sock *sk)
>                 return 0;
>         }
>
> -       buff = alloc_skb_fclone(MAX_TCP_HEADER + 15, sk->sk_allocation);
> -       if (unlikely(buff == NULL))
> +       buff = sk_stream_alloc_skb(sk, 0, sk->sk_allocation);
> +       if (unlikely(!buff))
>                 return -ENOBUFS;
>
> -       /* Reserve space for headers. */
> -       skb_reserve(buff, MAX_TCP_HEADER);
> -
>         tcp_init_nondata_skb(buff, tp->write_seq++, TCPHDR_SYN);
>         tp->retrans_stamp = tcp_time_stamp;
>         tcp_connect_queue_skb(sk, buff);
>
>
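
The control-block manipulation in the patch above (bump seq, replace the SYN flag with ACK|PSH) can be illustrated with a small user-space sketch. `struct model_cb` and `strip_syn()` are hypothetical names and only model the two fields touched by the patch; the flag values are the standard TCP header bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard TCP header flag bits. */
#define MODEL_TCPHDR_SYN 0x02
#define MODEL_TCPHDR_PSH 0x08
#define MODEL_TCPHDR_ACK 0x10

/* Hypothetical, reduced model of the per-skb TCP control block. */
struct model_cb {
	uint32_t seq;		/* starting sequence number */
	uint8_t tcp_flags;	/* TCP header flags */
};

/*
 * Turn a queued SYN+DATA description into a data-only one:
 * the SYN consumes one sequence number, so the data starts at seq + 1,
 * and the flags become ACK|PSH.
 */
void strip_syn(struct model_cb *cb)
{
	assert(cb != NULL);
	cb->seq++;
	cb->tcp_flags = MODEL_TCPHDR_ACK | MODEL_TCPHDR_PSH;
}
```

Because the flags are assigned outright, no separate step is needed to clear the SYN bit first.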

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 net-next] tcp: make connect() mem charging friendly
  2014-11-19  6:10                                                                                   ` Yuchung Cheng
@ 2014-11-19 19:57                                                                                     ` David Miller
  2014-11-20 15:33                                                                                       ` Eric Dumazet
  0 siblings, 1 reply; 25+ messages in thread
From: David Miller @ 2014-11-19 19:57 UTC (permalink / raw)
  To: ycheng; +Cc: eric.dumazet, nuclearcat, netdev, ncardwell

From: Yuchung Cheng <ycheng@google.com>
Date: Wed, 19 Nov 2014 14:10:42 +0800

> On Tue, Nov 18, 2014 at 3:06 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> From: Eric Dumazet <edumazet@google.com>
>>
>> While working on sk_forward_alloc problems reported by Denys
>> Fedoryshchenko, we found that tcp connect() (and fastopen) do not call
>> sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so
>> sk_forward_alloc is negative while connect is in progress.
>>
>> We can fix this by calling regular sk_stream_alloc_skb() both for the
>> SYN packet (in tcp_connect()) and the syn_data packet in
>> tcp_send_syn_data()
>>
>> Then, tcp_send_syn_data() can avoid copying syn_data as we simply
>> can manipulate syn_data->cb[] to remove SYN flag (and increment seq)
>>
>> Instead of open coding memcpy_fromiovecend(), simply use this helper.
>>
>> This leaves in socket write queue clean fast clone skbs.
>>
>> This was tested against our fastopen packetdrill tests.
>>
>> Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Yuchung Cheng <ycheng@google.com>
> 
> Thanks! this simplifies the code a lot.

Agreed, applied, thanks everyone!
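
For readers following the accounting argument in the quoted commit message, here is a deliberately simplified user-space model of why sk_forward_alloc goes negative when a packet is charged to the socket without a prior schedule step. Everything in it is an assumption made for illustration: `struct model_sock`, `model_wmem_schedule()` and `model_mem_charge()` are hypothetical names, the quantum is assumed to be one 4096-byte page, and limits and failure paths are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed accounting granularity: one 4096-byte page. */
#define MODEL_MEM_QUANTUM 4096

/* Hypothetical stand-in for the socket's accounting state. */
struct model_sock {
	int forward_alloc;	/* bytes reserved but not yet charged */
	long pages_reserved;	/* pages taken from the global pool */
};

/* Reserve whole quanta until forward_alloc can cover size bytes. */
void model_wmem_schedule(struct model_sock *sk, int size)
{
	int quanta;

	assert(sk != NULL && size >= 0);
	if (size <= sk->forward_alloc)
		return;
	quanta = (size + MODEL_MEM_QUANTUM - 1) / MODEL_MEM_QUANTUM;
	sk->forward_alloc += quanta * MODEL_MEM_QUANTUM;
	sk->pages_reserved += quanta;
}

/* Charge size bytes of queued data against the reservation. */
void model_mem_charge(struct model_sock *sk, int size)
{
	assert(sk != NULL && size >= 0);
	sk->forward_alloc -= size;
}
```

Charging 2304 bytes on a fresh socket without scheduling leaves forward_alloc at -2304; scheduling first reserves one page and leaves 1792 bytes of headroom.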

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 net-next] tcp: make connect() mem charging friendly
  2014-11-19 19:57                                                                                     ` David Miller
@ 2014-11-20 15:33                                                                                       ` Eric Dumazet
  2014-11-20 16:23                                                                                         ` [PATCH net-next] tcp: remove from tcp_sendmsg() some fastopen code Eric Dumazet
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2014-11-20 15:33 UTC (permalink / raw)
  To: David Miller; +Cc: ycheng, nuclearcat, netdev, ncardwell

On Wed, 2014-11-19 at 14:57 -0500, David Miller wrote:
> From: Yuchung Cheng <ycheng@google.com>
> > 
> > Thanks! this simplifies the code a lot.
> 
> Agreed, applied, thanks everyone!

BTW, unless I am mistaken, it seems we can probably replace
memcpy_fromiovecend() by memcpy_fromiovec() and delete from
tcp_sendmsg() the annoying code skipping over the already consumed
bytes.

Something like this untested patch :

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c239f4740d10b10b67ef4fa44c831851fb9e1dcf..227540eef9d0870721258f9ddbace27b417c619e 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1089,20 +1089,19 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *skb;
 	int iovlen, flags, err, copied = 0;
-	int mss_now = 0, size_goal, copied_syn = 0, offset = 0;
+	int mss_now = 0, size_goal, copied_syn = 0;
 	bool sg;
 	long timeo;
 
 	lock_sock(sk);
 
 	flags = msg->msg_flags;
-	if (flags & MSG_FASTOPEN) {
+	if (unlikely(flags & MSG_FASTOPEN)) {
 		err = tcp_sendmsg_fastopen(sk, msg, &copied_syn, size);
 		if (err == -EINPROGRESS && copied_syn > 0)
 			goto out;
 		else if (err)
 			goto out_err;
-		offset = copied_syn;
 	}
 
 	timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
@@ -1151,15 +1150,6 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		unsigned char __user *from = iov->iov_base;
 
 		iov++;
-		if (unlikely(offset > 0)) {  /* Skip bytes copied in SYN */
-			if (offset >= seglen) {
-				offset -= seglen;
-				continue;
-			}
-			seglen -= offset;
-			from += offset;
-			offset = 0;
-		}
 
 		while (seglen > 0) {
 			int copy = 0;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f5bd4bd3f7e669b3fd48a843d55e7313a30a3409..524e5b657e881a348f11def3f48f29a76f54fbab 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3049,8 +3049,8 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 		goto fallback;
 	syn_data->ip_summed = CHECKSUM_PARTIAL;
 	memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
-	if (unlikely(memcpy_fromiovecend(skb_put(syn_data, space),
-					 fo->data->msg_iov, 0, space))) {
+	if (unlikely(memcpy_fromiovec(skb_put(syn_data, space),
+				      fo->data->msg_iov, space))) {
 		kfree_skb(syn_data);
 		goto fallback;
 	}

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net-next] tcp: remove from tcp_sendmsg() some fastopen code
  2014-11-20 15:33                                                                                       ` Eric Dumazet
@ 2014-11-20 16:23                                                                                         ` Eric Dumazet
  2014-11-20 23:05                                                                                           ` Eric Dumazet
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2014-11-20 16:23 UTC (permalink / raw)
  To: David Miller; +Cc: ycheng, nuclearcat, netdev, ncardwell

From: Eric Dumazet <edumazet@google.com>

If we consume iovec bytes in tcp_send_syn_data(), we can remove
annoying fastopen code in tcp_sendmsg() skipping over the already
consumed bytes.

Also add an unlikely(flags & MSG_FASTOPEN), as most TCP sendmsg() do not
ask for FASTOPEN.

Tested:
 Ran our 125 packetdrill fastopen tests

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp.c        |   14 ++------------
 net/ipv4/tcp_output.c |    4 ++--
 2 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c239f4740d10b10b67ef4fa44c831851fb9e1dcf..227540eef9d0870721258f9ddbace27b417c619e 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1089,20 +1089,19 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *skb;
 	int iovlen, flags, err, copied = 0;
-	int mss_now = 0, size_goal, copied_syn = 0, offset = 0;
+	int mss_now = 0, size_goal, copied_syn = 0;
 	bool sg;
 	long timeo;
 
 	lock_sock(sk);
 
 	flags = msg->msg_flags;
-	if (flags & MSG_FASTOPEN) {
+	if (unlikely(flags & MSG_FASTOPEN)) {
 		err = tcp_sendmsg_fastopen(sk, msg, &copied_syn, size);
 		if (err == -EINPROGRESS && copied_syn > 0)
 			goto out;
 		else if (err)
 			goto out_err;
-		offset = copied_syn;
 	}
 
 	timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
@@ -1151,15 +1150,6 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		unsigned char __user *from = iov->iov_base;
 
 		iov++;
-		if (unlikely(offset > 0)) {  /* Skip bytes copied in SYN */
-			if (offset >= seglen) {
-				offset -= seglen;
-				continue;
-			}
-			seglen -= offset;
-			from += offset;
-			offset = 0;
-		}
 
 		while (seglen > 0) {
 			int copy = 0;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f5bd4bd3f7e669b3fd48a843d55e7313a30a3409..524e5b657e881a348f11def3f48f29a76f54fbab 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3049,8 +3049,8 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 		goto fallback;
 	syn_data->ip_summed = CHECKSUM_PARTIAL;
 	memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
-	if (unlikely(memcpy_fromiovecend(skb_put(syn_data, space),
-					 fo->data->msg_iov, 0, space))) {
+	if (unlikely(memcpy_fromiovec(skb_put(syn_data, space),
+				      fo->data->msg_iov, space))) {
 		kfree_skb(syn_data);
 		goto fallback;
 	}

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next] tcp: remove from tcp_sendmsg() some fastopen code
  2014-11-20 16:23                                                                                         ` [PATCH net-next] tcp: remove from tcp_sendmsg() some fastopen code Eric Dumazet
@ 2014-11-20 23:05                                                                                           ` Eric Dumazet
  0 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2014-11-20 23:05 UTC (permalink / raw)
  To: David Miller, Al Viro; +Cc: ycheng, nuclearcat, netdev, ncardwell

On Thu, 2014-11-20 at 08:23 -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> If we consume iovec bytes in tcp_send_syn_data(), we can remove
> annoying fastopen code in tcp_sendmsg() skipping over the already
> consumed bytes.
> 
> Also add an unlikely(flags & MSG_FASTOPEN), as most TCP sendmsg() do not
> ask for FASTOPEN.
> 
> Tested:
>  Ran our 125 packetdrill fastopen tests
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---

Please disregard this patch: Al Viro said that sendmsg() is apparently
not allowed to change the iovec.

Some callers depend on it staying constant.

Given that, it is not clear why memcpy_fromiovec() even exists.
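
For readers following the objection, here is a minimal userspace sketch of the two copy styles involved. It is not kernel code: copy_consuming(), copy_at_offset() and iov_total() are hypothetical names, written on the assumption that memcpy_fromiovec() advances iov_base and shrinks iov_len as it copies, while memcpy_fromiovecend() takes an explicit offset and leaves the iovec untouched. Plain memcpy() stands in for copy_from_user(), and error handling is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Sum of the bytes still described by an iovec array (illustrative helper). */
static size_t iov_total(const struct iovec *iov, int n)
{
	size_t total = 0;

	for (int i = 0; i < n; i++)
		total += iov[i].iov_len;
	return total;
}

/*
 * Consuming copy, in the style assumed for memcpy_fromiovec():
 * every entry that is read from is advanced, so the caller's array
 * is modified. The caller must guarantee the iovec holds >= len bytes.
 */
static void copy_consuming(unsigned char *dst, struct iovec *iov, size_t len)
{
	while (len > 0) {
		if (iov->iov_len) {
			size_t copy = len < iov->iov_len ? len : iov->iov_len;

			memcpy(dst, iov->iov_base, copy);
			dst += copy;
			len -= copy;
			iov->iov_base = (unsigned char *)iov->iov_base + copy;
			iov->iov_len -= copy;
		}
		iov++;
	}
}

/*
 * Offset-based copy, in the style assumed for memcpy_fromiovecend():
 * skips `offset` bytes, copies `len` bytes, never writes to the iovec.
 * The caller must guarantee the iovec holds >= offset + len bytes.
 */
static void copy_at_offset(unsigned char *dst, const struct iovec *iov,
			   size_t offset, size_t len)
{
	if (len == 0)
		return;

	/* Skip entries that lie entirely before the requested offset. */
	while (offset >= iov->iov_len) {
		offset -= iov->iov_len;
		iov++;
	}

	while (len > 0) {
		const unsigned char *base =
			(const unsigned char *)iov->iov_base + offset;
		size_t avail = iov->iov_len - offset;
		size_t copy = len < avail ? len : avail;

		memcpy(dst, base, copy);
		dst += copy;
		len -= copy;
		offset = 0;	/* only the first entry is entered mid-way */
		iov++;
	}
}
```

With the consuming style, a caller that reuses its iovec after the call (for example to retry the send) sees fewer bytes than it originally passed in, which is the dependency described above. With the offset style, the callee has to carry the offset around itself, which is what the skip loop removed from tcp_sendmsg() by this patch was doing.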



Thread overview: 25+ messages
2014-07-17 10:52 /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? Denys Fedoryshchenko
2014-07-17 11:51 ` Eric Dumazet
2014-07-17 12:13   ` Denys Fedoryshchenko
2014-07-17 13:12   ` Denys Fedoryshchenko
2014-07-17 13:42     ` Eric Dumazet
2014-07-18  8:12       ` Denys Fedoryshchenko
2014-07-18  8:41         ` Eric Dumazet
2014-09-23 20:14           ` Semen Slepov
2014-11-12 17:07                           ` /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug) Denys Fedoryshchenko
2014-11-12 17:27                             ` Eric Dumazet
2014-11-16  8:54                               ` Denys Fedoryshchenko
2014-11-16 18:11                                 ` Eric Dumazet
2014-11-16 19:05                                   ` Denys Fedoryshchenko
2014-11-16 20:16                                     ` Eric Dumazet
2014-11-17  5:01                                       ` Eric Dumazet
2014-11-17 10:22                                         ` Denys Fedoryshchenko
2014-11-17 15:33                                           ` Denys Fedoryshchenko
2014-11-17 23:49                                                                               ` [PATCH net-next] tcp: make connect() mem charging friendly Eric Dumazet
2014-11-18  7:06                                                                                 ` [PATCH v2 " Eric Dumazet
2014-11-19  6:10                                                                                   ` Yuchung Cheng
2014-11-19 19:57                                                                                     ` David Miller
2014-11-20 15:33                                                                                       ` Eric Dumazet
2014-11-20 16:23                                                                                         ` [PATCH net-next] tcp: remove from tcp_sendmsg() some fastopen code Eric Dumazet
2014-11-20 23:05                                                                                           ` Eric Dumazet
2014-11-16 19:10                                   ` /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug) Eric Dumazet