* /proc/net/sockstat invalid memory accounting or memory leak in latest kernels?
@ 2014-07-17 10:52 Denys Fedoryshchenko
2014-07-17 11:51 ` Eric Dumazet
0 siblings, 1 reply; 25+ messages in thread
From: Denys Fedoryshchenko @ 2014-07-17 10:52 UTC (permalink / raw)
To: netdev; +Cc: kaber, davem
Hi,
I noticed a TCP transfer rate slowdown after a few days of operation on
kernel 3.15.3. After some digging I found this:
balancer-backup ~ # cat /proc/net/sockstat
sockets: used 118236
TCP: inuse 122958 orphan 4986 tw 108010 alloc 123179 mem 1955339
UDP: inuse 1 mem 0
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 1 memory 2
After shutting down the program:
balancer-backup ~ # cat /proc/net/sockstat
sockets: used 47
TCP: inuse 10552 orphan 10547 tw 142645 alloc 10552 mem 1877061
UDP: inuse 0 mem 0
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0
sysctl settings:
net.ipv4.tcp_mem = 1767103 2045612 3068412
I recently restarted the process, and the mem value didn't change (although
the closed sockets should have released all their memory). The value also
looks incorrect, because at the same time:
balancer-backup ~ # cat /proc/meminfo
MemTotal: 32939492 kB
MemFree: 29876564 kB
Meanwhile 1955339 * 4096 should be around 8 GB.
Is this just an accounting issue, or is it a real memory leak?
What other information can I provide to troubleshoot this properly?
I will also upgrade to 3.15.5 now, to see if the issue persists there.
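The arithmetic above can be checked with a short sketch. It assumes 4096-byte pages (as the message does) and that the sockstat "mem" field counts pages; the helper name pages_to_bytes is made up for illustration. The inputs are the values quoted from /proc/net/sockstat and /proc/meminfo.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as in the message


def pages_to_bytes(pages, page_size=PAGE_SIZE):
    """Convert a page count (assumed unit of the sockstat 'mem' field) to bytes."""
    return pages * page_size


tcp_mem_pages = 1955339   # 'mem' from the first /proc/net/sockstat dump
mem_total_kb = 32939492   # MemTotal from /proc/meminfo
mem_free_kb = 29876564    # MemFree from /proc/meminfo

tcp_bytes = pages_to_bytes(tcp_mem_pages)
used_bytes = (mem_total_kb - mem_free_kb) * 1024  # everything not free, in bytes

print(f"TCP mem counter:    {tcp_bytes / 2**30:.2f} GiB")
print(f"MemTotal - MemFree: {used_bytes / 2**30:.2f} GiB")
```

If the counter reflected real allocations, the first figure could not exceed the second, which is why the value looks like an accounting problem rather than pages actually in use.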
I also noticed several warnings:
[1116634.378936] ------------[ cut here ]------------
[1116634.379169] WARNING: CPU: 0 PID: 28350 at net/core/stream.c:201
sk_stream_kill_queues+0xff/0x104()
[1116634.379606] Modules linked in: microcode xt_tcpudp xt_mark
iptable_mangle ip_tables x_tables 8021q garp stp mrp llc
[1116634.380069] CPU: 0 PID: 28350 Comm: haproxy Tainted: G W
3.15.3-build-0007 #2
[1116634.380492] Hardware name: Dell Inc. PowerEdge R710/0HYPX2, BIOS
2.0.11 02/26/2010
[1116634.380921] 0000000000000000 ffff880778393db0 ffffffff8160042b
0000000000000000
[1116634.381352] ffff880778393de8 ffffffff810b4e03 ffffffff81584db2
ffff8807e4df9380
[1116634.381780] ffff8807e4df94c8 0000000000000007 ffff8807e4df93f0
ffff880778393df8
[1116634.382212] Call Trace:
[1116634.382440] [<ffffffff8160042b>] dump_stack+0x45/0x56
[1116634.382659] [<ffffffff810b4e03>] warn_slowpath_common+0x75/0x8e
[1116634.382871] [<ffffffff81584db2>] ?
sk_stream_kill_queues+0xff/0x104
[1116634.383087] [<ffffffff810b4ebb>] warn_slowpath_null+0x15/0x17
[1116634.383308] [<ffffffff81584db2>] sk_stream_kill_queues+0xff/0x104
[1116634.383522] [<ffffffff815bf066>] inet_csk_destroy_sock+0x77/0xb7
[1116634.383741] [<ffffffff815c31c1>] tcp_close+0x287/0x37a
[1116634.383953] [<ffffffff815e03d5>] inet_release+0x6f/0x76
[1116634.384167] [<ffffffff81578bfe>] sock_release+0x1a/0x79
[1116634.384379] [<ffffffff81578c6a>] sock_close+0xd/0x11
[1116634.384600] [<ffffffff8115750e>] __fput+0xdc/0x18d
[1116634.384826] [<ffffffff811575eb>] ____fput+0x9/0xb
[1116634.385052] [<ffffffff810ca5e2>] task_work_run+0x78/0x8e
[1116634.385276] [<ffffffff81002880>] do_notify_resume+0x52/0x60
[1116634.385504] [<ffffffff81606970>] int_signal+0x12/0x17
[1116634.385728] ---[ end trace fb11499084e23ab6 ]---
[1116634.386531] ------------[ cut here ]------------
[1116634.386792] WARNING: CPU: 0 PID: 28350 at net/ipv4/af_inet.c:153
inet_sock_destruct+0x160/0x189()
[1116634.387264] Modules linked in: microcode xt_tcpudp xt_mark
iptable_mangle ip_tables x_tables 8021q garp stp mrp llc
[1116634.387781] CPU: 0 PID: 28350 Comm: haproxy Tainted: G W
3.15.3-build-0007 #2
[1116634.388236] Hardware name: Dell Inc. PowerEdge R710/0HYPX2, BIOS
2.0.11 02/26/2010
[1116634.388680] 0000000000000000 ffff880778393d98 ffffffff8160042b
0000000000000000
[1116634.389157] ffff880778393dd0 ffffffff810b4e03 ffffffff815e00d7
ffff8807e4df9380
[1116634.389602] ffff8807e4df94c8 0000000000000007 ffff8807e4df93f0
ffff880778393de0
[1116634.390046] Call Trace:
[1116634.390270] [<ffffffff8160042b>] dump_stack+0x45/0x56
[1116634.390499] [<ffffffff810b4e03>] warn_slowpath_common+0x75/0x8e
[1116634.390742] [<ffffffff815e00d7>] ? inet_sock_destruct+0x160/0x189
[1116634.390979] [<ffffffff810b4ebb>] warn_slowpath_null+0x15/0x17
[1116634.391221] [<ffffffff815e00d7>] inet_sock_destruct+0x160/0x189
[1116634.391457] [<ffffffff8157ca3a>] __sk_free+0x18/0xd5
[1116634.391688] [<ffffffff8157cb0a>] sk_free+0x13/0x15
[1116634.391927] [<ffffffff815c32a8>] tcp_close+0x36e/0x37a
[1116634.392141] [<ffffffff815e03d5>] inet_release+0x6f/0x76
[1116634.392484] [<ffffffff81578bfe>] sock_release+0x1a/0x79
[1116634.392774] [<ffffffff81578c6a>] sock_close+0xd/0x11
[1116634.392996] [<ffffffff8115750e>] __fput+0xdc/0x18d
[1116634.393226] [<ffffffff811575eb>] ____fput+0x9/0xb
[1116634.393450] [<ffffffff810ca5e2>] task_work_run+0x78/0x8e
[1116634.393696] [<ffffffff81002880>] do_notify_resume+0x52/0x60
[1116634.393935] [<ffffffff81606970>] int_signal+0x12/0x17
[1116634.394159] ---[ end trace fb11499084e23ab7 ]---
P.S. After restarting the server and about 5 minutes of operation:
sockets: used 109439
TCP: inuse 110642 orphan 1372 tw 98904 alloc 110768 mem 215254
UDP: inuse 1 mem 0
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels?
From: Eric Dumazet @ 2014-07-17 11:51 UTC (permalink / raw)
To: Denys Fedoryshchenko; +Cc: netdev, kaber, davem
On Thu, 2014-07-17 at 13:52 +0300, Denys Fedoryshchenko wrote:
> Hi
>
> I noticed TCP transfer rate slowdown after few days of operation on
> kernel 3.15.3, after some digging found out this:
What was the previous version you were using without this problem?
>
> balancer-backup ~ # cat /proc/net/sockstat
> sockets: used 118236
> TCP: inuse 122958 orphan 4986 tw 108010 alloc 123179 mem 1955339
> UDP: inuse 1 mem 0
> UDPLITE: inuse 0
> RAW: inuse 0
> FRAG: inuse 1 memory 2
>
> after shutting down program
> balancer-backup ~ # cat /proc/net/sockstat
> sockets: used 47
> TCP: inuse 10552 orphan 10547 tw 142645 alloc 10552 mem 1877061
> UDP: inuse 0 mem 0
> UDPLITE: inuse 0
> RAW: inuse 0
> FRAG: inuse 0 memory 0
>
> sysctl settings:
> net.ipv4.tcp_mem = 1767103 2045612 3068412
>
> I restarted recently process, and mem value didnt changed (while because
> it is sockets should release all memory), also it looks incorrect,
> because at same time:
> balancer-backup ~ # cat /proc/meminfo
> MemTotal: 32939492 kB
> MemFree: 29876564 kB
>
> While 1955339 * 4096 should be around 8GB.
> Probably it is just accounting issue or is it real memory leak?
> What other info i can provide to troubleshoot this info more properly?
> I will upgrade to 3.15.5 also now, to see if issue persist there.
I see nothing really wrong in your report.
It looks like you have a lot of sockets around after shutdown of the
program. Each FIN-WAIT socket might consume a lot of buffers in its
write queue, unless you use/force SO_LINGER or something.
If you try the following command, you might see how many sockets have
outstanding data.
ss -amn
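As a rough sketch of how the output of that command could be summarized, the snippet below pulls the skmem:(...) groups out of ss text and adds up the queued write memory. The field meanings are taken from the ss(8) man page (f = fwd_alloc, w = wmem_queued, and so on); the function name parse_skmem and the sample text are illustrative assumptions, not real output from this server.

```python
import re

# Field names per the ss(8) man page: r=rmem_alloc, rb=rcv_buf, t=wmem_alloc,
# tb=snd_buf, f=fwd_alloc, w=wmem_queued, o=opt_mem, bl=back_log.
SKMEM_RE = re.compile(r"skmem:\(([^)]*)\)")
FIELD_RE = re.compile(r"([a-z]+)(\d+)")


def parse_skmem(text):
    """Return one dict of skmem counters per socket found in `ss -m` text."""
    return [
        {name: int(value) for name, value in FIELD_RE.findall(m.group(1))}
        for m in SKMEM_RE.finditer(text)
    ]


# Illustrative sample only, in the format `ss -amn` prints.
SAMPLE = """\
skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
skmem:(r0,rb8192,t0,tb8192,f3072,w9216,o0,bl0)
skmem:(r0,rb8192,t0,tb8192,f2816,w1280,o0,bl0)
"""

sockets = parse_skmem(SAMPLE)
with_data = [s for s in sockets if s["w"] > 0]   # sockets with queued writes
queued = sum(s["w"] for s in sockets)            # total wmem_queued in bytes

print(f"{len(with_data)} of {len(sockets)} sockets hold {queued} bytes in write queues")
```

To run it on a live system, SAMPLE could be replaced with the captured text of the command above, for example read from standard input.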
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels?
From: Denys Fedoryshchenko @ 2014-07-17 12:13 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, kaber, davem
On 2014-07-17 14:51, Eric Dumazet wrote:
> On Thu, 2014-07-17 at 13:52 +0300, Denys Fedoryshchenko wrote:
>> Hi
>>
>> I noticed TCP transfer rate slowdown after few days of operation on
>> kernel 3.15.3, after some digging found out this:
>
> What was previous version you were using without this problem ?
These are fresh installations. I only noticed it on units with heavy load
(200-300k sockets at peak time): after a few days they exhaust memory
completely. In one case it made the unit completely unreachable; when I
tried to ssh in, the TCP connection was established, but no SSH banner
was shown from the server side.
Because the unit is remote and has no KVM, I can't see what is really
wrong there; it seems I should try netconsole.
On other units it just causes TCP performance degradation.
>
>>
>> balancer-backup ~ # cat /proc/net/sockstat
>> sockets: used 118236
>> TCP: inuse 122958 orphan 4986 tw 108010 alloc 123179 mem 1955339
>> UDP: inuse 1 mem 0
>> UDPLITE: inuse 0
>> RAW: inuse 0
>> FRAG: inuse 1 memory 2
>>
>> after shutting down program
>> balancer-backup ~ # cat /proc/net/sockstat
>> sockets: used 47
>> TCP: inuse 10552 orphan 10547 tw 142645 alloc 10552 mem 1877061
>> UDP: inuse 0 mem 0
>> UDPLITE: inuse 0
>> RAW: inuse 0
>> FRAG: inuse 0 memory 0
>>
>> sysctl settings:
>> net.ipv4.tcp_mem = 1767103 2045612 3068412
>>
>> I restarted recently process, and mem value didnt changed (while
>> because
>> it is sockets should release all memory), also it looks incorrect,
>> because at same time:
>> balancer-backup ~ # cat /proc/meminfo
>> MemTotal: 32939492 kB
>> MemFree: 29876564 kB
>>
>> While 1955339 * 4096 should be around 8GB.
>> Probably it is just accounting issue or is it real memory leak?
>> What other info i can provide to troubleshoot this info more properly?
>> I will upgrade to 3.15.5 also now, to see if issue persist there.
>
> I see nothing really wrong in your report.
>
> It looks like you have a lot of sockets around after shutdown of the
> program. Each FIN-WAIT socket might consume a lot of buffers in its
> write queue, unless you use/force SO_LINGER or something.
>
> If you try the following command, you might see how many sockets have
> outstanding data.
>
> ss -amn
I will check that next time, thanks a lot!
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels?
From: Denys Fedoryshchenko @ 2014-07-17 13:12 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, kaber, davem
Well, it does not seem normal. But I am not sure: does the mem value mean
that this memory is allocated and not released?
If these were allocated pages, MemFree in /proc/meminfo should show a
different value.
I just shut down the balancer process and bypassed all traffic, then
waited for all (OK, almost all) sockets to close. This particular server
had 300k connections before the test.
sockets: used 78
TCP: inuse 22 orphan 17 tw 2 alloc 22 mem 1048593
UDP: inuse 1 mem 0
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0
MemTotal: 32664776 kB
MemFree: 32462456 kB
MemAvailable: 32449704 kB
ss output:
Netid  State       Recv-Q  Send-Q  Local Address:Port    Peer Address:Port
nl     UNCONN      0       0       0:0                   *
nl     UNCONN      0       0       4:0                   *
nl     UNCONN      0       0       10:0                  *
nl     UNCONN      0       0       11:0                  *
nl     UNCONN      0       0       15:0                  *
nl     UNCONN      0       0       16:0                  *
nl     UNCONN      0       0       18:0                  *
nl     UNCONN      0       0       21:0                  *
tcp    UNCONN      0       0       *:49484               *:*
         skmem:(r0,rb2304,t0,tb212992,f0,w0,o0,bl0)
tcp    LISTEN      0       1280    *:8880                *:*
         skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN      0       20      *:22                  *:*
         skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN      0       1       *:23                  *:*
         skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN      0       1280    *:443                 *:*
         skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    FIN-WAIT-1  0       4545    157.56.194.74:443     172.17.37.4:52473
         skmem:(r0,rb8192,t0,tb8192,f3072,w9216,o0,bl0)
tcp    ESTAB       0       3008    10.0.22.22:22         194.146.153.22:59156
         skmem:(r0,rb359040,t0,tb18432,f768,w11520,o0,bl0)
tcp    FIN-WAIT-1  0       1       54.225.250.234:443    172.17.32.251:35969
         skmem:(r0,rb8192,t0,tb8192,f2816,w1280,o0,bl0)
tcp    FIN-WAIT-2  0       0       206.53.159.203:443    172.17.76.10:54688
tcp    FIN-WAIT-2  0       0       173.194.39.40:443     172.17.64.221:39482
tcp    FIN-WAIT-1  0       1       69.171.235.48:443     172.17.32.251:52731
         skmem:(r0,rb8192,t0,tb8192,f4294966016,w1280,o0,bl0)
tcp    FIN-WAIT-2  0       0       173.194.39.38:443     172.17.64.228:52013
         skmem:(r0,rb8192,t0,tb8192,f0,w0,o0,bl0)
tcp    FIN-WAIT-1  0       4645    2.16.216.121:443      172.17.40.215:51985
         skmem:(r0,rb8192,t0,tb8192,f3072,w9216,o0,bl0)
tcp    FIN-WAIT-2  0       0       173.192.231.39:443    172.17.10.41:56736
         skmem:(r0,rb8192,t0,tb8192,f0,w0,o0,bl0)
tcp    FIN-WAIT-2  0       0       173.194.70.95:443     172.17.64.221:59055
tcp    FIN-WAIT-2  0       0       46.33.74.129:443      172.17.74.223:53207
         skmem:(r0,rb8192,t0,tb8192,f0,w0,o0,bl0)
tcp    FIN-WAIT-2  0       0       173.194.70.99:443     172.17.43.95:57045
         skmem:(r0,rb8192,t0,tb8192,f0,w0,o0,bl0)
tcp    FIN-WAIT-2  0       0       198.11.217.194:443    172.17.75.46:63384
         skmem:(r0,rb8192,t0,tb8192,f0,w0,o0,bl0)
tcp    FIN-WAIT-1  0       3601    31.13.81.33:443       172.17.14.192:54971
         skmem:(r0,rb8192,t0,tb8192,f1280,w6912,o0,bl0)
tcp    FIN-WAIT-1  0       4561    174.36.208.130:443    172.17.45.227:56934
         skmem:(r0,rb8192,t0,tb8192,f3072,w9216,o0,bl0)
tcp    FIN-WAIT-2  0       0       93.186.18.171:443     172.17.38.62:57496
tcp    FIN-WAIT-2  0       0       173.194.70.104:443    172.17.70.233:42575
         skmem:(r0,rb8192,t0,tb8192,f0,w0,o0,bl0)
tcp    LAST-ACK    0       3752    31.13.93.33:443       172.17.46.52:6869
         skmem:(r0,rb8192,t0,tb8192,f3072,w9216,o0,bl0)
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels?
From: Eric Dumazet @ 2014-07-17 13:42 UTC (permalink / raw)
To: Denys Fedoryshchenko; +Cc: netdev, kaber, davem
On Thu, 2014-07-17 at 16:12 +0300, Denys Fedoryshchenko wrote:
> Well, it seems not normal. But i am not sure, is mem value means this
> memory is allocated and not released?
mem means : sockets have forward_alloc reservations. They can use this
amount of memory without worrying.
> If it is allocated pages, MemFree in /proc/memstat should be different
> value.
It's not allocated pages.
> I just shut down balancer process now and bypassed all traffic, and
> waited all(ok, almost all) sockets to close, this particular server had
> 300k connections before test.
>
> sockets: used 78
> TCP: inuse 22 orphan 17 tw 2 alloc 22 mem 1048593
> UDP: inuse 1 mem 0
> UDPLITE: inuse 0
> RAW: inuse 0
> FRAG: inuse 0 memory 0
>
> MemTotal: 32664776 kB
> MemFree: 32462456 kB
> MemAvailable: 32449704 kB
>
> ss output:
>
> Netid State Recv-Q Send-Q
> Local
> Address:Port
> Peer Address:Port
> nl UNCONN 0 0
>
> 0:0
> *
> nl UNCONN 0 0
>
> 4:0
> *
> nl UNCONN 0 0
>
> 10:0
> *
> nl UNCONN 0 0
>
> 11:0
> *
> nl UNCONN 0 0
>
> 15:0
> *
> nl UNCONN 0 0
>
> 16:0
> *
> nl UNCONN 0 0
>
> 18:0
> *
> nl UNCONN 0 0
>
> 21:0
> *
> tcp UNCONN 0 0
>
> *:49484
> *:*
> skmem:(r0,rb2304,t0,tb212992,f0,w0,o0,bl0)
> tcp LISTEN 0 1280
>
> *:8880
> *:*
> skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
> tcp LISTEN 0 20
>
> *:22
> *:*
> skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
> tcp LISTEN 0 1
>
> *:23
> *:*
> skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
> tcp LISTEN 0 1280
>
> *:443
> *:*
> skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
> tcp FIN-WAIT-1 0 4545
>
> 157.56.194.74:443
> 172.17.37.4:52473
> skmem:(r0,rb8192,t0,tb8192,f3072,w9216,o0,bl0)
> tcp ESTAB 0 3008
>
> 10.0.22.22:22
> 194.146.153.22:59156
> skmem:(r0,rb359040,t0,tb18432,f768,w11520,o0,bl0)
> tcp FIN-WAIT-1 0 1
>
> 54.225.250.234:443
> 172.17.32.251:35969
> skmem:(r0,rb8192,t0,tb8192,f2816,w1280,o0,bl0)
> tcp FIN-WAIT-2 0 0
>
> 206.53.159.203:443
> 172.17.76.10:54688
>
> tcp FIN-WAIT-2 0 0
>
> 173.194.39.40:443
> 172.17.64.221:39482
>
> tcp FIN-WAIT-1 0 1
>
> 69.171.235.48:443
> 172.17.32.251:52731
> skmem:(r0,rb8192,t0,tb8192,f4294966016,w1280,o0,bl0)
This looks bad :
f4294966016 : 0xFFFFFB00
There is a bug of some sort, that leads to overflows.
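To make the wraparound concrete, here is a small sketch that reinterprets the printed value as a two's-complement 32-bit integer. It assumes the counter behind the "f" field is a signed 32-bit int in the kernel and that ss prints it as unsigned; the helper name as_signed32 is made up for illustration.

```python
def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a two's-complement signed int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


fwd_alloc_printed = 4294966016  # the 'f' field from the skmem line above

print(hex(fwd_alloc_printed))          # 0xfffffb00
print(as_signed32(fwd_alloc_printed))  # -1280
```

Under that assumption the forward allocation has gone negative by 1280 bytes, which happens to equal the w1280 shown for the same socket.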
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels?
From: Denys Fedoryshchenko @ 2014-07-18 8:12 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, kaber, davem
More updates. I have now found out how to see whether TCP is in memory
pressure mode; I guess that when it is active, speed might degrade.
Almost all sockets are released now. Is it normal that TCP pressure is
still "yes"?
balancer-backup ~ # cat /proc/net/protocols
protocol  size sockets  memory press maxhdr slab module cl co di ac io in de sh ss gs se re sp bi br ha uh gp em
PACKET    1216       0      -1 NI         0 no   kernel n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
ALG        624       0       0 NI         0 no   kernel n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
UNIX       832       1      -1 NI         0 yes  kernel n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
UDP-Lite   800       0      -1 NI         0 yes  kernel y  y  y  n  y  y  y  n  y  y  y  y  y  n  y  y  y  y  n
PING       760       0      -1 NI         0 yes  kernel y  y  y  n  n  y  n  n  y  y  y  y  n  y  y  y  y  y  n
RAW        768       0      -1 NI         0 yes  kernel y  y  y  n  y  y  y  n  y  y  y  y  n  y  y  y  y  n  n
UDP        800       1       0 NI         0 yes  kernel y  y  y  n  y  n  y  n  y  y  y  y  y  n  y  y  y  y  n
TCP       1608      24 1823119 yes      208 yes  kernel y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
NETLINK    992       0      -1 NI         0 no   kernel n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
balancer-backup ~ # cat /proc/net/sockstat
sockets: used 51
TCP: inuse 21 orphan 13 tw 2 alloc 21 mem 1823120
UDP: inuse 1 mem 0
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory
Netid State Recv-Q Send-Q Local Address:Port Peer
Address:Port
nl UNCONN 0 0 0:0
*
nl UNCONN 0 0 4:0
*
nl UNCONN 0 0 10:0
*
nl UNCONN 0 0 11:0
*
nl UNCONN 0 0 15:0
*
nl UNCONN 0 0 16:0
*
nl UNCONN 0 0 18:0
*
nl UNCONN 0 0 21:0
*
u_str LISTEN 0 0 /tmp/dropbear-b661327d/auth-ac4c35f5-6
398981316 * 0
tcp UNCONN 0 0 *:21084
*:*
skmem:(r0,rb2304,t0,tb212992,f0,w0,o0,bl0)
tcp LISTEN 0 10000 *:80
*:*
skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp LISTEN 0 10000 *:8880
*:*
skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp LISTEN 0 9 *:8080
*:*
skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp LISTEN 0 20 *:22
*:*
skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp LISTEN 0 1 *:23
*:*
skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp LISTEN 0 10000 *:443
*:*
skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp ESTAB 0 176 212.22.93.7:22
109.110.116.35:59925
skmem:(r0,rb359040,t0,tb46080,f3584,w4608,o0,bl0)
tcp FIN-WAIT-1 0 27545 173.194.19.109:80
185.4.252.44:61532
skmem:(r0,rb359040,t0,tb46080,f0,w45056,o0,bl0)
tcp SYN-SENT 0 1 212.22.93.7:63910
54.225.188.51:80
skmem:(r0,rb87380,t0,tb16384,f4294966016,w1280,o0,bl0)
tcp FIN-WAIT-1 0 28333 151.249.89.224:80
185.4.252.27:56892
skmem:(r0,rb359040,t0,tb46080,f3072,w46080,o0,bl0)
tcp FIN-WAIT-1 0 28009 2.16.61.24:80
185.4.252.139:52608
skmem:(r0,rb359040,t0,tb46080,f0,w45056,o0,bl0)
tcp FIN-WAIT-1 0 21753 173.194.19.46:80
185.4.252.214:51174
skmem:(r0,rb359040,t0,tb46080,f1024,w35840,o0,bl0)
tcp FIN-WAIT-1 0 28437 87.248.223.79:80
185.4.253.107:60924
skmem:(r0,rb359040,t0,tb46080,f3072,w46080,o0,bl0)
tcp FIN-WAIT-1 0 46194 212.25.73.21:80
185.4.253.107:50242
skmem:(r0,rb359040,t0,tb78336,f0,w73728,o0,bl0)
tcp FIN-WAIT-1 0 65001 173.194.35.119:443
185.4.252.58:50040
skmem:(r0,rb333440,t0,tb119808,f3584,w115200,o0,bl0)
tcp SYN-SENT 0 1 212.22.93.7:27514
54.225.188.51:80
skmem:(r0,rb87380,t0,tb16384,f4294966016,w1280,o0,bl0)
tcp FIN-WAIT-1 0 3863 198.23.87.67:443
185.4.253.115:1693
skmem:(r0,rb333440,t0,tb46080,f3072,w9216,o0,bl0)
tcp SYN-SENT 0 1 212.22.93.7:24487
54.225.188.51:80
skmem:(r0,rb87380,t0,tb16384,f4294966016,w1280,o0,bl0)
tcp FIN-WAIT-2 0 0 54.225.251.189:443
185.4.253.19:49317
tcp FIN-WAIT-2 0 0 184.173.136.67:443
185.4.252.214:56383
skmem:(r0,rb333440,t0,tb46080,f0,w0,o0,bl0)
tcp SYN-SENT 0 1 212.22.93.7:7830
54.225.188.51:80
skmem:(r0,rb87380,t0,tb16384,f4294966016,w1280,o0,bl0)
tcp FIN-WAIT-1 0 6042 23.200.86.144:80
185.4.252.139:54631
skmem:(r0,rb359040,t0,tb46080,f1792,w10496,o0,bl0)
tcp FIN-WAIT-1 0 28961 87.248.223.141:80
185.4.253.18:61083
skmem:(r0,rb359040,t0,tb46080,f3072,w46080,o0,bl0)
There were many like this one (with overflow?):
tcp SYN-SENT 0 1 212.22.93.7:7830
54.225.188.51:80
skmem:(r0,rb87380,t0,tb16384,f4294966016,w1280,o0,bl0)
After 5-10 minutes they disappeared, but TCP is still in memory pressure
mode:
balancer-backup ~ # ss -amn
Netid  State       Recv-Q  Send-Q  Local Address:Port    Peer Address:Port
nl     UNCONN      0       0       0:0                   *
nl     UNCONN      0       0       4:0                   *
nl     UNCONN      0       0       10:0                  *
nl     UNCONN      0       0       11:0                  *
nl     UNCONN      0       0       15:0                  *
nl     UNCONN      0       0       16:0                  *
nl     UNCONN      0       0       18:0                  *
nl     UNCONN      0       0       21:0                  *
u_str  LISTEN      0       0       /tmp/dropbear-b661327d/auth-ac4c35f5-6 398981316    * 0
tcp    UNCONN      0       0       *:56684               *:*
         skmem:(r0,rb2304,t0,tb212992,f0,w0,o0,bl0)
tcp    LISTEN      0       10000   *:80                  *:*
         skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN      0       10000   *:8880                *:*
         skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN      0       9       *:8080                *:*
         skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN      0       20      *:22                  *:*
         skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN      0       1       *:23                  *:*
         skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN      0       10000   *:443                 *:*
         skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    ESTAB       0       2688    212.22.93.7:22        109.110.116.35:59925
         skmem:(r0,rb359040,t2304,tb46080,f256,w16128,o0,bl0)
tcp    FIN-WAIT-1  0       21753   173.194.19.46:80      185.4.252.214:51174
         skmem:(r0,rb359040,t0,tb46080,f1024,w35840,o0,bl0)
tcp    FIN-WAIT-1  0       46194   212.25.73.21:80       185.4.253.107:50242
         skmem:(r0,rb359040,t0,tb78336,f0,w73728,o0,bl0)
tcp    FIN-WAIT-1  0       65001   173.194.35.119:443    185.4.252.58:50040
         skmem:(r0,rb333440,t0,tb119808,f3584,w115200,o0,bl0)
tcp    FIN-WAIT-1  0       3863    198.23.87.67:443      185.4.253.115:1693
         skmem:(r0,rb333440,t0,tb46080,f3072,w9216,o0,bl0)
balancer-backup ~ # cat /proc/net/protocols
protocol  size sockets  memory press maxhdr slab module cl co di ac io in de sh ss gs se re sp bi br ha uh gp em
PACKET    1216       0      -1 NI         0 no   kernel n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
ALG        624       0       0 NI         0 no   kernel n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
UNIX       832       1      -1 NI         0 yes  kernel n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
UDP-Lite   800       0      -1 NI         0 yes  kernel y  y  y  n  y  y  y  n  y  y  y  y  y  n  y  y  y  y  n
PING       760       0      -1 NI         0 yes  kernel y  y  y  n  n  y  n  n  y  y  y  y  n  y  y  y  y  y  n
RAW        768       0      -1 NI         0 yes  kernel y  y  y  n  y  y  y  n  y  y  y  y  n  y  y  y  y  n  n
UDP        800       1       0 NI         0 yes  kernel y  y  y  n  y  n  y  n  y  y  y  y  y  n  y  y  y  y  n
TCP       1608      15 1823049 yes      208 yes  kernel y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
NETLINK    992       0      -1 NI         0 no   kernel n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
Two warnings also showed up on the latest stable kernel (well, almost
latest: 3.15.6 was released very recently, and I have 3.15.5).
[73563.552667] ------------[ cut here ]------------
[73563.552889] WARNING: CPU: 3 PID: 0 at net/core/stream.c:201
sk_stream_kill_queues+0xff/0x104()
[73563.553308] Modules linked in: iptable_filter tcp_diag udp_diag
inet_diag unix_diag microcode xt_tcpudp xt_mark iptable_mangle ip_tables
x_tables 8021q garp stp mrp llc
[73563.553975] CPU: 3 PID: 0 Comm: swapper/3 Not tainted
3.15.5-build-0007 #1
[73563.554190] Hardware name: Dell Inc. PowerEdge R710/0HYPX2, BIOS
2.0.11 02/26/2010
[73563.559607] 0000000000000000 ffff88080f263a38 ffffffff8160054b
0000000000000000
[73563.560029] ffff88080f263a70 ffffffff810b4e03 ffffffff81584ed6
ffff880093b9d480
[73563.560451] ffff880093b9d5c8 ffff8806dcecb0a8 0000000000000000
ffff88080f263a80
[73563.560874] Call Trace:
[73563.561080] <IRQ> [<ffffffff8160054b>] dump_stack+0x45/0x56
[73563.561304] [<ffffffff810b4e03>] warn_slowpath_common+0x75/0x8e
[73563.561517] [<ffffffff81584ed6>] ? sk_stream_kill_queues+0xff/0x104
[73563.561730] [<ffffffff810b4ebb>] warn_slowpath_null+0x15/0x17
[73563.561943] [<ffffffff81584ed6>] sk_stream_kill_queues+0xff/0x104
[73563.562158] [<ffffffff815bf18a>] inet_csk_destroy_sock+0x77/0xb7
[73563.562372] [<ffffffff815c0555>] tcp_done+0x78/0x7d
[73563.562583] [<ffffffff815c8d2c>] tcp_reset+0x53/0x56
[73563.562795] [<ffffffff815c8e96>] tcp_validate_incoming+0x167/0x261
[73563.563012] [<ffffffff815c9d7c>] tcp_rcv_state_process+0x613/0xaa3
[73563.563228] [<ffffffff815cfae3>] tcp_v4_do_rcv+0x291/0x2db
[73563.563440] [<ffffffff815b531c>] ? inet_del_offload+0x38/0x38
[73563.563653] [<ffffffff815d2197>] tcp_v4_rcv+0x329/0x617
[73563.563868] [<ffffffff815b06ff>] ? nf_hook_slow+0x6a/0xfb
[73563.564080] [<ffffffff815b53d1>] ip_local_deliver_finish+0xb5/0x125
[73563.564293] [<ffffffff815b584f>] ip_local_deliver+0x6d/0x71
[73563.564505] [<ffffffff815b56b3>] ip_rcv_finish+0x272/0x2b4
[73563.564717] [<ffffffff815b5aea>] ip_rcv+0x297/0x305
[73563.564929] [<ffffffff8158b3c3>]
__netif_receive_skb_core+0x55b/0x5a4
[73563.565142] [<ffffffff8158b41f>] __netif_receive_skb+0x13/0x55
[73563.565353] [<ffffffff8158b49c>]
netif_receive_skb_internal+0x3b/0x72
[73563.565565] [<ffffffff8158ba45>] napi_gro_receive+0x39/0x7b
[73563.565779] [<ffffffff813b2bd2>] bnx2_poll_work+0xf8a/0x10be
[73563.565990] [<ffffffff815c13d1>] ? tcp_sendmsg+0x435/0xb18
[73563.566201] [<ffffffff813b2d32>] bnx2_poll_msix+0x2c/0xae
[73563.566413] [<ffffffff8158c133>] net_rx_action+0xc9/0x162
[73563.566628] [<ffffffff810b846a>] __do_softirq+0xd2/0x1c3
[73563.566841] [<ffffffff810b86b2>] irq_exit+0x3a/0x81
[73563.567059] [<ffffffff81004021>] do_IRQ+0xb4/0xcd
[73563.567275] [<ffffffff816061aa>] common_interrupt+0x6a/0x6a
[73563.567490] <EOI> [<ffffffff81559e73>] ?
cpuidle_enter_state+0x4d/0x9e
[73563.567713] [<ffffffff81559ee6>] cpuidle_enter+0x12/0x14
[73563.567933] [<ffffffff810e27f7>] cpu_startup_entry+0x163/0x204
[73563.568150] [<ffffffff810239af>] start_secondary+0x1b4/0x1b9
[73563.568366] ---[ end trace 098a1058f104bc5a ]---
[73563.568577] ------------[ cut here ]------------
[73563.568791] WARNING: CPU: 3 PID: 0 at net/ipv4/af_inet.c:153
inet_sock_destruct+0x160/0x189()
[73563.569211] Modules linked in: iptable_filter tcp_diag udp_diag
inet_diag unix_diag microcode xt_tcpudp xt_mark iptable_mangle ip_tables
x_tables 8021q garp stp mrp llc
[73563.569878] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G W
3.15.5-build-0007 #1
[73563.570300] Hardware name: Dell Inc. PowerEdge R710/0HYPX2, BIOS
2.0.11 02/26/2010
[73563.570719] 0000000000000000 ffff88080f263b38 ffffffff8160054b
0000000000000000
[73563.571145] ffff88080f263b70 ffffffff810b4e03 ffffffff815e01fb
ffff880093b9d480
[73563.571573] ffff880093b9d5c8 ffff880093b9d4d0 0000000000000000
ffff88080f263b80
[73563.572000] Call Trace:
[73563.572210] <IRQ> [<ffffffff8160054b>] dump_stack+0x45/0x56
[73563.572430] [<ffffffff810b4e03>] warn_slowpath_common+0x75/0x8e
[73563.572646] [<ffffffff815e01fb>] ? inet_sock_destruct+0x160/0x189
[73563.572862] [<ffffffff810b4ebb>] warn_slowpath_null+0x15/0x17
[73563.573076] [<ffffffff815e01fb>] inet_sock_destruct+0x160/0x189
[73563.573292] [<ffffffff8157cb5e>] __sk_free+0x18/0xd5
[73563.573504] [<ffffffff8157cc2e>] sk_free+0x13/0x15
[73563.573719] [<ffffffff815d2464>] tcp_v4_rcv+0x5f6/0x617
[73563.573933] [<ffffffff815b06ff>] ? nf_hook_slow+0x6a/0xfb
[73563.574147] [<ffffffff815b53d1>] ip_local_deliver_finish+0xb5/0x125
[73563.574362] [<ffffffff815b584f>] ip_local_deliver+0x6d/0x71
[73563.574576] [<ffffffff815b56b3>] ip_rcv_finish+0x272/0x2b4
[73563.574789] [<ffffffff815b5aea>] ip_rcv+0x297/0x305
[73563.575002] [<ffffffff8158b3c3>]
__netif_receive_skb_core+0x55b/0x5a4
[73563.575218] [<ffffffff8158b41f>] __netif_receive_skb+0x13/0x55
[73563.575432] [<ffffffff8158b49c>]
netif_receive_skb_internal+0x3b/0x72
[73563.575650] [<ffffffff8158ba45>] napi_gro_receive+0x39/0x7b
[73563.575866] [<ffffffff813b2bd2>] bnx2_poll_work+0xf8a/0x10be
[73563.576083] [<ffffffff815c13d1>] ? tcp_sendmsg+0x435/0xb18
[73563.576298] [<ffffffff813b2d32>] bnx2_poll_msix+0x2c/0xae
[73563.576512] [<ffffffff8158c133>] net_rx_action+0xc9/0x162
[73563.576729] [<ffffffff810b846a>] __do_softirq+0xd2/0x1c3
[73563.576944] [<ffffffff810b86b2>] irq_exit+0x3a/0x81
[73563.577159] [<ffffffff81004021>] do_IRQ+0xb4/0xcd
[73563.577377] [<ffffffff816061aa>] common_interrupt+0x6a/0x6a
[73563.577591] <EOI> [<ffffffff81559e73>] ?
cpuidle_enter_state+0x4d/0x9e
[73563.577814] [<ffffffff81559ee6>] cpuidle_enter+0x12/0x14
[73563.578029] [<ffffffff810e27f7>] cpu_startup_entry+0x163/0x204
[73563.578246] [<ffffffff810239af>] start_secondary+0x1b4/0x1b9
[73563.578460] ---[ end trace 098a1058f104bc5b ]---
On 2014-07-17 16:42, Eric Dumazet wrote:
> On Thu, 2014-07-17 at 16:12 +0300, Denys Fedoryshchenko wrote:
>> Well, it seems not normal. But i am not sure, is mem value means this
>> memory is allocated and not released?
>
> mem means : sockets have forward_alloc reservations. They can use this
> amount of memory without worrying.
>
>> If it is allocated pages, MemFree in /proc/memstat should be different
>> value.
>
> Its not allocated pages.
>
>> I just shut down balancer process now and bypassed all traffic, and
>> waited all(ok, almost all) sockets to close, this particular server
>> had
>> 300k connections before test.
>>
>> sockets: used 78
>> TCP: inuse 22 orphan 17 tw 2 alloc 22 mem 1048593
>> UDP: inuse 1 mem 0
>> UDPLITE: inuse 0
>> RAW: inuse 0
>> FRAG: inuse 0 memory 0
>>
>> MemTotal: 32664776 kB
>> MemFree: 32462456 kB
>> MemAvailable: 32449704 kB
>>
>> ss output:
>>
>> Netid State      Recv-Q Send-Q Local Address:Port   Peer Address:Port
>> nl    UNCONN     0      0      0:0                  *
>> nl    UNCONN     0      0      4:0                  *
>> nl    UNCONN     0      0      10:0                 *
>> nl    UNCONN     0      0      11:0                 *
>> nl    UNCONN     0      0      15:0                 *
>> nl    UNCONN     0      0      16:0                 *
>> nl    UNCONN     0      0      18:0                 *
>> nl    UNCONN     0      0      21:0                 *
>> tcp   UNCONN     0      0      *:49484              *:*
>>       skmem:(r0,rb2304,t0,tb212992,f0,w0,o0,bl0)
>> tcp   LISTEN     0      1280   *:8880               *:*
>>       skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
>> tcp   LISTEN     0      20     *:22                 *:*
>>       skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
>> tcp   LISTEN     0      1      *:23                 *:*
>>       skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
>> tcp   LISTEN     0      1280   *:443                *:*
>>       skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
>> tcp   FIN-WAIT-1 0      4545   157.56.194.74:443    172.17.37.4:52473
>>       skmem:(r0,rb8192,t0,tb8192,f3072,w9216,o0,bl0)
>> tcp   ESTAB      0      3008   10.0.22.22:22        194.146.153.22:59156
>>       skmem:(r0,rb359040,t0,tb18432,f768,w11520,o0,bl0)
>> tcp   FIN-WAIT-1 0      1      54.225.250.234:443   172.17.32.251:35969
>>       skmem:(r0,rb8192,t0,tb8192,f2816,w1280,o0,bl0)
>> tcp   FIN-WAIT-2 0      0      206.53.159.203:443   172.17.76.10:54688
>> tcp   FIN-WAIT-2 0      0      173.194.39.40:443    172.17.64.221:39482
>> tcp   FIN-WAIT-1 0      1      69.171.235.48:443    172.17.32.251:52731
>>       skmem:(r0,rb8192,t0,tb8192,f4294966016,w1280,o0,bl0)
>
> This looks bad :
>
> f4294966016 : 0xFFFFFB00
>
> There is a bug of some sort, that leads to overflows.
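For scale, the sockstat and meminfo figures quoted above can be compared
directly. This is only a rough sketch, assuming 4096-byte pages (the same
factor used at the start of the thread); the function names are made up
for illustration and are not kernel or libc APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a page count, as printed in the "mem" column of
 * /proc/net/sockstat, to bytes. page_size is an assumption:
 * 4096 on typical x86-64 systems. */
uint64_t sockstat_pages_to_bytes(uint64_t pages, uint64_t page_size)
{
	assert(page_size > 0);
	return pages * page_size;
}

/* Memory in use according to /proc/meminfo, in bytes.
 * Both inputs are in kB, as printed by the kernel. */
uint64_t meminfo_used_bytes(uint64_t mem_total_kb, uint64_t mem_free_kb)
{
	assert(mem_total_kb >= mem_free_kb);
	return (mem_total_kb - mem_free_kb) * 1024;
}
```

With the quoted numbers, sockstat_pages_to_bytes(1048593, 4096) gives
4295036928 bytes (about 4 GiB), while meminfo_used_bytes(32664776,
32462456) gives 207175680 bytes (about 198 MiB). The TCP counter alone
claims far more memory than the whole machine has in use, which is
consistent with an accounting problem rather than pages actually being
held.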
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels?
2014-07-18 8:12 ` Denys Fedoryshchenko
@ 2014-07-18 8:41 ` Eric Dumazet
2014-09-23 20:14 ` Semen Slepov
[not found] ` <1405676214.10255.96.camel@edumazet-glaptop2.roam.corp.google.com>
0 siblings, 2 replies; 25+ messages in thread
From: Eric Dumazet @ 2014-07-18 8:41 UTC (permalink / raw)
To: Denys Fedoryshchenko; +Cc: netdev, kaber, davem
On Fri, 2014-07-18 at 11:12 +0300, Denys Fedoryshchenko wrote:
> More updates. Now i found out how to see if tcp are under memory
> pressure mode, i guess when it is active, speed might degrade. Now
> almost all sockets are released, is it normal that tcp pressure still
> "yes" ?
>
> balancer-backup ~ # cat /proc/net/protocols
> protocol size sockets memory press maxhdr slab module cl co di ac
> io in de sh ss gs se re sp bi br ha uh gp em
> PACKET 1216 0 -1 NI 0 no kernel n n n n
> n n n n n n n n n n n n n n n
> ALG 624 0 0 NI 0 no kernel n n n n
> n n n n n n n n n n n n n n n
> UNIX 832 1 -1 NI 0 yes kernel n n n n
> n n n n n n n n n n n n n n n
> UDP-Lite 800 0 -1 NI 0 yes kernel y y y n
> y y y n y y y y y n y y y y n
> PING 760 0 -1 NI 0 yes kernel y y y n
> n y n n y y y y n y y y y y n
> RAW 768 0 -1 NI 0 yes kernel y y y n
> y y y n y y y y n y y y y n n
> UDP 800 1 0 NI 0 yes kernel y y y n
> y n y n y y y y y n y y y y n
> TCP 1608 24 1823119 yes 208 yes kernel y y y
> y y y y y y y y y y n y y y y y
> NETLINK 992 0 -1 NI 0 no kernel n n n n
> n n n n n n n n n n n n n n n
> balancer-backup ~ # cat /proc/net/sockstat
> sockets: used 51
> TCP: inuse 21 orphan 13 tw 2 alloc 21 mem 1823120
> UDP: inuse 1 mem 0
> UDPLITE: inuse 0
> RAW: inuse 0
> FRAG: inuse 0 memory
TCP believes it is under pressure because of some accounting bug.
If you change /proc/sys/net/ipv4/tcp_mem (3rd value), it might say it
is not under pressure.
I am taking a look, thanks again for the report.
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels?
2014-07-18 8:41 ` Eric Dumazet
@ 2014-09-23 20:14 ` Semen Slepov
[not found] ` <1405676214.10255.96.camel@edumazet-glaptop2.roam.corp.google.com>
1 sibling, 0 replies; 25+ messages in thread
From: Semen Slepov @ 2014-09-23 20:14 UTC (permalink / raw)
To: netdev
Eric Dumazet <eric.dumazet <at> gmail.com> writes:
>
>
> tcp believes its under pressure because of some bug on accounting.
>
> If you change /proc/sys/net/ipv4/tcp_mem (3rd value), it might say it
> is not under pressure.
>
> I am taking a look, thanks again for the report.
>
>
I still have this accounting bug on 3.15.7. Is there a fix in later
kernel releases (or a patch)?
* /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
[not found] ` <8f138d88e3b850130eb9bb15e3e27069@visp.net.lb>
@ 2014-11-12 17:07 ` Denys Fedoryshchenko
2014-11-12 17:27 ` Eric Dumazet
0 siblings, 1 reply; 25+ messages in thread
From: Denys Fedoryshchenko @ 2014-11-12 17:07 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
Hi
I've been able to trace the invalid memory value shown for
sk_forward_alloc. I am not sure if it is related to the bug where the
whole TCP stack gets wrecked on high-load servers (I even had such a
crash on a server running plain torrents, and not that many of them).
At first the values appeared in ss output as f4294966016; in tc it is
shown as an unsigned integer, while in the kernel it is a signed
integer. Should I provide a patch for iproute2?
After changing the value to the correct type, here is what I got:
skmem:(r0,rb359040,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
Since it is a signed integer, it actually appears a lot as -1280 bytes.
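The signed/unsigned mismatch can be reproduced in userspace. The sketch
below is not the iproute2 code; it only assumes that the "f" field of
the skmem:(...) string is a 32-bit quantity printed as unsigned, and the
helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Reinterpret a 32-bit unsigned value as two's-complement signed,
 * without relying on implementation-defined conversions. */
int32_t u32_as_s32(uint32_t v)
{
	if (v <= (uint32_t)INT32_MAX)
		return (int32_t)v;
	return -(int32_t)(UINT32_MAX - v) - 1;
}

/* Pull the "f" field out of an skmem:(...) string and store it as a
 * signed value in *out. Returns 0 on success, -1 if the string does
 * not match the expected layout. */
int skmem_fwd_alloc(const char *s, int32_t *out)
{
	unsigned int r, rb, t, tb, f, w, o, bl;

	assert(s != NULL && out != NULL);
	if (sscanf(s, "skmem:(r%u,rb%u,t%u,tb%u,f%u,w%u,o%u,bl%u)",
		   &r, &rb, &t, &tb, &f, &w, &o, &bl) != 8)
		return -1;
	*out = u32_as_s32((uint32_t)f);
	return 0;
}
```

Feeding it "skmem:(r0,rb8192,t0,tb8192,f4294966016,w1280,o0,bl0)" yields
-1280, since 4294966016 is 2^32 - 1280.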
After placing several WARN_ON_ONCE checks in all functions involving
sk_forward_alloc where a negative value may appear, I traced it down
at least to tcp_connect_queue_skb().
[ 13.094561] WARNING: CPU: 4 PID: 2855 at include/net/sock.h:1476
tcp_connect_queue_skb+0x9f/0xd0()
After adding some debug values, it was confirmed:
skb_header_release(skb);
__tcp_add_write_queue_tail(sk, skb);
sk->sk_wmem_queued += skb->truesize;
sk_mem_charge(sk, skb->truesize); /* <<<< sk->sk_forward_alloc is often 0
                                     and skb->truesize is 1280, so the
                                     result is -1280 */
tp->write_seq = tcb->end_seq;
tp->packets_out += tcp_skb_pcount(skb);
Maybe it should not be subtracted here? Or maybe the logic is inverted
all over the code?
Please help :)
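To make the arithmetic concrete, here is a toy userspace model of the
charge/uncharge bookkeeping. It is not the kernel implementation: the
toy_* names are hypothetical, and it assumes that reservations are taken
in whole 4096-byte quanta before a charge, which is what a separate
"schedule" step would do.

```c
#include <assert.h>

#define TOY_MEM_QUANTUM 4096	/* assumed reservation unit: one page */

struct toy_sock {
	int forward_alloc;	/* bytes reserved but not yet charged */
	long pages_reserved;	/* quanta taken from the shared pool */
};

/* Reserve whole quanta until `size` bytes can be charged. */
void toy_mem_schedule(struct toy_sock *sk, int size)
{
	int missing, quanta;

	assert(size >= 0);
	if (sk->forward_alloc >= size)
		return;
	missing = size - sk->forward_alloc;
	quanta = (missing + TOY_MEM_QUANTUM - 1) / TOY_MEM_QUANTUM;
	sk->forward_alloc += quanta * TOY_MEM_QUANTUM;
	sk->pages_reserved += quanta;
}

/* Consume part of the reservation for a queued buffer. */
void toy_mem_charge(struct toy_sock *sk, int size)
{
	sk->forward_alloc -= size;
}

/* Give the bytes back to the reservation when the buffer is freed. */
void toy_mem_uncharge(struct toy_sock *sk, int size)
{
	sk->forward_alloc += size;
}
```

In this model, charging 1280 bytes against an empty reservation leaves
forward_alloc at -1280, as in the trace above. Scheduling first leaves
4096 - 1280 = 2816, which happens to match the f2816,w1280 pair seen in
the earlier ss output.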
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
2014-11-12 17:07 ` /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug) Denys Fedoryshchenko
@ 2014-11-12 17:27 ` Eric Dumazet
2014-11-16 8:54 ` Denys Fedoryshchenko
0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2014-11-12 17:27 UTC (permalink / raw)
To: Denys Fedoryshchenko, Neal Cardwell, Yuchung Cheng; +Cc: netdev
On Wed, 2014-11-12 at 19:07 +0200, Denys Fedoryshchenko wrote:
> Hi
>
>
> I've been able to trace invalid memory shown for sk_forward_alloc. Not
> sure if it is related to bug when all tcp stack are wrecked on highload
> servers (i had even such crash on server with plain torrents, not that
> much of them).
> First they were appearing in ss output as f4294966016, but in tc it
> shows as unsigned integer, while in kernel it is signed integer. Should
> i provide patch for iproute2?
> After changing value to correct one, here is what is got:
> skmem:(r0,rb359040,t0,tb46080,f-1280,w1280,o0,bl0)
> skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
> skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
> skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
> skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
> skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
> skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
> skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
> skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
> skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
> skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
> skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
> skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
> skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
> skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
> skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
> skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
> skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
> skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
> skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
>
> So as it is signed integer, it is actually appears a lot as -1280 bytes.
> After placing several WARN_ON_ONCE on fall functions involving
> sk_forward_alloc, where negative value may
> appear i traced down at least to tcp_connect_queue_skb().
> [ 13.094561] WARNING: CPU: 4 PID: 2855 at include/net/sock.h:1476
> tcp_connect_queue_skb+0x9f/0xd0()
> After adding some debug values, it was confirmed:
>
>
> skb_header_release(skb);
> __tcp_add_write_queue_tail(sk, skb);
> sk->sk_wmem_queued += skb->truesize;
> sk_mem_charge(sk, skb->truesize); <<<< sk->sk_forward_alloc often is 0,
> and skb->truesize is 1280, so -1280 is result
> tp->write_seq = tcb->end_seq;
> tp->packets_out += tcp_skb_pcount(skb);
>
> Maybe it should not be subtracted here? Or maybe logic are inverted all
> over the code?
> Please help :)
>
Thanks a lot for the report, I am CCing other experts ;)
AFAIK, a negative forward alloc should not be a problem, as long as the
incursion is bounded.
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
2014-11-12 17:27 ` Eric Dumazet
@ 2014-11-16 8:54 ` Denys Fedoryshchenko
2014-11-16 18:11 ` Eric Dumazet
0 siblings, 1 reply; 25+ messages in thread
From: Denys Fedoryshchenko @ 2014-11-16 8:54 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Neal Cardwell, Yuchung Cheng, netdev
Here are the latest findings from when servers go crazy because of
invalid TCP memory accounting.
First of all, I upgraded the kernel to the latest version, 3.17.3, and
also added a patch from the upcoming kernel,
"12) Don't call sock_kfree_s() with NULL pointers, this function also
has the side effect of adjusting
the socket memory usage. From Cong Wang.", but it did not help.
I added printk_ratelimited to places where suspicious values might
appear, and got some more information.
The first kind is not very suspicious; I have no idea if it is a problem:
[ 1413.031622] sk ffff8817184d8680 sk_mem_charge negative -10752 by 4352
[ 1413.032027] sk ffff8817184d8680 sk_mem_charge negative -15104 by 4352
[ 1415.768465] sk ffff881666842d80 sk_mem_charge negative -9984 by 4352
[ 1415.768868] sk ffff881666842d80 sk_mem_charge negative -14336 by 4352
[ 1415.769268] sk ffff881666842d80 sk_mem_charge negative -18688 by 4352
[ 1415.769681] sk ffff881666842d80 sk_mem_charge negative -9088 by 4352
[ 1418.933799] sk ffff8816dd640000 sk_mem_charge negative -9984 by 4352
[ 1418.934205] sk ffff8816dd640000 sk_mem_charge negative -14336 by 4352
[ 1418.934604] sk ffff8816dd640000 sk_mem_charge negative -18688 by 4352
[ 1427.131310] sk ffff881731801a00 sk_mem_charge negative -11776 by 4352
[ 1428.564640] sk ffff881731801a00 sk_mem_charge negative -11008 by 4352
[ 1429.134279] sk ffff881731801a00 sk_mem_charge negative -11776 by 4352
[ 1429.134691] sk ffff881731801a00 sk_mem_charge negative -16128 by 4352
[ 1430.666541] sk ffff881731801a00 sk_mem_charge negative -10496 by 4352
[ 1431.395099] sk ffff881731801a00 sk_mem_charge negative -12032 by 4352
[ 1431.395506] sk ffff881731801a00 sk_mem_charge negative -16384 by 4352
[ 1431.877862] sk ffff881731801a00 sk_mem_charge negative -11648 by 4352
The second kind is always linked with crashes: it is in sk_mem_uncharge,
and sk_forward_alloc goes negative. The patch to show a message
for sk_mem_uncharge in sock.h is very simple:
 static inline void sk_mem_uncharge(struct sock *sk, int size)
@@ -1480,6 +1485,8 @@
 	if (!sk_has_account(sk))
 		return;
 	sk->sk_forward_alloc += size;
+	if (sk->sk_forward_alloc < -8192)
+		printk_ratelimited(KERN_WARNING "sk %p sk_mem_uncharge negative %d by %d\n", sk, sk->sk_forward_alloc, size);
 }
This is what I am usually seeing before the box reboots:
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.363437] sk ffff88155e904100
sk_mem_uncharge negative -2147482496 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.363837] sk ffff88155e904100
sk_mem_uncharge negative -2147480192 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.364232] sk ffff88155e904100
sk_mem_uncharge negative -2147477888 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.364627] sk ffff88155e904100
sk_mem_uncharge negative -2147475584 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.365022] sk ffff88155e904100
sk_mem_uncharge negative -2147473280 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.365416] sk ffff88155e904100
sk_mem_uncharge negative -2147470976 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.365811] sk ffff88155e904100
sk_mem_uncharge negative -2147468672 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.366214] sk ffff88155e904100
sk_mem_uncharge negative -2147466368 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.366611] sk ffff88155e904100
sk_mem_uncharge negative -2147464064 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.367004] sk ffff88155e904100
sk_mem_uncharge negative -2147461760 by 2304
Nov 15 13:20:58 g1 user.warn kernel: [ 2007.363601] sk_mem_uncharge:
576170 callbacks suppressed
Nov 15 13:20:58 g1 user.warn kernel: [ 2007.364001] sk ffff88155e904100
sk_mem_uncharge negative -1517794432 by 2304
.....
until
...
Nov 15 13:31:58 g1 user.warn kernel: [ 2666.393667] sk ffff88170141ad80
sk_mem_uncharge negative -169088 by 2944
Nov 15 13:31:58 g1 user.warn kernel: [ 2666.394064] sk ffff88170141ad80
sk_mem_uncharge negative -166144 by 2944
Nov 15 13:31:58 g1 user.warn kernel: [ 2666.394472] sk ffff88170141ad80
sk_mem_uncharge negative -163200 by 2944
Nov 15 13:31:58 g1 user.warn kernel: [ 2666.394871] sk ffff88170141ad80
sk_mem_uncharge negative -158208 by 4992
<reboot>
Or:
Nov 15 14:55:01 g1 user.warn kernel: [ 1965.758129] sk ffff8815f8014e00
sk_mem_uncharge negative -2147483520 by 2304
Nov 15 14:55:01 g1 user.warn kernel: [ 1965.758536] sk ffff8815f8014e00
sk_mem_uncharge negative -2147481216 by 2304
Nov 15 14:55:01 g1 user.warn kernel: [ 1965.758935] sk ffff8815f8014e00
sk_mem_uncharge negative -2147478912 by 2304
Nov 15 14:55:01 g1 user.warn kernel: [ 1965.759332] sk ffff8815f8014e00
sk_mem_uncharge negative -2147476608 by 2304
Nov 15 14:55:01 g1 user.warn kernel: [ 1965.759728] sk ffff8815f8014e00
sk_mem_uncharge negative -2147474304 by 2304
Nov 15 14:55:01 g1 user.warn kernel: [ 1965.760124] sk ffff8815f8014e00
sk_mem_uncharge negative -2147472000 by 2304
Nov 15 14:55:01 g1 user.warn kernel: [ 1965.760518] sk ffff8815f8014e00
sk_mem_uncharge negative -2147469696 by 2304
...
Nov 15 15:03:27 g1 user.warn kernel: [ 2472.170857] sk ffff88163f429a00
sk_mem_uncharge negative -277169024 by 896
Nov 15 15:03:30 g1 user.warn kernel: [ 2474.611109] sk ffff88163f429a00
sk_mem_uncharge negative -277164928 by 896
Nov 15 15:03:30 g1 user.warn kernel: [ 2474.611511] sk ffff88163f429a00
sk_mem_uncharge negative -277164032 by 896
Nov 15 15:03:30 g1 user.warn kernel: [ 2474.706237] sk ffff88163f429a00
sk_mem_uncharge negative -277159936 by 2304
Nov 15 15:03:34 g1 user.warn kernel: [ 2478.889110] sk ffff88163f429a00
sk_mem_uncharge negative -277155840 by 2304
Nov 15 15:03:34 g1 user.warn kernel: [ 2478.911516] sk ffff88163f429a00
sk_mem_uncharge negative -277147648 by 1536
Nov 15 15:03:35 g1 user.warn kernel: [ 2479.320851] sk ffff88163f429a00
sk_mem_uncharge negative -277143552 by 2304
Nov 15 15:04:33 g1 user.warn kernel: [ 2537.408184] sk ffff88163f429a00
sk_mem_uncharge negative -277139456 by 2304
Nov 15 15:04:33 g1 user.warn kernel: [ 2537.409790] sk ffff88163f429a00
sk_mem_uncharge negative -277135360 by 896
(here I can see two sockets in this state at the same time)
Or
Nov 15 23:37:19 g1 user.warn kernel: [14137.049570] sk ffff8816b3312700
sk_mem_uncharge negative -2147481856 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.049973] sk ffff8816b3312700
sk_mem_uncharge negative -2147479552 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.050372] sk ffff8816b3312700
sk_mem_uncharge negative -2147477248 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.050770] sk ffff8816b3312700
sk_mem_uncharge negative -2147474944 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.051170] sk ffff8816b3312700
sk_mem_uncharge negative -2147472640 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.051572] sk ffff8816b3312700
sk_mem_uncharge negative -2147470336 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.051971] sk ffff8816b3312700
sk_mem_uncharge negative -2147468032 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.052371] sk ffff8816b3312700
sk_mem_uncharge negative -2147465728 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.052771] sk ffff8816b3312700
sk_mem_uncharge negative -2147463424 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.053169] sk ffff8816b3312700
sk_mem_uncharge negative -2147461120 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.526984] sk_mem_uncharge:
1083519 callbacks suppressed
Nov 15 23:37:32 g1 user.warn kernel: [14150.527384] sk ffff8816b3312700
sk_mem_uncharge negative -2147483392 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.527782] sk ffff8816b3312700
sk_mem_uncharge negative -2147481088 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.528179] sk ffff8816b3312700
sk_mem_uncharge negative -2147478784 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.528576] sk ffff8816b3312700
sk_mem_uncharge negative -2147476480 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.528975] sk ffff8816b3312700
sk_mem_uncharge negative -2147474176 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.529370] sk ffff8816b3312700
sk_mem_uncharge negative -2147471872 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.529770] sk ffff8816b3312700
sk_mem_uncharge negative -2147469568 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.530167] sk ffff8816b3312700
sk_mem_uncharge negative -2147467264 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.530572] sk ffff8816b3312700
sk_mem_uncharge negative -2147464960 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.530973] sk ffff8816b3312700
sk_mem_uncharge negative -2147462656 by 2304
<here it reboots much earlier, at this point>
I will try to get more info on what kind of socket it is.
On 2014-11-12 19:27, Eric Dumazet wrote:
> On Wed, 2014-11-12 at 19:07 +0200, Denys Fedoryshchenko wrote:
>> Hi
>>
>>
>> I've been able to trace invalid memory shown for sk_forward_alloc. Not
>> sure if it is related to bug when all tcp stack are wrecked on
>> highload
>> servers (i had even such crash on server with plain torrents, not that
>> much of them).
>> First they were appearing in ss output as f4294966016, but in tc it
>> shows as unsigned integer, while in kernel it is signed integer.
>> Should
>> i provide patch for iproute2?
>> After changing value to correct one, here is what is got:
>> skmem:(r0,rb359040,t0,tb46080,f-1280,w1280,o0,bl0)
>> skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>> skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>> skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>> skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
>> skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>> skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
>> skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>> skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
>> skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>> skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>> skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>> skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>> skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
>> skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
>> skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
>> skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
>> skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>> skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
>> skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
>>
>> So as it is signed integer, it is actually appears a lot as -1280
>> bytes.
>> After placing several WARN_ON_ONCE on fall functions involving
>> sk_forward_alloc, where negative value may
>> appear i traced down at least to tcp_connect_queue_skb().
>> [ 13.094561] WARNING: CPU: 4 PID: 2855 at include/net/sock.h:1476
>> tcp_connect_queue_skb+0x9f/0xd0()
>> After adding some debug values, it was confirmed:
>>
>>
>> skb_header_release(skb);
>> __tcp_add_write_queue_tail(sk, skb);
>> sk->sk_wmem_queued += skb->truesize;
>> sk_mem_charge(sk, skb->truesize); <<<< sk->sk_forward_alloc often is
>> 0,
>> and skb->truesize is 1280, so -1280 is result
>> tp->write_seq = tcb->end_seq;
>> tp->packets_out += tcp_skb_pcount(skb);
>>
>> Maybe it should not be subtracted here? Or maybe logic are inverted
>> all
>> over the code?
>> Please help :)
>>
>
> Thanks a lot for the report, I am CCing other experts ;)
>
> AFAIK, a negative forward alloc should not be a problem, a long as the
> incursion is bound.
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
2014-11-16 8:54 ` Denys Fedoryshchenko
@ 2014-11-16 18:11 ` Eric Dumazet
2014-11-16 19:05 ` Denys Fedoryshchenko
2014-11-16 19:10 ` /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug) Eric Dumazet
0 siblings, 2 replies; 25+ messages in thread
From: Eric Dumazet @ 2014-11-16 18:11 UTC (permalink / raw)
To: Denys Fedoryshchenko; +Cc: Neal Cardwell, Yuchung Cheng, netdev
On Sun, 2014-11-16 at 10:54 +0200, Denys Fedoryshchenko wrote:
> As latest findings, when servers are going crazy because of tcp memory
> invalid accounting.
> First of all i upgraded kernel to latest version 3.17.3 and added also
> patch from upcoming kernel,
> "12) Don't call sock_kfree_s() with NULL pointers, this function also
> has the side effect of adjusting
> the socket memory usage. From Cong Wang.", but it didnt helped.
>
> I added printk_ratelimited to places where suspicious values might
> appear, and got some more information.
> First, is not very suspicious, no idea if it is a problem:
> [ 1413.031622] sk ffff8817184d8680 sk_mem_charge negative -10752 by 4352
> [ 1413.032027] sk ffff8817184d8680 sk_mem_charge negative -15104 by 4352
> [ 1415.768465] sk ffff881666842d80 sk_mem_charge negative -9984 by 4352
> [ 1415.768868] sk ffff881666842d80 sk_mem_charge negative -14336 by 4352
> [ 1415.769268] sk ffff881666842d80 sk_mem_charge negative -18688 by 4352
> [ 1415.769681] sk ffff881666842d80 sk_mem_charge negative -9088 by 4352
> [ 1418.933799] sk ffff8816dd640000 sk_mem_charge negative -9984 by 4352
> [ 1418.934205] sk ffff8816dd640000 sk_mem_charge negative -14336 by 4352
> [ 1418.934604] sk ffff8816dd640000 sk_mem_charge negative -18688 by 4352
> [ 1427.131310] sk ffff881731801a00 sk_mem_charge negative -11776 by 4352
> [ 1428.564640] sk ffff881731801a00 sk_mem_charge negative -11008 by 4352
> [ 1429.134279] sk ffff881731801a00 sk_mem_charge negative -11776 by 4352
> [ 1429.134691] sk ffff881731801a00 sk_mem_charge negative -16128 by 4352
> [ 1430.666541] sk ffff881731801a00 sk_mem_charge negative -10496 by 4352
> [ 1431.395099] sk ffff881731801a00 sk_mem_charge negative -12032 by 4352
> [ 1431.395506] sk ffff881731801a00 sk_mem_charge negative -16384 by 4352
> [ 1431.877862] sk ffff881731801a00 sk_mem_charge negative -11648 by 4352
> Second is always linked with crashes, it is sk_mem_uncharge and
> sk_forward_alloc goes negative. Patch to show message
> for sk_mem_uncharge in sock.h is very simple:
>
> static inline void sk_mem_uncharge(struct sock *sk, int size)
> @@ -1480,6 +1485,8 @@
> if (!sk_has_account(sk))
> return;
> sk->sk_forward_alloc += size;
> + if (sk->sk_forward_alloc < -8192)
> + printk_ratelimited(KERN_WARNING"sk %p sk_mem_uncharge
> negative %d by %d\n", sk, sk->sk_forward_alloc, size);
> }
>
Could you describe your hardware setup and networking setup ?
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
2014-11-16 18:11 ` Eric Dumazet
@ 2014-11-16 19:05 ` Denys Fedoryshchenko
2014-11-16 20:16 ` Eric Dumazet
2014-11-16 19:10 ` /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug) Eric Dumazet
1 sibling, 1 reply; 25+ messages in thread
From: Denys Fedoryshchenko @ 2014-11-16 19:05 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Neal Cardwell, Yuchung Cheng, netdev
On 2014-11-16 20:11, Eric Dumazet wrote:
> On Sun, 2014-11-16 at 10:54 +0200, Denys Fedoryshchenko wrote:
>> As latest findings, when servers are going crazy because of tcp memory
>> invalid accounting.
>> First of all i upgraded kernel to latest version 3.17.3 and added also
>> patch from upcoming kernel,
>> "12) Don't call sock_kfree_s() with NULL pointers, this function also
>> has the side effect of adjusting
>> the socket memory usage. From Cong Wang.", but it didnt helped.
>>
>> I added printk_ratelimited to places where suspicious values might
>> appear, and got some more information.
>> First, is not very suspicious, no idea if it is a problem:
>> [ 1413.031622] sk ffff8817184d8680 sk_mem_charge negative -10752 by
>> 4352
>> [ 1413.032027] sk ffff8817184d8680 sk_mem_charge negative -15104 by
>> 4352
>> [ 1415.768465] sk ffff881666842d80 sk_mem_charge negative -9984 by
>> 4352
>> [ 1415.768868] sk ffff881666842d80 sk_mem_charge negative -14336 by
>> 4352
>> [ 1415.769268] sk ffff881666842d80 sk_mem_charge negative -18688 by
>> 4352
>> [ 1415.769681] sk ffff881666842d80 sk_mem_charge negative -9088 by
>> 4352
>> [ 1418.933799] sk ffff8816dd640000 sk_mem_charge negative -9984 by
>> 4352
>> [ 1418.934205] sk ffff8816dd640000 sk_mem_charge negative -14336 by
>> 4352
>> [ 1418.934604] sk ffff8816dd640000 sk_mem_charge negative -18688 by
>> 4352
>> [ 1427.131310] sk ffff881731801a00 sk_mem_charge negative -11776 by
>> 4352
>> [ 1428.564640] sk ffff881731801a00 sk_mem_charge negative -11008 by
>> 4352
>> [ 1429.134279] sk ffff881731801a00 sk_mem_charge negative -11776 by
>> 4352
>> [ 1429.134691] sk ffff881731801a00 sk_mem_charge negative -16128 by
>> 4352
>> [ 1430.666541] sk ffff881731801a00 sk_mem_charge negative -10496 by
>> 4352
>> [ 1431.395099] sk ffff881731801a00 sk_mem_charge negative -12032 by
>> 4352
>> [ 1431.395506] sk ffff881731801a00 sk_mem_charge negative -16384 by
>> 4352
>> [ 1431.877862] sk ffff881731801a00 sk_mem_charge negative -11648 by
>> 4352
>> Second is always linked with crashes, it is sk_mem_uncharge and
>> sk_forward_alloc goes negative. Patch to show message
>> for sk_mem_uncharge in sock.h is very simple:
>>
>> static inline void sk_mem_uncharge(struct sock *sk, int size)
>> @@ -1480,6 +1485,8 @@
>> if (!sk_has_account(sk))
>> return;
>> sk->sk_forward_alloc += size;
>> + if (sk->sk_forward_alloc < -8192)
>> + printk_ratelimited(KERN_WARNING"sk %p sk_mem_uncharge
>> negative %d by %d\n", sk, sk->sk_forward_alloc, size);
>> }
>>
>
>
> Could you describe your hardware setup and networking setup ?
This problem is happening on multiple different units that I am using
as HTTPS balancers, and all of them are very different (except that
they all have Intel CPUs, and even those are of different generations
and models). The problem seems to happen on all of them and does not
seem to depend on hardware (networking: igb, e1000e, Broadcom; all
affected). But in case it is important:
S2600GZ motherboard, one E5-2620 Xeon
networking - onboard igb, 2 ports used
100GB RAM
This particular one has bonding (but it seems to crash with or without
it).
The system is custom, running from USB flash, a busybox+glibc based
setup; a similar OS works for other purposes (NAT, PPPoE termination)
without any issues.
What is common between the failing units:
I am using a haproxy-based HTTPS balancer (also, as I remember, haproxy
does a lot of setsockopt calls), which is handling right now:
454444 connections established
Bandwidth passing through is around 1Gbps.
I'm disabling tso/gso/gro on all interfaces.
The way I am forwarding transparent traffic to haproxy:
iptables -t mangle -A PREROUTING -p tcp --sport 443 -j MARK --set-mark 0x1
iptables -t mangle -A PREROUTING -p tcp --dport 443 -j MARK --set-mark 0x1
ip rule add fwmark 0x1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100
"Typical" setup is
backend ssl_passthru
    mode tcp
    option transparent
    source 0.0.0.0 usesrc clientip
frontend ssl-in
    mode tcp
    bind :443 transparent
    default_backend ssl_passthru
    option tcp-smart-accept
I hope I did not miss anything important. I can provide remote ssh
access to it.
I will keep sending info, in the hope that some of it gives an idea of
what I should patch or test.
P.S. I just got an idea: values near -2147483648 hint that an integer
overflow is happening somewhere, from a very large positive value to a
negative one. I will try to set triggers for that now as well.
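The wraparound can be checked against the logged numbers. This is a
sketch only, assuming sk_forward_alloc behaves as a 32-bit
two's-complement counter; wrapping_add_s32 is a hypothetical helper that
does the addition in unsigned arithmetic so the wrap is well defined in
portable C.

```c
#include <assert.h>
#include <stdint.h>

/* Add two 32-bit signed values with two's-complement wraparound.
 * The sum is computed as unsigned, where overflow is well defined,
 * then mapped back to the signed range by hand. */
int32_t wrapping_add_s32(int32_t a, int32_t b)
{
	uint32_t sum = (uint32_t)a + (uint32_t)b;

	if (sum <= (uint32_t)INT32_MAX)
		return (int32_t)sum;
	return -(int32_t)(UINT32_MAX - sum) - 1;
}
```

The first logged line reports -2147482496 after an uncharge of 2304.
Under the assumption above, that is exactly what
wrapping_add_s32(2147482496, 2304) produces, so the counter would have
been at +2147482496 just before that message. A further
wrapping_add_s32(-2147482496, 2304) gives -2147480192, the value in the
second logged line.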
If required I can provide an image of such a system. I am not sure
whether you are interested in this problem or whether it can be
reproduced on a synthetic setup, but as I remember this memory leak
also happened to me once on a normal server with torrents (I left an
image unattended for 2 weeks with a lot of requests, and it crashed at
the end), so it might affect other use cases as well.
I am now trying to limit socket buffers, to see if that decreases the
frequency of crashes.
I also tried to put "canary" values inside the structure, near
sk_forward_alloc, to see if any sort of memory corruption is occurring
on sk_forward_alloc, but there seems to be no corruption.
I will also try going back to stable kernel 3.2.64, to see if that
fixes the problem, but testing sometimes takes almost 1 day, depending
on luck.
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
2014-11-16 18:11 ` Eric Dumazet
2014-11-16 19:05 ` Denys Fedoryshchenko
@ 2014-11-16 19:10 ` Eric Dumazet
1 sibling, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2014-11-16 19:10 UTC (permalink / raw)
To: Denys Fedoryshchenko; +Cc: Neal Cardwell, Yuchung Cheng, netdev
On Sun, 2014-11-16 at 10:11 -0800, Eric Dumazet wrote:
> Could you describe your hardware setup and networking setup ?
Hmmm, this looks like a bug in TCP fast open.
I am cooking a patch.
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
2014-11-16 19:05 ` Denys Fedoryshchenko
@ 2014-11-16 20:16 ` Eric Dumazet
2014-11-17 5:01 ` Eric Dumazet
0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2014-11-16 20:16 UTC (permalink / raw)
To: Denys Fedoryshchenko; +Cc: Neal Cardwell, Yuchung Cheng, netdev
On Sun, 2014-11-16 at 21:05 +0200, Denys Fedoryshchenko wrote:
> On 2014-11-16 20:11, Eric Dumazet wrote:
> > On Sun, 2014-11-16 at 10:54 +0200, Denys Fedoryshchenko wrote:
> >> As latest findings, when servers are going crazy because of tcp memory
> >> invalid accounting.
> >> First of all i upgraded kernel to latest version 3.17.3 and added also
> >> patch from upcoming kernel,
> >> "12) Don't call sock_kfree_s() with NULL pointers, this function also
> >> has the side effect of adjusting
> >> the socket memory usage. From Cong Wang.", but it didnt helped.
> >>
> >> I added printk_ratelimited to places where suspicious values might
> >> appear, and got some more information.
> >> First, is not very suspicious, no idea if it is a problem:
> >> [ 1413.031622] sk ffff8817184d8680 sk_mem_charge negative -10752 by
> >> 4352
> >> [ 1413.032027] sk ffff8817184d8680 sk_mem_charge negative -15104 by
> >> 4352
> >> [ 1415.768465] sk ffff881666842d80 sk_mem_charge negative -9984 by
> >> 4352
> >> [ 1415.768868] sk ffff881666842d80 sk_mem_charge negative -14336 by
> >> 4352
> >> [ 1415.769268] sk ffff881666842d80 sk_mem_charge negative -18688 by
> >> 4352
> >> [ 1415.769681] sk ffff881666842d80 sk_mem_charge negative -9088 by
> >> 4352
> >> [ 1418.933799] sk ffff8816dd640000 sk_mem_charge negative -9984 by
> >> 4352
> >> [ 1418.934205] sk ffff8816dd640000 sk_mem_charge negative -14336 by
> >> 4352
> >> [ 1418.934604] sk ffff8816dd640000 sk_mem_charge negative -18688 by
> >> 4352
> >> [ 1427.131310] sk ffff881731801a00 sk_mem_charge negative -11776 by
> >> 4352
> >> [ 1428.564640] sk ffff881731801a00 sk_mem_charge negative -11008 by
> >> 4352
> >> [ 1429.134279] sk ffff881731801a00 sk_mem_charge negative -11776 by
> >> 4352
> >> [ 1429.134691] sk ffff881731801a00 sk_mem_charge negative -16128 by
> >> 4352
> >> [ 1430.666541] sk ffff881731801a00 sk_mem_charge negative -10496 by
> >> 4352
> >> [ 1431.395099] sk ffff881731801a00 sk_mem_charge negative -12032 by
> >> 4352
> >> [ 1431.395506] sk ffff881731801a00 sk_mem_charge negative -16384 by
> >> 4352
> >> [ 1431.877862] sk ffff881731801a00 sk_mem_charge negative -11648 by
> >> 4352
> >> Second is always linked with crashes, it is sk_mem_uncharge and
> >> sk_forward_alloc goes negative. Patch to show message
> >> for sk_mem_uncharge in sock.h is very simple:
> >>
> >> static inline void sk_mem_uncharge(struct sock *sk, int size)
> >> @@ -1480,6 +1485,8 @@
> >> if (!sk_has_account(sk))
> >> return;
> >> sk->sk_forward_alloc += size;
> >> + if (sk->sk_forward_alloc < -8192)
> >> + printk_ratelimited(KERN_WARNING"sk %p sk_mem_uncharge
> >> negative %d by %d\n", sk, sk->sk_forward_alloc, size);
> >> }
> >>
> >
> >
> > Could you describe your hardware setup and networking setup ?
> This problem are happening on multiple different units that i am using
> as https balancers, and all of them very different (except it is all
> Intel CPU's, but even in that - different generations and models). Such
> problem seems happens on all of them, and seems doesn't depend on
> hardware (networking - igb, e1000e, broadcom stuff - all affected). But
> if it is important:
> S2600GZ motherboard, one E5-2620 Xeon
> networking - onboard igb, 2 ports used
> 100GB RAM
> This particular one has bonding (but it seems crashes with or without
> it).
>
> System are custom, running on USB flash, busybox+glibc based setup,
> similar OS working for other purposes for NAT, PPPoE termination without
> any issues.
>
> What is common between failing units:
>
> I am using haproxy-based HTTPS balancer(Also as i remember haproxy doing
> a lot of setsockopt stuff), that is handling right now:
> 454444 connections established
> Bandwidth passing thru is around 1Gbps.
>
> I'm disabling tso/gso/gro on all interfaces.
>
> The way i am forwarding transparent traffic to haproxy:
> iptables -t mangle -A PREROUTING -p tcp --sport 443 -j MARK --set-mark
> 0x1
> iptables -t mangle -A PREROUTING -p tcp --dport 443 -j MARK --set-mark
> 0x1
> ip rule add fwmark 0x1 lookup 100
> ip route add local 0.0.0.0/0 dev lo table 100
>
> "Typical" setup is
>
> backend ssl_passthru
> mode tcp
> option transparent
> source 0.0.0.0 usesrc clientip
>
> frontend ssl-in
> mode tcp
> bind :443 transparent
> default_backend ssl_passthru
> option tcp-smart-accept
>
> I hope i didnt missed something important. I can provide remote ssh
> access to it.
> I will keep sending info, just with hope that some of info maybe will
> give idea, what i should patch or test.
>
> P.S. Just got an idea now, that -2147483648 hinting that somewhere is
> happening integer overflow from very large positive value, to negative.
> I will try to set triggers also to that now.
>
> If required i can provide image with such system. I am not sure you are
> interested in this problem and if it can be reproduced on synthetic
> setup, but as i remember this memory leak happened with me once also on
> normal server with torrents (i left some image unattended for 2 weeks,
> with a lot of requests, and it crashed at the end), so it might affect
> also other use cases.
> I am trying to limit now socket buffers, to see if it will decrease
> frequency of crashes.
> Also i tried to put "canary" values inside structure, near
> sk_forward_alloc , to see if there is any sort of memory corruption
> occuring on sk_forward_alloc, but seems there is no corruption.
> I will try also going back to stable kernels 3.2.64, to see if it will
> fix this problem, but testing takes sometimes almost 1 day, depends on
> luck.
Thanks, Denys!
Could you try the following patch?
Thanks!
net/ipv4/tcp_output.c | 33 +++++++++++----------------------
1 file changed, 11 insertions(+), 22 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a3d453b94747..877eb4aa05a6 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2998,7 +2998,7 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
{
struct tcp_sock *tp = tcp_sk(sk);
struct tcp_fastopen_request *fo = tp->fastopen_req;
- int syn_loss = 0, space, i, err = 0, iovlen = fo->data->msg_iovlen;
+ int syn_loss = 0, space, err = 0;
struct sk_buff *syn_data = NULL, *data;
unsigned long last_syn_loss = 0;
@@ -3031,25 +3031,17 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
/* limit to order-0 allocations */
space = min_t(size_t, space, SKB_MAX_HEAD(MAX_TCP_HEADER));
- syn_data = skb_copy_expand(syn, MAX_TCP_HEADER, space,
- sk->sk_allocation);
- if (syn_data == NULL)
+ syn_data = sk_stream_alloc_skb(sk, space, sk->sk_allocation);
+ if (!syn_data)
goto fallback;
- for (i = 0; i < iovlen && syn_data->len < space; ++i) {
- struct iovec *iov = &fo->data->msg_iov[i];
- unsigned char __user *from = iov->iov_base;
- int len = iov->iov_len;
-
- if (syn_data->len + len > space)
- len = space - syn_data->len;
- else if (i + 1 == iovlen)
- /* No more data pending in inet_wait_for_connect() */
- fo->data = NULL;
+ memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
+ if (memcpy_fromiovec(skb_put(syn_data, space), fo->data->msg_iov, space))
+ goto fallback;
- if (skb_add_data(syn_data, from, len))
- goto fallback;
- }
+ /* No more data pending in inet_wait_for_connect() */
+ if (space == fo->size)
+ fo->data = NULL;
/* Queue a data-only packet after the regular SYN for retransmission */
data = pskb_copy(syn_data, sk->sk_allocation);
@@ -3101,13 +3093,10 @@ int tcp_connect(struct sock *sk)
return 0;
}
- buff = alloc_skb_fclone(MAX_TCP_HEADER + 15, sk->sk_allocation);
- if (unlikely(buff == NULL))
+ buff = sk_stream_alloc_skb(sk, 0, sk->sk_allocation);
+ if (unlikely(!buff))
return -ENOBUFS;
- /* Reserve space for headers. */
- skb_reserve(buff, MAX_TCP_HEADER);
-
tcp_init_nondata_skb(buff, tp->write_seq++, TCPHDR_SYN);
tp->retrans_stamp = tcp_time_stamp;
tcp_connect_queue_skb(sk, buff);
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
2014-11-16 20:16 ` Eric Dumazet
@ 2014-11-17 5:01 ` Eric Dumazet
2014-11-17 10:22 ` Denys Fedoryshchenko
0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2014-11-17 5:01 UTC (permalink / raw)
To: Denys Fedoryshchenko; +Cc: Neal Cardwell, Yuchung Cheng, netdev
On Sun, 2014-11-16 at 12:16 -0800, Eric Dumazet wrote:
> Thanks Denys !
>
> Could you try following patch ?
>
> Thanks !
Hmm... I have an updated patch, sorry.
(One memcpy_fromiovec() call has to be memcpy_fromiovecend().)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a3d453b94747..c2bbfcd9c0db 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2998,7 +2998,7 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
{
struct tcp_sock *tp = tcp_sk(sk);
struct tcp_fastopen_request *fo = tp->fastopen_req;
- int syn_loss = 0, space, i, err = 0, iovlen = fo->data->msg_iovlen;
+ int syn_loss = 0, space, err = 0;
struct sk_buff *syn_data = NULL, *data;
unsigned long last_syn_loss = 0;
@@ -3031,25 +3031,19 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
/* limit to order-0 allocations */
space = min_t(size_t, space, SKB_MAX_HEAD(MAX_TCP_HEADER));
- syn_data = skb_copy_expand(syn, MAX_TCP_HEADER, space,
- sk->sk_allocation);
- if (syn_data == NULL)
+ syn_data = sk_stream_alloc_skb(sk, space, sk->sk_allocation);
+ if (!syn_data)
goto fallback;
- for (i = 0; i < iovlen && syn_data->len < space; ++i) {
- struct iovec *iov = &fo->data->msg_iov[i];
- unsigned char __user *from = iov->iov_base;
- int len = iov->iov_len;
-
- if (syn_data->len + len > space)
- len = space - syn_data->len;
- else if (i + 1 == iovlen)
- /* No more data pending in inet_wait_for_connect() */
- fo->data = NULL;
+ syn_data->ip_summed = CHECKSUM_PARTIAL;
+ memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
+ if (memcpy_fromiovecend(skb_put(syn_data, space),
+ fo->data->msg_iov, 0, space))
+ goto fallback;
- if (skb_add_data(syn_data, from, len))
- goto fallback;
- }
+ /* No more data pending in inet_wait_for_connect() */
+ if (space == fo->size)
+ fo->data = NULL;
/* Queue a data-only packet after the regular SYN for retransmission */
data = pskb_copy(syn_data, sk->sk_allocation);
@@ -3101,13 +3095,10 @@ int tcp_connect(struct sock *sk)
return 0;
}
- buff = alloc_skb_fclone(MAX_TCP_HEADER + 15, sk->sk_allocation);
- if (unlikely(buff == NULL))
+ buff = sk_stream_alloc_skb(sk, 0, sk->sk_allocation);
+ if (unlikely(!buff))
return -ENOBUFS;
- /* Reserve space for headers. */
- skb_reserve(buff, MAX_TCP_HEADER);
-
tcp_init_nondata_skb(buff, tp->write_seq++, TCPHDR_SYN);
tp->retrans_stamp = tcp_time_stamp;
tcp_connect_queue_skb(sk, buff);
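The switch from memcpy_fromiovec() to memcpy_fromiovecend() in this
revision comes down to who owns the iovec afterwards. In kernels of this
era, memcpy_fromiovec() consumes the iovec (it advances iov_base and
shrinks iov_len as it copies), while memcpy_fromiovecend() copies from a
given byte offset and leaves the iovec untouched. The user-space sketch
below only models that difference. The names model_copy_consume() and
model_copy_at_offset() are hypothetical, they take an explicit iovcnt
that the kernel helpers do not have, and plain memcpy() stands in for
copy_from_user().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Model of the consuming helper: every copied byte is removed from the
 * iovec by advancing iov_base and shrinking iov_len.
 * Returns 0 on success, -1 if the iovec holds fewer than len bytes.
 */
static int model_copy_consume(unsigned char *dst, struct iovec *iov,
			      int iovcnt, size_t len)
{
	for (int i = 0; i < iovcnt && len > 0; i++) {
		size_t copy = len < iov[i].iov_len ? len : iov[i].iov_len;

		memcpy(dst, iov[i].iov_base, copy);
		dst += copy;
		len -= copy;
		iov[i].iov_base = (unsigned char *)iov[i].iov_base + copy;
		iov[i].iov_len -= copy;
	}
	return len ? -1 : 0;
}

/* Model of the offset-based helper: skips 'offset' bytes, copies 'len'
 * bytes, and never modifies the iovec (note the const).
 * Returns 0 on success, -1 if the iovec is too short.
 */
static int model_copy_at_offset(unsigned char *dst, const struct iovec *iov,
				int iovcnt, size_t offset, size_t len)
{
	for (int i = 0; i < iovcnt && len > 0; i++) {
		size_t avail, copy;

		if (offset >= iov[i].iov_len) {
			/* This segment lies entirely before the offset. */
			offset -= iov[i].iov_len;
			continue;
		}
		avail = iov[i].iov_len - offset;
		copy = len < avail ? len : avail;
		memcpy(dst, (const unsigned char *)iov[i].iov_base + offset, copy);
		dst += copy;
		len -= copy;
		offset = 0;
	}
	return len ? -1 : 0;
}
```

With the offset-based variant, a later pass over the same msg_iov still
sees all of the user data and has to skip the bytes already sent itself.
With the consuming variant, the iovec already points past them.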
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
2014-11-17 5:01 ` Eric Dumazet
@ 2014-11-17 10:22 ` Denys Fedoryshchenko
2014-11-17 15:33 ` Denys Fedoryshchenko
0 siblings, 1 reply; 25+ messages in thread
From: Denys Fedoryshchenko @ 2014-11-17 10:22 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Neal Cardwell, Yuchung Cheng, netdev
On 2014-11-17 07:01, Eric Dumazet wrote:
> On Sun, 2014-11-16 at 12:16 -0800, Eric Dumazet wrote:
>
>> Thanks Denys !
>>
>> Could you try following patch ?
>>
>> Thanks !
>
> Hmm.... I have an updated patch, sorry.
>
> (A memcpy_fromiovec() has to be memcpy_fromiovecend() )
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index a3d453b94747..c2bbfcd9c0db 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2998,7 +2998,7 @@ static int tcp_send_syn_data(struct sock *sk,
> struct sk_buff *syn)
> {
> struct tcp_sock *tp = tcp_sk(sk);
> struct tcp_fastopen_request *fo = tp->fastopen_req;
> - int syn_loss = 0, space, i, err = 0, iovlen = fo->data->msg_iovlen;
> + int syn_loss = 0, space, err = 0;
> struct sk_buff *syn_data = NULL, *data;
> unsigned long last_syn_loss = 0;
>
> @@ -3031,25 +3031,19 @@ static int tcp_send_syn_data(struct sock *sk,
> struct sk_buff *syn)
> /* limit to order-0 allocations */
> space = min_t(size_t, space, SKB_MAX_HEAD(MAX_TCP_HEADER));
>
> - syn_data = skb_copy_expand(syn, MAX_TCP_HEADER, space,
> - sk->sk_allocation);
> - if (syn_data == NULL)
> + syn_data = sk_stream_alloc_skb(sk, space, sk->sk_allocation);
> + if (!syn_data)
> goto fallback;
>
> - for (i = 0; i < iovlen && syn_data->len < space; ++i) {
> - struct iovec *iov = &fo->data->msg_iov[i];
> - unsigned char __user *from = iov->iov_base;
> - int len = iov->iov_len;
> -
> - if (syn_data->len + len > space)
> - len = space - syn_data->len;
> - else if (i + 1 == iovlen)
> - /* No more data pending in inet_wait_for_connect() */
> - fo->data = NULL;
> + syn_data->ip_summed = CHECKSUM_PARTIAL;
> + memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
> + if (memcpy_fromiovecend(skb_put(syn_data, space),
> + fo->data->msg_iov, 0, space))
> + goto fallback;
>
> - if (skb_add_data(syn_data, from, len))
> - goto fallback;
> - }
> + /* No more data pending in inet_wait_for_connect() */
> + if (space == fo->size)
> + fo->data = NULL;
>
> /* Queue a data-only packet after the regular SYN for retransmission
> */
> data = pskb_copy(syn_data, sk->sk_allocation);
> @@ -3101,13 +3095,10 @@ int tcp_connect(struct sock *sk)
> return 0;
> }
>
> - buff = alloc_skb_fclone(MAX_TCP_HEADER + 15, sk->sk_allocation);
> - if (unlikely(buff == NULL))
> + buff = sk_stream_alloc_skb(sk, 0, sk->sk_allocation);
> + if (unlikely(!buff))
> return -ENOBUFS;
>
> - /* Reserve space for headers. */
> - skb_reserve(buff, MAX_TCP_HEADER);
> -
> tcp_init_nondata_skb(buff, tp->write_seq++, TCPHDR_SYN);
> tp->retrans_stamp = tcp_time_stamp;
> tcp_connect_queue_skb(sk, buff);
I installed the patch, but I will have to wait a while (usually at least
24 hours) to be sure whether it is stable.
Thanks a lot!
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
2014-11-17 10:22 ` Denys Fedoryshchenko
@ 2014-11-17 15:33 ` Denys Fedoryshchenko
[not found] ` <1416240651.5102.10.camel@edumazet-glaptop2.roam.corp.google.com>
0 siblings, 1 reply; 25+ messages in thread
From: Denys Fedoryshchenko @ 2014-11-17 15:33 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Neal Cardwell, Yuchung Cheng, netdev
On 2014-11-17 12:22, Denys Fedoryshchenko wrote:
>> Hmm.... I have an updated patch, sorry.
>>
...
>
> Installed patch, but will have to wait a while (usually at least
> 24hours), to be sure if it is stable.
>
> Thanks a lot!
I tried the updated patch. It still crashed the same way after a while,
and on the second test I noticed the same value overflow.
While debugging, after I added an alert for sk_forward_alloc > 1147483648,
I noticed that on some sockets this value keeps increasing until it
overflows.
I can provide logs if they are of interest.
I will try setting the fastopen sysctl to zero, to see whether it changes
anything.
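To make the overflow concrete: sk_forward_alloc is a signed 32-bit int,
so a value that only ever grows ends up just past 2147483647 and shows
up as a large negative number, down to -2147483648. The sketch below is
a user-space model under that assumption, not kernel code. wrap_add32()
and steps_until_negative() are hypothetical helpers, and the 4352-byte
step is simply borrowed from the sizes seen in the debug output earlier
in the thread. Signed overflow is undefined in ISO C, so the addition is
done on unsigned values to get two's-complement wraparound portably.

```c
#include <assert.h>
#include <stdint.h>

/* Add 'size' to a 32-bit counter with two's-complement wraparound.
 * The sum is computed as unsigned and mapped back by hand, which avoids
 * both undefined signed overflow and implementation-defined casts.
 */
static int32_t wrap_add32(int32_t value, int32_t size)
{
	uint32_t sum = (uint32_t)value + (uint32_t)size;

	if (sum <= (uint32_t)INT32_MAX)
		return (int32_t)sum;
	/* Upper half of the unsigned range maps onto the negative values. */
	return (int32_t)(sum - (uint32_t)INT32_MAX - 1u) + INT32_MIN;
}

/* Count how many additions of 'step' it takes for a non-negative
 * counter to wrap negative. Returns -1 for a non-positive step, which
 * would never wrap upwards.
 */
static long steps_until_negative(int32_t start, int32_t step)
{
	long n = 0;
	int32_t v = start;

	if (step <= 0)
		return -1;
	while (v >= 0) {
		v = wrap_add32(v, step);
		n++;
	}
	return n;
}
```

Starting from the 1147483648 alert threshold, a socket that gains 4352
bytes per operation without ever giving them back needs a few hundred
thousand such operations to wrap, which fits a failure that only shows
up after many hours under load.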
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH net-next] tcp: make connect() mem charging friendly
[not found] ` <1416266755.18588.12.camel@edumazet-glaptop2.roam.corp.google.com>
@ 2014-11-17 23:49 ` Eric Dumazet
2014-11-18 7:06 ` [PATCH v2 " Eric Dumazet
0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2014-11-17 23:49 UTC (permalink / raw)
To: Denys Fedoryshchenko, David Miller; +Cc: netdev, Neal Cardwell, Yuchung Cheng
From: Eric Dumazet <edumazet@google.com>
While working on the sk_forward_alloc problems reported by Denys
Fedoryshchenko, we found that tcp connect() (and fastopen) does not call
sk_wmem_schedule() for the SYN packet (and/or the SYN/DATA packet), so
sk_forward_alloc is negative while the connect is in progress.
We can fix this by calling the regular sk_stream_alloc_skb() both for the
SYN packet (in tcp_connect()) and for the syn_data packet in
tcp_send_syn_data().
Then, tcp_send_syn_data() can avoid copying syn_data, as we can simply
manipulate syn_data->cb[] to remove the SYN flag (and increment seq).
Instead of open coding memcpy_fromiovecend(), simply use this helper.
This leaves clean fast clone skbs in the socket write queue.
This was tested against our fastopen packetdrill tests.
Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv4/tcp_output.c | 66 +++++++++++++++-------------------------
1 file changed, 26 insertions(+), 40 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index eb73a1dccf56..377e15d0ba96 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3011,9 +3011,9 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
{
struct tcp_sock *tp = tcp_sk(sk);
struct tcp_fastopen_request *fo = tp->fastopen_req;
- int syn_loss = 0, space, i, err = 0, iovlen = fo->data->msg_iovlen;
- struct sk_buff *syn_data = NULL, *data;
+ int syn_loss = 0, space, err = 0;
unsigned long last_syn_loss = 0;
+ struct sk_buff *syn_data;
tp->rx_opt.mss_clamp = tp->advmss; /* If MSS is not cached */
tcp_fastopen_cache_get(sk, &tp->rx_opt.mss_clamp, &fo->cookie,
@@ -3044,48 +3044,38 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
/* limit to order-0 allocations */
space = min_t(size_t, space, SKB_MAX_HEAD(MAX_TCP_HEADER));
- syn_data = skb_copy_expand(syn, MAX_TCP_HEADER, space,
- sk->sk_allocation);
- if (syn_data == NULL)
+ syn_data = sk_stream_alloc_skb(sk, space, sk->sk_allocation);
+ if (!syn_data)
+ goto fallback;
+ syn_data->ip_summed = CHECKSUM_PARTIAL;
+ memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
+ if (memcpy_fromiovecend(skb_put(syn_data, space),
+ fo->data->msg_iov, 0, space))
goto fallback;
- for (i = 0; i < iovlen && syn_data->len < space; ++i) {
- struct iovec *iov = &fo->data->msg_iov[i];
- unsigned char __user *from = iov->iov_base;
- int len = iov->iov_len;
+ /* No more data pending in inet_wait_for_connect() */
+ if (space == fo->size)
+ fo->data = NULL;
+ fo->copied = space;
- if (syn_data->len + len > space)
- len = space - syn_data->len;
- else if (i + 1 == iovlen)
- /* No more data pending in inet_wait_for_connect() */
- fo->data = NULL;
+ tcp_connect_queue_skb(sk, syn_data);
- if (skb_add_data(syn_data, from, len))
- goto fallback;
- }
+ err = tcp_transmit_skb(sk, syn_data, 1, sk->sk_allocation);
- /* Queue a data-only packet after the regular SYN for retransmission */
- data = pskb_copy(syn_data, sk->sk_allocation);
- if (data == NULL)
- goto fallback;
- TCP_SKB_CB(data)->seq++;
- TCP_SKB_CB(data)->tcp_flags &= ~TCPHDR_SYN;
- TCP_SKB_CB(data)->tcp_flags = (TCPHDR_ACK|TCPHDR_PSH);
- tcp_connect_queue_skb(sk, data);
- fo->copied = data->len;
-
- /* syn_data is about to be sent, we need to take current time stamps
- * for the packets that are in write queue : SYN packet and DATA
- */
- skb_mstamp_get(&syn->skb_mstamp);
- data->skb_mstamp = syn->skb_mstamp;
+ syn->skb_mstamp = syn_data->skb_mstamp;
- if (tcp_transmit_skb(sk, syn_data, 0, sk->sk_allocation) == 0) {
+ /* Now full SYN+DATA was cloned and sent (or not),
+ * remove the SYN from the original skb (syn_data)
+ * we keep in write queue in case of a retransmit, as we
+ * also have the SYN packet (with no data) in the same queue.
+ */
+ TCP_SKB_CB(syn_data)->seq++;
+ TCP_SKB_CB(syn_data)->tcp_flags = TCPHDR_ACK | TCPHDR_PSH;
+ if (!err) {
tp->syn_data = (fo->copied > 0);
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPORIGDATASENT);
goto done;
}
- syn_data = NULL;
fallback:
/* Send a regular SYN with Fast Open cookie request option */
@@ -3094,7 +3084,6 @@ fallback:
err = tcp_transmit_skb(sk, syn, 1, sk->sk_allocation);
if (err)
tp->syn_fastopen = 0;
- kfree_skb(syn_data);
done:
fo->cookie.len = -1; /* Exclude Fast Open option for SYN retries */
return err;
@@ -3114,13 +3103,10 @@ int tcp_connect(struct sock *sk)
return 0;
}
- buff = alloc_skb_fclone(MAX_TCP_HEADER + 15, sk->sk_allocation);
- if (unlikely(buff == NULL))
+ buff = sk_stream_alloc_skb(sk, 0, sk->sk_allocation);
+ if (unlikely(!buff))
return -ENOBUFS;
- /* Reserve space for headers. */
- skb_reserve(buff, MAX_TCP_HEADER);
-
tcp_init_nondata_skb(buff, tp->write_seq++, TCPHDR_SYN);
tp->retrans_stamp = tcp_time_stamp;
tcp_connect_queue_skb(sk, buff);
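The changelog's point that a missing sk_wmem_schedule() leaves
sk_forward_alloc negative can be seen in a toy model of the accounting.
This is a simplified user-space sketch, not the kernel implementation:
struct model_sock and the model_*() helpers are hypothetical, the
4096-byte quantum assumes the page-sized SK_MEM_QUANTUM of kernels of
this era, and memory-pressure limits are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_QUANTUM 4096	/* assumed accounting unit: one page */

struct model_sock {
	int forward_alloc;	/* bytes reserved but not yet charged */
	long pages_accounted;	/* pages taken from the protocol-wide pool */
};

/* Round a byte count up to whole accounting quanta. */
static int model_pages(int bytes)
{
	return (bytes + MODEL_QUANTUM - 1) / MODEL_QUANTUM;
}

/* Reserve room for 'size' bytes before they are charged. If the socket
 * already has enough forward allocation nothing happens, otherwise whole
 * quanta are pulled from the global pool. The model never refuses.
 */
static bool model_wmem_schedule(struct model_sock *sk, int size)
{
	int pages;

	if (size <= sk->forward_alloc)
		return true;
	pages = model_pages(size);
	sk->forward_alloc += pages * MODEL_QUANTUM;
	sk->pages_accounted += pages;
	return true;
}

/* Charging a queued skb consumes forward allocation. */
static void model_charge(struct model_sock *sk, int size)
{
	sk->forward_alloc -= size;
}

/* Freeing the skb hands the bytes back to the forward allocation. */
static void model_uncharge(struct model_sock *sk, int size)
{
	sk->forward_alloc += size;
}
```

Charging 4352 bytes on a fresh socket without scheduling first drives
forward_alloc to -4352 until the skb is freed, which is the transient
negative value described above. Scheduling first reserves two quanta
(8192 bytes), so the same charge leaves the counter at 3840 and it
never dips below zero.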
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v2 net-next] tcp: make connect() mem charging friendly
2014-11-17 23:49 ` [PATCH net-next] tcp: make connect() mem charging friendly Eric Dumazet
@ 2014-11-18 7:06 ` Eric Dumazet
2014-11-19 6:10 ` Yuchung Cheng
0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2014-11-18 7:06 UTC (permalink / raw)
To: Denys Fedoryshchenko, David Miller; +Cc: netdev, Neal Cardwell, Yuchung Cheng
From: Eric Dumazet <edumazet@google.com>
While working on the sk_forward_alloc problems reported by Denys
Fedoryshchenko, we found that tcp connect() (and fastopen) does not call
sk_wmem_schedule() for the SYN packet (and/or the SYN/DATA packet), so
sk_forward_alloc is negative while the connect is in progress.
We can fix this by calling the regular sk_stream_alloc_skb() both for the
SYN packet (in tcp_connect()) and for the syn_data packet in
tcp_send_syn_data().
Then, tcp_send_syn_data() can avoid copying syn_data, as we can simply
manipulate syn_data->cb[] to remove the SYN flag (and increment seq).
Instead of open coding memcpy_fromiovecend(), simply use this helper.
This leaves clean fast clone skbs in the socket write queue.
This was tested against our fastopen packetdrill tests.
Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
v2: added a kfree_skb(syn_data) if memcpy_fromiovecend() fails,
as spotted by Yuchung.
net/ipv4/tcp_output.c | 68 ++++++++++++++++------------------------
1 file changed, 28 insertions(+), 40 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index eb73a1dccf56b823a45c0ca034e40dc50fc48068..f5bd4bd3f7e669b3fd48a843d55e7313a30a3409 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3011,9 +3011,9 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
{
struct tcp_sock *tp = tcp_sk(sk);
struct tcp_fastopen_request *fo = tp->fastopen_req;
- int syn_loss = 0, space, i, err = 0, iovlen = fo->data->msg_iovlen;
- struct sk_buff *syn_data = NULL, *data;
+ int syn_loss = 0, space, err = 0;
unsigned long last_syn_loss = 0;
+ struct sk_buff *syn_data;
tp->rx_opt.mss_clamp = tp->advmss; /* If MSS is not cached */
tcp_fastopen_cache_get(sk, &tp->rx_opt.mss_clamp, &fo->cookie,
@@ -3044,48 +3044,40 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
/* limit to order-0 allocations */
space = min_t(size_t, space, SKB_MAX_HEAD(MAX_TCP_HEADER));
- syn_data = skb_copy_expand(syn, MAX_TCP_HEADER, space,
- sk->sk_allocation);
- if (syn_data == NULL)
+ syn_data = sk_stream_alloc_skb(sk, space, sk->sk_allocation);
+ if (!syn_data)
goto fallback;
+ syn_data->ip_summed = CHECKSUM_PARTIAL;
+ memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
+ if (unlikely(memcpy_fromiovecend(skb_put(syn_data, space),
+ fo->data->msg_iov, 0, space))) {
+ kfree_skb(syn_data);
+ goto fallback;
+ }
- for (i = 0; i < iovlen && syn_data->len < space; ++i) {
- struct iovec *iov = &fo->data->msg_iov[i];
- unsigned char __user *from = iov->iov_base;
- int len = iov->iov_len;
+ /* No more data pending in inet_wait_for_connect() */
+ if (space == fo->size)
+ fo->data = NULL;
+ fo->copied = space;
- if (syn_data->len + len > space)
- len = space - syn_data->len;
- else if (i + 1 == iovlen)
- /* No more data pending in inet_wait_for_connect() */
- fo->data = NULL;
+ tcp_connect_queue_skb(sk, syn_data);
- if (skb_add_data(syn_data, from, len))
- goto fallback;
- }
+ err = tcp_transmit_skb(sk, syn_data, 1, sk->sk_allocation);
- /* Queue a data-only packet after the regular SYN for retransmission */
- data = pskb_copy(syn_data, sk->sk_allocation);
- if (data == NULL)
- goto fallback;
- TCP_SKB_CB(data)->seq++;
- TCP_SKB_CB(data)->tcp_flags &= ~TCPHDR_SYN;
- TCP_SKB_CB(data)->tcp_flags = (TCPHDR_ACK|TCPHDR_PSH);
- tcp_connect_queue_skb(sk, data);
- fo->copied = data->len;
-
- /* syn_data is about to be sent, we need to take current time stamps
- * for the packets that are in write queue : SYN packet and DATA
- */
- skb_mstamp_get(&syn->skb_mstamp);
- data->skb_mstamp = syn->skb_mstamp;
+ syn->skb_mstamp = syn_data->skb_mstamp;
- if (tcp_transmit_skb(sk, syn_data, 0, sk->sk_allocation) == 0) {
+ /* Now full SYN+DATA was cloned and sent (or not),
+ * remove the SYN from the original skb (syn_data)
+ * we keep in write queue in case of a retransmit, as we
+ * also have the SYN packet (with no data) in the same queue.
+ */
+ TCP_SKB_CB(syn_data)->seq++;
+ TCP_SKB_CB(syn_data)->tcp_flags = TCPHDR_ACK | TCPHDR_PSH;
+ if (!err) {
tp->syn_data = (fo->copied > 0);
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPORIGDATASENT);
goto done;
}
- syn_data = NULL;
fallback:
/* Send a regular SYN with Fast Open cookie request option */
@@ -3094,7 +3086,6 @@ fallback:
err = tcp_transmit_skb(sk, syn, 1, sk->sk_allocation);
if (err)
tp->syn_fastopen = 0;
- kfree_skb(syn_data);
done:
fo->cookie.len = -1; /* Exclude Fast Open option for SYN retries */
return err;
@@ -3114,13 +3105,10 @@ int tcp_connect(struct sock *sk)
return 0;
}
- buff = alloc_skb_fclone(MAX_TCP_HEADER + 15, sk->sk_allocation);
- if (unlikely(buff == NULL))
+ buff = sk_stream_alloc_skb(sk, 0, sk->sk_allocation);
+ if (unlikely(!buff))
return -ENOBUFS;
- /* Reserve space for headers. */
- skb_reserve(buff, MAX_TCP_HEADER);
-
tcp_init_nondata_skb(buff, tp->write_seq++, TCPHDR_SYN);
tp->retrans_stamp = tcp_time_stamp;
tcp_connect_queue_skb(sk, buff);
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v2 net-next] tcp: make connect() mem charging friendly
2014-11-18 7:06 ` [PATCH v2 " Eric Dumazet
@ 2014-11-19 6:10 ` Yuchung Cheng
2014-11-19 19:57 ` David Miller
0 siblings, 1 reply; 25+ messages in thread
From: Yuchung Cheng @ 2014-11-19 6:10 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Denys Fedoryshchenko, David Miller, netdev, Neal Cardwell
On Tue, Nov 18, 2014 at 3:06 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> While working on sk_forward_alloc problems reported by Denys
> Fedoryshchenko, we found that tcp connect() (and fastopen) do not call
> sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so
> sk_forward_alloc is negative while connect is in progress.
>
> We can fix this by calling regular sk_stream_alloc_skb() both for the
> SYN packet (in tcp_connect()) and the syn_data packet in
> tcp_send_syn_data()
>
> Then, tcp_send_syn_data() can avoid copying syn_data as we simply
> can manipulate syn_data->cb[] to remove SYN flag (and increment seq)
>
> Instead of open coding memcpy_fromiovecend(), simply use this helper.
>
> This leaves in socket write queue clean fast clone skbs.
>
> This was tested against our fastopen packetdrill tests.
>
> Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Thanks! This simplifies the code a lot.
> ---
> v2: added a kfree_skb(syn_data) if memcpy_fromiovecend() fails,
> as spotted by Yuchung.
>
> net/ipv4/tcp_output.c | 68 ++++++++++++++++------------------------
> 1 file changed, 28 insertions(+), 40 deletions(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index eb73a1dccf56b823a45c0ca034e40dc50fc48068..f5bd4bd3f7e669b3fd48a843d55e7313a30a3409 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -3011,9 +3011,9 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
> {
> struct tcp_sock *tp = tcp_sk(sk);
> struct tcp_fastopen_request *fo = tp->fastopen_req;
> - int syn_loss = 0, space, i, err = 0, iovlen = fo->data->msg_iovlen;
> - struct sk_buff *syn_data = NULL, *data;
> + int syn_loss = 0, space, err = 0;
> unsigned long last_syn_loss = 0;
> + struct sk_buff *syn_data;
>
> tp->rx_opt.mss_clamp = tp->advmss; /* If MSS is not cached */
> tcp_fastopen_cache_get(sk, &tp->rx_opt.mss_clamp, &fo->cookie,
> @@ -3044,48 +3044,40 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
> /* limit to order-0 allocations */
> space = min_t(size_t, space, SKB_MAX_HEAD(MAX_TCP_HEADER));
>
> - syn_data = skb_copy_expand(syn, MAX_TCP_HEADER, space,
> - sk->sk_allocation);
> - if (syn_data == NULL)
> + syn_data = sk_stream_alloc_skb(sk, space, sk->sk_allocation);
> + if (!syn_data)
> goto fallback;
> + syn_data->ip_summed = CHECKSUM_PARTIAL;
> + memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
> + if (unlikely(memcpy_fromiovecend(skb_put(syn_data, space),
> + fo->data->msg_iov, 0, space))) {
> + kfree_skb(syn_data);
> + goto fallback;
> + }
>
> - for (i = 0; i < iovlen && syn_data->len < space; ++i) {
> - struct iovec *iov = &fo->data->msg_iov[i];
> - unsigned char __user *from = iov->iov_base;
> - int len = iov->iov_len;
> + /* No more data pending in inet_wait_for_connect() */
> + if (space == fo->size)
> + fo->data = NULL;
> + fo->copied = space;
>
> - if (syn_data->len + len > space)
> - len = space - syn_data->len;
> - else if (i + 1 == iovlen)
> - /* No more data pending in inet_wait_for_connect() */
> - fo->data = NULL;
> + tcp_connect_queue_skb(sk, syn_data);
>
> - if (skb_add_data(syn_data, from, len))
> - goto fallback;
> - }
> + err = tcp_transmit_skb(sk, syn_data, 1, sk->sk_allocation);
>
> - /* Queue a data-only packet after the regular SYN for retransmission */
> - data = pskb_copy(syn_data, sk->sk_allocation);
> - if (data == NULL)
> - goto fallback;
> - TCP_SKB_CB(data)->seq++;
> - TCP_SKB_CB(data)->tcp_flags &= ~TCPHDR_SYN;
> - TCP_SKB_CB(data)->tcp_flags = (TCPHDR_ACK|TCPHDR_PSH);
> - tcp_connect_queue_skb(sk, data);
> - fo->copied = data->len;
> -
> - /* syn_data is about to be sent, we need to take current time stamps
> - * for the packets that are in write queue : SYN packet and DATA
> - */
> - skb_mstamp_get(&syn->skb_mstamp);
> - data->skb_mstamp = syn->skb_mstamp;
> + syn->skb_mstamp = syn_data->skb_mstamp;
>
> - if (tcp_transmit_skb(sk, syn_data, 0, sk->sk_allocation) == 0) {
> + /* Now full SYN+DATA was cloned and sent (or not),
> + * remove the SYN from the original skb (syn_data)
> + * we keep in write queue in case of a retransmit, as we
> + * also have the SYN packet (with no data) in the same queue.
> + */
> + TCP_SKB_CB(syn_data)->seq++;
> + TCP_SKB_CB(syn_data)->tcp_flags = TCPHDR_ACK | TCPHDR_PSH;
> + if (!err) {
> tp->syn_data = (fo->copied > 0);
> NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPORIGDATASENT);
> goto done;
> }
> - syn_data = NULL;
>
> fallback:
> /* Send a regular SYN with Fast Open cookie request option */
> @@ -3094,7 +3086,6 @@ fallback:
> err = tcp_transmit_skb(sk, syn, 1, sk->sk_allocation);
> if (err)
> tp->syn_fastopen = 0;
> - kfree_skb(syn_data);
> done:
> fo->cookie.len = -1; /* Exclude Fast Open option for SYN retries */
> return err;
> @@ -3114,13 +3105,10 @@ int tcp_connect(struct sock *sk)
> return 0;
> }
>
> - buff = alloc_skb_fclone(MAX_TCP_HEADER + 15, sk->sk_allocation);
> - if (unlikely(buff == NULL))
> + buff = sk_stream_alloc_skb(sk, 0, sk->sk_allocation);
> + if (unlikely(!buff))
> return -ENOBUFS;
>
> - /* Reserve space for headers. */
> - skb_reserve(buff, MAX_TCP_HEADER);
> -
> tcp_init_nondata_skb(buff, tp->write_seq++, TCPHDR_SYN);
> tp->retrans_stamp = tcp_time_stamp;
> tcp_connect_queue_skb(sk, buff);
>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 net-next] tcp: make connect() mem charging friendly
2014-11-19 6:10 ` Yuchung Cheng
@ 2014-11-19 19:57 ` David Miller
2014-11-20 15:33 ` Eric Dumazet
0 siblings, 1 reply; 25+ messages in thread
From: David Miller @ 2014-11-19 19:57 UTC (permalink / raw)
To: ycheng; +Cc: eric.dumazet, nuclearcat, netdev, ncardwell
From: Yuchung Cheng <ycheng@google.com>
Date: Wed, 19 Nov 2014 14:10:42 +0800
> On Tue, Nov 18, 2014 at 3:06 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> From: Eric Dumazet <edumazet@google.com>
>>
>> While working on sk_forward_alloc problems reported by Denys
>> Fedoryshchenko, we found that tcp connect() (and fastopen) do not call
>> sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so
>> sk_forward_alloc is negative while connect is in progress.
>>
>> We can fix this by calling regular sk_stream_alloc_skb() both for the
>> SYN packet (in tcp_connect()) and the syn_data packet in
>> tcp_send_syn_data()
>>
>> Then, tcp_send_syn_data() can avoid copying syn_data as we simply
>> can manipulate syn_data->cb[] to remove SYN flag (and increment seq)
>>
>> Instead of open coding memcpy_fromiovecend(), simply use this helper.
>>
>> This leaves in socket write queue clean fast clone skbs.
>>
>> This was tested against our fastopen packetdrill tests.
>>
>> Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Yuchung Cheng <ycheng@google.com>
>
> Thanks! this simplifies the code a lot.
Agreed, applied, thanks everyone!
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 net-next] tcp: make connect() mem charging friendly
2014-11-19 19:57 ` David Miller
@ 2014-11-20 15:33 ` Eric Dumazet
2014-11-20 16:23 ` [PATCH net-next] tcp: remove from tcp_sendmsg() some fastopen code Eric Dumazet
0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2014-11-20 15:33 UTC (permalink / raw)
To: David Miller; +Cc: ycheng, nuclearcat, netdev, ncardwell
On Wed, 2014-11-19 at 14:57 -0500, David Miller wrote:
> From: Yuchung Cheng <ycheng@google.com>
> >
> > Thanks! this simplifies the code a lot.
>
> Agreed, applied, thanks everyone!
BTW, unless I am mistaken, it seems we can probably replace
memcpy_fromiovecend() by memcpy_fromiovec() and delete from
tcp_sendmsg() the annoying code skipping over the already consumed
bytes.
Something like this untested patch:
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c239f4740d10b10b67ef4fa44c831851fb9e1dcf..227540eef9d0870721258f9ddbace27b417c619e 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1089,20 +1089,19 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
struct tcp_sock *tp = tcp_sk(sk);
struct sk_buff *skb;
int iovlen, flags, err, copied = 0;
- int mss_now = 0, size_goal, copied_syn = 0, offset = 0;
+ int mss_now = 0, size_goal, copied_syn = 0;
bool sg;
long timeo;
lock_sock(sk);
flags = msg->msg_flags;
- if (flags & MSG_FASTOPEN) {
+ if (unlikely(flags & MSG_FASTOPEN)) {
err = tcp_sendmsg_fastopen(sk, msg, &copied_syn, size);
if (err == -EINPROGRESS && copied_syn > 0)
goto out;
else if (err)
goto out_err;
- offset = copied_syn;
}
timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
@@ -1151,15 +1150,6 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
unsigned char __user *from = iov->iov_base;
iov++;
- if (unlikely(offset > 0)) { /* Skip bytes copied in SYN */
- if (offset >= seglen) {
- offset -= seglen;
- continue;
- }
- seglen -= offset;
- from += offset;
- offset = 0;
- }
while (seglen > 0) {
int copy = 0;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f5bd4bd3f7e669b3fd48a843d55e7313a30a3409..524e5b657e881a348f11def3f48f29a76f54fbab 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3049,8 +3049,8 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
goto fallback;
syn_data->ip_summed = CHECKSUM_PARTIAL;
memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
- if (unlikely(memcpy_fromiovecend(skb_put(syn_data, space),
- fo->data->msg_iov, 0, space))) {
+ if (unlikely(memcpy_fromiovec(skb_put(syn_data, space),
+ fo->data->msg_iov, space))) {
kfree_skb(syn_data);
goto fallback;
}
* [PATCH net-next] tcp: remove from tcp_sendmsg() some fastopen code
2014-11-20 15:33 ` Eric Dumazet
@ 2014-11-20 16:23 ` Eric Dumazet
2014-11-20 23:05 ` Eric Dumazet
0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2014-11-20 16:23 UTC (permalink / raw)
To: David Miller; +Cc: ycheng, nuclearcat, netdev, ncardwell
From: Eric Dumazet <edumazet@google.com>
If we consume iovec bytes in tcp_send_syn_data(), we can remove
annoying fastopen code in tcp_sendmsg() skipping over the already
consumed bytes.
Also add an unlikely(flags & MSG_FASTOPEN), as most TCP sendmsg() do not
ask for FASTOPEN.
Tested:
Ran our 125 packetdrill fastopen tests
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv4/tcp.c | 14 ++------------
net/ipv4/tcp_output.c | 4 ++--
2 files changed, 4 insertions(+), 14 deletions(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c239f4740d10b10b67ef4fa44c831851fb9e1dcf..227540eef9d0870721258f9ddbace27b417c619e 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1089,20 +1089,19 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
struct tcp_sock *tp = tcp_sk(sk);
struct sk_buff *skb;
int iovlen, flags, err, copied = 0;
- int mss_now = 0, size_goal, copied_syn = 0, offset = 0;
+ int mss_now = 0, size_goal, copied_syn = 0;
bool sg;
long timeo;
lock_sock(sk);
flags = msg->msg_flags;
- if (flags & MSG_FASTOPEN) {
+ if (unlikely(flags & MSG_FASTOPEN)) {
err = tcp_sendmsg_fastopen(sk, msg, &copied_syn, size);
if (err == -EINPROGRESS && copied_syn > 0)
goto out;
else if (err)
goto out_err;
- offset = copied_syn;
}
timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
@@ -1151,15 +1150,6 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
unsigned char __user *from = iov->iov_base;
iov++;
- if (unlikely(offset > 0)) { /* Skip bytes copied in SYN */
- if (offset >= seglen) {
- offset -= seglen;
- continue;
- }
- seglen -= offset;
- from += offset;
- offset = 0;
- }
while (seglen > 0) {
int copy = 0;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f5bd4bd3f7e669b3fd48a843d55e7313a30a3409..524e5b657e881a348f11def3f48f29a76f54fbab 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3049,8 +3049,8 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
goto fallback;
syn_data->ip_summed = CHECKSUM_PARTIAL;
memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
- if (unlikely(memcpy_fromiovecend(skb_put(syn_data, space),
- fo->data->msg_iov, 0, space))) {
+ if (unlikely(memcpy_fromiovec(skb_put(syn_data, space),
+ fo->data->msg_iov, space))) {
kfree_skb(syn_data);
goto fallback;
}
* Re: [PATCH net-next] tcp: remove from tcp_sendmsg() some fastopen code
2014-11-20 16:23 ` [PATCH net-next] tcp: remove from tcp_sendmsg() some fastopen code Eric Dumazet
@ 2014-11-20 23:05 ` Eric Dumazet
0 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2014-11-20 23:05 UTC (permalink / raw)
To: David Miller, Al Viro; +Cc: ycheng, nuclearcat, netdev, ncardwell
On Thu, 2014-11-20 at 08:23 -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> If we consume iovec bytes in tcp_send_syn_data(), we can remove
> annoying fastopen code in tcp_sendmsg() skipping over the already
> consumed bytes.
>
> Also add an unlikely(flags & MSG_FASTOPEN), as most TCP sendmsg() do not
> ask for FASTOPEN.
>
> Tested:
> Ran our 125 packetdrill fastopen tests
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
Please disregard this patch. Al Viro said that sendmsg() is apparently not
allowed to change the iovec.
Some callers depend on it staying constant.
It is not clear why memcpy_fromiovec() even exists.
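
To make the objection above concrete, here is a small userspace sketch of the two copy styles. It is not the kernel code: `copy_consume()` and `copy_at_offset()` are hypothetical stand-ins that only mimic the general behaviour of memcpy_fromiovec() (which advances the iovec it reads from) and memcpy_fromiovecend() (which takes an offset and leaves the iovec alone). Both assume the caller has checked that the iovec array holds enough bytes.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Consuming copy: advances iov_base and shrinks iov_len as it goes,
 * so the caller's iovec array is modified on return. */
static void copy_consume(char *dst, struct iovec *iov, size_t len)
{
    while (len > 0) {
        size_t n = iov->iov_len < len ? iov->iov_len : len;

        memcpy(dst, iov->iov_base, n);
        iov->iov_base = (char *)iov->iov_base + n;
        iov->iov_len -= n;
        dst += n;
        len -= n;
        if (iov->iov_len == 0)
            iov++; /* current segment is drained, move to the next */
    }
}

/* Offset-based copy: skips `offset` bytes, then copies `len` bytes,
 * without ever writing to the iovec array. */
static void copy_at_offset(char *dst, const struct iovec *iov,
                           size_t offset, size_t len)
{
    while (len > 0) {
        if (offset >= iov->iov_len) {
            offset -= iov->iov_len; /* whole segment is skipped */
            iov++;
            continue;
        }

        size_t n = iov->iov_len - offset;

        if (n > len)
            n = len;
        memcpy(dst, (const char *)iov->iov_base + offset, n);
        dst += n;
        len -= n;
        offset = 0;
        iov++;
    }
}
```

A caller that reuses its iovec after the first style sees shortened segments, which is the kind of dependency the patch would have broken; the second style keeps the array constant at the cost of carrying an offset around.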
end of thread, other threads:[~2014-11-20 23:05 UTC | newest]
Thread overview: 25+ messages
2014-07-17 10:52 /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? Denys Fedoryshchenko
2014-07-17 11:51 ` Eric Dumazet
2014-07-17 12:13 ` Denys Fedoryshchenko
2014-07-17 13:12 ` Denys Fedoryshchenko
2014-07-17 13:42 ` Eric Dumazet
2014-07-18 8:12 ` Denys Fedoryshchenko
2014-07-18 8:41 ` Eric Dumazet
2014-09-23 20:14 ` Semen Slepov
[not found] ` <1405676214.10255.96.camel@edumazet-glaptop2.roam.corp.google.com>
[not found] ` <0b0b74ab8d2222fa85b23e06c6e992ec@visp.net.lb>
[not found] ` <1405849675.10255.100.camel@edumazet-glaptop2.roam.corp.google.com>
[not found] ` <6fc086a122bc3ac728a7aec9e812310f@visp.net.lb>
[not found] ` <1406016045.10255.122.camel@edumazet-glaptop2.roam.corp.google.com>
[not found] ` <3f463c3912608f7959481edaa6d7668d@visp.net.lb>
[not found] ` <1406194817.3363.70.camel@edumazet-glaptop2.roam.corp.google.com>
[not found] ` <8f138d88e3b850130eb9bb15e3e27069@visp.net.lb>
2014-11-12 17:07 ` /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug) Denys Fedoryshchenko
2014-11-12 17:27 ` Eric Dumazet
2014-11-16 8:54 ` Denys Fedoryshchenko
2014-11-16 18:11 ` Eric Dumazet
2014-11-16 19:05 ` Denys Fedoryshchenko
2014-11-16 20:16 ` Eric Dumazet
2014-11-17 5:01 ` Eric Dumazet
2014-11-17 10:22 ` Denys Fedoryshchenko
2014-11-17 15:33 ` Denys Fedoryshchenko
[not found] ` <1416240651.5102.10.camel@edumazet-glaptop2.roam.corp.google.com>
[not found] ` <c9a009556b97fcf7399eb73035c7f8a0@visp.net.lb>
[not found] ` <1416243011.5102.19.camel@edumazet-glaptop2.roam.corp.google.com>
[not found] ` <34058cb0ee676b891dd9d54515512733@visp.net.lb>
[not found] ` <0487f30ffd58f2ee9b47135fa5404d25@visp.net.lb>
[not found] ` <1416245491.18588.1.camel@edumazet-glaptop2.roam.corp.google.com>
[not found] ` <ecd48e5dd70f9f83a4a4db3d758cc615@visp.net.lb>
[not found] ` <b2bf14dfc3180a316dd9c9bd89d02c40@visp.net.lb>
[not found] ` <1416252989.18588.2.camel@edumazet-glaptop2.roam.corp.google.com>
[not found] ` <2a4554e39649ac6dfbf2e0366794af08@visp.net.lb>
[not found] ` <1416254490.18588.6.camel@edumazet-glaptop2.roam.corp.google.com>
[not found] ` <ed049b2f9ecc47676e25ec8aeafb25c0@visp.net.lb>
[not found] ` <1f001a887ec3fb5b962e4374391863b9@visp.net.lb>
[not found] ` <e33542339748d2f51a149a4020ee5f75@visp.net.lb>
[not found] ` <57f78d76af5f78317ba69b613123ff9c@visp.net.lb>
[not found] ` <d05242d8a607d4e15f3dd9200223699f@visp.net.lb>
[not found] ` <1416266755.18588.12.camel@edumazet-glaptop2.roam.corp.google.com>
2014-11-17 23:49 ` [PATCH net-next] tcp: make connect() mem charging friendly Eric Dumazet
2014-11-18 7:06 ` [PATCH v2 " Eric Dumazet
2014-11-19 6:10 ` Yuchung Cheng
2014-11-19 19:57 ` David Miller
2014-11-20 15:33 ` Eric Dumazet
2014-11-20 16:23 ` [PATCH net-next] tcp: remove from tcp_sendmsg() some fastopen code Eric Dumazet
2014-11-20 23:05 ` Eric Dumazet
2014-11-16 19:10 ` /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug) Eric Dumazet