* [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
@ 2016-03-30  5:16 Yang Yingliang
  2016-03-30  5:25 ` Eric Dumazet
  2016-03-30 12:56 ` Sergei Shtylyov
  0 siblings, 2 replies; 21+ messages in thread
From: Yang Yingliang @ 2016-03-30  5:16 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet

When task A hold the sk owned in tcp_sendmsg, if lots of packets
arrive and the packets will be added to backlog queue. The packets
will be handled in release_sock called from tcp_sendmsg. When the
sk_backlog is removed from sk, the length will not decrease until
all the packets in backlog queue are handled. This may leads to the
new packets be dropped because the lenth is too big. So set the
lenth to 0 immediately after it's detached from sk.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 net/core/sock.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/sock.c b/net/core/sock.c
index 47fc8bb..108be05 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1933,6 +1933,7 @@ static void __release_sock(struct sock *sk)
 
 	do {
 		sk->sk_backlog.head = sk->sk_backlog.tail = NULL;
+		sk->sk_backlog.len = 0;
 		bh_unlock_sock(sk);
 
 		do {
-- 
2.5.0


* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
  2016-03-30  5:16 [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk Yang Yingliang
@ 2016-03-30  5:25 ` Eric Dumazet
  2016-03-30  5:34   ` Eric Dumazet
  2016-03-30  5:38   ` Yang Yingliang
  2016-03-30 12:56 ` Sergei Shtylyov
  1 sibling, 2 replies; 21+ messages in thread
From: Eric Dumazet @ 2016-03-30  5:25 UTC (permalink / raw)
  To: Yang Yingliang; +Cc: netdev, davem

On Wed, 2016-03-30 at 13:16 +0800, Yang Yingliang wrote:
> When task A hold the sk owned in tcp_sendmsg, if lots of packets
> arrive and the packets will be added to backlog queue. The packets
> will be handled in release_sock called from tcp_sendmsg. When the
> sk_backlog is removed from sk, the length will not decrease until
> all the packets in backlog queue are handled. This may leads to the
> new packets be dropped because the lenth is too big. So set the
> lenth to 0 immediately after it's detached from sk.
> 
> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
> ---
>  net/core/sock.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 47fc8bb..108be05 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1933,6 +1933,7 @@ static void __release_sock(struct sock *sk)
>  
>  	do {
>  		sk->sk_backlog.head = sk->sk_backlog.tail = NULL;
> +		sk->sk_backlog.len = 0;
>  		bh_unlock_sock(sk);
>  
>  		do {

Certainly not.

Have you really missed the comment ?

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8eae939f1400326b06d0c9afe53d2a484a326871


I do not believe the case you describe can happen, unless a misbehaving
driver cooks fat skbs (with skb->truesize being far bigger than
skb->len).
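
To illustrate the objection, here is a minimal userspace sketch of the backlog accounting. It is not kernel code: struct toy_sock, toy_add_backlog() and toy_release() are hypothetical names, and the model assumes the behaviour discussed in this thread, namely that packets are refused once the accounted length plus receive memory exceeds the limit, and that the length is cleared only after the detached list has been drained. The zero_early flag models the proposed patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the backlog fields of struct sock (illustration only). */
struct toy_sock {
	unsigned int backlog_len;	/* models sk->sk_backlog.len (sum of truesize) */
	unsigned int rmem_alloc;	/* models sk->sk_rmem_alloc */
	size_t queued;			/* packets waiting in the backlog */
};

/* Softirq side: refuse the packet once the accounted bytes exceed limit. */
static bool toy_add_backlog(struct toy_sock *sk, unsigned int truesize,
			    unsigned int limit)
{
	if (sk->backlog_len + sk->rmem_alloc > limit)
		return false;		/* would be counted as a backlog drop */
	sk->queued++;
	sk->backlog_len += truesize;
	return true;
}

/*
 * Process side: drain the detached list, then clear the length.
 * zero_early models the proposed patch (clear at detach time instead).
 * Returns the length that producers would see while the drain runs.
 */
static unsigned int toy_release(struct toy_sock *sk, bool zero_early)
{
	unsigned int seen_while_draining;

	if (zero_early)
		sk->backlog_len = 0;
	seen_while_draining = sk->backlog_len;
	sk->queued = 0;			/* packets handed to the protocol */
	sk->backlog_len = 0;		/* existing behaviour: zero after the drain */
	return seen_while_draining;
}
```

With zero_early set, producers see a length of 0 for the whole drain, so each pass can admit another full limit's worth of packets while the previous batch is still being processed. Keeping the old length visible until the drain ends is what bounds the total work.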


* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
  2016-03-30  5:25 ` Eric Dumazet
@ 2016-03-30  5:34   ` Eric Dumazet
  2016-03-30  5:56     ` Yang Yingliang
  2016-03-30  5:38   ` Yang Yingliang
  1 sibling, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2016-03-30  5:34 UTC (permalink / raw)
  To: Yang Yingliang; +Cc: netdev, davem

On Tue, 2016-03-29 at 22:25 -0700, Eric Dumazet wrote:
> On Wed, 2016-03-30 at 13:16 +0800, Yang Yingliang wrote:
> > When task A hold the sk owned in tcp_sendmsg, if lots of packets
> > arrive and the packets will be added to backlog queue. The packets
> > will be handled in release_sock called from tcp_sendmsg. When the
> > sk_backlog is removed from sk, the length will not decrease until
> > all the packets in backlog queue are handled. This may leads to the
> > new packets be dropped because the lenth is too big. So set the
> > lenth to 0 immediately after it's detached from sk.
> > 
> > Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
> > ---
> >  net/core/sock.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index 47fc8bb..108be05 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -1933,6 +1933,7 @@ static void __release_sock(struct sock *sk)
> >  
> >  	do {
> >  		sk->sk_backlog.head = sk->sk_backlog.tail = NULL;
> > +		sk->sk_backlog.len = 0;
> >  		bh_unlock_sock(sk);
> >  
> >  		do {
> 
> Certainly not.
> 
> Have you really missed the comment ?
> 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8eae939f1400326b06d0c9afe53d2a484a326871
> 
> 
> I do not believe the case you describe can happen, unless a misbehaving
> driver cooks fat skb (with skb->truesize being far more bigger than
> skb->len)
> 

And also make sure you backported
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=da882c1f2ecadb0ed582628ec1585e36b137c0f0


* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
  2016-03-30  5:25 ` Eric Dumazet
  2016-03-30  5:34   ` Eric Dumazet
@ 2016-03-30  5:38   ` Yang Yingliang
  1 sibling, 0 replies; 21+ messages in thread
From: Yang Yingliang @ 2016-03-30  5:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem



On 2016/3/30 13:25, Eric Dumazet wrote:
> On Wed, 2016-03-30 at 13:16 +0800, Yang Yingliang wrote:
>> When task A hold the sk owned in tcp_sendmsg, if lots of packets
>> arrive and the packets will be added to backlog queue. The packets
>> will be handled in release_sock called from tcp_sendmsg. When the
>> sk_backlog is removed from sk, the length will not decrease until
>> all the packets in backlog queue are handled. This may leads to the
>> new packets be dropped because the lenth is too big. So set the
>> lenth to 0 immediately after it's detached from sk.
>>
>> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
>> ---
>>   net/core/sock.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index 47fc8bb..108be05 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -1933,6 +1933,7 @@ static void __release_sock(struct sock *sk)
>>
>>   	do {
>>   		sk->sk_backlog.head = sk->sk_backlog.tail = NULL;
>> +		sk->sk_backlog.len = 0;
>>   		bh_unlock_sock(sk);
>>
>>   		do {
>
> Certainly not.
>
> Have you really missed the comment ?
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8eae939f1400326b06d0c9afe53d2a484a326871

My kernel is 4.1 LTS, and it does not seem to have this patch. I will
try it later.

Thanks
Yang
>
>
> I do not believe the case you describe can happen, unless a misbehaving
> driver cooks fat skb (with skb->truesize being far more bigger than
> skb->len)


* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
  2016-03-30  5:34   ` Eric Dumazet
@ 2016-03-30  5:56     ` Yang Yingliang
  2016-03-30 13:47       ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Yang Yingliang @ 2016-03-30  5:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem



On 2016/3/30 13:34, Eric Dumazet wrote:
> On Tue, 2016-03-29 at 22:25 -0700, Eric Dumazet wrote:
>> On Wed, 2016-03-30 at 13:16 +0800, Yang Yingliang wrote:
>>> When task A hold the sk owned in tcp_sendmsg, if lots of packets
>>> arrive and the packets will be added to backlog queue. The packets
>>> will be handled in release_sock called from tcp_sendmsg. When the
>>> sk_backlog is removed from sk, the length will not decrease until
>>> all the packets in backlog queue are handled. This may leads to the
>>> new packets be dropped because the lenth is too big. So set the
>>> lenth to 0 immediately after it's detached from sk.
>>>
>>> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
>>> ---
>>>   net/core/sock.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/net/core/sock.c b/net/core/sock.c
>>> index 47fc8bb..108be05 100644
>>> --- a/net/core/sock.c
>>> +++ b/net/core/sock.c
>>> @@ -1933,6 +1933,7 @@ static void __release_sock(struct sock *sk)
>>>
>>>   	do {
>>>   		sk->sk_backlog.head = sk->sk_backlog.tail = NULL;
>>> +		sk->sk_backlog.len = 0;
>>>   		bh_unlock_sock(sk);
>>>
>>>   		do {
>>
>> Certainly not.
>>
>> Have you really missed the comment ?
>>
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8eae939f1400326b06d0c9afe53d2a484a326871
>>
>>
>> I do not believe the case you describe can happen, unless a misbehaving
>> driver cooks fat skb (with skb->truesize being far more bigger than
>> skb->len)
>>
>
> And also make sure you backported
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=da882c1f2ecadb0ed582628ec1585e36b137c0f0

Sorry, I made a mistake. I am very sure my kernel has these two patches,
and I still see packets being dropped on 10Gb Ethernet.

# netstat -s | grep -i backlog
     TCPBacklogDrop: 4135
# netstat -s | grep -i backlog
     TCPBacklogDrop: 4167




* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
  2016-03-30  5:16 [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk Yang Yingliang
  2016-03-30  5:25 ` Eric Dumazet
@ 2016-03-30 12:56 ` Sergei Shtylyov
  2016-04-07  6:01   ` Yang Yingliang
  1 sibling, 1 reply; 21+ messages in thread
From: Sergei Shtylyov @ 2016-03-30 12:56 UTC (permalink / raw)
  To: Yang Yingliang, netdev; +Cc: davem, eric.dumazet

Hello.

On 3/30/2016 8:16 AM, Yang Yingliang wrote:

> When task A hold the sk owned in tcp_sendmsg, if lots of packets
> arrive and the packets will be added to backlog queue. The packets
> will be handled in release_sock called from tcp_sendmsg. When the
> sk_backlog is removed from sk, the length will not decrease until
> all the packets in backlog queue are handled. This may leads to the
> new packets be dropped because the lenth is too big. So set the
> lenth to 0 immediately after it's detached from sk.

    Length?

> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
[...]

MBR, Sergei


* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
  2016-03-30  5:56     ` Yang Yingliang
@ 2016-03-30 13:47       ` Eric Dumazet
  2016-04-07  5:59         ` Yang Yingliang
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2016-03-30 13:47 UTC (permalink / raw)
  To: Yang Yingliang; +Cc: netdev, davem

On Wed, 2016-03-30 at 13:56 +0800, Yang Yingliang wrote:

> Sorry, I made a mistake. I am very sure my kernel has these two patches.
> And I can get some dropping of the packets in 10Gb eth.
> 
> # netstat -s | grep -i backlog
>      TCPBacklogDrop: 4135
> # netstat -s | grep -i backlog
>      TCPBacklogDrop: 4167

The sender will retransmit and the receiver backlog will likely be emptied
before the packets arrive again.

Are you sure these are TCP drops ?

Which 10Gb NIC is it ? (ethtool -i eth0)

What is the max size of the sendmsg() chunks generated by your apps ?

Are they forcing small SO_RCVBUF or SO_SNDBUF ?

What percentage of drops do you have ?

Here (at Google), we have less than one backlog drop per billion
packets, on hosts facing the public Internet.

If a TCP sender sends a burst of tiny packets because it is misbehaving,
you absolutely will drop packets, especially if applications use
sendmsg() with very big lengths and big SO_SNDBUF.

Trying to not drop these hostile packets as you did is simply opening
your host to DOS attacks.

Eventually, we should even drop earlier in TCP stack (before taking
socket lock).


* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
  2016-03-30 13:47       ` Eric Dumazet
@ 2016-04-07  5:59         ` Yang Yingliang
  2016-04-07 10:21           ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Yang Yingliang @ 2016-04-07  5:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem, Ding Tianhong



On 2016/3/30 21:47, Eric Dumazet wrote:
> On Wed, 2016-03-30 at 13:56 +0800, Yang Yingliang wrote:
>
>> Sorry, I made a mistake. I am very sure my kernel has these two patches.
>> And I can get some dropping of the packets in 10Gb eth.
>>
>> # netstat -s | grep -i backlog
>>       TCPBacklogDrop: 4135
>> # netstat -s | grep -i backlog
>>       TCPBacklogDrop: 4167
>
> Sender will retransmit and the receiver backlog will lilely be emptied
> before the packets arrive again.
>
> Are you sure these are TCP drops ?
Yes.

>
> Which 10Gb NIC is it ? (ethtool -i eth0)
The NIC driver is not upstream. And my system is arm64.

>
> What is the max size of sendmsg() chunks are generated by your apps ?
256KB

>
> Are they forcing small SO_RCVBUF or SO_SNDBUF ?
I am not sure.
I added some debug messages in the kernel:
[2016-04-06 10:56:55][ 1365.477140] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12402232 rmem_alloc:0 truesize:53320
[2016-04-06 10:56:55][ 1365.477170] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12460884 rmem_alloc:55986 truesize:58652
[2016-04-06 10:56:55][ 1365.477192] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12506206 rmem_alloc:0 truesize:45322
[2016-04-06 10:56:55][ 1365.477226] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12519536 rmem_alloc:7998 truesize:13330
[2016-04-06 10:56:55][ 1365.477254] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12575522 rmem_alloc:0 truesize:55986
[2016-04-06 10:56:55][ 1365.477282] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12634174 rmem_alloc:0 truesize:58652
[2016-04-06 10:56:55][ 1365.477301] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12634174 rmem_alloc:26660 truesize:31992
[2016-04-06 10:56:55][ 1365.477321] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12634174 rmem_alloc:58652 truesize:26660
[2016-04-06 10:56:55][ 1365.477341] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12634174 rmem_alloc:58652 truesize:42656
[2016-04-06 10:56:55][ 1365.477384] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12634174 rmem_alloc:0 truesize:58652
[2016-04-06 10:56:55][ 1365.477403] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12634174 rmem_alloc:0 truesize:34658
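
The figures in this trace can be checked with a few lines of C. This is a sketch under two assumptions taken from the thread: the limit TCP passes to sk_add_backlog() is rcvbuf + sndbuf, and a packet is refused once backlog length plus rmem_alloc exceeds that limit. The helper names are made up for the example.

```c
#include <stdbool.h>

/* Limit assumed from the thread: sk->sk_rcvbuf + sk->sk_sndbuf. */
static unsigned int tcp_backlog_limit(unsigned int rcvbuf, unsigned int sndbuf)
{
	return rcvbuf + sndbuf;
}

/* Queue-full test: accounted backlog plus receive memory against the limit. */
static bool backlog_would_drop(unsigned int backlog_len,
			       unsigned int rmem_alloc,
			       unsigned int limit)
{
	return backlog_len + rmem_alloc > limit;
}
```

With the logged values, 10485760 + 2097152 gives the printed limit of 12582912. A backlog of 12575522 is still under it, so one more 58652-byte skb is accepted and brings the length to 12634174; from then on every new packet fails the test, which matches the length staying at 12634174 in the later lines.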

>
> What percentage of drops do you have ?
The TCPBacklogDrop counter in netstat -s increases by 20-40 per second.
It's about 1.2% (117724 (TCPBacklogDrop) / 214502873 (InSegs from
/proc/net/snmp)).
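
For reference, the division can be redone with a small helper (drop_percent() is a name invented for this sketch). Taken at face value, the two cumulative counters quoted above give roughly 0.055%, not 1.2%; the 1.2% figure may have been measured over a different interval, which the message does not say.

```c
/* Drop ratio in percent from two cumulative counters (0 if no segments). */
static double drop_percent(unsigned long long drops, unsigned long long in_segs)
{
	if (in_segs == 0)
		return 0.0;
	return 100.0 * (double)drops / (double)in_segs;
}
```

Either figure is several orders of magnitude above the "less than one per billion" rate mentioned earlier in the thread.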

>
> Here (at Google), we have less than one backlog drop per billion
> packets, on host facing the public Internet.
>
> If a TCP sender sends a burst of tiny packets because it is misbehaving,
> you absolutely will drop packets, especially if applications use
> sendmsg() with very big lengths and big SO_SNDBUF.
>
> Trying to not drop these hostile packets as you did is simply opening
> your host to DOS attacks.
>
> Eventually, we should even drop earlier in TCP stack (before taking
> socket lock).
>
>
How about expanding the buffer, like this:

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6d204f3..da1bc16 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -281,6 +281,7 @@ extern unsigned int sysctl_tcp_notsent_lowat;
  extern int sysctl_tcp_min_tso_segs;
  extern int sysctl_tcp_autocorking;
  extern int sysctl_tcp_invalid_ratelimit;
+extern int sysctl_tcp_backlog_buf_multi;

  extern atomic_long_t tcp_memory_allocated;
  extern struct percpu_counter tcp_sockets_allocated;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index f0e8297..9511410 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -631,6 +631,13 @@ static struct ctl_table ipv4_table[] = {
  		.mode		= 0644,
  		.proc_handler	= proc_dointvec
  	},
+	{
+		.procname	= "tcp_backlog_buf_multi",
+		.data		= &sysctl_tcp_backlog_buf_multi,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
  #ifdef CONFIG_NETLABEL
  	{
  		.procname	= "cipso_cache_enable",
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 87463c8..337ad55 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -101,6 +101,8 @@ int sysctl_tcp_thin_dupack __read_mostly;
  int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
  int sysctl_tcp_early_retrans __read_mostly = 3;
  int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
+int sysctl_tcp_backlog_buf_multi __read_mostly = 1;
+EXPORT_SYMBOL(sysctl_tcp_backlog_buf_multi);

  #define FLAG_DATA		0x01 /* Incoming frame contained data.		*/
  #define FLAG_WIN_UPDATE		0x02 /* Incoming ACK was a window update.	*/
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 13b92d5..39272f3 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1635,7 +1635,8 @@ process:
  		if (!tcp_prequeue(sk, skb))
  			ret = tcp_v4_do_rcv(sk, skb);
  	} else if (unlikely(sk_add_backlog(sk, skb,
-					   sk->sk_rcvbuf + sk->sk_sndbuf))) {
+					   (sk->sk_rcvbuf + sk->sk_sndbuf) *
+					   sysctl_tcp_backlog_buf_multi))) {
  		bh_unlock_sock(sk);
  		NET_INC_STATS_BH(net, LINUX_MIB_TCPBACKLOGDROP);
  		goto discard_and_relse;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index c1147ac..1e8f709 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1433,7 +1433,8 @@ process:
  		if (!tcp_prequeue(sk, skb))
  			ret = tcp_v6_do_rcv(sk, skb);
  	} else if (unlikely(sk_add_backlog(sk, skb,
-					   sk->sk_rcvbuf + sk->sk_sndbuf))) {
+					   (sk->sk_rcvbuf + sk->sk_sndbuf) *
+					   sysctl_tcp_backlog_buf_multi))) {
  		bh_unlock_sock(sk);
  		NET_INC_STATS_BH(net, LINUX_MIB_TCPBACKLOGDROP);
  		goto discard_and_relse;
-- 


* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
  2016-03-30 12:56 ` Sergei Shtylyov
@ 2016-04-07  6:01   ` Yang Yingliang
  0 siblings, 0 replies; 21+ messages in thread
From: Yang Yingliang @ 2016-04-07  6:01 UTC (permalink / raw)
  To: Sergei Shtylyov, netdev; +Cc: davem, eric.dumazet



On 2016/3/30 20:56, Sergei Shtylyov wrote:
> Hello.
>
> On 3/30/2016 8:16 AM, Yang Yingliang wrote:
>
>> When task A hold the sk owned in tcp_sendmsg, if lots of packets
>> arrive and the packets will be added to backlog queue. The packets
>> will be handled in release_sock called from tcp_sendmsg. When the
>> sk_backlog is removed from sk, the length will not decrease until
>> all the packets in backlog queue are handled. This may leads to the
>> new packets be dropped because the lenth is too big. So set the
>> lenth to 0 immediately after it's detached from sk.
>
>     Length?
>
>> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
> [...]
>
> MBR, Sergei
>
>
Yes. It's a typo.

Thanks
Yang


* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
  2016-04-07  5:59         ` Yang Yingliang
@ 2016-04-07 10:21           ` Eric Dumazet
  2016-04-07 14:51             ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2016-04-07 10:21 UTC (permalink / raw)
  To: Yang Yingliang; +Cc: netdev, davem, Ding Tianhong

On Thu, 2016-04-07 at 13:59 +0800, Yang Yingliang wrote:
> 
> On 2016/3/30 21:47, Eric Dumazet wrote:
> > On Wed, 2016-03-30 at 13:56 +0800, Yang Yingliang wrote:
> >
> >> Sorry, I made a mistake. I am very sure my kernel has these two patches.
> >> And I can get some dropping of the packets in 10Gb eth.
> >>
> >> # netstat -s | grep -i backlog
> >>       TCPBacklogDrop: 4135
> >> # netstat -s | grep -i backlog
> >>       TCPBacklogDrop: 4167
> >
> > Sender will retransmit and the receiver backlog will lilely be emptied
> > before the packets arrive again.
> >
> > Are you sure these are TCP drops ?
> Yes.
> 
> >
> > Which 10Gb NIC is it ? (ethtool -i eth0)
> The NIC driver is not upstream. And my system is arm64.
> 
> >
> > What is the max size of sendmsg() chunks are generated by your apps ?
> 256KB
> 
> >
> > Are they forcing small SO_RCVBUF or SO_SNDBUF ?
> I am not sure.
> I add some debug message in kernel:
> [2016-04-06 10:56:55][ 1365.477140] TCP: rcvbuf:10485760 sndbuf:2097152 
> limit:12582912 backloglen:12402232 rmem_alloc:0 truesize:53320
> [2016-04-06 10:56:55][ 1365.477170] TCP: rcvbuf:10485760 sndbuf:2097152 
> limit:12582912 backloglen:12460884 rmem_alloc:55986 truesize:58652
> [2016-04-06 10:56:55][ 1365.477192] TCP: rcvbuf:10485760 sndbuf:2097152 
> limit:12582912 backloglen:12506206 rmem_alloc:0 truesize:45322
> [2016-04-06 10:56:55][ 1365.477226] TCP: rcvbuf:10485760 sndbuf:2097152 
> limit:12582912 backloglen:12519536 rmem_alloc:7998 truesize:13330
> [2016-04-06 10:56:55][ 1365.477254] TCP: rcvbuf:10485760 sndbuf:2097152 
> limit:12582912 backloglen:12575522 rmem_alloc:0 truesize:55986
> [2016-04-06 10:56:55][ 1365.477282] TCP: rcvbuf:10485760 sndbuf:2097152 
> limit:12582912 backloglen:12634174 rmem_alloc:0 truesize:58652
> [2016-04-06 10:56:55][ 1365.477301] TCP: rcvbuf:10485760 sndbuf:2097152 
> limit:12582912 backloglen:12634174 rmem_alloc:26660 truesize:31992
> [2016-04-06 10:56:55][ 1365.477321] TCP: rcvbuf:10485760 sndbuf:2097152 
> limit:12582912 backloglen:12634174 rmem_alloc:58652 truesize:26660
> [2016-04-06 10:56:55][ 1365.477341] TCP: rcvbuf:10485760 sndbuf:2097152 
> limit:12582912 backloglen:12634174 rmem_alloc:58652 truesize:42656
> [2016-04-06 10:56:55][ 1365.477384] TCP: rcvbuf:10485760 sndbuf:2097152 
> limit:12582912 backloglen:12634174 rmem_alloc:0 truesize:58652
> [2016-04-06 10:56:55][ 1365.477403] TCP: rcvbuf:10485760 sndbuf:2097152 
> limit:12582912 backloglen:12634174 rmem_alloc:0 truesize:34658
> 
> >
> > What percentage of drops do you have ?
> netstat -s | grep -i TCPBacklogDrop increases 20-40 per second.
> It's about 1.2% (117724(TCPBacklogDrop)/214502873(InSegs of cat 
> /proc/net/snmp)).
> 
> >
> > Here (at Google), we have less than one backlog drop per billion
> > packets, on host facing the public Internet.
> >
> > If a TCP sender sends a burst of tiny packets because it is misbehaving,
> > you absolutely will drop packets, especially if applications use
> > sendmsg() with very big lengths and big SO_SNDBUF.
> >
> > Trying to not drop these hostile packets as you did is simply opening
> > your host to DOS attacks.
> >
> > Eventually, we should even drop earlier in TCP stack (before taking
> > socket lock).
> >
> >
> How about expand the buffer like:

Please do not send patches before really understanding the issue you
have.

Having a backlog of 12506206 bytes is ridiculous. Dropping packets is
absolutely fine if this ever happens.

Something is really wrong on your host, or the sender simply does not
comply with the TCP protocol (it does not honor the receiver window at all).

Since you added a trace of truesize, please also trace skb->len


* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
  2016-04-07 10:21           ` Eric Dumazet
@ 2016-04-07 14:51             ` Eric Dumazet
  2016-04-08 11:18               ` Yang Yingliang
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2016-04-07 14:51 UTC (permalink / raw)
  To: Yang Yingliang; +Cc: netdev, davem, Ding Tianhong

On Thu, 2016-04-07 at 03:21 -0700, Eric Dumazet wrote:

> Please do not send patches before really understanding the issue you
> have.
> 
> Having a backlog of 12506206 bytes is ridiculous. Dropping packets is
> absolutely fine if this ever happens.
> 
> Something is really wrong on your host, or the sender simply does not
> comply with TCP protocol (not caring of receiver window at all)
> 
> Since you added a trace of truesize, please also trace skb->len
> 

BTW, have you played with /proc/sys/net/ipv4/tcp_adv_win_scale ?


* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
  2016-04-07 14:51             ` Eric Dumazet
@ 2016-04-08 11:18               ` Yang Yingliang
  2016-04-08 14:44                 ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Yang Yingliang @ 2016-04-08 11:18 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem, Ding Tianhong



On 2016/4/7 22:51, Eric Dumazet wrote:
> On Thu, 2016-04-07 at 03:21 -0700, Eric Dumazet wrote:
>
>> Please do not send patches before really understanding the issue you
>> have.
>>
>> Having a backlog of 12506206 bytes is ridiculous. Dropping packets is
>> absolutely fine if this ever happens.
>>
>> Something is really wrong on your host, or the sender simply does not
>> comply with TCP protocol (not caring of receiver window at all)
>>
>> Since you added a trace of truesize, please also trace skb->len
>>

[2016-04-08 18:33:39][ 9748.726948] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12607514 rmem_alloc:0, truesize:31992, len:17540
[2016-04-08 18:33:39][ 9748.726964] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12607514 rmem_alloc:29326, truesize:18662, len:10240
[2016-04-08 18:33:39][ 9748.726986] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12607514 rmem_alloc:0, truesize:39990, len:21920
[2016-04-08 18:33:39][ 9748.727028] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12607514 rmem_alloc:0, truesize:58652, len:32140
[2016-04-08 18:33:39][ 9748.727068] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12607514 rmem_alloc:0, truesize:58652, len:32140
[2016-04-08 18:33:39][ 9748.727082] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12607514 rmem_alloc:21328, truesize:5332, len:2940
[2016-04-08 18:33:39][ 9748.727310] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12607514 rmem_alloc:0, truesize:53320, len:29220
[2016-04-08 18:33:39][ 9748.727326] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12607514 rmem_alloc:26660, truesize:7998, len:4400
[2016-04-08 18:33:39][ 9748.727352] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12607514 rmem_alloc:47988, truesize:58652, len:32140
[2016-04-08 18:33:39][ 9748.727389] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12607514 rmem_alloc:0, truesize:39990, len:21920
[2016-04-08 18:33:39][ 9748.727409] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12607514 rmem_alloc:58652, truesize:18662, len:10240

If I expand the limit 5 times ((sndbuf + rcvbuf) * 5), there is at most
about 5MB of data in the backlog.

[2016-04-08 18:33:39][ 9748.777743] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:5435954 rmem_alloc:0, truesize:55986, len:30680
[2016-04-08 18:33:39][ 9748.777762] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:5457282 rmem_alloc:58652, truesize:21328, len:11700
[2016-04-08 18:33:39][ 9748.777804] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:5515934 rmem_alloc:55986, truesize:58652, len:32140
[2016-04-08 18:33:39][ 9748.777818] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:5537262 rmem_alloc:0, truesize:21328, len:11700
[2016-04-08 18:33:39][ 9748.777839] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:5574586 rmem_alloc:0, truesize:37324, len:20460
[2016-04-08 18:33:39][ 9748.777854] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:5601246 rmem_alloc:58652, truesize:26660, len:14620
[2016-04-08 18:33:39][ 9748.777881] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:5659898 rmem_alloc:21328, truesize:58652, len:32140
[2016-04-08 18:33:39][ 9748.777894] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:5675894 rmem_alloc:37324, truesize:15996, len:8780
[2016-04-08 18:33:39][ 9748.778047] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:58652 rmem_alloc:0, truesize:58652, len:32140
[2016-04-08 18:33:39][ 9748.778075] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:117304 rmem_alloc:0, truesize:58652, len:32140
[2016-04-08 18:33:39][ 9748.778084] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:122636 rmem_alloc:0, truesize:5332, len:2940
[2016-04-08 18:33:39][ 9748.778109] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:175956 rmem_alloc:0, truesize:53320, len:29220
[2016-04-08 18:33:39][ 9748.778156] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:234608 rmem_alloc:0, truesize:58652, len:32140
[2016-04-08 18:33:39][ 9748.778178] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:282596 rmem_alloc:58652, truesize:47988, len:26300
>
> BTW, have you played with /proc/sys/net/ipv4/tcp_adv_win_scale ?
>

I increased tcp_adv_win_scale and tcp_rmem. It had no effect.


* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
  2016-04-08 11:18               ` Yang Yingliang
@ 2016-04-08 14:44                 ` Eric Dumazet
  2016-04-08 16:53                   ` David Miller
  2016-04-11 11:57                   ` Yang Yingliang
  0 siblings, 2 replies; 21+ messages in thread
From: Eric Dumazet @ 2016-04-08 14:44 UTC (permalink / raw)
  To: Yang Yingliang; +Cc: netdev, davem, Ding Tianhong

On Fri, 2016-04-08 at 19:18 +0800, Yang Yingliang wrote:

> I expand  tcp_adv_win_scale and tcp_rmem. It has no effect.

Try :

echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale

And restart your flows.
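
For context, here is a userspace sketch of how tcp_adv_win_scale turns buffer space into advertised window. It assumes the rule used by kernels of that era (tcp_win_from_space() in include/net/tcp.h); win_from_space() below is a standalone copy for illustration and takes the sysctl value as a parameter.

```c
/* Window available from 'space' bytes of buffer for a given tcp_adv_win_scale. */
static int win_from_space(int space, int adv_win_scale)
{
	if (adv_win_scale <= 0)
		return space >> (-adv_win_scale);
	return space - (space >> adv_win_scale);
}
```

With a 10485760-byte receive buffer, a scale of 1 advertises half of it (5242880 bytes), while -2 advertises only a quarter (2621440 bytes). The rest is left as headroom for skb overhead, which is why a negative value is worth trying when truesize is much larger than len.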


* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
  2016-04-08 14:44                 ` Eric Dumazet
@ 2016-04-08 16:53                   ` David Miller
  2016-04-08 17:04                     ` Eric Dumazet
  2016-04-11 11:57                   ` Yang Yingliang
  1 sibling, 1 reply; 21+ messages in thread
From: David Miller @ 2016-04-08 16:53 UTC (permalink / raw)
  To: eric.dumazet; +Cc: yangyingliang, netdev, dingtianhong

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 08 Apr 2016 07:44:25 -0700

> On Fri, 2016-04-08 at 19:18 +0800, Yang Yingliang wrote:
> 
>> I expand  tcp_adv_win_scale and tcp_rmem. It has no effect.
> 
> Try :
> 
> echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale
> 
> And restart your flows.

I'm honestly beginning to suspect a bug in their driver and how they
handle skb->truesize.

Yang, until you show us the driver you are using and how it handles
receive packets, we are largely in the dark about a major component
of this issue and that is entirely unfair to us.


* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
  2016-04-08 16:53                   ` David Miller
@ 2016-04-08 17:04                     ` Eric Dumazet
  2016-04-11 14:42                       ` Yang Yingliang
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2016-04-08 17:04 UTC (permalink / raw)
  To: David Miller; +Cc: yangyingliang, netdev, dingtianhong

On Fri, 2016-04-08 at 12:53 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 08 Apr 2016 07:44:25 -0700
> 
> > On Fri, 2016-04-08 at 19:18 +0800, Yang Yingliang wrote:
> > 
> >> I expand  tcp_adv_win_scale and tcp_rmem. It has no effect.
> > 
> > Try :
> > 
> > echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale
> > 
> > And restart your flows.
> 
> I'm honestly beginning to suspect a bug in their driver and how they
> handle skb->truesize.
> 
> Yang, until you show us the driver you are using and how is handles
> receive packets, we are largely in the dark about a major component
> of this issue and that is entirely unfair to us.

Apparently their skb->truesize and skb->len combinations are correct.

I suspect an issue with rcvbuf autotuning on bidirectional TCP traffic.
We mostly focus on unidirectional flows, but they seem to use a mixed
case.

Also, the fact that sendmsg() locks the socket for the duration of the call
is problematic : I suspect their issues would mostly disappear by using
smaller chunk sizes (i.e. 64KB per sendmsg() instead of 256KB).

We also could add resched points in sendmsg() (processing backlog if it
gets too hot), but I fear this would slow down the fast path.
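
A rough userspace sketch of the smaller-chunk idea, assuming a blocking stream socket. The helper name send_chunked and its parameters are hypothetical. Each send() call takes and releases the socket lock, so capping the per-call size gives the backlog a chance to be processed between calls.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Send len bytes from buf, handing the kernel at most chunk bytes per
 * send() call so the socket lock is dropped between calls.
 * Returns the number of bytes sent, or -1 on the first send() error
 * (EINTR is not retried in this sketch). */
ssize_t send_chunked(int fd, const void *buf, size_t len, size_t chunk)
{
	const char *p = buf;
	size_t done = 0;

	if (chunk == 0)
		return -1;

	while (done < len) {
		size_t want = len - done < chunk ? len - done : chunk;
		ssize_t n = send(fd, p + done, want, 0);

		if (n < 0)
			return -1;
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

A caller that currently passes 256KB to one sendmsg() could instead call send_chunked(fd, buf, 256 * 1024, 64 * 1024). Whether this helps depends on the workload.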


* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
  2016-04-08 14:44                 ` Eric Dumazet
  2016-04-08 16:53                   ` David Miller
@ 2016-04-11 11:57                   ` Yang Yingliang
  2016-04-11 12:13                     ` Eric Dumazet
  1 sibling, 1 reply; 21+ messages in thread
From: Yang Yingliang @ 2016-04-11 11:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem, Ding Tianhong



On 2016/4/8 22:44, Eric Dumazet wrote:
> On Fri, 2016-04-08 at 19:18 +0800, Yang Yingliang wrote:
>
>> I expand  tcp_adv_win_scale and tcp_rmem. It has no effect.
>
> Try :
>
> echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale
>
> And restart your flows.
>
cat /proc/sys/net/ipv4/tcp_rmem
10240 2097152 10485760

echo 102400 20971520 104857600 > /proc/sys/net/ipv4/tcp_rmem
echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale

It seems to have no effect.


* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
  2016-04-11 11:57                   ` Yang Yingliang
@ 2016-04-11 12:13                     ` Eric Dumazet
  2016-04-12  2:59                       ` Yang Yingliang
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2016-04-11 12:13 UTC (permalink / raw)
  To: Yang Yingliang; +Cc: netdev, davem, Ding Tianhong

On Mon, 2016-04-11 at 19:57 +0800, Yang Yingliang wrote:
> 
> On 2016/4/8 22:44, Eric Dumazet wrote:
> > On Fri, 2016-04-08 at 19:18 +0800, Yang Yingliang wrote:
> >
> >> I expand  tcp_adv_win_scale and tcp_rmem. It has no effect.
> >
> > Try :
> >
> > echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale
> >
> > And restart your flows.
> >
> cat /proc/sys/net/ipv4/tcp_rmem
> 10240 2097152 10485760

What about leaving the default values ?

$ cat /proc/sys/net/ipv4/tcp_rmem
4096	87380	6291456

> 
> echo 102400 20971520 104857600 > /proc/sys/net/ipv4/tcp_rmem
> echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale
> 
> It seems has not effect.
> 

I have no idea what you did on the sender side to allow it to send more
than 1.5 MB then.


* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
  2016-04-08 17:04                     ` Eric Dumazet
@ 2016-04-11 14:42                       ` Yang Yingliang
  0 siblings, 0 replies; 21+ messages in thread
From: Yang Yingliang @ 2016-04-11 14:42 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: netdev, dingtianhong



On 2016/4/9 1:04, Eric Dumazet wrote:
> On Fri, 2016-04-08 at 12:53 -0400, David Miller wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Fri, 08 Apr 2016 07:44:25 -0700
>>
>>> On Fri, 2016-04-08 at 19:18 +0800, Yang Yingliang wrote:
>>>
>>>> I expand  tcp_adv_win_scale and tcp_rmem. It has no effect.
>>>
>>> Try :
>>>
>>> echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale
>>>
>>> And restart your flows.
>>
>> I'm honestly beginning to suspect a bug in their driver and how they
>> handle skb->truesize.
>>
>> Yang, until you show us the driver you are using and how is handles
>> receive packets, we are largely in the dark about a major component
>> of this issue and that is entirely unfair to us.
>
> Apparently their skb->truesize and skb->len combinations are correct.
>
> I suspect an issue with rcvbuf autouning on a bidirectional tcp traffic.
> We mostly focus on unidirectional flows, but they seem to use a mixed
> case.
>
> Also, fact that sendmsg() locks the socket for the duration of the call
> is problematic : I suspect their issues would mostly disappear by using
> smaller chunk sizes (ie 64KB per sendmsg() instead of 256KB).
Fewer packets are dropped when using 64KB chunks.

>
> We also could add resched points in sendmsg() (processing backlog if it
> gets too hot), but I fear this would slow down the fast path.
>
>
>
>
>


* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
  2016-04-11 12:13                     ` Eric Dumazet
@ 2016-04-12  2:59                       ` Yang Yingliang
  2016-04-12 12:31                         ` Yang Yingliang
  0 siblings, 1 reply; 21+ messages in thread
From: Yang Yingliang @ 2016-04-12  2:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem, Ding Tianhong



On 2016/4/11 20:13, Eric Dumazet wrote:
> On Mon, 2016-04-11 at 19:57 +0800, Yang Yingliang wrote:
>>
>> On 2016/4/8 22:44, Eric Dumazet wrote:
>>> On Fri, 2016-04-08 at 19:18 +0800, Yang Yingliang wrote:
>>>
>>>> I expand  tcp_adv_win_scale and tcp_rmem. It has no effect.
>>>
>>> Try :
>>>
>>> echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale
>>>
>>> And restart your flows.
>>>
>> cat /proc/sys/net/ipv4/tcp_rmem
>> 10240 2097152 10485760
>
> What about leaving the default values ?
I tried, it did not work.

>
> $ cat /proc/sys/net/ipv4/tcp_rmem
> 4096	87380	6291456
>
>>
>> echo 102400 20971520 104857600 > /proc/sys/net/ipv4/tcp_rmem
>> echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale
>>
>> It seems has not effect.
>>
>
> I have no idea what you did on the sender side to allow it to send more
> than 1.5 MB then.

We are doing a performance test. The sender uses 128 threads to send
256KB blocks to one socket. The receiver is an ARM64 machine that uses a
10Gb NIC to handle the data. The data flow is
driver -> IP layer -> TCP layer -> iSCSI.

I added some debug messages and found that handling the backlog packets
in __release_sock() takes up to about 11ms. This can cause the backlog
queue to overflow. We re-assign sk_data_ready, so the time may be spent
in our own code. I will check it out.
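
To make the overflow concrete, here is a simplified userspace model of the backlog accounting. Assumptions: the struct and function names are invented for illustration, and the limit of rcvbuf + sndbuf is how I recall the TCP receive path calling sk_add_backlog() in kernels of this period. The length is only reset once the whole detached queue has been processed, which is the behavior the original patch in this thread proposed to change.

```c
#include <stdbool.h>

/* Simplified model of a socket's receive-side accounting.
 * Field names echo struct sock, but this is not kernel code. */
struct sock_model {
	unsigned int backlog_len; /* truesize bytes charged to the backlog */
	unsigned int rmem_alloc;  /* bytes already in the receive queue */
	unsigned int rcvbuf;
	unsigned int sndbuf;
};

/* Returns true if the packet is queued, false if it would be dropped. */
bool backlog_add(struct sock_model *sk, unsigned int truesize)
{
	unsigned int limit = sk->rcvbuf + sk->sndbuf; /* assumed TCP limit */

	if (sk->backlog_len + sk->rmem_alloc > limit)
		return false;
	sk->backlog_len += truesize;
	return true;
}

/* Models the end of backlog processing: the charge is cleared only
 * after every detached packet has been handled. */
void backlog_drained(struct sock_model *sk)
{
	sk->backlog_len = 0;
}
```

While the owner spends milliseconds processing detached packets, backlog_len stays at its old value, so new arrivals keep hitting the limit even though the list itself is empty.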


* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
  2016-04-12  2:59                       ` Yang Yingliang
@ 2016-04-12 12:31                         ` Yang Yingliang
  2016-04-13  2:42                           ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Yang Yingliang @ 2016-04-12 12:31 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem, Ding Tianhong



On 2016/4/12 10:59, Yang Yingliang wrote:
>
>
> On 2016/4/11 20:13, Eric Dumazet wrote:
>> On Mon, 2016-04-11 at 19:57 +0800, Yang Yingliang wrote:
>>>
>>> On 2016/4/8 22:44, Eric Dumazet wrote:
>>>> On Fri, 2016-04-08 at 19:18 +0800, Yang Yingliang wrote:
>>>>
>>>>> I expand  tcp_adv_win_scale and tcp_rmem. It has no effect.
>>>>
>>>> Try :
>>>>
>>>> echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale
>>>>
>>>> And restart your flows.
>>>>
>>> cat /proc/sys/net/ipv4/tcp_rmem
>>> 10240 2097152 10485760
>>
>> What about leaving the default values ?
> I tried, it did not work.
>
>>
>> $ cat /proc/sys/net/ipv4/tcp_rmem
>> 4096    87380    6291456
>>
>>>
>>> echo 102400 20971520 104857600 > /proc/sys/net/ipv4/tcp_rmem
>>> echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale
>>>
>>> It seems has not effect.
>>>
>>
>> I have no idea what you did on the sender side to allow it to send more
>> than 1.5 MB then.
>
> We are doing performance test. The sender send 256KB per-block with 128
> threads to one socket. And the receiver uses 10Gb NIC to handle the
> data on ARM64. The data flow is driver->ip layer->tcp layer->iscsi.
>
> I added some debug messages and found handling backlog packets in
> __release_sock() cost about 11ms at most. This can cause backlog queue
> overflow. The sk_data_ready is re-assigned, it may cost time in our
> program. I will check it out.
>
I traced the cycles spent handling backlog packets in
__release_sock().
It takes 16.97 ms to handle about 12MB of backlog packets, of which
13.66 ms is spent in sk_data_ready.
So TCP handles packets at 5.65Gb/s, which is below the NIC's
bandwidth, and packets will be dropped.

If the cost of sk_data_ready cannot be reduced, do we have any option
other than dropping packets ?
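
The figures above can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 MB = 10^6 bytes, 1 Gb = 10^9 bits) and takes "about 12MB" as exactly 12,000,000 bytes; the function names are illustrative.

```c
/* Throughput in Gb/s for a byte count handled in a given number of
 * milliseconds, using decimal units. */
double throughput_gbps(double bytes, double ms)
{
	return bytes * 8.0 / (ms / 1000.0) / 1e9;
}

/* Fraction of the total time spent in one component. */
double time_share(double part_ms, double total_ms)
{
	return part_ms / total_ms;
}
```

throughput_gbps(12e6, 16.97) comes out a little under 5.7, consistent with the 5.65Gb/s quoted, and time_share(13.66, 16.97) shows that roughly four fifths of the time is spent in sk_data_ready.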


* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
  2016-04-12 12:31                         ` Yang Yingliang
@ 2016-04-13  2:42                           ` Eric Dumazet
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Dumazet @ 2016-04-13  2:42 UTC (permalink / raw)
  To: Yang Yingliang; +Cc: netdev, davem, Ding Tianhong

On Tue, 2016-04-12 at 20:31 +0800, Yang Yingliang wrote:

> I traced the cost cycles of handling backlog packets in
> __release_sock().
> 16.97 ms to handling about 12MB backlog packets, of which 13.66ms to do
> sk_data_ready.
> The speed of handling packets in TCP is 5.65Gb/s which is smaller than
> the NIC's bandwidth. So the packets will be dropped.
> 
> If the cost of sk_data_read cannot be reduced, do we have other choice
> exclude dropping packets ?

Normally, the TCP stack sends ACK packets with an appropriate RWIN.

The sender should not send more packets than the RWIN allows. It does
not matter that there are 128 threads using one TCP socket.

Imagine you do not have a backlog problem (nothing does the sendmsg()
while you receive data), and nothing reads the socket. Then the receiver
should eventually send WIN 0 back to the sender and sender should stop,
before any drop can possibly happen.

I have no problem receiving one TCP flow at 34Gbit, so it must be
something related to the huge windows you seem to use.

One possibility could be to advertise a reduced rwin in ACK packets so
that the sender is not allowed to continue the flood while we are
painfully processing a huge backlog.
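
The zero-window argument above can be illustrated with a toy model. This is a sketch under strong assumptions: it ignores in-flight data, window scaling and truesize overhead, and the function names are invented. It only shows that a sender that honors the advertised window cannot overrun a receiver whose reader is idle.

```c
/* Window the receiver advertises given its buffer capacity and the
 * bytes still queued (unread). Never negative. */
unsigned int advertised_window(unsigned int capacity, unsigned int queued)
{
	return queued >= capacity ? 0 : capacity - queued;
}

/* The sender offers burst bytes per round and nothing reads the socket.
 * Returns the number of bytes queued at the receiver after all rounds. */
unsigned int run_until_zero_window(unsigned int capacity,
				   unsigned int burst,
				   unsigned int rounds)
{
	unsigned int queued = 0;
	unsigned int r;

	for (r = 0; r < rounds; r++) {
		unsigned int win = advertised_window(capacity, queued);
		unsigned int sent = burst < win ? burst : win;

		queued += sent; /* the sender never exceeds the window */
	}
	return queued;
}
```

In this model the queue fills to exactly the capacity and then the window stays at 0, so nothing is dropped. The drops discussed in this thread arise because backlog packets are accounted separately while the socket is owned by the sending task.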


Thread overview: 21+ messages
2016-03-30  5:16 [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk Yang Yingliang
2016-03-30  5:25 ` Eric Dumazet
2016-03-30  5:34   ` Eric Dumazet
2016-03-30  5:56     ` Yang Yingliang
2016-03-30 13:47       ` Eric Dumazet
2016-04-07  5:59         ` Yang Yingliang
2016-04-07 10:21           ` Eric Dumazet
2016-04-07 14:51             ` Eric Dumazet
2016-04-08 11:18               ` Yang Yingliang
2016-04-08 14:44                 ` Eric Dumazet
2016-04-08 16:53                   ` David Miller
2016-04-08 17:04                     ` Eric Dumazet
2016-04-11 14:42                       ` Yang Yingliang
2016-04-11 11:57                   ` Yang Yingliang
2016-04-11 12:13                     ` Eric Dumazet
2016-04-12  2:59                       ` Yang Yingliang
2016-04-12 12:31                         ` Yang Yingliang
2016-04-13  2:42                           ` Eric Dumazet
2016-03-30  5:38   ` Yang Yingliang
2016-03-30 12:56 ` Sergei Shtylyov
2016-04-07  6:01   ` Yang Yingliang
