linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 net-next] inet: remove redundant backlog setting in listen(2)
@ 2018-10-09 12:05 Yafang Shao
  2018-10-09 12:05 ` [PATCH net-next] tcp: forbid direct reclaim if MSG_DONTWAIT is set in send path Yafang Shao
  2018-10-09 14:00 ` [PATCH v3 net-next] inet: remove redundant backlog setting in listen(2) Eric Dumazet
  0 siblings, 2 replies; 9+ messages in thread
From: Yafang Shao @ 2018-10-09 12:05 UTC (permalink / raw)
  To: edumazet, davem; +Cc: netdev, linux-kernel, Yafang Shao

The sk_max_ack_backlog will be set in the caller inet_listen() and
dccp_listen_start(), so it is redundant to set it in
inet_csk_listen_start().
Just remove this setting.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 net/ipv4/inet_connection_sock.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index dfd5009..cdd5c95 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -871,7 +871,6 @@ int inet_csk_listen_start(struct sock *sk, int backlog)
 
 	reqsk_queue_alloc(&icsk->icsk_accept_queue);
 
-	sk->sk_max_ack_backlog = backlog;
 	sk->sk_ack_backlog = 0;
 	inet_csk_delack_init(sk);
 
-- 
1.8.3.1



* [PATCH net-next] tcp: forbid direct reclaim if MSG_DONTWAIT is set in send path
  2018-10-09 12:05 [PATCH v3 net-next] inet: remove redundant backlog setting in listen(2) Yafang Shao
@ 2018-10-09 12:05 ` Yafang Shao
  2018-10-09 14:12   ` Eric Dumazet
  2018-10-09 14:00 ` [PATCH v3 net-next] inet: remove redundant backlog setting in listen(2) Eric Dumazet
  1 sibling, 1 reply; 9+ messages in thread
From: Yafang Shao @ 2018-10-09 12:05 UTC (permalink / raw)
  To: edumazet, davem; +Cc: netdev, linux-kernel, Yafang Shao

By default, sk->sk_allocation is GFP_KERNEL, which means that if there is
not enough memory, the allocation will do both direct reclaim and
background reclaim. If the system has a large amount of memory, direct
reclaim may cause a large latency spike.

When we set MSG_DONTWAIT in send syscalls, we really don't want the call to
block, so we'd better clear __GFP_DIRECT_RECLAIM when allocating the skb in
the send path. Then the call returns immediately if there is not enough
memory to allocate, and the application has a chance to do other work
instead of being blocked here.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 net/ipv4/tcp.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 43ef83b..fe4f5ce 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1182,6 +1182,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 	bool process_backlog = false;
 	bool zc = false;
 	long timeo;
+	gfp_t gfp;
 
 	flags = msg->msg_flags;
 
@@ -1255,6 +1256,9 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 	/* Ok commence sending. */
 	copied = 0;
 
+	gfp = flags & MSG_DONTWAIT ? sk->sk_allocation & ~__GFP_DIRECT_RECLAIM :
+	      sk->sk_allocation;
+
 restart:
 	mss_now = tcp_send_mss(sk, &size_goal, flags);
 
@@ -1283,8 +1287,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 			}
 			first_skb = tcp_rtx_and_write_queues_empty(sk);
 			linear = select_size(first_skb, zc);
-			skb = sk_stream_alloc_skb(sk, linear, sk->sk_allocation,
-						  first_skb);
+			skb = sk_stream_alloc_skb(sk, linear, gfp, first_skb);
 			if (!skb)
 				goto wait_for_memory;
 
-- 
1.8.3.1



* Re: [PATCH v3 net-next] inet: remove redundant backlog setting in listen(2)
  2018-10-09 12:05 [PATCH v3 net-next] inet: remove redundant backlog setting in listen(2) Yafang Shao
  2018-10-09 12:05 ` [PATCH net-next] tcp: forbid direct reclaim if MSG_DONTWAIT is set in send path Yafang Shao
@ 2018-10-09 14:00 ` Eric Dumazet
  1 sibling, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2018-10-09 14:00 UTC (permalink / raw)
  To: Yafang Shao; +Cc: David Miller, netdev, LKML

On Tue, Oct 9, 2018 at 5:05 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> The sk_max_ack_backlog will be set in the caller inet_listen() and
> dccp_listen_start(), so it is redundant to set it in
> inet_csk_listen_start().
> Just remove this setting.
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  net/ipv4/inet_connection_sock.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index dfd5009..cdd5c95 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -871,7 +871,6 @@ int inet_csk_listen_start(struct sock *sk, int backlog)
>
>         reqsk_queue_alloc(&icsk->icsk_accept_queue);
>
> -       sk->sk_max_ack_backlog = backlog;
>         sk->sk_ack_backlog = 0;
>         inet_csk_delack_init(sk);


You got it wrong again. Can you read my feedback one more time?

This setting is not redundant, unless you move the ones in
inet_listen() and inet_dccp_listen() earlier.


* Re: [PATCH net-next] tcp: forbid direct reclaim if MSG_DONTWAIT is set in send path
  2018-10-09 12:05 ` [PATCH net-next] tcp: forbid direct reclaim if MSG_DONTWAIT is set in send path Yafang Shao
@ 2018-10-09 14:12   ` Eric Dumazet
  2018-10-09 14:52     ` Yafang Shao
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2018-10-09 14:12 UTC (permalink / raw)
  To: Yafang Shao; +Cc: David Miller, netdev, LKML

On Tue, Oct 9, 2018 at 5:05 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> By default, sk->sk_allocation is GFP_KERNEL, which means that if there is
> not enough memory, the allocation will do both direct reclaim and
> background reclaim. If the system has a large amount of memory, direct
> reclaim may cause a large latency spike.
>
> When we set MSG_DONTWAIT in send syscalls, we really don't want the call to
> block, so we'd better clear __GFP_DIRECT_RECLAIM when allocating the skb in
> the send path. Then the call returns immediately if there is not enough
> memory to allocate, and the application has a chance to do other work
> instead of being blocked here.
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  net/ipv4/tcp.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 43ef83b..fe4f5ce 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1182,6 +1182,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>         bool process_backlog = false;
>         bool zc = false;
>         long timeo;
> +       gfp_t gfp;
>
>         flags = msg->msg_flags;
>
> @@ -1255,6 +1256,9 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>         /* Ok commence sending. */
>         copied = 0;
>
> +       gfp = flags & MSG_DONTWAIT ? sk->sk_allocation & ~__GFP_DIRECT_RECLAIM :
> +             sk->sk_allocation;
> +
>  restart:
>         mss_now = tcp_send_mss(sk, &size_goal, flags);
>
> @@ -1283,8 +1287,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>                         }
>                         first_skb = tcp_rtx_and_write_queues_empty(sk);
>                         linear = select_size(first_skb, zc);
> -                       skb = sk_stream_alloc_skb(sk, linear, sk->sk_allocation,
> -                                                 first_skb);
> +                       skb = sk_stream_alloc_skb(sk, linear, gfp, first_skb);
>                         if (!skb)
>                                 goto wait_for_memory;


How exactly have you tested this patch?

Most TCP payload is added in page fragments, and you have not changed
the page fragment allocations.

Also, I do not see how an application will get a future notification
that it can retry the failed system call.
How are you really going to deal with this in high performance applications?

I would rather prefer a setsockopt() socket option that could flip
__GFP_DIRECT_RECLAIM in sk->sk_allocation, so that we do not add all these
tests in the fast path, but honestly I do not see how applications can
really make use of this.
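
The flag manipulation under discussion can be modelled in user space. This
is only a sketch: the MODEL_GFP_* bit values and the model_send_gfp() helper
are placeholders invented for illustration, not the kernel's definitions.
It relies on GFP_KERNEL including both the direct reclaim and the background
(kswapd) reclaim bits, and mirrors the expression the patch adds to
tcp_sendmsg_locked().

```c
#include <sys/socket.h>	/* MSG_DONTWAIT */

/*
 * Placeholder bit values for illustration only. The kernel's real
 * __GFP_* values differ and are not reproduced here.
 */
#define MODEL_GFP_DIRECT_RECLAIM	0x1u
#define MODEL_GFP_KSWAPD_RECLAIM	0x2u
#define MODEL_GFP_IO			0x4u
#define MODEL_GFP_FS			0x8u
#define MODEL_GFP_KERNEL	(MODEL_GFP_DIRECT_RECLAIM | \
				 MODEL_GFP_KSWAPD_RECLAIM | \
				 MODEL_GFP_IO | MODEL_GFP_FS)

/*
 * Hypothetical helper mirroring the patch's expression: clear the direct
 * reclaim bit only when the caller passed MSG_DONTWAIT, and leave the
 * socket's allocation mask untouched otherwise.
 */
unsigned int model_send_gfp(unsigned int sk_allocation, int msg_flags)
{
	if (msg_flags & MSG_DONTWAIT)
		return sk_allocation & ~MODEL_GFP_DIRECT_RECLAIM;
	return sk_allocation;
}
```

In this model only the direct reclaim bit is dropped; background reclaim
stays enabled. As noted above, the mask computed this way reaches only the
skb allocation, not the page fragment allocations.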


* Re: [PATCH net-next] tcp: forbid direct reclaim if MSG_DONTWAIT is set in send path
  2018-10-09 14:12   ` Eric Dumazet
@ 2018-10-09 14:52     ` Yafang Shao
  2018-10-09 14:58       ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Yafang Shao @ 2018-10-09 14:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, LKML

On Tue, Oct 9, 2018 at 10:12 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Oct 9, 2018 at 5:05 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > By default, sk->sk_allocation is GFP_KERNEL, which means that if there is
> > not enough memory, the allocation will do both direct reclaim and
> > background reclaim. If the system has a large amount of memory, direct
> > reclaim may cause a large latency spike.
> >
> > When we set MSG_DONTWAIT in send syscalls, we really don't want the call to
> > block, so we'd better clear __GFP_DIRECT_RECLAIM when allocating the skb in
> > the send path. Then the call returns immediately if there is not enough
> > memory to allocate, and the application has a chance to do other work
> > instead of being blocked here.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  net/ipv4/tcp.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 43ef83b..fe4f5ce 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -1182,6 +1182,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >         bool process_backlog = false;
> >         bool zc = false;
> >         long timeo;
> > +       gfp_t gfp;
> >
> >         flags = msg->msg_flags;
> >
> > @@ -1255,6 +1256,9 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >         /* Ok commence sending. */
> >         copied = 0;
> >
> > +       gfp = flags & MSG_DONTWAIT ? sk->sk_allocation & ~__GFP_DIRECT_RECLAIM :
> > +             sk->sk_allocation;
> > +
> >  restart:
> >         mss_now = tcp_send_mss(sk, &size_goal, flags);
> >
> > @@ -1283,8 +1287,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >                         }
> >                         first_skb = tcp_rtx_and_write_queues_empty(sk);
> >                         linear = select_size(first_skb, zc);
> > -                       skb = sk_stream_alloc_skb(sk, linear, sk->sk_allocation,
> > -                                                 first_skb);
> > +                       skb = sk_stream_alloc_skb(sk, linear, gfp, first_skb);
> >                         if (!skb)
> >                                 goto wait_for_memory;
>
>
> How exactly have you tested this patch?
>
We recently saw network latency (hundreds of msecs, or even one second)
in our production environment.
I eventually diagnosed that this latency was caused by direct reclaim
in tcp_sendmsg.
That issue could be resolved by keeping some memory in reserve.
But I wondered: why not forbid direct reclaim if we set MSG_DONTWAIT?
So I made this change and tested it. The application got an errno
returned instead of being blocked in the send path.
That's why I submitted this patch.

> Most TCP payload is added in page fragments, and you have not changed
> the page fragment allocations.
>
> Also, I do not see how an application will get a future notification
> that it can retry the failed system call.
> How are you really going to deal with this in high performance applications?
>

I think that returning immediately with an errno is better than being blocked.
Maybe this solution is not good enough.
At least it could tell the application that something is wrong and that it
can't send now.

> I would rather prefer a setsockopt() socket option that could flip
> __GFP_DIRECT_RECLAIM in sk->sk_allocation, so that we do not add all these
> tests in the fast path, but honestly I do not see how applications can
> really make use of this.

Maybe an event is needed to tell the application it can send now.
I don't have a better idea either.

Thanks
Yafang


* Re: [PATCH net-next] tcp: forbid direct reclaim if MSG_DONTWAIT is set in send path
  2018-10-09 14:52     ` Yafang Shao
@ 2018-10-09 14:58       ` Eric Dumazet
  2018-10-09 15:38         ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2018-10-09 14:58 UTC (permalink / raw)
  To: Yafang Shao; +Cc: David Miller, netdev, LKML

> >
> We recently saw network latency (hundreds of msecs, or even one second)
> in our production environment.
> I eventually diagnosed that this latency was caused by direct reclaim
> in tcp_sendmsg.
> That issue could be resolved by keeping some memory in reserve.
> But I wondered: why not forbid direct reclaim if we set MSG_DONTWAIT?
> So I made this change and tested it. The application got an errno
> returned instead of being blocked in the send path.
> That's why I submitted this patch.

Sure, and I asked you how you have tested it, because it seems clear
to me that you missed the real memory allocation point (we fill up to
64 KB of page fragment memory into one (small) skb).

And I do wonder as well how the application is going to use MSG_DONTWAIT
in the real world.

We do not add bloat in the kernel if no application is ever going to
use it, especially in the TCP fast path.

Give us a test, so that we can see how this can be used...

Thanks.


* Re: [PATCH net-next] tcp: forbid direct reclaim if MSG_DONTWAIT is set in send path
  2018-10-09 14:58       ` Eric Dumazet
@ 2018-10-09 15:38         ` Eric Dumazet
  2018-10-10  1:30           ` Yafang Shao
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2018-10-09 15:38 UTC (permalink / raw)
  To: Yafang Shao; +Cc: David Miller, netdev, LKML

On Tue, Oct 9, 2018 at 7:58 AM Eric Dumazet <edumazet@google.com> wrote:
>

> We do not add bloat in the kernel if no application is ever going to
> use it, especially in the TCP fast path.
>

BTW, are you willing to change all memory allocations in the kernel as well?

Let's say an application is using a system call that takes a pathname
(open(), stat(), ...): how is this system call going to ask the kernel
for no direct reclaim?

Even allocating a socket with socket() or accept() has no ability to
avoid direct reclaim.

So tcp_sendmsg() is only the tip of the iceberg.


* Re: [PATCH net-next] tcp: forbid direct reclaim if MSG_DONTWAIT is set in send path
  2018-10-09 15:38         ` Eric Dumazet
@ 2018-10-10  1:30           ` Yafang Shao
  2018-10-10  1:44             ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Yafang Shao @ 2018-10-10  1:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, LKML

On Tue, Oct 9, 2018 at 11:38 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Oct 9, 2018 at 7:58 AM Eric Dumazet <edumazet@google.com> wrote:
> >
>
> > We do not add bloat in the kernel if no application is ever going to
> > use it, especially in the TCP fast path.
> >
>
> BTW, are you willing to change all memory allocations in the kernel as well?
>
> Let's say an application is using a system call that takes a pathname
> (open(), stat(), ...): how is this system call going to ask the kernel
> for no direct reclaim?
>
> Even allocating a socket with socket() or accept() has no ability to
> avoid direct reclaim.
>
> So tcp_sendmsg() is only the tip of the iceberg.

If we can really find a solution that is good enough to handle direct
reclaim in tcp_sendmsg,
we could also implement it in other syscalls.
Unexpected latency is hateful.

Thanks
Yafang


* Re: [PATCH net-next] tcp: forbid direct reclaim if MSG_DONTWAIT is set in send path
  2018-10-10  1:30           ` Yafang Shao
@ 2018-10-10  1:44             ` Eric Dumazet
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2018-10-10  1:44 UTC (permalink / raw)
  To: Yafang Shao, Eric Dumazet; +Cc: David Miller, netdev, LKML



On 10/09/2018 06:30 PM, Yafang Shao wrote:
> On Tue, Oct 9, 2018 at 11:38 PM Eric Dumazet <edumazet@google.com> wrote:
>>
>> On Tue, Oct 9, 2018 at 7:58 AM Eric Dumazet <edumazet@google.com> wrote:
>>>
>>
>>> We do not add bloat in the kernel if no application is ever going to
>>> use it, especially in the TCP fast path.
>>>
>>
>> BTW, are you willing to change all memory allocations in the kernel as well?
>>
>> Let's say an application is using a system call that takes a pathname
>> (open(), stat(), ...): how is this system call going to ask the kernel
>> for no direct reclaim?
>>
>> Even allocating a socket with socket() or accept() has no ability to
>> avoid direct reclaim.
>>
>> So tcp_sendmsg() is only the tip of the iceberg.
> 
> If we can really find a solution that is good enough to handle direct
> reclaim in tcp_sendmsg,
> we could also implement it in other syscalls.
> Unexpected latency is hateful.

There are thousands of other places in the kernel. I want to find a generic
solution, not patch all the places one by one.

So come back when you have something more generic, and once applications
have a way to handle these memory allocation issues gracefully (without
calling sendmsg() in an infinite loop ...).

How is EPOLLOUT going to be generated?
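
For context, the usual user-space pattern for MSG_DONTWAIT looks roughly
like the sketch below: when send() fails with EAGAIN, the application
sleeps in poll() until the socket reports POLLOUT and then retries. The
helper name send_all_nonblocking() and its timeout parameter are
hypothetical, chosen for illustration. The pattern depends on the kernel
generating a writability event once sending becomes possible again, which
is the notification being asked about for failures caused by memory
pressure.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Send len bytes from buf without blocking inside send(). On EAGAIN the
 * caller sleeps in poll() until the socket is writable, then retries.
 * Returns 0 once everything is queued, -1 on error or when a single
 * poll() call times out after timeout_ms milliseconds.
 */
int send_all_nonblocking(int fd, const char *buf, size_t len, int timeout_ms)
{
	size_t off = 0;

	while (off < len) {
		struct pollfd pfd = { .fd = fd, .events = POLLOUT };
		ssize_t n;
		int ready;

		n = send(fd, buf + off, len - off,
			 MSG_DONTWAIT | MSG_NOSIGNAL);
		if (n >= 0) {
			off += (size_t)n;
			continue;
		}
		if (errno == EINTR)
			continue;
		if (errno != EAGAIN && errno != EWOULDBLOCK)
			return -1;

		/* Nothing could be queued: wait for a writability event. */
		ready = poll(&pfd, 1, timeout_ms);
		if (ready < 0 && errno == EINTR)
			continue;
		if (ready <= 0)
			return -1;
	}
	return 0;
}
```

Without a writability event to wait for, the only remaining option would be
to call send() again in a loop, which is the busy retry the message above
warns against.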

