netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* TCP_DEFER_ACCEPT wakes up without data
@ 2014-06-05  0:14 Wayne Badger
  2014-06-07 15:41 ` Julian Anastasov
  0 siblings, 1 reply; 12+ messages in thread
From: Wayne Badger @ 2014-06-05  0:14 UTC (permalink / raw)
  To: netdev

I would like to revisit the current state of TCP_DEFER_ACCEPT.  I know
that it has been discussed in the past and the discussion about 4.5
years ago led to commit d1b99ba41d6c5aa1ed2fc634323449dd656899e9 (tcp:
accept socket after TCP_DEFER_ACCEPT period) being introduced which
changed the semantics of TCP_DEFER_ACCEPT.

Prior to commit d1b99ba41d, the open request was never queued to the
listener for accepting if no data was ever received.  After the commit,
the open request is queued to the listener upon receipt of an ACK from
the retransmitted SYN-ACK, which is sent TCP_DEFER_ACCEPT seconds
(which value is provided in the TCP_DEFER_ACCEPT setsockopt) after the
3-way handshake occurs (with appropriate rounding up to account for the
geometric rexmit algorithm).

The tcp(7) man page says that the listener is awakened _only_ when data
arrives on the socket.  We prefer this behavior because it reduces
memory (minisock vs. sock) and scheduling/context switch overhead.  When
no data is ever sent by the client, letting the kernel dispose of the
open request is more efficient than letting the listener dispose of it.
Is there a way to get this behavior with a recent kernel that I have not
discovered?

This discussion is not just academic.  We constantly receive dataless
connection requests from clients.  On some machines, we have seen these
types of connections comprise up to 20% of our connections.  With the
current TCP_DEFER_ACCEPT behavior, apache creates many threads that end
up just waiting for data, timing out, and then closing the connection.
We can increase the TCP_DEFER_ACCEPT value, but that means that we are
just consuming kernel resources for longer than we want and end up
having apache close the connection anyway.

A few questions.

If the new behavior was desired at the time of the commit, why was the
existing TCP_DEFER_ACCEPT behavior changed instead of just adding a new
sockopt that implements the new behavior?  Why was the tcp(7) man page
not modified to describe the new behavior?

I propose the following patch to regain the tcp(7) semantics.  The patch
resolves dataless connections in one of two ways.  After the initial
timeout, a SYN-ACK is retransmitted and if a bare ACK is received, then
a RST is sent to the client and the connection is immediately dropped.
If nothing is received from the client, then the connection is silently
dropped after another rexmit timeout (the next one in the geometric
sequence).

I understand that the patch is incomplete.  For example, it doesn't
handle syncookies but I wanted to initiate the discussion.

Thanks,
Wayne

--- tcp_minisocks.c     2014-05-01 15:53:47.000000000 -0500
+++ tcp_minisocks.c.new 2014-05-16 16:08:59.000000000 -0500
@@ -672,12 +672,31 @@ struct sock *tcp_check_req(struct sock *
        if (fastopen)
                return sk;
 
-       /* While TCP_DEFER_ACCEPT is active, drop bare ACK. */
-       if (req->num_timeout < inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
-           TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
-               inet_rsk(req)->acked = 1;
-               NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPDEFERACCEPTDROP);
-               return NULL;
+       if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept) {
+               /* While TCP_DEFER_ACCEPT is active, drop bare ACK.
+                * Send a RST and drop the connection if the timer has expired.
+                */
+               if (TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
+                       if (req->retrans < inet_csk(sk)->icsk_accept_queue.rskq_defer_accept) {
+                               inet_rsk(req)->acked = 1;
+                               NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPDEFERACCEPTDROP);
+                       } else {
+                               req->rsk_ops->send_reset(sk, skb);
+                               inet_csk_reqsk_queue_drop(sk, req, prev);
+                       }
+                       return NULL;
+               }
+
+               /* Make a finished and dataless TCP_DEFER_ACCEPT connection
+                * request "disappear" without instantiating the socket or
+                * bothering user space.
+                */
+               if ((TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 2) &&
+                   (tcp_flag_word(th) & TCP_FLAG_FIN)) {
+                       req->rsk_ops->send_reset(sk, skb);
+                       inet_csk_reqsk_queue_drop(sk, req, prev);
+                       return NULL;
+               }
        }
 
        /* OK, ACK is valid, create big socket and

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: TCP_DEFER_ACCEPT wakes up without data
  2014-06-05  0:14 TCP_DEFER_ACCEPT wakes up without data Wayne Badger
@ 2014-06-07 15:41 ` Julian Anastasov
  2014-06-11  0:57   ` Wayne Badger
  0 siblings, 1 reply; 12+ messages in thread
From: Julian Anastasov @ 2014-06-07 15:41 UTC (permalink / raw)
  To: Wayne Badger; +Cc: netdev


	Hello,

On Wed, 4 Jun 2014, Wayne Badger wrote:

> I would like to revisit the current state of TCP_DEFER_ACCEPT.  I know
> that it has been discussed in the past and the discussion about 4.5
> years ago led to commit d1b99ba41d6c5aa1ed2fc634323449dd656899e9 (tcp:
> accept socket after TCP_DEFER_ACCEPT period) being introduced which
> changed the semantics of TCP_DEFER_ACCEPT.

	This discussion (http://marc.info/?t=125541062900001&r=1&w=2)
has some hints about using TCP_SYNCNT.

> A few questions.
> 
> If the new behavior was desired at the time of the commit, why was the
> existing TCP_DEFER_ACCEPT behavior changed instead of just adding a new
> sockopt that implements the new behavior?  Why was the tcp(7) man page

	May be because with help from TCP_SYNCNT we
have more control.

> not modified to describe the new behavior?

> I propose the following patch to regain the tcp(7) semantics.  The patch
> resolves dataless connections in one of two ways.  After the initial
> timeout, a SYN-ACK is retransmitted and if a bare ACK is received, then
> a RST is sent to the client and the connection is immediately dropped.
> If nothing is received from the client, then the connection is silently
> dropped after another rexmit timeout (the next one in the geometric
> sequence).

	After receiving the first bare ACK we do not
retransmit SYN+ACKs, we send them after the period to
trigger a new ACK[+DATA] that will lead to wakeup. You
prefer if the next packet from client comes without DATA
to send RST instead of more SYN+ACKs?

	The best place would be to send this reset in 
inet_csk_reqsk_queue_prune() after the /* Drop this request */
comment if inet_rsk(req)->acked is set because we are not
sure if our SYN+ACKs after the period will lead to new packets
from client. But we have no skb to reply, not sure if
the open request contains data to build a reset.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: TCP_DEFER_ACCEPT wakes up without data
  2014-06-07 15:41 ` Julian Anastasov
@ 2014-06-11  0:57   ` Wayne Badger
  2014-06-12  6:00     ` Julian Anastasov
       [not found]     ` <58a4abb51fe9411fbc7b1a58a2a6f5da@UCL-MBX03.OASIS.UCLOUVAIN.BE>
  0 siblings, 2 replies; 12+ messages in thread
From: Wayne Badger @ 2014-06-11  0:57 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: netdev

On 6/7/14, 10:41 AM, Julian Anastasov wrote:
>     Hello,
>
> On Wed, 4 Jun 2014, Wayne Badger wrote:
>
>> I would like to revisit the current state of TCP_DEFER_ACCEPT.  I know
>> that it has been discussed in the past and the discussion about 4.5
>> years ago led to commit d1b99ba41d6c5aa1ed2fc634323449dd656899e9 (tcp:
>> accept socket after TCP_DEFER_ACCEPT period) being introduced which
>> changed the semantics of TCP_DEFER_ACCEPT.
>
>     This discussion (http://marc.info/?t=125541062900001&r=1&w=2)
> has some hints about using TCP_SYNCNT.

Thanks for the pointer.  I have read through this discussion, but I
can't see how it helps with the current implementation.  TCP_SYNCNT
(or sysctl.tcp_synack_retries if TCP_SYNCNT is unused) allows you to
set the number of retries, but inet_csk_reqsk_queue_prune essentially
ignores the TCP_SYNCNT value if TCP_DEFER_ACCEPT is in use.

        if (queue->rskq_defer_accept)
            max_retries = queue->rskq_defer_accept;

>> A few questions.
>>
>> If the new behavior was desired at the time of the commit, why was the
>> existing TCP_DEFER_ACCEPT behavior changed instead of just adding a new
>> sockopt that implements the new behavior?  Why was the tcp(7) man page
>
>     May be because with help from TCP_SYNCNT we
> have more control.

I have so far been unable to obtain the behavior documented for
TCP_DEFER_ACCEPT, even with various settings of TCP_SYNCNT.  No setting
of TCP_SYNCNT can make it be used in the calculations in favor of the
TCP_DEFER_ACCEPT value.  No matter which value I choose for TCP_SYNCNT,
the connection is always promoted to a full socket and moved to the
accept queue.

Would you verify whether a server ever accepts the socket if data is
not sent?  I've been using v3.14.0 with default sysctl settings and a
TCP_DEFER_ACCEPT value of 30 and don't see that behavior.

>> not modified to describe the new behavior?
>
>> I propose the following patch to regain the tcp(7) semantics.  The patch
>> resolves dataless connections in one of two ways.  After the initial
>> timeout, a SYN-ACK is retransmitted and if a bare ACK is received, then
>> a RST is sent to the client and the connection is immediately dropped.
>> If nothing is received from the client, then the connection is silently
>> dropped after another rexmit timeout (the next one in the geometric
>> sequence).
>
>     After receiving the first bare ACK we do not
> retransmit SYN+ACKs, we send them after the period to
> trigger a new ACK[+DATA] that will lead to wakeup. You
> prefer if the next packet from client comes without DATA
> to send RST instead of more SYN+ACKs?

The SYN-ACK is sent after the timeout period and I do see the
kernel waiting for the appropriate timeout period based on the
TCP_DEFER_ACCEPT value.  This action, by itself, does not promote the
minisock to a full socket and wake up user space.  The subsequent
receipt of the duplicate bare ACK is what causes the kernel to promote
the minisock to a full socket, insert the full socket on the accept
queue and wake up user space.

The behavior that we want is for the receipt of the duplicate bare
ACK to not result in waking up user space.  The socket still hasn't
received any data, so there's no point in the process accepting the
socket since there's nothing the process can do.

As I mentioned above, I can't get the kernel to send more SYN-ACKs
because the minisock is promoted and the user process wakes up to
handle the accept.

Eventually, these connections will need to be removed.  Since we have
a connection that is in an ESTABLISHED connection (we got a SYN,
responded with SYN-ACK, and received a valid ACK), we shouldn't just
discard the minisock.  We can send a RST from just a minisock (note
that tcp_check_req already sends RSTs) and we choose to do that so at
least let the client knows that we're done.  Promoting the minisock to a
full socket just to send a FIN is just extra work.

I would prefer that we send a RST upon receipt of a bare ACK for a
socket that has completed the 3way handshake, waited the
TCP_DEFER_ACCEPT timeout and has never received any
data.  If it has timed out, then the server should be done with the
connection request.

>     The best place would be to send this reset in
> inet_csk_reqsk_queue_prune() after the /* Drop this request */
> comment if inet_rsk(req)->acked is set because we are not
> sure if our SYN+ACKs after the period will lead to new packets
> from client. But we have no skb to reply, not sure if
> the open request contains data to build a reset.

We could drop the connection in inet_csk_reqsk_queue_prune, but we have
to ensure that the receipt of the bare ACK response from the duplicate
SYN-ACK doesn't promote the socket as it is doing now.  We'll also have
to do something for syncookies because a valid bare ACK received for
a socket that doesn't even have a minisock should behave similarly.
The code to handle this can't go in inet_csk_reqsk_queue_prune because
there is no socket of any type to prune so we'll at least need to have the
code in two places but we could easily commonize it in a function.

Was it the intent with commit d1b99ba41d to change the semantics
of TCP_DEFER_ACCEPT such that sockets were always promoted to full
sockets and moved to the accept queue?  I want to make sure that we
are all on the same page with what the semantics mean because if we
have a disagreement there, then nothing else matters.  I'm just trying
to get a socket to stay in the kernel and not wake up the listener before
there is any data.

> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>

Thanks,
Wayne

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: TCP_DEFER_ACCEPT wakes up without data
  2014-06-11  0:57   ` Wayne Badger
@ 2014-06-12  6:00     ` Julian Anastasov
       [not found]     ` <58a4abb51fe9411fbc7b1a58a2a6f5da@UCL-MBX03.OASIS.UCLOUVAIN.BE>
  1 sibling, 0 replies; 12+ messages in thread
From: Julian Anastasov @ 2014-06-12  6:00 UTC (permalink / raw)
  To: Wayne Badger; +Cc: netdev


	Hello,

On Tue, 10 Jun 2014, Wayne Badger wrote:

> On 6/7/14, 10:41 AM, Julian Anastasov wrote:
> >
> >     This discussion (http://marc.info/?t=125541062900001&r=1&w=2)
> > has some hints about using TCP_SYNCNT.
> 
> Thanks for the pointer.  I have read through this discussion, but I
> can't see how it helps with the current implementation.  TCP_SYNCNT
> (or sysctl.tcp_synack_retries if TCP_SYNCNT is unused) allows you to
> set the number of retries, but inet_csk_reqsk_queue_prune essentially
> ignores the TCP_SYNCNT value if TCP_DEFER_ACCEPT is in use.
> 
>         if (queue->rskq_defer_accept)
>             max_retries = queue->rskq_defer_accept;

	You are right, I missed that. So, we send just
one SYN+ACK when period is about to expire.

> I have so far been unable to obtain the behavior documented for
> TCP_DEFER_ACCEPT, even with various settings of TCP_SYNCNT.  No setting
> of TCP_SYNCNT can make it be used in the calculations in favor of the
> TCP_DEFER_ACCEPT value.  No matter which value I choose for TCP_SYNCNT,
> the connection is always promoted to a full socket and moved to the
> accept queue.
> 
> Would you verify whether a server ever accepts the socket if data is
> not sent?  I've been using v3.14.0 with default sysctl settings and a
> TCP_DEFER_ACCEPT value of 30 and don't see that behavior.

	syn_ack_recalc() schedules SYN+ACK, so under
normal conditions, it triggers ACK from client and
child is created, even without DATA. Request will
expire only when last SYN+ACK or the following ACK
is lost (or not sent).

> The behavior that we want is for the receipt of the duplicate bare
> ACK to not result in waking up user space.  The socket still hasn't
> received any data, so there's no point in the process accepting the
> socket since there's nothing the process can do.

	One problem with this behavior is that after first ACK
more ACKs are not expected. Your RST logic still relies on the
last SYN+ACK we sent to trigger additional ACK. I guess,
we can live with this because for firewalls it is not worse
than current behavior. We replace accept() with RST.

> I would prefer that we send a RST upon receipt of a bare ACK for a
> socket that has completed the 3way handshake, waited the
> TCP_DEFER_ACCEPT timeout and has never received any
> data.  If it has timed out, then the server should be done with the
> connection request.

	I'm ok with this idea.

> >     The best place would be to send this reset in
> > inet_csk_reqsk_queue_prune() after the /* Drop this request */
> > comment if inet_rsk(req)->acked is set because we are not
> > sure if our SYN+ACKs after the period will lead to new packets
> > from client. But we have no skb to reply, not sure if
> > the open request contains data to build a reset.
> 
> We could drop the connection in inet_csk_reqsk_queue_prune, but we have
> to ensure that the receipt of the bare ACK response from the duplicate
> SYN-ACK doesn't promote the socket as it is doing now.  We'll also have

	Agreed. I'm just not sure for the implementation
needed for inet_csk_reqsk_queue_prune. TCP experts
can help here.

> to do something for syncookies because a valid bare ACK received for
> a socket that doesn't even have a minisock should behave similarly.
> The code to handle this can't go in inet_csk_reqsk_queue_prune because
> there is no socket of any type to prune so we'll at least need to have the
> code in two places but we could easily commonize it in a function.

	I don't know the syn-cookie code. Even now
inet_csk_reqsk_queue_prune() expires acked requests,
isn't inet_csk_reqsk_queue_drop sufficient?

> Was it the intent with commit d1b99ba41d to change the semantics
> of TCP_DEFER_ACCEPT such that sockets were always promoted to full
> sockets and moved to the accept queue?  I want to make sure that we
> are all on the same page with what the semantics mean because if we
> have a disagreement there, then nothing else matters.  I'm just trying
> to get a socket to stay in the kernel and not wake up the listener before
> there is any data.

	IIRC, the main idea is to remove connection from
firewalls, the idea with triggered ACK was the easiest
solution.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: TCP_DEFER_ACCEPT wakes up without data
       [not found]     ` <58a4abb51fe9411fbc7b1a58a2a6f5da@UCL-MBX03.OASIS.UCLOUVAIN.BE>
@ 2020-06-04 23:18       ` Christoph Paasch
  2020-06-05  1:27         ` Eric Dumazet
  2020-06-05 19:47         ` Julian Anastasov
  0 siblings, 2 replies; 12+ messages in thread
From: Christoph Paasch @ 2020-06-04 23:18 UTC (permalink / raw)
  To: Julian Anastasov, Eric Dumazet; +Cc: Wayne Badger, netdev, Leif Hedstrom

+Eric & Leif

Hello,


(digging out an old thread ... ;-) )

On Wed, Jun 11, 2014 at 11:05 PM Julian Anastasov <ja@ssi.bg> wrote:
>
>
>         Hello,
>
> On Tue, 10 Jun 2014, Wayne Badger wrote:
>
> > On 6/7/14, 10:41 AM, Julian Anastasov wrote:
> > >
> > >     This discussion (http://marc.info/?t=125541062900001&r=1&w=2)
> > > has some hints about using TCP_SYNCNT.
> >
> > Thanks for the pointer.  I have read through this discussion, but I
> > can't see how it helps with the current implementation.  TCP_SYNCNT
> > (or sysctl.tcp_synack_retries if TCP_SYNCNT is unused) allows you to
> > set the number of retries, but inet_csk_reqsk_queue_prune essentially
> > ignores the TCP_SYNCNT value if TCP_DEFER_ACCEPT is in use.
> >
> >         if (queue->rskq_defer_accept)
> >             max_retries = queue->rskq_defer_accept;
>
>         You are right, I missed that. So, we send just
> one SYN+ACK when period is about to expire.
>
> > I have so far been unable to obtain the behavior documented for
> > TCP_DEFER_ACCEPT, even with various settings of TCP_SYNCNT.  No setting
> > of TCP_SYNCNT can make it be used in the calculations in favor of the
> > TCP_DEFER_ACCEPT value.  No matter which value I choose for TCP_SYNCNT,
> > the connection is always promoted to a full socket and moved to the
> > accept queue.
> >
> > Would you verify whether a server ever accepts the socket if data is
> > not sent?  I've been using v3.14.0 with default sysctl settings and a
> > TCP_DEFER_ACCEPT value of 30 and don't see that behavior.
>
>         syn_ack_recalc() schedules SYN+ACK, so under
> normal conditions, it triggers ACK from client and
> child is created, even without DATA. Request will
> expire only when last SYN+ACK or the following ACK
> is lost (or not sent).
>
> > The behavior that we want is for the receipt of the duplicate bare
> > ACK to not result in waking up user space.  The socket still hasn't
> > received any data, so there's no point in the process accepting the
> > socket since there's nothing the process can do.
>
>         One problem with this behavior is that after first ACK
> more ACKs are not expected. Your RST logic still relies on the
> last SYN+ACK we sent to trigger additional ACK. I guess,
> we can live with this because for firewalls it is not worse
> than current behavior. We replace accept() with RST.
>
> > I would prefer that we send a RST upon receipt of a bare ACK for a
> > socket that has completed the 3way handshake, waited the
> > TCP_DEFER_ACCEPT timeout and has never received any
> > data.  If it has timed out, then the server should be done with the
> > connection request.
>
>         I'm ok with this idea.

Is there any specific reason as to why we would not want to do this?

API-breakage does not seem to me to be a concern here. Apps that are
setting DEFER_ACCEPT probably would not expect to get a socket that
does not have data to read.


Any thoughts?


Thanks,
Christoph


>
> > >     The best place would be to send this reset in
> > > inet_csk_reqsk_queue_prune() after the /* Drop this request */
> > > comment if inet_rsk(req)->acked is set because we are not
> > > sure if our SYN+ACKs after the period will lead to new packets
> > > from client. But we have no skb to reply, not sure if
> > > the open request contains data to build a reset.
> >
> > We could drop the connection in inet_csk_reqsk_queue_prune, but we have
> > to ensure that the receipt of the bare ACK response from the duplicate
> > SYN-ACK doesn't promote the socket as it is doing now.  We'll also have
>
>         Agreed. I'm just not sure for the implementation
> needed for inet_csk_reqsk_queue_prune. TCP experts
> can help here.
>
> > to do something for syncookies because a valid bare ACK received for
> > a socket that doesn't even have a minisock should behave similarly.
> > The code to handle this can't go in inet_csk_reqsk_queue_prune because
> > there is no socket of any type to prune so we'll at least need to have the
> > code in two places but we could easily commonize it in a function.
>
>         I don't know the syn-cookie code. Even now
> inet_csk_reqsk_queue_prune() expires acked requests,
> isn't inet_csk_reqsk_queue_drop sufficient?
>
> > Was it the intent with commit d1b99ba41d to change the semantics
> > of TCP_DEFER_ACCEPT such that sockets were always promoted to full
> > sockets and moved to the accept queue?  I want to make sure that we
> > are all on the same page with what the semantics mean because if we
> > have a disagreement there, then nothing else matters.  I'm just trying
> > to get a socket to stay in the kernel and not wake up the listener before
> > there is any data.
>
>         IIRC, the main idea is to remove connection from
> firewalls, the idea with triggered ACK was the easiest
> solution.
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: TCP_DEFER_ACCEPT wakes up without data
  2020-06-04 23:18       ` Christoph Paasch
@ 2020-06-05  1:27         ` Eric Dumazet
  2020-06-05 14:57           ` Christoph Paasch
  2020-06-05 19:47         ` Julian Anastasov
  1 sibling, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2020-06-05  1:27 UTC (permalink / raw)
  To: Christoph Paasch, Julian Anastasov, Eric Dumazet
  Cc: Wayne Badger, netdev, Leif Hedstrom



On 6/4/20 4:18 PM, Christoph Paasch wrote:
> +Eric & Leif
> 
> Hello,
> 
> 
> (digging out an old thread ... ;-) )
>

Is there a tldr; ?


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: TCP_DEFER_ACCEPT wakes up without data
  2020-06-05  1:27         ` Eric Dumazet
@ 2020-06-05 14:57           ` Christoph Paasch
  2020-06-05 16:48             ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Paasch @ 2020-06-05 14:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Julian Anastasov, Wayne Badger, netdev, Leif Hedstrom

On Thu, Jun 4, 2020 at 6:28 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 6/4/20 4:18 PM, Christoph Paasch wrote:
> > +Eric & Leif
> >
> > Hello,
> >
> >
> > (digging out an old thread ... ;-) )
> >
>
> Is there a tldr; ?

Sure! TCP_DEFER_ACCEPT delays the creation of the socket until data
has been sent by the client *or* the specified time has expired upon
which a last SYN/ACK is sent and if the client replies with an ACK the
socket will be created and presented to the accept()-call. In the
latter case it means that the app gets a socket that does not have any
data to be read - which goes against the intention of TCP_DEFER_ACCEPT
(man-page says: "Allow a listener to be awakened only when data
arrives on the socket.").

In the original thread the proposal was to kill the connection with a
TCP-RST when the specified timeout expired (after the final SYN/ACK).

Thus, my question in my first email whether there is a specific reason
to not do this.

API-breakage does not seem to me to be a concern here. Apps that are
setting DEFER_ACCEPT probably would not expect to get a socket that
does not have data to read.


Thanks,
Christoph

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: TCP_DEFER_ACCEPT wakes up without data
  2020-06-05 14:57           ` Christoph Paasch
@ 2020-06-05 16:48             ` Eric Dumazet
  2020-06-07 10:00               ` Florian Westphal
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2020-06-05 16:48 UTC (permalink / raw)
  To: Christoph Paasch, Eric Dumazet
  Cc: Julian Anastasov, Wayne Badger, netdev, Leif Hedstrom



On 6/5/20 7:57 AM, Christoph Paasch wrote:
> On Thu, Jun 4, 2020 at 6:28 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>
>>
>> On 6/4/20 4:18 PM, Christoph Paasch wrote:
>>> +Eric & Leif
>>>
>>> Hello,
>>>
>>>
>>> (digging out an old thread ... ;-) )
>>>
>>
>> Is there a tldr; ?
> 
> Sure! TCP_DEFER_ACCEPT delays the creation of the socket until data
> has been sent by the client *or* the specified time has expired upon
> which a last SYN/ACK is sent and if the client replies with an ACK the
> socket will be created and presented to the accept()-call. In the
> latter case it means that the app gets a socket that does not have any
> data to be read - which goes against the intention of TCP_DEFER_ACCEPT
> (man-page says: "Allow a listener to be awakened only when data
> arrives on the socket.").
> 
> In the original thread the proposal was to kill the connection with a
> TCP-RST when the specified timeout expired (after the final SYN/ACK).
> 
> Thus, my question in my first email whether there is a specific reason
> to not do this.
> 
> API-breakage does not seem to me to be a concern here. Apps that are
> setting DEFER_ACCEPT probably would not expect to get a socket that
> does not have data to read.

Thanks for the summary ;)

I disagree.

A server might have two modes :

1) A fast path, expecting data from user in a small amount of time, from peers not too far away.

2) A slow path, for clients far away. Server can implement strategies to control number of sockets
that have been accepted() but not yet active (no data yet received).

I have attended many conferences with bad wifi networks to pretend that 3WHS + headers can always
be completed in X seconds.

> 
> 
> Thanks,
> Christoph
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: TCP_DEFER_ACCEPT wakes up without data
  2020-06-04 23:18       ` Christoph Paasch
  2020-06-05  1:27         ` Eric Dumazet
@ 2020-06-05 19:47         ` Julian Anastasov
  1 sibling, 0 replies; 12+ messages in thread
From: Julian Anastasov @ 2020-06-05 19:47 UTC (permalink / raw)
  To: Christoph Paasch; +Cc: Eric Dumazet, Wayne Badger, netdev, Leif Hedstrom


	Hello,

On Thu, 4 Jun 2020, Christoph Paasch wrote:

> On Wed, Jun 11, 2014 at 11:05 PM Julian Anastasov <ja@ssi.bg> wrote:
> >
> >
> > > The behavior that we want is for the receipt of the duplicate bare
> > > ACK to not result in waking up user space.  The socket still hasn't
> > > received any data, so there's no point in the process accepting the
> > > socket since there's nothing the process can do.
> >
> >         One problem with this behavior is that after first ACK
> > more ACKs are not expected. Your RST logic still relies on the
> > last SYN+ACK we sent to trigger additional ACK. I guess,
> > we can live with this because for firewalls it is not worse
> > than current behavior. We replace accept() with RST.
> >
> > > I would prefer that we send a RST upon receipt of a bare ACK for a
> > > socket that has completed the 3way handshake, waited the
> > > TCP_DEFER_ACCEPT timeout and has never received any
> > > data.  If it has timed out, then the server should be done with the
> > > connection request.
> >
> >         I'm ok with this idea.
> 
> Is there any specific reason as to why we would not want to do this?
> 
> API-breakage does not seem to me to be a concern here. Apps that are
> setting DEFER_ACCEPT probably would not expect to get a socket that
> does not have data to read.

	There are older threads that explain why we create sockets
on bare ACK:

https://marc.info/?t=125541062900001&r=1&w=2

	Starting message:

https://marc.info/?l=linux-netdev&m=125541058019459&w=2

	In short, it is better for firewalls when we
do not drop silently the request socket. For now we
do not have a way to specify that we do not want
child sockets without DATA.

TL;DR

Here is how TCP_DEFER_ACCEPT works for server, with
pseudo-states.

TCP_DEFER_ACCEPT for server uses:
- num_timeout: increments when it is time to send SYN+ACK
- num_retrans: increments when SYN+ACK is sent

- TCP_DEFER_ACCEPT (seconds) is converted to retransmission
periods. The possible time for SYN+ACK retransmissions is
calculated as the maximum of SYNCNT/tcp_synack_retries and
the TCP_DEFER_ACCEPT periods, see reqsk_timer_handler().

- when bare ACK is received we avoid resending SYN+ACKs, we
start to resend just when the last TCP_DEFER_ACCEPT period
is started, see syn_ack_recalc().

Pseudo States for SYN_RECV:

WAIT_AND_RESEND:
	- when:
		- SYN is received
	- [re]send SYN+ACK depending on SYNCNT (at least for
	TCP_DEFER_ACCEPT time, if set), the retransmission
	period is max-of(TCP_DEFER_ACCEPT periods, SYNCNT)
	- on received DATA => create socket
	- on received bare ACK:
		- if using TCP_DEFER_ACCEPT => enter WAIT_SILENTLY
		- if not using TCP_DEFER_ACCEPT => create socket

WAIT_SILENTLY:
	- when:
		- using TCP_DEFER_ACCEPT
		- bare ACK received (indicates that client received
		our SYN+ACK, so no need to resend anymore)
	- do not send SYN+ACK
	- on received DATA => create socket
	- on received bare ACK: drop it (do not resend on packet)
	- when the last TCP_DEFER_ACCEPT period starts enter
	REFRESH_STATE

REFRESH_STATE:
	- when:
		- last TCP_DEFER_ACCEPT period starts after bare ACK
	- resend depending on SYNCNT to update the connection state
	in client and stateful firewalls. Such packets will trigger
	DATA (hopefully), ACK, FIN, RST, etc. When SYNCNT is less
	than the TCP_DEFER_ACCEPT period, we will send single
	SYN+ACK, otherwise more SYN+ACKs will be sent (up to SYNCNT).
	- on received DATA => create socket
	- on received bare ACK: create socket
	- no more resends (SYNCNT) => Drop req.

Programming hints for servers:

1. TCP_DEFER_ACCEPT controlled with TCP_SYNCNT
	- set TCP_DEFER_ACCEPT (defer period) to 1 (used as flag=Enabled)
	- tune the possible retransmission period with TCP_SYNCNT
	- PRO:
		- more control on the sent packets
	- CON:
		- time depends on TCP_TIMEOUT_INIT value
		- server is not silent after bare ACK because
		defer period is small

2. Use just seconds in TCP_DEFER_ACCEPT
	- PRO:
		- defer period is specified in seconds
		- no SYN+ACKs are sent in defer period after bare ACK
		- can send more SYN+ACKs when TCP_SYNCNT is greater,
		to avoid losses

- In both cases be ready to accept sockets without data
after the defer period

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: TCP_DEFER_ACCEPT wakes up without data
  2020-06-05 16:48             ` Eric Dumazet
@ 2020-06-07 10:00               ` Florian Westphal
  2020-06-07 15:05                 ` Leif Hedstrom
  2020-06-09 16:45                 ` Christoph Paasch
  0 siblings, 2 replies; 12+ messages in thread
From: Florian Westphal @ 2020-06-07 10:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Christoph Paasch, Julian Anastasov, Wayne Badger, netdev, Leif Hedstrom

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Sure! TCP_DEFER_ACCEPT delays the creation of the socket until data
> > has been sent by the client *or* the specified time has expired upon
> > which a last SYN/ACK is sent and if the client replies with an ACK the
> > socket will be created and presented to the accept()-call. In the
> > latter case it means that the app gets a socket that does not have any
> > data to be read - which goes against the intention of TCP_DEFER_ACCEPT
> > (man-page says: "Allow a listener to be awakened only when data
> > arrives on the socket.").
> > 
> > In the original thread the proposal was to kill the connection with a
> > TCP-RST when the specified timeout expired (after the final SYN/ACK).
> > 
> > Thus, my question in my first email whether there is a specific reason
> > to not do this.
> > 
> > API-breakage does not seem to me to be a concern here. Apps that are
> > setting DEFER_ACCEPT probably would not expect to get a socket that
> > does not have data to read.
> 
> Thanks for the summary ;)
> 
> I disagree.
> 
> A server might have two modes :
> 
> 1) A fast path, expecting data from user in a small amount of time, from peers not too far away.
> 
> 2) A slow path, for clients far away. Server can implement strategies to control number of sockets
> that have been accepted() but not yet active (no data yet received).

So we can't change DEFER_ACCEPT behaviour.
Any objections to adding TCP_DEFER_ACCEPT2 with the behaviour outlined
by Christoph?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: TCP_DEFER_ACCEPT wakes up without data
  2020-06-07 10:00               ` Florian Westphal
@ 2020-06-07 15:05                 ` Leif Hedstrom
  2020-06-09 16:45                 ` Christoph Paasch
  1 sibling, 0 replies; 12+ messages in thread
From: Leif Hedstrom @ 2020-06-07 15:05 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Eric Dumazet, Christoph Paasch, Julian Anastasov, Wayne Badger, netdev



> On Jun 7, 2020, at 04:01, Florian Westphal <fw@strlen.de> wrote:
> 
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> Sure! TCP_DEFER_ACCEPT delays the creation of the socket until data
>>> has been sent by the client *or* the specified time has expired upon
>>> which a last SYN/ACK is sent and if the client replies with an ACK the
>>> socket will be created and presented to the accept()-call. In the
>>> latter case it means that the app gets a socket that does not have any
>>> data to be read - which goes against the intention of TCP_DEFER_ACCEPT
>>> (man-page says: "Allow a listener to be awakened only when data
>>> arrives on the socket.").
>>> 
>>> In the original thread the proposal was to kill the connection with a
>>> TCP-RST when the specified timeout expired (after the final SYN/ACK).
>>> 
>>> Thus, my question in my first email whether there is a specific reason
>>> to not do this.
>>> 
>>> API-breakage does not seem to me to be a concern here. Apps that are
>>> setting DEFER_ACCEPT probably would not expect to get a socket that
>>> does not have data to read.
>> 
>> Thanks for the summary ;)
>> 
>> I disagree.
>> 
>> A server might have two modes :
>> 
>> 1) A fast path, expecting data from user in a small amount of time, from peers not too far away.
>> 
>> 2) A slow path, for clients far away. Server can implement strategies to control number of sockets
>> that have been accepted() but not yet active (no data yet received).
> 
> So we can't change DEFER_ACCEPT behaviour.
> Any objections to adding TCP_DEFER_ACCEPT2 with the behaviour outlined
> by Christoph?


I think that would be useful, although ideally a better, more descriptive name ?

Cheers,

— Leif 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: TCP_DEFER_ACCEPT wakes up without data
  2020-06-07 10:00               ` Florian Westphal
  2020-06-07 15:05                 ` Leif Hedstrom
@ 2020-06-09 16:45                 ` Christoph Paasch
  1 sibling, 0 replies; 12+ messages in thread
From: Christoph Paasch @ 2020-06-09 16:45 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Eric Dumazet, Julian Anastasov, Wayne Badger, netdev, Leif Hedstrom

Hello,

On Sun, Jun 7, 2020 at 3:00 AM Florian Westphal <fw@strlen.de> wrote:
>
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > Sure! TCP_DEFER_ACCEPT delays the creation of the socket until data
> > > has been sent by the client *or* the specified time has expired upon
> > > which a last SYN/ACK is sent and if the client replies with an ACK the
> > > socket will be created and presented to the accept()-call. In the
> > > latter case it means that the app gets a socket that does not have any
> > > data to be read - which goes against the intention of TCP_DEFER_ACCEPT
> > > (man-page says: "Allow a listener to be awakened only when data
> > > arrives on the socket.").
> > >
> > > In the original thread the proposal was to kill the connection with a
> > > TCP-RST when the specified timeout expired (after the final SYN/ACK).
> > >
> > > Thus, my question in my first email whether there is a specific reason
> > > to not do this.
> > >
> > > API-breakage does not seem to me to be a concern here. Apps that are
> > > setting DEFER_ACCEPT probably would not expect to get a socket that
> > > does not have data to read.
> >
> > Thanks for the summary ;)
> >
> > I disagree.
> >
> > A server might have two modes :
> >
> > 1) A fast path, expecting data from user in a small amount of time, from peers not too far away.
> >
> > 2) A slow path, for clients far away. Server can implement strategies to control number of sockets
> > that have been accepted() but not yet active (no data yet received).

to add to that: There are indeed scenarios where TCP-SYN/... without
payload go through fine but as soon as the packet-size increases
WiFi/Cell has problems because of smaller grants given by the
AP/tower. But even those connections should be able to get the data
through within a "reasonable" timeframe. Anything beyond that
timeframe will anyways have such a bad user-experience that it is
pointless to continue.

So, a use-case here would be where the user is in such a slow network
and a TCP-split proxy is deployed (such proxies are very common in
cellular networks). That proxy will be ACKing the server's SYN/ACK
retransmission at the end of the defer-accept period, while the client
is still trying very hard to get the data through to the proxy (or
even, the client might have gone totally out-of-service).

For those kinds of scenarios it would make sense to have a different
DEFER_ACCEPT-behavior (maybe with a separate socket-option as Florian
suggested).


Christoph

>
> So we can't change DEFER_ACCEPT behaviour.
> Any objections to adding TCP_DEFER_ACCEPT2 with the behaviour outlined
> by Christoph?

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-06-09 16:45 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-05  0:14 TCP_DEFER_ACCEPT wakes up without data Wayne Badger
2014-06-07 15:41 ` Julian Anastasov
2014-06-11  0:57   ` Wayne Badger
2014-06-12  6:00     ` Julian Anastasov
     [not found]     ` <58a4abb51fe9411fbc7b1a58a2a6f5da@UCL-MBX03.OASIS.UCLOUVAIN.BE>
2020-06-04 23:18       ` Christoph Paasch
2020-06-05  1:27         ` Eric Dumazet
2020-06-05 14:57           ` Christoph Paasch
2020-06-05 16:48             ` Eric Dumazet
2020-06-07 10:00               ` Florian Westphal
2020-06-07 15:05                 ` Leif Hedstrom
2020-06-09 16:45                 ` Christoph Paasch
2020-06-05 19:47         ` Julian Anastasov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).