* TCP_REPAIR MSS issue
From: Eggert, Lars @ 2016-06-10 11:22 UTC
To: netdev, criu

Hi,

I see an issue with TCP_REPAIR on kernel 4.6.0, where a migrated connection is only sending minimum-sized segments (~500 bytes), although the interfaces and path support Ethernet-sized MTUs. A connection that doesn't use TCP_REPAIR over the same path uses full-sized segments.

Known issue? Any workaround?

Thanks,
Lars

* Re: [CRIU] TCP_REPAIR MSS issue
From: Pavel Emelyanov @ 2016-06-14 11:28 UTC
To: Eggert, Lars; +Cc: netdev, criu, Andrey Vagin

On 06/10/2016 02:22 PM, Eggert, Lars wrote:
> Hi,
>
> I see an issue with TCP_REPAIR on kernel 4.6.0, where a migrated connection is only sending minimum-sized
> segments (~500 bytes), although the interfaces and path support Ethernet-sized MTUs. A connection that
> doesn't use TCP_REPAIR over the same path uses full-sized segments.
>
> Known issue? Any workaround?

Andrey (in Cc) has played with TCP_REPAIR recently, I guess he can know something.

> Thanks,
> Lars

* Re: [CRIU] TCP_REPAIR MSS issue
From: Eggert, Lars @ 2016-06-14 11:40 UTC
To: Pavel Emelyanov; +Cc: netdev, criu, Andrey Vagin, Honda, Michio

On 2016-06-14, at 13:28, Pavel Emelyanov <xemul@virtuozzo.com> wrote:
> Andrey (in Cc) has played with TCP_REPAIR recently, I guess he can know something.

Thanks for CC'ing him. We looked a little bit more into this:

When TCP_REPAIR is on, tcp_connect() directly calls tcp_finish_connect() before returning, passing NULL for skb, which causes sk_rx_dst_set() to be bypassed. Later, when TCP_REPAIR is being turned off, do_tcp_setsockopt() just does tcp_send_window_probe(), but apparently all the "dst" stuff is being bypassed then also, so the mss remains at TCP_MSS_DEFAULT.

Lars

* Re: [CRIU] TCP_REPAIR MSS issue
From: Eric Dumazet @ 2016-06-14 13:03 UTC
To: Eggert, Lars; +Cc: Pavel Emelyanov, netdev, criu, Andrey Vagin, Honda, Michio

On Tue, 2016-06-14 at 11:40 +0000, Eggert, Lars wrote:
> On 2016-06-14, at 13:28, Pavel Emelyanov <xemul@virtuozzo.com> wrote:
> > Andrey (in Cc) has played with TCP_REPAIR recently, I guess he can know something.
>
> Thanks for CC'ing him. We looked a little bit more into this:
>
> When TCP_REPAIR is on, tcp_connect() directly calls tcp_finish_connect() before returning, passing NULL for skb, which causes sk_rx_dst_set() to be bypassed. Later, when TCP_REPAIR is being turned off, do_tcp_setsockopt() just does tcp_send_window_probe(), but apparently all the "dst" stuff is being bypassed then also, so the mss remains at TCP_MSS_DEFAULT.
>
> Lars

What gives:

sysctl net/ipv4/ip_no_pmtu_disc net/ipv4/tcp_mtu_probing

* Re: [CRIU] TCP_REPAIR MSS issue
From: Eric Dumazet @ 2016-06-14 13:10 UTC
To: Eggert, Lars; +Cc: Pavel Emelyanov, netdev, criu, Andrey Vagin, Honda, Michio

On Tue, 2016-06-14 at 06:03 -0700, Eric Dumazet wrote:
> On Tue, 2016-06-14 at 11:40 +0000, Eggert, Lars wrote:
> > On 2016-06-14, at 13:28, Pavel Emelyanov <xemul@virtuozzo.com> wrote:
> > > Andrey (in Cc) has played with TCP_REPAIR recently, I guess he can know something.
> >
> > Thanks for CC'ing him. We looked a little bit more into this:
> >
> > When TCP_REPAIR is on, tcp_connect() directly calls tcp_finish_connect() before returning, passing NULL for skb, which causes sk_rx_dst_set() to be bypassed. Later, when TCP_REPAIR is being turned off, do_tcp_setsockopt() just does tcp_send_window_probe(), but apparently all the "dst" stuff is being bypassed then also, so the mss remains at TCP_MSS_DEFAULT.
> >
> > Lars
>
> What gives:
>
> sysctl net/ipv4/ip_no_pmtu_disc net/ipv4/tcp_mtu_probing

Also, is it a regression? Was this working better with an older Linux version?

* Re: [CRIU] TCP_REPAIR MSS issue
From: Eggert, Lars @ 2016-06-14 13:24 UTC
To: Eric Dumazet; +Cc: Pavel Emelyanov, netdev, criu, Andrey Vagin, Honda, Michio

On 2016-06-14, at 15:10, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> What gives:
>> sysctl net/ipv4/ip_no_pmtu_disc net/ipv4/tcp_mtu_probing

net.ipv4.ip_no_pmtu_disc = 0
net.ipv4.tcp_mtu_probing = 0

> Also, is it a regression? Was this working better with an older Linux
> version?

No idea, 4.6.0 is the only kernel I ran this on (it is a new project). What would be a good older version to check?

Thanks,
Lars

* Re: [CRIU] TCP_REPAIR MSS issue
From: Eggert, Lars @ 2016-06-14 14:54 UTC
To: Eric Dumazet; +Cc: Pavel Emelyanov, netdev, criu, Andrey Vagin, Honda, Michio

On 2016-06-14, at 15:24, Eggert, Lars <lars@netapp.com> wrote:
> On 2016-06-14, at 15:10, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> Also, is it a regression? Was this working better with an older Linux
>> version?
>
> No idea, 4.6.0 is the only kernel I ran this on (it is a new project). What would be a good older version to check?

FWIW, I had 4.2.0 available on this machine, and it has the same problem.

Lars

* Re: [CRIU] TCP_REPAIR MSS issue
From: Andrey Vagin @ 2016-06-14 17:15 UTC
To: Eggert, Lars; +Cc: Pavel Emelyanov, netdev, criu, Honda, Michio

Hi,

Recently we found that we have to restore more parameters for tcp sockets.
https://patchwork.kernel.org/patch/9144995/

As for your problem, criu saves and restores mss_clamp. Could you check that it works for your case?

on dump:

static int tcp_stream_get_options(int sk, struct tcp_info *ti, TcpStreamEntry *tse)
{
	int ret;
	socklen_t auxl;
	int val;

	auxl = sizeof(tse->mss_clamp);
	ret = getsockopt(sk, SOL_TCP, TCP_MAXSEG, &tse->mss_clamp, &auxl);
	if (ret < 0)
		goto err_sopt;

on restore:

	pr_debug("Will set mss clamp to %u\n", tse->mss_clamp);
	opts[onr].opt_code = TCPOPT_MAXSEG;
	opts[onr].opt_val = tse->mss_clamp;
	onr++;

	if (setsockopt(sk, SOL_TCP, TCP_REPAIR_OPTIONS,
			opts, onr * sizeof(struct tcp_repair_opt)) < 0) {
		pr_perror("Can't repair options");
		return -1;
	}

Thanks,
Andrew

On Tue, Jun 14, 2016 at 11:40:01AM +0000, Eggert, Lars wrote:
> On 2016-06-14, at 13:28, Pavel Emelyanov <xemul@virtuozzo.com> wrote:
> > Andrey (in Cc) has played with TCP_REPAIR recently, I guess he can know something.
>
> Thanks for CC'ing him. We looked a little bit more into this:
>
> When TCP_REPAIR is on, tcp_connect() directly calls tcp_finish_connect() before returning, passing NULL for skb, which causes sk_rx_dst_set() to be bypassed. Later, when TCP_REPAIR is being turned off, do_tcp_setsockopt() just does tcp_send_window_probe(), but apparently all the "dst" stuff is being bypassed then also, so the mss remains at TCP_MSS_DEFAULT.
>
> Lars

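The dump-side call in the CRIU excerpt above can be tried on any TCP socket to see which MSS the kernel currently reports. The following is only a minimal sketch: read_tcp_maxseg() is a hypothetical helper, not a CRIU function, and it uses IPPROTO_TCP where the excerpt uses the Linux-specific SOL_TCP alias.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not part of CRIU): return the MSS that the kernel
 * reports for TCP socket sk via TCP_MAXSEG, or -1 if getsockopt() fails.
 */
static int read_tcp_maxseg(int sk)
{
	int mss = 0;
	socklen_t len = sizeof(mss);

	assert(sk >= 0);	/* caller must pass an open socket */

	if (getsockopt(sk, IPPROTO_TCP, TCP_MAXSEG, &mss, &len) < 0)
		return -1;

	return mss;
}
```

Calling this helper on the restored socket before and after TCP_REPAIR is switched off would show whether the value ever moves away from the default.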
* Re: [CRIU] TCP_REPAIR MSS issue
From: Eggert, Lars @ 2016-06-14 19:37 UTC
To: Andrey Vagin; +Cc: Pavel Emelyanov, netdev, criu, Honda, Michio

Hi,

On 2016-06-14, at 19:15, Andrey Vagin <avagin@virtuozzo.com> wrote:
> Recently we found that we have to restore more parameters for tcp
> sockets.
> https://patchwork.kernel.org/patch/9144995/

Thanks for the pointer.

> As for your problem, criu saves and restores mss_clamp. Could you check
> that it works for your case?

I do this already, but clamping doesn't help here, since it only limits the MSS (but does not increase it from the minimum).

Lars

* Re: [CRIU] TCP_REPAIR MSS issue
From: Andrey Vagin @ 2016-06-14 21:21 UTC
To: Eggert, Lars; +Cc: Pavel Emelyanov, netdev, criu, Honda, Michio

On Tue, Jun 14, 2016 at 07:37:12PM +0000, Eggert, Lars wrote:
> Hi,
>
> On 2016-06-14, at 19:15, Andrey Vagin <avagin@virtuozzo.com> wrote:
> > Recently we found that we have to restore more parameters for tcp
> > sockets.
> > https://patchwork.kernel.org/patch/9144995/
>
> thanks for the pointer.
>
> > As for your problem, criu saves and restores mss_clamp. Could you check
> > that it works for your case?
>
> I do this already, but clamping doesn't help here, since it only limits the MSS (but does not increase it from the minimum.)

Yes, you are right.

On my host, I see that dst is set in tcp_v4_connect() -> sk_setup_caps()

>
> Lars

* Re: [CRIU] TCP_REPAIR MSS issue
From: Eggert, Lars @ 2016-06-16 7:51 UTC
To: Andrey Vagin; +Cc: Pavel Emelyanov, netdev, criu, Honda, Michio

Hi,

On 2016-06-14, at 23:21, Andrey Vagin <avagin@virtuozzo.com> wrote:
> On my host, I see that dst is set in tcp_v4_connect() -> sk_setup_caps()

Sorry, are you saying that you don't see the issue with TCP_MSS_DEFAULT-sized segments after TCP_REPAIR on your kernel? Or are you saying my quick attempt at analyzing the cause was wrong?

Thanks,
Lars

* Re: [CRIU] TCP_REPAIR MSS issue
From: Andrey Vagin @ 2016-06-16 21:09 UTC
To: Eggert, Lars; +Cc: Pavel Emelyanov, netdev, criu, Honda, Michio

On Thu, Jun 16, 2016 at 07:51:22AM +0000, Eggert, Lars wrote:
> Hi,
>
> On 2016-06-14, at 23:21, Andrey Vagin <avagin@virtuozzo.com> wrote:
> > On my host, I see that dst is set in tcp_v4_connect() -> sk_setup_caps()
>
> sorry, are you saying that you don't see the issue with TCP_MSS_DEFAULT-sized segments after TCP_REPAIR on your kernel? Or are you saying my quick attempt at analyzing the cause was wrong?

I can't reproduce this issue, now I'm trying to understand why it works for me and doesn't work for you.

I've read your version of the cause:

> When TCP_REPAIR is on, tcp_connect() directly calls tcp_finish_connect() before
> returning, passing NULL for skb, which causes sk_rx_dst_set() to be bypassed.
> Later, when TCP_REPAIR is being turned off, do_tcp_setsockopt() just does
> tcp_send_window_probe(), but apparently all the "dst" stuff is being bypassed
> then also, so the mss remains at TCP_MSS_DEFAULT.

I found where dst is set for a socket when a tcp connection is restored. Then I added a debug message into tcp_sync_mss and found that mss is initialized to TCP_MSS_DEFAULT, but then it's updated after unlocking the network. So the question is why mss isn't updated in your case.

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 95c0b50..b0d323f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1367,6 +1367,13 @@ unsigned int tcp_sync_mss(struct sock *sk, u32 pmtu)
 	icsk->icsk_pmtu_cookie = pmtu;
 	if (icsk->icsk_mtup.enabled)
 		mss_now = min(mss_now, tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low));
+
+	static struct tcp_sock *tp_s = NULL;
+	if (tp->repair || tp == tp_s) {
+		printk("%s:%d: pmtu = %d mss = %d (%d)\n", __func__, __LINE__, pmtu, mss_now, tp->mss_cache);
+		tp_s = tp;
+		dump_stack();
+	}
 	tp->mss_cache = mss_now;
 
 	return mss_now;

[   86.095286] tcp_sync_mss:1372: pmtu = 1500 mss = 524 (536)
[   86.095292] CPU: 0 PID: 12474 Comm: criu ve: 101 Not tainted 3.10.0-327.18.2.ovz.14.14-00004-g4ba9241-dirty #9 14.14
[   86.095294] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014
[   86.095297]  ffff8804094ec400 00000000b1bcc4c2 ffff880427b9bcf8 ffffffff8164c988
[   86.095301]  ffff880427b9bd18 ffffffff815a4aca ffff8804275c0780 ffff8804094ec400
[   86.095303]  ffff880427b9bd98 ffffffff815a70c8 ffffffff815911f0 ffffffff81a43500
[   86.095307] Call Trace:
[   86.095315]  [<ffffffff8164c988>] dump_stack+0x19/0x1b
[   86.095320]  [<ffffffff815a4aca>] tcp_sync_mss+0x19a/0x1a0
[   86.095323]  [<ffffffff815a70c8>] tcp_connect+0x98/0x9d0
[   86.095327]  [<ffffffff815911f0>] ? inet_unhash+0xc0/0xc0
[   86.095333]  [<ffffffff81543e0b>] ? secure_ipv4_port_ephemeral+0x5b/0x80
[   86.095337]  [<ffffffff815ac4da>] tcp_v4_connect+0x2da/0x4d0
[   86.095342]  [<ffffffff811af5f9>] ? __do_fault+0x589/0x670
[   86.095347]  [<ffffffff815c376d>] __inet_stream_connect+0xbd/0x330
[   86.095351]  [<ffffffff811b4db1>] ? handle_mm_fault+0x521/0x920
[   86.095354]  [<ffffffff815c3a18>] inet_stream_connect+0x38/0x50
[   86.095358]  [<ffffffff815314a3>] SYSC_connect+0x73/0xf0
[   86.095363]  [<ffffffff81657d63>] ? trace_do_page_fault+0x43/0x110
[   86.095366]  [<ffffffff81657389>] ? do_async_page_fault+0x29/0xe0
[   86.095369]  [<ffffffff81531c8e>] SyS_connect+0xe/0x10
[   86.095373]  [<ffffffff8165c749>] system_call_fastpath+0x16/0x1b
[   91.813519] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   91.814600] device veth51e6d765 entered promiscuous mode
[   91.814654] br0: port 2(veth51e6d765) entered forwarding state
[   91.814661] br0: port 2(veth51e6d765) entered forwarding state
[  106.853351] br0: port 2(veth51e6d765) entered forwarding state
[  116.224891] tcp_sync_mss:1372: pmtu = 1500 mss = 1448 (524)
[  116.224929] CPU: 1 PID: 0 Comm: swapper/1 ve: 0 Not tainted 3.10.0-327.18.2.ovz.14.14-00004-g4ba9241-dirty #9 14.14
[  116.224935] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014
[  116.224941]  000000002cd1a562 9139c0c08fdb34c5 ffff88043fc83a88 ffffffff8164c988
[  116.224948]  ffff88043fc83aa8 ffffffff815a4aca ffff8804275c0780 0000000000004100
[  116.224954]  ffff88043fc83b48 ffffffff8159fa24 ffff88043fc83be8 ffffffffa0289299
[  116.224960] Call Trace:
[  116.224965]  <IRQ>  [<ffffffff8164c988>] dump_stack+0x19/0x1b
[  116.224980]  [<ffffffff815a4aca>] tcp_sync_mss+0x19a/0x1a0
[  116.224986]  [<ffffffff8159fa24>] tcp_ack+0x394/0x11a0
[  116.225005]  [<ffffffffa0289299>] ? ipt_do_table+0x339/0x700 [ip_tables]
[  116.225014]  [<ffffffffa0289299>] ? ipt_do_table+0x339/0x700 [ip_tables]
[  116.225024]  [<ffffffff815a23d6>] tcp_rcv_established+0x1c6/0x740
[  116.225031]  [<ffffffff815ad6fa>] tcp_v4_do_rcv+0x10a/0x3b0
[  116.225039]  [<ffffffff815914f7>] ? __inet_lookup_established+0x47/0x140
[  116.225045]  [<ffffffff815aec03>] tcp_v4_rcv+0x823/0xa90
[  116.225051]  [<ffffffff815873b6>] ip_local_deliver_finish+0xe6/0x220
[  116.225060]  [<ffffffff81587695>] ip_local_deliver+0x55/0xd0
[  116.225066]  [<ffffffff815872d0>] ? ip_rcv_finish+0x350/0x350
[  116.225071]  [<ffffffff81586ffd>] ip_rcv_finish+0x7d/0x350
[  116.225077]  [<ffffffff815879cc>] ip_rcv+0x2bc/0x3e0

>
> Thanks,
> Lars

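The two values printed by the debug patch (524 during connect(), 1448 after the first ACK) can be reproduced with simple header arithmetic. The sketch below is a simplified model, not the kernel's tcp_sync_mss(). The function name effective_mss() is hypothetical, and two things are assumptions that fit the numbers but are not stated in the thread: that 12 bytes of TCP options (timestamps) are carried on every segment, and that the clamp was 536 at connect time and at least 1460 later.

```c
#include <assert.h>

enum {
	IPV4_HDR_LEN   = 20,	/* IPv4 header without IP options */
	TCP_HDR_LEN    = 20,	/* TCP header without TCP options */
	TCP_TS_OPT_LEN = 12,	/* padded timestamp option (assumed to be in use) */
};

/*
 * Hypothetical model: payload bytes per segment for a given path MTU,
 * MSS clamp and per-segment TCP option length.
 */
static int effective_mss(int pmtu, int mss_clamp, int tcp_opt_len)
{
	int mss;

	assert(pmtu > IPV4_HDR_LEN + TCP_HDR_LEN);
	assert(tcp_opt_len >= 0);

	mss = pmtu - IPV4_HDR_LEN - TCP_HDR_LEN;
	if (mss > mss_clamp)
		mss = mss_clamp;	/* a clamp can only lower the value */

	return mss - tcp_opt_len;
}
```

Under those assumptions, effective_mss(1500, 536, TCP_TS_OPT_LEN) yields 524 and effective_mss(1500, 1460, TCP_TS_OPT_LEN) yields 1448, matching the two log lines. The model also shows why restoring mss_clamp alone cannot help if the path-derived value is never recomputed.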
* Re: [CRIU] TCP_REPAIR MSS issue
From: Eggert, Lars @ 2016-07-18 9:06 UTC
To: Andrey Vagin; +Cc: Pavel Emelyanov, netdev, criu, Honda, Michio

Hi,

On 2016-06-16, at 23:09, Andrey Vagin <avagin@virtuozzo.com> wrote:
> I can't reproduce this issue, now I'm trying to understand why it works
> for me and doesn't work for you.

Just to conclude this thread for the list: Andrey and I debugged this off-list. The issue arose because my code did a bind() to 0.0.0.0 in TCP_REPAIR mode. When turning off TCP_REPAIR and sending into the socket, this caused minimum-MSS-sized segments to be transmitted. The issue goes away when I bind() to the local IP address of my local egress interface.

To me at least, this is a POLA violation (excuse the FreeBSD terminology :-) Either binding to 0.0.0.0 should fail, or it should succeed and full-sized segments should be sent. But at least I have a work-around now.

Thanks,
Lars

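The workaround needs the local address of the egress interface, and the thread does not say how that address was obtained. One common way to find it, sketched here as an assumption, is to connect() a UDP socket to the peer and read the chosen source address back with getsockname(). No packet is sent by this. egress_addr_for() is a hypothetical helper name.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: store in *local the IPv4 source address the kernel
 * would use to reach dst_ip. Returns 0 on success, -1 on failure.
 */
static int egress_addr_for(const char *dst_ip, struct sockaddr_in *local)
{
	struct sockaddr_in dst;
	socklen_t len = sizeof(*local);
	int ret = -1;
	int sk;

	assert(dst_ip != NULL && local != NULL);

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(9);	/* arbitrary port; nothing is transmitted */
	if (inet_pton(AF_INET, dst_ip, &dst.sin_addr) != 1)
		return -1;

	sk = socket(AF_INET, SOCK_DGRAM, 0);
	if (sk < 0)
		return -1;

	/* connect() on a UDP socket only performs the route lookup */
	if (connect(sk, (struct sockaddr *)&dst, sizeof(dst)) == 0 &&
	    getsockname(sk, (struct sockaddr *)local, &len) == 0) {
		local->sin_port = 0;	/* caller fills in the port to restore */
		ret = 0;
	}

	close(sk);
	return ret;
}
```

The returned address, with the saved source port filled in, could then be passed to bind() on the socket in TCP_REPAIR mode instead of 0.0.0.0.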
Thread overview: 13+ messages
2016-06-10 11:22 TCP_REPAIR MSS issue Eggert, Lars
2016-06-14 11:28 ` [CRIU] " Pavel Emelyanov
2016-06-14 11:40   ` Eggert, Lars
2016-06-14 13:03     ` Eric Dumazet
2016-06-14 13:10       ` Eric Dumazet
2016-06-14 13:24         ` Eggert, Lars
2016-06-14 14:54           ` Eggert, Lars
2016-06-14 17:15     ` Andrey Vagin
2016-06-14 19:37       ` Eggert, Lars
2016-06-14 21:21         ` Andrey Vagin
2016-06-16  7:51           ` Eggert, Lars
2016-06-16 21:09             ` Andrey Vagin
2016-07-18  9:06               ` Eggert, Lars