All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin KaFai Lau <kafai@fb.com>
To: Julian Anastasov <ja@ssi.bg>
Cc: <netdev@vger.kernel.org>, David Ahern <dsahern@gmail.com>,
	Tom Herbert <tom@herbertland.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Nikita Shirokov <tehnerd@fb.com>, <kernel-team@fb.com>,
	<lvs-devel@vger.kernel.org>
Subject: Re: [PATCH RFC net-next] net: ipvs: Adjust gso_size for IPPROTO_TCP
Date: Thu, 3 May 2018 00:01:25 -0700	[thread overview]
Message-ID: <20180503070114.bcuusvzga45klccs@kafai-mbp> (raw)
In-Reply-To: <alpine.LFD.2.20.1805022143360.3301@ja.home.ssi.bg>

On Wed, May 02, 2018 at 10:30:32PM +0300, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Wed, 2 May 2018, Martin KaFai Lau wrote:
> 
> > On Wed, May 02, 2018 at 09:38:43AM +0300, Julian Anastasov wrote:
> > > 
> > > - initial traffic for port 21 does not use GSO. But after
> > > every packet IPVS calls maybe_update_pmtu (rt->dst.ops->update_pmtu)
> > > to report the reduced MTU. These updates are stored in fnhe_pmtu
> > > but they do not go to any route, even if we try to get fresh
> > > output route. Why? Because the local routes are not cached, so
> > > they can not use the fnhe. This is what my patch for route.c
> > > will fix. With this fix FTP-DATA gets route with reduced PMTU.
> > For IPv6, the 'if (rt6->rt6i_flags & RTF_LOCAL)' gate in
> > __ip6_rt_update_pmtu() may need to be lifted also.
> 
> 	Probably. I completely forgot the IPv6 part
> but as I don't know the IPv6 code enough, it may take
> some time to understand what can be the problem there...
> I'm not sure whether everything started with commit 0a6b2a1dc2a2,
> so that in some configurations before that commit things
> worked and problem was not noticed.
> 
> 	I think, we should focus on such direction for IPv6:
> 
> - do we remember per-VIP PMTU for the local routes
IPv6 used not to create cache route for DST_HOST route which
is a /128 route (that includes local /128 route).

Because of this, it had a bug such that a PMTU for the DST_HOST
route will trigger dst.ops->update_pmtu() which then set
an expire on the permanent /128 route instead of a cache
route.  The permanent route got unexpectedly expired/removed
later.

The fix was to allow creating /128 cache route as long as
it is not RTF_LOCAL in 653437d02f1f and 7035870d1219.  The
first post spelled out the problem better:
https://patchwork.ozlabs.org/patch/456050/

Later, when we only create cache route after seeing PMTU
in 45e4fd26683c, this RTF_LOCAL checking was carried over
to __ip6_rt_update_pmtu().

Out of my head, I don't see issue removing the
RTF_LOCAL check from __ip6_rt_update_pmtu().
DavidA, what do you think?

> 
> - when exactly we start to use the new PMTU, eg. what happens
> in case socket caches the route, whether route is killed via
> dst->obsolete. Or may be while the PMTU expiration is handled
> per-packet, the PMTU change is noticed only on ICMP...
Before sk can reuse its dst cache, the sk will notice
its dst cache is no longer valid by calling dst_check().
dst_check() should return NULL which is one of the side
effect of the earlier update_pmtu().  This dst_check()
is usually only called when the sk needs to do output,
so the new PMTU route (i.e. the RTF_CACHE IPv6 route)
only have effect to the later packets.

> 
> - as IPVS reports the PMTU via dst.ops->update_pmtu() long
> before any large packets are sent, do we propagate the
> PMTU. Also, for IPv4 __ip_rt_update_pmtu() has some protection
> from such per-packet updates that do not change the PMTU.
> 
> - if IPVS starts to send ICMP when gso_size exceeds PMTU,
> like in my draft patch, whether the PMTU is propagated
> to route and then to socket. As for the gso_size decrease,
> playing in IPVS is not very safe, at least, we need help
> from GSO experts to know how we should use it.
> 
> Regards
> 
> --
> Julian Anastasov <ja@ssi.bg>

WARNING: multiple messages have this Message-ID (diff)
From: Martin KaFai Lau <kafai@fb.com>
To: Julian Anastasov <ja@ssi.bg>
Cc: netdev@vger.kernel.org, David Ahern <dsahern@gmail.com>,
	Tom Herbert <tom@herbertland.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Nikita Shirokov <tehnerd@fb.com>,
	kernel-team@fb.com, lvs-devel@vger.kernel.org
Subject: Re: [PATCH RFC net-next] net: ipvs: Adjust gso_size for IPPROTO_TCP
Date: Thu, 3 May 2018 00:01:25 -0700	[thread overview]
Message-ID: <20180503070114.bcuusvzga45klccs@kafai-mbp> (raw)
In-Reply-To: <alpine.LFD.2.20.1805022143360.3301@ja.home.ssi.bg>

On Wed, May 02, 2018 at 10:30:32PM +0300, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Wed, 2 May 2018, Martin KaFai Lau wrote:
> 
> > On Wed, May 02, 2018 at 09:38:43AM +0300, Julian Anastasov wrote:
> > > 
> > > - initial traffic for port 21 does not use GSO. But after
> > > every packet IPVS calls maybe_update_pmtu (rt->dst.ops->update_pmtu)
> > > to report the reduced MTU. These updates are stored in fnhe_pmtu
> > > but they do not go to any route, even if we try to get fresh
> > > output route. Why? Because the local routes are not cached, so
> > > they can not use the fnhe. This is what my patch for route.c
> > > will fix. With this fix FTP-DATA gets route with reduced PMTU.
> > For IPv6, the 'if (rt6->rt6i_flags & RTF_LOCAL)' gate in
> > __ip6_rt_update_pmtu() may need to be lifted also.
> 
> 	Probably. I completely forgot the IPv6 part
> but as I don't know the IPv6 code enough, it may take
> some time to understand what can be the problem there...
> I'm not sure whether everything started with commit 0a6b2a1dc2a2,
> so that in some configurations before that commit things
> worked and problem was not noticed.
> 
> 	I think, we should focus on such direction for IPv6:
> 
> - do we remember per-VIP PMTU for the local routes
IPv6 used not to create cache route for DST_HOST route which
is a /128 route (that includes local /128 route).

Because of this, it had a bug such that a PMTU for the DST_HOST
route will trigger dst.ops->update_pmtu() which then set
an expire on the permanent /128 route instead of a cache
route.  The permanent route got unexpectedly expired/removed
later.

The fix was to allow creating /128 cache route as long as
it is not RTF_LOCAL in 653437d02f1f and 7035870d1219.  The
first post spelled out the problem better:
https://patchwork.ozlabs.org/patch/456050/

Later, when we only create cache route after seeing PMTU
in 45e4fd26683c, this RTF_LOCAL checking was carried over
to __ip6_rt_update_pmtu().

Out of my head, I don't see issue removing the
RTF_LOCAL check from __ip6_rt_update_pmtu().
DavidA, what do you think?

> 
> - when exactly we start to use the new PMTU, eg. what happens
> in case socket caches the route, whether route is killed via
> dst->obsolete. Or may be while the PMTU expiration is handled
> per-packet, the PMTU change is noticed only on ICMP...
Before sk can reuse its dst cache, the sk will notice
its dst cache is no longer valid by calling dst_check().
dst_check() should return NULL which is one of the side
effect of the earlier update_pmtu().  This dst_check()
is usually only called when the sk needs to do output,
so the new PMTU route (i.e. the RTF_CACHE IPv6 route)
only have effect to the later packets.

> 
> - as IPVS reports the PMTU via dst.ops->update_pmtu() long
> before any large packets are sent, do we propagate the
> PMTU. Also, for IPv4 __ip_rt_update_pmtu() has some protection
> from such per-packet updates that do not change the PMTU.
> 
> - if IPVS starts to send ICMP when gso_size exceeds PMTU,
> like in my draft patch, whether the PMTU is propagated
> to route and then to socket. As for the gso_size decrease,
> playing in IPVS is not very safe, at least, we need help
> from GSO experts to know how we should use it.
> 
> Regards
> 
> --
> Julian Anastasov <ja@ssi.bg>

  reply	other threads:[~2018-05-03  7:01 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-19 21:23 [PATCH RFC net-next] net: ipvs: Adjust gso_size for IPPROTO_TCP Martin KaFai Lau
2018-04-20 20:43 ` Julian Anastasov
2018-04-20 23:13   ` Martin KaFai Lau
2018-05-02  6:38 ` Julian Anastasov
2018-05-02 17:10   ` Martin KaFai Lau
2018-05-02 17:10     ` Martin KaFai Lau
2018-05-02 19:30     ` Julian Anastasov
2018-05-03  7:01       ` Martin KaFai Lau [this message]
2018-05-03  7:01         ` Martin KaFai Lau
2018-05-04  3:50         ` David Ahern
2018-05-05 12:58         ` Julian Anastasov
2018-05-07 16:22           ` Martin KaFai Lau
2018-05-07 16:22             ` Martin KaFai Lau
2018-05-07 17:00             ` Julian Anastasov
2018-06-05 21:31           ` Martin KaFai Lau
2018-06-05 21:31             ` Martin KaFai Lau
2018-06-05 22:35             ` Julian Anastasov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180503070114.bcuusvzga45klccs@kafai-mbp \
    --to=kafai@fb.com \
    --cc=dsahern@gmail.com \
    --cc=eric.dumazet@gmail.com \
    --cc=ja@ssi.bg \
    --cc=kernel-team@fb.com \
    --cc=lvs-devel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=tehnerd@fb.com \
    --cc=tom@herbertland.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.