* [PATCH] netfilter/ipvs: expire no destination UDP connections when expire_nodest_conn=1
From: Andrew Sy Kim @ 2020-05-15 1:35 UTC
Cc: kim.andrewsy, Wensong Zhang, Simon Horman, Julian Anastasov, Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, David S. Miller, Jakub Kicinski, open list:IPVS, open list:IPVS, open list:NETFILTER, open list:NETFILTER, open list

When expire_nodest_conn=1 and an IPVS destination is deleted, IPVS
doesn't expire connections with the IP_VS_CONN_F_ONE_PACKET flag set (any
UDP connection). If there are many UDP packets to a virtual server from a
single client and a destination is deleted, many packets are silently
dropped whenever an existing connection entry with the same source port
exists. This patch ensures IPVS also expires UDP connections when a
packet matches an existing connection with no destinations.

Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
---
 net/netfilter/ipvs/ip_vs_core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index aa6a603a2425..f0535586fe75 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -2116,8 +2116,7 @@ ip_vs_in(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, int
 		else
 			ip_vs_conn_put(cp);
 
-		if (sysctl_expire_nodest_conn(ipvs) &&
-		    !(flags & IP_VS_CONN_F_ONE_PACKET)) {
+		if (sysctl_expire_nodest_conn(ipvs)) {
 			/* try to expire the connection immediately */
 			ip_vs_conn_expire_now(cp);
 		}
-- 
2.20.1
* Re: [PATCH] netfilter/ipvs: expire no destination UDP connections when expire_nodest_conn=1
From: Julian Anastasov @ 2020-05-15 18:07 UTC
To: Andrew Sy Kim
Cc: Wensong Zhang, Simon Horman, Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, David S. Miller, Jakub Kicinski, open list:IPVS, open list:IPVS, open list:NETFILTER, open list:NETFILTER, open list

	Hello,

On Thu, 14 May 2020, Andrew Sy Kim wrote:

> When expire_nodest_conn=1 and an IPVS destination is deleted, IPVS
> doesn't expire connections with the IP_VS_CONN_F_ONE_PACKET flag set (any
> UDP connection). If there are many UDP packets to a virtual server from a
> single client and a destination is deleted, many packets are silently
> dropped whenever an existing connection entry with the same source port
> exists. This patch ensures IPVS also expires UDP connections when a
> packet matches an existing connection with no destinations.
>
> Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
> ---
>  net/netfilter/ipvs/ip_vs_core.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
> index aa6a603a2425..f0535586fe75 100644
> --- a/net/netfilter/ipvs/ip_vs_core.c
> +++ b/net/netfilter/ipvs/ip_vs_core.c
> @@ -2116,8 +2116,7 @@ ip_vs_in(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, int
>  		else
>  			ip_vs_conn_put(cp);

	The ip_vs_conn_put() above should free the ONE_PACKET
connections because:

- such connections never start a timer; they are designed
to exist just to schedule the packet, then they are released.
- no one takes extra references

	So, ip_vs_conn_put() simply calls ip_vs_conn_expire(),
where connections should be released immediately. As a result,
we cannot access cp after this point. That is why we work
just with 'flags' below...

	Note that not every UDP connection has the ONE_PACKET
flag; it is present only if you configure it for the service.
Do you have the -o/--ops flag? If not, the UDP connection
should expire before the next jiffy. That is the theory;
in practice, you may observe some problem...

> -		if (sysctl_expire_nodest_conn(ipvs) &&
> -		    !(flags & IP_VS_CONN_F_ONE_PACKET)) {
> +		if (sysctl_expire_nodest_conn(ipvs)) {
>  			/* try to expire the connection immediately */
>  			ip_vs_conn_expire_now(cp);
>  		}

	You can also look at the discussion which resulted in
the last patch for this place:

http://archive.linuxvirtualserver.org/html/lvs-devel/2018-07/msg00014.html

Regards

--
Julian Anastasov <ja@ssi.bg>
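The ordering constraint Julian describes (copy the flags, drop the reference, and never touch cp again for one-packet entries) can be modelled in plain userspace C. This is only a sketch under stated assumptions: `struct conn`, `conn_new()`, `conn_put()`, `handle_nodest()` and `CONN_F_ONE_PACKET` are hypothetical stand-ins, not the kernel's `struct ip_vs_conn`, `ip_vs_conn_put()` or the block in `ip_vs_in()`, and the real reference counting and timer handling are more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define CONN_F_ONE_PACKET 0x1u	/* hypothetical stand-in for IP_VS_CONN_F_ONE_PACKET */

/* Hypothetical, heavily simplified connection entry. */
struct conn {
	unsigned int flags;
	int refcnt;
	bool expire_requested;
};

static struct conn *conn_new(unsigned int flags)
{
	struct conn *cp = calloc(1, sizeof(*cp));

	assert(cp);
	cp->flags = flags;
	cp->refcnt = 1;		/* reference held by the packet path */
	return cp;
}

/* Drop the caller's reference. In this model a one-packet entry has no
 * timer and no other holder, so the last put frees it. Ordinary entries
 * stay alive (owned by the table and its timer). Returns true if freed.
 */
static bool conn_put(struct conn *cp)
{
	if (--cp->refcnt == 0 && (cp->flags & CONN_F_ONE_PACKET)) {
		free(cp);
		return true;
	}
	return false;
}

/* Models the "destination not available" branch: flags are copied
 * before the put because cp may no longer exist afterwards.
 * Returns true if early expiry was requested on a still-live entry.
 */
static bool handle_nodest(struct conn *cp, bool expire_nodest_conn)
{
	unsigned int flags = cp->flags;	/* copy BEFORE dropping the ref */
	bool freed = conn_put(cp);

	if (expire_nodest_conn && !(flags & CONN_F_ONE_PACKET)) {
		assert(!freed);		/* only one-packet entries are freed here */
		cp->expire_requested = true;
		return true;
	}
	(void)freed;
	return false;			/* cp must not be dereferenced here */
}
```

In this model, removing the `!(flags & CONN_F_ONE_PACKET)` test (as the first patch does) would make `handle_nodest()` write to an entry that `conn_put()` has already freed.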
* Re: [PATCH] netfilter/ipvs: expire no destination UDP connections when expire_nodest_conn=1
From: Andrew Kim @ 2020-05-17 17:27 UTC
To: Julian Anastasov
Cc: Wensong Zhang, Simon Horman, Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, David S. Miller, Jakub Kicinski, open list:IPVS, open list:IPVS, open list:NETFILTER, open list:NETFILTER, open list

Hi Julian,

Thanks for getting back to me, that makes sense.

Would you be opposed to trying to expire all UDP connections matching
a deleted destination only if expire_nodest_conn=1?
Even today with `expire_nodest_conn=1`, many packets could be dropped
if there are many requests from a single client
trying to reuse client ports matching a deleted destination. Setting
`expire_nodest_conn=1` and reducing the UDP timeout
helps but deleting all connections when the destination is deleted
seems more efficient.

Looking forward to hearing your thoughts,

Andrew Sy Kim
* Re: [PATCH] netfilter/ipvs: expire no destination UDP connections when expire_nodest_conn=1
From: Andrew Kim @ 2020-05-17 17:30 UTC
To: Julian Anastasov
Cc: Wensong Zhang, Simon Horman, Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, David S. Miller, Jakub Kicinski, open list:IPVS, open list:IPVS, open list:NETFILTER, open list:NETFILTER, open list

I sent a new patch diff based on my ask above. Please take a look :)

Thanks,

Andrew Sy Kim
* [PATCH] netfilter/ipvs: immediately expire UDP connections matching unavailable destination if expire_nodest_conn=1
From: Andrew Sy Kim @ 2020-05-17 17:16 UTC
Cc: kim.andrewsy, David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, Wensong Zhang, Simon Horman, Julian Anastasov, Jakub Kicinski, Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, netdev, lvs-devel, linux-kernel, netfilter-devel, coreteam

If expire_nodest_conn=1 and a UDP destination is deleted, IPVS should
also expire all matching connections immediately instead of waiting for
the next matching packet. This is particularly useful when there are a
lot of packets coming from a small number of clients. Those clients are
likely to match against existing entries if a source port in the
connection hash is reused. When the number of entries in the connection
tracker is large, we can significantly reduce the number of dropped
packets by expiring all connections upon deletion.

Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
---
 include/net/ip_vs.h             |  7 ++++++
 net/netfilter/ipvs/ip_vs_conn.c | 38 +++++++++++++++++++++++++++++++++
 net/netfilter/ipvs/ip_vs_core.c |  5 -----
 net/netfilter/ipvs/ip_vs_ctl.c  |  9 ++++++++
 4 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 83be2d93b407..deecf1344676 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1049,6 +1049,11 @@ static inline int sysctl_conn_reuse_mode(struct netns_ipvs *ipvs)
 	return ipvs->sysctl_conn_reuse_mode;
 }
 
+static inline int sysctl_expire_nodest_conn(struct netns_ipvs *ipvs)
+{
+	return ipvs->sysctl_expire_nodest_conn;
+}
+
 static inline int sysctl_schedule_icmp(struct netns_ipvs *ipvs)
 {
 	return ipvs->sysctl_schedule_icmp;
@@ -1209,6 +1214,8 @@ struct ip_vs_conn * ip_vs_conn_out_get_proto(struct netns_ipvs *ipvs, int af,
 					     const struct sk_buff *skb,
 					     const struct ip_vs_iphdr *iph);
 
+void ip_vs_conn_flush_dest(struct netns_ipvs *ipvs, struct ip_vs_dest *dest);
+
 /* Get reference to gain full access to conn.
  * By default, RCU read-side critical sections have access only to
  * conn fields and its PE data, see ip_vs_conn_rcu_free() for reference.
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index 02f2f636798d..c69dfbbc3416 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -1366,6 +1366,44 @@ static void ip_vs_conn_flush(struct netns_ipvs *ipvs)
 		goto flush_again;
 	}
 }
+
+/* Flush all the connection entries in the ip_vs_conn_tab with a
+ * matching destination.
+ */
+void ip_vs_conn_flush_dest(struct netns_ipvs *ipvs, struct ip_vs_dest *dest)
+{
+	int idx;
+	struct ip_vs_conn *cp, *cp_c;
+
+	rcu_read_lock();
+	for (idx = 0; idx < ip_vs_conn_tab_size; idx++) {
+		hlist_for_each_entry_rcu(cp, &ip_vs_conn_tab[idx], c_list) {
+			if (cp->ipvs != ipvs)
+				continue;
+
+			if (cp->dest != dest)
+				continue;
+
+			/* As timers are expired in LIFO order, restart
+			 * the timer of controlling connection first, so
+			 * that it is expired after us.
+			 */
+			cp_c = cp->control;
+			/* cp->control is valid only with reference to cp */
+			if (cp_c && __ip_vs_conn_get(cp)) {
+				IP_VS_DBG(4, "del controlling connection\n");
+				ip_vs_conn_expire_now(cp_c);
+				__ip_vs_conn_put(cp);
+			}
+			IP_VS_DBG(4, "del connection\n");
+			ip_vs_conn_expire_now(cp);
+		}
+		cond_resched_rcu();
+	}
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL_GPL(ip_vs_conn_flush_dest);
+
 /*
  * per netns init and exit
  */
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index aa6a603a2425..0139fa597d76 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -694,11 +694,6 @@ static int sysctl_nat_icmp_send(struct netns_ipvs *ipvs)
 	return ipvs->sysctl_nat_icmp_send;
 }
 
-static int sysctl_expire_nodest_conn(struct netns_ipvs *ipvs)
-{
-	return ipvs->sysctl_expire_nodest_conn;
-}
-
 #else
 
 static int sysctl_snat_reroute(struct netns_ipvs *ipvs) { return 0; }
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 8d14a1acbc37..f87c03622874 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -1225,6 +1225,15 @@ ip_vs_del_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
 	 */
 	__ip_vs_del_dest(svc->ipvs, dest, false);
 
+	/* If expire_nodest_conn is enabled and protocol is UDP,
+	 * attempt best effort flush of all connections with this
+	 * destination.
+	 */
+	if (sysctl_expire_nodest_conn(svc->ipvs) &&
+	    dest->protocol == IPPROTO_UDP) {
+		ip_vs_conn_flush_dest(svc->ipvs, dest);
+	}
+
 	LeaveFunction(2);
 
 	return 0;
-- 
2.20.1
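The core idea of the proposed ip_vs_conn_flush_dest() (walk every bucket of the connection table and request expiry of each entry that points at the deleted destination) can be illustrated with a small userspace model. This is a sketch only: the fixed-size table, `struct conn`, `struct dest`, `TAB_SIZE` and `flush_dest()` are hypothetical simplifications, and the RCU locking, reference counting, controlling connections and timers used in the patch are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TAB_SIZE 4	/* hypothetical; stands in for ip_vs_conn_tab_size */

struct dest {
	int id;
};

/* Hypothetical simplified entry: singly linked bucket chain. */
struct conn {
	struct conn *next;
	const struct dest *dest;
	bool expire_now;	/* models "timer restarted to fire at once" */
};

/* Mark every entry bound to `dest` for immediate expiry.
 * Returns how many entries were marked.
 */
static size_t flush_dest(struct conn *tab[TAB_SIZE], const struct dest *dest)
{
	size_t marked = 0;

	assert(tab && dest);
	for (size_t idx = 0; idx < TAB_SIZE; idx++) {
		for (struct conn *cp = tab[idx]; cp; cp = cp->next) {
			if (cp->dest != dest)
				continue;	/* bound to another real server */
			cp->expire_now = true;
			marked++;
		}
	}
	return marked;
}
```

The cost is one full pass over the table per deleted destination, regardless of how many entries match.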
* Re: [PATCH] netfilter/ipvs: immediately expire UDP connections matching unavailable destination if expire_nodest_conn=1
From: Julian Anastasov @ 2020-05-18 19:10 UTC
To: Andrew Sy Kim
Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, Wensong Zhang, Simon Horman, Jakub Kicinski, Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, netdev, lvs-devel, linux-kernel, netfilter-devel, coreteam

	Hello,

On Sun, 17 May 2020, Andrew Sy Kim wrote:

> If expire_nodest_conn=1 and a UDP destination is deleted, IPVS should
> also expire all matching connections immediately instead of waiting for
> the next matching packet. This is particularly useful when there are a
> lot of packets coming from a small number of clients. Those clients are
> likely to match against existing entries if a source port in the
> connection hash is reused. When the number of entries in the connection
> tracker is large, we can significantly reduce the number of dropped
> packets by expiring all connections upon deletion.
>
> Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
> ---
>  include/net/ip_vs.h             |  7 ++++++
>  net/netfilter/ipvs/ip_vs_conn.c | 38 +++++++++++++++++++++++++++++++++
>  net/netfilter/ipvs/ip_vs_core.c |  5 -----
>  net/netfilter/ipvs/ip_vs_ctl.c  |  9 ++++++++
>  4 files changed, 54 insertions(+), 5 deletions(-)
>
> diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
> index 02f2f636798d..c69dfbbc3416 100644
> --- a/net/netfilter/ipvs/ip_vs_conn.c
> +++ b/net/netfilter/ipvs/ip_vs_conn.c
> @@ -1366,6 +1366,44 @@ static void ip_vs_conn_flush(struct netns_ipvs *ipvs)
>  		goto flush_again;
>  	}
>  }
> +
> +/* Flush all the connection entries in the ip_vs_conn_tab with a
> + * matching destination.
> + */
> +void ip_vs_conn_flush_dest(struct netns_ipvs *ipvs, struct ip_vs_dest *dest)
> +{
> +	int idx;
> +	struct ip_vs_conn *cp, *cp_c;
> +
> +	rcu_read_lock();
> +	for (idx = 0; idx < ip_vs_conn_tab_size; idx++) {
> +		hlist_for_each_entry_rcu(cp, &ip_vs_conn_tab[idx], c_list) {
> +			if (cp->ipvs != ipvs)
> +				continue;
> +
> +			if (cp->dest != dest)
> +				continue;
> +
> +			/* As timers are expired in LIFO order, restart
> +			 * the timer of controlling connection first, so
> +			 * that it is expired after us.
> +			 */
> +			cp_c = cp->control;
> +			/* cp->control is valid only with reference to cp */
> +			if (cp_c && __ip_vs_conn_get(cp)) {
> +				IP_VS_DBG(4, "del controlling connection\n");
> +				ip_vs_conn_expire_now(cp_c);
> +				__ip_vs_conn_put(cp);
> +			}
> +			IP_VS_DBG(4, "del connection\n");
> +			ip_vs_conn_expire_now(cp);
> +		}
> +		cond_resched_rcu();

	Such a loop is correct if done in another context:

1. kthread
or
2. delayed work: mod_delayed_work(system_long_wq, ...)

	Otherwise cond_resched_rcu() can schedule() while holding
__ip_vs_mutex. Also, it will add a long delay if many dests are
removed.

	If such a loop instead checks every cp->dest for
IP_VS_DEST_F_AVAILABLE, the check should be done after calling
__ip_vs_conn_get().

>  static int sysctl_snat_reroute(struct netns_ipvs *ipvs) { return 0; }
> diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> index 8d14a1acbc37..f87c03622874 100644
> --- a/net/netfilter/ipvs/ip_vs_ctl.c
> +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> @@ -1225,6 +1225,15 @@ ip_vs_del_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
>  	 */
>  	__ip_vs_del_dest(svc->ipvs, dest, false);
>
> +	/* If expire_nodest_conn is enabled and protocol is UDP,
> +	 * attempt best effort flush of all connections with this
> +	 * destination.
> +	 */
> +	if (sysctl_expire_nodest_conn(svc->ipvs) &&
> +	    dest->protocol == IPPROTO_UDP) {
> +		ip_vs_conn_flush_dest(svc->ipvs, dest);

	The above work should be scheduled from __ip_vs_del_dest().
The check for UDP is not needed; sysctl_expire_nodest_conn() is for
all protocols.

	If the flushing is complex to implement, we can still allow
rescheduling for unavailable dests:

- first we should move this block above the ip_vs_try_to_schedule()
block because:

	1. the scheduling does not return unavailable dests, even
	for persistence, so there is no need to check new connections
	for the flag

	2. it will allow a new connection to be created if the dest
	for the existing connection is unavailable

	if (cp && cp->dest && !(cp->dest->flags & IP_VS_DEST_F_AVAILABLE)) {
		/* the destination server is not available */

		if (sysctl_expire_nodest_conn(ipvs)) {
			bool uses_ct = ip_vs_conn_uses_conntrack(cp, skb);

			ip_vs_conn_expire_now(cp);
			__ip_vs_conn_put(cp);
			if (uses_ct)
				return NF_DROP;
			cp = NULL;
		} else {
			__ip_vs_conn_put(cp);
			return NF_DROP;
		}
	}

	if (unlikely(!cp)) {
		int v;

		if (!ip_vs_try_to_schedule(ipvs, af, skb, pd, &v, &cp, &iph))
			return v;
	}

	Until now, we always waited one jiffy for the connection to
expire; now one packet will:

- schedule expiration for the existing connection with the unavailable
dest, as before

- create a new connection to an available destination that will be
found first in lists. But it can work only when sysctl var "conntrack"
is 0; we do not want to create two netfilter conntracks to different
real servers.

	Note that we intentionally removed the timer_pending() check
because we cannot see existing ONE_PACKET connections in the table.

Regards

--
Julian Anastasov <ja@ssi.bg>
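The packet flow Julian proposes above can be sketched as a userspace decision function. This is a minimal model under stated assumptions, not kernel code: `handle_packet()`, `pick_available()`, `struct result`, `enum verdict` and the boolean `uses_conntrack` parameter are hypothetical stand-ins for the logic around `ip_vs_conn_uses_conntrack()` and `ip_vs_try_to_schedule()`, and the scheduler is reduced to "first available destination".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum verdict { VERDICT_DROP, VERDICT_FORWARD };

struct dest {
	bool available;		/* models IP_VS_DEST_F_AVAILABLE */
};

struct conn {
	struct dest *dest;
	bool expired;		/* models ip_vs_conn_expire_now() having run */
};

struct result {
	enum verdict verdict;
	struct dest *forwarded_to;	/* NULL when the packet is dropped */
	bool old_conn_expired;
};

/* Hypothetical scheduler: first available destination, or NULL. */
static struct dest *pick_available(struct dest *dests, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (dests[i].available)
			return &dests[i];
	return NULL;
}

static struct result handle_packet(struct conn *cp, struct dest *dests,
				   size_t n, bool expire_nodest_conn,
				   bool uses_conntrack)
{
	struct result r = { VERDICT_DROP, NULL, false };
	struct dest *target;

	if (cp && cp->dest && !cp->dest->available) {
		if (!expire_nodest_conn)
			return r;		/* sysctl off: drop, keep entry */
		cp->expired = true;		/* expire the stale entry */
		r.old_conn_expired = true;
		if (uses_conntrack)
			return r;		/* avoid a second conntrack */
		cp = NULL;			/* fall through to scheduling */
	}

	/* Existing healthy entry keeps its dest; otherwise schedule anew. */
	target = cp ? cp->dest : pick_available(dests, n);
	if (!target)
		return r;
	r.verdict = VERDICT_FORWARD;
	r.forwarded_to = target;
	return r;
}
```

With the sysctl stand-in enabled and no conntrack in use, a single packet both expires the stale entry and is forwarded to a newly scheduled destination; in the other two branches the packet is still dropped, as in the quoted kernel snippet.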
* Re: [PATCH] netfilter/ipvs: immediately expire UDP connections matching unavailable destination if expire_nodest_conn=1
From: Andrew Kim @ 2020-05-18 19:54 UTC
To: Julian Anastasov
Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, Wensong Zhang, Simon Horman, Jakub Kicinski, Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, open list:IPVS, open list:IPVS, open list, open list:NETFILTER, open list:NETFILTER

Hi Julian,

Thank you for getting back to me. I will update the patch based on
your feedback shortly.

Regards,

Andrew
* Re: [PATCH] netfilter/ipvs: immediately expire UDP connections matching unavailable destination if expire_nodest_conn=1
From: Marco Angaroni @ 2020-05-19 11:46 UTC
To: Andrew Kim
Cc: Julian Anastasov, David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, Wensong Zhang, Simon Horman, Jakub Kicinski, Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, open list:IPVS, open list:IPVS, open list, open list:NETFILTER, open list:NETFILTER

Hi Andrew, Julian,

could you please confirm if/how this patch changes any of the
following behaviours, which I'm listing below as per my understanding?

When expire_nodest_conn is set and the real-server is unavailable, at
the moment the following happens to a packet going through IPVS:

a) TCP (or other connection-oriented protocols):
   the packet is silently dropped, then the following retransmission
   causes the generation of a RST from the load-balancer to the client,
   which will then re-open a new TCP connection
b) UDP:
   the packet is silently dropped, then the following retransmission
   is rescheduled to a new real-server
c) UDP in OPS mode:
   the packet is rescheduled to a new real-server, as no previous
   connection exists in the IPVS connection table, and a new OPS
   connection is created (but it lasts only the time to transmit the
   packet)
d) UDP in OPS mode + persistent-template:
   the packet is rescheduled to a new real-server, as the previous
   template-connection is invalidated, a new template-connection is
   created, and a new OPS connection is created (but it lasts only the
   time to transmit the packet)

It seems to me that you are trying to optimize cases a) and b),
avoiding the first step where the packet is silently dropped and
consequently avoiding the retransmission, and at the same time to
expire all the other connections pointing to the unavailable
real-server.

However I'm confused about the references to OPS mode.
And about why you need to expire all the connections at once: if you
expire on a per-connection basis, the client experiences the same
behaviour (no more re-transmissions), but you avoid the complexities
of a new thread.

Maybe the documentation of the expire_nodest_conn sysctl should also
be updated. Where it states:

        If this feature is enabled, the load balancer will expire the
        connection immediately when a packet arrives and its
        destination server is not available, then the client program
        will be notified that the connection is closed

I think it should be at least "and the client program" instead of
"then the client program". Or a more detailed explanation.

Thanks
Marco Angaroni
it will allow to create new connection if dest for > > existing connection is unavailable > > > > if (cp && cp->dest && !(cp->dest->flags & IP_VS_DEST_F_AVAILABLE)) { > > /* the destination server is not available */ > > > > if (sysctl_expire_nodest_conn(ipvs)) { > > bool uses_ct = ip_vs_conn_uses_conntrack(cp, skb); > > > > ip_vs_conn_expire_now(cp); > > __ip_vs_conn_put(cp); > > if (uses_ct) > > return NF_DROP; > > cp = NULL; > > } else { > > __ip_vs_conn_put(cp); > > return NF_DROP; > > } > > } > > > > if (unlikely(!cp)) { > > int v; > > > > if (!ip_vs_try_to_schedule(ipvs, af, skb, pd, &v, &cp, &iph)) > > return v; > > } > > > > Before now, we always waited one jiffie connection to expire, > > now one packet will: > > > > - schedule expiration for existing connection with unavailable dest, > > as before > > > > - create new connection to available destination that will be found > > first in lists. But it can work only when sysctl var "conntrack" is 0, > > we do not want to create two netfilter conntracks to different > > real servers. > > > > Note that we intentionally removed the timer_pending() check > > because we can not see existing ONE_PACKET connections in table. > > > > Regards > > > > -- > > Julian Anastasov <ja@ssi.bg> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] netfilter/ipvs: immediately expire UDP connections matching unavailable destination if expire_nodest_conn=1
  2020-05-19 11:46 ` Marco Angaroni
@ 2020-05-19 14:18 ` Andrew Kim
  2020-05-19 19:46 ` Julian Anastasov
  1 sibling, 0 replies; 11+ messages in thread
From: Andrew Kim @ 2020-05-19 14:18 UTC (permalink / raw)
  To: Marco Angaroni
  Cc: Julian Anastasov, David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, Wensong Zhang, Simon Horman, Jakub Kicinski, Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, open list:IPVS, open list:IPVS, open list, open list:NETFILTER, open list:NETFILTER

Hi Marco,

> could you please confirm if/how this patch is changing any of the
> following behaviours, which I’m listing below as per my understanding?

This patch would optimize b), where the chances of packets being
silently dropped are lower.

> However I'm confused about the references to OPS mode.
> And why you need to expire all the connections at once: if you expire
> on a per connection basis, the client experiences the same behaviour
> (no more re-transmissions), but you avoid the complexities of a new
> thread.

I agree that for TCP the client would experience the same behavior.
This is mostly problematic for UDP when there are many entries in the
connection hash from the same client to one destination. If we only
expire connections on receiving packets, there can be many dropped
packets before all entries are expired (or the UDP timeout is reached).
Expiring all UDP connections immediately would significantly reduce the
number of dropped packets.

Thanks,

Andrew Sy Kim
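The difference Andrew describes between expiring stale entries one packet at a time and expiring them all at deletion can be illustrated with a small userspace model. This is only a sketch under simplifying assumptions: `drops_lazy`, `flush_all`, and the per-source-port `stale` table are hypothetical names invented for the illustration, not IPVS code, and the model ignores the one-jiffy window during which an expiring entry is still in the hash table.

```c
#include <stdbool.h>
#include <stddef.h>

#define NPORTS 65536

/* Hypothetical model: stale[p] is true while a connection entry for
 * source port p still points at the deleted destination.
 */

/* Lazy expiry: an entry is expired only when a packet hits it, and that
 * packet is dropped. Returns the number of dropped packets.
 */
int drops_lazy(bool stale[NPORTS], const unsigned short *ports, size_t n)
{
	int drops = 0;

	for (size_t i = 0; i < n; i++) {
		if (stale[ports[i]]) {
			stale[ports[i]] = false; /* entry expired by this packet */
			drops++;                 /* the packet itself is lost */
		}
	}
	return drops;
}

/* Eager expiry: every stale entry is removed when the destination is
 * deleted, before any further packet arrives.
 */
void flush_all(bool stale[NPORTS])
{
	for (size_t p = 0; p < NPORTS; p++)
		stale[p] = false;
}
```

In this model each stale source port costs one dropped packet under lazy expiry, while flushing first leaves nothing for later packets to hit.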
* Re: [PATCH] netfilter/ipvs: immediately expire UDP connections matching unavailable destination if expire_nodest_conn=1
  2020-05-19 11:46 ` Marco Angaroni
  2020-05-19 14:18 ` Andrew Kim
@ 2020-05-19 19:46 ` Julian Anastasov
  2020-05-24 21:31 ` [PATCH] netfilter/ipvs: immediately expire no destination connections in kthread " Andrew Sy Kim
  1 sibling, 1 reply; 11+ messages in thread
From: Julian Anastasov @ 2020-05-19 19:46 UTC (permalink / raw)
  To: Marco Angaroni
  Cc: Andrew Kim, David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, Wensong Zhang, Simon Horman, Jakub Kicinski, Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, open list:IPVS, open list:IPVS, open list, open list:NETFILTER, open list:NETFILTER


	Hello,

On Tue, 19 May 2020, Marco Angaroni wrote:

> Hi Andrew, Julian,
>
> could you please confirm if/how this patch is changing any of the
> following behaviours, which I’m listing below as per my understanding
> ?
>
> When expire_nodest is set and real-server is unavailable, at the
> moment the following happens to a packet going through IPVS:
>
> a) TCP (or other connection-oriented protocols):
>     the packet is silently dropped, then the following retransmission
> causes the generation of a RST from the load-balancer to the client,
> which will then re-open a new TCP connection

	Yes. It seems we cannot create a new connection in all
cases, we should also check with is_new_conn(). Two cases are
possible, depending on conn_reuse_mode, the state of the existing
connection and whether netfilter conntrack is used:

1. set up expiration for the old conn, then drop the packet
2. set up expiration for the old conn, then create a new conn to
   schedule the packet

	When expiration is set, the timer will fire in the next
jiffy to remove the connection from the hash table. Until removed,
the connection can still cause drops. Sometimes we can simply create
a new connection with the same tuple, so it is possible for both
connections to coexist for one jiffy, but the old connection is not
reached on lookup.

> b) UDP:
>     the packet is silently dropped, then the following retransmission
> is rescheduled to a new real-server

	Yes, we drop while the old conn is not expired yet.

> c) UDP in OPS mode:
>     the packet is rescheduled to a new real-server, as no previous
> connection exists in IPVS connection table, and a new OPS connection
> is created (but it lasts only the time to transmit the packet)

	Yes, OPS is not affected.

> d) UDP in OPS mode + persistent-template:
>     the packet is rescheduled to a new real-server, as previous
> template-connection is invalidated, a new template-connection is
> created, and a new OPS connection is created (but it lasts only the
> time to transmit the packet)

	Yes, the existing template is ignored when its server
is unavailable.

> It seems to me that you are trying to optimize case a) and b),
> avoiding the first step where the packet is silently dropped and
> consequently avoiding the retransmission.
> And contextually expire also all the other connections pointing to the
> unavailable real-server.

	The change will allow immediate scheduling in a new
connection for any protocol when netfilter conntrack is not used:

- TCP: avoids retransmission for SYN
- UDP: reduces drops from 1 jiffy to 0 (no drops)

	But this single jiffy can be negligible compared to the
delay between the real server failure and its removal from the
IPVS table. Of course, if the real server is removed while it is
working, with this change we should not see any UDP drops.

> However I'm confused about the references to OPS mode.
> And why you need to expire all the connections at once: if you expire
> on a per connection basis, the client experiences the same behaviour
> (no more re-transmissions), but you avoid the complexities of a new
> thread.

	Such flushing can help when conntrack is used, in which
case the cost is a retransmission or downtime for one jiffy.

> Maybe also the documentation of expire_nodest_conn sysctl should be updated.
> When it's stated:
>
>         If this feature is enabled, the load balancer will expire the
>         connection immediately when a packet arrives and its
>         destination server is not available, then the client program
>         will be notified that the connection is closed
>
> I think it should be at least "and the client program" instead of
> "then the client program".
> Or a more detailed explanation.

	Yes, if the packet is a SYN we can create a new connection.
If it is an ACK, the retransmission will get a RST.

Regards

--
Julian Anastasov <ja@ssi.bg>
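The two outcomes Julian lists (expire and drop, or expire and reschedule) can be summarised as a small decision function. The sketch below is a userspace model of the check proposed earlier in the thread, not kernel code: `struct pkt_ctx`, `enum verdict`, and `classify` are hypothetical names for this illustration, and the model deliberately leaves out conn_reuse_mode and is_new_conn(), which Julian notes also influence the real behaviour.

```c
#include <stdbool.h>

/* Hypothetical model of the per-packet decision; not a kernel API. */
enum verdict {
	VERDICT_FORWARD,           /* existing conn, destination available */
	VERDICT_SCHEDULE_NEW,      /* no existing conn: schedule a new one */
	VERDICT_DROP,              /* expire_nodest_conn=0: just drop */
	VERDICT_EXPIRE_DROP,       /* expire old conn, drop this packet */
	VERDICT_EXPIRE_RESCHEDULE, /* expire old conn, schedule a new one */
};

struct pkt_ctx {
	bool has_conn;       /* lookup found an existing connection */
	bool dest_available; /* models IP_VS_DEST_F_AVAILABLE on cp->dest */
	bool expire_nodest;  /* models the expire_nodest_conn sysctl */
	bool uses_conntrack; /* models ip_vs_conn_uses_conntrack() */
};

enum verdict classify(const struct pkt_ctx *c)
{
	if (!c->has_conn)
		return VERDICT_SCHEDULE_NEW;
	if (c->dest_available)
		return VERDICT_FORWARD;
	if (!c->expire_nodest)
		return VERDICT_DROP;
	/* With netfilter conntrack in use, a second conntrack entry to a
	 * different real server must not be created, so the packet is
	 * dropped and the client retransmits after the old conn is gone.
	 */
	if (c->uses_conntrack)
		return VERDICT_EXPIRE_DROP;
	return VERDICT_EXPIRE_RESCHEDULE;
}
```

Only the last branch avoids the drop entirely, which matches the statement that immediate rescheduling is possible when netfilter conntrack is not used.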
* [PATCH] netfilter/ipvs: immediately expire no destination connections in kthread if expire_nodest_conn=1
  2020-05-19 19:46 ` Julian Anastasov
@ 2020-05-24 21:31 ` Andrew Sy Kim
  0 siblings, 0 replies; 11+ messages in thread
From: Andrew Sy Kim @ 2020-05-24 21:31 UTC (permalink / raw)
  Cc: kim.andrewsy, Wensong Zhang, Simon Horman, Julian Anastasov, David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, Jakub Kicinski, Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, netdev, lvs-devel, linux-kernel, netfilter-devel, coreteam

If expire_nodest_conn=1 and a destination is deleted, IPVS should
also expire all matching connections immediately instead of waiting for
the next matching packet. This is particularly useful when there are a
lot of packets coming from a small number of clients. Those clients are
likely to match against existing entries if a source port in the
connection hash is reused. When the number of entries in the connection
tracker is large, we can significantly reduce the number of dropped
packets by expiring all connections upon deletion in a kthread.

Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
---
 include/net/ip_vs.h             | 12 +++++++++
 net/netfilter/ipvs/ip_vs_conn.c | 48 +++++++++++++++++++++++++++++++++
 net/netfilter/ipvs/ip_vs_core.c | 45 +++++++++++++------------------
 net/netfilter/ipvs/ip_vs_ctl.c  | 16 +++++++++++
 4 files changed, 95 insertions(+), 26 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 83be2d93b407..d1fbead6965d 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1049,6 +1049,11 @@ static inline int sysctl_conn_reuse_mode(struct netns_ipvs *ipvs)
 	return ipvs->sysctl_conn_reuse_mode;
 }
 
+static inline int sysctl_expire_nodest_conn(struct netns_ipvs *ipvs)
+{
+	return ipvs->sysctl_expire_nodest_conn;
+}
+
 static inline int sysctl_schedule_icmp(struct netns_ipvs *ipvs)
 {
 	return ipvs->sysctl_schedule_icmp;
@@ -1209,6 +1214,13 @@ struct ip_vs_conn * ip_vs_conn_out_get_proto(struct netns_ipvs *ipvs, int af,
 					    const struct sk_buff *skb,
 					    const struct ip_vs_iphdr *iph);
 
+
+int ip_vs_conn_flush_dest(void *data);
+struct ip_vs_conn_flush_dest_tinfo {
+	struct netns_ipvs *ipvs;
+	struct ip_vs_dest *dest;
+};
+
 /* Get reference to gain full access to conn.
  * By default, RCU read-side critical sections have access only to
  * conn fields and its PE data, see ip_vs_conn_rcu_free() for reference.
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index 02f2f636798d..111fa0e287a2 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -1366,6 +1366,54 @@ static void ip_vs_conn_flush(struct netns_ipvs *ipvs)
 		goto flush_again;
 	}
 }
+
+/* Thread function to flush all the connection entries in the
+ * ip_vs_conn_tab with a matching destination.
+ */
+int ip_vs_conn_flush_dest(void *data)
+{
+	struct ip_vs_conn_flush_dest_tinfo *tinfo = data;
+	struct netns_ipvs *ipvs = tinfo->ipvs;
+	struct ip_vs_dest *dest = tinfo->dest;
+
+	int idx;
+	struct ip_vs_conn *cp, *cp_c;
+
+	IP_VS_DBG_BUF(4, "flushing all connections with destination %s:%d",
+		      IP_VS_DBG_ADDR(dest->af, &dest->addr), ntohs(dest->port));
+
+	rcu_read_lock();
+	for (idx = 0; idx < ip_vs_conn_tab_size; idx++) {
+		hlist_for_each_entry_rcu(cp, &ip_vs_conn_tab[idx], c_list) {
+			if (cp->ipvs != ipvs)
+				continue;
+
+			if (cp->dest != dest)
+				continue;
+
+			/* As timers are expired in LIFO order, restart
+			 * the timer of controlling connection first, so
+			 * that it is expired after us.
+			 */
+			cp_c = cp->control;
+			/* cp->control is valid only with reference to cp */
+			if (cp_c && __ip_vs_conn_get(cp)) {
+				IP_VS_DBG(4, "del controlling connection\n");
+				ip_vs_conn_expire_now(cp_c);
+				__ip_vs_conn_put(cp);
+			}
+
+			IP_VS_DBG(4, "del connection\n");
+			ip_vs_conn_expire_now(cp);
+		}
+		cond_resched_rcu();
+	}
+	rcu_read_unlock();
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ip_vs_conn_flush_dest);
+
 /*
  * per netns init and exit
  */
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index aa6a603a2425..ff052e57e054 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -24,6 +24,7 @@
 
 #include <linux/module.h>
 #include <linux/kernel.h>
+#include <linux/kthread.h>
 #include <linux/ip.h>
 #include <linux/tcp.h>
 #include <linux/sctp.h>
@@ -694,11 +695,6 @@ static int sysctl_nat_icmp_send(struct netns_ipvs *ipvs)
 	return ipvs->sysctl_nat_icmp_send;
 }
 
-static int sysctl_expire_nodest_conn(struct netns_ipvs *ipvs)
-{
-	return ipvs->sysctl_expire_nodest_conn;
-}
-
 #else
 
 static int sysctl_snat_reroute(struct netns_ipvs *ipvs) { return 0; }
@@ -2095,36 +2091,33 @@ ip_vs_in(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, int
 		}
 	}
 
-	if (unlikely(!cp)) {
-		int v;
-
-		if (!ip_vs_try_to_schedule(ipvs, af, skb, pd, &v, &cp, &iph))
-			return v;
-	}
-
-	IP_VS_DBG_PKT(11, af, pp, skb, iph.off, "Incoming packet");
-
 	/* Check the server status */
-	if (cp->dest && !(cp->dest->flags & IP_VS_DEST_F_AVAILABLE)) {
+	if (cp && cp->dest && !(cp->dest->flags & IP_VS_DEST_F_AVAILABLE)) {
 		/* the destination server is not available */
 
-		__u32 flags = cp->flags;
-
-		/* when timer already started, silently drop the packet.*/
-		if (timer_pending(&cp->timer))
-			__ip_vs_conn_put(cp);
-		else
-			ip_vs_conn_put(cp);
+		if (sysctl_expire_nodest_conn(ipvs)) {
+			bool uses_ct = ip_vs_conn_uses_conntrack(cp, skb);
 
-		if (sysctl_expire_nodest_conn(ipvs) &&
-		    !(flags & IP_VS_CONN_F_ONE_PACKET)) {
-			/* try to expire the connection immediately */
 			ip_vs_conn_expire_now(cp);
+			__ip_vs_conn_put(cp);
+			if (uses_ct)
+				return NF_DROP;
+			cp = NULL;
+		} else {
+			__ip_vs_conn_put(cp);
+			return NF_DROP;
 		}
+	}
 
-		return NF_DROP;
+	if (unlikely(!cp)) {
+		int v;
+
+		if (!ip_vs_try_to_schedule(ipvs, af, skb, pd, &v, &cp, &iph))
+			return v;
 	}
 
+	IP_VS_DBG_PKT(11, af, pp, skb, iph.off, "Incoming packet");
+
 	ip_vs_in_stats(cp, skb);
 	ip_vs_set_state(cp, IP_VS_DIR_INPUT, skb, pd);
 	if (cp->packet_xmit)
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 8d14a1acbc37..fa48268368fc 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -1163,6 +1163,22 @@ static void __ip_vs_del_dest(struct netns_ipvs *ipvs, struct ip_vs_dest *dest,
 	list_add(&dest->t_list, &ipvs->dest_trash);
 	dest->idle_start = 0;
 	spin_unlock_bh(&ipvs->dest_trash_lock);
+
+	/* If expire_nodest_conn is enabled, expire all connections
+	 * immediately in a kthread.
+	 */
+	if (sysctl_expire_nodest_conn(ipvs)) {
+		struct ip_vs_conn_flush_dest_tinfo *tinfo = NULL;
+
+		tinfo = kcalloc(1, sizeof(struct ip_vs_conn_flush_dest_tinfo),
+				GFP_KERNEL);
+		tinfo->ipvs = ipvs;
+		tinfo->dest = dest;
+
+		IP_VS_DBG(3, "flushing connections in kthread\n");
+		kthread_run(ip_vs_conn_flush_dest,
+			    tinfo, "ipvs-flush-dest-conn");
+	}
 }
 
 
-- 
2.20.1
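The table walk performed by ip_vs_conn_flush_dest() in the patch above can be modelled in userspace to make its effect easier to follow. The sketch below is an illustration only: `struct conn`, `struct dest`, `TAB_SIZE`, and `flush_dest` are hypothetical stand-ins, and RCU, reference counting, timers, and the per-netns check are all left out. It keeps just the two behaviours visible in the patch: every entry pointing at the deleted destination is expired, and its controlling connection is expired first.

```c
#include <stddef.h>

#define TAB_SIZE 8 /* hypothetical; stands in for ip_vs_conn_tab_size */

struct dest {
	int id;
};

struct conn {
	struct conn *next;       /* next entry in the same hash bucket */
	struct conn *control;    /* controlling connection, may be NULL */
	const struct dest *dest; /* destination this entry points at */
	int expired;             /* models "expiry has been requested" */
};

/* Walk every bucket and expire all entries bound to dest, expiring the
 * controlling connection before the entry itself. Returns how many
 * entries were newly marked as expired.
 */
int flush_dest(struct conn *tab[TAB_SIZE], const struct dest *dest)
{
	int n = 0;

	for (size_t i = 0; i < TAB_SIZE; i++) {
		for (struct conn *cp = tab[i]; cp; cp = cp->next) {
			if (cp->dest != dest)
				continue;
			if (cp->control && !cp->control->expired) {
				cp->control->expired = 1;
				n++;
			}
			if (!cp->expired) {
				cp->expired = 1;
				n++;
			}
		}
	}
	return n;
}
```

The cost is one pass over the whole table per deleted destination, which is why the thread discusses running it from a kthread or delayed work rather than from the context that holds __ip_vs_mutex.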