From: Pablo Neira Ayuso <pablo@netfilter.org>
To: netfilter-devel@vger.kernel.org
Cc: davem@davemloft.net, netdev@vger.kernel.org, kuba@kernel.org
Subject: [PATCH 05/12] ipvs: allow connection reuse for unconfirmed conntrack
Date: Wed, 8 Jul 2020 19:46:02 +0200 [thread overview]
Message-ID: <20200708174609.1343-6-pablo@netfilter.org> (raw)
In-Reply-To: <20200708174609.1343-1-pablo@netfilter.org>
From: Julian Anastasov <ja@ssi.bg>
YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.
As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.
YangYuxi lists some of the problem reports:
- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2
- IPVS low throughput #70747
https://github.com/kubernetes/kubernetes/issues/70747
- Apache Bench can fill up ipvs service proxy in seconds #544
https://github.com/cloudnativelabs/kube-router/issues/544
- Additional 1s latency in `host -> service IP -> pod`
https://github.com/kubernetes/kubernetes/issues/90854
Fixes: f719e3754ee2 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <yx.atom1@gmail.com>
Signed-off-by: YangYuxi <yx.atom1@gmail.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
include/net/ip_vs.h | 10 ++++------
net/netfilter/ipvs/ip_vs_core.c | 12 +++++++-----
2 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 0c9881241323..011f407b76fe 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1626,18 +1626,16 @@ static inline void ip_vs_conn_drop_conntrack(struct ip_vs_conn *cp)
}
#endif /* CONFIG_IP_VS_NFCT */
-/* Really using conntrack? */
-static inline bool ip_vs_conn_uses_conntrack(struct ip_vs_conn *cp,
- struct sk_buff *skb)
+/* Using old conntrack that can not be redirected to another real server? */
+static inline bool ip_vs_conn_uses_old_conntrack(struct ip_vs_conn *cp,
+ struct sk_buff *skb)
{
#ifdef CONFIG_IP_VS_NFCT
enum ip_conntrack_info ctinfo;
struct nf_conn *ct;
- if (!(cp->flags & IP_VS_CONN_F_NFCT))
- return false;
ct = nf_ct_get(skb, &ctinfo);
- if (ct)
+ if (ct && nf_ct_is_confirmed(ct))
return true;
#endif
return false;
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index ca3670152565..b4a6b7662f3f 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -2066,14 +2066,14 @@ ip_vs_in(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, int
conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
if (conn_reuse_mode && !iph.fragoffs && is_new_conn(skb, &iph) && cp) {
- bool uses_ct = false, resched = false;
+ bool old_ct = false, resched = false;
if (unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
unlikely(!atomic_read(&cp->dest->weight))) {
resched = true;
- uses_ct = ip_vs_conn_uses_conntrack(cp, skb);
+ old_ct = ip_vs_conn_uses_old_conntrack(cp, skb);
} else if (is_new_conn_expected(cp, conn_reuse_mode)) {
- uses_ct = ip_vs_conn_uses_conntrack(cp, skb);
+ old_ct = ip_vs_conn_uses_old_conntrack(cp, skb);
if (!atomic_read(&cp->n_control)) {
resched = true;
} else {
@@ -2081,15 +2081,17 @@ ip_vs_in(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, int
* that uses conntrack while it is still
* referenced by controlled connection(s).
*/
- resched = !uses_ct;
+ resched = !old_ct;
}
}
if (resched) {
+ if (!old_ct)
+ cp->flags &= ~IP_VS_CONN_F_NFCT;
if (!atomic_read(&cp->n_control))
ip_vs_conn_expire_now(cp);
__ip_vs_conn_put(cp);
- if (uses_ct)
+ if (old_ct)
return NF_DROP;
cp = NULL;
}
--
2.20.1
next prev parent reply other threads:[~2020-07-08 17:46 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-08 17:45 [PATCH 00/12] Netfilter/IPVS updates for net-next Pablo Neira Ayuso
2020-07-08 17:45 ` [PATCH 01/12] netfilter: introduce support for reject at prerouting stage Pablo Neira Ayuso
2020-07-08 17:45 ` [PATCH 02/12] netfilter: nft_set_pipapo: Drop useless assignment of scratch map index on insert Pablo Neira Ayuso
2020-07-08 17:46 ` [PATCH 03/12] ipvs: register hooks only with services Pablo Neira Ayuso
2020-07-08 17:46 ` [PATCH 04/12] ipvs: avoid expiring many connections from timer Pablo Neira Ayuso
2020-07-08 17:46 ` Pablo Neira Ayuso [this message]
2020-07-08 17:46 ` [PATCH 06/12] netfilter: nf_tables: add NFTA_CHAIN_ID attribute Pablo Neira Ayuso
2020-07-08 17:46 ` [PATCH 07/12] netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute Pablo Neira Ayuso
2020-07-08 17:46 ` [PATCH 08/12] netfilter: nf_tables: add NFTA_VERDICT_CHAIN_ID attribute Pablo Neira Ayuso
2020-07-08 17:46 ` [PATCH 09/12] netfilter: nf_tables: expose enum nft_chain_flags through UAPI Pablo Neira Ayuso
2020-07-08 17:46 ` [PATCH 10/12] netfilter: nf_tables: add nft_chain_add() Pablo Neira Ayuso
2020-07-08 17:46 ` [PATCH 11/12] netfilter: nf_tables: add NFT_CHAIN_BINDING Pablo Neira Ayuso
2020-07-08 17:46 ` [PATCH 12/12] netfilter: nf_tables: reject unsupported chain flags Pablo Neira Ayuso
2020-07-08 19:42 ` [PATCH 00/12] Netfilter/IPVS updates for net-next David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200708174609.1343-6-pablo@netfilter.org \
--to=pablo@netfilter.org \
--cc=davem@davemloft.net \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=netfilter-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).