From: Julian Anastasov <ja@ssi.bg>
To: yangxingwu <xingwu.yang@gmail.com>
Cc: Simon Horman <horms@verge.net.au>,
pablo@netfilter.org, kadlec@netfilter.org, fw@strlen.de,
"David S. Miller" <davem@davemloft.net>,
kuba@kernel.org, netdev@vger.kernel.org,
lvs-devel@vger.kernel.org, netfilter-devel@vger.kernel.org,
coreteam@netfilter.org,
linux-kernel <linux-kernel@vger.kernel.org>,
linux-doc@vger.kernel.org, corbet@lwn.net
Subject: Re: [PATCH] ipvs: Fix reuse connection if RS weight is 0
Date: Mon, 25 Oct 2021 21:12:33 +0300 (EEST)
Message-ID: <707b5fb3-6b61-c53-e983-bc1373aa2bf@ssi.bg>
In-Reply-To: <20211025115910.2595-1-xingwu.yang@gmail.com>
Hello,
On Mon, 25 Oct 2021, yangxingwu wrote:
> Since commit dc7b3eb900aa ("ipvs: Fix reuse connection if real server is
> dead"), new connections to dead servers are redistributed immediately to
> new servers.
>
> Then commit d752c3645717 ("ipvs: allow rescheduling of new connections when
> port reuse is detected") disable expire_nodest_conn if conn_reuse_mode is
> 0. And new connection may be distributed to a real server with weight 0.
Your change does not look correct to me. When
expire_nodest_conn was created, it was not checked when the
weight is 0. Different terms are used in different places
but, in short, we have two independent states for a real server:
- inhibited: weight=0, so no new connections should be served.
Packets for existing connections can still be routed to the
server if it is still available and the packets are not
dropped by expire_nodest_conn.
The new feature is that port reuse detection can
redirect the new TCP connection into a new IPVS conn and
expire the existing cp/ct.
- unavailable (!IP_VS_DEST_F_AVAILABLE): the server is removed,
possibly temporarily. Traffic for existing connections is
dropped, but with expire_nodest_conn we can select a
different server.
The new conn_reuse_mode flag allows port reuse to
be detected. Only then, with commit dc7b3eb900aa, does
expire_nodest_conn have the opportunity to check weight=0
and to consider the old traffic as finished. If a new
server is selected, any retransmissions from the previous
connection would be considered part of the new connection.
It is a rapid way to switch servers without checking with
is_new_conn_expected(), because we cannot have many
conns/conntracks to different servers.
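To make the two states and the reschedule decision concrete,
here is a rough userspace sketch. It is not kernel code: the
model_* names, the struct fields and the three functions are
hypothetical and only mirror the conditions visible in the
quoted ip_vs_in() hunk. The second branch of that hunk is
collapsed into a single boolean (new_conn_expected), because
the hunk does not show all of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a real server, not kernel code. */
struct model_dest {
	int weight;     /* 0 means "inhibited": no new connections */
	bool available; /* stands in for IP_VS_DEST_F_AVAILABLE */
};

/* Hypothetical per-packet context. */
struct model_ctx {
	int conn_reuse_mode;     /* sysctl, 0 = no port reuse handling */
	bool expire_nodest_conn; /* sysctl */
	bool is_new_conn;        /* packet starts a connection, e.g. TCP SYN */
	bool new_conn_expected;  /* collapsed outcome of the second branch */
};

enum model_action { MODEL_FORWARD, MODEL_DROP, MODEL_EXPIRE_CONN };

/* Packet that belongs to an existing connection, no port reuse. */
static enum model_action existing_conn_action(const struct model_dest *d,
					      const struct model_ctx *c)
{
	assert(d != NULL && c != NULL);
	if (!d->available) /* unavailable: drop, or expire to pick a new server */
		return c->expire_nodest_conn ? MODEL_EXPIRE_CONN : MODEL_DROP;
	return MODEL_FORWARD; /* weight 0 alone does not stop old traffic */
}

/* Condition before the patch: everything is gated by conn_reuse_mode. */
static bool resched_current(const struct model_dest *d,
			    const struct model_ctx *c)
{
	assert(c != NULL);
	if (!c->conn_reuse_mode || !c->is_new_conn)
		return false;
	if (c->expire_nodest_conn && d && d->weight == 0)
		return true;
	return c->new_conn_expected;
}

/* Condition with the proposed patch: only the second branch is gated. */
static bool resched_proposed(const struct model_dest *d,
			     const struct model_ctx *c)
{
	assert(c != NULL);
	if (!c->is_new_conn)
		return false;
	if (c->expire_nodest_conn && d && d->weight == 0)
		return true;
	return c->conn_reuse_mode && c->new_conn_expected;
}
```

In this model the two variants differ in only one case: with
conn_reuse_mode=0 and expire_nodest_conn=1, a new connection
that hits an existing conn to a weight=0 server is rescheduled
by resched_proposed() but not by resched_current().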
> Signed-off-by: yangxingwu <xingwu.yang@gmail.com>
> ---
> Documentation/networking/ipvs-sysctl.rst | 3 +--
> net/netfilter/ipvs/ip_vs_core.c | 5 +++--
> 2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/networking/ipvs-sysctl.rst b/Documentation/networking/ipvs-sysctl.rst
> index 2afccc63856e..1cfbf1add2fc 100644
> --- a/Documentation/networking/ipvs-sysctl.rst
> +++ b/Documentation/networking/ipvs-sysctl.rst
> @@ -37,8 +37,7 @@ conn_reuse_mode - INTEGER
>
> 0: disable any special handling on port reuse. The new
> connection will be delivered to the same real server that was
> - servicing the previous connection. This will effectively
> - disable expire_nodest_conn.
> + servicing the previous connection.
>
> bit 1: enable rescheduling of new connections when it is safe.
> That is, whenever expire_nodest_conn and for TCP sockets, when
> diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
> index 128690c512df..9279aed69e23 100644
> --- a/net/netfilter/ipvs/ip_vs_core.c
> +++ b/net/netfilter/ipvs/ip_vs_core.c
> @@ -2042,14 +2042,15 @@ ip_vs_in(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, int
> ipvs, af, skb, &iph);
>
> conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
> - if (conn_reuse_mode && !iph.fragoffs && is_new_conn(skb, &iph) && cp) {
> + if (!iph.fragoffs && is_new_conn(skb, &iph) && cp) {
> bool old_ct = false, resched = false;
>
> if (unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
> unlikely(!atomic_read(&cp->dest->weight))) {
> resched = true;
> old_ct = ip_vs_conn_uses_old_conntrack(cp, skb);
> - } else if (is_new_conn_expected(cp, conn_reuse_mode)) {
> + } else if (conn_reuse_mode &&
> + is_new_conn_expected(cp, conn_reuse_mode)) {
> old_ct = ip_vs_conn_uses_old_conntrack(cp, skb);
> if (!atomic_read(&cp->n_control)) {
> resched = true;
> --
> 2.30.2
Regards
--
Julian Anastasov <ja@ssi.bg>