linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] ipvs: Fix reuse connection if RS weight is 0
@ 2021-10-25 11:59 yangxingwu
  2021-10-25 18:12 ` Julian Anastasov
  0 siblings, 1 reply; 8+ messages in thread
From: yangxingwu @ 2021-10-25 11:59 UTC (permalink / raw)
  To: horms
  Cc: ja, pablo, kadlec, fw, davem, kuba, netdev, lvs-devel,
	netfilter-devel, coreteam, linux-kernel, linux-doc, corbet,
	xingwu.yang

Since commit dc7b3eb900aa ("ipvs: Fix reuse connection if real server is
dead"), new connections to dead servers are redistributed immediately to
new servers.

Then commit d752c3645717 ("ipvs: allow rescheduling of new connections when
port reuse is detected") disable expire_nodest_conn if conn_reuse_mode is
0. And new connection may be distributed to a real server with weight 0.

Signed-off-by: yangxingwu <xingwu.yang@gmail.com>
---
 Documentation/networking/ipvs-sysctl.rst | 3 +--
 net/netfilter/ipvs/ip_vs_core.c          | 5 +++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/ipvs-sysctl.rst b/Documentation/networking/ipvs-sysctl.rst
index 2afccc63856e..1cfbf1add2fc 100644
--- a/Documentation/networking/ipvs-sysctl.rst
+++ b/Documentation/networking/ipvs-sysctl.rst
@@ -37,8 +37,7 @@ conn_reuse_mode - INTEGER
 
 	0: disable any special handling on port reuse. The new
 	connection will be delivered to the same real server that was
-	servicing the previous connection. This will effectively
-	disable expire_nodest_conn.
+	servicing the previous connection.
 
 	bit 1: enable rescheduling of new connections when it is safe.
 	That is, whenever expire_nodest_conn and for TCP sockets, when
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 128690c512df..9279aed69e23 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -2042,14 +2042,15 @@ ip_vs_in(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, int
 			     ipvs, af, skb, &iph);
 
 	conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
-	if (conn_reuse_mode && !iph.fragoffs && is_new_conn(skb, &iph) && cp) {
+	if (!iph.fragoffs && is_new_conn(skb, &iph) && cp) {
 		bool old_ct = false, resched = false;
 
 		if (unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
 		    unlikely(!atomic_read(&cp->dest->weight))) {
 			resched = true;
 			old_ct = ip_vs_conn_uses_old_conntrack(cp, skb);
-		} else if (is_new_conn_expected(cp, conn_reuse_mode)) {
+		} else if (conn_reuse_mode &&
+			   is_new_conn_expected(cp, conn_reuse_mode)) {
 			old_ct = ip_vs_conn_uses_old_conntrack(cp, skb);
 			if (!atomic_read(&cp->n_control)) {
 				resched = true;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] ipvs: Fix reuse connection if RS weight is 0
  2021-10-25 11:59 [PATCH] ipvs: Fix reuse connection if RS weight is 0 yangxingwu
@ 2021-10-25 18:12 ` Julian Anastasov
  2021-10-26  2:54   ` yangxingwu
  0 siblings, 1 reply; 8+ messages in thread
From: Julian Anastasov @ 2021-10-25 18:12 UTC (permalink / raw)
  To: yangxingwu
  Cc: Simon Horman, pablo, kadlec, fw, David S. Miller, kuba, netdev,
	lvs-devel, netfilter-devel, coreteam, linux-kernel, linux-doc,
	corbet


	Hello,

On Mon, 25 Oct 2021, yangxingwu wrote:

> Since commit dc7b3eb900aa ("ipvs: Fix reuse connection if real server is
> dead"), new connections to dead servers are redistributed immediately to
> new servers.
> 
> Then commit d752c3645717 ("ipvs: allow rescheduling of new connections when
> port reuse is detected") disable expire_nodest_conn if conn_reuse_mode is
> 0. And new connection may be distributed to a real server with weight 0.

	Your change does not look correct to me. At the time
expire_nodest_conn was created, it was not checked when
weight is 0. At different places different terms are used
but in short, we have two independent states for real server:

- inhibited: weight=0 and no new connections should be served,
	packets for existing connections can be routed to server
	if it is still available and packets are not dropped
	by expire_nodest_conn.
	The new feature is that port reuse detection can
	redirect the new TCP connection into a new IPVS conn and
	to expire the existing cp/ct.

- unavailable (!IP_VS_DEST_F_AVAILABLE): server is removed,
	can be temporary, drop traffic for existing connections
	but on expire_nodest_conn we can select different server

	The new conn_reuse_mode flag allows port reuse to
be detected. Only then expire_nodest_conn has the
opportunity with commit dc7b3eb900aa to check weight=0
and to consider the old traffic as finished. If a new
server is selected, any retrans from previous connection
would be considered as part from the new connection. It
is a rapid way to switch server without checking with
is_new_conn_expected() because we can not have many
conns/conntracks to different servers.

> Signed-off-by: yangxingwu <xingwu.yang@gmail.com>
> ---
>  Documentation/networking/ipvs-sysctl.rst | 3 +--
>  net/netfilter/ipvs/ip_vs_core.c          | 5 +++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/networking/ipvs-sysctl.rst b/Documentation/networking/ipvs-sysctl.rst
> index 2afccc63856e..1cfbf1add2fc 100644
> --- a/Documentation/networking/ipvs-sysctl.rst
> +++ b/Documentation/networking/ipvs-sysctl.rst
> @@ -37,8 +37,7 @@ conn_reuse_mode - INTEGER
>  
>  	0: disable any special handling on port reuse. The new
>  	connection will be delivered to the same real server that was
> -	servicing the previous connection. This will effectively
> -	disable expire_nodest_conn.
> +	servicing the previous connection.
>  
>  	bit 1: enable rescheduling of new connections when it is safe.
>  	That is, whenever expire_nodest_conn and for TCP sockets, when
> diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
> index 128690c512df..9279aed69e23 100644
> --- a/net/netfilter/ipvs/ip_vs_core.c
> +++ b/net/netfilter/ipvs/ip_vs_core.c
> @@ -2042,14 +2042,15 @@ ip_vs_in(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, int
>  			     ipvs, af, skb, &iph);
>  
>  	conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
> -	if (conn_reuse_mode && !iph.fragoffs && is_new_conn(skb, &iph) && cp) {
> +	if (!iph.fragoffs && is_new_conn(skb, &iph) && cp) {
>  		bool old_ct = false, resched = false;
>  
>  		if (unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
>  		    unlikely(!atomic_read(&cp->dest->weight))) {
>  			resched = true;
>  			old_ct = ip_vs_conn_uses_old_conntrack(cp, skb);
> -		} else if (is_new_conn_expected(cp, conn_reuse_mode)) {
> +		} else if (conn_reuse_mode &&
> +			   is_new_conn_expected(cp, conn_reuse_mode)) {
>  			old_ct = ip_vs_conn_uses_old_conntrack(cp, skb);
>  			if (!atomic_read(&cp->n_control)) {
>  				resched = true;
> -- 
> 2.30.2

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ipvs: Fix reuse connection if RS weight is 0
  2021-10-25 18:12 ` Julian Anastasov
@ 2021-10-26  2:54   ` yangxingwu
  2021-10-26  5:44     ` Julian Anastasov
  0 siblings, 1 reply; 8+ messages in thread
From: yangxingwu @ 2021-10-26  2:54 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: Simon Horman, Pablo Neira Ayuso, kadlec, fw, David S. Miller,
	kuba, netdev, lvs-devel, netfilter-devel, coreteam, linux-kernel,
	linux-doc, corbet

thanks julian

What happens in this situation is that if we set the wait of the
realserver to 0 and do NOT remove the weight zero realserver with
sysctl settings (conn_reuse_mode == 0 && expire_nodest_conn == 1), and
the client reuses its source ports, the kernel will constantly
reuse connections and send the traffic to the weight 0 realserver.

you may check the details from
https://github.com/kubernetes/kubernetes/issues/81775

On Tue, Oct 26, 2021 at 2:12 AM Julian Anastasov <ja@ssi.bg> wrote:
>
>
>         Hello,
>
> On Mon, 25 Oct 2021, yangxingwu wrote:
>
> > Since commit dc7b3eb900aa ("ipvs: Fix reuse connection if real server is
> > dead"), new connections to dead servers are redistributed immediately to
> > new servers.
> >
> > Then commit d752c3645717 ("ipvs: allow rescheduling of new connections when
> > port reuse is detected") disable expire_nodest_conn if conn_reuse_mode is
> > 0. And new connection may be distributed to a real server with weight 0.
>
>         Your change does not look correct to me. At the time
> expire_nodest_conn was created, it was not checked when
> weight is 0. At different places different terms are used
> but in short, we have two independent states for real server:
>
> - inhibited: weight=0 and no new connections should be served,
>         packets for existing connections can be routed to server
>         if it is still available and packets are not dropped
>         by expire_nodest_conn.
>         The new feature is that port reuse detection can
>         redirect the new TCP connection into a new IPVS conn and
>         to expire the existing cp/ct.
>
> - unavailable (!IP_VS_DEST_F_AVAILABLE): server is removed,
>         can be temporary, drop traffic for existing connections
>         but on expire_nodest_conn we can select different server
>
>         The new conn_reuse_mode flag allows port reuse to
> be detected. Only then expire_nodest_conn has the
> opportunity with commit dc7b3eb900aa to check weight=0
> and to consider the old traffic as finished. If a new
> server is selected, any retrans from previous connection
> would be considered as part from the new connection. It
> is a rapid way to switch server without checking with
> is_new_conn_expected() because we can not have many
> conns/conntracks to different servers.
>
> > Signed-off-by: yangxingwu <xingwu.yang@gmail.com>
> > ---
> >  Documentation/networking/ipvs-sysctl.rst | 3 +--
> >  net/netfilter/ipvs/ip_vs_core.c          | 5 +++--
> >  2 files changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/networking/ipvs-sysctl.rst b/Documentation/networking/ipvs-sysctl.rst
> > index 2afccc63856e..1cfbf1add2fc 100644
> > --- a/Documentation/networking/ipvs-sysctl.rst
> > +++ b/Documentation/networking/ipvs-sysctl.rst
> > @@ -37,8 +37,7 @@ conn_reuse_mode - INTEGER
> >
> >       0: disable any special handling on port reuse. The new
> >       connection will be delivered to the same real server that was
> > -     servicing the previous connection. This will effectively
> > -     disable expire_nodest_conn.
> > +     servicing the previous connection.
> >
> >       bit 1: enable rescheduling of new connections when it is safe.
> >       That is, whenever expire_nodest_conn and for TCP sockets, when
> > diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
> > index 128690c512df..9279aed69e23 100644
> > --- a/net/netfilter/ipvs/ip_vs_core.c
> > +++ b/net/netfilter/ipvs/ip_vs_core.c
> > @@ -2042,14 +2042,15 @@ ip_vs_in(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, int
> >                            ipvs, af, skb, &iph);
> >
> >       conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
> > -     if (conn_reuse_mode && !iph.fragoffs && is_new_conn(skb, &iph) && cp) {
> > +     if (!iph.fragoffs && is_new_conn(skb, &iph) && cp) {
> >               bool old_ct = false, resched = false;
> >
> >               if (unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
> >                   unlikely(!atomic_read(&cp->dest->weight))) {
> >                       resched = true;
> >                       old_ct = ip_vs_conn_uses_old_conntrack(cp, skb);
> > -             } else if (is_new_conn_expected(cp, conn_reuse_mode)) {
> > +             } else if (conn_reuse_mode &&
> > +                        is_new_conn_expected(cp, conn_reuse_mode)) {
> >                       old_ct = ip_vs_conn_uses_old_conntrack(cp, skb);
> >                       if (!atomic_read(&cp->n_control)) {
> >                               resched = true;
> > --
> > 2.30.2
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ipvs: Fix reuse connection if RS weight is 0
  2021-10-26  2:54   ` yangxingwu
@ 2021-10-26  5:44     ` Julian Anastasov
  2021-10-26  6:13       ` yangxingwu
  0 siblings, 1 reply; 8+ messages in thread
From: Julian Anastasov @ 2021-10-26  5:44 UTC (permalink / raw)
  To: yangxingwu
  Cc: Simon Horman, Pablo Neira Ayuso, kadlec, fw, David S. Miller,
	kuba, netdev, lvs-devel, netfilter-devel, coreteam, linux-kernel,
	linux-doc, corbet


	Hello,

On Tue, 26 Oct 2021, yangxingwu wrote:

> thanks julian
> 
> What happens in this situation is that if we set the wait of the
> realserver to 0 and do NOT remove the weight zero realserver with
> sysctl settings (conn_reuse_mode == 0 && expire_nodest_conn == 1), and
> the client reuses its source ports, the kernel will constantly
> reuse connections and send the traffic to the weight 0 realserver.

	Yes, this is expected when conn_reuse_mode=0.

> you may check the details from
> https://github.com/kubernetes/kubernetes/issues/81775

	What happens if you try conn_reuse_mode=1? The
one-second delay in previous kernels should be corrected with

commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f
Date:   Wed Jul 1 18:17:19 2020 +0300

    ipvs: allow connection reuse for unconfirmed conntrack

> On Tue, Oct 26, 2021 at 2:12 AM Julian Anastasov <ja@ssi.bg> wrote:
> >
> > On Mon, 25 Oct 2021, yangxingwu wrote:
> >
> > > Since commit dc7b3eb900aa ("ipvs: Fix reuse connection if real server is
> > > dead"), new connections to dead servers are redistributed immediately to
> > > new servers.
> > >
> > > Then commit d752c3645717 ("ipvs: allow rescheduling of new connections when
> > > port reuse is detected") disable expire_nodest_conn if conn_reuse_mode is
> > > 0. And new connection may be distributed to a real server with weight 0.
> >
> >         Your change does not look correct to me. At the time
> > expire_nodest_conn was created, it was not checked when
> > weight is 0. At different places different terms are used
> > but in short, we have two independent states for real server:
> >
> > - inhibited: weight=0 and no new connections should be served,
> >         packets for existing connections can be routed to server
> >         if it is still available and packets are not dropped
> >         by expire_nodest_conn.
> >         The new feature is that port reuse detection can
> >         redirect the new TCP connection into a new IPVS conn and
> >         to expire the existing cp/ct.
> >
> > - unavailable (!IP_VS_DEST_F_AVAILABLE): server is removed,
> >         can be temporary, drop traffic for existing connections
> >         but on expire_nodest_conn we can select different server
> >
> >         The new conn_reuse_mode flag allows port reuse to
> > be detected. Only then expire_nodest_conn has the
> > opportunity with commit dc7b3eb900aa to check weight=0
> > and to consider the old traffic as finished. If a new
> > server is selected, any retrans from previous connection
> > would be considered as part from the new connection. It
> > is a rapid way to switch server without checking with
> > is_new_conn_expected() because we can not have many
> > conns/conntracks to different servers.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ipvs: Fix reuse connection if RS weight is 0
  2021-10-26  5:44     ` Julian Anastasov
@ 2021-10-26  6:13       ` yangxingwu
  2021-10-27  1:43         ` yangxingwu
  0 siblings, 1 reply; 8+ messages in thread
From: yangxingwu @ 2021-10-26  6:13 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: Simon Horman, Pablo Neira Ayuso, kadlec, fw, David S. Miller,
	kuba, netdev, lvs-devel, netfilter-devel, coreteam, linux-kernel,
	linux-doc, corbet

thanks Julian

yes, I know that the one-second delay issue has been fixed by commit
f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f if we set conn_reuse_mode to
1

BUT  it's still NOT what we expected with sysctl settings
(conn_reuse_mode == 0 && expire_nodest_conn == 1).

We run kubernetes in extremely diverse environments and this issue
happens a lot.

On Tue, Oct 26, 2021 at 1:44 PM Julian Anastasov <ja@ssi.bg> wrote:
>
>
>         Hello,
>
> On Tue, 26 Oct 2021, yangxingwu wrote:
>
> > thanks julian
> >
> > What happens in this situation is that if we set the wait of the
> > realserver to 0 and do NOT remove the weight zero realserver with
> > sysctl settings (conn_reuse_mode == 0 && expire_nodest_conn == 1), and
> > the client reuses its source ports, the kernel will constantly
> > reuse connections and send the traffic to the weight 0 realserver.
>
>         Yes, this is expected when conn_reuse_mode=0.
>
> > you may check the details from
> > https://github.com/kubernetes/kubernetes/issues/81775
>
>         What happens if you try conn_reuse_mode=1? The
> one-second delay in previous kernels should be corrected with
>
> commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f
> Date:   Wed Jul 1 18:17:19 2020 +0300
>
>     ipvs: allow connection reuse for unconfirmed conntrack
>
> > On Tue, Oct 26, 2021 at 2:12 AM Julian Anastasov <ja@ssi.bg> wrote:
> > >
> > > On Mon, 25 Oct 2021, yangxingwu wrote:
> > >
> > > > Since commit dc7b3eb900aa ("ipvs: Fix reuse connection if real server is
> > > > dead"), new connections to dead servers are redistributed immediately to
> > > > new servers.
> > > >
> > > > Then commit d752c3645717 ("ipvs: allow rescheduling of new connections when
> > > > port reuse is detected") disable expire_nodest_conn if conn_reuse_mode is
> > > > 0. And new connection may be distributed to a real server with weight 0.
> > >
> > >         Your change does not look correct to me. At the time
> > > expire_nodest_conn was created, it was not checked when
> > > weight is 0. At different places different terms are used
> > > but in short, we have two independent states for real server:
> > >
> > > - inhibited: weight=0 and no new connections should be served,
> > >         packets for existing connections can be routed to server
> > >         if it is still available and packets are not dropped
> > >         by expire_nodest_conn.
> > >         The new feature is that port reuse detection can
> > >         redirect the new TCP connection into a new IPVS conn and
> > >         to expire the existing cp/ct.
> > >
> > > - unavailable (!IP_VS_DEST_F_AVAILABLE): server is removed,
> > >         can be temporary, drop traffic for existing connections
> > >         but on expire_nodest_conn we can select different server
> > >
> > >         The new conn_reuse_mode flag allows port reuse to
> > > be detected. Only then expire_nodest_conn has the
> > > opportunity with commit dc7b3eb900aa to check weight=0
> > > and to consider the old traffic as finished. If a new
> > > server is selected, any retrans from previous connection
> > > would be considered as part from the new connection. It
> > > is a rapid way to switch server without checking with
> > > is_new_conn_expected() because we can not have many
> > > conns/conntracks to different servers.
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ipvs: Fix reuse connection if RS weight is 0
  2021-10-26  6:13       ` yangxingwu
@ 2021-10-27  1:43         ` yangxingwu
  2021-10-27 21:09           ` Julian Anastasov
  0 siblings, 1 reply; 8+ messages in thread
From: yangxingwu @ 2021-10-27  1:43 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: Simon Horman, Pablo Neira Ayuso, kadlec, fw, David S. Miller,
	kuba, netdev, lvs-devel, netfilter-devel, coreteam, linux-kernel,
	linux-doc, corbet

Julian

what we want is if RS weight is 0, then no new connections should be
served even if conn_reuse_mode is 0, just as commit dc7b3eb900aa
("ipvs: Fix reuse connection if real server is
dead") trying to do

Pls let me know if there are any other issues of concern

On Tue, Oct 26, 2021 at 2:13 PM yangxingwu <xingwu.yang@gmail.com> wrote:
>
> thanks Julian
>
> yes, I know that the one-second delay issue has been fixed by commit
> f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f if we set conn_reuse_mode to
> 1
>
> BUT  it's still NOT what we expected with sysctl settings
> (conn_reuse_mode == 0 && expire_nodest_conn == 1).
>
> We run kubernetes in extremely diverse environments and this issue
> happens a lot.
>
> On Tue, Oct 26, 2021 at 1:44 PM Julian Anastasov <ja@ssi.bg> wrote:
> >
> >
> >         Hello,
> >
> > On Tue, 26 Oct 2021, yangxingwu wrote:
> >
> > > thanks julian
> > >
> > > What happens in this situation is that if we set the wait of the
> > > realserver to 0 and do NOT remove the weight zero realserver with
> > > sysctl settings (conn_reuse_mode == 0 && expire_nodest_conn == 1), and
> > > the client reuses its source ports, the kernel will constantly
> > > reuse connections and send the traffic to the weight 0 realserver.
> >
> >         Yes, this is expected when conn_reuse_mode=0.
> >
> > > you may check the details from
> > > https://github.com/kubernetes/kubernetes/issues/81775
> >
> >         What happens if you try conn_reuse_mode=1? The
> > one-second delay in previous kernels should be corrected with
> >
> > commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f
> > Date:   Wed Jul 1 18:17:19 2020 +0300
> >
> >     ipvs: allow connection reuse for unconfirmed conntrack
> >
> > > On Tue, Oct 26, 2021 at 2:12 AM Julian Anastasov <ja@ssi.bg> wrote:
> > > >
> > > > On Mon, 25 Oct 2021, yangxingwu wrote:
> > > >
> > > > > Since commit dc7b3eb900aa ("ipvs: Fix reuse connection if real server is
> > > > > dead"), new connections to dead servers are redistributed immediately to
> > > > > new servers.
> > > > >
> > > > > Then commit d752c3645717 ("ipvs: allow rescheduling of new connections when
> > > > > port reuse is detected") disable expire_nodest_conn if conn_reuse_mode is
> > > > > 0. And new connection may be distributed to a real server with weight 0.
> > > >
> > > >         Your change does not look correct to me. At the time
> > > > expire_nodest_conn was created, it was not checked when
> > > > weight is 0. At different places different terms are used
> > > > but in short, we have two independent states for real server:
> > > >
> > > > - inhibited: weight=0 and no new connections should be served,
> > > >         packets for existing connections can be routed to server
> > > >         if it is still available and packets are not dropped
> > > >         by expire_nodest_conn.
> > > >         The new feature is that port reuse detection can
> > > >         redirect the new TCP connection into a new IPVS conn and
> > > >         to expire the existing cp/ct.
> > > >
> > > > - unavailable (!IP_VS_DEST_F_AVAILABLE): server is removed,
> > > >         can be temporary, drop traffic for existing connections
> > > >         but on expire_nodest_conn we can select different server
> > > >
> > > >         The new conn_reuse_mode flag allows port reuse to
> > > > be detected. Only then expire_nodest_conn has the
> > > > opportunity with commit dc7b3eb900aa to check weight=0
> > > > and to consider the old traffic as finished. If a new
> > > > server is selected, any retrans from previous connection
> > > > would be considered as part from the new connection. It
> > > > is a rapid way to switch server without checking with
> > > > is_new_conn_expected() because we can not have many
> > > > conns/conntracks to different servers.
> >
> > Regards
> >
> > --
> > Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ipvs: Fix reuse connection if RS weight is 0
  2021-10-27  1:43         ` yangxingwu
@ 2021-10-27 21:09           ` Julian Anastasov
  2021-10-28  2:50             ` yangxingwu
  0 siblings, 1 reply; 8+ messages in thread
From: Julian Anastasov @ 2021-10-27 21:09 UTC (permalink / raw)
  To: yangxingwu
  Cc: Simon Horman, netdev, lvs-devel, netfilter-devel, linux-kernel,
	linux-doc


	Hello,

On Wed, 27 Oct 2021, yangxingwu wrote:

> what we want is if RS weight is 0, then no new connections should be
> served even if conn_reuse_mode is 0, just as commit dc7b3eb900aa
> ("ipvs: Fix reuse connection if real server is
> dead") trying to do
> 
> Pls let me know if there are any other issues of concern

	My concern is with the behaviour people expect
from each sysctl var: conn_reuse_mode decides if port reuse
is considered for rescheduling and expire_nodest_conn
should have priority only for unavailable servers (nodest means
No Destination), not in this case.

	We don't know how people use the conn_reuse_mode=0
mode, one may bind to a local port and try to send multiple
connections in a row with the hope they will go to same real
server, i.e. as part from same "session", even while weight=0.
If they do not want such behaviour (99% of the cases), they
will use the default conn_reuse_mode=1. OTOH, you have different
expectations for mode 0, not sure why but you do not want to use
the default mode=1 which is safer to use. May be the setups
forget to stay with conn_reuse_mode=1 on kernels 5.9+ and
set the var to 0 ?

	The problem with mentioned commit dc7b3eb900aa is that
it breaks FTP and persistent connections while the goal of
weight=0 is graceful inhibition of the server. We made
the mistake to add priority for expire_nodest_conn when weight=0.
This can be fixed with a !cp->control check. We do not want
expire_nodest_conn to kill every connection during the
graceful period.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ipvs: Fix reuse connection if RS weight is 0
  2021-10-27 21:09           ` Julian Anastasov
@ 2021-10-28  2:50             ` yangxingwu
  0 siblings, 0 replies; 8+ messages in thread
From: yangxingwu @ 2021-10-28  2:50 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: Simon Horman, netdev, lvs-devel, netfilter-devel, linux-kernel,
	linux-doc

hello

On Thu, Oct 28, 2021 at 5:09 AM Julian Anastasov <ja@ssi.bg> wrote:
>
>
>         Hello,
>
> On Wed, 27 Oct 2021, yangxingwu wrote:
>
> > what we want is if RS weight is 0, then no new connections should be
> > served even if conn_reuse_mode is 0, just as commit dc7b3eb900aa
> > ("ipvs: Fix reuse connection if real server is
> > dead") trying to do
> >
> > Pls let me know if there are any other issues of concern
>
>         My concern is with the behaviour people expect
> from each sysctl var: conn_reuse_mode decides if port reuse
> is considered for rescheduling and expire_nodest_conn
> should have priority only for unavailable servers (nodest means
> No Destination), not in this case.
>
>         We don't know how people use the conn_reuse_mode=0
> mode, one may bind to a local port and try to send multiple
> connections in a row with the hope they will go to same real
> server, i.e. as part from same "session", even while weight=0.
> If they do not want such behaviour (99% of the cases), they
> will use the default conn_reuse_mode=1. OTOH, you have different
> expectations for mode 0, not sure why but you do not want to use
> the default mode=1 which is safer to use. May be the setups
> forget to stay with conn_reuse_mode=1 on kernels 5.9+ and
> set the var to 0 ?

The problem is we can NOT decide what the customers do, many of them
run kubernetes with old versions of kube-proxy. And most importantly,
upgrade to new version is a very long and painful process, that's why
we want to fix this at the kernel level

>         The problem with mentioned commit dc7b3eb900aa is that
> it breaks FTP and persistent connections while the goal of
> weight=0 is graceful inhibition of the server. We made
> the mistake to add priority for expire_nodest_conn when weight=0.
> This can be fixed with a !cp->control check. We do not want
> expire_nodest_conn to kill every connection during the
> graceful period.

ok, got it, I will try to fix this problem

> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-10-28  2:50 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-25 11:59 [PATCH] ipvs: Fix reuse connection if RS weight is 0 yangxingwu
2021-10-25 18:12 ` Julian Anastasov
2021-10-26  2:54   ` yangxingwu
2021-10-26  5:44     ` Julian Anastasov
2021-10-26  6:13       ` yangxingwu
2021-10-27  1:43         ` yangxingwu
2021-10-27 21:09           ` Julian Anastasov
2021-10-28  2:50             ` yangxingwu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).