[PATCH RFC] ipvs: reschedule new connections if previous was on FIN_WAIT or TIME

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH RFC] ipvs: reschedule new connections if previous was on FIN_WAIT or TIME_WAIT
@ 2014-12-08 14:27 Marcelo Ricardo Leitner
  2014-12-08 14:29 ` Marcelo Ricardo Leitner
  2014-12-09 23:37 ` Julian Anastasov
  0 siblings, 2 replies; 16+ messages in thread
From: Marcelo Ricardo Leitner @ 2014-12-08 14:27 UTC (permalink / raw)
  To: lvs-devel; +Cc: hannes, jbrouer

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
---

Notes:
    Hi,

    We have a report that not doing so may cause poor load balacing if
    applications reuse src port. With a patch like this, it would make
    new SYNs on a given connection to drop the old one and start a new
    one.

    One could say that this reuse can be done on purpose and carefully
    as a way to cause poor load balancing to cause a DoS.

    Thing is, I'm unsure if we really should do this, as it may end up
    doing more harm than good.

    WDYT? And if we do additional checks, like at least validating seq
    number, would it be better?

    Thanks,
    Marcelo

 net/netfilter/ipvs/ip_vs_core.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 990decba1fe418e36e59a1f081fcf0e47188da29..e81a9ac3c7e4e25fb14953b7faa4ace054f51274 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1036,6 +1036,14 @@ static inline bool is_new_conn(const struct sk_buff *skb,
 	}
 }
 
+static inline bool is_new_conn_expected(const struct ip_vs_conn *cp)
+{
+	if (cp->protocol != IPPROTO_TCP)
+		return false;
+	return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
+		(cp->state == IP_VS_TCP_S_FIN_WAIT);
+}
+
 /* Handle response packets: rewrite addresses and send away...
  */
 static unsigned int
@@ -1642,9 +1650,10 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
 	 */
 	cp = pp->conn_in_get(af, skb, &iph, 0);
 
-	if (unlikely(sysctl_expire_nodest_conn(ipvs)) && cp && cp->dest &&
-	    unlikely(!atomic_read(&cp->dest->weight)) && !iph.fragoffs &&
-	    is_new_conn(skb, &iph)) {
+	if (cp && cp->dest && !iph.fragoffs && is_new_conn(skb, &iph) &&
+	    ((unlikely(sysctl_expire_nodest_conn(ipvs)) &&
+	      unlikely(!atomic_read(&cp->dest->weight))) ||
+	     unlikely(is_new_conn_expected(cp)))) {
 		ip_vs_conn_expire_now(cp);
 		__ip_vs_conn_put(cp);
 		cp = NULL;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] ipvs: reschedule new connections if previous was on FIN_WAIT or TIME_WAIT
  2014-12-08 14:27 [PATCH RFC] ipvs: reschedule new connections if previous was on FIN_WAIT or TIME_WAIT Marcelo Ricardo Leitner
@ 2014-12-08 14:29 ` Marcelo Ricardo Leitner
  2014-12-09 23:37 ` Julian Anastasov
  1 sibling, 0 replies; 16+ messages in thread
From: Marcelo Ricardo Leitner @ 2014-12-08 14:29 UTC (permalink / raw)
  To: lvs-devel

(I'll repost if the idea is accepted, this one is just for discussion)

On 08-12-2014 12:27, Marcelo Ricardo Leitner wrote:
> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> ---
>
> Notes:
>      Hi,
>
>      We have a report that not doing so may cause poor load balacing if
>      applications reuse src port. With a patch like this, it would make
>      new SYNs on a given connection to drop the old one and start a new
>      one.
>
>      One could say that this reuse can be done on purpose and carefully
>      as a way to cause poor load balancing to cause a DoS.
>
>      Thing is, I'm unsure if we really should do this, as it may end up
>      doing more harm than good.
>
>      WDYT? And if we do additional checks, like at least validating seq
>      number, would it be better?
>
>      Thanks,
>      Marcelo
>
>   net/netfilter/ipvs/ip_vs_core.c | 15 ++++++++++++---
>   1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
> index 990decba1fe418e36e59a1f081fcf0e47188da29..e81a9ac3c7e4e25fb14953b7faa4ace054f51274 100644
> --- a/net/netfilter/ipvs/ip_vs_core.c
> +++ b/net/netfilter/ipvs/ip_vs_core.c
> @@ -1036,6 +1036,14 @@ static inline bool is_new_conn(const struct sk_buff *skb,
>   	}
>   }
>
> +static inline bool is_new_conn_expected(const struct ip_vs_conn *cp)
> +{
> +	if (cp->protocol != IPPROTO_TCP)
> +		return false;
> +	return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
> +		(cp->state == IP_VS_TCP_S_FIN_WAIT);
> +}
> +
>   /* Handle response packets: rewrite addresses and send away...
>    */
>   static unsigned int
> @@ -1642,9 +1650,10 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
>   	 */
>   	cp = pp->conn_in_get(af, skb, &iph, 0);
>
> -	if (unlikely(sysctl_expire_nodest_conn(ipvs)) && cp && cp->dest &&
> -	    unlikely(!atomic_read(&cp->dest->weight)) && !iph.fragoffs &&
> -	    is_new_conn(skb, &iph)) {
> +	if (cp && cp->dest && !iph.fragoffs && is_new_conn(skb, &iph) &&
> +	    ((unlikely(sysctl_expire_nodest_conn(ipvs)) &&
> +	      unlikely(!atomic_read(&cp->dest->weight))) ||
> +	     unlikely(is_new_conn_expected(cp)))) {
>   		ip_vs_conn_expire_now(cp);
>   		__ip_vs_conn_put(cp);
>   		cp = NULL;
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] ipvs: reschedule new connections if previous was on FIN_WAIT or TIME_WAIT
  2014-12-08 14:27 [PATCH RFC] ipvs: reschedule new connections if previous was on FIN_WAIT or TIME_WAIT Marcelo Ricardo Leitner
  2014-12-08 14:29 ` Marcelo Ricardo Leitner
@ 2014-12-09 23:37 ` Julian Anastasov
  2014-12-10 12:34   ` Marcelo Ricardo Leitner
  1 sibling, 1 reply; 16+ messages in thread
From: Julian Anastasov @ 2014-12-09 23:37 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: lvs-devel, hannes, jbrouer


	Hello,

On Mon, 8 Dec 2014, Marcelo Ricardo Leitner wrote:

> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> ---
> 
> Notes:
>     Hi,
> 
>     We have a report that not doing so may cause poor load balacing if
>     applications reuse src port. With a patch like this, it would make
>     new SYNs on a given connection to drop the old one and start a new
>     one.

	People complained about UDP, such as RADIUS, etc.
I guess, for TCP it is some local client that can benefit
from balancing, it can be also a local testing tool.

>     One could say that this reuse can be done on purpose and carefully
>     as a way to cause poor load balancing to cause a DoS.
> 
>     Thing is, I'm unsure if we really should do this, as it may end up
>     doing more harm than good.
> 
>     WDYT? And if we do additional checks, like at least validating seq
>     number, would it be better?

	I think, checking of SEQ will not help, tw_recycle
works by checking the timestamp option, SEQ of SYN is
arbitrary.

	Also, not all connections can be expired, for
example controlling conns (FTP CTL), i.e. our attempt
to expire conn will not succeed if FTP DATA is still in
progress (cp->n_control != 0 check) or in some FW/TW state.

>     Thanks,
>     Marcelo
> 
>  net/netfilter/ipvs/ip_vs_core.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
> index 990decba1fe418e36e59a1f081fcf0e47188da29..e81a9ac3c7e4e25fb14953b7faa4ace054f51274 100644
> --- a/net/netfilter/ipvs/ip_vs_core.c
> +++ b/net/netfilter/ipvs/ip_vs_core.c
> @@ -1036,6 +1036,14 @@ static inline bool is_new_conn(const struct sk_buff *skb,
>  	}
>  }
>  
> +static inline bool is_new_conn_expected(const struct ip_vs_conn *cp)
> +{
> +	if (cp->protocol != IPPROTO_TCP)
> +		return false;
> +	return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
> +		(cp->state == IP_VS_TCP_S_FIN_WAIT);

	For conns with cp->flags & IP_VS_CONN_F_NOOUTPUT the
FIN_WAIT state is problematic, we see only packets from
client (eg. DR method) and can not tell if server has sent its
FIN packet. I.e. diverting large transfer from one real to
another will lead to sending of FIN+ACKs from client to
new place.

	So, I'm not sure if we should restrict the FW state, eg:

	return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
	       (cp->state == IP_VS_TCP_S_FIN_WAIT &&
		(cp->flags & IP_VS_CONN_F_NOOUTPUT) &&
		time_after_eq(jiffies + cp->timeout,
			      cp->timer.expires + 1 * HZ));

	I.e. for INPUT+OUTPUT IP_VS_TCP_S_FIN_WAIT is
not expired because only one side sent FIN. For
INPUT-ONLY we can allow expiration in IP_VS_TCP_S_FIN_WAIT,
for example, when FIN+ACK was not seen recently, say in
last second?

	If such checks look dangerous we can try to expire
only in IP_VS_TCP_S_TIME_WAIT.

> +}
> +
>  /* Handle response packets: rewrite addresses and send away...
>   */
>  static unsigned int
> @@ -1642,9 +1650,10 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
>  	 */
>  	cp = pp->conn_in_get(af, skb, &iph, 0);
>  
> -	if (unlikely(sysctl_expire_nodest_conn(ipvs)) && cp && cp->dest &&
> -	    unlikely(!atomic_read(&cp->dest->weight)) && !iph.fragoffs &&
> -	    is_new_conn(skb, &iph)) {
> +	if (cp && cp->dest && !iph.fragoffs && is_new_conn(skb, &iph) &&
> +	    ((unlikely(sysctl_expire_nodest_conn(ipvs)) &&
> +	      unlikely(!atomic_read(&cp->dest->weight))) ||
> +	     unlikely(is_new_conn_expected(cp)))) {

	This check can be optimized:

	if (!iph.fragoffs && is_new_conn(skb, &iph) && cp &&
	    ((unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
	      unlikely(!atomic_read(&cp->dest->weight))) ||
	     unlikely(is_new_conn_expected(cp)))) {

>  		ip_vs_conn_expire_now(cp);
>  		__ip_vs_conn_put(cp);
>  		cp = NULL;
> -- 
> 1.9.3

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] ipvs: reschedule new connections if previous was on FIN_WAIT or TIME_WAIT
  2014-12-09 23:37 ` Julian Anastasov
@ 2014-12-10 12:34   ` Marcelo Ricardo Leitner
  2014-12-10 16:56     ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 16+ messages in thread
From: Marcelo Ricardo Leitner @ 2014-12-10 12:34 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: lvs-devel, hannes, jbrouer

On 09-12-2014 21:37, Julian Anastasov wrote:
>
> 	Hello,
>
> On Mon, 8 Dec 2014, Marcelo Ricardo Leitner wrote:
>
>> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
>> ---
>>
>> Notes:
>>      Hi,
>>
>>      We have a report that not doing so may cause poor load balacing if
>>      applications reuse src port. With a patch like this, it would make
>>      new SYNs on a given connection to drop the old one and start a new
>>      one.
>
> 	People complained about UDP, such as RADIUS, etc.
> I guess, for TCP it is some local client that can benefit
> from balancing, it can be also a local testing tool.

So it's a new type of issue, ok. I don't have info on what their application 
is, just that it ends up reusing port numbers.

>>      One could say that this reuse can be done on purpose and carefully
>>      as a way to cause poor load balancing to cause a DoS.
>>
>>      Thing is, I'm unsure if we really should do this, as it may end up
>>      doing more harm than good.
>>
>>      WDYT? And if we do additional checks, like at least validating seq
>>      number, would it be better?
>
> 	I think, checking of SEQ will not help, tw_recycle
> works by checking the timestamp option, SEQ of SYN is
> arbitrary.

I thought for time wait assassination it would have to use a bigger seq 
together with the timestamps. But anyway, Windows (since Vista, even 8) 
doesn't use timestamps by default, unfortunately. Windows is not really 
involved, it's just to have that in mind, as by relying on timestamps we would 
be limiting the solution.

> 	Also, not all connections can be expired, for
> example controlling conns (FTP CTL), i.e. our attempt
> to expire conn will not succeed if FTP DATA is still in
> progress (cp->n_control != 0 check) or in some FW/TW state.

Interesting. I hadn't thought about this case.

>>      Thanks,
>>      Marcelo
>>
>>   net/netfilter/ipvs/ip_vs_core.c | 15 ++++++++++++---
>>   1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
>> index 990decba1fe418e36e59a1f081fcf0e47188da29..e81a9ac3c7e4e25fb14953b7faa4ace054f51274 100644
>> --- a/net/netfilter/ipvs/ip_vs_core.c
>> +++ b/net/netfilter/ipvs/ip_vs_core.c
>> @@ -1036,6 +1036,14 @@ static inline bool is_new_conn(const struct sk_buff *skb,
>>   	}
>>   }
>>
>> +static inline bool is_new_conn_expected(const struct ip_vs_conn *cp)
>> +{
>> +	if (cp->protocol != IPPROTO_TCP)
>> +		return false;
>> +	return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
>> +		(cp->state == IP_VS_TCP_S_FIN_WAIT);
>
> 	For conns with cp->flags & IP_VS_CONN_F_NOOUTPUT the
> FIN_WAIT state is problematic, we see only packets from
> client (eg. DR method) and can not tell if server has sent its
> FIN packet. I.e. diverting large transfer from one real to
> another will lead to sending of FIN+ACKs from client to
> new place.
>
> 	So, I'm not sure if we should restrict the FW state, eg:
>
> 	return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
> 	       (cp->state == IP_VS_TCP_S_FIN_WAIT &&
> 		(cp->flags & IP_VS_CONN_F_NOOUTPUT) &&
> 		time_after_eq(jiffies + cp->timeout,
> 			      cp->timer.expires + 1 * HZ));
>
> 	I.e. for INPUT+OUTPUT IP_VS_TCP_S_FIN_WAIT is
> not expired because only one side sent FIN. For

And when the other side send it too, it will be in TIME_WAIT then. Ok.

> INPUT-ONLY we can allow expiration in IP_VS_TCP_S_FIN_WAIT,
> for example, when FIN+ACK was not seen recently, say in
> last second?

That's interesting.

> 	If such checks look dangerous we can try to expire
> only in IP_VS_TCP_S_TIME_WAIT.

Or only enable it within some sysctl, like we have expire_nodest_conn? We 
could have a "(tcp_)rebalance_on_port_reuse" or something like that.

>> +}
>> +
>>   /* Handle response packets: rewrite addresses and send away...
>>    */
>>   static unsigned int
>> @@ -1642,9 +1650,10 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
>>   	 */
>>   	cp = pp->conn_in_get(af, skb, &iph, 0);
>>
>> -	if (unlikely(sysctl_expire_nodest_conn(ipvs)) && cp && cp->dest &&
>> -	    unlikely(!atomic_read(&cp->dest->weight)) && !iph.fragoffs &&
>> -	    is_new_conn(skb, &iph)) {
>> +	if (cp && cp->dest && !iph.fragoffs && is_new_conn(skb, &iph) &&
>> +	    ((unlikely(sysctl_expire_nodest_conn(ipvs)) &&
>> +	      unlikely(!atomic_read(&cp->dest->weight))) ||
>> +	     unlikely(is_new_conn_expected(cp)))) {
>
> 	This check can be optimized:
>
> 	if (!iph.fragoffs && is_new_conn(skb, &iph) && cp &&
> 	    ((unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
> 	      unlikely(!atomic_read(&cp->dest->weight))) ||
> 	     unlikely(is_new_conn_expected(cp)))) {

Yup. Okay.

I'll wait on the sysctl thing before posting as all the rest seems aligned. 
Thanks!

   Marcelo

>>   		ip_vs_conn_expire_now(cp);
>>   		__ip_vs_conn_put(cp);
>>   		cp = NULL;
>> --
>> 1.9.3
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] ipvs: reschedule new connections if previous was on FIN_WAIT or TIME_WAIT
  2014-12-10 12:34   ` Marcelo Ricardo Leitner
@ 2014-12-10 16:56     ` Marcelo Ricardo Leitner
  2014-12-10 21:27       ` Julian Anastasov
  0 siblings, 1 reply; 16+ messages in thread
From: Marcelo Ricardo Leitner @ 2014-12-10 16:56 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: lvs-devel, hannes, jbrouer

On 10-12-2014 10:34, Marcelo Ricardo Leitner wrote:
> On 09-12-2014 21:37, Julian Anastasov wrote:
>>
>>     Hello,
>>
>> On Mon, 8 Dec 2014, Marcelo Ricardo Leitner wrote:
>>
>>> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
>>> ---
>>>
>>> Notes:
>>>      Hi,
>>>
>>>      We have a report that not doing so may cause poor load balacing if
>>>      applications reuse src port. With a patch like this, it would make
>>>      new SYNs on a given connection to drop the old one and start a new
>>>      one.
>>
>>     People complained about UDP, such as RADIUS, etc.
>> I guess, for TCP it is some local client that can benefit
>> from balancing, it can be also a local testing tool.
>
> So it's a new type of issue, ok. I don't have info on what their application
> is, just that it ends up reusing port numbers.
>
>>>      One could say that this reuse can be done on purpose and carefully
>>>      as a way to cause poor load balancing to cause a DoS.
>>>
>>>      Thing is, I'm unsure if we really should do this, as it may end up
>>>      doing more harm than good.
>>>
>>>      WDYT? And if we do additional checks, like at least validating seq
>>>      number, would it be better?
>>
>>     I think, checking of SEQ will not help, tw_recycle
>> works by checking the timestamp option, SEQ of SYN is
>> arbitrary.
>
> I thought for time wait assassination it would have to use a bigger seq
> together with the timestamps. But anyway, Windows (since Vista, even 8)
> doesn't use timestamps by default, unfortunately. Windows is not really
> involved, it's just to have that in mind, as by relying on timestamps we would
> be limiting the solution.
>
>>     Also, not all connections can be expired, for
>> example controlling conns (FTP CTL), i.e. our attempt
>> to expire conn will not succeed if FTP DATA is still in
>> progress (cp->n_control != 0 check) or in some FW/TW state.
>
> Interesting. I hadn't thought about this case.
>
>>>      Thanks,
>>>      Marcelo
>>>
>>>   net/netfilter/ipvs/ip_vs_core.c | 15 ++++++++++++---
>>>   1 file changed, 12 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
>>> index
>>> 990decba1fe418e36e59a1f081fcf0e47188da29..e81a9ac3c7e4e25fb14953b7faa4ace054f51274
>>> 100644
>>> --- a/net/netfilter/ipvs/ip_vs_core.c
>>> +++ b/net/netfilter/ipvs/ip_vs_core.c
>>> @@ -1036,6 +1036,14 @@ static inline bool is_new_conn(const struct sk_buff
>>> *skb,
>>>       }
>>>   }
>>>
>>> +static inline bool is_new_conn_expected(const struct ip_vs_conn *cp)
>>> +{
>>> +    if (cp->protocol != IPPROTO_TCP)
>>> +        return false;
>>> +    return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
>>> +        (cp->state == IP_VS_TCP_S_FIN_WAIT);
>>
>>     For conns with cp->flags & IP_VS_CONN_F_NOOUTPUT the
>> FIN_WAIT state is problematic, we see only packets from
>> client (eg. DR method) and can not tell if server has sent its
>> FIN packet. I.e. diverting large transfer from one real to
>> another will lead to sending of FIN+ACKs from client to
>> new place.
>>
>>     So, I'm not sure if we should restrict the FW state, eg:
>>
>>     return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
>>            (cp->state == IP_VS_TCP_S_FIN_WAIT &&
>>         (cp->flags & IP_VS_CONN_F_NOOUTPUT) &&
>>         time_after_eq(jiffies + cp->timeout,
>>                   cp->timer.expires + 1 * HZ));
>>
>>     I.e. for INPUT+OUTPUT IP_VS_TCP_S_FIN_WAIT is
>> not expired because only one side sent FIN. For
>
> And when the other side send it too, it will be in TIME_WAIT then. Ok.
>
>> INPUT-ONLY we can allow expiration in IP_VS_TCP_S_FIN_WAIT,
>> for example, when FIN+ACK was not seen recently, say in
>> last second?
>
> That's interesting.
>
>>     If such checks look dangerous we can try to expire
>> only in IP_VS_TCP_S_TIME_WAIT.
>
> Or only enable it within some sysctl, like we have expire_nodest_conn? We
> could have a "(tcp_)rebalance_on_port_reuse" or something like that.

Actually, I'm more for this tunable than a timeout, as it would allow one to 
easily rebalance the cluster when one add new nodes. More like:

      return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
	    (sysctl_tcp_reschedule_on_syn &&
	     cp->state == IP_VS_TCP_S_FIN_WAIT &&
	     (cp->flags & IP_VS_CONN_F_NOOUTPUT));

Then we play safe, leave that sysctl_tcp_reschedule_on_syn off by default, but 
one may use it when applicable. WDYT?

   Marcelo


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] ipvs: reschedule new connections if previous was on FIN_WAIT or TIME_WAIT
  2014-12-10 16:56     ` Marcelo Ricardo Leitner
@ 2014-12-10 21:27       ` Julian Anastasov
  2014-12-11 12:44         ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 16+ messages in thread
From: Julian Anastasov @ 2014-12-10 21:27 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: lvs-devel, hannes, jbrouer


	Hello,

On Wed, 10 Dec 2014, Marcelo Ricardo Leitner wrote:

> > >     return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
> > >            (cp->state == IP_VS_TCP_S_FIN_WAIT &&
> > >         (cp->flags & IP_VS_CONN_F_NOOUTPUT) &&
> > >         time_after_eq(jiffies + cp->timeout,
> > >                   cp->timer.expires + 1 * HZ));
> > >
> > >     I.e. for INPUT+OUTPUT IP_VS_TCP_S_FIN_WAIT is
> > > not expired because only one side sent FIN. For
> >
> > And when the other side send it too, it will be in TIME_WAIT then. Ok.

	Yes, for IPVS-NAT TIME_WAIT means both sides closed.

> > >     If such checks look dangerous we can try to expire
> > > only in IP_VS_TCP_S_TIME_WAIT.
> >
> > Or only enable it within some sysctl, like we have expire_nodest_conn? We
> > could have a "(tcp_)rebalance_on_port_reuse" or something like that.
> 
> Actually, I'm more for this tunable than a timeout, as it would allow one to
> easily rebalance the cluster when one add new nodes. More like:
> 
>      return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
> 	    (sysctl_tcp_reschedule_on_syn &&
> 	     cp->state == IP_VS_TCP_S_FIN_WAIT &&
> 	     (cp->flags & IP_VS_CONN_F_NOOUTPUT));
> 
> Then we play safe, leave that sysctl_tcp_reschedule_on_syn off by default, but
> one may use it when applicable. WDYT?

	Agreed, a sysctl var is ok. Only that my preference
is to have sysctl var that is checked even before
is_new_conn() call to save some CPU cycles for a rare packet
such as SYN :)

	In theory, skb_header_pointer() is fast and we will
get pointer to header in the common case (without copy) but
checks here and there cost some cycles...

	One option would be to implement "conn_reuse_mode"
as a bitmask (for now 2 bits), values 0, 1 or 3:

0: do not call is_new_conn(), even for sysctl_expire_nodest_conn()
purposes. Fastest but without any expiation (reuse always).

1: default: TIME_WAIT-only check for TCP,
allow the sysctl_expire_nodest_conn check for TCP/SCTP.
In fact, check for this state is the global conn_reuse_mode != 0.

3 (&2 if treated as bitmask): like 1 but also
allows DR/TUN in FIN_WAIT to expire (int conn_reuse_mode can
come as argument):

	return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
	       ((conn_reuse_mode & 2) &&
		cp->state == IP_VS_TCP_S_FIN_WAIT &&
		(cp->flags & IP_VS_CONN_F_NOOUTPUT));

	In fact, 2 and 3 will do the same.

	Then we will have:

	int conn_reuse_mode;

	conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
	if (conn_reuse_mode && !iph.fragoffs &&
	    is_new_conn(skb, &iph) && cp &&
	    ...
	    unlikely(is_new_conn_expected(cp, conn_reuse_mode)))) {

	conn_reuse_mode=0 will work as global disable for reuse.
But 1 will be the default value.

	You can grep in following way to see the places that
define sysctl vars, "backup_only" is a good example to use
as reference:
# grep -r backup_only include/net/ip_vs.h net/netfilter/ipvs/

	Order of vars in struct, vs_vars and
ip_vs_control_net_init_sysctl() should be same, we
have per-netns vars. You can add the new var after backup_only.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] ipvs: reschedule new connections if previous was on FIN_WAIT or TIME_WAIT
  2014-12-10 21:27       ` Julian Anastasov
@ 2014-12-11 12:44         ` Marcelo Ricardo Leitner
  2014-12-11 22:39           ` Julian Anastasov
  0 siblings, 1 reply; 16+ messages in thread
From: Marcelo Ricardo Leitner @ 2014-12-11 12:44 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: lvs-devel, hannes, jbrouer

On 10-12-2014 19:27, Julian Anastasov wrote:
>
> 	Hello,
>
> On Wed, 10 Dec 2014, Marcelo Ricardo Leitner wrote:
>
>>>>      return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
>>>>             (cp->state == IP_VS_TCP_S_FIN_WAIT &&
>>>>          (cp->flags & IP_VS_CONN_F_NOOUTPUT) &&
>>>>          time_after_eq(jiffies + cp->timeout,
>>>>                    cp->timer.expires + 1 * HZ));
>>>>
>>>>      I.e. for INPUT+OUTPUT IP_VS_TCP_S_FIN_WAIT is
>>>> not expired because only one side sent FIN. For
>>>
>>> And when the other side send it too, it will be in TIME_WAIT then. Ok.
>
> 	Yes, for IPVS-NAT TIME_WAIT means both sides closed.
>
>>>>      If such checks look dangerous we can try to expire
>>>> only in IP_VS_TCP_S_TIME_WAIT.
>>>
>>> Or only enable it within some sysctl, like we have expire_nodest_conn? We
>>> could have a "(tcp_)rebalance_on_port_reuse" or something like that.
>>
>> Actually, I'm more for this tunable than a timeout, as it would allow one to
>> easily rebalance the cluster when one add new nodes. More like:
>>
>>       return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
>> 	    (sysctl_tcp_reschedule_on_syn &&
>> 	     cp->state == IP_VS_TCP_S_FIN_WAIT &&
>> 	     (cp->flags & IP_VS_CONN_F_NOOUTPUT));
>>
>> Then we play safe, leave that sysctl_tcp_reschedule_on_syn off by default, but
>> one may use it when applicable. WDYT?
>
> 	Agreed, a sysctl var is ok. Only that my preference
> is to have sysctl var that is checked even before
> is_new_conn() call to save some CPU cycles for a rare packet
> such as SYN :)
>
> 	In theory, skb_header_pointer() is fast and we will
> get pointer to header in the common case (without copy) but
> checks here and there cost some cycles...
>
> 	One option would be to implement "conn_reuse_mode"
> as a bitmask (for now 2 bits), values 0, 1 or 3:
>
> 0: do not call is_new_conn(), even for sysctl_expire_nodest_conn()
> purposes. Fastest but without any expiation (reuse always).
>
> 1: default: TIME_WAIT-only check for TCP,
> allow the sysctl_expire_nodest_conn check for TCP/SCTP.
> In fact, check for this state is the global conn_reuse_mode != 0.
>
> 3 (&2 if treated as bitmask): like 1 but also
> allows DR/TUN in FIN_WAIT to expire (int conn_reuse_mode can
> come as argument):
>
> 	return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
> 	       ((conn_reuse_mode & 2) &&
> 		cp->state == IP_VS_TCP_S_FIN_WAIT &&
> 		(cp->flags & IP_VS_CONN_F_NOOUTPUT));
>
> 	In fact, 2 and 3 will do the same.
>
> 	Then we will have:
>
> 	int conn_reuse_mode;
>
> 	conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
> 	if (conn_reuse_mode && !iph.fragoffs &&
> 	    is_new_conn(skb, &iph) && cp &&
> 	    ...
> 	    unlikely(is_new_conn_expected(cp, conn_reuse_mode)))) {

Looks great! Thanks for all the directions.

One question: at this if(), shouldn't we also check for &cp->n_control or you 
considered it in the '...'? Because as you said, if this connection controls 
another one we won't expire it and AFAICT we may end up with two duplicate 
entries for the given src host/port/service.

Currently I'm with:
         if (conn_reuse_mode && !iph.fragoffs &&
             is_new_conn(skb, &iph) && cp &&
             !atomic_read(&cp->n_control) &&
             ((unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
               unlikely(!atomic_read(&cp->dest->weight))) ||
              unlikely(is_new_conn_expected(cp, conn_reuse_mode)))) {

Thanks,
Marcelo

> 	conn_reuse_mode=0 will work as global disable for reuse.
> But 1 will be the default value.
>
> 	You can grep in following way to see the places that
> define sysctl vars, "backup_only" is a good example to use
> as reference:
> # grep -r backup_only include/net/ip_vs.h net/netfilter/ipvs/
>
> 	Order of vars in struct, vs_vars and
> ip_vs_control_net_init_sysctl() should be same, we
> have per-netns vars. You can add the new var after backup_only.
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] ipvs: reschedule new connections if previous was on FIN_WAIT or TIME_WAIT
  2014-12-11 12:44         ` Marcelo Ricardo Leitner
@ 2014-12-11 22:39           ` Julian Anastasov
  2014-12-12 11:58             ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 16+ messages in thread
From: Julian Anastasov @ 2014-12-11 22:39 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: lvs-devel, hannes, jbrouer


	Hello,

On Thu, 11 Dec 2014, Marcelo Ricardo Leitner wrote:

> On 10-12-2014 19:27, Julian Anastasov wrote:
> >
> >  int conn_reuse_mode;
> >
> >  conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
> >  if (conn_reuse_mode && !iph.fragoffs &&
> >      is_new_conn(skb, &iph) && cp &&
> >      ...
> >      unlikely(is_new_conn_expected(cp, conn_reuse_mode)))) {
> 
> Looks great! Thanks for all the directions.
> 
> One question: at this if(), shouldn't we also check for &cp->n_control or you
> considered it in the '...'? Because as you said, if this connection controls
> another one we won't expire it and AFAICT we may end up with two duplicate
> entries for the given src host/port/service.
> 
> Currently I'm with:
>         if (conn_reuse_mode && !iph.fragoffs &&
>             is_new_conn(skb, &iph) && cp &&
>             !atomic_read(&cp->n_control) &&
>             ((unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
>              unlikely(!atomic_read(&cp->dest->weight))) ||
>              unlikely(is_new_conn_expected(cp, conn_reuse_mode)))) {

	I think, the cp->n_control check is not needed,
we can live with many connections... in master server.

	But it looks like the new mechanism opens the door
for problems with the sync protocol. The master will
send sync messages for the new connection that we
create but backup server will hit the old connection
that can be to another real server (daddr). Backup will update
the cp->flags and cp->state but cp->dest can not be changed.
Our old (expired) connection will try to send sync messages
but as its state does not change ip_vs_sync_conn_needed()
will prevent the sending for FW/TW. Even if the old connection
does not cause problems, the new connection is not properly synced,
ip_vs_proc_conn() will call ip_vs_conn_in_get() and
we do not notice that daddr/dport differs.

	May be it can be solved by adding daddr+dport
comparison after the ip_vs_conn_in_get() call in
ip_vs_proc_conn(), so that we can support multiple
connections that differ in daddr+dport. Or even
daddr+dport check in ip_vs_conn_in_get just for the backup.
But even in this case there is a risk because the sync messages
can come in random order, for example, a short connection
is expired in TW state, we create new connection but
the backup server gets sync messages in reverse order,
i.e. first for the new connection.

	If above is not implemented, the
'!(ipvs->sync_state & IP_VS_STATE_MASTER) &&' check
should prevent the expiration. We will create new connection
due to weight=0 but backup can remain with wrong daddr
from the old connection, that is the price we pay on
rescheduling to different real server.

	And may be we have to add also a cp->control check
in is_new_conn_expected(), the reason: persistence
should have priority. 'cp->control' check should be in
is_new_conn_expected(). IIRC, people use expire_nodest_conn
even for setups with persistence, so cp->control should
not prevent rescheduling in this case. cp->control also covers
the case with FTP DATA, such conns should use the same
real server.

	What about:

static inline bool can_expire_connection()
{
	/* Controlled (FTP DATA or persistence)? */
	if (cp->control)
		return false;
	---> Below is the previous version, unchanged:
        return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
               ((conn_reuse_mode & 2) &&
                cp->state == IP_VS_TCP_S_FIN_WAIT &&
                (cp->flags & IP_VS_CONN_F_NOOUTPUT));
}

	New version:

        conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
        if (conn_reuse_mode && !iph.fragoffs &&
            is_new_conn(skb, &iph) && cp &&
	    ((unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
	      unlikely(!atomic_read(&cp->dest->weight))) ||
	     (!(ipvs->sync_state & IP_VS_STATE_MASTER) &&
	      unlikely(can_expire_connection(cp, conn_reuse_mode))))) {

	As result, actions by priority are as follows:

- reschedule for expire_nodest_conn=1 when weight=0, even
if daddr/dport can become different in backup, even on persistence

- sync protocol is active: reuse to prevent conns with different
real server in master and backup

- reuse on persistence

- reschedule on FW/TW: when no sync and no persistence

- reuse, by default

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] ipvs: reschedule new connections if previous was on FIN_WAIT or TIME_WAIT
  2014-12-11 22:39           ` Julian Anastasov
@ 2014-12-12 11:58             ` Marcelo Ricardo Leitner
  2014-12-30 19:49               ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 16+ messages in thread
From: Marcelo Ricardo Leitner @ 2014-12-12 11:58 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: lvs-devel, hannes, jbrouer

Hi,

On 11-12-2014 20:39, Julian Anastasov wrote:
>
> 	Hello,
>
> On Thu, 11 Dec 2014, Marcelo Ricardo Leitner wrote:
>
>> On 10-12-2014 19:27, Julian Anastasov wrote:
>>>
>>>   int conn_reuse_mode;
>>>
>>>   conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
>>>   if (conn_reuse_mode && !iph.fragoffs &&
>>>       is_new_conn(skb, &iph) && cp &&
>>>       ...
>>>       unlikely(is_new_conn_expected(cp, conn_reuse_mode)))) {
>>
>> Looks great! Thanks for all the directions.
>>
>> One question: at this if(), shouldn't we also check for &cp->n_control or you
>> considered it in the '...'? Because as you said, if this connection controls
>> another one we won't expire it and AFAICT we may end up with two duplicate
>> entries for the given src host/port/service.
>>
>> Currently I'm with:
>>          if (conn_reuse_mode && !iph.fragoffs &&
>>              is_new_conn(skb, &iph) && cp &&
>>              !atomic_read(&cp->n_control) &&
>>              ((unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
>>               unlikely(!atomic_read(&cp->dest->weight))) ||
>>               unlikely(is_new_conn_expected(cp, conn_reuse_mode)))) {
>
> 	I think, the cp->n_control check is not needed,
> we can live with many connections... in master server.

Okay, good.

> 	But it looks like the new mechanism opens the door
> for problems with the sync protocol. The master will
> send sync messages for the new connection that we
> create but backup server will hit the old connection
> that can be to another real server (daddr). Backup will update
> the cp->flags and cp->state but cp->dest can not be changed.
> Our old (expired) connection will try to send sync messages
> but as its state does not change ip_vs_sync_conn_needed()
> will prevent the sending for FW/TW. Even if the old connection
> does not cause problems, the new connection is not properly synced,
> ip_vs_proc_conn() will call ip_vs_conn_in_get() and
> we do not notice that daddr/dport differs.
>
> 	May be it can be solved by adding daddr+dport
> comparison after the ip_vs_conn_in_get() call in
> ip_vs_proc_conn(), so that we can support multiple
> connections that differ in daddr+dport. Or even
> daddr+dport check in ip_vs_conn_in_get just for the backup.
> But even in this case there is a risk because the sync messages
> can come in random order, for example, a short connection
> is expired in TW state, we create new connection but
> the backup server gets sync messages in reverse order,
> i.e. first for the new connection.
>
> 	If above is not implemented, the
> '!(ipvs->sync_state & IP_VS_STATE_MASTER) &&' check
> should prevent the expiration. We will create new connection
> due to weight=0 but backup can remain with wrong daddr
> from the old connection, that is the price we pay on
> rescheduling to different real server.
>
> 	And may be we have to add also a cp->control check
> in is_new_conn_expected(), the reason: persistence
> should have priority. 'cp->control' check should be in
> is_new_conn_expected(). IIRC, people use expire_nodest_conn
> even for setups with persistence, so cp->control should
> not prevent rescheduling in this case. cp->control also covers
> the case with FTP DATA, such conns should use the same
> real server.
>
> 	What about:
>
> static inline bool can_expire_connection()
> {
> 	/* Controlled (FTP DATA or persistence)? */
> 	if (cp->control)
> 		return false;
> 	---> Below is the previous version, unchanged:
>          return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
>                 ((conn_reuse_mode & 2) &&
>                  cp->state == IP_VS_TCP_S_FIN_WAIT &&
>                  (cp->flags & IP_VS_CONN_F_NOOUTPUT));
> }
>
> 	New version:
>
>          conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
>          if (conn_reuse_mode && !iph.fragoffs &&
>              is_new_conn(skb, &iph) && cp &&
> 	    ((unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
> 	      unlikely(!atomic_read(&cp->dest->weight))) ||
> 	     (!(ipvs->sync_state & IP_VS_STATE_MASTER) &&
> 	      unlikely(can_expire_connection(cp, conn_reuse_mode))))) {
>
> 	As result, actions by priority are as follows:
>
> - reschedule for expire_nodest_conn=1 when weight=0, even
> if daddr/dport can become different in backup, even on persistence
>
> - sync protocol is active: reuse to prevent conns with different
> real server in master and backup
>
> - reuse on persistence
>
> - reschedule on FW/TW: when no sync and no persistence
>
> - reuse, by default

TBH now I have to read more on the sync mechanism, it's been a while. I'll try 
going with the first option, a way to get this properly synced to backup 
server instead of disabling it when with sync enabled, because this last would 
put a severe limitation on this feature. I haven't seen many single node setups...

I'll get back to you by Tue I think.

Thanks,
Marcelo


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] ipvs: reschedule new connections if previous was on FIN_WAIT or TIME_WAIT
  2014-12-12 11:58             ` Marcelo Ricardo Leitner
@ 2014-12-30 19:49               ` Marcelo Ricardo Leitner
  2015-01-05 22:26                 ` Julian Anastasov
  0 siblings, 1 reply; 16+ messages in thread
From: Marcelo Ricardo Leitner @ 2014-12-30 19:49 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: lvs-devel, hannes, jbrouer

Hi Julian,

On 12-12-2014 09:58, Marcelo Ricardo Leitner wrote:
> Hi,
> 
> On 11-12-2014 20:39, Julian Anastasov wrote:
>>
>>     Hello,
>>
>> On Thu, 11 Dec 2014, Marcelo Ricardo Leitner wrote:
>>
>>> On 10-12-2014 19:27, Julian Anastasov wrote:
>>>>
>>>>   int conn_reuse_mode;
>>>>
>>>>   conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
>>>>   if (conn_reuse_mode && !iph.fragoffs &&
>>>>       is_new_conn(skb, &iph) && cp &&
>>>>       ...
>>>>       unlikely(is_new_conn_expected(cp, conn_reuse_mode)))) {
>>>
>>> Looks great! Thanks for all the directions.
>>>
>>> One question: at this if(), shouldn't we also check for &cp->n_control or you
>>> considered it in the '...'? Because as you said, if this connection controls
>>> another one we won't expire it and AFAICT we may end up with two duplicate
>>> entries for the given src host/port/service.
>>>
>>> Currently I'm with:
>>>          if (conn_reuse_mode && !iph.fragoffs &&
>>>              is_new_conn(skb, &iph) && cp &&
>>>              !atomic_read(&cp->n_control) &&
>>>              ((unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
>>>               unlikely(!atomic_read(&cp->dest->weight))) ||
>>>               unlikely(is_new_conn_expected(cp, conn_reuse_mode)))) {
>>
>>     I think, the cp->n_control check is not needed,
>> we can live with many connections... in master server.
> 
> Okay, good.
> 
>>     But it looks like the new mechanism opens the door
>> for problems with the sync protocol. The master will
>> send sync messages for the new connection that we
>> create but backup server will hit the old connection
>> that can be to another real server (daddr). Backup will update
>> the cp->flags and cp->state but cp->dest can not be changed.
>> Our old (expired) connection will try to send sync messages
>> but as its state does not change ip_vs_sync_conn_needed()
>> will prevent the sending for FW/TW. Even if the old connection
>> does not cause problems, the new connection is not properly synced,
>> ip_vs_proc_conn() will call ip_vs_conn_in_get() and
>> we do not notice that daddr/dport differs.
>>
>>     May be it can be solved by adding daddr+dport
>> comparison after the ip_vs_conn_in_get() call in
>> ip_vs_proc_conn(), so that we can support multiple
>> connections that differ in daddr+dport. Or even
>> daddr+dport check in ip_vs_conn_in_get just for the backup.
>> But even in this case there is a risk because the sync messages
>> can come in random order, for example, a short connection
>> is expired in TW state, we create new connection but
>> the backup server gets sync messages in reverse order,
>> i.e. first for the new connection.
>>
>>     If above is not implemented, the
>> '!(ipvs->sync_state & IP_VS_STATE_MASTER) &&' check
>> should prevent the expiration. We will create new connection
>> due to weight=0 but backup can remain with wrong daddr
>> from the old connection, that is the price we pay on
>> rescheduling to different real server.
>>
>>     And may be we have to add also a cp->control check
>> in is_new_conn_expected(), the reason: persistence
>> should have priority. 'cp->control' check should be in
>> is_new_conn_expected(). IIRC, people use expire_nodest_conn
>> even for setups with persistence, so cp->control should
>> not prevent rescheduling in this case. cp->control also covers
>> the case with FTP DATA, such conns should use the same
>> real server.
>>
>>     What about:
>>
>> static inline bool can_expire_connection()
>> {
>>     /* Controlled (FTP DATA or persistence)? */
>>     if (cp->control)
>>         return false;
>>     ---> Below is the previous version, unchanged:
>>          return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
>>                 ((conn_reuse_mode & 2) &&
>>                  cp->state == IP_VS_TCP_S_FIN_WAIT &&
>>                  (cp->flags & IP_VS_CONN_F_NOOUTPUT));
>> }
>>
>>     New version:
>>
>>          conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
>>          if (conn_reuse_mode && !iph.fragoffs &&
>>              is_new_conn(skb, &iph) && cp &&
>>         ((unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
>>           unlikely(!atomic_read(&cp->dest->weight))) ||
>>          (!(ipvs->sync_state & IP_VS_STATE_MASTER) &&
>>           unlikely(can_expire_connection(cp, conn_reuse_mode))))) {
>>
>>     As result, actions by priority are as follows:
>>
>> - reschedule for expire_nodest_conn=1 when weight=0, even
>> if daddr/dport can become different in backup, even on persistence
>>
>> - sync protocol is active: reuse to prevent conns with different
>> real server in master and backup
>>
>> - reuse on persistence
>>
>> - reschedule on FW/TW: when no sync and no persistence
>>
>> - reuse, by default
> 
> TBH now I have to read more on the sync mechanism, it's been a while. I'll try 
> going with the first option, a way to get this properly synced to backup 
> server instead of disabling it when with sync enabled, because this last would 
> put a severe limitation on this feature. I haven't seen many single node 
> setups...

What if we make the backup server purge the old entry, just like the active server does, and ignore old updates if needed? Please note the new hunk on ip_vs_sync.c. This is the patch I'm using so far, and I'm seeing good results with even with master/backup. Backup server successfully purges the old and creates a new entry when it detects such update:

[  344.971673] IPVS: Bind-dest TCP c:192.168.122.1:22222 v:192.168.122.200:80 d:192.168.122.67:80 fwd:R s:0 conn->flags:A3 conn->refcnt:1 dest->refcnt:2
[  344.972071] IPVS: Unbind-dest TCP c:192.168.122.1:22222 v:192.168.122.200:80 d:192.168.122.95:80 fwd:R s:4 conn->flags:1A3 conn->refcnt:0 dest->refcnt:2

-- 8< --

diff --git a/Documentation/networking/ipvs-sysctl.txt b/Documentation/networking/ipvs-sysctl.txt
index 7a3c047295914cbc8c4273506a9b6d35246a1750..3ba709531adba970595251fa73d6d471ed14c5c1 100644
--- a/Documentation/networking/ipvs-sysctl.txt
+++ b/Documentation/networking/ipvs-sysctl.txt
@@ -22,6 +22,27 @@ backup_only - BOOLEAN
 	If set, disable the director function while the server is
 	in backup mode to avoid packet loops for DR/TUN methods.
 
+conn_reuse_mode - INTEGER
+	1 - default
+
+	Controls how ipvs will deal with connections that are detected
+	port reuse. It is a bitmap, with the values being:
+
+	0: disable any special handling on port reuse. The new
+	connection will be delivered to the same real server that was
+	servicing the previous connection. This will effectively
+	disable expire_nodest_conn.
+
+	bit 1: enable rescheduling of new connections when it is safe.
+	That is, whenever expire_nodest_conn and for TCP sockets, when
+	the connection is in TIME_WAIT state (which is only possible if
+	you use NAT mode).
+
+	bit 2: it is bit 1 plus, for TCP connections, when connections
+	are in FIN_WAIT state, as this is the last state seen by load
+	balancer in Direct Routing mode. This bit helps on adding new
+	real servers to a very busy cluster.
+
 conntrack - BOOLEAN
 	0 - disabled (default)
 	not 0 - enabled
diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 615b20b585452111a25085890d8fa875657dbe76..6c7ee0ae7ef1694671e4b6af0906b2fa077f5c7c 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -924,6 +924,7 @@ struct netns_ipvs {
 	int			sysctl_nat_icmp_send;
 	int			sysctl_pmtu_disc;
 	int			sysctl_backup_only;
+	int			sysctl_conn_reuse_mode;
 
 	/* ip_vs_lblc */
 	int			sysctl_lblc_expiration;
@@ -1042,6 +1043,11 @@ static inline int sysctl_backup_only(struct netns_ipvs *ipvs)
 	       ipvs->sysctl_backup_only;
 }
 
+static inline int sysctl_conn_reuse_mode(struct netns_ipvs *ipvs)
+{
+	return ipvs->sysctl_conn_reuse_mode;
+}
+
 #else
 
 static inline int sysctl_sync_threshold(struct netns_ipvs *ipvs)
@@ -1109,6 +1115,11 @@ static inline int sysctl_backup_only(struct netns_ipvs *ipvs)
 	return 0;
 }
 
+static inline int sysctl_conn_reuse_mode(struct netns_ipvs *ipvs)
+{
+	return 1;
+}
+
 #endif
 
 /* IPVS core functions
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 990decba1fe418e36e59a1f081fcf0e47188da29..da2760496314d49a5b4f6f588de7cb565a28e223 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1036,6 +1036,20 @@ static inline bool is_new_conn(const struct sk_buff *skb,
 	}
 }
 
+static inline bool is_new_conn_expected(const struct ip_vs_conn *cp,
+					int conn_reuse_mode)
+{
+	if ((cp->protocol != IPPROTO_TCP) && (cp->protocol != IPPROTO_SCTP))
+		return false;
+	/* Controlled (FTP DATA or persistence)? */
+	if (cp->control)
+		return false;
+	return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
+	       ((conn_reuse_mode & 2) &&
+		(cp->state == IP_VS_TCP_S_FIN_WAIT) &&
+		(cp->flags & IP_VS_CONN_F_NOOUTPUT));
+}
+
 /* Handle response packets: rewrite addresses and send away...
  */
 static unsigned int
@@ -1574,6 +1588,7 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
 	struct ip_vs_conn *cp;
 	int ret, pkts;
 	struct netns_ipvs *ipvs;
+	int conn_reuse_mode;
 
 	/* Already marked as IPVS request or reply? */
 	if (skb->ipvs_property)
@@ -1642,9 +1657,13 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
 	 */
 	cp = pp->conn_in_get(af, skb, &iph, 0);
 
-	if (unlikely(sysctl_expire_nodest_conn(ipvs)) && cp && cp->dest &&
-	    unlikely(!atomic_read(&cp->dest->weight)) && !iph.fragoffs &&
-	    is_new_conn(skb, &iph)) {
+	conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
+	if (conn_reuse_mode && !iph.fragoffs &&
+	    is_new_conn(skb, &iph) && cp &&
+	    !atomic_read(&cp->n_control) &&
+	    ((unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
+	      unlikely(!atomic_read(&cp->dest->weight))) ||
+	     unlikely(is_new_conn_expected(cp, conn_reuse_mode)))) {
 		ip_vs_conn_expire_now(cp);
 		__ip_vs_conn_put(cp);
 		cp = NULL;
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index b8295a430a5600d35b6de4163ba3b98e75c5f28c..4a24879c6aefd55cd0d65b08efc191febb45c19d 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -1808,6 +1808,12 @@ static struct ctl_table vs_vars[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+	{
+		.procname	= "conn_reuse_mode",
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
 #ifdef CONFIG_IP_VS_DEBUG
 	{
 		.procname	= "debug_level",
@@ -3729,6 +3735,8 @@ static int __net_init ip_vs_control_net_init_sysctl(struct net *net)
 	ipvs->sysctl_pmtu_disc = 1;
 	tbl[idx++].data = &ipvs->sysctl_pmtu_disc;
 	tbl[idx++].data = &ipvs->sysctl_backup_only;
+	ipvs->sysctl_conn_reuse_mode = 1;
+	tbl[idx++].data = &ipvs->sysctl_conn_reuse_mode;
 
 
 	ipvs->sysctl_hdr = register_net_sysctl(net, "net/ipv4/vs", tbl);
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index c47ffd7a0a709cb73834c84652f251960f25db79..7ed4484c2c93e0aed89be55c773c94f3c80cc09e 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -850,6 +850,21 @@ static void ip_vs_proc_conn(struct net *net, struct ip_vs_conn_param *param,
 	else
 		cp = ip_vs_ct_in_get(param);
 
+	if (cp && ((cp->dport != dport) ||
+		   !ip_vs_addr_equal(cp->af, &cp->daddr, daddr))) {
+		if (!(flags & IP_VS_CONN_F_INACTIVE)) {
+			ip_vs_conn_expire_now(cp);
+			__ip_vs_conn_put(cp);
+			cp = NULL;
+		} else {
+			/* This is the expiration message for the
+			 * connection that was already replaced, so we
+			 * just ignore it.
+			 */
+			goto out;
+		}
+	}
+
 	if (cp) {
 		/* Free pe_data */
 		kfree(param->pe_data);
@@ -925,6 +940,8 @@ static void ip_vs_proc_conn(struct net *net, struct ip_vs_conn_param *param,
 		else
 			cp->timeout = (3*60*HZ);
 	}
+
+out:
 	ip_vs_conn_put(cp);
 }
 
-- 
1.9.3




^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] ipvs: reschedule new connections if previous was on FIN_WAIT or TIME_WAIT
  2014-12-30 19:49               ` Marcelo Ricardo Leitner
@ 2015-01-05 22:26                 ` Julian Anastasov
  2015-01-06 12:12                   ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 16+ messages in thread
From: Julian Anastasov @ 2015-01-05 22:26 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: lvs-devel, hannes, jbrouer


	Hello,

On Tue, 30 Dec 2014, Marcelo Ricardo Leitner wrote:

> What if we make the backup server purge the old entry, just like the active server does, and ignore old updates if needed? Please note the new hunk on ip_vs_sync.c. This is the patch I'm using so far, and I'm seeing good results with even with master/backup. Backup server successfully purges the old and creates a new entry when it detects such update:
> 
> [  344.971673] IPVS: Bind-dest TCP c:192.168.122.1:22222 v:192.168.122.200:80 d:192.168.122.67:80 fwd:R s:0 conn->flags:A3 conn->refcnt:1 dest->refcnt:2
> [  344.972071] IPVS: Unbind-dest TCP c:192.168.122.1:22222 v:192.168.122.200:80 d:192.168.122.95:80 fwd:R s:4 conn->flags:1A3 conn->refcnt:0 dest->refcnt:2

	Good idea

> -- 8< --
> 
>  /* IPVS core functions
> diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
> index 990decba1fe418e36e59a1f081fcf0e47188da29..da2760496314d49a5b4f6f588de7cb565a28e223 100644
> --- a/net/netfilter/ipvs/ip_vs_core.c
> +++ b/net/netfilter/ipvs/ip_vs_core.c
> @@ -1036,6 +1036,20 @@ static inline bool is_new_conn(const struct sk_buff *skb,
>  	}
>  }
>  
> +static inline bool is_new_conn_expected(const struct ip_vs_conn *cp,
> +					int conn_reuse_mode)
> +{
> +	if ((cp->protocol != IPPROTO_TCP) && (cp->protocol != IPPROTO_SCTP))
> +		return false;

	If we want SCTP support may be we have to check
cp->state for IP_VS_SCTP_S_CLOSED, it is analog to
our IP_VS_TCP_S_TIME_WAIT state. But cp->state checks
should be per protocol, eg:

	/* Controlled (FTP DATA or persistence)? */
	if (cp->control)
		return false;
	switch (cp->protocol)
	{
	case IPPROTO_TCP:
		return ...
	case IPPROTO_SCTP:
		return cp->state == IP_VS_SCTP_S_CLOSED;
	default:
		return false;
	}

> +	/* Controlled (FTP DATA or persistence)? */
> +	if (cp->control)
> +		return false;
> +	return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
> +	       ((conn_reuse_mode & 2) &&
> +		(cp->state == IP_VS_TCP_S_FIN_WAIT) &&
> +		(cp->flags & IP_VS_CONN_F_NOOUTPUT));
> +}
> +
>  /* Handle response packets: rewrite addresses and send away...
>   */
>  static unsigned int
> @@ -1574,6 +1588,7 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
>  	struct ip_vs_conn *cp;
>  	int ret, pkts;
>  	struct netns_ipvs *ipvs;
> +	int conn_reuse_mode;
>  
>  	/* Already marked as IPVS request or reply? */
>  	if (skb->ipvs_property)
> @@ -1642,9 +1657,13 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
>  	 */
>  	cp = pp->conn_in_get(af, skb, &iph, 0);
>  
> -	if (unlikely(sysctl_expire_nodest_conn(ipvs)) && cp && cp->dest &&
> -	    unlikely(!atomic_read(&cp->dest->weight)) && !iph.fragoffs &&
> -	    is_new_conn(skb, &iph)) {
> +	conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
> +	if (conn_reuse_mode && !iph.fragoffs &&
> +	    is_new_conn(skb, &iph) && cp &&
> +	    !atomic_read(&cp->n_control) &&

	Above n_control check will prevent FTP:21 connections
to be replaced when expire_nodest_conn=1. May be the n_control
check should be moved below to avoid ip_vs_conn_expire_now?
Like this:

	if (!atomic_read(&cp->n_control))
		ip_vs_conn_expire_now(cp);
	__ip_vs_conn_put(cp);
	cp = NULL;

	I.e. old connection will not be tried for expiration
but we still create new connection. The old will not be visible
to packets.

> +	    ((unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
> +	      unlikely(!atomic_read(&cp->dest->weight))) ||
> +	     unlikely(is_new_conn_expected(cp, conn_reuse_mode)))) {
>  		ip_vs_conn_expire_now(cp);
>  		__ip_vs_conn_put(cp);
>  		cp = NULL;

> diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
> index c47ffd7a0a709cb73834c84652f251960f25db79..7ed4484c2c93e0aed89be55c773c94f3c80cc09e 100644
> --- a/net/netfilter/ipvs/ip_vs_sync.c
> +++ b/net/netfilter/ipvs/ip_vs_sync.c
> @@ -850,6 +850,21 @@ static void ip_vs_proc_conn(struct net *net, struct ip_vs_conn_param *param,
>  	else
>  		cp = ip_vs_ct_in_get(param);
>  
> +	if (cp && ((cp->dport != dport) ||
> +		   !ip_vs_addr_equal(cp->af, &cp->daddr, daddr))) {

	Better to use here the new cp->daf field
instead of cp->af.

> +		if (!(flags & IP_VS_CONN_F_INACTIVE)) {

	I guess, we here try to avoid old sync message to
expire new connection. The problem is that UDP conns and
templates are always IP_VS_CONN_F_INACTIVE, may be a TCP+SCTP
protocol check is needed.

> +			ip_vs_conn_expire_now(cp);
> +			__ip_vs_conn_put(cp);
> +			cp = NULL;
> +		} else {
> +			/* This is the expiration message for the
> +			 * connection that was already replaced, so we
> +			 * just ignore it.
> +			 */
> +			goto out;
> +		}
> +	}
> +
>  	if (cp) {
>  		/* Free pe_data */
>  		kfree(param->pe_data);
> @@ -925,6 +940,8 @@ static void ip_vs_proc_conn(struct net *net, struct ip_vs_conn_param *param,
>  		else
>  			cp->timeout = (3*60*HZ);
>  	}
> +
> +out:
>  	ip_vs_conn_put(cp);
>  }
>  

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] ipvs: reschedule new connections if previous was on FIN_WAIT or TIME_WAIT
  2015-01-05 22:26                 ` Julian Anastasov
@ 2015-01-06 12:12                   ` Marcelo Ricardo Leitner
  2015-01-06 21:06                     ` Julian Anastasov
  0 siblings, 1 reply; 16+ messages in thread
From: Marcelo Ricardo Leitner @ 2015-01-06 12:12 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: lvs-devel, hannes, jbrouer

Hi,

On 05-01-2015 20:26, Julian Anastasov wrote:
>
> 	Hello,
>
> On Tue, 30 Dec 2014, Marcelo Ricardo Leitner wrote:
>
>> What if we make the backup server purge the old entry, just like the active server does, and ignore old updates if needed? Please note the new hunk on ip_vs_sync.c. This is the patch I'm using so far, and I'm seeing good results with even with master/backup. Backup server successfully purges the old and creates a new entry when it detects such update:
>>
>> [  344.971673] IPVS: Bind-dest TCP c:192.168.122.1:22222 v:192.168.122.200:80 d:192.168.122.67:80 fwd:R s:0 conn->flags:A3 conn->refcnt:1 dest->refcnt:2
>> [  344.972071] IPVS: Unbind-dest TCP c:192.168.122.1:22222 v:192.168.122.200:80 d:192.168.122.95:80 fwd:R s:4 conn->flags:1A3 conn->refcnt:0 dest->refcnt:2
>
> 	Good idea
>
>> -- 8< --
>>
>>   /* IPVS core functions
>> diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
>> index 990decba1fe418e36e59a1f081fcf0e47188da29..da2760496314d49a5b4f6f588de7cb565a28e223 100644
>> --- a/net/netfilter/ipvs/ip_vs_core.c
>> +++ b/net/netfilter/ipvs/ip_vs_core.c
>> @@ -1036,6 +1036,20 @@ static inline bool is_new_conn(const struct sk_buff *skb,
>>   	}
>>   }
>>
>> +static inline bool is_new_conn_expected(const struct ip_vs_conn *cp,
>> +					int conn_reuse_mode)
>> +{
>> +	if ((cp->protocol != IPPROTO_TCP) && (cp->protocol != IPPROTO_SCTP))
>> +		return false;
>
> 	If we want SCTP support may be we have to check
> cp->state for IP_VS_SCTP_S_CLOSED, it is analog to
> our IP_VS_TCP_S_TIME_WAIT state. But cp->state checks
> should be per protocol, eg:
>
> 	/* Controlled (FTP DATA or persistence)? */
> 	if (cp->control)
> 		return false;
> 	switch (cp->protocol)
> 	{
> 	case IPPROTO_TCP:
> 		return ...
> 	case IPPROTO_SCTP:
> 		return cp->state == IP_VS_SCTP_S_CLOSED;
> 	default:
> 		return false;
> 	}

Ouch, indeed! Otherwise SCTP weren't going to be handled at all.

>> +	/* Controlled (FTP DATA or persistence)? */
>> +	if (cp->control)
>> +		return false;
>> +	return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
>> +	       ((conn_reuse_mode & 2) &&
>> +		(cp->state == IP_VS_TCP_S_FIN_WAIT) &&
>> +		(cp->flags & IP_VS_CONN_F_NOOUTPUT));
>> +}
>> +
>>   /* Handle response packets: rewrite addresses and send away...
>>    */
>>   static unsigned int
>> @@ -1574,6 +1588,7 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
>>   	struct ip_vs_conn *cp;
>>   	int ret, pkts;
>>   	struct netns_ipvs *ipvs;
>> +	int conn_reuse_mode;
>>
>>   	/* Already marked as IPVS request or reply? */
>>   	if (skb->ipvs_property)
>> @@ -1642,9 +1657,13 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
>>   	 */
>>   	cp = pp->conn_in_get(af, skb, &iph, 0);
>>
>> -	if (unlikely(sysctl_expire_nodest_conn(ipvs)) && cp && cp->dest &&
>> -	    unlikely(!atomic_read(&cp->dest->weight)) && !iph.fragoffs &&
>> -	    is_new_conn(skb, &iph)) {
>> +	conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
>> +	if (conn_reuse_mode && !iph.fragoffs &&
>> +	    is_new_conn(skb, &iph) && cp &&
>> +	    !atomic_read(&cp->n_control) &&
>
> 	Above n_control check will prevent FTP:21 connections
> to be replaced when expire_nodest_conn=1. May be the n_control
> check should be moved below to avoid ip_vs_conn_expire_now?
> Like this:
>
> 	if (!atomic_read(&cp->n_control))
> 		ip_vs_conn_expire_now(cp);
> 	__ip_vs_conn_put(cp);
> 	cp = NULL;
>
> 	I.e. old connection will not be tried for expiration
> but we still create new connection. The old will not be visible
> to packets.

Nice catch, ok.

>> +	    ((unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
>> +	      unlikely(!atomic_read(&cp->dest->weight))) ||
>> +	     unlikely(is_new_conn_expected(cp, conn_reuse_mode)))) {
>>   		ip_vs_conn_expire_now(cp);
>>   		__ip_vs_conn_put(cp);
>>   		cp = NULL;
>
>> diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
>> index c47ffd7a0a709cb73834c84652f251960f25db79..7ed4484c2c93e0aed89be55c773c94f3c80cc09e 100644
>> --- a/net/netfilter/ipvs/ip_vs_sync.c
>> +++ b/net/netfilter/ipvs/ip_vs_sync.c
>> @@ -850,6 +850,21 @@ static void ip_vs_proc_conn(struct net *net, struct ip_vs_conn_param *param,
>>   	else
>>   		cp = ip_vs_ct_in_get(param);
>>
>> +	if (cp && ((cp->dport != dport) ||
>> +		   !ip_vs_addr_equal(cp->af, &cp->daddr, daddr))) {
>
> 	Better to use here the new cp->daf field
> instead of cp->af.

oh, right.

>> +		if (!(flags & IP_VS_CONN_F_INACTIVE)) {
>
> 	I guess, we here try to avoid old sync message to
> expire new connection. The problem is that UDP conns and
> templates are always IP_VS_CONN_F_INACTIVE, may be a TCP+SCTP
> protocol check is needed.

Yes, that's the idea, avoiding the old sync messages. I don't think we need a 
check for protocol here because the outer if() (the one that checks daddr and 
dport) wouldn't happen for UDP, right? Unless it was really recycled.. But 
yeah I missed the template case. Perhaps we can just move this block together 
with the previous if(!).

         if (!(flags & IP_VS_CONN_F_TEMPLATE)) {
                 cp = ip_vs_conn_in_get(param);
                 if (cp && ((cp->dport != dport) ||
                            !ip_vs_addr_equal(cp->daf, &cp->daddr, daddr))) {
                         if (!(flags & IP_VS_CONN_F_INACTIVE)) {
                                 ip_vs_conn_expire_now(cp);
                                 __ip_vs_conn_put(cp);
                                 cp = NULL;
                         } else {
                                 /* This is the expiration message for the
                                  * connection that was already replaced, so we
                                  * just ignore it.
                                  */
                                 goto out;
                         }
                 }
         } else {
                 cp = ip_vs_ct_in_get(param);
         }

Thanks,
Marcelo

>> +			ip_vs_conn_expire_now(cp);
>> +			__ip_vs_conn_put(cp);
>> +			cp = NULL;
>> +		} else {
>> +			/* This is the expiration message for the
>> +			 * connection that was already replaced, so we
>> +			 * just ignore it.
>> +			 */
>> +			goto out;
>> +		}
>> +	}
>> +
>>   	if (cp) {
>>   		/* Free pe_data */
>>   		kfree(param->pe_data);
>> @@ -925,6 +940,8 @@ static void ip_vs_proc_conn(struct net *net, struct ip_vs_conn_param *param,
>>   		else
>>   			cp->timeout = (3*60*HZ);
>>   	}
>> +
>> +out:
>>   	ip_vs_conn_put(cp);
>>   }
>>
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] ipvs: reschedule new connections if previous was on FIN_WAIT or TIME_WAIT
  2015-01-06 12:12                   ` Marcelo Ricardo Leitner
@ 2015-01-06 21:06                     ` Julian Anastasov
  2015-01-07 15:52                       ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 16+ messages in thread
From: Julian Anastasov @ 2015-01-06 21:06 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: lvs-devel, hannes, jbrouer


	Hello,

On Tue, 6 Jan 2015, Marcelo Ricardo Leitner wrote:

> > 	I guess, we here try to avoid old sync message to
> > expire new connection. The problem is that UDP conns and
> > templates are always IP_VS_CONN_F_INACTIVE, may be a TCP+SCTP
> > protocol check is needed.
> 
> Yes, that's the idea, avoiding the old sync messages. I don't think we need a
> check for protocol here because the outer if() (the one that checks daddr and
> dport) wouldn't happen for UDP, right? Unless it was really recycled.. But
> yeah I missed the template case. Perhaps we can just move this block together
> with the previous if(!).

	OK, just make sure patch has short lines (80):

# scripts/checkpatch.pl --strict /tmp/file.patch

> 
>         if (!(flags & IP_VS_CONN_F_TEMPLATE)) {
>                 cp = ip_vs_conn_in_get(param);
>                 if (cp && ((cp->dport != dport) ||
>                            !ip_vs_addr_equal(cp->daf, &cp->daddr, daddr))) {
>                         if (!(flags & IP_VS_CONN_F_INACTIVE)) {
>                                 ip_vs_conn_expire_now(cp);
>                                 __ip_vs_conn_put(cp);
>                                 cp = NULL;
>                         } else {

	I assume we will not stop here sync for some connection that
was normally expired in master but was delayed in backup. TCP sync
starts for EST state, so I think it will hit the above case.

>                                 /* This is the expiration message for the
>                                  * connection that was already replaced, so we
>                                  * just ignore it.
>                                  */
>                                 goto out;
>                         }
>                 }
>         } else {
>                 cp = ip_vs_ct_in_get(param);
>         }

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] ipvs: reschedule new connections if previous was on FIN_WAIT or TIME_WAIT
  2015-01-06 21:06                     ` Julian Anastasov
@ 2015-01-07 15:52                       ` Marcelo Ricardo Leitner
  2015-01-07 19:31                         ` Julian Anastasov
  0 siblings, 1 reply; 16+ messages in thread
From: Marcelo Ricardo Leitner @ 2015-01-07 15:52 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: lvs-devel, hannes, jbrouer

Hi,

On 06-01-2015 19:06, Julian Anastasov wrote:
>
> 	Hello,
>
> On Tue, 6 Jan 2015, Marcelo Ricardo Leitner wrote:
>
>>> 	I guess, we here try to avoid old sync message to
>>> expire new connection. The problem is that UDP conns and
>>> templates are always IP_VS_CONN_F_INACTIVE, may be a TCP+SCTP
>>> protocol check is needed.
>>
>> Yes, that's the idea, avoiding the old sync messages. I don't think we need a
>> check for protocol here because the outer if() (the one that checks daddr and
>> dport) wouldn't happen for UDP, right? Unless it was really recycled.. But
>> yeah I missed the template case. Perhaps we can just move this block together
>> with the previous if(!).
>
> 	OK, just make sure patch has short lines (80):
>
> # scripts/checkpatch.pl --strict /tmp/file.patch

Sure thing

>>
>>          if (!(flags & IP_VS_CONN_F_TEMPLATE)) {
>>                  cp = ip_vs_conn_in_get(param);
>>                  if (cp && ((cp->dport != dport) ||
>>                             !ip_vs_addr_equal(cp->daf, &cp->daddr, daddr))) {
>>                          if (!(flags & IP_VS_CONN_F_INACTIVE)) {
>>                                  ip_vs_conn_expire_now(cp);
>>                                  __ip_vs_conn_put(cp);
>>                                  cp = NULL;
>>                          } else {
>
> 	I assume we will not stop here sync for some connection that
> was normally expired in master but was delayed in backup. TCP sync
> starts for EST state, so I think it will hit the above case.

You mean that we could end up ignoring a sync msg that we shouldn't ignore?

Cheers,
Marcelo


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] ipvs: reschedule new connections if previous was on FIN_WAIT or TIME_WAIT
  2015-01-07 15:52                       ` Marcelo Ricardo Leitner
@ 2015-01-07 19:31                         ` Julian Anastasov
  2015-01-08 12:35                           ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 16+ messages in thread
From: Julian Anastasov @ 2015-01-07 19:31 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: lvs-devel, hannes, jbrouer


	Hello,

On Wed, 7 Jan 2015, Marcelo Ricardo Leitner wrote:

> On 06-01-2015 19:06, Julian Anastasov wrote:
> >
> > On Tue, 6 Jan 2015, Marcelo Ricardo Leitner wrote:
> >
> > >
> > >          if (!(flags & IP_VS_CONN_F_TEMPLATE)) {
> > >                  cp = ip_vs_conn_in_get(param);
> > >                  if (cp && ((cp->dport != dport) ||
> > >                             !ip_vs_addr_equal(cp->daf, &cp->daddr,
> > >                          daddr))) {
> > >                          if (!(flags & IP_VS_CONN_F_INACTIVE)) {
> > >                                  ip_vs_conn_expire_now(cp);
> > >                                  __ip_vs_conn_put(cp);
> > >                                  cp = NULL;
> > >                          } else {
> >
> > 	I assume we will not stop here sync for some connection that
> > was normally expired in master but was delayed in backup. TCP sync
> > starts for EST state, so I think it will hit the above case.
> 
> You mean that we could end up ignoring a sync msg that we shouldn't ignore?

	Yes, that was my worry but I don't see how the code
can fail, so it looks fine to me.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] ipvs: reschedule new connections if previous was on FIN_WAIT or TIME_WAIT
  2015-01-07 19:31                         ` Julian Anastasov
@ 2015-01-08 12:35                           ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 16+ messages in thread
From: Marcelo Ricardo Leitner @ 2015-01-08 12:35 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: lvs-devel, hannes, jbrouer

Hi,

On 07-01-2015 17:31, Julian Anastasov wrote:
>
> 	Hello,
>
> On Wed, 7 Jan 2015, Marcelo Ricardo Leitner wrote:
>
>> On 06-01-2015 19:06, Julian Anastasov wrote:
>>>
>>> On Tue, 6 Jan 2015, Marcelo Ricardo Leitner wrote:
>>>
>>>>
>>>>           if (!(flags & IP_VS_CONN_F_TEMPLATE)) {
>>>>                   cp = ip_vs_conn_in_get(param);
>>>>                   if (cp && ((cp->dport != dport) ||
>>>>                              !ip_vs_addr_equal(cp->daf, &cp->daddr,
>>>>                           daddr))) {
>>>>                           if (!(flags & IP_VS_CONN_F_INACTIVE)) {
>>>>                                   ip_vs_conn_expire_now(cp);
>>>>                                   __ip_vs_conn_put(cp);
>>>>                                   cp = NULL;
>>>>                           } else {
>>>
>>> 	I assume we will not stop here sync for some connection that
>>> was normally expired in master but was delayed in backup. TCP sync
>>> starts for EST state, so I think it will hit the above case.
>>
>> You mean that we could end up ignoring a sync msg that we shouldn't ignore?
>
> 	Yes, that was my worry but I don't see how the code
> can fail, so it looks fine to me.

Ok, cool. I'll do more tests and probably post this by next week.

Thanks Julian.

Regards,
Marcelo


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2015-01-08 12:35 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-08 14:27 [PATCH RFC] ipvs: reschedule new connections if previous was on FIN_WAIT or TIME_WAIT Marcelo Ricardo Leitner
2014-12-08 14:29 ` Marcelo Ricardo Leitner
2014-12-09 23:37 ` Julian Anastasov
2014-12-10 12:34   ` Marcelo Ricardo Leitner
2014-12-10 16:56     ` Marcelo Ricardo Leitner
2014-12-10 21:27       ` Julian Anastasov
2014-12-11 12:44         ` Marcelo Ricardo Leitner
2014-12-11 22:39           ` Julian Anastasov
2014-12-12 11:58             ` Marcelo Ricardo Leitner
2014-12-30 19:49               ` Marcelo Ricardo Leitner
2015-01-05 22:26                 ` Julian Anastasov
2015-01-06 12:12                   ` Marcelo Ricardo Leitner
2015-01-06 21:06                     ` Julian Anastasov
2015-01-07 15:52                       ` Marcelo Ricardo Leitner
2015-01-07 19:31                         ` Julian Anastasov
2015-01-08 12:35                           ` Marcelo Ricardo Leitner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.