All of lore.kernel.org
 help / color / mirror / Atom feed
From: Julian Anastasov <ja@ssi.bg>
To: Andrew Sy Kim <kim.andrewsy@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	Wensong Zhang <wensong@linux-vs.org>,
	Simon Horman <horms@verge.net.au>,
	Jakub Kicinski <kuba@kernel.org>,
	Pablo Neira Ayuso <pablo@netfilter.org>,
	Jozsef Kadlecsik <kadlec@netfilter.org>,
	Florian Westphal <fw@strlen.de>,
	netdev@vger.kernel.org, lvs-devel@vger.kernel.org,
	linux-kernel@vger.kernel.org, netfilter-devel@vger.kernel.org,
	coreteam@netfilter.org
Subject: Re: [PATCH] netfilter/ipvs: immediately expire UDP connections matching unavailable destination if expire_nodest_conn=1
Date: Mon, 18 May 2020 22:10:32 +0300 (EEST)	[thread overview]
Message-ID: <alpine.LFD.2.21.2005182027460.4524@ja.home.ssi.bg> (raw)
In-Reply-To: <20200517171654.8194-1-kim.andrewsy@gmail.com>


	Hello,

On Sun, 17 May 2020, Andrew Sy Kim wrote:

> If expire_nodest_conn=1 and a UDP destination is deleted, IPVS should
> also expire all matching connections immiediately instead of waiting for
> the next matching packet. This is particulary useful when there are a
> lot of packets coming from a few number of clients. Those clients are
> likely to match against existing entries if a source port in the
> connection hash is reused. When the number of entries in the connection
> tracker is large, we can significantly reduce the number of dropped
> packets by expiring all connections upon deletion.
> 
> Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
> ---
>  include/net/ip_vs.h             |  7 ++++++
>  net/netfilter/ipvs/ip_vs_conn.c | 38 +++++++++++++++++++++++++++++++++
>  net/netfilter/ipvs/ip_vs_core.c |  5 -----
>  net/netfilter/ipvs/ip_vs_ctl.c  |  9 ++++++++
>  4 files changed, 54 insertions(+), 5 deletions(-)
> 

> diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
> index 02f2f636798d..c69dfbbc3416 100644
> --- a/net/netfilter/ipvs/ip_vs_conn.c
> +++ b/net/netfilter/ipvs/ip_vs_conn.c
> @@ -1366,6 +1366,44 @@ static void ip_vs_conn_flush(struct netns_ipvs *ipvs)
>  		goto flush_again;
>  	}
>  }
> +
> +/*	Flush all the connection entries in the ip_vs_conn_tab with a
> + *	matching destination.
> + */
> +void ip_vs_conn_flush_dest(struct netns_ipvs *ipvs, struct ip_vs_dest *dest)
> +{
> +	int idx;
> +	struct ip_vs_conn *cp, *cp_c;
> +
> +	rcu_read_lock();
> +	for (idx = 0; idx < ip_vs_conn_tab_size; idx++) {
> +		hlist_for_each_entry_rcu(cp, &ip_vs_conn_tab[idx], c_list) {
> +			if (cp->ipvs != ipvs)
> +				continue;
> +
> +			if (cp->dest != dest)
> +				continue;
> +
> +			/* As timers are expired in LIFO order, restart
> +			 * the timer of controlling connection first, so
> +			 * that it is expired after us.
> +			 */
> +			cp_c = cp->control;
> +			/* cp->control is valid only with reference to cp */
> +			if (cp_c && __ip_vs_conn_get(cp)) {
> +				IP_VS_DBG(4, "del controlling connection\n");
> +				ip_vs_conn_expire_now(cp_c);
> +				__ip_vs_conn_put(cp);
> +			}
> +			IP_VS_DBG(4, "del connection\n");
> +			ip_vs_conn_expire_now(cp);
> +		}
> +		cond_resched_rcu();

	Such kind of loop is correct if done in another context:

1. kthread
or
2. delayed work: mod_delayed_work(system_long_wq, ...)

	Otherwise cond_resched_rcu() can schedule() while holding
__ip_vs_mutex. Also, it will add long delay if many dests are
removed.

	If such loop analyzes instead all cp->dest for 
IP_VS_DEST_F_AVAILABLE, it should be done after calling
__ip_vs_conn_get().

>  static int sysctl_snat_reroute(struct netns_ipvs *ipvs) { return 0; }
> diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> index 8d14a1acbc37..f87c03622874 100644
> --- a/net/netfilter/ipvs/ip_vs_ctl.c
> +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> @@ -1225,6 +1225,15 @@ ip_vs_del_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
>  	 */
>  	__ip_vs_del_dest(svc->ipvs, dest, false);
>  
> +	/*	If expire_nodest_conn is enabled and protocol is UDP,
> +	 *	attempt best effort flush of all connections with this
> +	 *	destination.
> +	 */
> +	if (sysctl_expire_nodest_conn(svc->ipvs) &&
> +	    dest->protocol == IPPROTO_UDP) {
> +		ip_vs_conn_flush_dest(svc->ipvs, dest);

	Above work should be scheduled from __ip_vs_del_dest().
Check for UDP is not needed, sysctl_expire_nodest_conn() is for
all protocols.

	If the flushing is complex to implement, we can still allow
rescheduling for unavailable dests:

- first we should move this block above the ip_vs_try_to_schedule() 
block because:

	1. the scheduling does not return unavailabel dests, even
	for persistence, so no need to check new connections for
	the flag

	2. it will allow to create new connection if dest for
	existing connection is unavailable

	if (cp && cp->dest && !(cp->dest->flags & IP_VS_DEST_F_AVAILABLE)) {
		/* the destination server is not available */

		if (sysctl_expire_nodest_conn(ipvs)) {
			bool uses_ct = ip_vs_conn_uses_conntrack(cp, skb);

			ip_vs_conn_expire_now(cp);
			__ip_vs_conn_put(cp);
			if (uses_ct)
				return NF_DROP;
			cp = NULL;
		} else {
			__ip_vs_conn_put(cp);
			return NF_DROP;
		}
	}

	if (unlikely(!cp)) {
		int v;

		if (!ip_vs_try_to_schedule(ipvs, af, skb, pd, &v, &cp, &iph))
			return v;
	}

	Before now, we always waited one jiffie connection to expire,
now one packet will:

- schedule expiration for existing connection with unavailable dest,
as before

- create new connection to available destination that will be found
first in lists. But it can work only when sysctl var "conntrack" is 0,
we do not want to create two netfilter conntracks to different
real servers.

	Note that we intentionally removed the timer_pending() check
because we can not see existing ONE_PACKET connections in table.

Regards

--
Julian Anastasov <ja@ssi.bg>

  reply	other threads:[~2020-05-18 19:11 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-15  1:35 [PATCH] netfilter/ipvs: expire no destination UDP connections when expire_nodest_conn=1 Andrew Sy Kim
2020-05-15  1:35 ` Andrew Sy Kim
2020-05-15 18:07 ` Julian Anastasov
2020-05-15 18:07   ` Julian Anastasov
2020-05-17 17:27   ` Andrew Kim
2020-05-17 17:30     ` Andrew Kim
2020-05-17 17:16 ` [PATCH] netfilter/ipvs: immediately expire UDP connections matching unavailable destination if expire_nodest_conn=1 Andrew Sy Kim
2020-05-17 17:16   ` Andrew Sy Kim
2020-05-18 19:10   ` Julian Anastasov [this message]
2020-05-18 19:54     ` Andrew Kim
2020-05-19 11:46       ` Marco Angaroni
2020-05-19 11:46         ` Marco Angaroni
2020-05-19 14:18         ` Andrew Kim
2020-05-19 19:46         ` Julian Anastasov
2020-05-24 21:31           ` [PATCH] netfilter/ipvs: immediately expire no destination connections in kthread " Andrew Sy Kim
2020-05-24 21:31             ` Andrew Sy Kim
2020-05-26 21:24             ` Julian Anastasov
2020-05-26 21:47               ` Andrew Kim
2020-05-28  1:41                 ` [PATCH] netfilter/ipvs: queue delayed work to expire no destination connections " Andrew Sy Kim
2020-05-28  1:41                   ` Andrew Sy Kim
2020-05-28 17:26                   ` Julian Anastasov
2020-06-08 17:22                     ` Andrew Sy Kim
2020-06-08 17:22                       ` Andrew Sy Kim
2020-06-08 17:29                       ` Andrew Kim
2020-06-08 17:29                         ` Andrew Kim
2020-06-08 17:34                     ` Andrew Sy Kim
2020-06-08 17:34                       ` Andrew Sy Kim
2020-06-08 20:20                       ` Andrew Sy Kim
2020-06-08 20:20                         ` Andrew Sy Kim
2020-06-08 20:24                         ` Andrew Kim
2020-06-08 20:24                           ` Andrew Kim
2020-06-15 19:24                         ` Julian Anastasov
2020-07-01 21:24                           ` Andrew Sy Kim
2020-07-01 21:24                             ` Andrew Sy Kim
2020-07-02  4:33                             ` Julian Anastasov
2020-07-08 13:58                               ` [PATCH net-next] " Andrew Sy Kim
2020-07-08 13:58                                 ` Andrew Sy Kim
2020-07-08 16:00                                 ` Julian Anastasov
2020-07-08 16:06                                   ` [PATCHv2 net-next] ipvs: " Andrew Sy Kim
2020-07-08 16:06                                     ` Andrew Sy Kim
2020-07-08 16:12                                     ` Pablo Neira Ayuso
2020-07-08 16:14                                       ` Andrew Kim
2020-07-08 16:16                                       ` [PATCH " Andrew Sy Kim
2020-07-08 16:16                                         ` Andrew Sy Kim
2020-07-08 17:19                                         ` Julian Anastasov
2020-07-15 18:54                                           ` Pablo Neira Ayuso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.2.21.2005182027460.4524@ja.home.ssi.bg \
    --to=ja@ssi.bg \
    --cc=coreteam@netfilter.org \
    --cc=davem@davemloft.net \
    --cc=fw@strlen.de \
    --cc=horms@verge.net.au \
    --cc=kadlec@netfilter.org \
    --cc=kim.andrewsy@gmail.com \
    --cc=kuba@kernel.org \
    --cc=kuznet@ms2.inr.ac.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lvs-devel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=pablo@netfilter.org \
    --cc=wensong@linux-vs.org \
    --cc=yoshfuji@linux-ipv6.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.