From: Andrew Kim <kim.andrewsy@gmail.com>
To: Julian Anastasov <ja@ssi.bg>
Cc: "David S. Miller" <davem@davemloft.net>,
Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
Wensong Zhang <wensong@linux-vs.org>,
Simon Horman <horms@verge.net.au>,
Jakub Kicinski <kuba@kernel.org>,
Pablo Neira Ayuso <pablo@netfilter.org>,
Jozsef Kadlecsik <kadlec@netfilter.org>,
Florian Westphal <fw@strlen.de>,
"open list:IPVS" <netdev@vger.kernel.org>,
"open list:IPVS" <lvs-devel@vger.kernel.org>,
open list <linux-kernel@vger.kernel.org>,
"open list:NETFILTER" <netfilter-devel@vger.kernel.org>,
"open list:NETFILTER" <coreteam@netfilter.org>
Subject: Re: [PATCH] netfilter/ipvs: immediately expire UDP connections matching unavailable destination if expire_nodest_conn=1
Date: Mon, 18 May 2020 15:54:30 -0400
Message-ID: <CABc050G-yW-frv0mCmg=hMnC4iOx9Ht2Zv8eoS1cxQ8uKX6NQw@mail.gmail.com>
In-Reply-To: <alpine.LFD.2.21.2005182027460.4524@ja.home.ssi.bg>
Hi Julian,
Thank you for getting back to me. I will update the patch based on
your feedback shortly.
Regards,
Andrew
On Mon, May 18, 2020 at 3:10 PM Julian Anastasov <ja@ssi.bg> wrote:
>
>
> Hello,
>
> On Sun, 17 May 2020, Andrew Sy Kim wrote:
>
> > If expire_nodest_conn=1 and a UDP destination is deleted, IPVS should
> > also expire all matching connections immediately instead of waiting for
> > the next matching packet. This is particularly useful when there are a
> > lot of packets coming from a small number of clients. Those clients are
> > likely to match against existing entries if a source port in the
> > connection hash is reused. When the number of entries in the connection
> > tracker is large, we can significantly reduce the number of dropped
> > packets by expiring all connections upon deletion.
> >
> > Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
> > ---
> > include/net/ip_vs.h | 7 ++++++
> > net/netfilter/ipvs/ip_vs_conn.c | 38 +++++++++++++++++++++++++++++++++
> > net/netfilter/ipvs/ip_vs_core.c | 5 -----
> > net/netfilter/ipvs/ip_vs_ctl.c | 9 ++++++++
> > 4 files changed, 54 insertions(+), 5 deletions(-)
> >
>
> > diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
> > index 02f2f636798d..c69dfbbc3416 100644
> > --- a/net/netfilter/ipvs/ip_vs_conn.c
> > +++ b/net/netfilter/ipvs/ip_vs_conn.c
> > @@ -1366,6 +1366,44 @@ static void ip_vs_conn_flush(struct netns_ipvs *ipvs)
> >  		goto flush_again;
> >  	}
> >  }
> > +
> > +/* Flush all the connection entries in the ip_vs_conn_tab with a
> > + * matching destination.
> > + */
> > +void ip_vs_conn_flush_dest(struct netns_ipvs *ipvs, struct ip_vs_dest *dest)
> > +{
> > +	int idx;
> > +	struct ip_vs_conn *cp, *cp_c;
> > +
> > +	rcu_read_lock();
> > +	for (idx = 0; idx < ip_vs_conn_tab_size; idx++) {
> > +		hlist_for_each_entry_rcu(cp, &ip_vs_conn_tab[idx], c_list) {
> > +			if (cp->ipvs != ipvs)
> > +				continue;
> > +
> > +			if (cp->dest != dest)
> > +				continue;
> > +
> > +			/* As timers are expired in LIFO order, restart
> > +			 * the timer of controlling connection first, so
> > +			 * that it is expired after us.
> > +			 */
> > +			cp_c = cp->control;
> > +			/* cp->control is valid only with reference to cp */
> > +			if (cp_c && __ip_vs_conn_get(cp)) {
> > +				IP_VS_DBG(4, "del controlling connection\n");
> > +				ip_vs_conn_expire_now(cp_c);
> > +				__ip_vs_conn_put(cp);
> > +			}
> > +			IP_VS_DBG(4, "del connection\n");
> > +			ip_vs_conn_expire_now(cp);
> > +		}
> > +		cond_resched_rcu();
>
> This kind of loop is correct only if done in another context:
>
> 1. kthread
> or
> 2. delayed work: mod_delayed_work(system_long_wq, ...)
>
> Otherwise cond_resched_rcu() can call schedule() while holding
> __ip_vs_mutex. It will also add a long delay if many dests are
> removed.
>
> If such a loop instead checks every cp->dest for
> IP_VS_DEST_F_AVAILABLE, the check should be done after calling
> __ip_vs_conn_get().
>
> > static int sysctl_snat_reroute(struct netns_ipvs *ipvs) { return 0; }
> > diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> > index 8d14a1acbc37..f87c03622874 100644
> > --- a/net/netfilter/ipvs/ip_vs_ctl.c
> > +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> > @@ -1225,6 +1225,15 @@ ip_vs_del_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
> >  	 */
> >  	__ip_vs_del_dest(svc->ipvs, dest, false);
> >
> > +	/* If expire_nodest_conn is enabled and protocol is UDP,
> > +	 * attempt best effort flush of all connections with this
> > +	 * destination.
> > +	 */
> > +	if (sysctl_expire_nodest_conn(svc->ipvs) &&
> > +	    dest->protocol == IPPROTO_UDP) {
> > +		ip_vs_conn_flush_dest(svc->ipvs, dest);
>
> The above work should be scheduled from __ip_vs_del_dest().
> The check for UDP is not needed; sysctl_expire_nodest_conn() applies
> to all protocols.
>
> If the flushing is complex to implement, we can still allow
> rescheduling for unavailable dests:
>
> - first we should move this block above the ip_vs_try_to_schedule()
> block because:
>
> 1. the scheduling does not return unavailable dests, even
> for persistence, so there is no need to check new connections
> for the flag
>
> 2. it will allow a new connection to be created if the dest for
> the existing connection is unavailable
>
>	if (cp && cp->dest && !(cp->dest->flags & IP_VS_DEST_F_AVAILABLE)) {
>		/* the destination server is not available */
>
>		if (sysctl_expire_nodest_conn(ipvs)) {
>			bool uses_ct = ip_vs_conn_uses_conntrack(cp, skb);
>
>			ip_vs_conn_expire_now(cp);
>			__ip_vs_conn_put(cp);
>			if (uses_ct)
>				return NF_DROP;
>			cp = NULL;
>		} else {
>			__ip_vs_conn_put(cp);
>			return NF_DROP;
>		}
>	}
>
>	if (unlikely(!cp)) {
>		int v;
>
>		if (!ip_vs_try_to_schedule(ipvs, af, skb, pd, &v, &cp, &iph))
>			return v;
>	}
>
> Until now, we always waited one jiffy for the connection to expire;
> now a single packet will:
>
> - schedule expiration for the existing connection with an unavailable
> dest, as before
>
> - create a new connection to the first available destination found
> in the lists. But this can work only when the sysctl var "conntrack"
> is 0; we do not want to create two netfilter conntracks to different
> real servers.
>
> Note that we intentionally removed the timer_pending() check
> because we cannot see existing ONE_PACKET connections in the table.
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>
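
For readers following the thread, the packet-handling order proposed in
the quoted review can be modelled in plain userspace C. This is only a
sketch of the decision flow, not kernel code: enum verdict, struct
model_dest, struct model_conn, model_handle_packet() and
model_example() are hypothetical names made up for illustration. The
"available" field stands in for IP_VS_DEST_F_AVAILABLE, "expired"
stands in for the effect of ip_vs_conn_expire_now(), and
VERDICT_RESCHEDULE stands in for falling through to
ip_vs_try_to_schedule(). Reference counting and the skb are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real IPVS structs. */
enum verdict { VERDICT_FORWARD, VERDICT_DROP, VERDICT_RESCHEDULE };

struct model_dest {
	bool available;		/* models IP_VS_DEST_F_AVAILABLE */
};

struct model_conn {
	struct model_dest *dest;
	bool expired;		/* set where ip_vs_conn_expire_now() would run */
};

/* Models the proposed order: handle an unavailable dest first, then
 * reschedule if no usable connection is left.
 */
enum verdict model_handle_packet(struct model_conn *cp,
				 bool expire_nodest_conn, bool uses_ct)
{
	if (cp && cp->dest && !cp->dest->available) {
		if (!expire_nodest_conn)
			return VERDICT_DROP;

		cp->expired = true;
		/* With conntrack in use, do not create a second
		 * conntrack to a different real server.
		 */
		if (uses_ct)
			return VERDICT_DROP;
		cp = NULL;
	}

	if (!cp)
		return VERDICT_RESCHEDULE;

	return VERDICT_FORWARD;
}

/* Small usage example covering the two expire_nodest_conn=1 cases. */
void model_example(void)
{
	struct model_dest down = { .available = false };
	struct model_conn a = { .dest = &down, .expired = false };
	struct model_conn b = { .dest = &down, .expired = false };

	/* conntrack=0: old connection expires, packet is rescheduled */
	assert(model_handle_packet(&a, true, false) == VERDICT_RESCHEDULE);
	assert(a.expired);

	/* conntrack in use: old connection expires, packet is dropped */
	assert(model_handle_packet(&b, true, true) == VERDICT_DROP);
	assert(b.expired);
}
```

In this model the same packet that expires the stale connection also
reaches the scheduling step, but only when conntrack is not in use,
which matches the restriction described in the quoted review.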
Thread overview: 31+ messages
2020-05-15 1:35 [PATCH] netfilter/ipvs: expire no destination UDP connections when expire_nodest_conn=1 Andrew Sy Kim
2020-05-15 18:07 ` Julian Anastasov
2020-05-17 17:27 ` Andrew Kim
2020-05-17 17:30 ` Andrew Kim
2020-05-17 17:16 ` [PATCH] netfilter/ipvs: immediately expire UDP connections matching unavailable destination if expire_nodest_conn=1 Andrew Sy Kim
2020-05-18 19:10 ` Julian Anastasov
2020-05-18 19:54 ` Andrew Kim [this message]
2020-05-19 11:46 ` Marco Angaroni
2020-05-19 14:18 ` Andrew Kim
2020-05-19 19:46 ` Julian Anastasov
2020-05-24 21:31 ` [PATCH] netfilter/ipvs: immediately expire no destination connections in kthread " Andrew Sy Kim
2020-05-26 21:24 ` Julian Anastasov
2020-05-26 21:47 ` Andrew Kim
2020-05-28 1:41 ` [PATCH] netfilter/ipvs: queue delayed work to expire no destination connections " Andrew Sy Kim
2020-05-28 17:26 ` Julian Anastasov
2020-06-08 17:22 ` Andrew Sy Kim
2020-06-08 17:29 ` Andrew Kim
2020-06-08 17:34 ` Andrew Sy Kim
2020-06-08 20:20 ` Andrew Sy Kim
2020-06-08 20:24 ` Andrew Kim
2020-06-15 19:24 ` Julian Anastasov
2020-07-01 21:24 ` Andrew Sy Kim
2020-07-02 4:33 ` Julian Anastasov
2020-07-08 13:58 ` [PATCH net-next] " Andrew Sy Kim
2020-07-08 16:00 ` Julian Anastasov
2020-07-08 16:06 ` [PATCHv2 net-next] ipvs: " Andrew Sy Kim
2020-07-08 16:12 ` Pablo Neira Ayuso
2020-07-08 16:14 ` Andrew Kim
2020-07-08 16:16 ` [PATCH " Andrew Sy Kim
2020-07-08 17:19 ` Julian Anastasov
2020-07-15 18:54 ` Pablo Neira Ayuso