netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Kim <kim.andrewsy@gmail.com>
To: Julian Anastasov <ja@ssi.bg>
Cc: "David S. Miller" <davem@davemloft.net>,
	Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	Wensong Zhang <wensong@linux-vs.org>,
	Simon Horman <horms@verge.net.au>,
	Jakub Kicinski <kuba@kernel.org>,
	Pablo Neira Ayuso <pablo@netfilter.org>,
	Jozsef Kadlecsik <kadlec@netfilter.org>,
	Florian Westphal <fw@strlen.de>,
	"open list:IPVS" <netdev@vger.kernel.org>,
	"open list:IPVS" <lvs-devel@vger.kernel.org>,
	open list <linux-kernel@vger.kernel.org>,
	"open list:NETFILTER" <netfilter-devel@vger.kernel.org>,
	"open list:NETFILTER" <coreteam@netfilter.org>
Subject: Re: [PATCH] netfilter/ipvs: immediately expire UDP connections matching unavailable destination if expire_nodest_conn=1
Date: Mon, 18 May 2020 15:54:30 -0400	[thread overview]
Message-ID: <CABc050G-yW-frv0mCmg=hMnC4iOx9Ht2Zv8eoS1cxQ8uKX6NQw@mail.gmail.com> (raw)
In-Reply-To: <alpine.LFD.2.21.2005182027460.4524@ja.home.ssi.bg>

Hi Julian,

Thank you for getting back to me. I will update the patch based on
your feedback shortly.

Regards,

Andrew

On Mon, May 18, 2020 at 3:10 PM Julian Anastasov <ja@ssi.bg> wrote:
>
>
>         Hello,
>
> On Sun, 17 May 2020, Andrew Sy Kim wrote:
>
> > If expire_nodest_conn=1 and a UDP destination is deleted, IPVS should
> > also expire all matching connections immiediately instead of waiting for
> > the next matching packet. This is particulary useful when there are a
> > lot of packets coming from a few number of clients. Those clients are
> > likely to match against existing entries if a source port in the
> > connection hash is reused. When the number of entries in the connection
> > tracker is large, we can significantly reduce the number of dropped
> > packets by expiring all connections upon deletion.
> >
> > Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
> > ---
> >  include/net/ip_vs.h             |  7 ++++++
> >  net/netfilter/ipvs/ip_vs_conn.c | 38 +++++++++++++++++++++++++++++++++
> >  net/netfilter/ipvs/ip_vs_core.c |  5 -----
> >  net/netfilter/ipvs/ip_vs_ctl.c  |  9 ++++++++
> >  4 files changed, 54 insertions(+), 5 deletions(-)
> >
>
> > diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
> > index 02f2f636798d..c69dfbbc3416 100644
> > --- a/net/netfilter/ipvs/ip_vs_conn.c
> > +++ b/net/netfilter/ipvs/ip_vs_conn.c
> > @@ -1366,6 +1366,44 @@ static void ip_vs_conn_flush(struct netns_ipvs *ipvs)
> >               goto flush_again;
> >       }
> >  }
> > +
> > +/*   Flush all the connection entries in the ip_vs_conn_tab with a
> > + *   matching destination.
> > + */
> > +void ip_vs_conn_flush_dest(struct netns_ipvs *ipvs, struct ip_vs_dest *dest)
> > +{
> > +     int idx;
> > +     struct ip_vs_conn *cp, *cp_c;
> > +
> > +     rcu_read_lock();
> > +     for (idx = 0; idx < ip_vs_conn_tab_size; idx++) {
> > +             hlist_for_each_entry_rcu(cp, &ip_vs_conn_tab[idx], c_list) {
> > +                     if (cp->ipvs != ipvs)
> > +                             continue;
> > +
> > +                     if (cp->dest != dest)
> > +                             continue;
> > +
> > +                     /* As timers are expired in LIFO order, restart
> > +                      * the timer of controlling connection first, so
> > +                      * that it is expired after us.
> > +                      */
> > +                     cp_c = cp->control;
> > +                     /* cp->control is valid only with reference to cp */
> > +                     if (cp_c && __ip_vs_conn_get(cp)) {
> > +                             IP_VS_DBG(4, "del controlling connection\n");
> > +                             ip_vs_conn_expire_now(cp_c);
> > +                             __ip_vs_conn_put(cp);
> > +                     }
> > +                     IP_VS_DBG(4, "del connection\n");
> > +                     ip_vs_conn_expire_now(cp);
> > +             }
> > +             cond_resched_rcu();
>
>         Such kind of loop is correct if done in another context:
>
> 1. kthread
> or
> 2. delayed work: mod_delayed_work(system_long_wq, ...)
>
>         Otherwise cond_resched_rcu() can schedule() while holding
> __ip_vs_mutex. Also, it will add long delay if many dests are
> removed.
>
>         If such loop analyzes instead all cp->dest for
> IP_VS_DEST_F_AVAILABLE, it should be done after calling
> __ip_vs_conn_get().
>
> >  static int sysctl_snat_reroute(struct netns_ipvs *ipvs) { return 0; }
> > diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> > index 8d14a1acbc37..f87c03622874 100644
> > --- a/net/netfilter/ipvs/ip_vs_ctl.c
> > +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> > @@ -1225,6 +1225,15 @@ ip_vs_del_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
> >        */
> >       __ip_vs_del_dest(svc->ipvs, dest, false);
> >
> > +     /*      If expire_nodest_conn is enabled and protocol is UDP,
> > +      *      attempt best effort flush of all connections with this
> > +      *      destination.
> > +      */
> > +     if (sysctl_expire_nodest_conn(svc->ipvs) &&
> > +         dest->protocol == IPPROTO_UDP) {
> > +             ip_vs_conn_flush_dest(svc->ipvs, dest);
>
>         Above work should be scheduled from __ip_vs_del_dest().
> Check for UDP is not needed, sysctl_expire_nodest_conn() is for
> all protocols.
>
>         If the flushing is complex to implement, we can still allow
> rescheduling for unavailable dests:
>
> - first we should move this block above the ip_vs_try_to_schedule()
> block because:
>
>         1. the scheduling does not return unavailabel dests, even
>         for persistence, so no need to check new connections for
>         the flag
>
>         2. it will allow to create new connection if dest for
>         existing connection is unavailable
>
>         if (cp && cp->dest && !(cp->dest->flags & IP_VS_DEST_F_AVAILABLE)) {
>                 /* the destination server is not available */
>
>                 if (sysctl_expire_nodest_conn(ipvs)) {
>                         bool uses_ct = ip_vs_conn_uses_conntrack(cp, skb);
>
>                         ip_vs_conn_expire_now(cp);
>                         __ip_vs_conn_put(cp);
>                         if (uses_ct)
>                                 return NF_DROP;
>                         cp = NULL;
>                 } else {
>                         __ip_vs_conn_put(cp);
>                         return NF_DROP;
>                 }
>         }
>
>         if (unlikely(!cp)) {
>                 int v;
>
>                 if (!ip_vs_try_to_schedule(ipvs, af, skb, pd, &v, &cp, &iph))
>                         return v;
>         }
>
>         Before now, we always waited one jiffie connection to expire,
> now one packet will:
>
> - schedule expiration for existing connection with unavailable dest,
> as before
>
> - create new connection to available destination that will be found
> first in lists. But it can work only when sysctl var "conntrack" is 0,
> we do not want to create two netfilter conntracks to different
> real servers.
>
>         Note that we intentionally removed the timer_pending() check
> because we can not see existing ONE_PACKET connections in table.
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>

  reply	other threads:[~2020-05-18 19:54 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-15  1:35 [PATCH] netfilter/ipvs: expire no destination UDP connections when expire_nodest_conn=1 Andrew Sy Kim
2020-05-15 18:07 ` Julian Anastasov
2020-05-17 17:27   ` Andrew Kim
2020-05-17 17:30     ` Andrew Kim
2020-05-17 17:16 ` [PATCH] netfilter/ipvs: immediately expire UDP connections matching unavailable destination if expire_nodest_conn=1 Andrew Sy Kim
2020-05-18 19:10   ` Julian Anastasov
2020-05-18 19:54     ` Andrew Kim [this message]
2020-05-19 11:46       ` Marco Angaroni
2020-05-19 14:18         ` Andrew Kim
2020-05-19 19:46         ` Julian Anastasov
2020-05-24 21:31           ` [PATCH] netfilter/ipvs: immediately expire no destination connections in kthread " Andrew Sy Kim
2020-05-26 21:24             ` Julian Anastasov
2020-05-26 21:47               ` Andrew Kim
2020-05-28  1:41                 ` [PATCH] netfilter/ipvs: queue delayed work to expire no destination connections " Andrew Sy Kim
2020-05-28 17:26                   ` Julian Anastasov
2020-06-08 17:22                     ` Andrew Sy Kim
2020-06-08 17:29                       ` Andrew Kim
2020-06-08 17:34                     ` Andrew Sy Kim
2020-06-08 20:20                       ` Andrew Sy Kim
2020-06-08 20:24                         ` Andrew Kim
2020-06-15 19:24                         ` Julian Anastasov
2020-07-01 21:24                           ` Andrew Sy Kim
2020-07-02  4:33                             ` Julian Anastasov
2020-07-08 13:58                               ` [PATCH net-next] " Andrew Sy Kim
2020-07-08 16:00                                 ` Julian Anastasov
2020-07-08 16:06                                   ` [PATCHv2 net-next] ipvs: " Andrew Sy Kim
2020-07-08 16:12                                     ` Pablo Neira Ayuso
2020-07-08 16:14                                       ` Andrew Kim
2020-07-08 16:16                                       ` [PATCH " Andrew Sy Kim
2020-07-08 17:19                                         ` Julian Anastasov
2020-07-15 18:54                                           ` Pablo Neira Ayuso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABc050G-yW-frv0mCmg=hMnC4iOx9Ht2Zv8eoS1cxQ8uKX6NQw@mail.gmail.com' \
    --to=kim.andrewsy@gmail.com \
    --cc=coreteam@netfilter.org \
    --cc=davem@davemloft.net \
    --cc=fw@strlen.de \
    --cc=horms@verge.net.au \
    --cc=ja@ssi.bg \
    --cc=kadlec@netfilter.org \
    --cc=kuba@kernel.org \
    --cc=kuznet@ms2.inr.ac.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lvs-devel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=pablo@netfilter.org \
    --cc=wensong@linux-vs.org \
    --cc=yoshfuji@linux-ipv6.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).