From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.3 required=3.0 tests=DKIM_ADSP_CUSTOM_MED, DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C5771C433DF for ; Mon, 8 Jun 2020 17:34:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9C4B720870 for ; Mon, 8 Jun 2020 17:34:27 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="puuml+9+" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729964AbgFHRe1 (ORCPT ); Mon, 8 Jun 2020 13:34:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42700 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726097AbgFHRe0 (ORCPT ); Mon, 8 Jun 2020 13:34:26 -0400 Received: from mail-qv1-xf43.google.com (mail-qv1-xf43.google.com [IPv6:2607:f8b0:4864:20::f43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 62473C08C5C2; Mon, 8 Jun 2020 10:34:26 -0700 (PDT) Received: by mail-qv1-xf43.google.com with SMTP id g11so2266203qvs.2; Mon, 08 Jun 2020 10:34:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=BQfHe5nHe6T64nJZ48xJuQUzNOsqzYNjV6sGWHRurS4=; b=puuml+9+jmfM35SwDwyG9trAffIsVPJBds++JHY9xjCronp2lDcl/8OKgSI5xJPXzv 25YVqtEvrByw5S5E/4ew9TsmUi2Cd9IeYMZootnX8yrXX7DsowKoBEQkSEU478yipB7S yzOyi2T+Ghi0V5iNaFvWv6tGrcXbCpJwumo8/ck9EEfzdHZhqKcK/RTpjDGWxdgGbC65 LHDHn9gbDA3ENMMy50yl3pdLxhOTANsBgY0X61Nz5wPxMmgzv4ZcXoiFVO+yZKPzECU/ hWNMMpGAazjtwnT7eIjYICEJdcP0rw8qiGlErCqIEBzF9/GcIPx3QfYLUnBInwolTJ91 mpCQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=BQfHe5nHe6T64nJZ48xJuQUzNOsqzYNjV6sGWHRurS4=; b=FqD/vJkeXiNnSukVnKWLV0jO+GmAPLrx/sLYq29WVjJyRkS/Y7EheUeq4zFumnOVib Jx7LLTCuZ33Lh1VvLxSpFXw34dn53YB2enj97f59K6ex/bMKI17B2f+NVcrQviBxm4Fc uAWbN+cZjFY6EBbihYrpdpL/8drHW5EgpGpgvB3Fb89y4pkAMjysH/94vl4OvmadFRB4 fCjIOC3S1/hVmM1Kvngu1R0iwx5/7DxP1hMbN0ntb+I8s8gCh/G0gwsgyrtnwrZhUNKw RG/5oGNZdfJlYCRysegOL3BHhWlPkd+0k6BlKuanmZeB8esKRB5uXCSGbdOsGlWOkj5s 5+NA== X-Gm-Message-State: AOAM531pSwZ3pvEGEmwnLE7WSXyn+lylKEKd18tXejgPCr3tMoxExN5v dQDqOIj3z0M8/GtWZuyxYx1LawB07FM= X-Google-Smtp-Source: ABdhPJyjWN6a0UUnfIvQLXudDSgdikKY0iFJ5htAzJpGtXqaiO9QCf+N4obwNviOfORdXBCZNqOpjw== X-Received: by 2002:a0c:b791:: with SMTP id l17mr24070471qve.44.1591637665518; Mon, 08 Jun 2020 10:34:25 -0700 (PDT) Received: from localhost.localdomain (toroon0411w-lp130-02-64-231-189-42.dsl.bell.ca. [64.231.189.42]) by smtp.googlemail.com with ESMTPSA id a27sm8182805qtc.92.2020.06.08.10.34.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 08 Jun 2020 10:34:24 -0700 (PDT) From: Andrew Sy Kim Cc: Julian Anastasov , Wensong Zhang , Simon Horman , lvs-devel@vger.kernel.org, netfilter-devel@vger.kernel.org, Andrew Sy Kim Subject: [PATCH] netfilter/ipvs: queue delayed work to expire no destination connections if expire_nodest_conn=1 Date: Mon, 8 Jun 2020 13:34:12 -0400 Message-Id: <20200608173413.13870-1-kim.andrewsy@gmail.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit To: unlisted-recipients:; (no To-header on input) Sender: netfilter-devel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netfilter-devel@vger.kernel.org When expire_nodest_conn=1 and a destination is deleted, IPVS does not expire the existing connections until the next matching incoming packet. If there are many connection entries from a single client to a single destination, many packets may get dropped before all the connections are expired (more likely with lots of UDP traffic). An optimization can be made where upon deletion of a destination, IPVS queues up delayed work to immediately expire any connections with a deleted destination. This ensures any reused source ports from a client (within the IPVS timeouts) are scheduled to new real servers instead of silently dropped. Signed-off-by: Andrew Sy Kim --- include/net/ip_vs.h | 29 +++++++++++++++++++++ net/netfilter/ipvs/ip_vs_conn.c | 43 +++++++++++++++++++++++++++++++ net/netfilter/ipvs/ip_vs_core.c | 45 +++++++++++++-------------------- net/netfilter/ipvs/ip_vs_ctl.c | 22 ++++++++++++++++ 4 files changed, 112 insertions(+), 27 deletions(-) diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h index 83be2d93b407..49ca61765298 100644 --- a/include/net/ip_vs.h +++ b/include/net/ip_vs.h @@ -14,6 +14,7 @@ #include /* for struct rwlock_t */ #include /* for struct atomic_t */ #include /* for struct refcount_t */ +#include #include #include @@ -885,6 +886,8 @@ struct netns_ipvs { atomic_t conn_out_counter; #ifdef CONFIG_SYSCTL + /* delayed work for expiring no dest connections */ + struct delayed_work expire_nodest_conn_work; /* 1/rate drop and drop-entry variables */ struct delayed_work defense_work; /* Work handler */ int drop_rate; @@ -1049,6 +1052,11 @@ static inline int sysctl_conn_reuse_mode(struct netns_ipvs *ipvs) return ipvs->sysctl_conn_reuse_mode; } +static inline int sysctl_expire_nodest_conn(struct netns_ipvs *ipvs) +{ + return ipvs->sysctl_expire_nodest_conn; +} + static inline int sysctl_schedule_icmp(struct netns_ipvs *ipvs) { return ipvs->sysctl_schedule_icmp; @@ -1136,6 +1144,11 @@ static inline int sysctl_conn_reuse_mode(struct netns_ipvs *ipvs) return 1; } +static inline int sysctl_expire_nodest_conn(struct netns_ipvs *ipvs) +{ + return 0; +} + static inline int sysctl_schedule_icmp(struct netns_ipvs *ipvs) { return 0; @@ -1505,6 +1518,22 @@ static inline int ip_vs_todrop(struct netns_ipvs *ipvs) static inline int ip_vs_todrop(struct netns_ipvs *ipvs) { return 0; } #endif +#ifdef CONFIG_SYSCTL +/* Enqueue delayed work for expiring no dest connections + * Only run when sysctl_expire_nodest=1 + */ +static inline void ip_vs_enqueue_expire_nodest_conns(struct netns_ipvs *ipvs) +{ + if (sysctl_expire_nodest_conn(ipvs)) + queue_delayed_work(system_long_wq, + &ipvs->expire_nodest_conn_work, 1); +} + +void ip_vs_expire_nodest_conn_flush(struct netns_ipvs *ipvs); +#else +static inline void ip_vs_enqueue_expire_nodest_conns(struct netns_ipvs) {} +#endif + #define IP_VS_DFWD_METHOD(dest) (atomic_read(&(dest)->conn_flags) & \ IP_VS_CONN_F_FWD_MASK) diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c index 02f2f636798d..8396ab098953 100644 --- a/net/netfilter/ipvs/ip_vs_conn.c +++ b/net/netfilter/ipvs/ip_vs_conn.c @@ -1366,6 +1366,49 @@ static void ip_vs_conn_flush(struct netns_ipvs *ipvs) goto flush_again; } } + +#ifdef CONFIG_SYSCTL +void ip_vs_expire_nodest_conn_flush(struct netns_ipvs *ipvs) +{ + int idx; + struct ip_vs_conn *cp, *cp_c; + struct ip_vs_dest *dest; + + rcu_read_lock(); + for (idx = 0; idx < ip_vs_conn_tab_size; idx++) { + hlist_for_each_entry_rcu(cp, &ip_vs_conn_tab[idx], c_list) { + if (cp->ipvs != ipvs) + continue; + + dest = cp->dest; + if (!dest || (dest->flags & IP_VS_DEST_F_AVAILABLE)) + continue; + + /* As timers are expired in LIFO order, restart + * the timer of controlling connection first, so + * that it is expired after us. + */ + cp_c = cp->control; + /* cp->control is valid only with reference to cp */ + if (cp_c && __ip_vs_conn_get(cp)) { + IP_VS_DBG(4, "del controlling connection\n"); + ip_vs_conn_expire_now(cp_c); + __ip_vs_conn_put(cp); + } + + IP_VS_DBG(4, "del connection\n"); + ip_vs_conn_expire_now(cp); + } + cond_resched_rcu(); + + /* netns clean up started, aborted delayed work */ + if (!ipvs->enable) + return; + } + rcu_read_unlock(); +} +#endif + /* * per netns init and exit */ diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c index aa6a603a2425..2508a9caeae8 100644 --- a/net/netfilter/ipvs/ip_vs_core.c +++ b/net/netfilter/ipvs/ip_vs_core.c @@ -694,16 +694,10 @@ static int sysctl_nat_icmp_send(struct netns_ipvs *ipvs) return ipvs->sysctl_nat_icmp_send; } -static int sysctl_expire_nodest_conn(struct netns_ipvs *ipvs) -{ - return ipvs->sysctl_expire_nodest_conn; -} - #else static int sysctl_snat_reroute(struct netns_ipvs *ipvs) { return 0; } static int sysctl_nat_icmp_send(struct netns_ipvs *ipvs) { return 0; } -static int sysctl_expire_nodest_conn(struct netns_ipvs *ipvs) { return 0; } #endif @@ -2095,36 +2089,33 @@ ip_vs_in(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, int } } - if (unlikely(!cp)) { - int v; - - if (!ip_vs_try_to_schedule(ipvs, af, skb, pd, &v, &cp, &iph)) - return v; - } - - IP_VS_DBG_PKT(11, af, pp, skb, iph.off, "Incoming packet"); - /* Check the server status */ - if (cp->dest && !(cp->dest->flags & IP_VS_DEST_F_AVAILABLE)) { + if (cp && cp->dest && !(cp->dest->flags & IP_VS_DEST_F_AVAILABLE)) { /* the destination server is not available */ - __u32 flags = cp->flags; - - /* when timer already started, silently drop the packet.*/ - if (timer_pending(&cp->timer)) - __ip_vs_conn_put(cp); - else - ip_vs_conn_put(cp); + if (sysctl_expire_nodest_conn(ipvs)) { + bool uses_ct = ip_vs_conn_uses_conntrack(cp, skb); - if (sysctl_expire_nodest_conn(ipvs) && - !(flags & IP_VS_CONN_F_ONE_PACKET)) { - /* try to expire the connection immediately */ ip_vs_conn_expire_now(cp); + __ip_vs_conn_put(cp); + if (uses_ct) + return NF_DROP; + cp = NULL; + } else { + __ip_vs_conn_put(cp); + return NF_DROP; } + } - return NF_DROP; + if (unlikely(!cp)) { + int v; + + if (!ip_vs_try_to_schedule(ipvs, af, skb, pd, &v, &cp, &iph)) + return v; } + IP_VS_DBG_PKT(11, af, pp, skb, iph.off, "Incoming packet"); + ip_vs_in_stats(cp, skb); ip_vs_set_state(cp, IP_VS_DIR_INPUT, skb, pd); if (cp->packet_xmit) diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c index 8d14a1acbc37..24736efac85c 100644 --- a/net/netfilter/ipvs/ip_vs_ctl.c +++ b/net/netfilter/ipvs/ip_vs_ctl.c @@ -210,6 +210,17 @@ static void update_defense_level(struct netns_ipvs *ipvs) local_bh_enable(); } +/* Handler for delayed work for expiring no + * destination connections + */ +static void expire_nodest_conn_handler(struct work_struct *work) +{ + struct netns_ipvs *ipvs; + + ipvs = container_of(work, struct netns_ipvs, + expire_nodest_conn_work.work); + ip_vs_expire_nodest_conn_flush(ipvs); +} /* * Timer for checking the defense @@ -1163,6 +1174,12 @@ static void __ip_vs_del_dest(struct netns_ipvs *ipvs, struct ip_vs_dest *dest, list_add(&dest->t_list, &ipvs->dest_trash); dest->idle_start = 0; spin_unlock_bh(&ipvs->dest_trash_lock); + + /* Queue up delayed work to expire all no estination connections. + * No-op when CONFIG_SYSCTL is disabled. + */ + if (!cleanup) + ip_vs_enqueue_expire_nodest_conns(ipvs); } @@ -4065,6 +4082,10 @@ static int __net_init ip_vs_control_net_init_sysctl(struct netns_ipvs *ipvs) INIT_DELAYED_WORK(&ipvs->defense_work, defense_work_handler); schedule_delayed_work(&ipvs->defense_work, DEFENSE_TIMER_PERIOD); + /* Init delayed work for expiring no dest conn */ + INIT_DELAYED_WORK(&ipvs->expire_nodest_conn_work, + expire_nodest_conn_handler); + return 0; } @@ -4072,6 +4093,7 @@ static void __net_exit ip_vs_control_net_cleanup_sysctl(struct netns_ipvs *ipvs) { struct net *net = ipvs->net; + cancel_delayed_work_sync(&ipvs->expire_nodest_conn_work); cancel_delayed_work_sync(&ipvs->defense_work); cancel_work_sync(&ipvs->defense_work.work); unregister_net_sysctl_table(ipvs->sysctl_hdr); -- 2.20.1