linux-kernel.vger.kernel.org archive mirror
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Jiri Bohac <jbohac@suse.cz>,
	Julian Anastasov <ja@ssi.bg>, Simon Horman <horms@verge.net.au>
Subject: [PATCH 4.5 06/88] ipvs: drop first packet to redirect conntrack
Date: Mon,  9 May 2016 09:20:55 +0200
Message-ID: <20160509071952.446465220@linuxfoundation.org>
In-Reply-To: <20160509071952.129092535@linuxfoundation.org>

4.5-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Julian Anastasov <ja@ssi.bg>

commit f719e3754ee2f7275437e61a6afd520181fdd43b upstream.

Jiri Bohac reported a problem where an attempt to
reschedule an existing connection to another real server
needs a proper redirect for the conntrack used by the IPVS
connection. For example, when an IPVS connection is created
to a NAT-ed real server, we alter the reply direction of
the conntrack. If we later decide to select a different real
server, we cannot alter the conntrack again. And if we
expire the old connection, the new connection is left
without a conntrack.

So the only way to redirect both the IPVS connection and
the Netfilter conntrack is to drop the SYN packet that
hits the existing connection, wait for the next jiffy
to expire the old connection and its conntrack, and rely
on the client's retransmission to create a new connection as
usual.

Jiri Bohac provided a fix that drops all SYNs on rescheduling;
I extended his patch to do such drops only for connections
that use conntrack. Here is the original report from Jiri Bohac:

Since commit dc7b3eb900aa ("ipvs: Fix reuse connection if real server
is dead"), new connections to dead servers are redistributed
immediately to new servers.  The old connection is expired using
ip_vs_conn_expire_now() which sets the connection timer to expire
immediately.

However, before the timer callback, ip_vs_conn_expire(), is run
to clean the connection's conntrack entry, the new redistributed
connection may already be established and its conntrack removed
instead.

Fix this by dropping the first packet of the new connection
instead, like we do when the destination server is not available.
The timer will have deleted the old conntrack entry long before
the first packet of the new connection is retransmitted.

Fixes: dc7b3eb900aa ("ipvs: Fix reuse connection if real server is dead")
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
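Reviewer note (not part of the upstream commit): the decision the
patch adds to ip_vs_in() can be modelled in userspace as a rough
sketch. The struct, enum and classify() names below are hypothetical
and exist only for illustration; each boolean stands in for the
kernel-side check named in its comment. The model covers only the
final verdict and leaves out the ip_vs_conn_expire_now() and
__ip_vs_conn_put() calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the state the patched code inspects. */
struct conn_model {
	bool dest_weight_zero;	/* expire_nodest_conn set, dest weight is 0 */
	bool new_conn_expected;	/* result of is_new_conn_expected() */
	bool uses_ct;		/* result of ip_vs_conn_uses_conntrack() */
	int n_control;		/* count of controlled connections */
};

enum verdict {
	KEEP_CONN,		/* keep using the existing connection */
	RESCHEDULE_NOW,		/* cp = NULL, schedule a new connection */
	DROP_AND_RETRY,		/* NF_DROP, rely on SYN retransmission */
};

/* Mirrors the resched/uses_ct logic of the patched hunk. */
static enum verdict classify(const struct conn_model *c)
{
	bool resched = false;
	bool uses_ct = false;

	assert(c != NULL);

	if (c->dest_weight_zero) {
		resched = true;
		uses_ct = c->uses_ct;
	} else if (c->new_conn_expected) {
		uses_ct = c->uses_ct;
		/* A controlling connection that uses conntrack is not
		 * rescheduled while controlled connections reference it.
		 */
		resched = (c->n_control == 0) ? true : !uses_ct;
	}

	if (!resched)
		return KEEP_CONN;
	return uses_ct ? DROP_AND_RETRY : RESCHEDULE_NOW;
}
```

In this model, only connections that really use conntrack take the
drop path; all others are rescheduled immediately, as before the patch.
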
 include/net/ip_vs.h             |   17 +++++++++++++++++
 net/netfilter/ipvs/ip_vs_core.c |   37 ++++++++++++++++++++++++++++---------
 2 files changed, 45 insertions(+), 9 deletions(-)

--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1588,6 +1588,23 @@ static inline void ip_vs_conn_drop_connt
 }
 #endif /* CONFIG_IP_VS_NFCT */
 
+/* Really using conntrack? */
+static inline bool ip_vs_conn_uses_conntrack(struct ip_vs_conn *cp,
+					     struct sk_buff *skb)
+{
+#ifdef CONFIG_IP_VS_NFCT
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct;
+
+	if (!(cp->flags & IP_VS_CONN_F_NFCT))
+		return false;
+	ct = nf_ct_get(skb, &ctinfo);
+	if (ct && !nf_ct_is_untracked(ct))
+		return true;
+#endif
+	return false;
+}
+
 static inline int
 ip_vs_dest_conn_overhead(struct ip_vs_dest *dest)
 {
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1757,15 +1757,34 @@ ip_vs_in(struct netns_ipvs *ipvs, unsign
 	cp = pp->conn_in_get(ipvs, af, skb, &iph);
 
 	conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
-	if (conn_reuse_mode && !iph.fragoffs &&
-	    is_new_conn(skb, &iph) && cp &&
-	    ((unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
-	      unlikely(!atomic_read(&cp->dest->weight))) ||
-	     unlikely(is_new_conn_expected(cp, conn_reuse_mode)))) {
-		if (!atomic_read(&cp->n_control))
-			ip_vs_conn_expire_now(cp);
-		__ip_vs_conn_put(cp);
-		cp = NULL;
+	if (conn_reuse_mode && !iph.fragoffs && is_new_conn(skb, &iph) && cp) {
+		bool uses_ct = false, resched = false;
+
+		if (unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
+		    unlikely(!atomic_read(&cp->dest->weight))) {
+			resched = true;
+			uses_ct = ip_vs_conn_uses_conntrack(cp, skb);
+		} else if (is_new_conn_expected(cp, conn_reuse_mode)) {
+			uses_ct = ip_vs_conn_uses_conntrack(cp, skb);
+			if (!atomic_read(&cp->n_control)) {
+				resched = true;
+			} else {
+				/* Do not reschedule controlling connection
+				 * that uses conntrack while it is still
+				 * referenced by controlled connection(s).
+				 */
+				resched = !uses_ct;
+			}
+		}
+
+		if (resched) {
+			if (!atomic_read(&cp->n_control))
+				ip_vs_conn_expire_now(cp);
+			__ip_vs_conn_put(cp);
+			if (uses_ct)
+				return NF_DROP;
+			cp = NULL;
+		}
 	}
 
 	if (unlikely(!cp)) {
