All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Dan Streetman <ddstreet@canonical.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 3.18 41/52] net: tcp: close sock if net namespace is exiting
Date: Mon, 29 Jan 2018 13:56:59 +0100	[thread overview]
Message-ID: <20180129123629.983111406@linuxfoundation.org> (raw)
In-Reply-To: <20180129123628.168904217@linuxfoundation.org>

3.18-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Streetman <ddstreet@ieee.org>


[ Upstream commit 4ee806d51176ba7b8ff1efd81f271d7252e03a1d ]

When a tcp socket is closed, if it detects that its net namespace is
exiting, close immediately and do not wait for FIN sequence.

For normal sockets, a reference is taken to their net namespace, so it will
never exit while the socket is open.  However, kernel sockets do not take a
reference to their net namespace, so it may begin exiting while the kernel
socket is still open.  In this case if the kernel socket is a tcp socket,
it will stay open trying to complete its close sequence.  The sock's dst(s)
hold a reference to their interface, which are all transferred to the
namespace's loopback interface when the real interfaces are taken down.
When the namespace tries to take down its loopback interface, it hangs
waiting for all references to the loopback interface to release, which
results in messages like:

unregister_netdevice: waiting for lo to become free. Usage count = 1

These messages continue until the socket finally times out and closes.
Since the net namespace cleanup holds the net_mutex while calling its
registered pernet callbacks, any new net namespace initialization is
blocked until the current net namespace finishes exiting.

After this change, the tcp socket notices the exiting net namespace, and
closes immediately, releasing its dst(s) and their reference to the
loopback interface, which lets the net namespace continue exiting.

Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
Signed-off-by: Dan Streetman <ddstreet@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/net_namespace.h |   10 ++++++++++
 net/ipv4/tcp.c              |    3 +++
 net/ipv4/tcp_timer.c        |   15 +++++++++++++++
 3 files changed, 28 insertions(+)

--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -200,6 +200,11 @@ int net_eq(const struct net *net1, const
 	return net1 == net2;
 }
 
+static inline int check_net(const struct net *net)
+{
+	return atomic_read(&net->count) != 0;
+}
+
 void net_drop_ns(void *);
 
 #else
@@ -223,6 +228,11 @@ int net_eq(const struct net *net1, const
 {
 	return 1;
 }
+
+static inline int check_net(const struct net *net)
+{
+	return 1;
+}
 
 #define net_drop_ns NULL
 #endif
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2182,6 +2182,9 @@ adjudge_to_death:
 			tcp_send_active_reset(sk, GFP_ATOMIC);
 			NET_INC_STATS_BH(sock_net(sk),
 					LINUX_MIB_TCPABORTONMEMORY);
+		} else if (!check_net(sock_net(sk))) {
+			/* Not possible to send reset; just close */
+			tcp_set_state(sk, TCP_CLOSE);
 		}
 	}
 
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -46,11 +46,19 @@ static void tcp_write_err(struct sock *s
  * to prevent DoS attacks. It is called when a retransmission timeout
  * or zero probe timeout occurs on orphaned socket.
  *
+ * Also close if our net namespace is exiting; in that case there is no
+ * hope of ever communicating again since all netns interfaces are already
+ * down (or about to be down), and we need to release our dst references,
+ * which have been moved to the netns loopback interface, so the namespace
+ * can finish exiting.  This condition is only possible if we are a kernel
+ * socket, as those do not hold references to the namespace.
+ *
  * Criteria is still not confirmed experimentally and may change.
  * We kill the socket, if:
  * 1. If number of orphaned sockets exceeds an administratively configured
  *    limit.
  * 2. If we have strong memory pressure.
+ * 3. If our net namespace is exiting.
  */
 static int tcp_out_of_resources(struct sock *sk, bool do_reset)
 {
@@ -79,6 +87,13 @@ static int tcp_out_of_resources(struct s
 		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPABORTONMEMORY);
 		return 1;
 	}
+
+	if (!check_net(sock_net(sk))) {
+		/* Not possible to send reset; just close */
+		tcp_done(sk);
+		return 1;
+	}
+
 	return 0;
 }
 

  parent reply	other threads:[~2018-01-29 12:56 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-29 12:56 [PATCH 3.18 00/52] 3.18.93-stable review Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 01/52] gcov: disable for COMPILE_TEST Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 02/52] scsi: sg: disable SET_FORCE_LOW_DMA Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 03/52] futex: Prevent overflow by strengthen input validation Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 04/52] ALSA: pcm: Remove yet superfluous WARN_ON() Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 05/52] ALSA: hda - Apply the existing quirk to iMac 14,1 Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 06/52] af_key: fix buffer overread in verify_address_len() Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 07/52] af_key: fix buffer overread in parse_exthdrs() Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 08/52] pipe: avoid round_pipe_size() nr_pages overflow on 32-bit Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 09/52] Input: 88pm860x-ts - fix child-node lookup Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 10/52] Input: twl6040-vibra - fix DT node memory management Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 11/52] Input: twl6040-vibra - fix child-node lookup Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 12/52] Input: twl4030-vibra - fix ERROR: Bad of_node_put() warning Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 13/52] Input: twl4030-vibra - fix sibling-node lookup Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 14/52] phy: work around phys references to usb-nop-xceiv devices Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 15/52] ARM: dts: kirkwood: fix pin-muxing of MPP7 on OpenBlocks A7 Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 16/52] dm btree: fix serious bug in btree_split_beneath() Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 17/52] dm thin metadata: THIN_MAX_CONCURRENT_LOCKS should be 6 Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 18/52] arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 19/52] MIPS: AR7: ensure the port types FCR value is used Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 20/52] x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels Greg Kroah-Hartman
2018-01-29 12:56   ` Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 21/52] usbip: Fix implicit fallthrough warning Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 22/52] can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 23/52] can: af_can: canfd_rcv(): " Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 24/52] mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 25/52] hwpoison, memcg: forcibly uncharge LRU pages Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 26/52] ipc: msg, make msgrcv work with LONG_MIN Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 27/52] netfilter: nf_ct_expect: remove the redundant slash when policy name is empty Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 28/52] netfilter: restart search if moved to other chain Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 29/52] netfilter: nf_conntrack_sip: extend request line validation Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 30/52] netfilter: nfnetlink_cthelper: Add missing permission checks Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 31/52] netfilter: xt_osf: " Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 32/52] reiserfs: fix race in prealloc discard Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 33/52] reiserfs: dont preallocate blocks for extended attributes Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 34/52] fs/fcntl: f_setown, avoid undefined behaviour Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 35/52] scsi: libiscsi: fix shifting of DID_REQUEUE host byte Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 36/52] um: link vmlinux with -no-pie Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 37/52] eventpoll.h: add missing epoll event masks Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 38/52] um: Stop abusing __KERNEL__ Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 39/52] um: Remove copy&paste code from init.h Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 3.18 40/52] x86/microcode/intel: Extend BDW late-loading further with LLC size check Greg Kroah-Hartman
2018-01-29 12:56 ` Greg Kroah-Hartman [this message]
2018-01-29 12:57 ` [PATCH 3.18 42/52] dccp: dont restart ccid2_hc_tx_rto_expire() if sk in closed state Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 3.18 43/52] net: igmp: fix source address check for IGMPv3 reports Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 3.18 44/52] tcp: __tcp_hdrlen() helper Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 3.18 45/52] net: qdisc_pkt_len_init() should be more robust Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 3.18 46/52] pppoe: take ->needed_headroom of lower device into account on xmit Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 3.18 47/52] sctp: do not allow the v4 socket to bind a v4mapped v6 address Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 3.18 48/52] sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 3.18 49/52] vmxnet3: repair memory leak Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 3.18 50/52] net: Allow neigh contructor functions ability to modify the primary_key Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 3.18 51/52] ipv6: fix udpv6 sendmsg crash caused by too small MTU Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 3.18 52/52] ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY Greg Kroah-Hartman
2018-01-29 23:58 ` [PATCH 3.18 00/52] 3.18.93-stable review Shuah Khan
2018-01-30  7:37   ` Greg Kroah-Hartman
2018-01-30  5:09 ` Harsh Shandilya
2018-01-30  7:38   ` Greg Kroah-Hartman
2018-01-30 14:19 ` Guenter Roeck
2018-01-30 14:51   ` Greg Kroah-Hartman
2018-01-30 18:51     ` Greg Kroah-Hartman
2018-01-30 19:48       ` Guenter Roeck
2018-01-31  8:52         ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180129123629.983111406@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=davem@davemloft.net \
    --cc=ddstreet@canonical.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.