linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Eric Dumazet <edumazet@google.com>,
	Juha-Matti Tilli <juha-matti.tilli@iki.fi>,
	Yuchung Cheng <ycheng@google.com>,
	Soheil Hassas Yeganeh <soheil@google.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 4.14 28/48] tcp: free batches of packets in tcp_prune_ofo_queue()
Date: Fri, 27 Jul 2018 12:00:13 +0200	[thread overview]
Message-ID: <20180727095921.457100398@linuxfoundation.org> (raw)
In-Reply-To: <20180727095918.503549522@linuxfoundation.org>

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 72cd43ba64fc172a443410ce01645895850844c8 ]

Juha-Matti Tilli reported that malicious peers could inject tiny
packets in out_of_order_queue, forcing very expensive calls
to tcp_collapse_ofo_queue() and tcp_prune_ofo_queue() for
every incoming packet. out_of_order_queue rb-tree can contain
thousands of nodes, iterating over all of them is not nice.

Before linux-4.9, we would have pruned all packets in ofo_queue
in one go, every XXXX packets. XXXX depends on sk_rcvbuf and skbs
truesize, but is about 7000 packets with tcp_rmem[2] default of 6 MB.

Since we plan to increase tcp_rmem[2] in the future to cope with
modern BDP, can not revert to the old behavior, without great pain.

Strategy taken in this patch is to purge ~12.5 % of the queue capacity.

Fixes: 36a6503fedda ("tcp: refine tcp_prune_ofo_queue() to not drop all packets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Juha-Matti Tilli <juha-matti.tilli@iki.fi>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/skbuff.h |    2 ++
 net/ipv4/tcp_input.c   |   15 +++++++++++----
 2 files changed, 13 insertions(+), 4 deletions(-)

--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3167,6 +3167,8 @@ static inline int __skb_grow_rcsum(struc
 	return __skb_grow(skb, len);
 }
 
+#define rb_to_skb(rb) rb_entry_safe(rb, struct sk_buff, rbnode)
+
 #define skb_queue_walk(queue, skb) \
 		for (skb = (queue)->next;					\
 		     skb != (struct sk_buff *)(queue);				\
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4924,6 +4924,7 @@ new_range:
  * 2) not add too big latencies if thousands of packets sit there.
  *    (But if application shrinks SO_RCVBUF, we could still end up
  *     freeing whole queue here)
+ * 3) Drop at least 12.5 % of sk_rcvbuf to avoid malicious attacks.
  *
  * Return true if queue has shrunk.
  */
@@ -4931,20 +4932,26 @@ static bool tcp_prune_ofo_queue(struct s
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct rb_node *node, *prev;
+	int goal;
 
 	if (RB_EMPTY_ROOT(&tp->out_of_order_queue))
 		return false;
 
 	NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED);
+	goal = sk->sk_rcvbuf >> 3;
 	node = &tp->ooo_last_skb->rbnode;
 	do {
 		prev = rb_prev(node);
 		rb_erase(node, &tp->out_of_order_queue);
+		goal -= rb_to_skb(node)->truesize;
 		tcp_drop(sk, rb_entry(node, struct sk_buff, rbnode));
-		sk_mem_reclaim(sk);
-		if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
-		    !tcp_under_memory_pressure(sk))
-			break;
+		if (!prev || goal <= 0) {
+			sk_mem_reclaim(sk);
+			if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
+			    !tcp_under_memory_pressure(sk))
+				break;
+			goal = sk->sk_rcvbuf >> 3;
+		}
 		node = prev;
 	} while (node);
 	tp->ooo_last_skb = rb_entry(prev, struct sk_buff, rbnode);



  parent reply	other threads:[~2018-07-27 10:02 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-27  9:59 [PATCH 4.14 00/48] 4.14.59-stable review Greg Kroah-Hartman
2018-07-27  9:59 ` [PATCH 4.14 02/48] MIPS: ath79: fix register address in ath79_ddr_wb_flush() Greg Kroah-Hartman
2018-07-27  9:59 ` [PATCH 4.14 03/48] MIPS: Fix off-by-one in pci_resource_to_user() Greg Kroah-Hartman
2018-07-27  9:59 ` [PATCH 4.14 04/48] xen/PVH: Set up GS segment for stack canary Greg Kroah-Hartman
2018-07-27  9:59 ` [PATCH 4.14 05/48] KVM: PPC: Check if IOMMU page is contained in the pinned physical page Greg Kroah-Hartman
2018-07-27  9:59 ` [PATCH 4.14 06/48] drm/nouveau/drm/nouveau: Fix runtime PM leak in nv50_disp_atomic_commit() Greg Kroah-Hartman
2018-07-27  9:59 ` [PATCH 4.14 07/48] drm/nouveau: Set DRIVER_ATOMIC cap earlier to fix debugfs Greg Kroah-Hartman
2018-07-27  9:59 ` [PATCH 4.14 08/48] bonding: set default miimon value for non-arp modes if not set Greg Kroah-Hartman
2018-07-27  9:59 ` [PATCH 4.14 09/48] ip: hash fragments consistently Greg Kroah-Hartman
2018-07-27  9:59 ` [PATCH 4.14 10/48] ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull Greg Kroah-Hartman
2018-07-27  9:59 ` [PATCH 4.14 11/48] net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper Greg Kroah-Hartman
2018-07-27  9:59 ` [PATCH 4.14 12/48] net: skb_segment() should not return NULL Greg Kroah-Hartman
2018-07-27  9:59 ` [PATCH 4.14 13/48] net/mlx5: Adjust clock overflow work period Greg Kroah-Hartman
2018-07-27  9:59 ` [PATCH 4.14 14/48] net/mlx5e: Dont allow aRFS for encapsulated packets Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 15/48] net/mlx5e: Fix quota counting in aRFS expire flow Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 16/48] net/ipv6: Fix linklocal to global address with VRF Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 17/48] multicast: do not restore deleted record source filter mode to new one Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 18/48] net: phy: consider PHY_IGNORE_INTERRUPT in phy_start_aneg_priv Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 19/48] sock: fix sg page frag coalescing in sk_alloc_sg Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 20/48] rtnetlink: add rtnl_link_state check in rtnl_configure_link Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 21/48] vxlan: add new fdb alloc and create helpers Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 22/48] vxlan: make netlink notify in vxlan_fdb_destroy optional Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 23/48] vxlan: fix default fdb entry netlink notify ordering during netdev create Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 24/48] tcp: fix dctcp delayed ACK schedule Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 25/48] tcp: helpers to send special DCTCP ack Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 26/48] tcp: do not cancel delay-AcK on DCTCP special ACK Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 27/48] tcp: do not delay ACK in DCTCP upon CE status change Greg Kroah-Hartman
2018-07-27 10:00 ` Greg Kroah-Hartman [this message]
2018-07-27 10:00 ` [PATCH 4.14 29/48] tcp: avoid collapses in tcp_prune_queue() if possible Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 30/48] tcp: detect malicious patterns in tcp_collapse_ofo_queue() Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 31/48] tcp: call tcp_drop() from tcp_data_queue_ofo() Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 32/48] tcp: add tcp_ooo_try_coalesce() helper Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 33/48] staging: speakup: fix wraparound in uaccess length check Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 34/48] usb: cdc_acm: Add quirk for Castles VEGA3000 Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 35/48] usb: core: handle hub C_PORT_OVER_CURRENT condition Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 37/48] usb: gadget: f_fs: Only return delayed status when len is 0 Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 38/48] driver core: Partially revert "driver core: correct devices shutdown order" Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 39/48] can: xilinx_can: fix RX loop if RXNEMP is asserted without RXOK Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 40/48] can: xilinx_can: fix power management handling Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 41/48] can: xilinx_can: fix recovery from error states not being propagated Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 42/48] can: xilinx_can: fix device dropping off bus on RX overrun Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 43/48] can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 44/48] can: xilinx_can: fix incorrect clear of non-processed interrupts Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 45/48] can: xilinx_can: fix RX overflow interrupt not being enabled Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 46/48] can: peak_canfd: fix firmware < v3.3.0: limit allocation to 32-bit DMA addr only Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 47/48] can: m_can.c: fix setup of CCCR register: clear CCCR NISO bit before checking can.ctrlmode Greg Kroah-Hartman
2018-07-27 10:00 ` [PATCH 4.14 48/48] turn off -Wattribute-alias Greg Kroah-Hartman
2018-07-27 17:31 ` [PATCH 4.14 00/48] 4.14.59-stable review Guenter Roeck
2018-07-27 19:55 ` Shuah Khan
2018-07-28  6:54 ` Naresh Kamboju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180727095921.457100398@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=juha-matti.tilli@iki.fi \
    --cc=linux-kernel@vger.kernel.org \
    --cc=soheil@google.com \
    --cc=stable@vger.kernel.org \
    --cc=ycheng@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).