From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Eric Dumazet <edumazet@google.com>,
	Juha-Matti Tilli <juha-matti.tilli@iki.fi>,
	Yuchung Cheng <ycheng@google.com>,
	Soheil Hassas Yeganeh <soheil@google.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 4.9 17/33] tcp: free batches of packets in tcp_prune_ofo_queue()
Date: Fri, 27 Jul 2018 12:08:58 +0200
Message-ID: <20180727100828.323222556@linuxfoundation.org>
In-Reply-To: <20180727100827.665729981@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 72cd43ba64fc172a443410ce01645895850844c8 ]

Juha-Matti Tilli reported that malicious peers could inject tiny
packets into the out_of_order_queue, forcing very expensive calls
to tcp_collapse_ofo_queue() and tcp_prune_ofo_queue() for
every incoming packet. The out_of_order_queue rb-tree can contain
thousands of nodes, and iterating over all of them is not nice.

Before linux-4.9, we would have pruned all packets in ofo_queue
in one go, every XXXX packets. XXXX depends on sk_rcvbuf and skb
truesize, but is about 7000 packets with the tcp_rmem[2] default of 6 MB.

Since we plan to increase tcp_rmem[2] in the future to cope with
modern BDP, we cannot revert to the old behavior without great pain.

The strategy taken in this patch is to purge ~12.5 % of the queue
capacity (sk_rcvbuf >> 3) per batch.

Fixes: 36a6503fedda ("tcp: refine tcp_prune_ofo_queue() to not drop all packets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Juha-Matti Tilli <juha-matti.tilli@iki.fi>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/skbuff.h |    2 ++
 net/ipv4/tcp_input.c   |   15 +++++++++++----
 2 files changed, 13 insertions(+), 4 deletions(-)

--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2982,6 +2982,8 @@ static inline int __skb_grow_rcsum(struc
 	return __skb_grow(skb, len);
 }
 
+#define rb_to_skb(rb) rb_entry_safe(rb, struct sk_buff, rbnode)
+
 #define skb_queue_walk(queue, skb) \
 		for (skb = (queue)->next;					\
 		     skb != (struct sk_buff *)(queue);				\
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4965,6 +4965,7 @@ new_range:
  * 2) not add too big latencies if thousands of packets sit there.
  *    (But if application shrinks SO_RCVBUF, we could still end up
  *     freeing whole queue here)
+ * 3) Drop at least 12.5 % of sk_rcvbuf to avoid malicious attacks.
  *
  * Return true if queue has shrunk.
  */
@@ -4972,20 +4973,26 @@ static bool tcp_prune_ofo_queue(struct s
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct rb_node *node, *prev;
+	int goal;
 
 	if (RB_EMPTY_ROOT(&tp->out_of_order_queue))
 		return false;
 
 	NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED);
+	goal = sk->sk_rcvbuf >> 3;
 	node = &tp->ooo_last_skb->rbnode;
 	do {
 		prev = rb_prev(node);
 		rb_erase(node, &tp->out_of_order_queue);
+		goal -= rb_to_skb(node)->truesize;
 		tcp_drop(sk, rb_entry(node, struct sk_buff, rbnode));
-		sk_mem_reclaim(sk);
-		if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
-		    !tcp_under_memory_pressure(sk))
-			break;
+		if (!prev || goal <= 0) {
+			sk_mem_reclaim(sk);
+			if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
+			    !tcp_under_memory_pressure(sk))
+				break;
+			goal = sk->sk_rcvbuf >> 3;
+		}
 		node = prev;
 	} while (node);
 	tp->ooo_last_skb = rb_entry(prev, struct sk_buff, rbnode);



