netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/8] net/packet: better behavior under DDOS
@ 2019-06-12 16:52 Eric Dumazet
  2019-06-12 16:52 ` [PATCH net-next 1/8] net/packet: constify __packet_get_status() argument Eric Dumazet
                   ` (9 more replies)
  0 siblings, 10 replies; 12+ messages in thread
From: Eric Dumazet @ 2019-06-12 16:52 UTC (permalink / raw)
  To: David S . Miller, Willem de Bruijn, Mahesh Bandewar
  Cc: netdev, Eric Dumazet, Eric Dumazet

Using tcpdump (or other af_packet user) on a busy host can lead to
catastrophic consequences, because suddenly, potentially all cpus
are spinning on a contended spinlock.

Both packet_rcv() and tpacket_rcv() grab the spinlock
to eventually find there is no room for an additional packet.

This patch series align packet_rcv() and tpacket_rcv() to both
check if the queue is full before grabbing the spinlock.

If the queue is full, they both increment a new atomic counter
placed on a separate cache line to let readers drain the queue faster.

There is still false sharing on this new atomic counter,
we might in the future make it per cpu if there is interest.

Eric Dumazet (8):
  net/packet: constify __packet_get_status() argument
  net/packet: constify packet_lookup_frame() and __tpacket_has_room()
  net/packet: constify prb_lookup_block() and __tpacket_v3_has_room()
  net/packet: constify __packet_rcv_has_room()
  net/packet: make tp_drops atomic
  net/packet: implement shortcut in tpacket_rcv()
  net/packet: remove locking from packet_rcv_has_room()
  net/packet: introduce packet_rcv_try_clear_pressure() helper

 net/packet/af_packet.c | 96 ++++++++++++++++++++++++------------------
 net/packet/internal.h  |  1 +
 2 files changed, 56 insertions(+), 41 deletions(-)

-- 
2.22.0.rc2.383.gf4fbbf30c2-goog


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net-next 1/8] net/packet: constify __packet_get_status() argument
  2019-06-12 16:52 [PATCH net-next 0/8] net/packet: better behavior under DDOS Eric Dumazet
@ 2019-06-12 16:52 ` Eric Dumazet
  2019-06-12 16:52 ` [PATCH net-next 2/8] net/packet: constify packet_lookup_frame() and __tpacket_has_room() Eric Dumazet
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2019-06-12 16:52 UTC (permalink / raw)
  To: David S . Miller, Willem de Bruijn, Mahesh Bandewar
  Cc: netdev, Eric Dumazet, Eric Dumazet

struct packet_sock  is only read.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/packet/af_packet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 7fa847dcea30e481b2f291cc6980a7b887629cd7..66fcfd5b51f82a861795e002e91d3cbc69ab545a 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -384,7 +384,7 @@ static void __packet_set_status(struct packet_sock *po, void *frame, int status)
 	smp_wmb();
 }
 
-static int __packet_get_status(struct packet_sock *po, void *frame)
+static int __packet_get_status(const struct packet_sock *po, void *frame)
 {
 	union tpacket_uhdr h;
 
-- 
2.22.0.rc2.383.gf4fbbf30c2-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 2/8] net/packet: constify packet_lookup_frame() and __tpacket_has_room()
  2019-06-12 16:52 [PATCH net-next 0/8] net/packet: better behavior under DDOS Eric Dumazet
  2019-06-12 16:52 ` [PATCH net-next 1/8] net/packet: constify __packet_get_status() argument Eric Dumazet
@ 2019-06-12 16:52 ` Eric Dumazet
  2019-06-12 16:52 ` [PATCH net-next 3/8] net/packet: constify prb_lookup_block() and __tpacket_v3_has_room() Eric Dumazet
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2019-06-12 16:52 UTC (permalink / raw)
  To: David S . Miller, Willem de Bruijn, Mahesh Bandewar
  Cc: netdev, Eric Dumazet, Eric Dumazet

Goal is to be able to use __tpacket_has_room() without holding a lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/packet/af_packet.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 66fcfd5b51f82a861795e002e91d3cbc69ab545a..273bffd2130d36cceb9947540a8511a51895874a 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -460,10 +460,10 @@ static __u32 __packet_set_timestamp(struct packet_sock *po, void *frame,
 	return ts_status;
 }
 
-static void *packet_lookup_frame(struct packet_sock *po,
-		struct packet_ring_buffer *rb,
-		unsigned int position,
-		int status)
+static void *packet_lookup_frame(const struct packet_sock *po,
+				 const struct packet_ring_buffer *rb,
+				 unsigned int position,
+				 int status)
 {
 	unsigned int pg_vec_pos, frame_offset;
 	union tpacket_uhdr h;
@@ -1198,12 +1198,12 @@ static void packet_free_pending(struct packet_sock *po)
 #define ROOM_LOW	0x1
 #define ROOM_NORMAL	0x2
 
-static bool __tpacket_has_room(struct packet_sock *po, int pow_off)
+static bool __tpacket_has_room(const struct packet_sock *po, int pow_off)
 {
 	int idx, len;
 
-	len = po->rx_ring.frame_max + 1;
-	idx = po->rx_ring.head;
+	len = READ_ONCE(po->rx_ring.frame_max) + 1;
+	idx = READ_ONCE(po->rx_ring.head);
 	if (pow_off)
 		idx += len >> pow_off;
 	if (idx >= len)
-- 
2.22.0.rc2.383.gf4fbbf30c2-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 3/8] net/packet: constify prb_lookup_block() and __tpacket_v3_has_room()
  2019-06-12 16:52 [PATCH net-next 0/8] net/packet: better behavior under DDOS Eric Dumazet
  2019-06-12 16:52 ` [PATCH net-next 1/8] net/packet: constify __packet_get_status() argument Eric Dumazet
  2019-06-12 16:52 ` [PATCH net-next 2/8] net/packet: constify packet_lookup_frame() and __tpacket_has_room() Eric Dumazet
@ 2019-06-12 16:52 ` Eric Dumazet
  2019-06-12 16:52 ` [PATCH net-next 4/8] net/packet: constify __packet_rcv_has_room() Eric Dumazet
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2019-06-12 16:52 UTC (permalink / raw)
  To: David S . Miller, Willem de Bruijn, Mahesh Bandewar
  Cc: netdev, Eric Dumazet, Eric Dumazet

Goal is to be able to use __tpacket_v3_has_room() without holding
a lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/packet/af_packet.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 273bffd2130d36cceb9947540a8511a51895874a..5ef63d0c3ad0a184a03429fdd52cad26349647d1 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1082,10 +1082,10 @@ static void *packet_current_rx_frame(struct packet_sock *po,
 	}
 }
 
-static void *prb_lookup_block(struct packet_sock *po,
-				     struct packet_ring_buffer *rb,
-				     unsigned int idx,
-				     int status)
+static void *prb_lookup_block(const struct packet_sock *po,
+			      const struct packet_ring_buffer *rb,
+			      unsigned int idx,
+			      int status)
 {
 	struct tpacket_kbdq_core *pkc  = GET_PBDQC_FROM_RB(rb);
 	struct tpacket_block_desc *pbd = GET_PBLOCK_DESC(pkc, idx);
@@ -1211,12 +1211,12 @@ static bool __tpacket_has_room(const struct packet_sock *po, int pow_off)
 	return packet_lookup_frame(po, &po->rx_ring, idx, TP_STATUS_KERNEL);
 }
 
-static bool __tpacket_v3_has_room(struct packet_sock *po, int pow_off)
+static bool __tpacket_v3_has_room(const struct packet_sock *po, int pow_off)
 {
 	int idx, len;
 
-	len = po->rx_ring.prb_bdqc.knum_blocks;
-	idx = po->rx_ring.prb_bdqc.kactive_blk_num;
+	len = READ_ONCE(po->rx_ring.prb_bdqc.knum_blocks);
+	idx = READ_ONCE(po->rx_ring.prb_bdqc.kactive_blk_num);
 	if (pow_off)
 		idx += len >> pow_off;
 	if (idx >= len)
-- 
2.22.0.rc2.383.gf4fbbf30c2-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 4/8] net/packet: constify __packet_rcv_has_room()
  2019-06-12 16:52 [PATCH net-next 0/8] net/packet: better behavior under DDOS Eric Dumazet
                   ` (2 preceding siblings ...)
  2019-06-12 16:52 ` [PATCH net-next 3/8] net/packet: constify prb_lookup_block() and __tpacket_v3_has_room() Eric Dumazet
@ 2019-06-12 16:52 ` Eric Dumazet
  2019-06-12 16:52 ` [PATCH net-next 5/8] net/packet: make tp_drops atomic Eric Dumazet
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2019-06-12 16:52 UTC (permalink / raw)
  To: David S . Miller, Willem de Bruijn, Mahesh Bandewar
  Cc: netdev, Eric Dumazet, Eric Dumazet

Goal is use the helper without lock being held.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/packet/af_packet.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 5ef63d0c3ad0a184a03429fdd52cad26349647d1..a0564855ed9dca4be37f70ed81c6dee1b38aca39 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1224,15 +1224,18 @@ static bool __tpacket_v3_has_room(const struct packet_sock *po, int pow_off)
 	return prb_lookup_block(po, &po->rx_ring, idx, TP_STATUS_KERNEL);
 }
 
-static int __packet_rcv_has_room(struct packet_sock *po, struct sk_buff *skb)
+static int __packet_rcv_has_room(const struct packet_sock *po,
+				 const struct sk_buff *skb)
 {
-	struct sock *sk = &po->sk;
+	const struct sock *sk = &po->sk;
 	int ret = ROOM_NONE;
 
 	if (po->prot_hook.func != tpacket_rcv) {
-		int avail = sk->sk_rcvbuf - atomic_read(&sk->sk_rmem_alloc)
-					  - (skb ? skb->truesize : 0);
-		if (avail > (sk->sk_rcvbuf >> ROOM_POW_OFF))
+		int rcvbuf = READ_ONCE(sk->sk_rcvbuf);
+		int avail = rcvbuf - atomic_read(&sk->sk_rmem_alloc)
+				   - (skb ? skb->truesize : 0);
+
+		if (avail > (rcvbuf >> ROOM_POW_OFF))
 			return ROOM_NORMAL;
 		else if (avail > 0)
 			return ROOM_LOW;
-- 
2.22.0.rc2.383.gf4fbbf30c2-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 5/8] net/packet: make tp_drops atomic
  2019-06-12 16:52 [PATCH net-next 0/8] net/packet: better behavior under DDOS Eric Dumazet
                   ` (3 preceding siblings ...)
  2019-06-12 16:52 ` [PATCH net-next 4/8] net/packet: constify __packet_rcv_has_room() Eric Dumazet
@ 2019-06-12 16:52 ` Eric Dumazet
  2019-06-12 16:52 ` [PATCH net-next 6/8] net/packet: implement shortcut in tpacket_rcv() Eric Dumazet
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2019-06-12 16:52 UTC (permalink / raw)
  To: David S . Miller, Willem de Bruijn, Mahesh Bandewar
  Cc: netdev, Eric Dumazet, Eric Dumazet

Under DDOS, we want to be able to increment tp_drops without
touching the spinlock. This will help readers to drain
the receive queue slightly faster :/

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/packet/af_packet.c | 20 +++++++++++---------
 net/packet/internal.h  |  1 +
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index a0564855ed9dca4be37f70ed81c6dee1b38aca39..2d499679811af53886ce0c8a1cdd74cd73107eac 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -758,7 +758,7 @@ static void prb_close_block(struct tpacket_kbdq_core *pkc1,
 	struct tpacket_hdr_v1 *h1 = &pbd1->hdr.bh1;
 	struct sock *sk = &po->sk;
 
-	if (po->stats.stats3.tp_drops)
+	if (atomic_read(&po->tp_drops))
 		status |= TP_STATUS_LOSING;
 
 	last_pkt = (struct tpacket3_hdr *)pkc1->prev;
@@ -2128,10 +2128,8 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev,
 
 drop_n_acct:
 	is_drop_n_account = true;
-	spin_lock(&sk->sk_receive_queue.lock);
-	po->stats.stats1.tp_drops++;
+	atomic_inc(&po->tp_drops);
 	atomic_inc(&sk->sk_drops);
-	spin_unlock(&sk->sk_receive_queue.lock);
 
 drop_n_restore:
 	if (skb_head != skb->data && skb_shared(skb)) {
@@ -2265,7 +2263,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 	 * Anyways, moving it for V1/V2 only as V3 doesn't need this
 	 * at packet level.
 	 */
-		if (po->stats.stats1.tp_drops)
+		if (atomic_read(&po->tp_drops))
 			status |= TP_STATUS_LOSING;
 	}
 
@@ -2381,9 +2379,9 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 	return 0;
 
 drop_n_account:
-	is_drop_n_account = true;
-	po->stats.stats1.tp_drops++;
 	spin_unlock(&sk->sk_receive_queue.lock);
+	atomic_inc(&po->tp_drops);
+	is_drop_n_account = true;
 
 	sk->sk_data_ready(sk);
 	kfree_skb(copy_skb);
@@ -3879,6 +3877,7 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
 	void *data = &val;
 	union tpacket_stats_u st;
 	struct tpacket_rollover_stats rstats;
+	int drops;
 
 	if (level != SOL_PACKET)
 		return -ENOPROTOOPT;
@@ -3895,14 +3894,17 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
 		memcpy(&st, &po->stats, sizeof(st));
 		memset(&po->stats, 0, sizeof(po->stats));
 		spin_unlock_bh(&sk->sk_receive_queue.lock);
+		drops = atomic_xchg(&po->tp_drops, 0);
 
 		if (po->tp_version == TPACKET_V3) {
 			lv = sizeof(struct tpacket_stats_v3);
-			st.stats3.tp_packets += st.stats3.tp_drops;
+			st.stats3.tp_drops = drops;
+			st.stats3.tp_packets += drops;
 			data = &st.stats3;
 		} else {
 			lv = sizeof(struct tpacket_stats);
-			st.stats1.tp_packets += st.stats1.tp_drops;
+			st.stats1.tp_drops = drops;
+			st.stats1.tp_packets += drops;
 			data = &st.stats1;
 		}
 
diff --git a/net/packet/internal.h b/net/packet/internal.h
index 3bb7c5fb3bff2fd5d91c3d973d006d0cdde29a0b..b5bcff2b7a43b6c9cece329c8fe8b9b546b06cc5 100644
--- a/net/packet/internal.h
+++ b/net/packet/internal.h
@@ -131,6 +131,7 @@ struct packet_sock {
 	struct net_device __rcu	*cached_dev;
 	int			(*xmit)(struct sk_buff *skb);
 	struct packet_type	prot_hook ____cacheline_aligned_in_smp;
+	atomic_t		tp_drops ____cacheline_aligned_in_smp;
 };
 
 static struct packet_sock *pkt_sk(struct sock *sk)
-- 
2.22.0.rc2.383.gf4fbbf30c2-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 6/8] net/packet: implement shortcut in tpacket_rcv()
  2019-06-12 16:52 [PATCH net-next 0/8] net/packet: better behavior under DDOS Eric Dumazet
                   ` (4 preceding siblings ...)
  2019-06-12 16:52 ` [PATCH net-next 5/8] net/packet: make tp_drops atomic Eric Dumazet
@ 2019-06-12 16:52 ` Eric Dumazet
  2019-06-12 16:52 ` [PATCH net-next 7/8] net/packet: remove locking from packet_rcv_has_room() Eric Dumazet
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2019-06-12 16:52 UTC (permalink / raw)
  To: David S . Miller, Willem de Bruijn, Mahesh Bandewar
  Cc: netdev, Eric Dumazet, Eric Dumazet

tpacket_rcv() can be hit under DDOS quite hard, since
it will always grab a socket spinlock, to eventually find
there is no room for an additional packet.

Using tcpdump [1] on a busy host can lead to catastrophic consequences,
because of all cpus spinning on a contended spinlock.

This replicates a similar strategy used in packet_rcv()

[1] Also some applications mistakenly use af_packet socket
bound to ETH_P_ALL only to send packets.
Receive queue is never drained and immediately full.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/packet/af_packet.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 2d499679811af53886ce0c8a1cdd74cd73107eac..860ca3e6abf5198214612e9acc095530b61dac40 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2193,6 +2193,12 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 	if (!res)
 		goto drop_n_restore;
 
+	/* If we are flooded, just give up */
+	if (__packet_rcv_has_room(po, skb) == ROOM_NONE) {
+		atomic_inc(&po->tp_drops);
+		goto drop_n_restore;
+	}
+
 	if (skb->ip_summed == CHECKSUM_PARTIAL)
 		status |= TP_STATUS_CSUMNOTREADY;
 	else if (skb->pkt_type != PACKET_OUTGOING &&
-- 
2.22.0.rc2.383.gf4fbbf30c2-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 7/8] net/packet: remove locking from packet_rcv_has_room()
  2019-06-12 16:52 [PATCH net-next 0/8] net/packet: better behavior under DDOS Eric Dumazet
                   ` (5 preceding siblings ...)
  2019-06-12 16:52 ` [PATCH net-next 6/8] net/packet: implement shortcut in tpacket_rcv() Eric Dumazet
@ 2019-06-12 16:52 ` Eric Dumazet
  2019-06-12 16:52 ` [PATCH net-next 8/8] net/packet: introduce packet_rcv_try_clear_pressure() helper Eric Dumazet
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2019-06-12 16:52 UTC (permalink / raw)
  To: David S . Miller, Willem de Bruijn, Mahesh Bandewar
  Cc: netdev, Eric Dumazet, Eric Dumazet

__packet_rcv_has_room() can now be run without lock being held.

po->pressure is only a non persistent hint, we can mark
all read/write accesses with READ_ONCE()/WRITE_ONCE()
to document the fact that the field could be written
without any synchronization.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/packet/af_packet.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 860ca3e6abf5198214612e9acc095530b61dac40..d409e2fdaa7ee8ddf261354f91b682e403f40e9e 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1260,15 +1260,13 @@ static int __packet_rcv_has_room(const struct packet_sock *po,
 
 static int packet_rcv_has_room(struct packet_sock *po, struct sk_buff *skb)
 {
-	int ret;
-	bool has_room;
+	int pressure, ret;
 
-	spin_lock_bh(&po->sk.sk_receive_queue.lock);
 	ret = __packet_rcv_has_room(po, skb);
-	has_room = ret == ROOM_NORMAL;
-	if (po->pressure == has_room)
-		po->pressure = !has_room;
-	spin_unlock_bh(&po->sk.sk_receive_queue.lock);
+	pressure = ret != ROOM_NORMAL;
+
+	if (READ_ONCE(po->pressure) != pressure)
+		WRITE_ONCE(po->pressure, pressure);
 
 	return ret;
 }
@@ -1353,7 +1351,7 @@ static unsigned int fanout_demux_rollover(struct packet_fanout *f,
 	i = j = min_t(int, po->rollover->sock, num - 1);
 	do {
 		po_next = pkt_sk(f->arr[i]);
-		if (po_next != po_skip && !po_next->pressure &&
+		if (po_next != po_skip && !READ_ONCE(po_next->pressure) &&
 		    packet_rcv_has_room(po_next, skb) == ROOM_NORMAL) {
 			if (i != j)
 				po->rollover->sock = i;
@@ -3310,7 +3308,7 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 	if (skb == NULL)
 		goto out;
 
-	if (pkt_sk(sk)->pressure)
+	if (READ_ONCE(pkt_sk(sk)->pressure))
 		packet_rcv_has_room(pkt_sk(sk), NULL);
 
 	if (pkt_sk(sk)->has_vnet_hdr) {
@@ -4129,8 +4127,8 @@ static __poll_t packet_poll(struct file *file, struct socket *sock,
 			TP_STATUS_KERNEL))
 			mask |= EPOLLIN | EPOLLRDNORM;
 	}
-	if (po->pressure && __packet_rcv_has_room(po, NULL) == ROOM_NORMAL)
-		po->pressure = 0;
+	if (READ_ONCE(po->pressure) && __packet_rcv_has_room(po, NULL) == ROOM_NORMAL)
+		WRITE_ONCE(po->pressure, 0);
 	spin_unlock_bh(&sk->sk_receive_queue.lock);
 	spin_lock_bh(&sk->sk_write_queue.lock);
 	if (po->tx_ring.pg_vec) {
-- 
2.22.0.rc2.383.gf4fbbf30c2-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 8/8] net/packet: introduce packet_rcv_try_clear_pressure() helper
  2019-06-12 16:52 [PATCH net-next 0/8] net/packet: better behavior under DDOS Eric Dumazet
                   ` (6 preceding siblings ...)
  2019-06-12 16:52 ` [PATCH net-next 7/8] net/packet: remove locking from packet_rcv_has_room() Eric Dumazet
@ 2019-06-12 16:52 ` Eric Dumazet
  2019-06-13  0:11   ` Vinicius Costa Gomes
  2019-06-12 17:15 ` [PATCH net-next 0/8] net/packet: better behavior under DDOS Willem de Bruijn
  2019-06-15  1:53 ` David Miller
  9 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2019-06-12 16:52 UTC (permalink / raw)
  To: David S . Miller, Willem de Bruijn, Mahesh Bandewar
  Cc: netdev, Eric Dumazet, Eric Dumazet

There are two places where we want to clear the pressure
if possible, add a helper to make it more obvious.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Willem de Bruijn <willemb@google.com>
---
 net/packet/af_packet.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index d409e2fdaa7ee8ddf261354f91b682e403f40e9e..8c27e198268ab5148daa8e90aa2f53546623b9ed 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1271,6 +1271,13 @@ static int packet_rcv_has_room(struct packet_sock *po, struct sk_buff *skb)
 	return ret;
 }
 
+static void packet_rcv_try_clear_pressure(struct packet_sock *po)
+{
+	if (READ_ONCE(po->pressure) &&
+	    __packet_rcv_has_room(po, NULL) == ROOM_NORMAL)
+		WRITE_ONCE(po->pressure,  0);
+}
+
 static void packet_sock_destruct(struct sock *sk)
 {
 	skb_queue_purge(&sk->sk_error_queue);
@@ -3308,8 +3315,7 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 	if (skb == NULL)
 		goto out;
 
-	if (READ_ONCE(pkt_sk(sk)->pressure))
-		packet_rcv_has_room(pkt_sk(sk), NULL);
+	packet_rcv_try_clear_pressure(pkt_sk(sk));
 
 	if (pkt_sk(sk)->has_vnet_hdr) {
 		err = packet_rcv_vnet(msg, skb, &len);
@@ -4127,8 +4133,7 @@ static __poll_t packet_poll(struct file *file, struct socket *sock,
 			TP_STATUS_KERNEL))
 			mask |= EPOLLIN | EPOLLRDNORM;
 	}
-	if (READ_ONCE(po->pressure) && __packet_rcv_has_room(po, NULL) == ROOM_NORMAL)
-		WRITE_ONCE(po->pressure, 0);
+	packet_rcv_try_clear_pressure(po);
 	spin_unlock_bh(&sk->sk_receive_queue.lock);
 	spin_lock_bh(&sk->sk_write_queue.lock);
 	if (po->tx_ring.pg_vec) {
-- 
2.22.0.rc2.383.gf4fbbf30c2-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 0/8] net/packet: better behavior under DDOS
  2019-06-12 16:52 [PATCH net-next 0/8] net/packet: better behavior under DDOS Eric Dumazet
                   ` (7 preceding siblings ...)
  2019-06-12 16:52 ` [PATCH net-next 8/8] net/packet: introduce packet_rcv_try_clear_pressure() helper Eric Dumazet
@ 2019-06-12 17:15 ` Willem de Bruijn
  2019-06-15  1:53 ` David Miller
  9 siblings, 0 replies; 12+ messages in thread
From: Willem de Bruijn @ 2019-06-12 17:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S . Miller, Mahesh Bandewar, netdev, Eric Dumazet

On Wed, Jun 12, 2019 at 12:52 PM Eric Dumazet <edumazet@google.com> wrote:
>
> Using tcpdump (or other af_packet user) on a busy host can lead to
> catastrophic consequences, because suddenly, potentially all cpus
> are spinning on a contended spinlock.
>
> Both packet_rcv() and tpacket_rcv() grab the spinlock
> to eventually find there is no room for an additional packet.
>
> This patch series align packet_rcv() and tpacket_rcv() to both
> check if the queue is full before grabbing the spinlock.
>
> If the queue is full, they both increment a new atomic counter
> placed on a separate cache line to let readers drain the queue faster.
>
> There is still false sharing on this new atomic counter,
> we might in the future make it per cpu if there is interest.
>
> Eric Dumazet (8):
>   net/packet: constify __packet_get_status() argument
>   net/packet: constify packet_lookup_frame() and __tpacket_has_room()
>   net/packet: constify prb_lookup_block() and __tpacket_v3_has_room()
>   net/packet: constify __packet_rcv_has_room()
>   net/packet: make tp_drops atomic
>   net/packet: implement shortcut in tpacket_rcv()
>   net/packet: remove locking from packet_rcv_has_room()
>   net/packet: introduce packet_rcv_try_clear_pressure() helper
>
>  net/packet/af_packet.c | 96 ++++++++++++++++++++++++-----------

For the series:

Acked-by: Willem de Bruijn <willemb@google.com>

Thanks Eric

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 8/8] net/packet: introduce packet_rcv_try_clear_pressure() helper
  2019-06-12 16:52 ` [PATCH net-next 8/8] net/packet: introduce packet_rcv_try_clear_pressure() helper Eric Dumazet
@ 2019-06-13  0:11   ` Vinicius Costa Gomes
  0 siblings, 0 replies; 12+ messages in thread
From: Vinicius Costa Gomes @ 2019-06-13  0:11 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Willem de Bruijn, Mahesh Bandewar
  Cc: netdev, Eric Dumazet, Eric Dumazet

Hi,

Eric Dumazet <edumazet@google.com> writes:

> There are two places where we want to clear the pressure
> if possible, add a helper to make it more obvious.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Suggested-by: Willem de Bruijn <willemb@google.com>
> ---
>  net/packet/af_packet.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index d409e2fdaa7ee8ddf261354f91b682e403f40e9e..8c27e198268ab5148daa8e90aa2f53546623b9ed 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -1271,6 +1271,13 @@ static int packet_rcv_has_room(struct packet_sock *po, struct sk_buff *skb)
>  	return ret;
>  }
>  
> +static void packet_rcv_try_clear_pressure(struct packet_sock *po)
> +{
> +	if (READ_ONCE(po->pressure) &&
> +	    __packet_rcv_has_room(po, NULL) == ROOM_NORMAL)
> +		WRITE_ONCE(po->pressure,  0);

Just a couple of (microscopical?) nitpicks, double space here and on the
commit message of patch 1/8.

Series look good.

Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>


Cheers,
--
Vinicius

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 0/8] net/packet: better behavior under DDOS
  2019-06-12 16:52 [PATCH net-next 0/8] net/packet: better behavior under DDOS Eric Dumazet
                   ` (8 preceding siblings ...)
  2019-06-12 17:15 ` [PATCH net-next 0/8] net/packet: better behavior under DDOS Willem de Bruijn
@ 2019-06-15  1:53 ` David Miller
  9 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2019-06-15  1:53 UTC (permalink / raw)
  To: edumazet; +Cc: willemb, maheshb, netdev, eric.dumazet

From: Eric Dumazet <edumazet@google.com>
Date: Wed, 12 Jun 2019 09:52:25 -0700

> Using tcpdump (or other af_packet user) on a busy host can lead to
> catastrophic consequences, because suddenly, potentially all cpus
> are spinning on a contended spinlock.
> 
> Both packet_rcv() and tpacket_rcv() grab the spinlock
> to eventually find there is no room for an additional packet.
> 
> This patch series align packet_rcv() and tpacket_rcv() to both
> check if the queue is full before grabbing the spinlock.
> 
> If the queue is full, they both increment a new atomic counter
> placed on a separate cache line to let readers drain the queue faster.
> 
> There is still false sharing on this new atomic counter,
> we might in the future make it per cpu if there is interest.

Series applied.


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-06-15  1:54 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-12 16:52 [PATCH net-next 0/8] net/packet: better behavior under DDOS Eric Dumazet
2019-06-12 16:52 ` [PATCH net-next 1/8] net/packet: constify __packet_get_status() argument Eric Dumazet
2019-06-12 16:52 ` [PATCH net-next 2/8] net/packet: constify packet_lookup_frame() and __tpacket_has_room() Eric Dumazet
2019-06-12 16:52 ` [PATCH net-next 3/8] net/packet: constify prb_lookup_block() and __tpacket_v3_has_room() Eric Dumazet
2019-06-12 16:52 ` [PATCH net-next 4/8] net/packet: constify __packet_rcv_has_room() Eric Dumazet
2019-06-12 16:52 ` [PATCH net-next 5/8] net/packet: make tp_drops atomic Eric Dumazet
2019-06-12 16:52 ` [PATCH net-next 6/8] net/packet: implement shortcut in tpacket_rcv() Eric Dumazet
2019-06-12 16:52 ` [PATCH net-next 7/8] net/packet: remove locking from packet_rcv_has_room() Eric Dumazet
2019-06-12 16:52 ` [PATCH net-next 8/8] net/packet: introduce packet_rcv_try_clear_pressure() helper Eric Dumazet
2019-06-13  0:11   ` Vinicius Costa Gomes
2019-06-12 17:15 ` [PATCH net-next 0/8] net/packet: better behavior under DDOS Willem de Bruijn
2019-06-15  1:53 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).