All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 net-next 0/8] net: rps: misc changes
@ 2024-03-29 15:42 Eric Dumazet
  2024-03-29 15:42 ` [PATCH v2 net-next 1/8] net: move kick_defer_list_purge() to net/core/dev.h Eric Dumazet
                   ` (9 more replies)
  0 siblings, 10 replies; 16+ messages in thread
From: Eric Dumazet @ 2024-03-29 15:42 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

Make RPS/RFS a bit more efficient with better cache locality
and heuristics.

Also shrink include/linux/netdevice.h a bit.

v2: fixed a build issue in patch 6/8 with CONFIG_RPS=n
    (Jakub and kernel build bots)

Eric Dumazet (8):
  net: move kick_defer_list_purge() to net/core/dev.h
  net: move dev_xmit_recursion() helpers to net/core/dev.h
  net: enqueue_to_backlog() change vs not running device
  net: make softnet_data.dropped an atomic_t
  net: enqueue_to_backlog() cleanup
  net: rps: change input_queue_tail_incr_save()
  net: rps: add rps_input_queue_head_add() helper
  net: rps: move received_rps field to a better location

 include/linux/netdevice.h | 38 ++------------------
 include/net/rps.h         | 28 +++++++++++++++
 net/core/dev.c            | 73 ++++++++++++++++++++++-----------------
 net/core/dev.h            | 23 ++++++++++--
 net/core/net-procfs.c     |  3 +-
 5 files changed, 95 insertions(+), 70 deletions(-)

-- 
2.44.0.478.gd926399ef9-goog


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 net-next 1/8] net: move kick_defer_list_purge() to net/core/dev.h
  2024-03-29 15:42 [PATCH v2 net-next 0/8] net: rps: misc changes Eric Dumazet
@ 2024-03-29 15:42 ` Eric Dumazet
  2024-03-29 15:42 ` [PATCH v2 net-next 2/8] net: move dev_xmit_recursion() helpers " Eric Dumazet
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2024-03-29 15:42 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

kick_defer_list_purge() is defined in net/core/dev.c
and used from net/core/skbuff.c.

Because we need softnet_data, include <linux/netdevice.h>
from net/core/dev.h

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h | 1 -
 net/core/dev.h            | 6 +++---
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e41d30ebaca61e48a2ceb43edf777fa8b9859ef2..cb37817d6382c29117afd8ce54db6dba94f8c930 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3287,7 +3287,6 @@ static inline void dev_xmit_recursion_dec(void)
 	__this_cpu_dec(softnet_data.xmit.recursion);
 }
 
-void kick_defer_list_purge(struct softnet_data *sd, unsigned int cpu);
 void __netif_schedule(struct Qdisc *q);
 void netif_schedule_queue(struct netdev_queue *txq);
 
diff --git a/net/core/dev.h b/net/core/dev.h
index 2bcaf8eee50c179db2ca59880521b9be6ecd45c8..9d0f8b4587f81f4c12487f1783d8ba5cc49fc1d6 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -4,11 +4,9 @@
 
 #include <linux/types.h>
 #include <linux/rwsem.h>
+#include <linux/netdevice.h>
 
 struct net;
-struct net_device;
-struct netdev_bpf;
-struct netdev_phys_item_id;
 struct netlink_ext_ack;
 struct cpumask;
 
@@ -150,4 +148,6 @@ static inline void xdp_do_check_flushed(struct napi_struct *napi) { }
 #endif
 
 struct napi_struct *napi_by_id(unsigned int napi_id);
+void kick_defer_list_purge(struct softnet_data *sd, unsigned int cpu);
+
 #endif
-- 
2.44.0.478.gd926399ef9-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 net-next 2/8] net: move dev_xmit_recursion() helpers to net/core/dev.h
  2024-03-29 15:42 [PATCH v2 net-next 0/8] net: rps: misc changes Eric Dumazet
  2024-03-29 15:42 ` [PATCH v2 net-next 1/8] net: move kick_defer_list_purge() to net/core/dev.h Eric Dumazet
@ 2024-03-29 15:42 ` Eric Dumazet
  2024-03-29 15:42 ` [PATCH v2 net-next 3/8] net: enqueue_to_backlog() change vs not running device Eric Dumazet
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2024-03-29 15:42 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

Move dev_xmit_recursion() and friends to net/core/dev.h

They are only used from net/core/dev.c and net/core/filter.c.
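
For readers less familiar with the pattern, the helpers being moved implement a simple re-entrancy guard. A user-space sketch (thread-local storage standing in for the kernel's per-CPU counter; the names below are illustrative, not kernel API):

```c
#include <assert.h>
#include <stdbool.h>

#define XMIT_RECURSION_LIMIT	8

/* Thread-local stand-in for __this_cpu_*(softnet_data.xmit.recursion). */
static _Thread_local int xmit_recursion;

static bool xmit_recursion_exceeded(void)
{
	return xmit_recursion > XMIT_RECURSION_LIMIT;
}

/* A transmit path that re-enters itself (e.g. stacked tunnel devices)
 * bails out once the limit is exceeded instead of overflowing the stack. */
static int xmit_model(int depth_wanted, int *deepest)
{
	if (xmit_recursion_exceeded())
		return -1;		/* the kernel would drop the skb here */
	xmit_recursion++;
	(*deepest)++;
	if (depth_wanted > 1)
		xmit_model(depth_wanted - 1, deepest);
	xmit_recursion--;
	return 0;
}
```

Moving the real helpers out of include/linux/netdevice.h is purely a visibility change; only net/core/dev.c and net/core/filter.c need them.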

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h | 17 -----------------
 net/core/dev.h            | 17 +++++++++++++++++
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cb37817d6382c29117afd8ce54db6dba94f8c930..70775021cc269e0983f538619115237b0067d408 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3270,23 +3270,6 @@ static inline int dev_recursion_level(void)
 	return this_cpu_read(softnet_data.xmit.recursion);
 }
 
-#define XMIT_RECURSION_LIMIT	8
-static inline bool dev_xmit_recursion(void)
-{
-	return unlikely(__this_cpu_read(softnet_data.xmit.recursion) >
-			XMIT_RECURSION_LIMIT);
-}
-
-static inline void dev_xmit_recursion_inc(void)
-{
-	__this_cpu_inc(softnet_data.xmit.recursion);
-}
-
-static inline void dev_xmit_recursion_dec(void)
-{
-	__this_cpu_dec(softnet_data.xmit.recursion);
-}
-
 void __netif_schedule(struct Qdisc *q);
 void netif_schedule_queue(struct netdev_queue *txq);
 
diff --git a/net/core/dev.h b/net/core/dev.h
index 9d0f8b4587f81f4c12487f1783d8ba5cc49fc1d6..8572d2c8dc4adce75c98868c888363e6a32e0f52 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -150,4 +150,21 @@ static inline void xdp_do_check_flushed(struct napi_struct *napi) { }
 struct napi_struct *napi_by_id(unsigned int napi_id);
 void kick_defer_list_purge(struct softnet_data *sd, unsigned int cpu);
 
+#define XMIT_RECURSION_LIMIT	8
+static inline bool dev_xmit_recursion(void)
+{
+	return unlikely(__this_cpu_read(softnet_data.xmit.recursion) >
+			XMIT_RECURSION_LIMIT);
+}
+
+static inline void dev_xmit_recursion_inc(void)
+{
+	__this_cpu_inc(softnet_data.xmit.recursion);
+}
+
+static inline void dev_xmit_recursion_dec(void)
+{
+	__this_cpu_dec(softnet_data.xmit.recursion);
+}
+
 #endif
-- 
2.44.0.478.gd926399ef9-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 net-next 3/8] net: enqueue_to_backlog() change vs not running device
  2024-03-29 15:42 [PATCH v2 net-next 0/8] net: rps: misc changes Eric Dumazet
  2024-03-29 15:42 ` [PATCH v2 net-next 1/8] net: move kick_defer_list_purge() to net/core/dev.h Eric Dumazet
  2024-03-29 15:42 ` [PATCH v2 net-next 2/8] net: move dev_xmit_recursion() helpers " Eric Dumazet
@ 2024-03-29 15:42 ` Eric Dumazet
  2024-03-29 15:42 ` [PATCH v2 net-next 4/8] net: make softnet_data.dropped an atomic_t Eric Dumazet
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2024-03-29 15:42 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

If the device attached to the packet given to enqueue_to_backlog()
is not running, we drop the packet.

But we accidentally increase sd->dropped, giving false signals
to admins: sd->dropped should be reserved for cpu backlog pressure,
not for transient glitches while a device is being dismantled.

While we are at it, perform the netif_running() test before
grabbing the rps lock, and use the SKB_DROP_REASON_DEV_READY
drop reason instead of NOT_SPECIFIED.
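
The control-flow change can be modeled in a few lines of plain C (a hypothetical user-space sketch, not the kernel code): the device check happens before any lock is taken, and only a genuine backlog overflow bumps the per-softnet drop counter:

```c
#include <assert.h>
#include <stdbool.h>

enum drop_reason { REASON_NONE, REASON_DEV_READY, REASON_CPU_BACKLOG };

struct backlog_model {
	int qlen, max_backlog;
	int dropped;			/* models sd->dropped only */
};

static enum drop_reason enqueue_model(struct backlog_model *sd,
				      bool dev_running)
{
	if (!dev_running)
		return REASON_DEV_READY; /* no lock taken, sd->dropped untouched */

	/* backlog_lock_irq_save() would be taken here */
	if (sd->qlen >= sd->max_backlog) {
		sd->dropped++;		/* genuine cpu backlog pressure */
		return REASON_CPU_BACKLOG;
	}
	sd->qlen++;
	return REASON_NONE;
}
```

(The kernel still increments the per-device rx_dropped statistic in both drop paths; only the softnet counter becomes backlog-specific.)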

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/dev.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index c136e80dea6182faac153f0cc4149bf8698b6676..4ad7836365e68f700b26dba2c50515a8c18329cf 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4801,12 +4801,13 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
 	unsigned long flags;
 	unsigned int qlen;
 
-	reason = SKB_DROP_REASON_NOT_SPECIFIED;
+	reason = SKB_DROP_REASON_DEV_READY;
+	if (!netif_running(skb->dev))
+		goto bad_dev;
+
 	sd = &per_cpu(softnet_data, cpu);
 
 	backlog_lock_irq_save(sd, &flags);
-	if (!netif_running(skb->dev))
-		goto drop;
 	qlen = skb_queue_len(&sd->input_pkt_queue);
 	if (qlen <= READ_ONCE(net_hotdata.max_backlog) &&
 	    !skb_flow_limit(skb, qlen)) {
@@ -4827,10 +4828,10 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
 	}
 	reason = SKB_DROP_REASON_CPU_BACKLOG;
 
-drop:
 	sd->dropped++;
 	backlog_unlock_irq_restore(sd, &flags);
 
+bad_dev:
 	dev_core_stats_rx_dropped_inc(skb->dev);
 	kfree_skb_reason(skb, reason);
 	return NET_RX_DROP;
-- 
2.44.0.478.gd926399ef9-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 net-next 4/8] net: make softnet_data.dropped an atomic_t
  2024-03-29 15:42 [PATCH v2 net-next 0/8] net: rps: misc changes Eric Dumazet
                   ` (2 preceding siblings ...)
  2024-03-29 15:42 ` [PATCH v2 net-next 3/8] net: enqueue_to_backlog() change vs not running device Eric Dumazet
@ 2024-03-29 15:42 ` Eric Dumazet
  2024-03-29 15:42 ` [PATCH v2 net-next 5/8] net: enqueue_to_backlog() cleanup Eric Dumazet
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2024-03-29 15:42 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

If enqueue_to_backlog() has to drop a packet under extreme cpu
backlog pressure, it can do so without dirtying a cache line
and potentially slowing down the target cpu.

Move sd->dropped into a separate cache line, and make it atomic.

When not under pressure, this field is not touched; no need to consume
valuable space in a hot cache line.
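
The layout idea can be sketched with C11 atomics and alignment (illustrative field names; the real struct is softnet_data in include/linux/netdevice.h):

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

#define CACHELINE 64

struct softnet_model {
	/* hot fields, mostly written by the owning cpu */
	unsigned int processed;
	unsigned int input_queue_head;

	/* written by remote cpus only under backlog pressure; giving it
	 * its own cache line keeps those rare writes from invalidating
	 * the line the owner is actively using */
	alignas(CACHELINE) atomic_uint dropped;
};
```

atomic_fetch_add() on this field plays the role of the kernel's atomic_inc(&sd->dropped), which after this patch no longer requires holding the backlog lock.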

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h |  3 ++-
 net/core/dev.c            | 13 +++++++++----
 net/core/net-procfs.c     |  3 ++-
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 70775021cc269e0983f538619115237b0067d408..1c31cd2691d32064613836141fbdeeebc831b21f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3236,10 +3236,11 @@ struct softnet_data {
 	unsigned int		input_queue_tail;
 #endif
 	unsigned int		received_rps;
-	unsigned int		dropped;
 	struct sk_buff_head	input_pkt_queue;
 	struct napi_struct	backlog;
 
+	atomic_t		dropped ____cacheline_aligned_in_smp;
+
 	/* Another possibly contended cache line */
 	spinlock_t		defer_lock ____cacheline_aligned_in_smp;
 	int			defer_count;
diff --git a/net/core/dev.c b/net/core/dev.c
index 4ad7836365e68f700b26dba2c50515a8c18329cf..02c98f115243202c409ee00c16e08fb0cf4d9ab9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4800,17 +4800,22 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
 	struct softnet_data *sd;
 	unsigned long flags;
 	unsigned int qlen;
+	int max_backlog;
 
 	reason = SKB_DROP_REASON_DEV_READY;
 	if (!netif_running(skb->dev))
 		goto bad_dev;
 
+	reason = SKB_DROP_REASON_CPU_BACKLOG;
 	sd = &per_cpu(softnet_data, cpu);
 
+	qlen = skb_queue_len_lockless(&sd->input_pkt_queue);
+	max_backlog = READ_ONCE(net_hotdata.max_backlog);
+	if (unlikely(qlen > max_backlog))
+		goto cpu_backlog_drop;
 	backlog_lock_irq_save(sd, &flags);
 	qlen = skb_queue_len(&sd->input_pkt_queue);
-	if (qlen <= READ_ONCE(net_hotdata.max_backlog) &&
-	    !skb_flow_limit(skb, qlen)) {
+	if (qlen <= max_backlog && !skb_flow_limit(skb, qlen)) {
 		if (qlen) {
 enqueue:
 			__skb_queue_tail(&sd->input_pkt_queue, skb);
@@ -4826,11 +4831,11 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
 			napi_schedule_rps(sd);
 		goto enqueue;
 	}
-	reason = SKB_DROP_REASON_CPU_BACKLOG;
 
-	sd->dropped++;
 	backlog_unlock_irq_restore(sd, &flags);
 
+cpu_backlog_drop:
+	atomic_inc(&sd->dropped);
 bad_dev:
 	dev_core_stats_rx_dropped_inc(skb->dev);
 	kfree_skb_reason(skb, reason);
diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
index a97eceb84e61ec347ad132ff0f22c8ce82f12e90..fa6d3969734a6ec154c3444d1b25ee93edfc5588 100644
--- a/net/core/net-procfs.c
+++ b/net/core/net-procfs.c
@@ -144,7 +144,8 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
 	seq_printf(seq,
 		   "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x "
 		   "%08x %08x\n",
-		   sd->processed, sd->dropped, sd->time_squeeze, 0,
+		   sd->processed, atomic_read(&sd->dropped),
+		   sd->time_squeeze, 0,
 		   0, 0, 0, 0, /* was fastroute */
 		   0,	/* was cpu_collision */
 		   sd->received_rps, flow_limit_count,
-- 
2.44.0.478.gd926399ef9-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 net-next 5/8] net: enqueue_to_backlog() cleanup
  2024-03-29 15:42 [PATCH v2 net-next 0/8] net: rps: misc changes Eric Dumazet
                   ` (3 preceding siblings ...)
  2024-03-29 15:42 ` [PATCH v2 net-next 4/8] net: make softnet_data.dropped an atomic_t Eric Dumazet
@ 2024-03-29 15:42 ` Eric Dumazet
  2024-03-29 15:42 ` [PATCH v2 net-next 6/8] net: rps: change input_queue_tail_incr_save() Eric Dumazet
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2024-03-29 15:42 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

We can remove a goto and a label by reversing a condition.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/dev.c | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 02c98f115243202c409ee00c16e08fb0cf4d9ab9..0a8ccb0451c30a39f8f8b45d26b7e5548b8bfba4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4816,20 +4816,18 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
 	backlog_lock_irq_save(sd, &flags);
 	qlen = skb_queue_len(&sd->input_pkt_queue);
 	if (qlen <= max_backlog && !skb_flow_limit(skb, qlen)) {
-		if (qlen) {
-enqueue:
-			__skb_queue_tail(&sd->input_pkt_queue, skb);
-			input_queue_tail_incr_save(sd, qtail);
-			backlog_unlock_irq_restore(sd, &flags);
-			return NET_RX_SUCCESS;
+		if (!qlen) {
+			/* Schedule NAPI for backlog device. We can use
+			 * non atomic operation as we own the queue lock.
+			 */
+			if (!__test_and_set_bit(NAPI_STATE_SCHED,
+						&sd->backlog.state))
+				napi_schedule_rps(sd);
 		}
-
-		/* Schedule NAPI for backlog device
-		 * We can use non atomic operation since we own the queue lock
-		 */
-		if (!__test_and_set_bit(NAPI_STATE_SCHED, &sd->backlog.state))
-			napi_schedule_rps(sd);
-		goto enqueue;
+		__skb_queue_tail(&sd->input_pkt_queue, skb);
+		input_queue_tail_incr_save(sd, qtail);
+		backlog_unlock_irq_restore(sd, &flags);
+		return NET_RX_SUCCESS;
 	}
 
 	backlog_unlock_irq_restore(sd, &flags);
-- 
2.44.0.478.gd926399ef9-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 net-next 6/8] net: rps: change input_queue_tail_incr_save()
  2024-03-29 15:42 [PATCH v2 net-next 0/8] net: rps: misc changes Eric Dumazet
                   ` (4 preceding siblings ...)
  2024-03-29 15:42 ` [PATCH v2 net-next 5/8] net: enqueue_to_backlog() cleanup Eric Dumazet
@ 2024-03-29 15:42 ` Eric Dumazet
  2024-03-30 14:46   ` Jason Xing
  2024-03-29 15:42 ` [PATCH v2 net-next 7/8] net: rps: add rps_input_queue_head_add() helper Eric Dumazet
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2024-03-29 15:42 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

input_queue_tail_incr_save() increments sd->input_queue_tail
and saves the new value in the flow's last_qtail.

Two issues here:

- No lock protects the write to last_qtail; we should use appropriate
  annotations.

- We can perform this write after releasing the per-cpu backlog lock,
  to shorten the lock hold duration (moving the cache line miss out
  of the critical section).

Also move input_queue_head_incr() and the rps helpers to
include/net/rps.h, adding an rps_ prefix to better reflect their role.

v2: Fixed a build issue (Jakub and kernel build bots)
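
The second point can be sketched as follows (user-space model; WRITE_ONCE is approximated with a volatile store, and the lock is reduced to comments):

```c
#include <assert.h>

#define WRITE_ONCE_MODEL(x, val) (*(volatile unsigned int *)&(x) = (val))

struct flow_model { unsigned int last_qtail; };

static unsigned int queue_tail;		/* sd->input_queue_tail stand-in */

static unsigned int enqueue_tail_model(struct flow_model *flow)
{
	unsigned int tail;

	/* backlog_lock_irq_save() would be taken here */
	tail = ++queue_tail;		/* cheap: the owner's hot data */
	/* backlog_unlock_irq_restore() here: the potentially cache-cold
	 * store to the flow entry no longer extends the lock hold time */

	/* save the tail outside of the critical section */
	WRITE_ONCE_MODEL(flow->last_qtail, tail);
	return tail;
}
```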

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h | 15 ---------------
 include/net/rps.h         | 23 +++++++++++++++++++++++
 net/core/dev.c            | 20 ++++++++++++--------
 3 files changed, 35 insertions(+), 23 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1c31cd2691d32064613836141fbdeeebc831b21f..14f19cc2616452d7e6afbbaa52f8ad3e61a419e9 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3249,21 +3249,6 @@ struct softnet_data {
 	call_single_data_t	defer_csd;
 };
 
-static inline void input_queue_head_incr(struct softnet_data *sd)
-{
-#ifdef CONFIG_RPS
-	sd->input_queue_head++;
-#endif
-}
-
-static inline void input_queue_tail_incr_save(struct softnet_data *sd,
-					      unsigned int *qtail)
-{
-#ifdef CONFIG_RPS
-	*qtail = ++sd->input_queue_tail;
-#endif
-}
-
 DECLARE_PER_CPU_ALIGNED(struct softnet_data, softnet_data);
 
 static inline int dev_recursion_level(void)
diff --git a/include/net/rps.h b/include/net/rps.h
index 7660243e905b92651a41292e04caf72c5f12f26e..10ca25731c1ef766715fe7ee415ad0b71ec643a8 100644
--- a/include/net/rps.h
+++ b/include/net/rps.h
@@ -122,4 +122,27 @@ static inline void sock_rps_record_flow(const struct sock *sk)
 #endif
 }
 
+static inline u32 rps_input_queue_tail_incr(struct softnet_data *sd)
+{
+#ifdef CONFIG_RPS
+	return ++sd->input_queue_tail;
+#else
+	return 0;
+#endif
+}
+
+static inline void rps_input_queue_tail_save(u32 *dest, u32 tail)
+{
+#ifdef CONFIG_RPS
+	WRITE_ONCE(*dest, tail);
+#endif
+}
+
+static inline void rps_input_queue_head_incr(struct softnet_data *sd)
+{
+#ifdef CONFIG_RPS
+	sd->input_queue_head++;
+#endif
+}
+
 #endif /* _NET_RPS_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index 0a8ccb0451c30a39f8f8b45d26b7e5548b8bfba4..79073bbc9a644049cacf8433310f4641745049e9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4611,7 +4611,7 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 		if (unlikely(tcpu != next_cpu) &&
 		    (tcpu >= nr_cpu_ids || !cpu_online(tcpu) ||
 		     ((int)(per_cpu(softnet_data, tcpu).input_queue_head -
-		      rflow->last_qtail)) >= 0)) {
+		      READ_ONCE(rflow->last_qtail))) >= 0)) {
 			tcpu = next_cpu;
 			rflow = set_rps_cpu(dev, skb, rflow, next_cpu);
 		}
@@ -4666,7 +4666,7 @@ bool rps_may_expire_flow(struct net_device *dev, u16 rxq_index,
 		cpu = READ_ONCE(rflow->cpu);
 		if (rflow->filter == filter_id && cpu < nr_cpu_ids &&
 		    ((int)(per_cpu(softnet_data, cpu).input_queue_head -
-			   rflow->last_qtail) <
+			   READ_ONCE(rflow->last_qtail)) <
 		     (int)(10 * flow_table->mask)))
 			expire = false;
 	}
@@ -4801,6 +4801,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
 	unsigned long flags;
 	unsigned int qlen;
 	int max_backlog;
+	u32 tail;
 
 	reason = SKB_DROP_REASON_DEV_READY;
 	if (!netif_running(skb->dev))
@@ -4825,8 +4826,11 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
 				napi_schedule_rps(sd);
 		}
 		__skb_queue_tail(&sd->input_pkt_queue, skb);
-		input_queue_tail_incr_save(sd, qtail);
+		tail = rps_input_queue_tail_incr(sd);
 		backlog_unlock_irq_restore(sd, &flags);
+
+		/* save the tail outside of the critical section */
+		rps_input_queue_tail_save(qtail, tail);
 		return NET_RX_SUCCESS;
 	}
 
@@ -5904,7 +5908,7 @@ static void flush_backlog(struct work_struct *work)
 		if (skb->dev->reg_state == NETREG_UNREGISTERING) {
 			__skb_unlink(skb, &sd->input_pkt_queue);
 			dev_kfree_skb_irq(skb);
-			input_queue_head_incr(sd);
+			rps_input_queue_head_incr(sd);
 		}
 	}
 	backlog_unlock_irq_enable(sd);
@@ -5913,7 +5917,7 @@ static void flush_backlog(struct work_struct *work)
 		if (skb->dev->reg_state == NETREG_UNREGISTERING) {
 			__skb_unlink(skb, &sd->process_queue);
 			kfree_skb(skb);
-			input_queue_head_incr(sd);
+			rps_input_queue_head_incr(sd);
 		}
 	}
 	local_bh_enable();
@@ -6041,7 +6045,7 @@ static int process_backlog(struct napi_struct *napi, int quota)
 			rcu_read_lock();
 			__netif_receive_skb(skb);
 			rcu_read_unlock();
-			input_queue_head_incr(sd);
+			rps_input_queue_head_incr(sd);
 			if (++work >= quota)
 				return work;
 
@@ -11455,11 +11459,11 @@ static int dev_cpu_dead(unsigned int oldcpu)
 	/* Process offline CPU's input_pkt_queue */
 	while ((skb = __skb_dequeue(&oldsd->process_queue))) {
 		netif_rx(skb);
-		input_queue_head_incr(oldsd);
+		rps_input_queue_head_incr(oldsd);
 	}
 	while ((skb = skb_dequeue(&oldsd->input_pkt_queue))) {
 		netif_rx(skb);
-		input_queue_head_incr(oldsd);
+		rps_input_queue_head_incr(oldsd);
 	}
 
 	return 0;
-- 
2.44.0.478.gd926399ef9-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 net-next 7/8] net: rps: add rps_input_queue_head_add() helper
  2024-03-29 15:42 [PATCH v2 net-next 0/8] net: rps: misc changes Eric Dumazet
                   ` (5 preceding siblings ...)
  2024-03-29 15:42 ` [PATCH v2 net-next 6/8] net: rps: change input_queue_tail_incr_save() Eric Dumazet
@ 2024-03-29 15:42 ` Eric Dumazet
  2024-03-29 15:42 ` [PATCH v2 net-next 8/8] net: rps: move received_rps field to a better location Eric Dumazet
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2024-03-29 15:42 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

process_backlog() can batch increments of sd->input_queue_head,
saving some memory bandwidth.

Also add READ_ONCE()/WRITE_ONCE() annotations around
sd->input_queue_head accesses.
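
The batching can be sketched as follows (hypothetical model; head_stores exists only to count how often the counter is published):

```c
#include <assert.h>

static unsigned int input_queue_head;	/* sd->input_queue_head stand-in */
static unsigned int head_stores;	/* counts publishes, for illustration */

static void rps_head_add_model(int val)
{
	input_queue_head += val;	/* WRITE_ONCE() in the kernel */
	head_stores++;
}

static int process_backlog_model(int npackets, int quota)
{
	int work = 0;

	while (npackets-- > 0) {
		/* __netif_receive_skb(skb) would run here */
		if (++work >= quota)
			break;
	}
	if (work)
		rps_head_add_model(work); /* one store for the whole batch */
	return work;
}
```

One store per batch instead of one per packet is where the memory bandwidth saving comes from.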

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/rps.h |  9 +++++++--
 net/core/dev.c    | 13 ++++++++-----
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/include/net/rps.h b/include/net/rps.h
index 10ca25731c1ef766715fe7ee415ad0b71ec643a8..a93401d23d66e45210acc73f0326087813b69d59 100644
--- a/include/net/rps.h
+++ b/include/net/rps.h
@@ -138,11 +138,16 @@ static inline void rps_input_queue_tail_save(u32 *dest, u32 tail)
 #endif
 }
 
-static inline void rps_input_queue_head_incr(struct softnet_data *sd)
+static inline void rps_input_queue_head_add(struct softnet_data *sd, int val)
 {
 #ifdef CONFIG_RPS
-	sd->input_queue_head++;
+	WRITE_ONCE(sd->input_queue_head, sd->input_queue_head + val);
 #endif
 }
 
+static inline void rps_input_queue_head_incr(struct softnet_data *sd)
+{
+	rps_input_queue_head_add(sd, 1);
+}
+
 #endif /* _NET_RPS_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index 79073bbc9a644049cacf8433310f4641745049e9..818699dea9d7040ee74532ccdebf01c4fd6887cc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4528,7 +4528,7 @@ set_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 	out:
 #endif
 		rflow->last_qtail =
-			per_cpu(softnet_data, next_cpu).input_queue_head;
+			READ_ONCE(per_cpu(softnet_data, next_cpu).input_queue_head);
 	}
 
 	rflow->cpu = next_cpu;
@@ -4610,7 +4610,7 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 		 */
 		if (unlikely(tcpu != next_cpu) &&
 		    (tcpu >= nr_cpu_ids || !cpu_online(tcpu) ||
-		     ((int)(per_cpu(softnet_data, tcpu).input_queue_head -
+		     ((int)(READ_ONCE(per_cpu(softnet_data, tcpu).input_queue_head) -
 		      READ_ONCE(rflow->last_qtail))) >= 0)) {
 			tcpu = next_cpu;
 			rflow = set_rps_cpu(dev, skb, rflow, next_cpu);
@@ -4665,7 +4665,7 @@ bool rps_may_expire_flow(struct net_device *dev, u16 rxq_index,
 		rflow = &flow_table->flows[flow_id];
 		cpu = READ_ONCE(rflow->cpu);
 		if (rflow->filter == filter_id && cpu < nr_cpu_ids &&
-		    ((int)(per_cpu(softnet_data, cpu).input_queue_head -
+		    ((int)(READ_ONCE(per_cpu(softnet_data, cpu).input_queue_head) -
 			   READ_ONCE(rflow->last_qtail)) <
 		     (int)(10 * flow_table->mask)))
 			expire = false;
@@ -6045,9 +6045,10 @@ static int process_backlog(struct napi_struct *napi, int quota)
 			rcu_read_lock();
 			__netif_receive_skb(skb);
 			rcu_read_unlock();
-			rps_input_queue_head_incr(sd);
-			if (++work >= quota)
+			if (++work >= quota) {
+				rps_input_queue_head_add(sd, work);
 				return work;
+			}
 
 		}
 
@@ -6070,6 +6071,8 @@ static int process_backlog(struct napi_struct *napi, int quota)
 		backlog_unlock_irq_enable(sd);
 	}
 
+	if (work)
+		rps_input_queue_head_add(sd, work);
 	return work;
 }
 
-- 
2.44.0.478.gd926399ef9-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 net-next 8/8] net: rps: move received_rps field to a better location
  2024-03-29 15:42 [PATCH v2 net-next 0/8] net: rps: misc changes Eric Dumazet
                   ` (6 preceding siblings ...)
  2024-03-29 15:42 ` [PATCH v2 net-next 7/8] net: rps: add rps_input_queue_head_add() helper Eric Dumazet
@ 2024-03-29 15:42 ` Eric Dumazet
  2024-03-30 16:11 ` [PATCH v2 net-next 0/8] net: rps: misc changes Jason Xing
  2024-04-02  4:31 ` Jakub Kicinski
  9 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2024-03-29 15:42 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

Commit 14d898f3c1b3 ("dev: Move received_rps counter next
to RPS members in softnet data") was unfortunate:

received_rps is dirtied by a cpu and never read by other
cpus in the fast path.

Its presence in the hot RPS cache line (shared by many cpus)
hurts RPS/RFS performance.
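
A toy layout showing the effect of the move (illustrative only; the real layout is struct softnet_data):

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHELINE 64

struct sd_model {
	/* fields written only by the owning cpu */
	unsigned int processed;
	unsigned int received_rps;	/* after this patch: lives here */

	/* hot RPS area, read/written by many cpus */
	alignas(CACHELINE) unsigned int input_queue_tail;
};

/* which cache line a field lands on */
#define LINE_OF(field) (offsetof(struct sd_model, field) / CACHELINE)
```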

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 14f19cc2616452d7e6afbbaa52f8ad3e61a419e9..274d8db48b4858c70b43ea4628544e924ba6a263 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3203,6 +3203,7 @@ struct softnet_data {
 	struct softnet_data	*rps_ipi_list;
 #endif
 
+	unsigned int		received_rps;
 	bool			in_net_rx_action;
 	bool			in_napi_threaded_poll;
 
@@ -3235,7 +3236,6 @@ struct softnet_data {
 	unsigned int		cpu;
 	unsigned int		input_queue_tail;
 #endif
-	unsigned int		received_rps;
 	struct sk_buff_head	input_pkt_queue;
 	struct napi_struct	backlog;
 
-- 
2.44.0.478.gd926399ef9-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 net-next 6/8] net: rps: change input_queue_tail_incr_save()
  2024-03-29 15:42 ` [PATCH v2 net-next 6/8] net: rps: change input_queue_tail_incr_save() Eric Dumazet
@ 2024-03-30 14:46   ` Jason Xing
  2024-03-30 16:01     ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Xing @ 2024-03-30 14:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev, eric.dumazet

Hello Eric,

On Fri, Mar 29, 2024 at 11:43 PM Eric Dumazet <edumazet@google.com> wrote:
>
> input_queue_tail_incr_save() is incrementing the sd queue_tail
> and save it in the flow last_qtail.
>
> Two issues here :
>
> - no lock protects the write on last_qtail, we should use appropriate
>   annotations.
>
> - We can perform this write after releasing the per-cpu backlog lock,
>   to decrease this lock hold duration (move away the cache line miss)
>
> Also move input_queue_head_incr() and rps helpers to include/net/rps.h,
> while adding rps_ prefix to better reflect their role.
>
> v2: Fixed a build issue (Jakub and kernel build bots)
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  include/linux/netdevice.h | 15 ---------------
>  include/net/rps.h         | 23 +++++++++++++++++++++++
>  net/core/dev.c            | 20 ++++++++++++--------
>  3 files changed, 35 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 1c31cd2691d32064613836141fbdeeebc831b21f..14f19cc2616452d7e6afbbaa52f8ad3e61a419e9 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3249,21 +3249,6 @@ struct softnet_data {
>         call_single_data_t      defer_csd;
>  };
>
> -static inline void input_queue_head_incr(struct softnet_data *sd)
> -{
> -#ifdef CONFIG_RPS
> -       sd->input_queue_head++;
> -#endif
> -}
> -
> -static inline void input_queue_tail_incr_save(struct softnet_data *sd,
> -                                             unsigned int *qtail)
> -{
> -#ifdef CONFIG_RPS
> -       *qtail = ++sd->input_queue_tail;
> -#endif
> -}
> -
>  DECLARE_PER_CPU_ALIGNED(struct softnet_data, softnet_data);
>
>  static inline int dev_recursion_level(void)
> diff --git a/include/net/rps.h b/include/net/rps.h
> index 7660243e905b92651a41292e04caf72c5f12f26e..10ca25731c1ef766715fe7ee415ad0b71ec643a8 100644
> --- a/include/net/rps.h
> +++ b/include/net/rps.h
> @@ -122,4 +122,27 @@ static inline void sock_rps_record_flow(const struct sock *sk)
>  #endif
>  }
>
> +static inline u32 rps_input_queue_tail_incr(struct softnet_data *sd)
> +{
> +#ifdef CONFIG_RPS
> +       return ++sd->input_queue_tail;
> +#else
> +       return 0;
> +#endif
> +}
> +
> +static inline void rps_input_queue_tail_save(u32 *dest, u32 tail)
> +{
> +#ifdef CONFIG_RPS
> +       WRITE_ONCE(*dest, tail);
> +#endif
> +}

I wonder if we should also use this new helper to WRITE_ONCE
last_qtail in set_rps_cpu()?

Thanks,
Jason

> [...]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 net-next 6/8] net: rps: change input_queue_tail_incr_save()
  2024-03-30 14:46   ` Jason Xing
@ 2024-03-30 16:01     ` Eric Dumazet
  2024-03-30 16:07         ` Jason Xing
  2024-04-12 15:09       ` Jason Xing
  0 siblings, 2 replies; 16+ messages in thread
From: Eric Dumazet @ 2024-03-30 16:01 UTC (permalink / raw)
  To: Jason Xing
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev, eric.dumazet

On Sat, Mar 30, 2024 at 3:47 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> Hello Eric,
>
> On Fri, Mar 29, 2024 at 11:43 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > input_queue_tail_incr_save() increments the sd queue_tail
> > and saves it in the flow last_qtail.
> >
> > Two issues here :
> >
> > - no lock protects the write on last_qtail, we should use appropriate
> >   annotations.
> >
> > - We can perform this write after releasing the per-cpu backlog lock,
> >   to decrease this lock hold duration (move away the cache line miss)
> >
> > Also move input_queue_head_incr() and rps helpers to include/net/rps.h,
> > while adding rps_ prefix to better reflect their role.
> >
> > v2: Fixed a build issue (Jakub and kernel build bots)
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > ---
> >  include/linux/netdevice.h | 15 ---------------
> >  include/net/rps.h         | 23 +++++++++++++++++++++++
> >  net/core/dev.c            | 20 ++++++++++++--------
> >  3 files changed, 35 insertions(+), 23 deletions(-)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 1c31cd2691d32064613836141fbdeeebc831b21f..14f19cc2616452d7e6afbbaa52f8ad3e61a419e9 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -3249,21 +3249,6 @@ struct softnet_data {
> >         call_single_data_t      defer_csd;
> >  };
> >
> > -static inline void input_queue_head_incr(struct softnet_data *sd)
> > -{
> > -#ifdef CONFIG_RPS
> > -       sd->input_queue_head++;
> > -#endif
> > -}
> > -
> > -static inline void input_queue_tail_incr_save(struct softnet_data *sd,
> > -                                             unsigned int *qtail)
> > -{
> > -#ifdef CONFIG_RPS
> > -       *qtail = ++sd->input_queue_tail;
> > -#endif
> > -}
> > -
> >  DECLARE_PER_CPU_ALIGNED(struct softnet_data, softnet_data);
> >
> >  static inline int dev_recursion_level(void)
> > diff --git a/include/net/rps.h b/include/net/rps.h
> > index 7660243e905b92651a41292e04caf72c5f12f26e..10ca25731c1ef766715fe7ee415ad0b71ec643a8 100644
> > --- a/include/net/rps.h
> > +++ b/include/net/rps.h
> > @@ -122,4 +122,27 @@ static inline void sock_rps_record_flow(const struct sock *sk)
> >  #endif
> >  }
> >
> > +static inline u32 rps_input_queue_tail_incr(struct softnet_data *sd)
> > +{
> > +#ifdef CONFIG_RPS
> > +       return ++sd->input_queue_tail;
> > +#else
> > +       return 0;
> > +#endif
> > +}
> > +
> > +static inline void rps_input_queue_tail_save(u32 *dest, u32 tail)
> > +{
> > +#ifdef CONFIG_RPS
> > +       WRITE_ONCE(*dest, tail);
> > +#endif
> > +}
>
> I wonder if we should also use this new helper to WRITE_ONCE
> last_qtail in set_rps_cpu()?
>

Absolutely, I have another patch series to address remaining races
(rflow->cpu, rflow->filter ...)

I chose to make a small one, to ease reviews.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 net-next 6/8] net: rps: change input_queue_tail_incr_save()
@ 2024-03-30 16:07         ` Jason Xing
  0 siblings, 0 replies; 16+ messages in thread
From: Jason Xing @ 2024-03-30 16:07 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev, eric.dumazet

On Sun, Mar 31, 2024 at 12:01 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Sat, Mar 30, 2024 at 3:47 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > Hello Eric,
> >
> > On Fri, Mar 29, 2024 at 11:43 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > input_queue_tail_incr_save() increments the sd queue_tail
> > > and saves it in the flow last_qtail.
> > >
> > > Two issues here :
> > >
> > > - no lock protects the write on last_qtail, we should use appropriate
> > >   annotations.
> > >
> > > - We can perform this write after releasing the per-cpu backlog lock,
> > >   to decrease this lock hold duration (move away the cache line miss)
> > >
> > > Also move input_queue_head_incr() and rps helpers to include/net/rps.h,
> > > while adding rps_ prefix to better reflect their role.
> > >
> > > v2: Fixed a build issue (Jakub and kernel build bots)
> > >
> > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > ---
> > >  include/linux/netdevice.h | 15 ---------------
> > >  include/net/rps.h         | 23 +++++++++++++++++++++++
> > >  net/core/dev.c            | 20 ++++++++++++--------
> > >  3 files changed, 35 insertions(+), 23 deletions(-)
> > >
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index 1c31cd2691d32064613836141fbdeeebc831b21f..14f19cc2616452d7e6afbbaa52f8ad3e61a419e9 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -3249,21 +3249,6 @@ struct softnet_data {
> > >         call_single_data_t      defer_csd;
> > >  };
> > >
> > > -static inline void input_queue_head_incr(struct softnet_data *sd)
> > > -{
> > > -#ifdef CONFIG_RPS
> > > -       sd->input_queue_head++;
> > > -#endif
> > > -}
> > > -
> > > -static inline void input_queue_tail_incr_save(struct softnet_data *sd,
> > > -                                             unsigned int *qtail)
> > > -{
> > > -#ifdef CONFIG_RPS
> > > -       *qtail = ++sd->input_queue_tail;
> > > -#endif
> > > -}
> > > -
> > >  DECLARE_PER_CPU_ALIGNED(struct softnet_data, softnet_data);
> > >
> > >  static inline int dev_recursion_level(void)
> > > diff --git a/include/net/rps.h b/include/net/rps.h
> > > index 7660243e905b92651a41292e04caf72c5f12f26e..10ca25731c1ef766715fe7ee415ad0b71ec643a8 100644
> > > --- a/include/net/rps.h
> > > +++ b/include/net/rps.h
> > > @@ -122,4 +122,27 @@ static inline void sock_rps_record_flow(const struct sock *sk)
> > >  #endif
> > >  }
> > >
> > > +static inline u32 rps_input_queue_tail_incr(struct softnet_data *sd)
> > > +{
> > > +#ifdef CONFIG_RPS
> > > +       return ++sd->input_queue_tail;
> > > +#else
> > > +       return 0;
> > > +#endif
> > > +}
> > > +
> > > +static inline void rps_input_queue_tail_save(u32 *dest, u32 tail)
> > > +{
> > > +#ifdef CONFIG_RPS
> > > +       WRITE_ONCE(*dest, tail);
> > > +#endif
> > > +}
> >
> > I wonder if we should also use this new helper to WRITE_ONCE
> > last_qtail in set_rps_cpu()?
> >
>
> Absolutely, I have another patch series to address remaining races
> (rflow->cpu, rflow->filter ...)
>
> I chose to make a small one, to ease reviews.

Great. Now I have no more questions :)

Thanks,
Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 net-next 0/8] net: rps: misc changes
  2024-03-29 15:42 [PATCH v2 net-next 0/8] net: rps: misc changes Eric Dumazet
                   ` (7 preceding siblings ...)
  2024-03-29 15:42 ` [PATCH v2 net-next 8/8] net: rps: move received_rps field to a better location Eric Dumazet
@ 2024-03-30 16:11 ` Jason Xing
  2024-04-02  4:31 ` Jakub Kicinski
  9 siblings, 0 replies; 16+ messages in thread
From: Jason Xing @ 2024-03-30 16:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev, eric.dumazet

On Fri, Mar 29, 2024 at 11:42 PM Eric Dumazet <edumazet@google.com> wrote:
>
> Make RPS/RFS a bit more efficient with better cache locality
> and heuristics.
>
> Also shrink include/linux/netdevice.h a bit.
>
> v2: fixed a build issue in patch 6/8 with CONFIG_RPS=n
>     (Jakub and kernel build bots)
>
> Eric Dumazet (8):
>   net: move kick_defer_list_purge() to net/core/dev.h
>   net: move dev_xmit_recursion() helpers to net/core/dev.h
>   net: enqueue_to_backlog() change vs not running device
>   net: make softnet_data.dropped an atomic_t
>   net: enqueue_to_backlog() cleanup
>   net: rps: change input_queue_tail_incr_save()
>   net: rps: add rps_input_queue_head_add() helper
>   net: rps: move received_rps field to a better location
>
>  include/linux/netdevice.h | 38 ++------------------
>  include/net/rps.h         | 28 +++++++++++++++
>  net/core/dev.c            | 73 ++++++++++++++++++++++-----------------
>  net/core/dev.h            | 23 ++++++++++--
>  net/core/net-procfs.c     |  3 +-
>  5 files changed, 95 insertions(+), 70 deletions(-)

For this series, feel free to add:

Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>

Thanks!

>
> --
> 2.44.0.478.gd926399ef9-goog
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 net-next 0/8] net: rps: misc changes
  2024-03-29 15:42 [PATCH v2 net-next 0/8] net: rps: misc changes Eric Dumazet
                   ` (8 preceding siblings ...)
  2024-03-30 16:11 ` [PATCH v2 net-next 0/8] net: rps: misc changes Jason Xing
@ 2024-04-02  4:31 ` Jakub Kicinski
  9 siblings, 0 replies; 16+ messages in thread
From: Jakub Kicinski @ 2024-04-02  4:31 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S . Miller, Paolo Abeni, netdev, eric.dumazet

On Fri, 29 Mar 2024 15:42:17 +0000 Eric Dumazet wrote:
> Make RPS/RFS a bit more efficient with better cache locality
> and heuristics.
> 
> Also shrink include/linux/netdevice.h a bit.
> 
> v2: fixed a build issue in patch 6/8 with CONFIG_RPS=n
>     (Jakub and kernel build bots)

FTR already applied by DaveM last night.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 net-next 6/8] net: rps: change input_queue_tail_incr_save()
  2024-03-30 16:01     ` Eric Dumazet
  2024-03-30 16:07         ` Jason Xing
@ 2024-04-12 15:09       ` Jason Xing
  1 sibling, 0 replies; 16+ messages in thread
From: Jason Xing @ 2024-04-12 15:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev, eric.dumazet

On Sun, Mar 31, 2024 at 12:01 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Sat, Mar 30, 2024 at 3:47 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > Hello Eric,
> >
> > On Fri, Mar 29, 2024 at 11:43 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > input_queue_tail_incr_save() increments the sd queue_tail
> > > and saves it in the flow last_qtail.
> > >
> > > Two issues here :
> > >
> > > - no lock protects the write on last_qtail, we should use appropriate
> > >   annotations.
> > >
> > > - We can perform this write after releasing the per-cpu backlog lock,
> > >   to decrease this lock hold duration (move away the cache line miss)
> > >
> > > Also move input_queue_head_incr() and rps helpers to include/net/rps.h,
> > > while adding rps_ prefix to better reflect their role.
> > >
> > > v2: Fixed a build issue (Jakub and kernel build bots)
> > >
> > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > ---
> > >  include/linux/netdevice.h | 15 ---------------
> > >  include/net/rps.h         | 23 +++++++++++++++++++++++
> > >  net/core/dev.c            | 20 ++++++++++++--------
> > >  3 files changed, 35 insertions(+), 23 deletions(-)
> > >
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index 1c31cd2691d32064613836141fbdeeebc831b21f..14f19cc2616452d7e6afbbaa52f8ad3e61a419e9 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -3249,21 +3249,6 @@ struct softnet_data {
> > >         call_single_data_t      defer_csd;
> > >  };
> > >
> > > -static inline void input_queue_head_incr(struct softnet_data *sd)
> > > -{
> > > -#ifdef CONFIG_RPS
> > > -       sd->input_queue_head++;
> > > -#endif
> > > -}
> > > -
> > > -static inline void input_queue_tail_incr_save(struct softnet_data *sd,
> > > -                                             unsigned int *qtail)
> > > -{
> > > -#ifdef CONFIG_RPS
> > > -       *qtail = ++sd->input_queue_tail;
> > > -#endif
> > > -}
> > > -
> > >  DECLARE_PER_CPU_ALIGNED(struct softnet_data, softnet_data);
> > >
> > >  static inline int dev_recursion_level(void)
> > > diff --git a/include/net/rps.h b/include/net/rps.h
> > > index 7660243e905b92651a41292e04caf72c5f12f26e..10ca25731c1ef766715fe7ee415ad0b71ec643a8 100644
> > > --- a/include/net/rps.h
> > > +++ b/include/net/rps.h
> > > @@ -122,4 +122,27 @@ static inline void sock_rps_record_flow(const struct sock *sk)
> > >  #endif
> > >  }
> > >
> > > +static inline u32 rps_input_queue_tail_incr(struct softnet_data *sd)
> > > +{
> > > +#ifdef CONFIG_RPS
> > > +       return ++sd->input_queue_tail;
> > > +#else
> > > +       return 0;
> > > +#endif
> > > +}
> > > +
> > > +static inline void rps_input_queue_tail_save(u32 *dest, u32 tail)
> > > +{
> > > +#ifdef CONFIG_RPS
> > > +       WRITE_ONCE(*dest, tail);
> > > +#endif
> > > +}
> >
> > I wonder if we should also use this new helper to WRITE_ONCE
> > last_qtail in set_rps_cpu()?
> >
>
> Absolutely, I have another patch series to address remaining races
> (rflow->cpu, rflow->filter ...)
>
> I chose to make a small one, to ease reviews.

Hello Eric,

I wonder if you already have a patchset to change those three members
in struct rps_dev_flow? I looked through this part and found it's not
that complicated. So if not, I can do it :)

Thanks,
Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2024-04-12 15:10 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-29 15:42 [PATCH v2 net-next 0/8] net: rps: misc changes Eric Dumazet
2024-03-29 15:42 ` [PATCH v2 net-next 1/8] net: move kick_defer_list_purge() to net/core/dev.h Eric Dumazet
2024-03-29 15:42 ` [PATCH v2 net-next 2/8] net: move dev_xmit_recursion() helpers " Eric Dumazet
2024-03-29 15:42 ` [PATCH v2 net-next 3/8] net: enqueue_to_backlog() change vs not running device Eric Dumazet
2024-03-29 15:42 ` [PATCH v2 net-next 4/8] net: make softnet_data.dropped an atomic_t Eric Dumazet
2024-03-29 15:42 ` [PATCH v2 net-next 5/8] net: enqueue_to_backlog() cleanup Eric Dumazet
2024-03-29 15:42 ` [PATCH v2 net-next 6/8] net: rps: change input_queue_tail_incr_save() Eric Dumazet
2024-03-30 14:46   ` Jason Xing
2024-03-30 16:01     ` Eric Dumazet
2024-03-30 16:07       ` Jason Xing
2024-03-30 16:07         ` Jason Xing
2024-04-12 15:09       ` Jason Xing
2024-03-29 15:42 ` [PATCH v2 net-next 7/8] net: rps: add rps_input_queue_head_add() helper Eric Dumazet
2024-03-29 15:42 ` [PATCH v2 net-next 8/8] net: rps: move received_rps field to a better location Eric Dumazet
2024-03-30 16:11 ` [PATCH v2 net-next 0/8] net: rps: misc changes Jason Xing
2024-04-02  4:31 ` Jakub Kicinski
