* [PATCH v2 net-next 0/8] net: rps: misc changes
@ 2024-03-29 15:42 Eric Dumazet
2024-03-29 15:42 ` [PATCH v2 net-next 1/8] net: move kick_defer_list_purge() to net/core/dev.h Eric Dumazet
` (9 more replies)
0 siblings, 10 replies; 16+ messages in thread
From: Eric Dumazet @ 2024-03-29 15:42 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
Make RPS/RFS a bit more efficient with better cache locality
and heuristics.
Also shrink include/linux/netdevice.h a bit.
v2: fixed a build issue in patch 6/8 with CONFIG_RPS=n
(Jakub and kernel build bots)
Eric Dumazet (8):
net: move kick_defer_list_purge() to net/core/dev.h
net: move dev_xmit_recursion() helpers to net/core/dev.h
net: enqueue_to_backlog() change vs not running device
net: make softnet_data.dropped an atomic_t
net: enqueue_to_backlog() cleanup
net: rps: change input_queue_tail_incr_save()
net: rps: add rps_input_queue_head_add() helper
net: rps: move received_rps field to a better location
include/linux/netdevice.h | 38 ++------------------
include/net/rps.h | 28 +++++++++++++++
net/core/dev.c | 73 ++++++++++++++++++++++-----------------
net/core/dev.h | 23 ++++++++++--
net/core/net-procfs.c | 3 +-
5 files changed, 95 insertions(+), 70 deletions(-)
--
2.44.0.478.gd926399ef9-goog
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2 net-next 1/8] net: move kick_defer_list_purge() to net/core/dev.h
2024-03-29 15:42 [PATCH v2 net-next 0/8] net: rps: misc changes Eric Dumazet
@ 2024-03-29 15:42 ` Eric Dumazet
2024-03-29 15:42 ` [PATCH v2 net-next 2/8] net: move dev_xmit_recursion() helpers " Eric Dumazet
` (8 subsequent siblings)
9 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2024-03-29 15:42 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
kick_defer_list_purge() is defined in net/core/dev.c
and used from net/core/skbuff.c.
Because we need softnet_data, include <linux/netdevice.h>
from net/core/dev.h.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/netdevice.h | 1 -
net/core/dev.h | 6 +++---
2 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e41d30ebaca61e48a2ceb43edf777fa8b9859ef2..cb37817d6382c29117afd8ce54db6dba94f8c930 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3287,7 +3287,6 @@ static inline void dev_xmit_recursion_dec(void)
__this_cpu_dec(softnet_data.xmit.recursion);
}
-void kick_defer_list_purge(struct softnet_data *sd, unsigned int cpu);
void __netif_schedule(struct Qdisc *q);
void netif_schedule_queue(struct netdev_queue *txq);
diff --git a/net/core/dev.h b/net/core/dev.h
index 2bcaf8eee50c179db2ca59880521b9be6ecd45c8..9d0f8b4587f81f4c12487f1783d8ba5cc49fc1d6 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -4,11 +4,9 @@
#include <linux/types.h>
#include <linux/rwsem.h>
+#include <linux/netdevice.h>
struct net;
-struct net_device;
-struct netdev_bpf;
-struct netdev_phys_item_id;
struct netlink_ext_ack;
struct cpumask;
@@ -150,4 +148,6 @@ static inline void xdp_do_check_flushed(struct napi_struct *napi) { }
#endif
struct napi_struct *napi_by_id(unsigned int napi_id);
+void kick_defer_list_purge(struct softnet_data *sd, unsigned int cpu);
+
#endif
--
2.44.0.478.gd926399ef9-goog
* [PATCH v2 net-next 2/8] net: move dev_xmit_recursion() helpers to net/core/dev.h
2024-03-29 15:42 [PATCH v2 net-next 0/8] net: rps: misc changes Eric Dumazet
2024-03-29 15:42 ` [PATCH v2 net-next 1/8] net: move kick_defer_list_purge() to net/core/dev.h Eric Dumazet
@ 2024-03-29 15:42 ` Eric Dumazet
2024-03-29 15:42 ` [PATCH v2 net-next 3/8] net: enqueue_to_backlog() change vs not running device Eric Dumazet
` (7 subsequent siblings)
9 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2024-03-29 15:42 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
Move dev_xmit_recursion() and friends to net/core/dev.h
They are only used from net/core/dev.c and net/core/filter.c.
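The helpers being moved implement a simple per-cpu re-entrancy guard. A minimal userspace sketch of the same pattern is below, using a thread-local counter in place of the kernel's per-cpu softnet_data.xmit.recursion; the function names and the xmit_one() caller are illustrative, not kernel code.

```c
/* Sketch of the dev_xmit_recursion() pattern: a per-thread depth
 * counter bounds re-entrancy on the transmit path.  The kernel keeps
 * this counter in per-cpu data; names here are illustrative. */
#include <stdbool.h>

#define XMIT_RECURSION_LIMIT 8

static _Thread_local int xmit_recursion;

static bool xmit_recursion_exceeded(void)
{
	return xmit_recursion > XMIT_RECURSION_LIMIT;
}

static void xmit_recursion_inc(void) { xmit_recursion++; }
static void xmit_recursion_dec(void) { xmit_recursion--; }

/* A transmit path that may re-enter itself (e.g. via stacked/tunnel
 * devices) bails out once nesting exceeds the limit. */
static int xmit_one(int depth)
{
	if (xmit_recursion_exceeded())
		return -1;		/* the kernel would drop here */
	xmit_recursion_inc();
	int ret = depth > 0 ? xmit_one(depth - 1) : 0;
	xmit_recursion_dec();
	return ret;
}
```

Shallow nesting succeeds; pathological nesting fails fast instead of overflowing the stack, and the counter unwinds cleanly either way.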
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/netdevice.h | 17 -----------------
net/core/dev.h | 17 +++++++++++++++++
2 files changed, 17 insertions(+), 17 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cb37817d6382c29117afd8ce54db6dba94f8c930..70775021cc269e0983f538619115237b0067d408 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3270,23 +3270,6 @@ static inline int dev_recursion_level(void)
return this_cpu_read(softnet_data.xmit.recursion);
}
-#define XMIT_RECURSION_LIMIT 8
-static inline bool dev_xmit_recursion(void)
-{
- return unlikely(__this_cpu_read(softnet_data.xmit.recursion) >
- XMIT_RECURSION_LIMIT);
-}
-
-static inline void dev_xmit_recursion_inc(void)
-{
- __this_cpu_inc(softnet_data.xmit.recursion);
-}
-
-static inline void dev_xmit_recursion_dec(void)
-{
- __this_cpu_dec(softnet_data.xmit.recursion);
-}
-
void __netif_schedule(struct Qdisc *q);
void netif_schedule_queue(struct netdev_queue *txq);
diff --git a/net/core/dev.h b/net/core/dev.h
index 9d0f8b4587f81f4c12487f1783d8ba5cc49fc1d6..8572d2c8dc4adce75c98868c888363e6a32e0f52 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -150,4 +150,21 @@ static inline void xdp_do_check_flushed(struct napi_struct *napi) { }
struct napi_struct *napi_by_id(unsigned int napi_id);
void kick_defer_list_purge(struct softnet_data *sd, unsigned int cpu);
+#define XMIT_RECURSION_LIMIT 8
+static inline bool dev_xmit_recursion(void)
+{
+ return unlikely(__this_cpu_read(softnet_data.xmit.recursion) >
+ XMIT_RECURSION_LIMIT);
+}
+
+static inline void dev_xmit_recursion_inc(void)
+{
+ __this_cpu_inc(softnet_data.xmit.recursion);
+}
+
+static inline void dev_xmit_recursion_dec(void)
+{
+ __this_cpu_dec(softnet_data.xmit.recursion);
+}
+
#endif
--
2.44.0.478.gd926399ef9-goog
* [PATCH v2 net-next 3/8] net: enqueue_to_backlog() change vs not running device
2024-03-29 15:42 [PATCH v2 net-next 0/8] net: rps: misc changes Eric Dumazet
2024-03-29 15:42 ` [PATCH v2 net-next 1/8] net: move kick_defer_list_purge() to net/core/dev.h Eric Dumazet
2024-03-29 15:42 ` [PATCH v2 net-next 2/8] net: move dev_xmit_recursion() helpers " Eric Dumazet
@ 2024-03-29 15:42 ` Eric Dumazet
2024-03-29 15:42 ` [PATCH v2 net-next 4/8] net: make softnet_data.dropped an atomic_t Eric Dumazet
` (6 subsequent siblings)
9 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2024-03-29 15:42 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
If the device attached to the packet given to enqueue_to_backlog()
is not running, we drop the packet.
But we accidentally increase sd->dropped, giving false signals
to admins: sd->dropped should be reserved for cpu backlog pressure,
not for temporary glitches at device dismantle.
While we are at it, perform the netif_running() test before
taking the rps lock, and use the SKB_DROP_REASON_DEV_READY
drop reason instead of NOT_SPECIFIED.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/dev.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index c136e80dea6182faac153f0cc4149bf8698b6676..4ad7836365e68f700b26dba2c50515a8c18329cf 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4801,12 +4801,13 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
unsigned long flags;
unsigned int qlen;
- reason = SKB_DROP_REASON_NOT_SPECIFIED;
+ reason = SKB_DROP_REASON_DEV_READY;
+ if (!netif_running(skb->dev))
+ goto bad_dev;
+
sd = &per_cpu(softnet_data, cpu);
backlog_lock_irq_save(sd, &flags);
- if (!netif_running(skb->dev))
- goto drop;
qlen = skb_queue_len(&sd->input_pkt_queue);
if (qlen <= READ_ONCE(net_hotdata.max_backlog) &&
!skb_flow_limit(skb, qlen)) {
@@ -4827,10 +4828,10 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
}
reason = SKB_DROP_REASON_CPU_BACKLOG;
-drop:
sd->dropped++;
backlog_unlock_irq_restore(sd, &flags);
+bad_dev:
dev_core_stats_rx_dropped_inc(skb->dev);
kfree_skb_reason(skb, reason);
return NET_RX_DROP;
--
2.44.0.478.gd926399ef9-goog
* [PATCH v2 net-next 4/8] net: make softnet_data.dropped an atomic_t
2024-03-29 15:42 [PATCH v2 net-next 0/8] net: rps: misc changes Eric Dumazet
` (2 preceding siblings ...)
2024-03-29 15:42 ` [PATCH v2 net-next 3/8] net: enqueue_to_backlog() change vs not running device Eric Dumazet
@ 2024-03-29 15:42 ` Eric Dumazet
2024-03-29 15:42 ` [PATCH v2 net-next 5/8] net: enqueue_to_backlog() cleanup Eric Dumazet
` (5 subsequent siblings)
9 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2024-03-29 15:42 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
If, under extreme cpu backlog pressure, enqueue_to_backlog() has
to drop a packet, it can do so without dirtying a cache line
and potentially slowing down the target cpu.
Move sd->dropped into a separate cache line, and make it atomic.
When not under pressure, this field is not touched; no need to consume
valuable space in a hot cache line.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/netdevice.h | 3 ++-
net/core/dev.c | 13 +++++++++----
net/core/net-procfs.c | 3 ++-
3 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 70775021cc269e0983f538619115237b0067d408..1c31cd2691d32064613836141fbdeeebc831b21f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3236,10 +3236,11 @@ struct softnet_data {
unsigned int input_queue_tail;
#endif
unsigned int received_rps;
- unsigned int dropped;
struct sk_buff_head input_pkt_queue;
struct napi_struct backlog;
+ atomic_t dropped ____cacheline_aligned_in_smp;
+
/* Another possibly contended cache line */
spinlock_t defer_lock ____cacheline_aligned_in_smp;
int defer_count;
diff --git a/net/core/dev.c b/net/core/dev.c
index 4ad7836365e68f700b26dba2c50515a8c18329cf..02c98f115243202c409ee00c16e08fb0cf4d9ab9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4800,17 +4800,22 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
struct softnet_data *sd;
unsigned long flags;
unsigned int qlen;
+ int max_backlog;
reason = SKB_DROP_REASON_DEV_READY;
if (!netif_running(skb->dev))
goto bad_dev;
+ reason = SKB_DROP_REASON_CPU_BACKLOG;
sd = &per_cpu(softnet_data, cpu);
+ qlen = skb_queue_len_lockless(&sd->input_pkt_queue);
+ max_backlog = READ_ONCE(net_hotdata.max_backlog);
+ if (unlikely(qlen > max_backlog))
+ goto cpu_backlog_drop;
backlog_lock_irq_save(sd, &flags);
qlen = skb_queue_len(&sd->input_pkt_queue);
- if (qlen <= READ_ONCE(net_hotdata.max_backlog) &&
- !skb_flow_limit(skb, qlen)) {
+ if (qlen <= max_backlog && !skb_flow_limit(skb, qlen)) {
if (qlen) {
enqueue:
__skb_queue_tail(&sd->input_pkt_queue, skb);
@@ -4826,11 +4831,11 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
napi_schedule_rps(sd);
goto enqueue;
}
- reason = SKB_DROP_REASON_CPU_BACKLOG;
- sd->dropped++;
backlog_unlock_irq_restore(sd, &flags);
+cpu_backlog_drop:
+ atomic_inc(&sd->dropped);
bad_dev:
dev_core_stats_rx_dropped_inc(skb->dev);
kfree_skb_reason(skb, reason);
diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
index a97eceb84e61ec347ad132ff0f22c8ce82f12e90..fa6d3969734a6ec154c3444d1b25ee93edfc5588 100644
--- a/net/core/net-procfs.c
+++ b/net/core/net-procfs.c
@@ -144,7 +144,8 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
seq_printf(seq,
"%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x "
"%08x %08x\n",
- sd->processed, sd->dropped, sd->time_squeeze, 0,
+ sd->processed, atomic_read(&sd->dropped),
+ sd->time_squeeze, 0,
0, 0, 0, 0, /* was fastroute */
0, /* was cpu_collision */
sd->received_rps, flow_limit_count,
--
2.44.0.478.gd926399ef9-goog
* [PATCH v2 net-next 5/8] net: enqueue_to_backlog() cleanup
2024-03-29 15:42 [PATCH v2 net-next 0/8] net: rps: misc changes Eric Dumazet
` (3 preceding siblings ...)
2024-03-29 15:42 ` [PATCH v2 net-next 4/8] net: make softnet_data.dropped an atomic_t Eric Dumazet
@ 2024-03-29 15:42 ` Eric Dumazet
2024-03-29 15:42 ` [PATCH v2 net-next 6/8] net: rps: change input_queue_tail_incr_save() Eric Dumazet
` (4 subsequent siblings)
9 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2024-03-29 15:42 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
We can remove a goto and a label by reversing a condition.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/dev.c | 24 +++++++++++-------------
1 file changed, 11 insertions(+), 13 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 02c98f115243202c409ee00c16e08fb0cf4d9ab9..0a8ccb0451c30a39f8f8b45d26b7e5548b8bfba4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4816,20 +4816,18 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
backlog_lock_irq_save(sd, &flags);
qlen = skb_queue_len(&sd->input_pkt_queue);
if (qlen <= max_backlog && !skb_flow_limit(skb, qlen)) {
- if (qlen) {
-enqueue:
- __skb_queue_tail(&sd->input_pkt_queue, skb);
- input_queue_tail_incr_save(sd, qtail);
- backlog_unlock_irq_restore(sd, &flags);
- return NET_RX_SUCCESS;
+ if (!qlen) {
+ /* Schedule NAPI for backlog device. We can use
+ * non atomic operation as we own the queue lock.
+ */
+ if (!__test_and_set_bit(NAPI_STATE_SCHED,
+ &sd->backlog.state))
+ napi_schedule_rps(sd);
}
-
- /* Schedule NAPI for backlog device
- * We can use non atomic operation since we own the queue lock
- */
- if (!__test_and_set_bit(NAPI_STATE_SCHED, &sd->backlog.state))
- napi_schedule_rps(sd);
- goto enqueue;
+ __skb_queue_tail(&sd->input_pkt_queue, skb);
+ input_queue_tail_incr_save(sd, qtail);
+ backlog_unlock_irq_restore(sd, &flags);
+ return NET_RX_SUCCESS;
}
backlog_unlock_irq_restore(sd, &flags);
--
2.44.0.478.gd926399ef9-goog
* [PATCH v2 net-next 6/8] net: rps: change input_queue_tail_incr_save()
2024-03-29 15:42 [PATCH v2 net-next 0/8] net: rps: misc changes Eric Dumazet
` (4 preceding siblings ...)
2024-03-29 15:42 ` [PATCH v2 net-next 5/8] net: enqueue_to_backlog() cleanup Eric Dumazet
@ 2024-03-29 15:42 ` Eric Dumazet
2024-03-30 14:46 ` Jason Xing
2024-03-29 15:42 ` [PATCH v2 net-next 7/8] net: rps: add rps_input_queue_head_add() helper Eric Dumazet
` (3 subsequent siblings)
9 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2024-03-29 15:42 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
input_queue_tail_incr_save() increments the sd queue_tail
and saves it in the flow's last_qtail.
Two issues here:
- No lock protects the write on last_qtail; we should use appropriate
annotations.
- We can perform this write after releasing the per-cpu backlog lock,
to decrease the lock hold duration (moving the cache line miss out of
the critical section).
Also move input_queue_head_incr() and rps helpers to include/net/rps.h,
while adding rps_ prefix to better reflect their role.
v2: Fixed a build issue (Jakub and kernel build bots)
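The comparisons being annotated in the diff below rely on head/tail being free-running u32 counters, where "head has reached tail" must use signed subtraction so the test survives wrap-around. A small userspace sketch of that arithmetic (illustrative, not the kernel helpers):

```c
/* Sketch of wrap-safe sequence arithmetic: input_queue_head and
 * last_qtail are free-running u32 counters, so ordering is tested
 * with a signed difference, never a direct compare. */
#include <stdbool.h>
#include <stdint.h>

/* true if @head has advanced to (or past) @tail, even across
 * u32 wrap-around */
static bool head_reached(uint32_t head, uint32_t tail)
{
	return (int32_t)(head - tail) >= 0;
}
```

A direct `head >= tail` would give the wrong answer once the counter wraps; the cast to a signed 32-bit difference keeps the result correct as long as the two counters stay within 2^31 of each other.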
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/netdevice.h | 15 ---------------
include/net/rps.h | 23 +++++++++++++++++++++++
net/core/dev.c | 20 ++++++++++++--------
3 files changed, 35 insertions(+), 23 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1c31cd2691d32064613836141fbdeeebc831b21f..14f19cc2616452d7e6afbbaa52f8ad3e61a419e9 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3249,21 +3249,6 @@ struct softnet_data {
call_single_data_t defer_csd;
};
-static inline void input_queue_head_incr(struct softnet_data *sd)
-{
-#ifdef CONFIG_RPS
- sd->input_queue_head++;
-#endif
-}
-
-static inline void input_queue_tail_incr_save(struct softnet_data *sd,
- unsigned int *qtail)
-{
-#ifdef CONFIG_RPS
- *qtail = ++sd->input_queue_tail;
-#endif
-}
-
DECLARE_PER_CPU_ALIGNED(struct softnet_data, softnet_data);
static inline int dev_recursion_level(void)
diff --git a/include/net/rps.h b/include/net/rps.h
index 7660243e905b92651a41292e04caf72c5f12f26e..10ca25731c1ef766715fe7ee415ad0b71ec643a8 100644
--- a/include/net/rps.h
+++ b/include/net/rps.h
@@ -122,4 +122,27 @@ static inline void sock_rps_record_flow(const struct sock *sk)
#endif
}
+static inline u32 rps_input_queue_tail_incr(struct softnet_data *sd)
+{
+#ifdef CONFIG_RPS
+ return ++sd->input_queue_tail;
+#else
+ return 0;
+#endif
+}
+
+static inline void rps_input_queue_tail_save(u32 *dest, u32 tail)
+{
+#ifdef CONFIG_RPS
+ WRITE_ONCE(*dest, tail);
+#endif
+}
+
+static inline void rps_input_queue_head_incr(struct softnet_data *sd)
+{
+#ifdef CONFIG_RPS
+ sd->input_queue_head++;
+#endif
+}
+
#endif /* _NET_RPS_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index 0a8ccb0451c30a39f8f8b45d26b7e5548b8bfba4..79073bbc9a644049cacf8433310f4641745049e9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4611,7 +4611,7 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
if (unlikely(tcpu != next_cpu) &&
(tcpu >= nr_cpu_ids || !cpu_online(tcpu) ||
((int)(per_cpu(softnet_data, tcpu).input_queue_head -
- rflow->last_qtail)) >= 0)) {
+ READ_ONCE(rflow->last_qtail))) >= 0)) {
tcpu = next_cpu;
rflow = set_rps_cpu(dev, skb, rflow, next_cpu);
}
@@ -4666,7 +4666,7 @@ bool rps_may_expire_flow(struct net_device *dev, u16 rxq_index,
cpu = READ_ONCE(rflow->cpu);
if (rflow->filter == filter_id && cpu < nr_cpu_ids &&
((int)(per_cpu(softnet_data, cpu).input_queue_head -
- rflow->last_qtail) <
+ READ_ONCE(rflow->last_qtail)) <
(int)(10 * flow_table->mask)))
expire = false;
}
@@ -4801,6 +4801,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
unsigned long flags;
unsigned int qlen;
int max_backlog;
+ u32 tail;
reason = SKB_DROP_REASON_DEV_READY;
if (!netif_running(skb->dev))
@@ -4825,8 +4826,11 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
napi_schedule_rps(sd);
}
__skb_queue_tail(&sd->input_pkt_queue, skb);
- input_queue_tail_incr_save(sd, qtail);
+ tail = rps_input_queue_tail_incr(sd);
backlog_unlock_irq_restore(sd, &flags);
+
+ /* save the tail outside of the critical section */
+ rps_input_queue_tail_save(qtail, tail);
return NET_RX_SUCCESS;
}
@@ -5904,7 +5908,7 @@ static void flush_backlog(struct work_struct *work)
if (skb->dev->reg_state == NETREG_UNREGISTERING) {
__skb_unlink(skb, &sd->input_pkt_queue);
dev_kfree_skb_irq(skb);
- input_queue_head_incr(sd);
+ rps_input_queue_head_incr(sd);
}
}
backlog_unlock_irq_enable(sd);
@@ -5913,7 +5917,7 @@ static void flush_backlog(struct work_struct *work)
if (skb->dev->reg_state == NETREG_UNREGISTERING) {
__skb_unlink(skb, &sd->process_queue);
kfree_skb(skb);
- input_queue_head_incr(sd);
+ rps_input_queue_head_incr(sd);
}
}
local_bh_enable();
@@ -6041,7 +6045,7 @@ static int process_backlog(struct napi_struct *napi, int quota)
rcu_read_lock();
__netif_receive_skb(skb);
rcu_read_unlock();
- input_queue_head_incr(sd);
+ rps_input_queue_head_incr(sd);
if (++work >= quota)
return work;
@@ -11455,11 +11459,11 @@ static int dev_cpu_dead(unsigned int oldcpu)
/* Process offline CPU's input_pkt_queue */
while ((skb = __skb_dequeue(&oldsd->process_queue))) {
netif_rx(skb);
- input_queue_head_incr(oldsd);
+ rps_input_queue_head_incr(oldsd);
}
while ((skb = skb_dequeue(&oldsd->input_pkt_queue))) {
netif_rx(skb);
- input_queue_head_incr(oldsd);
+ rps_input_queue_head_incr(oldsd);
}
return 0;
--
2.44.0.478.gd926399ef9-goog
* [PATCH v2 net-next 7/8] net: rps: add rps_input_queue_head_add() helper
2024-03-29 15:42 [PATCH v2 net-next 0/8] net: rps: misc changes Eric Dumazet
` (5 preceding siblings ...)
2024-03-29 15:42 ` [PATCH v2 net-next 6/8] net: rps: change input_queue_tail_incr_save() Eric Dumazet
@ 2024-03-29 15:42 ` Eric Dumazet
2024-03-29 15:42 ` [PATCH v2 net-next 8/8] net: rps: move received_rps field to a better location Eric Dumazet
` (2 subsequent siblings)
9 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2024-03-29 15:42 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
process_backlog() can batch increments of sd->input_queue_head,
saving some memory bandwidth.
Also add READ_ONCE()/WRITE_ONCE() annotations around
sd->input_queue_head accesses.
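The batching idea can be sketched in userspace: count delivered packets locally and publish the sum to the shared counter once per batch, instead of one annotated store per packet. Struct and function names are illustrative; the kernel publishes with WRITE_ONCE(), approximated here by a volatile store.

```c
/* Sketch of process_backlog() batching: one published store to the
 * shared head counter per batch, not one per packet. */
#include <stdint.h>

struct sd_counters { uint32_t input_queue_head; };

static void head_add(struct sd_counters *sd, int val)
{
	/* single widened store; stands in for WRITE_ONCE() */
	*(volatile uint32_t *)&sd->input_queue_head =
		sd->input_queue_head + val;
}

/* consume up to @quota packets from a queue holding @qlen packets,
 * publishing the head counter once for the whole batch */
static int process_batch(struct sd_counters *sd, int qlen, int quota)
{
	int work = 0;

	while (work < qlen && work < quota)
		work++;			/* ...deliver one packet... */
	if (work)
		head_add(sd, work);	/* one store for the batch */
	return work;
}
```

Readers on other cpus only need a recent value of the counter (the RFS expiry heuristics tolerate slack), which is what makes deferring the store safe.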
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/rps.h | 9 +++++++--
net/core/dev.c | 13 ++++++++-----
2 files changed, 15 insertions(+), 7 deletions(-)
diff --git a/include/net/rps.h b/include/net/rps.h
index 10ca25731c1ef766715fe7ee415ad0b71ec643a8..a93401d23d66e45210acc73f0326087813b69d59 100644
--- a/include/net/rps.h
+++ b/include/net/rps.h
@@ -138,11 +138,16 @@ static inline void rps_input_queue_tail_save(u32 *dest, u32 tail)
#endif
}
-static inline void rps_input_queue_head_incr(struct softnet_data *sd)
+static inline void rps_input_queue_head_add(struct softnet_data *sd, int val)
{
#ifdef CONFIG_RPS
- sd->input_queue_head++;
+ WRITE_ONCE(sd->input_queue_head, sd->input_queue_head + val);
#endif
}
+static inline void rps_input_queue_head_incr(struct softnet_data *sd)
+{
+ rps_input_queue_head_add(sd, 1);
+}
+
#endif /* _NET_RPS_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index 79073bbc9a644049cacf8433310f4641745049e9..818699dea9d7040ee74532ccdebf01c4fd6887cc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4528,7 +4528,7 @@ set_rps_cpu(struct net_device *dev, struct sk_buff *skb,
out:
#endif
rflow->last_qtail =
- per_cpu(softnet_data, next_cpu).input_queue_head;
+ READ_ONCE(per_cpu(softnet_data, next_cpu).input_queue_head);
}
rflow->cpu = next_cpu;
@@ -4610,7 +4610,7 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
*/
if (unlikely(tcpu != next_cpu) &&
(tcpu >= nr_cpu_ids || !cpu_online(tcpu) ||
- ((int)(per_cpu(softnet_data, tcpu).input_queue_head -
+ ((int)(READ_ONCE(per_cpu(softnet_data, tcpu).input_queue_head) -
READ_ONCE(rflow->last_qtail))) >= 0)) {
tcpu = next_cpu;
rflow = set_rps_cpu(dev, skb, rflow, next_cpu);
@@ -4665,7 +4665,7 @@ bool rps_may_expire_flow(struct net_device *dev, u16 rxq_index,
rflow = &flow_table->flows[flow_id];
cpu = READ_ONCE(rflow->cpu);
if (rflow->filter == filter_id && cpu < nr_cpu_ids &&
- ((int)(per_cpu(softnet_data, cpu).input_queue_head -
+ ((int)(READ_ONCE(per_cpu(softnet_data, cpu).input_queue_head) -
READ_ONCE(rflow->last_qtail)) <
(int)(10 * flow_table->mask)))
expire = false;
@@ -6045,9 +6045,10 @@ static int process_backlog(struct napi_struct *napi, int quota)
rcu_read_lock();
__netif_receive_skb(skb);
rcu_read_unlock();
- rps_input_queue_head_incr(sd);
- if (++work >= quota)
+ if (++work >= quota) {
+ rps_input_queue_head_add(sd, work);
return work;
+ }
}
@@ -6070,6 +6071,8 @@ static int process_backlog(struct napi_struct *napi, int quota)
backlog_unlock_irq_enable(sd);
}
+ if (work)
+ rps_input_queue_head_add(sd, work);
return work;
}
--
2.44.0.478.gd926399ef9-goog
* [PATCH v2 net-next 8/8] net: rps: move received_rps field to a better location
2024-03-29 15:42 [PATCH v2 net-next 0/8] net: rps: misc changes Eric Dumazet
` (6 preceding siblings ...)
2024-03-29 15:42 ` [PATCH v2 net-next 7/8] net: rps: add rps_input_queue_head_add() helper Eric Dumazet
@ 2024-03-29 15:42 ` Eric Dumazet
2024-03-30 16:11 ` [PATCH v2 net-next 0/8] net: rps: misc changes Jason Xing
2024-04-02 4:31 ` Jakub Kicinski
9 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2024-03-29 15:42 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
Commit 14d898f3c1b3 ("dev: Move received_rps counter next
to RPS members in softnet data") was unfortunate:
received_rps is dirtied by a cpu and never read by other
cpus in the fast path.
Its presence in the hot RPS cache line (shared by many cpus)
is hurting RPS/RFS performance.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/netdevice.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 14f19cc2616452d7e6afbbaa52f8ad3e61a419e9..274d8db48b4858c70b43ea4628544e924ba6a263 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3203,6 +3203,7 @@ struct softnet_data {
struct softnet_data *rps_ipi_list;
#endif
+ unsigned int received_rps;
bool in_net_rx_action;
bool in_napi_threaded_poll;
@@ -3235,7 +3236,6 @@ struct softnet_data {
unsigned int cpu;
unsigned int input_queue_tail;
#endif
- unsigned int received_rps;
struct sk_buff_head input_pkt_queue;
struct napi_struct backlog;
--
2.44.0.478.gd926399ef9-goog
* Re: [PATCH v2 net-next 6/8] net: rps: change input_queue_tail_incr_save()
2024-03-29 15:42 ` [PATCH v2 net-next 6/8] net: rps: change input_queue_tail_incr_save() Eric Dumazet
@ 2024-03-30 14:46 ` Jason Xing
2024-03-30 16:01 ` Eric Dumazet
0 siblings, 1 reply; 16+ messages in thread
From: Jason Xing @ 2024-03-30 14:46 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev, eric.dumazet
Hello Eric,
On Fri, Mar 29, 2024 at 11:43 PM Eric Dumazet <edumazet@google.com> wrote:
>
> input_queue_tail_incr_save() is incrementing the sd queue_tail
> and save it in the flow last_qtail.
>
> Two issues here :
>
> - no lock protects the write on last_qtail, we should use appropriate
> annotations.
>
> - We can perform this write after releasing the per-cpu backlog lock,
> to decrease this lock hold duration (move away the cache line miss)
>
> Also move input_queue_head_incr() and rps helpers to include/net/rps.h,
> while adding rps_ prefix to better reflect their role.
>
> v2: Fixed a build issue (Jakub and kernel build bots)
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> include/linux/netdevice.h | 15 ---------------
> include/net/rps.h | 23 +++++++++++++++++++++++
> net/core/dev.c | 20 ++++++++++++--------
> 3 files changed, 35 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 1c31cd2691d32064613836141fbdeeebc831b21f..14f19cc2616452d7e6afbbaa52f8ad3e61a419e9 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3249,21 +3249,6 @@ struct softnet_data {
> call_single_data_t defer_csd;
> };
>
> -static inline void input_queue_head_incr(struct softnet_data *sd)
> -{
> -#ifdef CONFIG_RPS
> - sd->input_queue_head++;
> -#endif
> -}
> -
> -static inline void input_queue_tail_incr_save(struct softnet_data *sd,
> - unsigned int *qtail)
> -{
> -#ifdef CONFIG_RPS
> - *qtail = ++sd->input_queue_tail;
> -#endif
> -}
> -
> DECLARE_PER_CPU_ALIGNED(struct softnet_data, softnet_data);
>
> static inline int dev_recursion_level(void)
> diff --git a/include/net/rps.h b/include/net/rps.h
> index 7660243e905b92651a41292e04caf72c5f12f26e..10ca25731c1ef766715fe7ee415ad0b71ec643a8 100644
> --- a/include/net/rps.h
> +++ b/include/net/rps.h
> @@ -122,4 +122,27 @@ static inline void sock_rps_record_flow(const struct sock *sk)
> #endif
> }
>
> +static inline u32 rps_input_queue_tail_incr(struct softnet_data *sd)
> +{
> +#ifdef CONFIG_RPS
> + return ++sd->input_queue_tail;
> +#else
> + return 0;
> +#endif
> +}
> +
> +static inline void rps_input_queue_tail_save(u32 *dest, u32 tail)
> +{
> +#ifdef CONFIG_RPS
> + WRITE_ONCE(*dest, tail);
> +#endif
> +}
I wonder if we should also use this new helper to WRITE_ONCE
last_qtail in set_rps_cpu()?
Thanks,
Jason
> +
> +static inline void rps_input_queue_head_incr(struct softnet_data *sd)
> +{
> +#ifdef CONFIG_RPS
> + sd->input_queue_head++;
> +#endif
> +}
> +
> #endif /* _NET_RPS_H */
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 0a8ccb0451c30a39f8f8b45d26b7e5548b8bfba4..79073bbc9a644049cacf8433310f4641745049e9 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4611,7 +4611,7 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
> if (unlikely(tcpu != next_cpu) &&
> (tcpu >= nr_cpu_ids || !cpu_online(tcpu) ||
> ((int)(per_cpu(softnet_data, tcpu).input_queue_head -
> - rflow->last_qtail)) >= 0)) {
> + READ_ONCE(rflow->last_qtail))) >= 0)) {
> tcpu = next_cpu;
> rflow = set_rps_cpu(dev, skb, rflow, next_cpu);
> }
> @@ -4666,7 +4666,7 @@ bool rps_may_expire_flow(struct net_device *dev, u16 rxq_index,
> cpu = READ_ONCE(rflow->cpu);
> if (rflow->filter == filter_id && cpu < nr_cpu_ids &&
> ((int)(per_cpu(softnet_data, cpu).input_queue_head -
> - rflow->last_qtail) <
> + READ_ONCE(rflow->last_qtail)) <
> (int)(10 * flow_table->mask)))
> expire = false;
> }
> @@ -4801,6 +4801,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
> unsigned long flags;
> unsigned int qlen;
> int max_backlog;
> + u32 tail;
>
> reason = SKB_DROP_REASON_DEV_READY;
> if (!netif_running(skb->dev))
> @@ -4825,8 +4826,11 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
> napi_schedule_rps(sd);
> }
> __skb_queue_tail(&sd->input_pkt_queue, skb);
> - input_queue_tail_incr_save(sd, qtail);
> + tail = rps_input_queue_tail_incr(sd);
> backlog_unlock_irq_restore(sd, &flags);
> +
> + /* save the tail outside of the critical section */
> + rps_input_queue_tail_save(qtail, tail);
> return NET_RX_SUCCESS;
> }
>
> @@ -5904,7 +5908,7 @@ static void flush_backlog(struct work_struct *work)
> if (skb->dev->reg_state == NETREG_UNREGISTERING) {
> __skb_unlink(skb, &sd->input_pkt_queue);
> dev_kfree_skb_irq(skb);
> - input_queue_head_incr(sd);
> + rps_input_queue_head_incr(sd);
> }
> }
> backlog_unlock_irq_enable(sd);
> @@ -5913,7 +5917,7 @@ static void flush_backlog(struct work_struct *work)
> if (skb->dev->reg_state == NETREG_UNREGISTERING) {
> __skb_unlink(skb, &sd->process_queue);
> kfree_skb(skb);
> - input_queue_head_incr(sd);
> + rps_input_queue_head_incr(sd);
> }
> }
> local_bh_enable();
> @@ -6041,7 +6045,7 @@ static int process_backlog(struct napi_struct *napi, int quota)
> rcu_read_lock();
> __netif_receive_skb(skb);
> rcu_read_unlock();
> - input_queue_head_incr(sd);
> + rps_input_queue_head_incr(sd);
> if (++work >= quota)
> return work;
>
> @@ -11455,11 +11459,11 @@ static int dev_cpu_dead(unsigned int oldcpu)
> /* Process offline CPU's input_pkt_queue */
> while ((skb = __skb_dequeue(&oldsd->process_queue))) {
> netif_rx(skb);
> - input_queue_head_incr(oldsd);
> + rps_input_queue_head_incr(oldsd);
> }
> while ((skb = skb_dequeue(&oldsd->input_pkt_queue))) {
> netif_rx(skb);
> - input_queue_head_incr(oldsd);
> + rps_input_queue_head_incr(oldsd);
> }
>
> return 0;
> --
> 2.44.0.478.gd926399ef9-goog
>
>
* Re: [PATCH v2 net-next 6/8] net: rps: change input_queue_tail_incr_save()
2024-03-30 14:46 ` Jason Xing
@ 2024-03-30 16:01 ` Eric Dumazet
2024-03-30 16:07 ` Jason Xing
2024-04-12 15:09 ` Jason Xing
0 siblings, 2 replies; 16+ messages in thread
From: Eric Dumazet @ 2024-03-30 16:01 UTC (permalink / raw)
To: Jason Xing
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev, eric.dumazet
On Sat, Mar 30, 2024 at 3:47 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> Hello Eric,
>
> On Fri, Mar 29, 2024 at 11:43 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > input_queue_tail_incr_save() increments the sd queue_tail
> > and saves it in the flow last_qtail.
> >
> > Two issues here:
> >
> > - no lock protects the write on last_qtail; we should use appropriate
> > annotations.
> >
> > - We can perform this write after releasing the per-cpu backlog lock,
> > to decrease this lock hold duration (moving the cache line miss out
> > of the critical section)
> >
> > Also move input_queue_head_incr() and rps helpers to include/net/rps.h,
> > while adding rps_ prefix to better reflect their role.
> >
> > v2: Fixed a build issue (Jakub and kernel build bots)
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > ---
> > include/linux/netdevice.h | 15 ---------------
> > include/net/rps.h | 23 +++++++++++++++++++++++
> > net/core/dev.c | 20 ++++++++++++--------
> > 3 files changed, 35 insertions(+), 23 deletions(-)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 1c31cd2691d32064613836141fbdeeebc831b21f..14f19cc2616452d7e6afbbaa52f8ad3e61a419e9 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -3249,21 +3249,6 @@ struct softnet_data {
> > call_single_data_t defer_csd;
> > };
> >
> > -static inline void input_queue_head_incr(struct softnet_data *sd)
> > -{
> > -#ifdef CONFIG_RPS
> > - sd->input_queue_head++;
> > -#endif
> > -}
> > -
> > -static inline void input_queue_tail_incr_save(struct softnet_data *sd,
> > - unsigned int *qtail)
> > -{
> > -#ifdef CONFIG_RPS
> > - *qtail = ++sd->input_queue_tail;
> > -#endif
> > -}
> > -
> > DECLARE_PER_CPU_ALIGNED(struct softnet_data, softnet_data);
> >
> > static inline int dev_recursion_level(void)
> > diff --git a/include/net/rps.h b/include/net/rps.h
> > index 7660243e905b92651a41292e04caf72c5f12f26e..10ca25731c1ef766715fe7ee415ad0b71ec643a8 100644
> > --- a/include/net/rps.h
> > +++ b/include/net/rps.h
> > @@ -122,4 +122,27 @@ static inline void sock_rps_record_flow(const struct sock *sk)
> > #endif
> > }
> >
> > +static inline u32 rps_input_queue_tail_incr(struct softnet_data *sd)
> > +{
> > +#ifdef CONFIG_RPS
> > + return ++sd->input_queue_tail;
> > +#else
> > + return 0;
> > +#endif
> > +}
> > +
> > +static inline void rps_input_queue_tail_save(u32 *dest, u32 tail)
> > +{
> > +#ifdef CONFIG_RPS
> > + WRITE_ONCE(*dest, tail);
> > +#endif
> > +}
>
> > I wonder if we should also use this new helper to WRITE_ONCE
> > last_qtail in set_rps_cpu()?
>
Absolutely, I have another patch series to address remaining races
(rflow->cpu, rflow->filter ...)
I chose to make a small one, to ease reviews.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 net-next 6/8] net: rps: change input_queue_tail_incr_save()
@ 2024-03-30 16:07 ` Jason Xing
0 siblings, 0 replies; 16+ messages in thread
From: Jason Xing @ 2024-03-30 16:07 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev, eric.dumazet
On Sun, Mar 31, 2024 at 12:01 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Sat, Mar 30, 2024 at 3:47 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > Hello Eric,
> >
> > On Fri, Mar 29, 2024 at 11:43 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > input_queue_tail_incr_save() increments the sd queue_tail
> > > and saves it in the flow last_qtail.
> > >
> > > Two issues here:
> > >
> > > - no lock protects the write on last_qtail; we should use appropriate
> > > annotations.
> > >
> > > - We can perform this write after releasing the per-cpu backlog lock,
> > > to decrease this lock hold duration (moving the cache line miss out
> > > of the critical section)
> > >
> > > Also move input_queue_head_incr() and rps helpers to include/net/rps.h,
> > > while adding rps_ prefix to better reflect their role.
> > >
> > > v2: Fixed a build issue (Jakub and kernel build bots)
> > >
> > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > ---
> > > include/linux/netdevice.h | 15 ---------------
> > > include/net/rps.h | 23 +++++++++++++++++++++++
> > > net/core/dev.c | 20 ++++++++++++--------
> > > 3 files changed, 35 insertions(+), 23 deletions(-)
> > >
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index 1c31cd2691d32064613836141fbdeeebc831b21f..14f19cc2616452d7e6afbbaa52f8ad3e61a419e9 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -3249,21 +3249,6 @@ struct softnet_data {
> > > call_single_data_t defer_csd;
> > > };
> > >
> > > -static inline void input_queue_head_incr(struct softnet_data *sd)
> > > -{
> > > -#ifdef CONFIG_RPS
> > > - sd->input_queue_head++;
> > > -#endif
> > > -}
> > > -
> > > -static inline void input_queue_tail_incr_save(struct softnet_data *sd,
> > > - unsigned int *qtail)
> > > -{
> > > -#ifdef CONFIG_RPS
> > > - *qtail = ++sd->input_queue_tail;
> > > -#endif
> > > -}
> > > -
> > > DECLARE_PER_CPU_ALIGNED(struct softnet_data, softnet_data);
> > >
> > > static inline int dev_recursion_level(void)
> > > diff --git a/include/net/rps.h b/include/net/rps.h
> > > index 7660243e905b92651a41292e04caf72c5f12f26e..10ca25731c1ef766715fe7ee415ad0b71ec643a8 100644
> > > --- a/include/net/rps.h
> > > +++ b/include/net/rps.h
> > > @@ -122,4 +122,27 @@ static inline void sock_rps_record_flow(const struct sock *sk)
> > > #endif
> > > }
> > >
> > > +static inline u32 rps_input_queue_tail_incr(struct softnet_data *sd)
> > > +{
> > > +#ifdef CONFIG_RPS
> > > + return ++sd->input_queue_tail;
> > > +#else
> > > + return 0;
> > > +#endif
> > > +}
> > > +
> > > +static inline void rps_input_queue_tail_save(u32 *dest, u32 tail)
> > > +{
> > > +#ifdef CONFIG_RPS
> > > + WRITE_ONCE(*dest, tail);
> > > +#endif
> > > +}
> >
> > I wonder if we should also call this new helper to WRITE_ONCE
> > last_qtail in the set_rps_cpu()?
> >
>
> Absolutely, I have another patch series to address remaining races
> (rflow->cpu, rflow->filter ...)
>
> I chose to make a small one, to ease reviews.
Great. Now I have no more questions :)
Thanks,
Jason
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 net-next 0/8] net: rps: misc changes
2024-03-29 15:42 [PATCH v2 net-next 0/8] net: rps: misc changes Eric Dumazet
` (7 preceding siblings ...)
2024-03-29 15:42 ` [PATCH v2 net-next 8/8] net: rps: move received_rps field to a better location Eric Dumazet
@ 2024-03-30 16:11 ` Jason Xing
2024-04-02 4:31 ` Jakub Kicinski
9 siblings, 0 replies; 16+ messages in thread
From: Jason Xing @ 2024-03-30 16:11 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev, eric.dumazet
On Fri, Mar 29, 2024 at 11:42 PM Eric Dumazet <edumazet@google.com> wrote:
>
> Make RPS/RFS a bit more efficient with better cache locality
> and heuristics.
>
> Also shrink include/linux/netdevice.h a bit.
>
> v2: fixed a build issue in patch 6/8 with CONFIG_RPS=n
> (Jakub and kernel build bots)
>
> Eric Dumazet (8):
> net: move kick_defer_list_purge() to net/core/dev.h
> net: move dev_xmit_recursion() helpers to net/core/dev.h
> net: enqueue_to_backlog() change vs not running device
> net: make softnet_data.dropped an atomic_t
> net: enqueue_to_backlog() cleanup
> net: rps: change input_queue_tail_incr_save()
> net: rps: add rps_input_queue_head_add() helper
> net: rps: move received_rps field to a better location
>
> include/linux/netdevice.h | 38 ++------------------
> include/net/rps.h | 28 +++++++++++++++
> net/core/dev.c | 73 ++++++++++++++++++++++-----------------
> net/core/dev.h | 23 ++++++++++--
> net/core/net-procfs.c | 3 +-
> 5 files changed, 95 insertions(+), 70 deletions(-)
For this series, feel free to add:
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Thanks!
>
> --
> 2.44.0.478.gd926399ef9-goog
>
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 net-next 0/8] net: rps: misc changes
2024-03-29 15:42 [PATCH v2 net-next 0/8] net: rps: misc changes Eric Dumazet
` (8 preceding siblings ...)
2024-03-30 16:11 ` [PATCH v2 net-next 0/8] net: rps: misc changes Jason Xing
@ 2024-04-02 4:31 ` Jakub Kicinski
9 siblings, 0 replies; 16+ messages in thread
From: Jakub Kicinski @ 2024-04-02 4:31 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David S . Miller, Paolo Abeni, netdev, eric.dumazet
On Fri, 29 Mar 2024 15:42:17 +0000 Eric Dumazet wrote:
> Make RPS/RFS a bit more efficient with better cache locality
> and heuristics.
>
> Also shrink include/linux/netdevice.h a bit.
>
> v2: fixed a build issue in patch 6/8 with CONFIG_RPS=n
> (Jakub and kernel build bots)
FTR already applied by DaveM last night.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 net-next 6/8] net: rps: change input_queue_tail_incr_save()
2024-03-30 16:01 ` Eric Dumazet
2024-03-30 16:07 ` Jason Xing
@ 2024-04-12 15:09 ` Jason Xing
1 sibling, 0 replies; 16+ messages in thread
From: Jason Xing @ 2024-04-12 15:09 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev, eric.dumazet
On Sun, Mar 31, 2024 at 12:01 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Sat, Mar 30, 2024 at 3:47 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > Hello Eric,
> >
> > On Fri, Mar 29, 2024 at 11:43 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > input_queue_tail_incr_save() increments the sd queue_tail
> > > and saves it in the flow last_qtail.
> > >
> > > Two issues here:
> > >
> > > - no lock protects the write on last_qtail; we should use appropriate
> > > annotations.
> > >
> > > - We can perform this write after releasing the per-cpu backlog lock,
> > > to decrease this lock hold duration (moving the cache line miss out
> > > of the critical section)
> > >
> > > Also move input_queue_head_incr() and rps helpers to include/net/rps.h,
> > > while adding rps_ prefix to better reflect their role.
> > >
> > > v2: Fixed a build issue (Jakub and kernel build bots)
> > >
> > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > ---
> > > include/linux/netdevice.h | 15 ---------------
> > > include/net/rps.h | 23 +++++++++++++++++++++++
> > > net/core/dev.c | 20 ++++++++++++--------
> > > 3 files changed, 35 insertions(+), 23 deletions(-)
> > >
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index 1c31cd2691d32064613836141fbdeeebc831b21f..14f19cc2616452d7e6afbbaa52f8ad3e61a419e9 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -3249,21 +3249,6 @@ struct softnet_data {
> > > call_single_data_t defer_csd;
> > > };
> > >
> > > -static inline void input_queue_head_incr(struct softnet_data *sd)
> > > -{
> > > -#ifdef CONFIG_RPS
> > > - sd->input_queue_head++;
> > > -#endif
> > > -}
> > > -
> > > -static inline void input_queue_tail_incr_save(struct softnet_data *sd,
> > > - unsigned int *qtail)
> > > -{
> > > -#ifdef CONFIG_RPS
> > > - *qtail = ++sd->input_queue_tail;
> > > -#endif
> > > -}
> > > -
> > > DECLARE_PER_CPU_ALIGNED(struct softnet_data, softnet_data);
> > >
> > > static inline int dev_recursion_level(void)
> > > diff --git a/include/net/rps.h b/include/net/rps.h
> > > index 7660243e905b92651a41292e04caf72c5f12f26e..10ca25731c1ef766715fe7ee415ad0b71ec643a8 100644
> > > --- a/include/net/rps.h
> > > +++ b/include/net/rps.h
> > > @@ -122,4 +122,27 @@ static inline void sock_rps_record_flow(const struct sock *sk)
> > > #endif
> > > }
> > >
> > > +static inline u32 rps_input_queue_tail_incr(struct softnet_data *sd)
> > > +{
> > > +#ifdef CONFIG_RPS
> > > + return ++sd->input_queue_tail;
> > > +#else
> > > + return 0;
> > > +#endif
> > > +}
> > > +
> > > +static inline void rps_input_queue_tail_save(u32 *dest, u32 tail)
> > > +{
> > > +#ifdef CONFIG_RPS
> > > + WRITE_ONCE(*dest, tail);
> > > +#endif
> > > +}
> >
> > I wonder if we should also use this new helper to WRITE_ONCE
> > last_qtail in set_rps_cpu()?
> >
>
> Absolutely, I have another patch series to address remaining races
> (rflow->cpu, rflow->filter ...)
>
> I chose to make a small one, to ease reviews.
Hello Eric,
I wonder if you already have a patchset to change those three members
in struct rps_dev_flow? I looked through this part and found it's not
that complicated. So if not, I can do it :)
Thanks,
Jason
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2024-04-12 15:10 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-29 15:42 [PATCH v2 net-next 0/8] net: rps: misc changes Eric Dumazet
2024-03-29 15:42 ` [PATCH v2 net-next 1/8] net: move kick_defer_list_purge() to net/core/dev.h Eric Dumazet
2024-03-29 15:42 ` [PATCH v2 net-next 2/8] net: move dev_xmit_recursion() helpers " Eric Dumazet
2024-03-29 15:42 ` [PATCH v2 net-next 3/8] net: enqueue_to_backlog() change vs not running device Eric Dumazet
2024-03-29 15:42 ` [PATCH v2 net-next 4/8] net: make softnet_data.dropped an atomic_t Eric Dumazet
2024-03-29 15:42 ` [PATCH v2 net-next 5/8] net: enqueue_to_backlog() cleanup Eric Dumazet
2024-03-29 15:42 ` [PATCH v2 net-next 6/8] net: rps: change input_queue_tail_incr_save() Eric Dumazet
2024-03-30 14:46 ` Jason Xing
2024-03-30 16:01 ` Eric Dumazet
2024-03-30 16:07 ` Jason Xing
2024-04-12 15:09 ` Jason Xing
2024-03-29 15:42 ` [PATCH v2 net-next 7/8] net: rps: add rps_input_queue_head_add() helper Eric Dumazet
2024-03-29 15:42 ` [PATCH v2 net-next 8/8] net: rps: move received_rps field to a better location Eric Dumazet
2024-03-30 16:11 ` [PATCH v2 net-next 0/8] net: rps: misc changes Jason Xing
2024-04-02 4:31 ` Jakub Kicinski