* [PATCH net-next 0/4] net: dev: PREEMPT_RT fixups.
@ 2022-02-02 12:28 Sebastian Andrzej Siewior
  2022-02-02 12:28 ` [PATCH net-next 1/4] net: dev: Remove the preempt_disable() in netif_rx_internal() Sebastian Andrzej Siewior
                   ` (4 more replies)
  0 siblings, 5 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-02 12:28 UTC (permalink / raw)
  To: bpf, netdev
  Cc: David S. Miller, Alexei Starovoitov, Daniel Borkmann,
	Eric Dumazet, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner

Hi,

this series removes or replaces preempt_disable() and local_irq_save()
sections which are problematic on PREEMPT_RT.
Patch 3 makes netif_rx() work from any context after I found suggestions
for it in an old thread. Should that work, then the context-specific
variants could be removed.

Sebastian



* [PATCH net-next 1/4] net: dev: Remove the preempt_disable() in netif_rx_internal().
  2022-02-02 12:28 [PATCH net-next 0/4] net: dev: PREEMPT_RT fixups Sebastian Andrzej Siewior
@ 2022-02-02 12:28 ` Sebastian Andrzej Siewior
  2022-02-02 17:10   ` Eric Dumazet
  2022-02-02 12:28 ` [PATCH net-next 2/4] net: dev: Remove get_cpu() " Sebastian Andrzej Siewior
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-02 12:28 UTC (permalink / raw)
  To: bpf, netdev
  Cc: David S. Miller, Alexei Starovoitov, Daniel Borkmann,
	Eric Dumazet, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner, Sebastian Andrzej Siewior

The preempt_disable() and rcu_read_lock() section was introduced in commit
   bbbe211c295ff ("net: rcu lock and preempt disable missing around generic xdp")

The backtrace shows that bottom halves were disabled and so the usage of
smp_processor_id() would not trigger a warning.
The "suspicious RCU usage" warning was triggered because
rcu_dereference() was not used in rcu_read_lock() section (only
rcu_read_lock_bh()). A rcu_read_lock() is sufficient.

Remove the preempt_disable() statement which is not needed.
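
For illustration (a sketch, not the exact code from the splat), the rule
at play is that rcu_dereference() is checked against rcu_read_lock(),
not against rcu_read_lock_bh():

	rcu_read_lock_bh();
	p = rcu_dereference(dev->xdp_prog);	/* "suspicious RCU usage":
						   rcu_read_lock() not held */
	rcu_read_unlock_bh();

	rcu_read_lock();
	p = rcu_dereference(dev->xdp_prog);	/* no warning; BH stays
						   disabled by the caller, so
						   smp_processor_id() remains
						   stable, too */
	rcu_read_unlock();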

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 net/core/dev.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 1baab07820f65..325b70074f4ae 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4796,7 +4796,6 @@ static int netif_rx_internal(struct sk_buff *skb)
 		struct rps_dev_flow voidflow, *rflow = &voidflow;
 		int cpu;
 
-		preempt_disable();
 		rcu_read_lock();
 
 		cpu = get_rps_cpu(skb->dev, skb, &rflow);
@@ -4806,7 +4805,6 @@ static int netif_rx_internal(struct sk_buff *skb)
 		ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
 
 		rcu_read_unlock();
-		preempt_enable();
 	} else
 #endif
 	{
-- 
2.34.1



* [PATCH net-next 2/4] net: dev: Remove get_cpu() in netif_rx_internal().
  2022-02-02 12:28 [PATCH net-next 0/4] net: dev: PREEMPT_RT fixups Sebastian Andrzej Siewior
  2022-02-02 12:28 ` [PATCH net-next 1/4] net: dev: Remove the preempt_disable() in netif_rx_internal() Sebastian Andrzej Siewior
@ 2022-02-02 12:28 ` Sebastian Andrzej Siewior
  2022-02-02 17:14   ` Eric Dumazet
  2022-02-03 12:14   ` Toke Høiland-Jørgensen
  2022-02-02 12:28 ` [PATCH net-next 3/4] net: dev: Makes sure netif_rx() can be invoked in any context Sebastian Andrzej Siewior
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-02 12:28 UTC (permalink / raw)
  To: bpf, netdev
  Cc: David S. Miller, Alexei Starovoitov, Daniel Borkmann,
	Eric Dumazet, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner, Sebastian Andrzej Siewior

The get_cpu() usage was added in commit
    b0e28f1effd1d ("net: netif_rx() must disable preemption")

because ip_dev_loopback_xmit() invoked netif_rx() with preemption
enabled, causing a warning in smp_processor_id(). The function
netif_rx() should only be invoked from an interrupt context, which
implies disabled preemption. The commit
   e30b38c298b55 ("ip: Fix ip_dev_loopback_xmit()")

addressed this and replaced netif_rx() with netif_rx_ni() in
ip_dev_loopback_xmit().

Based on the discussion on the list, the former patch (b0e28f1effd1d)
should not have been applied; only the latter (e30b38c298b55) was
needed.

Remove get_cpu() since the function is supposed to be invoked from a
context with stable per-CPU pointers (ensured either by disabled
preemption or disabled software interrupts).
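
For illustration, get_cpu() is roughly (a sketch of the generic
definition):

	#define get_cpu()	({ preempt_disable(); smp_processor_id(); })
	#define put_cpu()	preempt_enable()

With bottom halves already disabled by the caller, the
preempt_disable()/preempt_enable() pair adds nothing and a plain
smp_processor_id() is sufficient.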

Link: https://lkml.kernel.org/r/20100415.013347.98375530.davem@davemloft.net
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 net/core/dev.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 325b70074f4ae..0d13340ed4054 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4810,8 +4810,7 @@ static int netif_rx_internal(struct sk_buff *skb)
 	{
 		unsigned int qtail;
 
-		ret = enqueue_to_backlog(skb, get_cpu(), &qtail);
-		put_cpu();
+		ret = enqueue_to_backlog(skb, smp_processor_id(), &qtail);
 	}
 	return ret;
 }
-- 
2.34.1



* [PATCH net-next 3/4] net: dev: Makes sure netif_rx() can be invoked in any context.
  2022-02-02 12:28 [PATCH net-next 0/4] net: dev: PREEMPT_RT fixups Sebastian Andrzej Siewior
  2022-02-02 12:28 ` [PATCH net-next 1/4] net: dev: Remove the preempt_disable() in netif_rx_internal() Sebastian Andrzej Siewior
  2022-02-02 12:28 ` [PATCH net-next 2/4] net: dev: Remove get_cpu() " Sebastian Andrzej Siewior
@ 2022-02-02 12:28 ` Sebastian Andrzej Siewior
  2022-02-02 16:50   ` Jakub Kicinski
  2022-02-02 17:43   ` Eric Dumazet
  2022-02-02 12:28 ` [PATCH net-next 4/4] net: dev: Make rps_lock() disable interrupts Sebastian Andrzej Siewior
  2022-02-02 16:14 ` [PATCH net-next 0/4] net: dev: PREEMPT_RT fixups Jakub Kicinski
  4 siblings, 2 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-02 12:28 UTC (permalink / raw)
  To: bpf, netdev
  Cc: David S. Miller, Alexei Starovoitov, Daniel Borkmann,
	Eric Dumazet, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner, Sebastian Andrzej Siewior

Dave suggested a while ago (eleven years by now) "Let's make netif_rx()
work in all contexts and get rid of netif_rx_ni()". Eric agreed and
pointed out that modern devices should use netif_receive_skb() to avoid
the overhead.
In the meantime someone added another variant, netif_rx_any_context(),
which behaves as suggested.

netif_rx() must be invoked with disabled bottom halves to ensure that
pending softirqs, which were raised within the function, are handled.
netif_rx_ni() can be invoked only from process context (bottom halves
must be enabled) because the function handles pending softirqs without
checking if bottom halves were disabled or not.
netif_rx_any_context() invokes one of the former functions, selected by
checking in_interrupt().

netif_rx() could be taught to handle both cases (disabled and enabled
bottom halves) by simply disabling bottom halves while invoking
netif_rx_internal(). The local_bh_enable() invocation will then invoke
pending softirqs only if the BH-disable counter drops to zero.
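
For illustration, the nesting behaviour this relies on (sketch):

	local_bh_disable();	/* counter += SOFTIRQ_DISABLE_OFFSET */
	local_bh_disable();	/* nested: counter only incremented */
	raise_softirq(NET_RX_SOFTIRQ);
	local_bh_enable();	/* counter still > 0: nothing runs here */
	local_bh_enable();	/* counter drops to zero: the pending
				   softirq is processed now */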

Add a local_bh_disable() section in netif_rx() to ensure softirqs are
handled if needed. Make netif_rx_ni() and netif_rx_any_context() invoke
netif_rx() so that they can be removed once there are no users left.

Link: https://lkml.kernel.org/r/20100415.020246.218622820.davem@davemloft.net
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/netdevice.h  | 13 +++++++++++--
 include/trace/events/net.h | 14 --------------
 net/core/dev.c             | 34 ++--------------------------------
 3 files changed, 13 insertions(+), 48 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e490b84732d16..4086f312f814e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3669,8 +3669,17 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp,
 void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog);
 int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb);
 int netif_rx(struct sk_buff *skb);
-int netif_rx_ni(struct sk_buff *skb);
-int netif_rx_any_context(struct sk_buff *skb);
+
+static inline int netif_rx_ni(struct sk_buff *skb)
+{
+	return netif_rx(skb);
+}
+
+static inline int netif_rx_any_context(struct sk_buff *skb)
+{
+	return netif_rx(skb);
+}
+
 int netif_receive_skb(struct sk_buff *skb);
 int netif_receive_skb_core(struct sk_buff *skb);
 void netif_receive_skb_list_internal(struct list_head *head);
diff --git a/include/trace/events/net.h b/include/trace/events/net.h
index 78c448c6ab4c5..032b431b987b6 100644
--- a/include/trace/events/net.h
+++ b/include/trace/events/net.h
@@ -260,13 +260,6 @@ DEFINE_EVENT(net_dev_rx_verbose_template, netif_rx_entry,
 	TP_ARGS(skb)
 );
 
-DEFINE_EVENT(net_dev_rx_verbose_template, netif_rx_ni_entry,
-
-	TP_PROTO(const struct sk_buff *skb),
-
-	TP_ARGS(skb)
-);
-
 DECLARE_EVENT_CLASS(net_dev_rx_exit_template,
 
 	TP_PROTO(int ret),
@@ -312,13 +305,6 @@ DEFINE_EVENT(net_dev_rx_exit_template, netif_rx_exit,
 	TP_ARGS(ret)
 );
 
-DEFINE_EVENT(net_dev_rx_exit_template, netif_rx_ni_exit,
-
-	TP_PROTO(int ret),
-
-	TP_ARGS(ret)
-);
-
 DEFINE_EVENT(net_dev_rx_exit_template, netif_receive_skb_list_exit,
 
 	TP_PROTO(int ret),
diff --git a/net/core/dev.c b/net/core/dev.c
index 0d13340ed4054..f43d0580fa11d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4834,47 +4834,17 @@ int netif_rx(struct sk_buff *skb)
 {
 	int ret;
 
+	local_bh_disable();
 	trace_netif_rx_entry(skb);
 
 	ret = netif_rx_internal(skb);
 	trace_netif_rx_exit(ret);
+	local_bh_enable();
 
 	return ret;
 }
 EXPORT_SYMBOL(netif_rx);
 
-int netif_rx_ni(struct sk_buff *skb)
-{
-	int err;
-
-	trace_netif_rx_ni_entry(skb);
-
-	preempt_disable();
-	err = netif_rx_internal(skb);
-	if (local_softirq_pending())
-		do_softirq();
-	preempt_enable();
-	trace_netif_rx_ni_exit(err);
-
-	return err;
-}
-EXPORT_SYMBOL(netif_rx_ni);
-
-int netif_rx_any_context(struct sk_buff *skb)
-{
-	/*
-	 * If invoked from contexts which do not invoke bottom half
-	 * processing either at return from interrupt or when softrqs are
-	 * reenabled, use netif_rx_ni() which invokes bottomhalf processing
-	 * directly.
-	 */
-	if (in_interrupt())
-		return netif_rx(skb);
-	else
-		return netif_rx_ni(skb);
-}
-EXPORT_SYMBOL(netif_rx_any_context);
-
 static __latent_entropy void net_tx_action(struct softirq_action *h)
 {
 	struct softnet_data *sd = this_cpu_ptr(&softnet_data);
-- 
2.34.1



* [PATCH net-next 4/4] net: dev: Make rps_lock() disable interrupts.
  2022-02-02 12:28 [PATCH net-next 0/4] net: dev: PREEMPT_RT fixups Sebastian Andrzej Siewior
                   ` (2 preceding siblings ...)
  2022-02-02 12:28 ` [PATCH net-next 3/4] net: dev: Makes sure netif_rx() can be invoked in any context Sebastian Andrzej Siewior
@ 2022-02-02 12:28 ` Sebastian Andrzej Siewior
  2022-02-02 16:47   ` Jakub Kicinski
  2022-02-02 16:14 ` [PATCH net-next 0/4] net: dev: PREEMPT_RT fixups Jakub Kicinski
  4 siblings, 1 reply; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-02 12:28 UTC (permalink / raw)
  To: bpf, netdev
  Cc: David S. Miller, Alexei Starovoitov, Daniel Borkmann,
	Eric Dumazet, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner, Sebastian Andrzej Siewior

Disabling interrupts and, in the RPS case, locking input_pkt_queue is
split into local_irq_disable() and an optional spin_lock().

This breaks on PREEMPT_RT because the spinlock_t typed lock can not be
acquired with disabled interrupts.
The sections in which the lock is acquired are usually short, in the
sense that they do not cause long and unbounded latencies. One
exception is the skb_flow_limit() invocation, which may invoke a BPF
program (and may require sleeping locks).

By moving local_irq_disable() + spin_lock() into rps_lock(), we can keep
interrupts disabled on !PREEMPT_RT and enabled on PREEMPT_RT kernels.
Without RPS on a PREEMPT_RT kernel, the needed synchronisation happens
as part of local_bh_disable() on the local CPU.
Since interrupts remain enabled, enqueue_to_backlog() needs to disable
interrupts for ____napi_schedule().
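
For illustration, the old pattern which breaks on PREEMPT_RT (sketch):

	local_irq_save(flags);			/* hard IRQs off */
	spin_lock(&sd->input_pkt_queue.lock);	/* spinlock_t is a sleeping
						   rtmutex on PREEMPT_RT and
						   must not be acquired with
						   interrupts disabled */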

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 net/core/dev.c | 72 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 44 insertions(+), 28 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index f43d0580fa11d..e9ea56daee2f0 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -216,18 +216,38 @@ static inline struct hlist_head *dev_index_hash(struct net *net, int ifindex)
 	return &net->dev_index_head[ifindex & (NETDEV_HASHENTRIES - 1)];
 }
 
-static inline void rps_lock(struct softnet_data *sd)
+static inline void rps_lock_irqsave(struct softnet_data *sd,
+				    unsigned long *flags)
 {
-#ifdef CONFIG_RPS
-	spin_lock(&sd->input_pkt_queue.lock);
-#endif
+	if (IS_ENABLED(CONFIG_RPS))
+		spin_lock_irqsave(&sd->input_pkt_queue.lock, *flags);
+	else if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_save(*flags);
 }
 
-static inline void rps_unlock(struct softnet_data *sd)
+static inline void rps_lock_irq_disable(struct softnet_data *sd)
 {
-#ifdef CONFIG_RPS
-	spin_unlock(&sd->input_pkt_queue.lock);
-#endif
+	if (IS_ENABLED(CONFIG_RPS))
+		spin_lock_irq(&sd->input_pkt_queue.lock);
+	else if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_disable();
+}
+
+static inline void rps_unlock_irq_restore(struct softnet_data *sd,
+					  unsigned long *flags)
+{
+	if (IS_ENABLED(CONFIG_RPS))
+		spin_unlock_irqrestore(&sd->input_pkt_queue.lock, *flags);
+	else if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_restore(*flags);
+}
+
+static inline void rps_unlock_irq_enable(struct softnet_data *sd)
+{
+	if (IS_ENABLED(CONFIG_RPS))
+		spin_unlock_irq(&sd->input_pkt_queue.lock);
+	else if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_enable();
 }
 
 static struct netdev_name_node *netdev_name_node_alloc(struct net_device *dev,
@@ -4525,9 +4545,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
 
 	sd = &per_cpu(softnet_data, cpu);
 
-	local_irq_save(flags);
-
-	rps_lock(sd);
+	rps_lock_irqsave(sd, &flags);
 	if (!netif_running(skb->dev))
 		goto drop;
 	qlen = skb_queue_len(&sd->input_pkt_queue);
@@ -4536,26 +4554,30 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
 enqueue:
 			__skb_queue_tail(&sd->input_pkt_queue, skb);
 			input_queue_tail_incr_save(sd, qtail);
-			rps_unlock(sd);
-			local_irq_restore(flags);
+			rps_unlock_irq_restore(sd, &flags);
 			return NET_RX_SUCCESS;
 		}
 
 		/* Schedule NAPI for backlog device
 		 * We can use non atomic operation since we own the queue lock
+		 * PREEMPT_RT needs to disable interrupts here for
+		 * synchronisation needed in napi_schedule.
 		 */
+		if (IS_ENABLED(CONFIG_PREEMPT_RT))
+			local_irq_disable();
+
 		if (!__test_and_set_bit(NAPI_STATE_SCHED, &sd->backlog.state)) {
 			if (!rps_ipi_queued(sd))
 				____napi_schedule(sd, &sd->backlog);
 		}
+		if (IS_ENABLED(CONFIG_PREEMPT_RT))
+			local_irq_enable();
 		goto enqueue;
 	}
 
 drop:
 	sd->dropped++;
-	rps_unlock(sd);
-
-	local_irq_restore(flags);
+	rps_unlock_irq_restore(sd, &flags);
 
 	atomic_long_inc(&skb->dev->rx_dropped);
 	kfree_skb(skb);
@@ -5617,8 +5639,7 @@ static void flush_backlog(struct work_struct *work)
 	local_bh_disable();
 	sd = this_cpu_ptr(&softnet_data);
 
-	local_irq_disable();
-	rps_lock(sd);
+	rps_lock_irq_disable(sd);
 	skb_queue_walk_safe(&sd->input_pkt_queue, skb, tmp) {
 		if (skb->dev->reg_state == NETREG_UNREGISTERING) {
 			__skb_unlink(skb, &sd->input_pkt_queue);
@@ -5626,8 +5647,7 @@ static void flush_backlog(struct work_struct *work)
 			input_queue_head_incr(sd);
 		}
 	}
-	rps_unlock(sd);
-	local_irq_enable();
+	rps_unlock_irq_enable(sd);
 
 	skb_queue_walk_safe(&sd->process_queue, skb, tmp) {
 		if (skb->dev->reg_state == NETREG_UNREGISTERING) {
@@ -5645,16 +5665,14 @@ static bool flush_required(int cpu)
 	struct softnet_data *sd = &per_cpu(softnet_data, cpu);
 	bool do_flush;
 
-	local_irq_disable();
-	rps_lock(sd);
+	rps_lock_irq_disable(sd);
 
 	/* as insertion into process_queue happens with the rps lock held,
 	 * process_queue access may race only with dequeue
 	 */
 	do_flush = !skb_queue_empty(&sd->input_pkt_queue) ||
 		   !skb_queue_empty_lockless(&sd->process_queue);
-	rps_unlock(sd);
-	local_irq_enable();
+	rps_unlock_irq_enable(sd);
 
 	return do_flush;
 #endif
@@ -5769,8 +5787,7 @@ static int process_backlog(struct napi_struct *napi, int quota)
 
 		}
 
-		local_irq_disable();
-		rps_lock(sd);
+		rps_lock_irq_disable(sd);
 		if (skb_queue_empty(&sd->input_pkt_queue)) {
 			/*
 			 * Inline a custom version of __napi_complete().
@@ -5786,8 +5803,7 @@ static int process_backlog(struct napi_struct *napi, int quota)
 			skb_queue_splice_tail_init(&sd->input_pkt_queue,
 						   &sd->process_queue);
 		}
-		rps_unlock(sd);
-		local_irq_enable();
+		rps_unlock_irq_enable(sd);
 	}
 
 	return work;
-- 
2.34.1



* Re: [PATCH net-next 0/4] net: dev: PREEMPT_RT fixups.
  2022-02-02 12:28 [PATCH net-next 0/4] net: dev: PREEMPT_RT fixups Sebastian Andrzej Siewior
                   ` (3 preceding siblings ...)
  2022-02-02 12:28 ` [PATCH net-next 4/4] net: dev: Make rps_lock() disable interrupts Sebastian Andrzej Siewior
@ 2022-02-02 16:14 ` Jakub Kicinski
  2022-02-03 11:59   ` Toke Høiland-Jørgensen
  4 siblings, 1 reply; 35+ messages in thread
From: Jakub Kicinski @ 2022-02-02 16:14 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Eric Dumazet, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner,
	Toke Høiland-Jørgensen

On Wed,  2 Feb 2022 13:28:44 +0100 Sebastian Andrzej Siewior wrote:
> Hi,
> 
> this series removes or replaces preempt_disable() and local_irq_save()
> sections which are problematic on PREEMPT_RT.
> Patch 3 makes netif_rx() work from any context after I found suggestions
> for it in an old thread. Should that work, then the context-specific
> variants could be removed.

Let's CC Toke, lest it escape his attention.


* Re: [PATCH net-next 4/4] net: dev: Make rps_lock() disable interrupts.
  2022-02-02 12:28 ` [PATCH net-next 4/4] net: dev: Make rps_lock() disable interrupts Sebastian Andrzej Siewior
@ 2022-02-02 16:47   ` Jakub Kicinski
  2022-02-03 16:41     ` [PATCH net-next v2 " Sebastian Andrzej Siewior
  0 siblings, 1 reply; 35+ messages in thread
From: Jakub Kicinski @ 2022-02-02 16:47 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Eric Dumazet, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner

On Wed,  2 Feb 2022 13:28:48 +0100 Sebastian Andrzej Siewior wrote:
>  		/* Schedule NAPI for backlog device
>  		 * We can use non atomic operation since we own the queue lock
> +		 * PREEMPT_RT needs to disable interrupts here for
> +		 * synchronisation needed in napi_schedule.
>  		 */
> +		if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +			local_irq_disable();
> +
>  		if (!__test_and_set_bit(NAPI_STATE_SCHED, &sd->backlog.state)) {
>  			if (!rps_ipi_queued(sd))
>  				____napi_schedule(sd, &sd->backlog);
>  		}
> +		if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +			local_irq_enable();
>  		goto enqueue;

I think you can re-jig this a little more - rps_ipi_queued() only returns
0 if sd is "local", so maybe we can call __napi_schedule_irqoff()
instead, which already has the if () for PREEMPT_RT?

Maybe moving the ____napi_schedule() into rps_ipi_queued() and
renaming it to napi_schedule_backlog() or napi_schedule_rps() 
would make the code easier to follow in that case?
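
Something like this, perhaps (rough, untested sketch; the name is just
the suggestion from above):

static void napi_schedule_rps(struct softnet_data *sd)
{
	struct softnet_data *mysd = this_cpu_ptr(&softnet_data);

#ifdef CONFIG_RPS
	if (sd != mysd) {
		/* what rps_ipi_queued() does today */
		sd->rps_ipi_next = mysd->rps_ipi_list;
		mysd->rps_ipi_list = sd;
		__raise_softirq_irqoff(NET_RX_SOFTIRQ);
		return;
	}
#endif
	/* local sd: __napi_schedule_irqoff() has the PREEMPT_RT if () */
	__napi_schedule_irqoff(&sd->backlog);
}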


* Re: [PATCH net-next 3/4] net: dev: Makes sure netif_rx() can be invoked in any context.
  2022-02-02 12:28 ` [PATCH net-next 3/4] net: dev: Makes sure netif_rx() can be invoked in any context Sebastian Andrzej Siewior
@ 2022-02-02 16:50   ` Jakub Kicinski
  2022-02-03 12:20     ` Sebastian Andrzej Siewior
  2022-02-02 17:43   ` Eric Dumazet
  1 sibling, 1 reply; 35+ messages in thread
From: Jakub Kicinski @ 2022-02-02 16:50 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Eric Dumazet, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner

On Wed,  2 Feb 2022 13:28:47 +0100 Sebastian Andrzej Siewior wrote:
> so that they can be removed once there are no users left.

Any plans for doing the cleanup? 


* Re: [PATCH net-next 1/4] net: dev: Remove the preempt_disable() in netif_rx_internal().
  2022-02-02 12:28 ` [PATCH net-next 1/4] net: dev: Remove the preempt_disable() in netif_rx_internal() Sebastian Andrzej Siewior
@ 2022-02-02 17:10   ` Eric Dumazet
  2022-02-03 12:00     ` Toke Høiland-Jørgensen
  2022-02-03 12:16     ` [PATCH net-next 1/4] net: dev: Remove the preempt_disable() " Sebastian Andrzej Siewior
  0 siblings, 2 replies; 35+ messages in thread
From: Eric Dumazet @ 2022-02-02 17:10 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner

On Wed, Feb 2, 2022 at 4:28 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> The preempt_disable() and rcu_read_lock() section was introduced in commit
>    bbbe211c295ff ("net: rcu lock and preempt disable missing around generic xdp")
>
> The backtrace shows that bottom halves were disabled and so the usage of
> smp_processor_id() would not trigger a warning.
> The "suspicious RCU usage" warning was triggered because
> rcu_dereference() was not used within an rcu_read_lock() section (only
> within rcu_read_lock_bh()). An rcu_read_lock() section is sufficient.
>
> Remove the preempt_disable() statement which is not needed.

I am confused by this changelog/analysis of yours.

According to git blame, you are reverting this patch.

commit cece1945bffcf1a823cdfa36669beae118419351
Author: Changli Gao <xiaosuo@gmail.com>
Date:   Sat Aug 7 20:35:43 2010 -0700

    net: disable preemption before call smp_processor_id()

    Although netif_rx() isn't expected to be called in process context with
    preemption enabled, it'd better handle this case. And this is why get_cpu()
    is used in the non-RPS #ifdef branch. If tree RCU is selected,
    rcu_read_lock() won't disable preemption, so preempt_disable() should be
    called explictly.

    Signed-off-by: Changli Gao <xiaosuo@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>


But I am not sure we can.

Here is the code in larger context:

#ifdef CONFIG_RPS
    if (static_branch_unlikely(&rps_needed)) {
        struct rps_dev_flow voidflow, *rflow = &voidflow;
        int cpu;

        preempt_disable();
        rcu_read_lock();

        cpu = get_rps_cpu(skb->dev, skb, &rflow);
        if (cpu < 0)
            cpu = smp_processor_id();

        ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);

        rcu_read_unlock();
        preempt_enable();
    } else
#endif

This code needs the preempt_disable().


>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  net/core/dev.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 1baab07820f65..325b70074f4ae 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4796,7 +4796,6 @@ static int netif_rx_internal(struct sk_buff *skb)
>                 struct rps_dev_flow voidflow, *rflow = &voidflow;
>                 int cpu;
>
> -               preempt_disable();
>                 rcu_read_lock();
>
>                 cpu = get_rps_cpu(skb->dev, skb, &rflow);
> @@ -4806,7 +4805,6 @@ static int netif_rx_internal(struct sk_buff *skb)
>                 ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
>
>                 rcu_read_unlock();
> -               preempt_enable();
>         } else
>  #endif
>         {
> --
> 2.34.1
>


* Re: [PATCH net-next 2/4] net: dev: Remove get_cpu() in netif_rx_internal().
  2022-02-02 12:28 ` [PATCH net-next 2/4] net: dev: Remove get_cpu() " Sebastian Andrzej Siewior
@ 2022-02-02 17:14   ` Eric Dumazet
  2022-02-03 12:14   ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 35+ messages in thread
From: Eric Dumazet @ 2022-02-02 17:14 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner

On Wed, Feb 2, 2022 at 4:28 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> The get_cpu() usage was added in commit
>     b0e28f1effd1d ("net: netif_rx() must disable preemption")
>
> because ip_dev_loopback_xmit() invoked netif_rx() with preemption
> enabled, causing a warning in smp_processor_id(). The function
> netif_rx() should only be invoked from an interrupt context, which
> implies disabled preemption. The commit
>    e30b38c298b55 ("ip: Fix ip_dev_loopback_xmit()")
>
> addressed this and replaced netif_rx() with netif_rx_ni() in
> ip_dev_loopback_xmit().
>
> Based on the discussion on the list, the former patch (b0e28f1effd1d)
> should not have been applied; only the latter (e30b38c298b55) was
> needed.
>
> Remove get_cpu() since the function is supposed to be invoked from a
> context with stable per-CPU pointers (ensured either by disabled
> preemption or disabled software interrupts).
>
> Link: https://lkml.kernel.org/r/20100415.013347.98375530.davem@davemloft.net
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---

Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks !


* Re: [PATCH net-next 3/4] net: dev: Makes sure netif_rx() can be invoked in any context.
  2022-02-02 12:28 ` [PATCH net-next 3/4] net: dev: Makes sure netif_rx() can be invoked in any context Sebastian Andrzej Siewior
  2022-02-02 16:50   ` Jakub Kicinski
@ 2022-02-02 17:43   ` Eric Dumazet
  2022-02-03 12:19     ` Toke Høiland-Jørgensen
  2022-02-03 15:10     ` Sebastian Andrzej Siewior
  1 sibling, 2 replies; 35+ messages in thread
From: Eric Dumazet @ 2022-02-02 17:43 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner

On Wed, Feb 2, 2022 at 4:28 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> Dave suggested a while ago (eleven years by now) "Let's make netif_rx()
> work in all contexts and get rid of netif_rx_ni()". Eric agreed and
> pointed out that modern devices should use netif_receive_skb() to avoid
> the overhead.
> In the meantime someone added another variant, netif_rx_any_context(),
> which behaves as suggested.
>
> netif_rx() must be invoked with disabled bottom halves to ensure that
> pending softirqs, which were raised within the function, are handled.
> netif_rx_ni() can be invoked only from process context (bottom halves
> must be enabled) because the function handles pending softirqs without
> checking if bottom halves were disabled or not.
> netif_rx_any_context() invokes one of the former functions, selected by
> checking in_interrupt().
>
> netif_rx() could be taught to handle both cases (disabled and enabled
> bottom halves) by simply disabling bottom halves while invoking
> netif_rx_internal(). The local_bh_enable() invocation will then invoke
> pending softirqs only if the BH-disable counter drops to zero.
>
> Add a local_bh_disable() section in netif_rx() to ensure softirqs are
> handled if needed. Make netif_rx_ni() and netif_rx_any_context() invoke
> netif_rx() so that they can be removed once there are no users left.
>
> Link: https://lkml.kernel.org/r/20100415.020246.218622820.davem@davemloft.net
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Maybe worth mentioning that this commit will show a negative impact for
network traffic over the loopback interface.

My measure of the cost of local_bh_disable()/local_bh_enable() is ~6
nsec on one of my lab x86 hosts.

Perhaps we could have a generic netif_rx(), and a __netif_rx() for the
virtual drivers (lo and maybe tunnels).

void __netif_rx(struct sk_buff *skb);

static inline int netif_rx(struct sk_buff *skb)
{
	int res;

	local_bh_disable();
	res = __netif_rx(skb);
	local_bh_enable();
	return res;
}


* Re: [PATCH net-next 0/4] net: dev: PREEMPT_RT fixups.
  2022-02-02 16:14 ` [PATCH net-next 0/4] net: dev: PREEMPT_RT fixups Jakub Kicinski
@ 2022-02-03 11:59   ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 35+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-02-03 11:59 UTC (permalink / raw)
  To: Jakub Kicinski, Sebastian Andrzej Siewior
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Eric Dumazet, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner

Jakub Kicinski <kuba@kernel.org> writes:

> On Wed,  2 Feb 2022 13:28:44 +0100 Sebastian Andrzej Siewior wrote:
>> Hi,
>> 
>> this series removes or replaces preempt_disable() and local_irq_save()
>> sections which are problematic on PREEMPT_RT.
>> Patch 3 makes netif_rx() work from any context after I found suggestions
>> for it in an old thread. Should that work, then the context-specific
>> variants could be removed.
>
> Let's CC Toke, lest it escapes his attention.

Thanks! I'll take a look :)

-Toke


* Re: [PATCH net-next 1/4] net: dev: Remove the preempt_disable() in netif_rx_internal().
  2022-02-02 17:10   ` Eric Dumazet
@ 2022-02-03 12:00     ` Toke Høiland-Jørgensen
  2022-02-03 12:17       ` Sebastian Andrzej Siewior
  2022-02-03 12:16     ` [PATCH net-next 1/4] net: dev: Remove the preempt_disable() " Sebastian Andrzej Siewior
  1 sibling, 1 reply; 35+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-02-03 12:00 UTC (permalink / raw)
  To: Eric Dumazet, Sebastian Andrzej Siewior
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner

Eric Dumazet <edumazet@google.com> writes:

> On Wed, Feb 2, 2022 at 4:28 AM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
>>
>> The preempt_disable() and rcu_read_lock() section was introduced in commit
>>    bbbe211c295ff ("net: rcu lock and preempt disable missing around generic xdp")
>>
>> The backtrace shows that bottom halves were disabled and so the usage of
>> smp_processor_id() would not trigger a warning.
>> The "suspicious RCU usage" warning was triggered because
>> rcu_dereference() was not used within an rcu_read_lock() section (only
>> within rcu_read_lock_bh()). An rcu_read_lock() section is sufficient.
>>
>> Remove the preempt_disable() statement which is not needed.
>
> I am confused by this changelog/analysis of yours.
>
> According to git blame, you are reverting this patch.
>
> commit cece1945bffcf1a823cdfa36669beae118419351
> Author: Changli Gao <xiaosuo@gmail.com>
> Date:   Sat Aug 7 20:35:43 2010 -0700
>
>     net: disable preemption before call smp_processor_id()
>
>     Although netif_rx() isn't expected to be called in process context with
>     preemption enabled, it'd better handle this case. And this is why get_cpu()
>     is used in the non-RPS #ifdef branch. If tree RCU is selected,
>     rcu_read_lock() won't disable preemption, so preempt_disable() should be
>     called explictly.
>
>     Signed-off-by: Changli Gao <xiaosuo@gmail.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>
>
> But I am not sure we can.
>
> Here is the code in larger context:
>
> #ifdef CONFIG_RPS
>     if (static_branch_unlikely(&rps_needed)) {
>         struct rps_dev_flow voidflow, *rflow = &voidflow;
>         int cpu;
>
>         preempt_disable();
>         rcu_read_lock();
>
>         cpu = get_rps_cpu(skb->dev, skb, &rflow);
>         if (cpu < 0)
>             cpu = smp_processor_id();
>
>         ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
>
>         rcu_read_unlock();
>         preempt_enable();
>     } else
> #endif
>
> This code needs the preempt_disable().

This is mostly so that the CPU ID stays the same throughout that section
of code, though, right? So wouldn't it work to replace the
preempt_disable() with a migrate_disable()? That should keep _RT happy,
no?
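
I.e., something like this (untested sketch):

	migrate_disable();	/* stay on this CPU, but stay preemptible */
	rcu_read_lock();

	cpu = get_rps_cpu(skb->dev, skb, &rflow);
	if (cpu < 0)
		cpu = smp_processor_id();

	ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);

	rcu_read_unlock();
	migrate_enable();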

-Toke


* Re: [PATCH net-next 2/4] net: dev: Remove get_cpu() in netif_rx_internal().
  2022-02-02 12:28 ` [PATCH net-next 2/4] net: dev: Remove get_cpu() " Sebastian Andrzej Siewior
  2022-02-02 17:14   ` Eric Dumazet
@ 2022-02-03 12:14   ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 35+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-02-03 12:14 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, bpf, netdev
  Cc: David S. Miller, Alexei Starovoitov, Daniel Borkmann,
	Eric Dumazet, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner, Sebastian Andrzej Siewior

Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> The get_cpu() usage was added in commit
>     b0e28f1effd1d ("net: netif_rx() must disable preemption")
>
> because ip_dev_loopback_xmit() invoked netif_rx() with preemption
> enabled, causing a warning in smp_processor_id(). The function
> netif_rx() should only be invoked from an interrupt context, which
> implies disabled preemption. The commit
>    e30b38c298b55 ("ip: Fix ip_dev_loopback_xmit()")
>
> addressed this and replaced netif_rx() with netif_rx_ni() in
> ip_dev_loopback_xmit().
>
> Based on the discussion on the list, the former patch (b0e28f1effd1d)
> should not have been applied; only the latter (e30b38c298b55) was
> needed.
>
> Remove get_cpu() since the function is supposed to be invoked from a
> context with stable per-CPU pointers (ensured either by disabled
> preemption or disabled software interrupts).
>
> Link: https://lkml.kernel.org/r/20100415.013347.98375530.davem@davemloft.net
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>



* Re: [PATCH net-next 1/4] net: dev: Remove the preempt_disable() in netif_rx_internal().
  2022-02-02 17:10   ` Eric Dumazet
  2022-02-03 12:00     ` Toke Høiland-Jørgensen
@ 2022-02-03 12:16     ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-03 12:16 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner

On 2022-02-02 09:10:10 [-0800], Eric Dumazet wrote:
> On Wed, Feb 2, 2022 at 4:28 AM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
> >
> > The preempt_disable() and rcu_read_lock() section was introduced in commit
> >    bbbe211c295ff ("net: rcu lock and preempt disable missing around generic xdp")
> >
> > The backtrace shows that bottom halves were disabled and so the usage of
> > smp_processor_id() would not trigger a warning.
> > The "suspicious RCU usage" warning was triggered because
> > rcu_dereference() was not used within an rcu_read_lock() section (only
> > within rcu_read_lock_bh()). An rcu_read_lock() section is sufficient.
> >
> > Remove the preempt_disable() statement which is not needed.
> 
> I am confused by this changelog/analysis of yours.
> 
> According to git blame, you are reverting this patch.
> 
> commit cece1945bffcf1a823cdfa36669beae118419351
> Author: Changli Gao <xiaosuo@gmail.com>
> Date:   Sat Aug 7 20:35:43 2010 -0700
> 
>     net: disable preemption before call smp_processor_id()
> 
>     Although netif_rx() isn't expected to be called in process context with
>     preemption enabled, it'd better handle this case. And this is why get_cpu()
>     is used in the non-RPS #ifdef branch. If tree RCU is selected,
>     rcu_read_lock() won't disable preemption, so preempt_disable() should be
>     called explictly.
> 
>     Signed-off-by: Changli Gao <xiaosuo@gmail.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>

Not sure if I ignored it or made a wrong turn somewhere, but I remember
reading it. Here, preempt_disable() was added because
| Although netif_rx() isn't expected to be called in process context with
| preemption enabled, it'd better handle this case.

and this isn't much of a good reason, simply because netif_rx()
shouldn't be called from preemptible context.

> But I am not sure we can.
> 
> Here is the code in larger context:
> 
> #ifdef CONFIG_RPS
>     if (static_branch_unlikely(&rps_needed)) {
>         struct rps_dev_flow voidflow, *rflow = &voidflow;
>         int cpu;
> 
>         preempt_disable();
>         rcu_read_lock();
> 
>         cpu = get_rps_cpu(skb->dev, skb, &rflow);
>         if (cpu < 0)
>             cpu = smp_processor_id();
> 
>         ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
> 
>         rcu_read_unlock();
>         preempt_enable();
>     } else
> #endif
> 
> This code needs the preempt_disable().

But why? netif_rx_internal() should be invoked with BH disabled, so I
don't see a reason why preemption additionally needs to be disabled in
this section.
On PREEMPT_RT we can get preempted, but the task remains on the CPU and
other network activity will block on the BH-lock.

Sebastian


* Re: [PATCH net-next 1/4] net: dev: Remove the preempt_disable() in netif_rx_internal().
  2022-02-03 12:00     ` Toke Høiland-Jørgensen
@ 2022-02-03 12:17       ` Sebastian Andrzej Siewior
  2022-02-03 12:41         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-03 12:17 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Eric Dumazet, bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner

On 2022-02-03 13:00:06 [+0100], Toke Høiland-Jørgensen wrote:
> > Here is the code in larger context:
> >
> > #ifdef CONFIG_RPS
> >     if (static_branch_unlikely(&rps_needed)) {
> >         struct rps_dev_flow voidflow, *rflow = &voidflow;
> >         int cpu;
> >
> >         preempt_disable();
> >         rcu_read_lock();
> >
> >         cpu = get_rps_cpu(skb->dev, skb, &rflow);
> >         if (cpu < 0)
> >             cpu = smp_processor_id();
> >
> >         ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
> >
> >         rcu_read_unlock();
> >         preempt_enable();
> >     } else
> > #endif
> >
> > This code needs the preempt_disable().
> 
> This is mostly so that the CPU ID stays the same throughout that section
> of code, though, right? So wouldn't it work to replace the
> preempt_disable() with a migrate_disable()? That should keep _RT happy,
> no?

It would but as mentioned previously: BH is disabled and
smp_processor_id() is stable.

> -Toke

Sebastian


* Re: [PATCH net-next 3/4] net: dev: Makes sure netif_rx() can be invoked in any context.
  2022-02-02 17:43   ` Eric Dumazet
@ 2022-02-03 12:19     ` Toke Høiland-Jørgensen
  2022-02-03 15:10     ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 35+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-02-03 12:19 UTC (permalink / raw)
  To: Eric Dumazet, Sebastian Andrzej Siewior
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner

Eric Dumazet <edumazet@google.com> writes:

> On Wed, Feb 2, 2022 at 4:28 AM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
>>
>> Dave suggested a while ago (eleven years by now) "Let's make netif_rx()
>> work in all contexts and get rid of netif_rx_ni()". Eric agreed and
>> pointed out that modern devices should use netif_receive_skb() to avoid
>> the overhead.
>> In the meantime someone added another variant, netif_rx_any_context(),
>> which behaves as suggested.
>>
>> netif_rx() must be invoked with disabled bottom halves to ensure that
>> pending softirqs, which were raised within the function, are handled.
>> netif_rx_ni() can be invoked only from process context (bottom halves
>> must be enabled) because the function handles pending softirqs without
>> checking if bottom halves were disabled or not.
>> netif_rx_any_context() invokes one of the former functions, selected by
>> checking in_interrupt().
>>
>> netif_rx() could be taught to handle both cases (disabled and enabled
>> bottom halves) by simply disabling bottom halves while invoking
>> netif_rx_internal(). The local_bh_enable() invocation will then invoke
>> pending softirqs only if the BH-disable counter drops to zero.
>>
>> Add a local_bh_disable() section in netif_rx() to ensure softirqs are
>> handled if needed. Make netif_rx_ni() and netif_rx_any_context() invoke
>> netif_rx() so that they can be removed once there are no users left.
>>
>> Link: https://lkml.kernel.org/r/20100415.020246.218622820.davem@davemloft.net
>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
> Maybe worth mentioning that this commit will show a negative impact for
> network traffic over the loopback interface.
>
> My measure of the cost of local_bh_disable()/local_bh_enable() is ~6
> nsec on one of my lab x86 hosts.
>
> Perhaps we could have a generic netif_rx(), and a __netif_rx() for the
> virtual drivers (lo and maybe tunnels).
>
> void __netif_rx(struct sk_buff *skb);
>
> static inline int netif_rx(struct sk_buff *skb)
> {
>	int res;
>
>	local_bh_disable();
>	res = __netif_rx(skb);
>	local_bh_enable();
>	return res;
> }

+1, this seems like a reasonable solution!

-Toke


* Re: [PATCH net-next 3/4] net: dev: Makes sure netif_rx() can be invoked in any context.
  2022-02-02 16:50   ` Jakub Kicinski
@ 2022-02-03 12:20     ` Sebastian Andrzej Siewior
  2022-02-03 19:38       ` Jakub Kicinski
  0 siblings, 1 reply; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-03 12:20 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Eric Dumazet, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner

On 2022-02-02 08:50:04 [-0800], Jakub Kicinski wrote:
> On Wed,  2 Feb 2022 13:28:47 +0100 Sebastian Andrzej Siewior wrote:
> > so that they can be removed once there are no users left.
> 
> Any plans for doing the cleanup? 

Sure. If this is not rejected I can go and hunt netif_rx_ni() and
netif_rx_any_context().

Sebastian


* Re: [PATCH net-next 1/4] net: dev: Remove the preempt_disable() in netif_rx_internal().
  2022-02-03 12:17       ` Sebastian Andrzej Siewior
@ 2022-02-03 12:41         ` Toke Høiland-Jørgensen
  2022-02-03 15:50           ` Sebastian Andrzej Siewior
  2022-02-04 15:20           ` [PATCH net-next v2 1/4] net: dev: Remove preempt_disable() and get_cpu() " Sebastian Andrzej Siewior
  0 siblings, 2 replies; 35+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-02-03 12:41 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Eric Dumazet, bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner

Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2022-02-03 13:00:06 [+0100], Toke Høiland-Jørgensen wrote:
>> > Here is the code in larger context:
>> >
>> > #ifdef CONFIG_RPS
>> >     if (static_branch_unlikely(&rps_needed)) {
>> >         struct rps_dev_flow voidflow, *rflow = &voidflow;
>> >         int cpu;
>> >
>> >         preempt_disable();
>> >         rcu_read_lock();
>> >
>> >         cpu = get_rps_cpu(skb->dev, skb, &rflow);
>> >         if (cpu < 0)
>> >             cpu = smp_processor_id();
>> >
>> >         ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
>> >
>> >         rcu_read_unlock();
>> >         preempt_enable();
>> >     } else
>> > #endif
>> >
>> > This code needs the preempt_disable().
>> 
>> This is mostly so that the CPU ID stays the same throughout that section
>> of code, though, right? So wouldn't it work to replace the
>> preempt_disable() with a migrate_disable()? That should keep _RT happy,
>> no?
>
> It would but as mentioned previously: BH is disabled and
> smp_processor_id() is stable.

Ah, right, because of the change in loopback to use netif_rx_ni()? But
that bit of the analysis only comes later in your series, so at the very
least you should be explaining this in the commit message here. Or you
could potentially squash patches 1 and 2 and do both changes at once,
since it's changing two bits of the same function and both need the same
analysis...

However, if we're going with Eric's suggestion of an internal
__netif_rx() for loopback that *doesn't* do local_bh_disable() then this
code would end up being called without BH disable, so we'd need the
migrate_disable() anyway, no?

-Toke


* Re: [PATCH net-next 3/4] net: dev: Makes sure netif_rx() can be invoked in any context.
  2022-02-02 17:43   ` Eric Dumazet
  2022-02-03 12:19     ` Toke Høiland-Jørgensen
@ 2022-02-03 15:10     ` Sebastian Andrzej Siewior
  2022-02-03 15:25       ` Eric Dumazet
  1 sibling, 1 reply; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-03 15:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner

On 2022-02-02 09:43:14 [-0800], Eric Dumazet wrote:
> Maybe worth mentioning that this commit will show a negative impact for
> network traffic over the loopback interface.
> 
> My measure of the cost of local_bh_disable()/local_bh_enable() is ~6
> nsec on one of my lab x86 hosts.

So you are worried that 
    dev_loopback_xmit() -> netif_rx_ni()

becomes
    dev_loopback_xmit() -> netif_rx()

and thereby 6 nsec slower because of the bh off/on? Can these 6 nsec get
a little lower if we subtract the overhead of preempt-off/on?
But maybe I picked the wrong loopback here.

> Perhaps we could have a generic netif_rx(), and a __netif_rx() for the
> virtual drivers (lo and maybe tunnels).
> 
> void __netif_rx(struct sk_buff *skb);
> 
> static inline int netif_rx(struct sk_buff *skb)
> {
>	int res;
>
>	local_bh_disable();
>	res = __netif_rx(skb);
>	local_bh_enable();
>	return res;
> }

But what is __netif_rx() doing? netif_rx_ni() has this part:

|       preempt_disable();
|       err = netif_rx_internal(skb);
|       if (local_softirq_pending())
|               do_softirq();
|       preempt_enable();

to ensure that smp_processor_id() and friends are quiet and any raised
softirqs are processed. With the current netif_rx() we end up with:

|       local_bh_disable();
|       ret = netif_rx_internal(skb);
|       local_bh_enable();

which provides the same. Assuming __netif_rx() as:

| int __netif_rx(skb)
| {
|         trace_netif_rx_entry(skb);
| 
|         ret = netif_rx_internal(skb);
|         trace_netif_rx_exit(ret);
| 
|         return ret;
| }

and the loopback interface is not invoking this from in_interrupt() context.

Sebastian


* Re: [PATCH net-next 3/4] net: dev: Makes sure netif_rx() can be invoked in any context.
  2022-02-03 15:10     ` Sebastian Andrzej Siewior
@ 2022-02-03 15:25       ` Eric Dumazet
  2022-02-03 15:40         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 35+ messages in thread
From: Eric Dumazet @ 2022-02-03 15:25 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner

On Thu, Feb 3, 2022 at 7:10 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2022-02-02 09:43:14 [-0800], Eric Dumazet wrote:
> > Maybe worth mentioning that this commit will show a negative impact for
> > network traffic over the loopback interface.
> >
> > My measure of the cost of local_bh_disable()/local_bh_enable() is ~6
> > nsec on one of my lab x86 hosts.
>
> So you are worried that
>     dev_loopback_xmit() -> netif_rx_ni()
>
> becomes
>     dev_loopback_xmit() -> netif_rx()


No, the loopback device (ifconfig lo) I am referring to is in
drivers/net/loopback.c

loopback_xmit() calls netif_rx() directly, while bh are already disabled.

>
> and thereby 6 nsec slower because of the bh off/on? Can these 6 nsec get
> a little lower if we subtract the overhead of preempt-off/on?
> But maybe I picked the wrong loopback here.
>
> > Perhaps we could have a generic netif_rx(), and a __netif_rx() for the
> > virtual drivers (lo and maybe tunnels).
> >
> > void __netif_rx(struct sk_buff *skb);
> >
> > static inline int netif_rx(struct sk_buff *skb)
> > {
> >	int res;
> >
> >	local_bh_disable();
> >	res = __netif_rx(skb);
> >	local_bh_enable();
> >	return res;
> > }
>
> But what is __netif_rx() doing? netif_rx_ni() has this part:
>
> |       preempt_disable();
> |       err = netif_rx_internal(skb);
> |       if (local_softirq_pending())
> |               do_softirq();
> |       preempt_enable();
>
> to ensure that smp_processor_id() and friends are quiet and any raised
> softirqs are processed. With the current netif_rx() we end up with:
>
> |       local_bh_disable();
> |       ret = netif_rx_internal(skb);
> |       local_bh_enable();
>
> which provides the same. Assuming __netif_rx() as:
>
> | int __netif_rx(skb)
> | {
> |         trace_netif_rx_entry(skb);
> |
> |         ret = netif_rx_internal(skb);
> |         trace_netif_rx_exit(ret);
> |
> |         return ret;
> | }
>
> and the loopback interface is not invoking this from in_interrupt() context.
>
> Sebastian

Instead of adding a local_bh_disable()/local_bh_enable() in netif_rx(),
I suggested renaming the current netif_rx() to __netif_rx() and adding
a wrapper, e.g.:

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e490b84732d1654bf067b30f2bb0b0825f88dea9..39232d99995cbd54c74e85905bb4af43b5b301ca 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3668,7 +3668,17 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp,
                             struct bpf_prog *xdp_prog);
 void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog);
 int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb);
-int netif_rx(struct sk_buff *skb);
+int __netif_rx(struct sk_buff *skb);
+static inline int netif_rx(struct sk_buff *skb)
+{
+       int res;
+
+       local_bh_disable();
+       res = __netif_rx(skb);
+       local_bh_enable();
+       return res;
+}
+
 int netif_rx_ni(struct sk_buff *skb);
 int netif_rx_any_context(struct sk_buff *skb);
 int netif_receive_skb(struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 1baab07820f65f9bcf88a6d73e2c9ff741d33c18..f962e549e0bfea96cdba5bc7e1d8694e46652eac 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4819,7 +4819,7 @@ static int netif_rx_internal(struct sk_buff *skb)
 }

 /**
- *     netif_rx        -       post buffer to the network code
+ *     __netif_rx      -       post buffer to the network code
  *     @skb: buffer to post
  *
  *     This function receives a packet from a device driver and queues it for
@@ -4833,7 +4833,7 @@ static int netif_rx_internal(struct sk_buff *skb)
  *
  */

-int netif_rx(struct sk_buff *skb)
+int __netif_rx(struct sk_buff *skb)
 {
        int ret;

@@ -4844,7 +4844,7 @@ int netif_rx(struct sk_buff *skb)

        return ret;
 }
-EXPORT_SYMBOL(netif_rx);
+EXPORT_SYMBOL(__netif_rx);

 int netif_rx_ni(struct sk_buff *skb)
 {


* Re: [PATCH net-next 3/4] net: dev: Makes sure netif_rx() can be invoked in any context.
  2022-02-03 15:25       ` Eric Dumazet
@ 2022-02-03 15:40         ` Sebastian Andrzej Siewior
  2022-02-03 16:18           ` Eric Dumazet
  0 siblings, 1 reply; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-03 15:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner, Peter Zijlstra

On 2022-02-03 07:25:01 [-0800], Eric Dumazet wrote:
> 
> No, the loopback device (ifconfig log) I am referring to is in
> drivers/net/loopback.c
> 
> loopback_xmit() calls netif_rx() directly, while bh are already disabled.

ah okay. Makes sense.

> Instead of adding a local_bh_disable()/local_bh_enable() in netif_rx()
> I suggested
> to rename current netif_rx() to __netif_rx() and add a wrapper, eg :

So we still end up with two interfaces. Do I move a few callers like the
one you already mentioned over to the __netif_rx() interface or will it
be the one previously mentioned for now?

Would something like 

diff --git a/include/linux/bottom_half.h b/include/linux/bottom_half.h
index fc53e0ad56d90..561cbca431ca6 100644
--- a/include/linux/bottom_half.h
+++ b/include/linux/bottom_half.h
@@ -30,7 +30,12 @@ static inline void local_bh_enable_ip(unsigned long ip)
 
 static inline void local_bh_enable(void)
 {
-	__local_bh_enable_ip(_THIS_IP_, SOFTIRQ_DISABLE_OFFSET);
+	if (unlikely(softirq_count() == SOFTIRQ_DISABLE_OFFSET)) {
+		__local_bh_enable_ip(_THIS_IP_, SOFTIRQ_DISABLE_OFFSET);
+	} else {
+		preempt_count_sub(SOFTIRQ_DISABLE_OFFSET);
+		barrier();
+	}
 }
 
 #ifdef CONFIG_PREEMPT_RT

lower the overhead to an acceptable range? (I still need to sell this
to peterz first.)

Sebastian


* Re: [PATCH net-next 1/4] net: dev: Remove the preempt_disable() in netif_rx_internal().
  2022-02-03 12:41         ` Toke Høiland-Jørgensen
@ 2022-02-03 15:50           ` Sebastian Andrzej Siewior
  2022-02-04 15:20           ` [PATCH net-next v2 1/4] net: dev: Remove preempt_disable() and get_cpu() " Sebastian Andrzej Siewior
  1 sibling, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-03 15:50 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Eric Dumazet, bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner

On 2022-02-03 13:41:13 [+0100], Toke Høiland-Jørgensen wrote:
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
> 
> > It would but as mentioned previously: BH is disabled and
> > smp_processor_id() is stable.
> 
> Ah, right, because of the change in loopback to use netif_rx_ni()? But
> that bit of the analysis only comes later in your series, so at the very
> least you should be explaining this in the commit message here. Or you
> could potentially squash patches 1 and 2 and do both changes at once,
> since it's changing two bits of the same function and both need the same
> analysis...
> 
> However, if we're going with Eric's suggestion of an internal
> __netif_rx() for loopback that *doesn't* do local_bh_disable() then this
> code would end up being called without BH disable, so we'd need the
> migrate_disable() anyway, no?

Eric suggested the __netif_rx() for loopback, which is already in a
BH-disabled section. So if that is the only "allowed" caller, we
wouldn't have to worry.
If __netif_rx() gains more users and one calls it from preemptible
context, then we have a problem (like netif_rx() vs netif_rx_ni()).

migrate_disable() will shut up smp_processor_id(), yes, but we need
something to process pending softirqs. Otherwise they are delayed until
the next IRQ, spin_unlock_bh(), etc.
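
That is (sketch):

	migrate_disable();
	ret = netif_rx_internal(skb);	/* may raise NET_RX_SOFTIRQ */
	migrate_enable();		/* the pending softirq is NOT run
					   here; it waits for the next IRQ
					   exit or a spin_unlock_bh() on
					   this CPU */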

> -Toke

Sebastian


* Re: [PATCH net-next 3/4] net: dev: Makes sure netif_rx() can be invoked in any context.
  2022-02-03 15:40         ` Sebastian Andrzej Siewior
@ 2022-02-03 16:18           ` Eric Dumazet
  2022-02-03 16:44             ` Sebastian Andrzej Siewior
  2022-02-04 13:00             ` [PATCH net-next v2 " Sebastian Andrzej Siewior
  0 siblings, 2 replies; 35+ messages in thread
From: Eric Dumazet @ 2022-02-03 16:18 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner, Peter Zijlstra

On Thu, Feb 3, 2022 at 7:40 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2022-02-03 07:25:01 [-0800], Eric Dumazet wrote:
> >
> > No, the loopback device (ifconfig log) I am referring to is in
> > drivers/net/loopback.c
> >
> > loopback_xmit() calls netif_rx() directly, while bh are already disabled.
>
> ah okay. Makes sense.
>
> > Instead of adding a local_bh_disable()/local_bh_enable() in netif_rx()
> > I suggested
> > to rename current netif_rx() to __netif_rx() and add a wrapper, eg :
>
> So we still end up with two interfaces. Do I move a few callers like the
> one you already mentioned over to the __netif_rx() interface, or will it
> be only that one for now?


I would say the vast majority of drivers would use netif_rx()

Only the one we consider critical (loopback traffic) would use
__netif_rx(), after careful inspection.

As we said, modern/high-performance NICs are using NAPI and GRO these days.

Only virtual drivers might still use legacy netif_rx() and be in critical paths.
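
Roughly, the suggested split would look like this (a sketch only,
without the tracing hooks; the v2 patch later in this thread is the
authoritative version):

/* old netif_rx() body, to be called with BHs already disabled */
int __netif_rx(struct sk_buff *skb)
{
	return netif_rx_internal(skb);
}

/* wrapper for everybody else, safe in any context */
int netif_rx(struct sk_buff *skb)
{
	int ret;

	local_bh_disable();
	ret = __netif_rx(skb);
	local_bh_enable();
	return ret;
}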

>
> Would something like
>
> diff --git a/include/linux/bottom_half.h b/include/linux/bottom_half.h
> index fc53e0ad56d90..561cbca431ca6 100644
> --- a/include/linux/bottom_half.h
> +++ b/include/linux/bottom_half.h
> @@ -30,7 +30,12 @@ static inline void local_bh_enable_ip(unsigned long ip)
>
>  static inline void local_bh_enable(void)
>  {
> -       __local_bh_enable_ip(_THIS_IP_, SOFTIRQ_DISABLE_OFFSET);
> +       if (unlikely(softirq_count() == SOFTIRQ_DISABLE_OFFSET)) {
> +               __local_bh_enable_ip(_THIS_IP_, SOFTIRQ_DISABLE_OFFSET);
> +       } else {
> +               preempt_count_sub(SOFTIRQ_DISABLE_OFFSET);
> +               barrier();
> +       }
>  }
>
>  #ifdef CONFIG_PREEMPT_RT
>
> lower the overhead to an acceptable range? (I still need to sell this to
> peterz first).

I guess the cost of the local_bh_enable()/local_bh_disable() pair
will be roughly the same; please measure it :)

>
> Sebastian

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH net-next v2 4/4] net: dev: Make rps_lock() disable interrupts.
  2022-02-02 16:47   ` Jakub Kicinski
@ 2022-02-03 16:41     ` Sebastian Andrzej Siewior
  2022-02-03 19:39       ` Jakub Kicinski
  0 siblings, 1 reply; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-03 16:41 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Eric Dumazet, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner

Disabling interrupts and, in the RPS case, locking input_pkt_queue is
split into local_irq_disable() and an optional spin_lock().

This breaks on PREEMPT_RT because the spinlock_t typed lock can not be
acquired with disabled interrupts.
The sections in which the lock is acquired are usually short, in the
sense that they do not cause long and unbounded latencies. One exception
is the skb_flow_limit() invocation, which may invoke a BPF program (and
may require sleeping locks).

By moving local_irq_disable() + spin_lock() into rps_lock(), we can keep
interrupts disabled on !PREEMPT_RT and enabled on PREEMPT_RT kernels.
Without RPS on a PREEMPT_RT kernel, the needed synchronisation happens
as part of local_bh_disable() on the local CPU.
____napi_schedule() is only invoked if sd belongs to the local CPU.
Replace it with __napi_schedule_irqoff(), which already disables
interrupts on PREEMPT_RT as needed. Move this call to rps_ipi_queued()
and rename the function to napi_schedule_rps() as suggested by Jakub.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
On 2022-02-02 08:47:35 [-0800], Jakub Kicinski wrote:
> 
> I think you can re-jig this a little more - rps_ipi_queued() only returns
> 0 if sd is "local", so maybe we can call __napi_schedule_irqoff()
> instead which already has the if () for PREEMPT_RT?
> 
> Maybe moving the ____napi_schedule() into rps_ipi_queued() and
> renaming it to napi_schedule_backlog() or napi_schedule_rps() 
> would make the code easier to follow in that case?

Something like this then?

 net/core/dev.c | 76 ++++++++++++++++++++++++++++----------------------
 1 file changed, 42 insertions(+), 34 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index f43d0580fa11d..18f9941287c2e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -216,18 +216,38 @@ static inline struct hlist_head *dev_index_hash(struct net *net, int ifindex)
 	return &net->dev_index_head[ifindex & (NETDEV_HASHENTRIES - 1)];
 }
 
-static inline void rps_lock(struct softnet_data *sd)
+static inline void rps_lock_irqsave(struct softnet_data *sd,
+				    unsigned long *flags)
 {
-#ifdef CONFIG_RPS
-	spin_lock(&sd->input_pkt_queue.lock);
-#endif
+	if (IS_ENABLED(CONFIG_RPS))
+		spin_lock_irqsave(&sd->input_pkt_queue.lock, *flags);
+	else if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_save(*flags);
 }
 
-static inline void rps_unlock(struct softnet_data *sd)
+static inline void rps_lock_irq_disable(struct softnet_data *sd)
 {
-#ifdef CONFIG_RPS
-	spin_unlock(&sd->input_pkt_queue.lock);
-#endif
+	if (IS_ENABLED(CONFIG_RPS))
+		spin_lock_irq(&sd->input_pkt_queue.lock);
+	else if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_disable();
+}
+
+static inline void rps_unlock_irq_restore(struct softnet_data *sd,
+					  unsigned long *flags)
+{
+	if (IS_ENABLED(CONFIG_RPS))
+		spin_unlock_irqrestore(&sd->input_pkt_queue.lock, *flags);
+	else if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_restore(*flags);
+}
+
+static inline void rps_unlock_irq_enable(struct softnet_data *sd)
+{
+	if (IS_ENABLED(CONFIG_RPS))
+		spin_unlock_irq(&sd->input_pkt_queue.lock);
+	else if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_enable();
 }
 
 static struct netdev_name_node *netdev_name_node_alloc(struct net_device *dev,
@@ -4456,11 +4476,11 @@ static void rps_trigger_softirq(void *data)
  * If yes, queue it to our IPI list and return 1
  * If no, return 0
  */
-static int rps_ipi_queued(struct softnet_data *sd)
+static int napi_schedule_rps(struct softnet_data *sd)
 {
-#ifdef CONFIG_RPS
 	struct softnet_data *mysd = this_cpu_ptr(&softnet_data);
 
+#ifdef CONFIG_RPS
 	if (sd != mysd) {
 		sd->rps_ipi_next = mysd->rps_ipi_list;
 		mysd->rps_ipi_list = sd;
@@ -4469,6 +4489,7 @@ static int rps_ipi_queued(struct softnet_data *sd)
 		return 1;
 	}
 #endif /* CONFIG_RPS */
+	__napi_schedule_irqoff(&mysd->backlog);
 	return 0;
 }
 
@@ -4525,9 +4546,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
 
 	sd = &per_cpu(softnet_data, cpu);
 
-	local_irq_save(flags);
-
-	rps_lock(sd);
+	rps_lock_irqsave(sd, &flags);
 	if (!netif_running(skb->dev))
 		goto drop;
 	qlen = skb_queue_len(&sd->input_pkt_queue);
@@ -4536,26 +4555,21 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
 enqueue:
 			__skb_queue_tail(&sd->input_pkt_queue, skb);
 			input_queue_tail_incr_save(sd, qtail);
-			rps_unlock(sd);
-			local_irq_restore(flags);
+			rps_unlock_irq_restore(sd, &flags);
 			return NET_RX_SUCCESS;
 		}
 
 		/* Schedule NAPI for backlog device
 		 * We can use non atomic operation since we own the queue lock
 		 */
-		if (!__test_and_set_bit(NAPI_STATE_SCHED, &sd->backlog.state)) {
-			if (!rps_ipi_queued(sd))
-				____napi_schedule(sd, &sd->backlog);
-		}
+		if (!__test_and_set_bit(NAPI_STATE_SCHED, &sd->backlog.state))
+			napi_schedule_rps(sd);
 		goto enqueue;
 	}
 
 drop:
 	sd->dropped++;
-	rps_unlock(sd);
-
-	local_irq_restore(flags);
+	rps_unlock_irq_restore(sd, &flags);
 
 	atomic_long_inc(&skb->dev->rx_dropped);
 	kfree_skb(skb);
@@ -5617,8 +5631,7 @@ static void flush_backlog(struct work_struct *work)
 	local_bh_disable();
 	sd = this_cpu_ptr(&softnet_data);
 
-	local_irq_disable();
-	rps_lock(sd);
+	rps_lock_irq_disable(sd);
 	skb_queue_walk_safe(&sd->input_pkt_queue, skb, tmp) {
 		if (skb->dev->reg_state == NETREG_UNREGISTERING) {
 			__skb_unlink(skb, &sd->input_pkt_queue);
@@ -5626,8 +5639,7 @@ static void flush_backlog(struct work_struct *work)
 			input_queue_head_incr(sd);
 		}
 	}
-	rps_unlock(sd);
-	local_irq_enable();
+	rps_unlock_irq_enable(sd);
 
 	skb_queue_walk_safe(&sd->process_queue, skb, tmp) {
 		if (skb->dev->reg_state == NETREG_UNREGISTERING) {
@@ -5645,16 +5657,14 @@ static bool flush_required(int cpu)
 	struct softnet_data *sd = &per_cpu(softnet_data, cpu);
 	bool do_flush;
 
-	local_irq_disable();
-	rps_lock(sd);
+	rps_lock_irq_disable(sd);
 
 	/* as insertion into process_queue happens with the rps lock held,
 	 * process_queue access may race only with dequeue
 	 */
 	do_flush = !skb_queue_empty(&sd->input_pkt_queue) ||
 		   !skb_queue_empty_lockless(&sd->process_queue);
-	rps_unlock(sd);
-	local_irq_enable();
+	rps_unlock_irq_enable(sd);
 
 	return do_flush;
 #endif
@@ -5769,8 +5779,7 @@ static int process_backlog(struct napi_struct *napi, int quota)
 
 		}
 
-		local_irq_disable();
-		rps_lock(sd);
+		rps_lock_irq_disable(sd);
 		if (skb_queue_empty(&sd->input_pkt_queue)) {
 			/*
 			 * Inline a custom version of __napi_complete().
@@ -5786,8 +5795,7 @@ static int process_backlog(struct napi_struct *napi, int quota)
 			skb_queue_splice_tail_init(&sd->input_pkt_queue,
 						   &sd->process_queue);
 		}
-		rps_unlock(sd);
-		local_irq_enable();
+		rps_unlock_irq_enable(sd);
 	}
 
 	return work;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 3/4] net: dev: Makes sure netif_rx() can be invoked in any context.
  2022-02-03 16:18           ` Eric Dumazet
@ 2022-02-03 16:44             ` Sebastian Andrzej Siewior
  2022-02-03 17:45               ` Sebastian Andrzej Siewior
  2022-02-04 13:00             ` [PATCH net-next v2 " Sebastian Andrzej Siewior
  1 sibling, 1 reply; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-03 16:44 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner, Peter Zijlstra

On 2022-02-03 08:18:34 [-0800], Eric Dumazet wrote:
> > So we still end up with two interfaces. Do I move a few callers like the
> > one you already mentioned over to the __netif_rx() interface, or will it
> > be only that one for now?
> 
> 
> I would say the vast majority of drivers would use netif_rx()
> 
> Only the one we consider critical (loopback traffic) would use
> __netif_rx(), after careful inspection.
> 
> As we said, modern/high-performance NICs are using NAPI and GRO these days.
> 
> Only virtual drivers might still use legacy netif_rx() and be in critical paths.

Let me then add something to the documentation so it becomes obvious.

> >  static inline void local_bh_enable(void)
> >  {
> > -       __local_bh_enable_ip(_THIS_IP_, SOFTIRQ_DISABLE_OFFSET);
> > +       if (unlikely(softirq_count() == SOFTIRQ_DISABLE_OFFSET)) {
> > +               __local_bh_enable_ip(_THIS_IP_, SOFTIRQ_DISABLE_OFFSET);
> > +       } else {
> > +               preempt_count_sub(SOFTIRQ_DISABLE_OFFSET);
> > +               barrier();
> > +       }
> >  }
> >
> >  #ifdef CONFIG_PREEMPT_RT
> >
> > lower the overhead to an acceptable range? (I still need to sell this to
> > peterz first).
> 
> I guess the cost of the local_bh_enable()/local_bh_disable() pair
> will be roughly the same; please measure it :)

We would avoid that branch; maybe that helps. Will measure.

Sebastian

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 3/4] net: dev: Makes sure netif_rx() can be invoked in any context.
  2022-02-03 16:44             ` Sebastian Andrzej Siewior
@ 2022-02-03 17:45               ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-03 17:45 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner, Peter Zijlstra

On 2022-02-03 17:44:33 [+0100], To Eric Dumazet wrote:
> > I guess the cost of the local_bh_enable()/local_bh_disable() pair
> > will be roughly the same; please measure it :)
> 
> We would avoid that branch; maybe that helps. Will measure.

|  BH OFF/ON     : 722922586
|  BH OFF/ON     : 722931661
|  BH OFF/ON     : 725341486
|  BH OFF/ON     : 725909591
|  BH OFF/ON     : 741705606
|  BH OFF/ON-OPT : 536683873
|  BH OFF/ON-OPT : 536933779
|  BH OFF/ON-OPT : 536967581
|  BH OFF/ON-OPT : 537109700
|  BH OFF/ON-OPT : 537148631

in a tight loop of 100000000 iterations:
BH OFF/ON = local_bh_disable(); local_bh_enable()
BH OFF/ON-OPT = local_bh_disable(); local_bh_enable_opt()
where local_bh_enable_opt() is the proposed function.

725341486 = ~7.3ns for one iteration.
536967581 = ~5.4ns for one iteration.
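
A rough sketch of the loop behind these numbers (the actual harness was
not posted; the function name and the time source are assumptions):

#include <linux/bottom_half.h>
#include <linux/ktime.h>
#include <linux/printk.h>

static void bench_bh_onoff(void)
{
	u64 t0, t1;
	int i;

	t0 = ktime_get_ns();
	for (i = 0; i < 100000000; i++) {
		local_bh_disable();
		local_bh_enable();
	}
	t1 = ktime_get_ns();
	/* total ns for the loop, divided by 10^8 gives ns per iteration */
	pr_info("BH OFF/ON     : %llu\n", t1 - t0);
}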

This is without tracing+lockdep. So I don't need to sell this to peterz
and can focus on the previously suggested version.

Sebastian

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 3/4] net: dev: Makes sure netif_rx() can be invoked in any context.
  2022-02-03 12:20     ` Sebastian Andrzej Siewior
@ 2022-02-03 19:38       ` Jakub Kicinski
  0 siblings, 0 replies; 35+ messages in thread
From: Jakub Kicinski @ 2022-02-03 19:38 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Eric Dumazet, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner

On Thu, 3 Feb 2022 13:20:17 +0100 Sebastian Andrzej Siewior wrote:
> On 2022-02-02 08:50:04 [-0800], Jakub Kicinski wrote:
> > On Wed,  2 Feb 2022 13:28:47 +0100 Sebastian Andrzej Siewior wrote:  
> > > so they can be removed once they are no more users left.  
> > 
> > Any plans for doing the cleanup?   
> 
> Sure. If this is not rejected I can go and hunt netif_rx_ni() and
> netif_rx_any_context().

Thanks! Primarily asking because I'm trying to take note of "outstanding
cleanups", but it would be perfect if you can take care of it.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v2 4/4] net: dev: Make rps_lock() disable interrupts.
  2022-02-03 16:41     ` [PATCH net-next v2 " Sebastian Andrzej Siewior
@ 2022-02-03 19:39       ` Jakub Kicinski
  0 siblings, 0 replies; 35+ messages in thread
From: Jakub Kicinski @ 2022-02-03 19:39 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Eric Dumazet, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner

On Thu, 3 Feb 2022 17:41:30 +0100 Sebastian Andrzej Siewior wrote:
> > I think you can re-jig this a little more - rps_ipi_queued() only returns
> > 0 if sd is "local", so maybe we can call __napi_schedule_irqoff()
> > instead which already has the if () for PREEMPT_RT?
> > 
> > Maybe moving the ____napi_schedule() into rps_ipi_queued() and
> > renaming it to napi_schedule_backlog() or napi_schedule_rps() 
> > would make the code easier to follow in that case?  
> 
> Something like this then?

Exactly!

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH net-next v2 3/4] net: dev: Makes sure netif_rx() can be invoked in any context.
  2022-02-03 16:18           ` Eric Dumazet
  2022-02-03 16:44             ` Sebastian Andrzej Siewior
@ 2022-02-04 13:00             ` Sebastian Andrzej Siewior
  2022-02-04 18:46               ` Eric Dumazet
  1 sibling, 1 reply; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-04 13:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner, Peter Zijlstra

Dave suggested a while ago (eleven years by now) "Let's make netif_rx()
work in all contexts and get rid of netif_rx_ni()". Eric agreed and
pointed out that modern devices should use netif_receive_skb() to avoid
the overhead.
In the meantime someone added another variant, netif_rx_any_context(),
which behaves as suggested.

netif_rx() must be invoked with disabled bottom halves to ensure that
pending softirqs, which were raised within the function, are handled.
netif_rx_ni() can be invoked only from process context (bottom halves
must be enabled) because the function handles pending softirqs without
checking if bottom halves were disabled or not.
netif_rx_any_context() invokes one of the former functions by checking
in_interrupt().

netif_rx() could be taught to handle both cases (disabled and enabled
bottom halves) by simply disabling bottom halves while invoking
netif_rx_internal(). The local_bh_enable() invocation will then invoke
pending softirqs only if the BH-disable counter drops to zero.

Eric is concerned about the overhead of the BH-disable+enable pair,
especially with regard to the loopback driver. Since this driver is
performance critical, it will receive a shortcut to avoid the
additional, unneeded overhead.

Add a local_bh_disable() section in netif_rx() to ensure softirqs are
handled if needed. Provide the internal bits as __netif_rx() which can
be used by the loopback driver. This function is not exported so it
can't be used by modules.
Make netif_rx_ni() and netif_rx_any_context() invoke netif_rx() so they
can be removed once there are no users left.

Link: https://lkml.kernel.org/r/20100415.020246.218622820.davem@davemloft.net
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
v1…v2:
  - Provide netif_rx() as in v1 and additionally __netif_rx() without
    local_bh_disable()+enable() for the loopback driver. __netif_rx() is
    not exported (loopback is built-in only) so it won't be used by
    drivers. If this doesn't work then we can still export or define a
    wrapper as Eric suggested.

  - Added a comment that netif_rx() is considered legacy.

 drivers/net/loopback.c     |  2 +-
 include/linux/netdevice.h  | 14 ++++++++--
 include/trace/events/net.h | 14 ----------
 net/core/dev.c             | 53 +++++++++++---------------------------
 4 files changed, 28 insertions(+), 55 deletions(-)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index ed0edf5884ef8..77f5b564382b6 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -86,7 +86,7 @@ static netdev_tx_t loopback_xmit(struct sk_buff *skb,
 	skb->protocol = eth_type_trans(skb, dev);
 
 	len = skb->len;
-	if (likely(netif_rx(skb) == NET_RX_SUCCESS))
+	if (likely(__netif_rx(skb) == NET_RX_SUCCESS))
 		dev_lstats_add(dev, len);
 
 	return NETDEV_TX_OK;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e490b84732d16..c9e883104adb1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3669,8 +3669,18 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp,
 void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog);
 int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb);
 int netif_rx(struct sk_buff *skb);
-int netif_rx_ni(struct sk_buff *skb);
-int netif_rx_any_context(struct sk_buff *skb);
+int __netif_rx(struct sk_buff *skb);
+
+static inline int netif_rx_ni(struct sk_buff *skb)
+{
+	return netif_rx(skb);
+}
+
+static inline int netif_rx_any_context(struct sk_buff *skb)
+{
+	return netif_rx(skb);
+}
+
 int netif_receive_skb(struct sk_buff *skb);
 int netif_receive_skb_core(struct sk_buff *skb);
 void netif_receive_skb_list_internal(struct list_head *head);
diff --git a/include/trace/events/net.h b/include/trace/events/net.h
index 78c448c6ab4c5..032b431b987b6 100644
--- a/include/trace/events/net.h
+++ b/include/trace/events/net.h
@@ -260,13 +260,6 @@ DEFINE_EVENT(net_dev_rx_verbose_template, netif_rx_entry,
 	TP_ARGS(skb)
 );
 
-DEFINE_EVENT(net_dev_rx_verbose_template, netif_rx_ni_entry,
-
-	TP_PROTO(const struct sk_buff *skb),
-
-	TP_ARGS(skb)
-);
-
 DECLARE_EVENT_CLASS(net_dev_rx_exit_template,
 
 	TP_PROTO(int ret),
@@ -312,13 +305,6 @@ DEFINE_EVENT(net_dev_rx_exit_template, netif_rx_exit,
 	TP_ARGS(ret)
 );
 
-DEFINE_EVENT(net_dev_rx_exit_template, netif_rx_ni_exit,
-
-	TP_PROTO(int ret),
-
-	TP_ARGS(ret)
-);
-
 DEFINE_EVENT(net_dev_rx_exit_template, netif_receive_skb_list_exit,
 
 	TP_PROTO(int ret),
diff --git a/net/core/dev.c b/net/core/dev.c
index 5ef77b53507d4..b7578f47e151c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4829,6 +4829,16 @@ static int netif_rx_internal(struct sk_buff *skb)
 	return ret;
 }
 
+int __netif_rx(struct sk_buff *skb)
+{
+	int ret;
+
+	trace_netif_rx_entry(skb);
+	ret = netif_rx_internal(skb);
+	trace_netif_rx_exit(ret);
+	return ret;
+}
+
 /**
  *	netif_rx	-	post buffer to the network code
  *	@skb: buffer to post
@@ -4837,58 +4847,25 @@ static int netif_rx_internal(struct sk_buff *skb)
  *	the upper (protocol) levels to process.  It always succeeds. The buffer
  *	may be dropped during processing for congestion control or by the
  *	protocol layers.
+ *	This interface is considered legacy. Modern NIC drivers should use NAPI
+ *	and GRO.
  *
  *	return values:
  *	NET_RX_SUCCESS	(no congestion)
  *	NET_RX_DROP     (packet was dropped)
  *
  */
-
 int netif_rx(struct sk_buff *skb)
 {
 	int ret;
 
-	trace_netif_rx_entry(skb);
-
-	ret = netif_rx_internal(skb);
-	trace_netif_rx_exit(ret);
-
+	local_bh_disable();
+	ret = __netif_rx(skb);
+	local_bh_enable();
 	return ret;
 }
 EXPORT_SYMBOL(netif_rx);
 
-int netif_rx_ni(struct sk_buff *skb)
-{
-	int err;
-
-	trace_netif_rx_ni_entry(skb);
-
-	preempt_disable();
-	err = netif_rx_internal(skb);
-	if (local_softirq_pending())
-		do_softirq();
-	preempt_enable();
-	trace_netif_rx_ni_exit(err);
-
-	return err;
-}
-EXPORT_SYMBOL(netif_rx_ni);
-
-int netif_rx_any_context(struct sk_buff *skb)
-{
-	/*
-	 * If invoked from contexts which do not invoke bottom half
-	 * processing either at return from interrupt or when softrqs are
-	 * reenabled, use netif_rx_ni() which invokes bottomhalf processing
-	 * directly.
-	 */
-	if (in_interrupt())
-		return netif_rx(skb);
-	else
-		return netif_rx_ni(skb);
-}
-EXPORT_SYMBOL(netif_rx_any_context);
-
 static __latent_entropy void net_tx_action(struct softirq_action *h)
 {
 	struct softnet_data *sd = this_cpu_ptr(&softnet_data);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next v2 1/4] net: dev: Remove preempt_disable() and get_cpu() in netif_rx_internal().
  2022-02-03 12:41         ` Toke Høiland-Jørgensen
  2022-02-03 15:50           ` Sebastian Andrzej Siewior
@ 2022-02-04 15:20           ` Sebastian Andrzej Siewior
  2022-02-04 16:31             ` Jakub Kicinski
  2022-02-04 16:32             ` Eric Dumazet
  1 sibling, 2 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-04 15:20 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Eric Dumazet, bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner

The preempt_disable() section was introduced in commit
    cece1945bffcf ("net: disable preemption before call smp_processor_id()")

in case this function is invoked from preemptible context and because
get_cpu() has been added later on.

The get_cpu() usage was added in commit
    b0e28f1effd1d ("net: netif_rx() must disable preemption")

because ip_dev_loopback_xmit() invoked netif_rx() with preemption
enabled, causing a warning in smp_processor_id(). The function
netif_rx() should only be invoked from an interrupt context, which
implies disabled preemption. The commit
   e30b38c298b55 ("ip: Fix ip_dev_loopback_xmit()")

was addressing this and replaced netif_rx() with netif_rx_ni() in
ip_dev_loopback_xmit().

Based on the discussion on the list, the former patch (b0e28f1effd1d)
should not have been applied, only the latter (e30b38c298b55).

Remove get_cpu() and preempt_disable() since the function is supposed to
be invoked from context with stable per-CPU pointers. Bottom halves have
to be disabled at this point because the function may raise softirqs
which need to be processed.

Link: https://lkml.kernel.org/r/20100415.013347.98375530.davem@davemloft.net
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
v1…v2:
  - merge patch 1 and 2 from the series (as per Toke).
  - updated patch description and corrected the first commit number (as
    per Eric).

 net/core/dev.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 1baab07820f65..0d13340ed4054 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4796,7 +4796,6 @@ static int netif_rx_internal(struct sk_buff *skb)
 		struct rps_dev_flow voidflow, *rflow = &voidflow;
 		int cpu;
 
-		preempt_disable();
 		rcu_read_lock();
 
 		cpu = get_rps_cpu(skb->dev, skb, &rflow);
@@ -4806,14 +4805,12 @@ static int netif_rx_internal(struct sk_buff *skb)
 		ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
 
 		rcu_read_unlock();
-		preempt_enable();
 	} else
 #endif
 	{
 		unsigned int qtail;
 
-		ret = enqueue_to_backlog(skb, get_cpu(), &qtail);
-		put_cpu();
+		ret = enqueue_to_backlog(skb, smp_processor_id(), &qtail);
 	}
 	return ret;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v2 1/4] net: dev: Remove preempt_disable() and get_cpu() in netif_rx_internal().
  2022-02-04 15:20           ` [PATCH net-next v2 1/4] net: dev: Remove preempt_disable() and get_cpu() " Sebastian Andrzej Siewior
@ 2022-02-04 16:31             ` Jakub Kicinski
  2022-02-04 16:42               ` Sebastian Andrzej Siewior
  2022-02-04 16:32             ` Eric Dumazet
  1 sibling, 1 reply; 35+ messages in thread
From: Jakub Kicinski @ 2022-02-04 16:31 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Toke Høiland-Jørgensen, Eric Dumazet, bpf, netdev,
	David S. Miller, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Thomas Gleixner

On Fri, 4 Feb 2022 16:20:56 +0100 Sebastian Andrzej Siewior wrote:
> Subject: [PATCH net-next v2 1/4] net: dev: Remove preempt_disable() and  get_cpu() in netif_rx_internal().

FWIW, you'll need to repost the full series for it to be applied at 
the end. It'd be useful to add RFC in the subject of the one-off update
patches, maybe, so that I know that you know and patchwork knows that
we both know that a full repost will come. A little circle of knowing.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v2 1/4] net: dev: Remove preempt_disable() and get_cpu() in netif_rx_internal().
  2022-02-04 15:20           ` [PATCH net-next v2 1/4] net: dev: Remove preempt_disable() and get_cpu() " Sebastian Andrzej Siewior
  2022-02-04 16:31             ` Jakub Kicinski
@ 2022-02-04 16:32             ` Eric Dumazet
  1 sibling, 0 replies; 35+ messages in thread
From: Eric Dumazet @ 2022-02-04 16:32 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Toke Høiland-Jørgensen, bpf, netdev, David S. Miller,
	Alexei Starovoitov, Daniel Borkmann, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Thomas Gleixner

On Fri, Feb 4, 2022 at 7:20 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> The preempt_disable() section was introduced in commit
>     cece1945bffcf ("net: disable preemption before call smp_processor_id()")
>
> in case this function is invoked from preemptible context and because
> get_cpu() has been added later on.
>
> The get_cpu() usage was added in commit
>     b0e28f1effd1d ("net: netif_rx() must disable preemption")
>
> because ip_dev_loopback_xmit() invoked netif_rx() with preemption
> enabled, causing a warning in smp_processor_id(). The function
> netif_rx() should only be invoked from an interrupt context, which
> implies disabled preemption. The commit
>    e30b38c298b55 ("ip: Fix ip_dev_loopback_xmit()")
>
> was addressing this and replaced netif_rx() with netif_rx_ni() in
> ip_dev_loopback_xmit().
>
> Based on the discussion on the list, the former patch (b0e28f1effd1d)
> should not have been applied, only the latter (e30b38c298b55).
>
> Remove get_cpu() and preempt_disable() since the function is supposed to
> be invoked from context with stable per-CPU pointers. Bottom halves have
> to be disabled at this point because the function may raise softirqs
> which need to be processed.
>
> Link: https://lkml.kernel.org/r/20100415.013347.98375530.davem@davemloft.net
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> v1…v2:
>   - merge patch 1 and 2 from the series (as per Toke).
>   - updated patch description and corrected the first commit number (as
>     per Eric).
>

SGTM thanks, please add for the next submission:

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v2 1/4] net: dev: Remove preempt_disable() and get_cpu() in netif_rx_internal().
  2022-02-04 16:31             ` Jakub Kicinski
@ 2022-02-04 16:42               ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-04 16:42 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Toke Høiland-Jørgensen, Eric Dumazet, bpf, netdev,
	David S. Miller, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Thomas Gleixner

On 2022-02-04 08:31:07 [-0800], Jakub Kicinski wrote:
> On Fri, 4 Feb 2022 16:20:56 +0100 Sebastian Andrzej Siewior wrote:
> > Subject: [PATCH net-next v2 1/4] net: dev: Remove preempt_disable() and  get_cpu() in netif_rx_internal().
> 
> FWIW, you'll need to repost the full series for it to be applied at 
> the end. It'd be useful to add RFC in the subject of the one-off update
> patches, maybe, so that I know that you know and patchwork knows that
> we both know that a full repost will come. A little circle of knowing.

Sure, will remember the RFC part.

Sebastian

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v2 3/4] net: dev: Makes sure netif_rx() can be invoked in any context.
  2022-02-04 13:00             ` [PATCH net-next v2 " Sebastian Andrzej Siewior
@ 2022-02-04 18:46               ` Eric Dumazet
  0 siblings, 0 replies; 35+ messages in thread
From: Eric Dumazet @ 2022-02-04 18:46 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Thomas Gleixner, Peter Zijlstra

On Fri, Feb 4, 2022 at 5:00 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> Dave suggested a while ago (eleven years by now) "Let's make netif_rx()
> work in all contexts and get rid of netif_rx_ni()". Eric agreed and
> pointed out that modern devices should use netif_receive_skb() to avoid
> the overhead.
> In the meantime someone added another variant, netif_rx_any_context(),
> which behaves as suggested.
>
> netif_rx() must be invoked with disabled bottom halves to ensure that
> pending softirqs, which were raised within the function, are handled.
> netif_rx_ni() can be invoked only from process context (bottom halves
> must be enabled) because the function handles pending softirqs without
> checking if bottom halves were disabled or not.
> netif_rx_any_context() invokes one of the former functions by checking
> in_interrupt().
>
> netif_rx() could be taught to handle both cases (disabled and enabled
> bottom halves) by simply disabling bottom halves while invoking
> netif_rx_internal(). The local_bh_enable() invocation will then invoke
> pending softirqs only if the BH-disable counter drops to zero.
>
> Eric is concerned about the overhead of the BH-disable+enable pair,
> especially with regard to the loopback driver. Since this driver is
> performance critical, it will receive a shortcut to avoid the
> additional, unneeded overhead.
>
> Add a local_bh_disable() section in netif_rx() to ensure softirqs are
> handled if needed. Provide the internal bits as __netif_rx() which can
> be used by the loopback driver. This function is not exported so it
> can't be used by modules.
> Make netif_rx_ni() and netif_rx_any_context() invoke netif_rx() so they
> can be removed once there are no users left.
>
> Link: https://lkml.kernel.org/r/20100415.020246.218622820.davem@davemloft.net
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>

Nice, thanks !

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2022-02-04 18:46 UTC | newest]

Thread overview: 35+ messages
2022-02-02 12:28 [PATCH net-next 0/4] net: dev: PREEMPT_RT fixups Sebastian Andrzej Siewior
2022-02-02 12:28 ` [PATCH net-next 1/4] net: dev: Remove the preempt_disable() in netif_rx_internal() Sebastian Andrzej Siewior
2022-02-02 17:10   ` Eric Dumazet
2022-02-03 12:00     ` Toke Høiland-Jørgensen
2022-02-03 12:17       ` Sebastian Andrzej Siewior
2022-02-03 12:41         ` Toke Høiland-Jørgensen
2022-02-03 15:50           ` Sebastian Andrzej Siewior
2022-02-04 15:20           ` [PATCH net-next v2 1/4] net: dev: Remove preempt_disable() and get_cpu() " Sebastian Andrzej Siewior
2022-02-04 16:31             ` Jakub Kicinski
2022-02-04 16:42               ` Sebastian Andrzej Siewior
2022-02-04 16:32             ` Eric Dumazet
2022-02-03 12:16     ` [PATCH net-next 1/4] net: dev: Remove the preempt_disable() " Sebastian Andrzej Siewior
2022-02-02 12:28 ` [PATCH net-next 2/4] net: dev: Remove get_cpu() " Sebastian Andrzej Siewior
2022-02-02 17:14   ` Eric Dumazet
2022-02-03 12:14   ` Toke Høiland-Jørgensen
2022-02-02 12:28 ` [PATCH net-next 3/4] net: dev: Makes sure netif_rx() can be invoked in any context Sebastian Andrzej Siewior
2022-02-02 16:50   ` Jakub Kicinski
2022-02-03 12:20     ` Sebastian Andrzej Siewior
2022-02-03 19:38       ` Jakub Kicinski
2022-02-02 17:43   ` Eric Dumazet
2022-02-03 12:19     ` Toke Høiland-Jørgensen
2022-02-03 15:10     ` Sebastian Andrzej Siewior
2022-02-03 15:25       ` Eric Dumazet
2022-02-03 15:40         ` Sebastian Andrzej Siewior
2022-02-03 16:18           ` Eric Dumazet
2022-02-03 16:44             ` Sebastian Andrzej Siewior
2022-02-03 17:45               ` Sebastian Andrzej Siewior
2022-02-04 13:00             ` [PATCH net-next v2 " Sebastian Andrzej Siewior
2022-02-04 18:46               ` Eric Dumazet
2022-02-02 12:28 ` [PATCH net-next 4/4] net: dev: Make rps_lock() disable interrupts Sebastian Andrzej Siewior
2022-02-02 16:47   ` Jakub Kicinski
2022-02-03 16:41     ` [PATCH net-next v2 " Sebastian Andrzej Siewior
2022-02-03 19:39       ` Jakub Kicinski
2022-02-02 16:14 ` [PATCH net-next 0/4] net: dev: PREEMPT_RT fixups Jakub Kicinski
2022-02-03 11:59   ` Toke Høiland-Jørgensen
