* [PATCH net-next] net: introduce napi_schedule_irqoff()
From: Eric Dumazet @ 2014-10-29  1:05 UTC
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

napi_schedule() can be called from any context and has to mask hard
irqs.

Add a variant that can only be called from hard interrupt handlers
or when irqs are already masked.

Many NIC drivers can use it from their hard IRQ handler instead of
the generic variant.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h |   13 +++++++++++++
 net/core/dev.c            |   15 ++++++++++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 74fd5d37f15a472239c1c63cebe7ce9e59fc4c90..c85e065122460e9f077bcb6788018be38e1d7ddf 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -386,6 +386,7 @@ typedef enum rx_handler_result rx_handler_result_t;
 typedef rx_handler_result_t rx_handler_func_t(struct sk_buff **pskb);
 
 void __napi_schedule(struct napi_struct *n);
+void __napi_schedule_irqoff(struct napi_struct *n);
 
 static inline bool napi_disable_pending(struct napi_struct *n)
 {
@@ -420,6 +421,18 @@ static inline void napi_schedule(struct napi_struct *n)
 		__napi_schedule(n);
 }
 
+/**
+ *	napi_schedule_irqoff - schedule NAPI poll
+ *	@n: napi context
+ *
+ * Variant of napi_schedule(), assuming hard irqs are masked.
+ */
+static inline void napi_schedule_irqoff(struct napi_struct *n)
+{
+	if (napi_schedule_prep(n))
+		__napi_schedule_irqoff(n);
+}
+
 /* Try to reschedule poll. Called by dev->poll() after napi_complete().  */
 static inline bool napi_reschedule(struct napi_struct *napi)
 {
diff --git a/net/core/dev.c b/net/core/dev.c
index b793e3521a3631319bf4d0e7c17c0c9a933331da..759940cdf89635d58778502276d9914e47f837f9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4372,7 +4372,8 @@ static int process_backlog(struct napi_struct *napi, int quota)
  * __napi_schedule - schedule for receive
  * @n: entry to schedule
  *
- * The entry's receive function will be scheduled to run
+ * The entry's receive function will be scheduled to run.
+ * Consider using __napi_schedule_irqoff() if hard irqs are masked.
  */
 void __napi_schedule(struct napi_struct *n)
 {
@@ -4384,6 +4385,18 @@ void __napi_schedule(struct napi_struct *n)
 }
 EXPORT_SYMBOL(__napi_schedule);
 
+/**
+ * __napi_schedule_irqoff - schedule for receive
+ * @n: entry to schedule
+ *
+ * Variant of __napi_schedule() assuming hard irqs are masked
+ */
+void __napi_schedule_irqoff(struct napi_struct *n)
+{
+	____napi_schedule(this_cpu_ptr(&softnet_data), n);
+}
+EXPORT_SYMBOL(__napi_schedule_irqoff);
+
 void __napi_complete(struct napi_struct *n)
 {
 	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
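
For reference, __napi_schedule() currently looks like this in net-next;
the new helper is simply the same call without the flags save/restore:

void __napi_schedule(struct napi_struct *n)
{
	unsigned long flags;

	local_irq_save(flags);
	____napi_schedule(this_cpu_ptr(&softnet_data), n);
	local_irq_restore(flags);
}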

* Re: [PATCH net-next] net: introduce napi_schedule_irqoff()
From: Alexei Starovoitov @ 2014-10-29  1:17 UTC
  To: Eric Dumazet; +Cc: David Miller, netdev

On Tue, Oct 28, 2014 at 6:05 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> napi_schedule() can be called from any context and has to mask hard
> irqs.
>
> Add a variant that can only be called from hard interrupt handlers
> or when irqs are already masked.
>
> Many NIC drivers can use it from their hard IRQ handler instead of
> the generic variant.

Interesting! Are you trying to optimize some of these NIC drivers? ;)
But it won't apply to any of the 10G+ drivers or virtio, right?

* Re: [PATCH net-next] net: introduce napi_schedule_irqoff()
From: Eric Dumazet @ 2014-10-29  3:30 UTC
  To: Alexei Starovoitov; +Cc: David Miller, netdev

On Tue, 2014-10-28 at 18:17 -0700, Alexei Starovoitov wrote:

> Interesting! Are you trying to optimize some of these NIC drivers? ;)

??

> But it won't apply to any of the 10G+ drivers or virtio, right?


Which 10G+ driver could not use this?

The mlx4 patch would be:

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index c8e75dab80553c876b195361456fb49587231055..c562c1468944f9ad4319e5faaf19bf9e66d15eaf 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -878,8 +878,8 @@ void mlx4_en_rx_irq(struct mlx4_cq *mcq)
 	struct mlx4_en_cq *cq = container_of(mcq, struct mlx4_en_cq, mcq);
 	struct mlx4_en_priv *priv = netdev_priv(cq->dev);
 
-	if (priv->port_up)
-		napi_schedule(&cq->napi);
+	if (likely(priv->port_up))
+		napi_schedule_irqoff(&cq->napi);
 	else
 		mlx4_en_arm_cq(priv, cq);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 34c137878545fc672dad1a3d86e11c034c0ac368..5c4062921cdf46f1a7021a39705275c33ca4de77 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -479,8 +479,8 @@ void mlx4_en_tx_irq(struct mlx4_cq *mcq)
 	struct mlx4_en_cq *cq = container_of(mcq, struct mlx4_en_cq, mcq);
 	struct mlx4_en_priv *priv = netdev_priv(cq->dev);
 
-	if (priv->port_up)
-		napi_schedule(&cq->napi);
+	if (likely(priv->port_up))
+		napi_schedule_irqoff(&cq->napi);
 	else
 		mlx4_en_arm_cq(priv, cq);
 }
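
More generally, any driver whose hard IRQ handler only has to kick NAPI
can do the same. A minimal sketch, with made-up foo_* names standing in
for a real driver:

#include <linux/interrupt.h>
#include <linux/netdevice.h>

/* Hypothetical per-queue state of an imaginary driver. */
struct foo_ring {
	struct napi_struct napi;
};

static irqreturn_t foo_msix_handler(int irq, void *data)
{
	struct foo_ring *ring = data;

	/* Hard irqs are masked in this context, so we can skip the
	 * local_irq_save()/local_irq_restore() pair that
	 * napi_schedule() would pay for.
	 */
	napi_schedule_irqoff(&ring->napi);

	return IRQ_HANDLED;
}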

* Re: [PATCH net-next] net: introduce napi_schedule_irqoff()
From: Alexei Starovoitov @ 2014-10-29  4:17 UTC
  To: Eric Dumazet; +Cc: David Miller, netdev

On Tue, Oct 28, 2014 at 8:30 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2014-10-28 at 18:17 -0700, Alexei Starovoitov wrote:
>
>> Interesting! Are you trying to optimize some of these NIC drivers? ;)
>
> ??
>
>> But it won't apply to any of the 10G+ drivers or virtio, right?
>
>
> Which 10G+ driver could not use this?
>
> The mlx4 patch would be:

Right, but it's not a critical path for NAPI drivers. Why bother?

* Re: [PATCH net-next] net: introduce napi_schedule_irqoff()
From: Eric Dumazet @ 2014-10-29  4:40 UTC
  To: Alexei Starovoitov; +Cc: David Miller, netdev

On Tue, 2014-10-28 at 21:17 -0700, Alexei Starovoitov wrote:
> On Tue, Oct 28, 2014 at 8:30 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Tue, 2014-10-28 at 18:17 -0700, Alexei Starovoitov wrote:
> >
> >> Interesting! Are you trying to optimize some of these NIC drivers? ;)
> >
> > ??
> >
> >> But it won't apply to any of the 10G+ drivers or virtio, right?
> >
> >
> > Which 10G+ driver could not use this?
> >
> > The mlx4 patch would be:
> 
> Right, but it's not a critical path for NAPI drivers. Why bother?

You probably are missing something.

This is a critical path, of course.

Try netperf -t TCP_RR for example.
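
For instance (server address made up):

  netperf -H 10.0.0.2 -t TCP_RR -l 30

TCP_RR reports round-trip transactions per second, so per-transaction
latency savings show up directly in the result.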

* Re: [PATCH net-next] net: introduce napi_schedule_irqoff()
From: Alexei Starovoitov @ 2014-10-29  5:13 UTC
  To: Eric Dumazet; +Cc: David Miller, netdev

On Tue, Oct 28, 2014 at 9:40 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2014-10-28 at 21:17 -0700, Alexei Starovoitov wrote:
>> On Tue, Oct 28, 2014 at 8:30 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Tue, 2014-10-28 at 18:17 -0700, Alexei Starovoitov wrote:
>> >
>> >> Interesting! Are you trying to optimize some of these NIC drivers? ;)
>> >
>> > ??
>> >
>> >> But it won't apply to any of the 10G+ drivers or virtio, right?
>> >
>> >
>> > Which 10G+ driver could not use this?
>> >
>> > The mlx4 patch would be:
>>
>> Right, but it's not a critical path for NAPI drivers. Why bother?
>
> You probably are missing something.
>
> This is a critical path, of course.
>
> Try netperf -t TCP_RR for example.

I tried 50 parallel netperf -t TCP_RR sessions over ixgbe, and the top
entries in perf top were tcp stack bits, qdisc locks and netperf itself.
What do you see?

* Re: [PATCH net-next] net: introduce napi_schedule_irqoff()
From: Eric Dumazet @ 2014-10-29  6:20 UTC
  To: Alexei Starovoitov; +Cc: David Miller, netdev

On Tue, 2014-10-28 at 22:13 -0700, Alexei Starovoitov wrote:

> I tried 50 parallel netperf -t TCP_RR sessions over ixgbe, and the top
> entries in perf top were tcp stack bits, qdisc locks and netperf itself.
> What do you see?

You are kidding, right?

If you save 30 nsec (2 * 15 nsec) per transaction, and the rtt is about
20 usec, it's a 0.15% gain. Not bad for a trivial patch.
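
Spelling it out, assuming ~15 nsec per local_irq_save()/restore pair,
one in the RX irq handler and one in the TX irq handler:

  2 * 15 nsec          = 30 nsec saved per transaction
  30 nsec / 20000 nsec = 0.0015, i.e. 0.15%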

Why are you using 50 parallel netperfs instead of trying a single
netperf? I mentioned latency impact, not overall throughput.

Do you believe typical servers in data centers are only sending and
receiving bulk packets, with no interrupts, and one cpu busy polling in
the NAPI handler?


Every atomic op we remove or avoid, every irq masking/unmasking we
remove, every cache line miss, TLB miss, or extra bus transaction we
avoid, is a step on the path to better latency.

You should take a look at the recent commits I did; you'll get the
general picture if you missed it.

git log --oneline --author dumazet | head -100

* Re: [PATCH net-next] net: introduce napi_schedule_irqoff()
From: Alexei Starovoitov @ 2014-10-29 15:26 UTC
  To: Eric Dumazet; +Cc: David Miller, netdev

On Tue, Oct 28, 2014 at 11:20 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2014-10-28 at 22:13 -0700, Alexei Starovoitov wrote:
>
>> I tried 50 parallel netperf -t TCP_RR sessions over ixgbe, and the top
>> entries in perf top were tcp stack bits, qdisc locks and netperf itself.
>> What do you see?
>
> You are kidding, right?
>
> If you save 30 nsec (2 * 15 nsec) per transaction, and the rtt is about
> 20 usec, it's a 0.15% gain. Not bad for a trivial patch.

Agreed. I wasn't arguing against the patch at all; I was just curious
what performance gain we'd see. 0.15% is tiny, and some might say the
code bloat is not worth it, but imo it's a good one. Ack.

> Every atomic op we remove or avoid, every irq masking/unmasking we
> remove, every cache line miss, TLB miss, or extra bus transaction we
> avoid, is a step on the path to better latency.

Yes. We're saying the same thing.

* Re: [PATCH net-next] net: introduce napi_schedule_irqoff()
From: David Miller @ 2014-10-29 20:08 UTC
  To: eric.dumazet; +Cc: netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 28 Oct 2014 18:05:13 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> napi_schedule() can be called from any context and has to mask hard
> irqs.
> 
> Add a variant that can only be called from hard interrupt handlers
> or when irqs are already masked.
>
> Many NIC drivers can use it from their hard IRQ handler instead of
> the generic variant.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.
