netdev.vger.kernel.org archive mirror
* irq disable in __netdev_alloc_frag() ?
From: Alexei Starovoitov @ 2014-10-23  0:15 UTC
  To: Eric Dumazet, Network Development

Hi Eric,

in commit 6f532612cc24 ("net: introduce netdev_alloc_frag()")
you mentioned that the reason to disable interrupts
in __netdev_alloc_frag() is:
"- Must be IRQ safe (non NAPI drivers can use it)"

Is there a way to do this conditionally?

Without it I see a 10% performance gain in my RX tests
(from 6.9 Mpps to 7.7 Mpps), and __netdev_alloc_frag()
itself drops from 6.6% to 2.1% of the profile
(popf seems to be quite costly)
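
For reference, the pattern in question is roughly this (a simplified
sketch of the current code; the refill/recycle logic is elided):

static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
{
	struct netdev_alloc_cache *nc;
	unsigned long flags;
	void *data;

	local_irq_save(flags);			/* pushf + cli */
	nc = this_cpu_ptr(&netdev_alloc_cache);
	/* ... refill nc->frag.page if needed ... */
	data = page_address(nc->frag.page) + nc->frag.offset;
	nc->frag.offset += fragsz;
	local_irq_restore(flags);		/* the popf */
	return data;
}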

Thanks
Alexei


* Re: irq disable in __netdev_alloc_frag() ?
From: Eric Dumazet @ 2014-10-23  1:52 UTC
  To: Alexei Starovoitov; +Cc: Eric Dumazet, Network Development


On Wed, 2014-10-22 at 17:15 -0700, Alexei Starovoitov wrote:
> Hi Eric,
> 
> in commit 6f532612cc24 ("net: introduce netdev_alloc_frag()")
> you mentioned that the reason to disable interrupts
> in __netdev_alloc_frag() is:
> "- Must be IRQ safe (non NAPI drivers can use it)"
>
> Is there a way to do this conditionally?
>
> Without it I see a 10% performance gain in my RX tests
> (from 6.9 Mpps to 7.7 Mpps), and __netdev_alloc_frag()
> itself drops from 6.6% to 2.1% of the profile
> (popf seems to be quite costly)

Well, your driver is probably a NAPI one, but you still need to
mask irqs, unless you remove all non-NAPI drivers from Linux.

__netdev_alloc_frag() (via __netdev_alloc_skb()) is used by all of them.

The problem is that __netdev_alloc_frag() generally sits deep inside the
call chain, so using a private pool might add quite a lot of overhead.

The same could be said for skb_queue_head() / skb_queue_tail() /
sock_queue_rcv_skb():
many callers don't need to block irqs.
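
For example, skb_queue_tail() is roughly:

void skb_queue_tail(struct sk_buff_head *list, struct sk_buff *newsk)
{
	unsigned long flags;

	/* the irq-safe lock variant is taken for every caller,
	 * whether or not this queue is ever touched from hard irq
	 */
	spin_lock_irqsave(&list->lock, flags);
	__skb_queue_tail(list, newsk);
	spin_unlock_irqrestore(&list->lock, flags);
}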


* Re: irq disable in __netdev_alloc_frag() ?
From: Alexei Starovoitov @ 2014-10-23  2:22 UTC
  To: Eric Dumazet; +Cc: Eric Dumazet, Network Development

On Wed, Oct 22, 2014 at 6:52 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> On Wed, 2014-10-22 at 17:15 -0700, Alexei Starovoitov wrote:
>> Hi Eric,
>>
>> in commit 6f532612cc24 ("net: introduce netdev_alloc_frag()")
>> you mentioned that the reason to disable interrupts
>> in __netdev_alloc_frag() is:
>> "- Must be IRQ safe (non NAPI drivers can use it)"
>>
>> Is there a way to do this conditionally?
>>
>> Without it I see a 10% performance gain in my RX tests
>> (from 6.9 Mpps to 7.7 Mpps), and __netdev_alloc_frag()
>> itself drops from 6.6% to 2.1% of the profile
>> (popf seems to be quite costly)
>
> Well, your driver is probably a NAPI one, but you still need to
> mask irqs, unless you remove all non-NAPI drivers from Linux.

yeah, the 10G+ NICs I care about are all NAPI :)

> __netdev_alloc_frag() (via __netdev_alloc_skb()) is used by all of them.
>
> The problem is that __netdev_alloc_frag() generally sits deep inside the
> call chain, so using a private pool might add quite a lot of overhead.

yes. I was thinking that since dev is already passed
into __netdev_alloc_skb(), we could check whether the
device is registered for NAPI via dev->napi_list and, if so,
tell the inner __netdev_alloc_frag() to skip the irq disabling...

don't know about skb_queue_head() and friends.
I'm only looking at pure RX now. One challenge at a time.
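
Something like this rough, untested sketch (the extra parameter is
made up):

/* hypothetical: a NAPI driver has called netif_napi_add(), so its
 * dev->napi_list is non-empty; let such callers skip the masking
 */
static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask,
				 bool need_irq_save)
{
	struct netdev_alloc_cache *nc;
	unsigned long flags = 0;
	void *data;

	if (need_irq_save)
		local_irq_save(flags);
	nc = this_cpu_ptr(&netdev_alloc_cache);
	/* ... refill/carve exactly as today ... */
	data = page_address(nc->frag.page) + nc->frag.offset;
	nc->frag.offset += fragsz;
	if (need_irq_save)
		local_irq_restore(flags);
	return data;
}

and in __netdev_alloc_skb():

	bool napi_dev = dev && !list_empty(&dev->napi_list);

	data = __netdev_alloc_frag(fragsz, gfp_mask, !napi_dev);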


* Re: irq disable in __netdev_alloc_frag() ?
From: Alexander Duyck @ 2014-10-23  3:19 UTC
  To: Eric Dumazet, Alexei Starovoitov; +Cc: Eric Dumazet, Network Development

On 10/22/2014 06:52 PM, Eric Dumazet wrote:
> On Wed, 2014-10-22 at 17:15 -0700, Alexei Starovoitov wrote:
>> Hi Eric,
>>
>> in commit 6f532612cc24 ("net: introduce netdev_alloc_frag()")
>> you mentioned that the reason to disable interrupts
>> in __netdev_alloc_frag() is:
>> "- Must be IRQ safe (non NAPI drivers can use it)"
>>
>> Is there a way to do this conditionally?
>>
>> Without it I see a 10% performance gain in my RX tests
>> (from 6.9 Mpps to 7.7 Mpps), and __netdev_alloc_frag()
>> itself drops from 6.6% to 2.1% of the profile
>> (popf seems to be quite costly)
> Well, your driver is probably a NAPI one, but you still need to
> mask irqs, unless you remove all non-NAPI drivers from Linux.
>
> __netdev_alloc_frag() (via __netdev_alloc_skb()) is used by all of them.
>
> The problem is that __netdev_alloc_frag() generally sits deep inside the
> call chain, so using a private pool might add quite a lot of overhead.
>
> The same could be said for skb_queue_head() / skb_queue_tail() /
> sock_queue_rcv_skb():
> many callers don't need to block irqs.

Couldn't __netdev_alloc_frag() be forked into two functions, one that is
only called from inside the NAPI context and one that is called for all
other contexts?  It would mean having to double the number of pages
being held per CPU, but I would think something like that would be doable.
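
Something along these lines (hypothetical sketch; __alloc_frag() is a
made-up helper standing in for today's refill/carve logic):

static DEFINE_PER_CPU(struct netdev_alloc_cache, netdev_alloc_cache);
static DEFINE_PER_CPU(struct netdev_alloc_cache, napi_alloc_cache);

void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
{
	unsigned long flags;
	void *data;

	/* callable from any context, including hard irq */
	local_irq_save(flags);
	data = __alloc_frag(this_cpu_ptr(&netdev_alloc_cache),
			    fragsz, gfp_mask);
	local_irq_restore(flags);
	return data;
}

void *napi_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
{
	/* NAPI poll runs in softirq context; as long as hard-irq
	 * users never touch napi_alloc_cache, no masking is needed
	 */
	return __alloc_frag(this_cpu_ptr(&napi_alloc_cache),
			    fragsz, gfp_mask);
}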

Thanks,

Alex


* Re: irq disable in __netdev_alloc_frag() ?
From: Eric Dumazet @ 2014-10-23  3:48 UTC
  To: Alexei Starovoitov; +Cc: Eric Dumazet, Network Development

On Wed, 2014-10-22 at 19:22 -0700, Alexei Starovoitov wrote:

> yes. I was thinking that since dev is already passed
> into __netdev_alloc_skb(), we could check whether the
> device is registered for NAPI via dev->napi_list and, if so,
> tell the inner __netdev_alloc_frag() to skip the irq disabling...
> 

This does not matter. The problem is not _this_ device; the problem is
that another device might trigger a hard irq, and that hard irq could
mess up your data.
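
Spelled out (illustrative timeline, one CPU, not kernel code):

/*
 *   your NAPI path calls __netdev_alloc_frag()
 *     reads the per-CPU netdev_alloc_cache, about to bump frag.offset
 *   <hard irq fires, raised by a different, non-NAPI device>
 *     its handler calls netdev_alloc_skb()
 *       -> __netdev_alloc_frag() re-enters the SAME per-CPU cache
 *       -> refills the page and/or moves frag.offset
 *   <irq handler returns>
 *     the interrupted caller resumes with a stale offset and hands
 *     out an overlapping fragment
 */

Whether _this_ device is NAPI is irrelevant: the per-CPU cache is
shared with every other device in the system.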


* Re: irq disable in __netdev_alloc_frag() ?
From: Eric Dumazet @ 2014-10-23  3:51 UTC
  To: Alexander Duyck; +Cc: Alexei Starovoitov, Eric Dumazet, Network Development

On Wed, 2014-10-22 at 20:19 -0700, Alexander Duyck wrote:

> Couldn't __netdev_alloc_frag() be forked into two functions, one that is
> only called from inside the NAPI context and one that is called for all
> other contexts?  It would mean having to double the number of pages
> being held per CPU, but I would think something like that would be doable.

Possibly, but this looks like code bloat to me.

On my hosts, this hard irq masking is pure noise.

What CPU are you using, Alexander?

The same could be done for some kmem_cache_alloc() callers: SLAB uses
hard irq masking while some caches are never used from hard irq context.

* Re: irq disable in __netdev_alloc_frag() ?
From: Eric Dumazet @ 2014-10-23  3:56 UTC
  To: Alexander Duyck; +Cc: Alexei Starovoitov, Eric Dumazet, Network Development

On Wed, 2014-10-22 at 20:51 -0700, Eric Dumazet wrote:
> On Wed, 2014-10-22 at 20:19 -0700, Alexander Duyck wrote:
> 
> > Couldn't __netdev_alloc_frag() be forked into two functions, one that is
> > only called from inside the NAPI context and one that is called for all
> > other contexts?  It would mean having to double the number of pages
> > being held per CPU, but I would think something like that would be doable.
> 
> Possibly, but this looks like code bloat to me.
>
> On my hosts, this hard irq masking is pure noise.
>
> What CPU are you using, Alexander?

Sorry, the question was for Alexei ;)


* Re: irq disable in __netdev_alloc_frag() ?
From: Alexei Starovoitov @ 2014-10-23  4:29 UTC
  To: Eric Dumazet; +Cc: Alexander Duyck, Eric Dumazet, Network Development

On Wed, Oct 22, 2014 at 8:56 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2014-10-22 at 20:51 -0700, Eric Dumazet wrote:
>> On Wed, 2014-10-22 at 20:19 -0700, Alexander Duyck wrote:
>>
>> > Couldn't __netdev_alloc_frag() be forked into two functions, one that is
>> > only called from inside the NAPI context and one that is called for all
>> > other contexts?  It would mean having to double the number of pages
>> > being held per CPU, but I would think something like that would be doable.
>>
>> Possibly, but this looks like code bloat to me.
>>
>> On my hosts, this hard irq masking is pure noise.
>>
>> What CPU are you using, Alexander?
>
> Sorry, the question was for Alexei ;)

my numbers were from the just-released Xeon E5-1630 v3,
which is Haswell-EP.
I think Alexander's idea is worth pursuing.
Code-wise the bloat is minimal: one extra netdev_alloc_cache
and a few 'if's to choose between the two.
Most of the time only one of them would be used, since
systems with both NAPI and non-NAPI NICs are very rare.
I think it should help the virtualization case as well,
since virtio_net is NAPI already.


* Re: irq disable in __netdev_alloc_frag() ?
From: Eric Dumazet @ 2014-10-23  5:14 UTC
  To: Alexei Starovoitov; +Cc: Alexander Duyck, Eric Dumazet, Network Development

On Wed, 2014-10-22 at 21:29 -0700, Alexei Starovoitov wrote:

> my numbers were from the just-released Xeon E5-1630 v3,
> which is Haswell-EP.
> I think Alexander's idea is worth pursuing.
> Code-wise the bloat is minimal: one extra netdev_alloc_cache
> and a few 'if's to choose between the two.


> Most of the time only one of them would be used, since
> systems with both NAPI and non-NAPI NICs are very rare.


> I think it should help the virtualization case as well,
> since virtio_net is NAPI already.

Really, I doubt this will make any difference in real workloads,
where the bottleneck is memory bandwidth, not the time taken by
pushf/cli/popf.

What difference do you see on a real workload like netperf TCP_RR or
UDP_RR?


* Re: irq disable in __netdev_alloc_frag() ?
From: Alexei Starovoitov @ 2014-10-23  6:12 UTC
  To: Eric Dumazet; +Cc: Alexander Duyck, Eric Dumazet, Network Development

On Wed, Oct 22, 2014 at 10:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Really, I doubt this will make any difference in real workloads,
> where the bottleneck is memory bandwidth, not the time taken by
> pushf/cli/popf.
>
> What difference do you see on a real workload like netperf TCP_RR or
> UDP_RR?

there are different workloads; L2/L3 switching is very real
as well. For TCP_RR on the host the difference will be
minimal, since the time is spent elsewhere. For cases
that don't use the TCP stack but do pure L2/L3 processing,
all such improvements add up. IMO the kernel should be
optimized for all use cases when possible.


* Re: irq disable in __netdev_alloc_frag() ?
From: Jesper Dangaard Brouer @ 2014-10-27 20:35 UTC
  To: Eric Dumazet
  Cc: brouer, Alexander Duyck, Alexei Starovoitov, Eric Dumazet,
	Network Development, Christoph Lameter

On Wed, 22 Oct 2014 20:51:16 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On my hosts, this hard irq masking is pure noise.

On my hosts I can measure a significant difference between using
local_irq_disable() vs. local_irq_save(flags):

 *  2.860 ns cost for local_irq_{disable,enable}
 * 14.840 ns cost for local_irq_save() + local_irq_restore()

This is quite significant in my nanosec world ;-)

 
> What CPU are you using, Alexander?

I'm using an E5-2695 (Ivy Bridge).

You can easily reproduce my results on your own system with my
time_bench_sample module here:
 https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_sample.c#L173
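
The core of the measurement is just a tight loop around the primitive;
a sketch (the module above adds the actual timing and statistics
harness):

static int bench_irq_save_restore(unsigned long loops)
{
	unsigned long flags;
	unsigned long i;

	for (i = 0; i < loops; i++) {
		local_irq_save(flags);
		barrier();	/* keep the pair from collapsing */
		local_irq_restore(flags);
	}
	return 0;
}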

> The same could be done for some kmem_cache_alloc() callers: SLAB uses
> hard irq masking while some caches are never used from hard irq context.

Sounds interesting.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: irq disable in __netdev_alloc_frag() ?
From: Christoph Lameter @ 2014-10-28  2:30 UTC
  To: Jesper Dangaard Brouer
  Cc: Eric Dumazet, Alexander Duyck, Alexei Starovoitov, Eric Dumazet,
	Network Development

On Mon, 27 Oct 2014, Jesper Dangaard Brouer wrote:

> > The same could be done for some kmem_cache_alloc() callers: SLAB uses
> > hard irq masking while some caches are never used from hard irq context.
>
> Sounds interesting.

SLUB does not disable interrupts in the fast paths.
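
Roughly, the fast path commits the per-cpu freelist and a transaction
id together with a cmpxchg-double (heavily condensed sketch from
mm/slub.c; slow path, NUMA and debug hooks elided):

static void *slab_alloc_fastpath(struct kmem_cache *s, gfp_t gfpflags)
{
	struct kmem_cache_cpu *c;
	unsigned long tid;
	void *object;

redo:
	c = this_cpu_ptr(s->cpu_slab);
	tid = c->tid;
	object = c->freelist;
	if (unlikely(!object))
		return NULL;	/* real code: __slab_alloc() slow path */

	/* tid changes on every per-cpu operation, so an allocation
	 * from an interrupting context makes this fail and we retry,
	 * instead of masking hard irqs around the whole thing
	 */
	if (unlikely(!this_cpu_cmpxchg_double(
			s->cpu_slab->freelist, s->cpu_slab->tid,
			object, tid,
			get_freepointer_safe(s, object), next_tid(tid))))
		goto redo;
	return object;
}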


* Re: irq disable in __netdev_alloc_frag() ?
From: Eric Dumazet @ 2014-10-28  2:46 UTC
  To: Christoph Lameter
  Cc: Jesper Dangaard Brouer, Eric Dumazet, Alexander Duyck,
	Alexei Starovoitov, Network Development

On Mon, Oct 27, 2014 at 7:30 PM, Christoph Lameter <cl@linux.com> wrote:
> On Mon, 27 Oct 2014, Jesper Dangaard Brouer wrote:
>
>> > The same could be done for some kmem_cache_alloc() callers: SLAB uses
>> > hard irq masking while some caches are never used from hard irq context.
>>
>> Sounds interesting.
>
> SLUB does not disable interrupts in the fast paths.
>

Unfortunately, SLUB is more expensive than SLAB for many networking workloads.

The cost of disabling interrupts is pure noise compared to cache line misses.

SLUB has poor behavior compared to SLAB with its alien caches,
even with the side effect that 'struct page' is 64-byte aligned
instead of being 56 bytes as with SLAB.

Note that I am not doing SLUB/SLAB tests every day, so it might be
better nowadays.


* Re: irq disable in __netdev_alloc_frag() ?
From: David Miller @ 2014-10-28  4:56 UTC
  To: edumazet; +Cc: cl, brouer, eric.dumazet, alexander.duyck, ast, netdev

From: Eric Dumazet <edumazet@google.com>
Date: Mon, 27 Oct 2014 19:46:20 -0700

> Unfortunately, SLUB is more expensive than SLAB for many networking
> workloads.
> 
> The cost of disabling interrupts is pure noise compared to cache
> line misses.
> 
> SLUB has poor behavior compared to SLAB with its alien caches, even
> with the side effect that 'struct page' is 64-byte aligned instead of
> being 56 bytes as with SLAB.

And SLAB completely shits itself when lots of memory gets cached up on
a foreign node.

This discussion has happened many times: SLAB may be faster when things
work out nicely, but it acts poorly wrt. keeping foreign memory from
being cached too aggressively.

And there is a cost for that, which is that foreign memory has to be
properly balanced back to its home node.


* Re: irq disable in __netdev_alloc_frag() ?
From: Christoph Lameter @ 2014-10-28 19:46 UTC
  To: Eric Dumazet
  Cc: Jesper Dangaard Brouer, Eric Dumazet, Alexander Duyck,
	Alexei Starovoitov, Network Development

On Mon, 27 Oct 2014, Eric Dumazet wrote:

> Unfortunately, SLUB is more expensive than SLAB for many networking workloads.

hackbench performs better with SLUB.

> The cost of disabling interrupts is pure noise compared to cache line misses.
>
> SLUB has poor behavior compared to SLAB with its alien caches,
> even with the side effect that 'struct page' is 64-byte aligned
> instead of being 56 bytes as with SLAB.
>
> Note that I am not doing SLUB/SLAB tests every day, so it might be
> better nowadays.

Well, we did some intensive work on these issues a couple of years ago.

