* Add NAPI support to ll_temac driver
@ 2011-04-19  9:35 Michal Simek
  2011-04-19 10:43 ` Eric Dumazet
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Simek @ 2011-04-19  9:35 UTC (permalink / raw)
  To: netdev

Hi,

I would like to add NAPI support to ll_temac and see whether it helps us
improve performance on a Microblaze system. I would expect bandwidth to
increase.
We have a second, non-mainline driver which uses tasklets, and it performs
better than the mainline driver, but not by much. That is why I think NAPI
could increase performance.

Can you please point me to a driver which I could use as a template,
or to a developer guide?

Do you know of any other way to improve driver performance on a low-speed CPU?

I have found that the driver spends a lot of time on skb allocation, and that
preallocated SKBs help a little. I ran a test where I increased the number of
preallocated BDs (SKBs) for rx to 35000 and disabled new BD (SKB) allocation in
rx_irq. I set up 35000 BDs because that is how many I need to finish the netperf
test successfully. I got a 25% bandwidth increase.

It would also be nice to be able to allocate several BDs (SKBs) at once, which
could be faster than allocating them one by one.

Thanks,
Michal

-- 
Michal Simek, Ing. (M.Eng)
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel 2.6 Microblaze Linux - http://www.monstr.eu/fdt/
Microblaze U-BOOT custodian


* Re: Add NAPI support to ll_temac driver
  2011-04-19  9:35 Add NAPI support to ll_temac driver Michal Simek
@ 2011-04-19 10:43 ` Eric Dumazet
  2011-04-19 12:25   ` Ben Hutchings
  2011-04-19 12:26   ` Michal Simek
  0 siblings, 2 replies; 11+ messages in thread
From: Eric Dumazet @ 2011-04-19 10:43 UTC (permalink / raw)
  To: monstr; +Cc: netdev

On Tuesday 19 April 2011 at 11:35 +0200, Michal Simek wrote:
> Hi,
> 
> I would like to try to add NAPI support for ll_temac and look if help us to 
> improve performance on Microblaze system. I would expect that bandwidth should 
> be increased.
> We have the second non mainline driver which use tasklets and it provides better 
>   performance than mainline driver but not so big that's why I think that NAPI 
> can increase performance.
> 
> Can you please point me to any driver which I could use as a template?
> Or any developer guide to do so.
> 
> Do you know any other option how to improve driver performance on low speed cpu?
> 
> I have found that driver spends a lot of time on skb allocation and preallocated 
> SKBs help a little bit. I have done a test where I increased number of 
> preallocated BDs(SKBs) for rx to 35000 and disable new BD(SKB) allocation in 
> rx_irq. 35000 BDs is setup because I need them to successfully finish netperf 
> test. I have got 25% bandwidth increasing.
> 
> It will be also nice to be able to allocate several BDs(SKBs) which could be 
> faster than allocate them in sequence.

That depends on whether your CPU has a cache. The best performance comes
from achieving high cache hit ratios.

One possible way to get better performance is to change the driver to
allocate skbs only right before calling netif_rx(), so that you don't
have to access cold sk_buff data twice (once when allocating the skb and
putting it in the ring buffer, and a second time when receiving the frame).

drivers/net/niu.c is a good example of this (NAPI + netdev_alloc_skb()
just in time + pulling only the first cache line of the packet into the
skb head).

drivers/net/ftmac100.c is also a recent driver (and probably a better
starting point, with less complex hardware than NIU) that uses these tricks:

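{
	/* allocate a small skb only once the frame has arrived */
	skb = netdev_alloc_skb_ip_align(netdev, 128);
	/* copy at most the first 64 bytes (headers) into the skb head */
	__pskb_pull_tail(skb, min(length, 64));
}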
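
As a rough illustration of the idea above, here is a minimal userspace
model in C. It is not the niu or ftmac100 code: struct model_skb,
model_build_skb() and the MODEL_* constants are hypothetical stand-ins,
and the 128/64 byte sizes are simply taken from the snippet above. The
point it tries to show is that the small buffer descriptor is created only
after the frame has arrived, and only the first bytes are copied into it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_HEAD_ROOM 128  /* size of the small linear area (assumed) */
#define MODEL_PULL_LEN   64  /* bytes copied into the linear area (assumed) */

/* Hypothetical stand-in for an sk_buff: a small linear head plus a
 * reference to the rest of the frame, which stays in the RX buffer. */
struct model_skb {
	unsigned char head[MODEL_HEAD_ROOM];
	size_t head_len;
	const unsigned char *frag;
	size_t frag_len;
};

/* Called once the frame has already landed in rx_buf. The descriptor is
 * allocated here, just in time, so it is touched only once while hot. */
struct model_skb *model_build_skb(const unsigned char *rx_buf, size_t length)
{
	struct model_skb *skb = malloc(sizeof(*skb));
	size_t pull;

	if (!skb)
		return NULL;

	/* Copy only the headers; the payload is referenced, not copied. */
	pull = length < MODEL_PULL_LEN ? length : MODEL_PULL_LEN;
	assert(pull <= MODEL_HEAD_ROOM);
	memcpy(skb->head, rx_buf, pull);
	skb->head_len = pull;
	skb->frag = rx_buf + pull;
	skb->frag_len = length - pull;
	return skb;
}
```

For a 1500-byte frame this copies 64 bytes and leaves the remaining 1436
bytes in place; the caller owns the returned descriptor and frees it.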




* Re: Add NAPI support to ll_temac driver
  2011-04-19 10:43 ` Eric Dumazet
@ 2011-04-19 12:25   ` Ben Hutchings
  2011-04-19 12:48     ` Michal Simek
  2011-04-19 12:26   ` Michal Simek
  1 sibling, 1 reply; 11+ messages in thread
From: Ben Hutchings @ 2011-04-19 12:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: monstr, netdev

On Tue, 2011-04-19 at 12:43 +0200, Eric Dumazet wrote:
[...]
> One possible way to get better performance is to change driver to
> allocate skbs only right before calling netif_rx(), so that you dont
> have to access cold sk_buff data twice (once when allocating skb and put
> it in ring buffer, a second time when receiving frame)
>
> drivers/net/niu.c is a good example for this (NAPI + netdev_alloc_skb()
> just in time + pull in skbhead only first cache line of packet)
[...]

If the hardware can do RX checksumming (it's not clear) then the driver
should pass the paged buffers into GRO and that will take care of skb
allocation as necessary.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* Re: Add NAPI support to ll_temac driver
  2011-04-19 10:43 ` Eric Dumazet
  2011-04-19 12:25   ` Ben Hutchings
@ 2011-04-19 12:26   ` Michal Simek
  1 sibling, 0 replies; 11+ messages in thread
From: Michal Simek @ 2011-04-19 12:26 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Eric Dumazet wrote:
> On Tuesday 19 April 2011 at 11:35 +0200, Michal Simek wrote:
>> Hi,
>>
>> I would like to try to add NAPI support for ll_temac and look if help us to 
>> improve performance on Microblaze system. I would expect that bandwidth should 
>> be increased.
>> We have the second non mainline driver which use tasklets and it provides better 
>>   performance than mainline driver but not so big that's why I think that NAPI 
>> can increase performance.
>>
>> Can you please point me to any driver which I could use as a template?
>> Or any developer guide to do so.
>>
>> Do you know any other option how to improve driver performance on low speed cpu?
>>
>> I have found that driver spends a lot of time on skb allocation and preallocated 
>> SKBs help a little bit. I have done a test where I increased number of 
>> preallocated BDs(SKBs) for rx to 35000 and disable new BD(SKB) allocation in 
>> rx_irq. 35000 BDs is setup because I need them to successfully finish netperf 
>> test. I have got 25% bandwidth increasing.
>>
>> It will be also nice to be able to allocate several BDs(SKBs) which could be 
>> faster than allocate them in sequence.
> 
> Depends if your cpu has some cache. The best performance is to try to
> get high cache hit ratios.

Yes, it has an icache and a dcache (write-back or write-through).


> 
> One possible way to get better performance is to change driver to
> allocate skbs only right before calling netif_rx(), so that you dont
> have to access cold sk_buff data twice (once when allocating skb and put
> it in ring buffer, a second time when receiving frame)

OK. But I need to allocate a BD for DMA with a pointer to the skb that DMA
should copy the data into. I could do it in the irq, but then I would have to
wait until DMA has copied the data from the ethernet controller to memory. I
haven't measured how slow or fast that copy is.

> 
> drivers/net/niu.c is a good example for this (NAPI + netdev_alloc_skb()
> just in time + pull in skbhead only first cache line of packet)
> 
> drivers/net/ftmac100.c is also a recent driver (and probably a better
> start with less complex hardware than NIU) using these tricks
> 
> { skb = netdev_alloc_skb_ip_align(netdev, 128);
>  __pskb_pull_tail(skb, min(length, 64)); 
> }

I have changed rx to use NAPI but need to debug it a little. It works for some
packets, but I am not able to run any tests right now.

Thanks,
Michal



* Re: Add NAPI support to ll_temac driver
  2011-04-19 12:25   ` Ben Hutchings
@ 2011-04-19 12:48     ` Michal Simek
  2011-04-19 13:13       ` Eric Dumazet
                         ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Michal Simek @ 2011-04-19 12:48 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Eric Dumazet, netdev

Ben Hutchings wrote:
> On Tue, 2011-04-19 at 12:43 +0200, Eric Dumazet wrote:
> [...]
>> One possible way to get better performance is to change driver to
>> allocate skbs only right before calling netif_rx(), so that you dont
>> have to access cold sk_buff data twice (once when allocating skb and put
>> it in ring buffer, a second time when receiving frame)
>>
>> drivers/net/niu.c is a good example for this (NAPI + netdev_alloc_skb()
>> just in time + pull in skbhead only first cache line of packet)
> [...]
> 
> If the hardware can do RX checksumming (it's not clear) then the driver
> should pass the paged buffers into GRO and that will take care of skb
> allocation as necessary.

The hardware supports RX and TX partial checksumming, and I can enable it. The
driver also has this option, and in my tests it does give some performance
improvement.

Just to be sure, here are links to the documentation:
http://www.xilinx.com/support/documentation/ip_documentation/xps_ll_temac.pdf
or
http://www.xilinx.com/support/documentation/ip_documentation/axi_ethernet/v2_01_a/ds759_axi_ethernet.pdf

About SKB allocation: I fixed our non-mainline driver to allocate skbs based on
the current MTU size. The mainline driver allocates for the maximum MTU (9k).
This also affects performance, because Microblaze then works with smaller SKBs.

Can you please be more specific about passing the paged buffers into GRO?
Or point me to any documentation or code that would help me understand what
that means?

Thanks,
Michal



* Re: Add NAPI support to ll_temac driver
  2011-04-19 12:48     ` Michal Simek
@ 2011-04-19 13:13       ` Eric Dumazet
  2011-04-19 13:14       ` Ben Hutchings
  2011-04-19 13:21       ` Eric Dumazet
  2 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2011-04-19 13:13 UTC (permalink / raw)
  To: monstr; +Cc: Ben Hutchings, netdev

On Tuesday 19 April 2011 at 14:48 +0200, Michal Simek wrote:

> Can you please be more specific about passing the paged buffers into GRO?
> Or point me to any documentation or code which can help me to understand what 
> that means.

Search for napi_get_frags() :

drivers/net/mlx4/en_rx.c:597:					struct sk_buff *gro_skb = napi_get_frags(&cq->napi);
drivers/net/cxgb3/sge.c:2091:		skb = napi_get_frags(&qs->napi);
drivers/net/cxgb4/sge.c:1517:	skb = napi_get_frags(&rxq->rspq.napi);
drivers/net/qlge/qlge_main.c:1484:	skb = napi_get_frags(napi);
drivers/net/sfc/rx.c:471:		skb = napi_get_frags(napi);
drivers/net/benet/be_main.c:1039:	skb = napi_get_frags(&eq_obj->napi);
drivers/net/cxgb4vf/sge.c:1479:	skb = napi_get_frags(&rxq->rspq.napi);





* Re: Add NAPI support to ll_temac driver
  2011-04-19 12:48     ` Michal Simek
  2011-04-19 13:13       ` Eric Dumazet
@ 2011-04-19 13:14       ` Ben Hutchings
  2011-04-19 13:18         ` Michal Simek
  2011-04-20 11:06         ` Michal Simek
  2011-04-19 13:21       ` Eric Dumazet
  2 siblings, 2 replies; 11+ messages in thread
From: Ben Hutchings @ 2011-04-19 13:14 UTC (permalink / raw)
  To: monstr; +Cc: Eric Dumazet, netdev

On Tue, 2011-04-19 at 14:48 +0200, Michal Simek wrote:
> Ben Hutchings wrote:
> > On Tue, 2011-04-19 at 12:43 +0200, Eric Dumazet wrote:
> > [...]
> >> One possible way to get better performance is to change driver to
> >> allocate skbs only right before calling netif_rx(), so that you dont
> >> have to access cold sk_buff data twice (once when allocating skb and put
> >> it in ring buffer, a second time when receiving frame)
> >>
> >> drivers/net/niu.c is a good example for this (NAPI + netdev_alloc_skb()
> >> just in time + pull in skbhead only first cache line of packet)
> > [...]
> > 
> > If the hardware can do RX checksumming (it's not clear) then the driver
> > should pass the paged buffers into GRO and that will take care of skb
> > allocation as necessary.
> 
> Hardware supports RX and TX partial checksumming. I can enable it. The driver 
> has also this option and from my tests there is of course some performance 
> improvement.
> 
> Just for sure - here are links on documentation.
> http://www.xilinx.com/support/documentation/ip_documentation/xps_ll_temac.pdf
> or
> http://www.xilinx.com/support/documentation/ip_documentation/axi_ethernet/v2_01_a/ds759_axi_ethernet.pdf

I'm not going to read those.  Just providing brief advice.

> About SKB allocation. I fixed our non mainline driver to allocate skb based on 
> current mtu size. Mainline driver allocate max mtu (9k). This has also impact on 
> performance because Microblaze works with smaller SKBs.
> 
> Can you please be more specific about passing the paged buffers into GRO?
> Or point me to any documentation or code which can help me to understand what 
> that means.

You would use napi_get_frags() to get a new or recycled skb, fill in
skb->frags, then call napi_gro_frags() to pass it into GRO.  The benet,
cxgb3 and sfc drivers do this.

Ben.


* Re: Add NAPI support to ll_temac driver
  2011-04-19 13:14       ` Ben Hutchings
@ 2011-04-19 13:18         ` Michal Simek
  2011-04-20 11:06         ` Michal Simek
  1 sibling, 0 replies; 11+ messages in thread
From: Michal Simek @ 2011-04-19 13:18 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Eric Dumazet, netdev

Ben Hutchings wrote:
> On Tue, 2011-04-19 at 14:48 +0200, Michal Simek wrote:
>> Ben Hutchings wrote:
>>> On Tue, 2011-04-19 at 12:43 +0200, Eric Dumazet wrote:
>>> [...]
>>>> One possible way to get better performance is to change driver to
>>>> allocate skbs only right before calling netif_rx(), so that you dont
>>>> have to access cold sk_buff data twice (once when allocating skb and put
>>>> it in ring buffer, a second time when receiving frame)
>>>>
>>>> drivers/net/niu.c is a good example for this (NAPI + netdev_alloc_skb()
>>>> just in time + pull in skbhead only first cache line of packet)
>>> [...]
>>>
>>> If the hardware can do RX checksumming (it's not clear) then the driver
>>> should pass the paged buffers into GRO and that will take care of skb
>>> allocation as necessary.
>> Hardware supports RX and TX partial checksumming. I can enable it. The driver 
>> has also this option and from my tests there is of course some performance 
>> improvement.
>>
>> Just for sure - here are links on documentation.
>> http://www.xilinx.com/support/documentation/ip_documentation/xps_ll_temac.pdf
>> or
>> http://www.xilinx.com/support/documentation/ip_documentation/axi_ethernet/v2_01_a/ds759_axi_ethernet.pdf
> 
> I'm not going to read those.  Just providing brief advice.
> 
>> About SKB allocation. I fixed our non mainline driver to allocate skb based on 
>> current mtu size. Mainline driver allocate max mtu (9k). This has also impact on 
>> performance because Microblaze works with smaller SKBs.
>>
>> Can you please be more specific about passing the paged buffers into GRO?
>> Or point me to any documentation or code which can help me to understand what 
>> that means.
> 
> You would use napi_get_frags() to get a new or recycled skb, fill in
> skb->frags, then call napi_gro_frags() to pass it into GRO.  The benet,
> cxgb3 and sfc drivers do this.

ok. Will look.

Thanks,
Michal


* Re: Add NAPI support to ll_temac driver
  2011-04-19 12:48     ` Michal Simek
  2011-04-19 13:13       ` Eric Dumazet
  2011-04-19 13:14       ` Ben Hutchings
@ 2011-04-19 13:21       ` Eric Dumazet
  2 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2011-04-19 13:21 UTC (permalink / raw)
  To: monstr; +Cc: Ben Hutchings, netdev

On Tuesday 19 April 2011 at 14:48 +0200, Michal Simek wrote:

> About SKB allocation. I fixed our non mainline driver to allocate skb based on 
> current mtu size. Mainline driver allocate max mtu (9k). This has also impact on 
> performance because Microblaze works with smaller SKBs.
> 

Make sure it won't crash or freeze if some evil guy sends a big frame.
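
One way to picture the check is the small userspace C sketch below. The
function names and MODEL_* constants are hypothetical, and it assumes the
hardware reports a received length per descriptor; whether ll_temac
truncates or rejects oversized frames on its own is not stated in this
thread. A software check like this only guards what is passed up the
stack; keeping the DMA engine from writing past the buffer has to be
ensured by how the descriptor or MAC is programmed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_ETH_HLEN  14  /* Ethernet header */
#define MODEL_VLAN_HLEN  4  /* optional 802.1Q tag */
#define MODEL_FCS_LEN    4  /* frame check sequence */

/* RX buffer size needed for a given MTU (assumed layout: header,
 * one VLAN tag, payload, FCS). */
size_t model_rx_buf_size(size_t mtu)
{
	return mtu + MODEL_ETH_HLEN + MODEL_VLAN_HLEN + MODEL_FCS_LEN;
}

/* Accept a completed descriptor only if the reported length is a
 * plausible frame and fits the buffer that was actually allocated. */
bool model_rx_len_ok(size_t reported_len, size_t buf_size)
{
	return reported_len >= MODEL_ETH_HLEN && reported_len <= buf_size;
}
```

With buffers sized for a 1500-byte MTU (1522 bytes here), a 9018-byte
jumbo frame is rejected and would be counted as an RX error instead of
being handed to the stack.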





* Re: Add NAPI support to ll_temac driver
  2011-04-19 13:14       ` Ben Hutchings
  2011-04-19 13:18         ` Michal Simek
@ 2011-04-20 11:06         ` Michal Simek
  2011-04-20 12:47           ` Ben Hutchings
  1 sibling, 1 reply; 11+ messages in thread
From: Michal Simek @ 2011-04-20 11:06 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Eric Dumazet, netdev

Hi,

Ben Hutchings wrote:
> On Tue, 2011-04-19 at 14:48 +0200, Michal Simek wrote:
>> Ben Hutchings wrote:
>>> On Tue, 2011-04-19 at 12:43 +0200, Eric Dumazet wrote:
>>> [...]
>>>> One possible way to get better performance is to change driver to
>>>> allocate skbs only right before calling netif_rx(), so that you dont
>>>> have to access cold sk_buff data twice (once when allocating skb and put
>>>> it in ring buffer, a second time when receiving frame)
>>>>
>>>> drivers/net/niu.c is a good example for this (NAPI + netdev_alloc_skb()
>>>> just in time + pull in skbhead only first cache line of packet)
>>> [...]
>>>
>>> If the hardware can do RX checksumming (it's not clear) then the driver
>>> should pass the paged buffers into GRO and that will take care of skb
>>> allocation as necessary.
>> Hardware supports RX and TX partial checksumming. I can enable it. The driver 
>> has also this option and from my tests there is of course some performance 
>> improvement.
>>
>> Just for sure - here are links on documentation.
>> http://www.xilinx.com/support/documentation/ip_documentation/xps_ll_temac.pdf
>> or
>> http://www.xilinx.com/support/documentation/ip_documentation/axi_ethernet/v2_01_a/ds759_axi_ethernet.pdf
> 
> I'm not going to read those.  Just providing brief advice.
> 
>> About SKB allocation. I fixed our non mainline driver to allocate skb based on 
>> current mtu size. Mainline driver allocate max mtu (9k). This has also impact on 
>> performance because Microblaze works with smaller SKBs.
>>
>> Can you please be more specific about passing the paged buffers into GRO?
>> Or point me to any documentation or code which can help me to understand what 
>> that means.
> 
> You would use napi_get_frags() to get a new or recycled skb, fill in
> skb->frags, then call napi_gro_frags() to pass it into GRO.  The benet,
> cxgb3 and sfc drivers do this.

I have measured the TX path and found that the driver design is not very good.
It always creates one BD per SKB, then starts DMA to copy the packet to the
controller and send it. On a 66 MHz CPU, sending a 1.5k packet takes
approximately 800 CPU cycles (not 800 instructions).
The current driver also enables the TX irq, so when the packet has been sent an
interrupt is generated and the skb is freed.
I see that handling the IRQ takes more time than busy-waiting until the DMA is
done. I looked at the sfc driver, and it has some kind of TX queue and
notifier. How does it work? Does it require any hardware support?
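
To put the figures above in perspective, here is a small arithmetic
sketch in C. The function names are hypothetical, and it assumes the 800
cycles were the only per-packet CPU cost, which is an idealisation (IRQ
handling and the rest of the stack come on top).

```c
#include <assert.h>
#include <stdint.h>

/* Time spent per packet, in nanoseconds (integer division). */
uint64_t model_ns_per_packet(uint64_t cpu_hz, uint64_t cycles)
{
	assert(cpu_hz > 0);
	return cycles * 1000000000ULL / cpu_hz;
}

/* Upper bound on packets per second if `cycles` were the whole cost. */
uint64_t model_max_pps(uint64_t cpu_hz, uint64_t cycles)
{
	assert(cycles > 0);
	return cpu_hz / cycles;
}

/* Corresponding upper bound on payload bits per second. */
uint64_t model_max_bps(uint64_t cpu_hz, uint64_t cycles, uint64_t bytes)
{
	return model_max_pps(cpu_hz, cycles) * bytes * 8;
}
```

With cpu_hz = 66000000, cycles = 800 and bytes = 1500 this gives about
12.1 us per packet, 82500 packets per second and 990 Mbit/s, so under that
assumption the submit path alone would not be the limit.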

Thanks,
Michal




* Re: Add NAPI support to ll_temac driver
  2011-04-20 11:06         ` Michal Simek
@ 2011-04-20 12:47           ` Ben Hutchings
  0 siblings, 0 replies; 11+ messages in thread
From: Ben Hutchings @ 2011-04-20 12:47 UTC (permalink / raw)
  To: monstr; +Cc: Eric Dumazet, netdev

On Wed, 2011-04-20 at 13:06 +0200, Michal Simek wrote:
[...]
> I have measured the TX path and found that the driver design is not very good.
> It always creates one BD per SKB, then starts DMA to copy the packet to the
> controller and send it.

You will always get a single packet at a time to push to the hardware.
The only improvement you can make on this is to implement segmentation
offload, but if the hardware doesn't do this then... well, it's not
easy.

> On a 66 MHz CPU, sending a 1.5k packet takes approximately 800 CPU cycles
> (not 800 instructions).
> The current driver also enables the TX irq, so when the packet has been sent an
> interrupt is generated and the skb is freed.
> I see that handling the IRQ takes more time than busy-waiting until the DMA is
> done. I looked at the sfc driver, and it has some kind of TX queue and
> notifier. How does it work? Does it require any hardware support?

The principle of NAPI is that once you receive an IRQ you mask it and
poll until there are no more completions to handle.  So it greatly
reduces this IRQ overhead.
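
The principle can be sketched with a small single-threaded userspace
model in C. Everything here (struct model_nic and the model_* functions)
is hypothetical and only mimics the control flow; it is not kernel code
and it ignores the races a real driver has to handle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a NIC and its driver state. */
struct model_nic {
	int  pending;      /* completions waiting in the ring */
	bool irq_enabled;  /* is the device interrupt unmasked? */
	bool scheduled;    /* is the poll routine queued to run? */
	int  irq_count;    /* interrupts actually taken */
	int  processed;    /* frames handed to the stack */
};

/* Interrupt handler: do no packet work, just mask the IRQ and
 * schedule the poll routine. */
void model_irq(struct model_nic *nic)
{
	if (!nic->irq_enabled)
		return;            /* masked: the device stays quiet */
	nic->irq_count++;
	nic->irq_enabled = false;
	nic->scheduled = true;
}

/* Frames arrive; an interrupt is raised only if it is unmasked. */
void model_rx_arrive(struct model_nic *nic, int frames)
{
	nic->pending += frames;
	model_irq(nic);
}

/* Poll routine: handle at most `budget` completions. If the ring was
 * drained before the budget ran out, stop polling and unmask the IRQ. */
int model_poll(struct model_nic *nic, int budget)
{
	int done = 0;

	assert(budget > 0);
	while (done < budget && nic->pending > 0) {
		nic->pending--;
		nic->processed++;
		done++;
	}
	if (done < budget) {
		nic->scheduled = false;
		nic->irq_enabled = true;
	}
	return done;
}
```

In this model a burst of 100 frames costs a single interrupt: the first
arrival masks the IRQ, and two poll calls with a budget of 64 drain the
ring before the IRQ is unmasked again. In a real driver the matching
steps are napi_schedule() in the interrupt handler and napi_complete() in
the poll callback registered with netif_napi_add().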

Ben.

