linux-kernel.vger.kernel.org archive mirror
* [RFC 0/3] lro: Generic Large Receive Offload for TCP traffic
@ 2007-07-11 14:21 Jan-Bernd Themann
  2007-07-12 16:01 ` Andrew Gallatin
  2007-07-15  6:57 ` David Miller
  0 siblings, 2 replies; 7+ messages in thread
From: Jan-Bernd Themann @ 2007-07-11 14:21 UTC (permalink / raw)
  To: netdev
  Cc: Christoph Raisch, Jan-Bernd Themann, linux-kernel, linux-ppc,
	Marcus Eder, Thomas Klein, Stefan Roscher

Generic Large Receive Offload proposal

After some discussions on the mailing list concerning our LRO approach,
we agreed to provide a generic LRO patch. The algorithm is based on
the version we developed for eHEA. The performance improvements we
observed were significant.

The LRO functionality is provided as a module, the files are put in
linux/net/ipv4/inet_lro.c
linux/include/linux/inet_lro.h

Would this be the proper place for this functionality?

Currently, the interface is exported via EXPORT_SYMBOL. Or should we use
EXPORT_SYMBOL_GPL instead?


The interface for the network drivers (see inet_lro.h):

A driver has to declare an "LRO management" struct (lro_mgr) and an LRO descriptor array
of a driver-defined size, and enter the address of the array and the number of its
elements in the lro_mgr struct. The driver also specifies in the lro_mgr struct how
many packets may be aggregated per TCP session. In addition, the driver provides a
function that locates the TCP and IP headers of an incoming packet (also entered in
the lro_mgr struct). For some ethernet chips this function doesn't need to do much
checking, as the TCP/IP checksums are already verified by the hardware, which also
provides information about the packet protocol.
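In code, the setup described above might look roughly like the following sketch. The
mydrv_* names are invented for illustration, and the lro_mgr field names follow the
posted patch as far as I can tell; details may differ:

```c
#define MYDRV_MAX_LRO_DESC 8

struct mydrv_priv {
	struct net_lro_mgr lro_mgr;
	struct net_lro_desc lro_desc[MYDRV_MAX_LRO_DESC];
};

/* Locate the IP and TCP headers of an incoming packet.  On hardware
 * that has already verified the checksums and reported the protocol,
 * little checking is needed here. */
static int mydrv_get_skb_header(struct sk_buff *skb, void **iphdr,
				void **tcph, u64 *hdr_flags, void *priv)
{
	struct iphdr *iph = (struct iphdr *)skb->data;

	if (iph->protocol != IPPROTO_TCP)
		return -1;

	*hdr_flags = LRO_IPV4 | LRO_TCP;
	*iphdr = iph;
	*tcph = (u8 *)iph + iph->ihl * 4;
	return 0;
}

static void mydrv_setup_lro(struct mydrv_priv *priv, struct net_device *dev)
{
	priv->lro_mgr.dev = dev;
	priv->lro_mgr.lro_arr = priv->lro_desc;	     /* descriptor array */
	priv->lro_mgr.max_desc = MYDRV_MAX_LRO_DESC; /* ...and its size */
	priv->lro_mgr.max_aggr = 64;	/* packets aggregated per session */
	priv->lro_mgr.get_skb_header = mydrv_get_skb_header;
}
```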

To pass packets to the network stack using LRO, the following functions are used
in the NAPI poll function:

void lro_receive_skb(struct net_lro_mgr *lro_mgr,
                     struct sk_buff *skb,
                     void *priv);

or 

void lro_vlan_hwaccel_receive_skb(struct net_lro_mgr *lro_mgr,
                                  struct sk_buff *skb,
                                  struct vlan_group *vgrp,
                                  u16 vlan_tag,
                                  void *priv);

and before leaving the poll function the driver has to flush all open LRO sessions 
and pass the aggregated packets to the stack:

void lro_flush_all(struct net_lro_mgr *lro_mgr);
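Put together, a NAPI poll routine using this interface might look roughly like the
sketch below. The driver names and the mydrv_next_rx_skb() helper are made up for
illustration; the poll signature and netif_rx_complete() follow the NAPI interface
of current kernels:

```c
static int mydrv_poll(struct net_device *dev, int *budget)
{
	struct mydrv_priv *priv = netdev_priv(dev);
	int quota = min(*budget, dev->quota);
	int done = 0;
	struct sk_buff *skb;

	while (done < quota && (skb = mydrv_next_rx_skb(priv))) {
		/* Hand each packet to LRO; aggregated packets are
		 * passed up to the stack by the LRO code itself. */
		lro_receive_skb(&priv->lro_mgr, skb, priv);
		done++;
	}

	/* Flush all open LRO sessions before leaving the poll function. */
	lro_flush_all(&priv->lro_mgr);

	*budget -= done;
	dev->quota -= done;
	if (done < quota) {
		netif_rx_complete(dev);
		return 0;
	}
	return 1;
}
```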

[RFC 3/3] includes an eHEA patch which shows how LRO is integrated
in the eHEA driver using this interface.


Regards,
Jan-Bernd

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC 0/3] lro: Generic Large Receive Offload for TCP traffic
  2007-07-11 14:21 [RFC 0/3] lro: Generic Large Receive Offload for TCP traffic Jan-Bernd Themann
@ 2007-07-12 16:01 ` Andrew Gallatin
  2007-07-15  6:57 ` David Miller
  1 sibling, 0 replies; 7+ messages in thread
From: Andrew Gallatin @ 2007-07-12 16:01 UTC (permalink / raw)
  To: Jan-Bernd Themann
  Cc: netdev, Christoph Raisch, Jan-Bernd Themann, linux-kernel,
	linux-ppc, Marcus Eder, Thomas Klein, stefan.roscher, brice,
	loic

On 7/11/07, Jan-Bernd Themann <ossthema@de.ibm.com> wrote:
> Generic Large Receive Offload proposal

I'm very glad that somebody is stepping up to take responsibility
for this!

I'm the primary author of the Myricom Myri10GE driver, and its LRO mechanism
(which has been rejected several times when posted here).  I don't subscribe
to the LKML or netdev lists (the b/w is way too much for me).
Thankfully, my colleague Brice (who maintains our driver in the kernel)
forwarded me this message.

I looked at your patch, and I believe that we can improve the
performance further by using a slightly different approach, or at
least making that approach optional.

When I first did our LRO implementation, it aggregated packets
essentially the same way you are doing it -- by appending packets to
the frag_list.  I did quite extensive profiling, and the most expensive
operations seemed to be the allocation and freeing of memory.  A
colleague of mine (Loic, CC'ed) came up with the brilliant idea of
receiving into pages.  When I implemented his suggestion, it turned
out to be much, much more efficient to receive into pages, and to
accumulate the pages to the frags array.

The benchmarks I did on very low end machines in my lab (2GHz
amd64 x2 3800+) showed that the receiver was saturated at roughly
4.2Gb/s without LRO, 5.7Gb/s with frag_list based LRO, 8.6Gb/s with
frags array based LRO, and somewhat idle at line rate with frags array
based LRO and 32KB order=3 pages. (This is 1500b frames, BTW).

The savings come from being able to do fewer allocations.  For
example, two 1500b packets fit in a single 4KB page.  So, for a "full"
frag array, we have 16 frags each filling half a 4KB page (8 pages)
and a single skb holding them.  This works out to 9 allocations for
roughly 23KB of payload, rather than 16.  Using order-3 (32KB) pages,
it gets even better, and we have just 2 allocations per full skb frag
list.

So... It would be wonderful if your patch could also deal with
data residing in pages, rather than in skbs.  I can understand
how you might not want to modify your driver to do this, which
is why I'm asking about making it optional.  However,
your driver would probably benefit from receiving into
pages, and I encourage you to investigate that.

I'm picturing an additional path into lro, such as:

int lro_maybe_receive_frags(struct net_lro_mgr *lro_mgr,
                struct skb_frag_struct *frags,
                int len,  void *priv);


This would return 0 if the packet was "accepted", and an
error if it was not.  It would then call a modified
__lro_proc_skb() (perhaps better named __lro_proc_segment())
which would have the length as an argument (so as to avoid
the need to pass the skb to lro_tcp_ip_check()) in addition
to the *frags.

The only real additional work would be in having an alternate path
inside lro_init_desc() which allocated an skb to hang the pages from
if the passed skb was null, and in having an alternate
lro_add_packet() path which added the frag(s), rather than chaining an
skb.
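A driver-side call sequence for this entry point might look like the sketch below.
To be clear, lro_maybe_receive_frags() is only my proposal, not an existing API,
and the mydrv_* names and fallback helper are invented for illustration:

```c
/* Hypothetical RX path for the proposed page-based LRO entry point. */
static void mydrv_rx_frags(struct mydrv_priv *priv,
			   struct skb_frag_struct *frags,
			   int nfrags, int len)
{
	if (lro_maybe_receive_frags(&priv->lro_mgr, frags, len, priv) != 0) {
		/* Not a candidate for aggregation (e.g. not TCP/IPv4):
		 * build a regular skb from the pages and hand it to
		 * the stack directly. */
		struct sk_buff *skb = mydrv_build_skb(priv, frags,
						      nfrags, len);
		if (skb)
			netif_receive_skb(skb);
	}
}
```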

Also, your patch does not seem to maintain TCP checksums.
To be fair, we did not maintain checksums in our earlier LRO
work either, and it was much simpler that way! :) However,
not maintaining the TCP checksum can lead to user complaints
(invalid csum reported by traffic sniffers), as well as problems
when various filtering software is used which attempts to
re-write the TCP header.

I would very much appreciate it if you could look at my LRO
implementation in our Myri10GE driver which has been
posted (and rejected) several times in the past by Brice.
The source code is also available at:
http://www.myri.com/ftp/pub/Myri10GE/myri10ge-linux.1.3.0.tgz.
It uses a page based approach, and it maintains correct TCP
checksums.

This driver is Dual BSD/GPL licensed, and you are free to take
whatever you like.

Thank you again for stepping up with a generic implementation!

Drew


* Re: [RFC 0/3] lro: Generic Large Receive Offload for TCP traffic
  2007-07-11 14:21 [RFC 0/3] lro: Generic Large Receive Offload for TCP traffic Jan-Bernd Themann
  2007-07-12 16:01 ` Andrew Gallatin
@ 2007-07-15  6:57 ` David Miller
  2007-07-15  9:12   ` Christoph Hellwig
  1 sibling, 1 reply; 7+ messages in thread
From: David Miller @ 2007-07-15  6:57 UTC (permalink / raw)
  To: ossthema
  Cc: netdev, raisch, themann, linux-kernel, linuxppc-dev, meder,
	tklein, stefan.roscher


I think this work is great.  Thanks for doing it.

Besides the fixups Evgeniy has suggested, and the idea of
receiving into pages to cut down allocation costs, my
only request is to make this thing able to handle ipv6 as
well, even if no current chips can facilitate that yet.


* Re: [RFC 0/3] lro: Generic Large Receive Offload for TCP traffic
  2007-07-15  6:57 ` David Miller
@ 2007-07-15  9:12   ` Christoph Hellwig
  2007-07-15  9:40     ` David Miller
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2007-07-15  9:12 UTC (permalink / raw)
  To: David Miller
  Cc: ossthema, netdev, raisch, themann, linux-kernel, linuxppc-dev,
	meder, tklein, stefan.roscher

On Sat, Jul 14, 2007 at 11:57:48PM -0700, David Miller wrote:
> only request is to make this thing able to handle ipv6 as
> well even if no current chips could facilitate that yet.

I'm not sure that's a good idea.  If current chips can't handle ipv6
lro there is no way to actually test it and the code will surely bitrot.



* Re: [RFC 0/3] lro: Generic Large Receive Offload for TCP traffic
  2007-07-15  9:12   ` Christoph Hellwig
@ 2007-07-15  9:40     ` David Miller
  2007-07-18 13:00       ` [RFC 0/3] lro: Generic Large Receive Offload for TCP traffic (IPv6) Jan-Bernd Themann
  0 siblings, 1 reply; 7+ messages in thread
From: David Miller @ 2007-07-15  9:40 UTC (permalink / raw)
  To: hch
  Cc: ossthema, netdev, raisch, themann, linux-kernel, linuxppc-dev,
	meder, tklein, stefan.roscher

From: Christoph Hellwig <hch@infradead.org>
Date: Sun, 15 Jul 2007 10:12:53 +0100

> I'm not sure that's a good idea.  If current chips can't handle ipv6
> lro there is no way to actually test it and the code will surely bitrot.

Christoph, you can do LRO pretty much completely in software.



* Re: [RFC 0/3] lro: Generic Large Receive Offload for TCP traffic (IPv6)
  2007-07-15  9:40     ` David Miller
@ 2007-07-18 13:00       ` Jan-Bernd Themann
  2007-07-18 22:23         ` David Miller
  0 siblings, 1 reply; 7+ messages in thread
From: Jan-Bernd Themann @ 2007-07-18 13:00 UTC (permalink / raw)
  To: David Miller
  Cc: hch, netdev, raisch, themann, linux-kernel, linuxppc-dev, meder,
	tklein, stefan.roscher

Hi,

I suggest we keep the interface open for IPv6 support by adding
an additional parameter, but first get only IPv4 support
into the kernel. IPv6 support can then be added incrementally.
Would that be ok?



On Sunday 15 July 2007 11:40, David Miller wrote:
> From: Christoph Hellwig <hch@infradead.org>
> Date: Sun, 15 Jul 2007 10:12:53 +0100
> 
> > I'm not sure that's a good idea.  If current chips can't handle ipv6
> > lro there is no way to actually test it and the code will surely bitrot.
> 
> Christoph, you can do LRO pretty much completely in software.
> 
> 


* Re: [RFC 0/3] lro: Generic Large Receive Offload for TCP traffic (IPv6)
  2007-07-18 13:00       ` [RFC 0/3] lro: Generic Large Receive Offload for TCP traffic (IPv6) Jan-Bernd Themann
@ 2007-07-18 22:23         ` David Miller
  0 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2007-07-18 22:23 UTC (permalink / raw)
  To: ossthema
  Cc: hch, netdev, raisch, themann, linux-kernel, linuxppc-dev, meder,
	tklein, stefan.roscher

From: Jan-Bernd Themann <ossthema@de.ibm.com>
Date: Wed, 18 Jul 2007 15:00:59 +0200

> Hi,
> 
> I suggest we keep the interface open for IPv6 support by adding
> an additional parameter, but first get only IPv4 support
> into the kernel. IPv6 support can then be added incrementally.
> Would that be ok?

I guess so.

