All of lore.kernel.org
 help / color / mirror / Atom feed
* Adding Support for SG,GSO,GRO
@ 2010-12-09 10:33 Govindarajan, Sriramakrishnan
  2010-12-09 10:50 ` Eric Dumazet
  2010-12-09 15:02 ` Ben Hutchings
  0 siblings, 2 replies; 16+ messages in thread
From: Govindarajan, Sriramakrishnan @ 2010-12-09 10:33 UTC (permalink / raw)
  To: netdev

Hi
We have a NAPI compliant driver(net/drivers/davinci_emac.c), that does well at 10/100Mbps loads. Now the same controller/driver is used for 1000Mbps
mode as well, where the CPU gets saturated easily

Internally the module supports scatter gather DMA(which is currently not
exercised) but there is no HW checksum support.

To specifically implement GRO, GSO support would it be sufficient to add
SG support to the driver? Are there other means of increasing the throughput
and decreasing the CPU loading?

Any pointers to reference implementation for adding SG/GRO/GSO support will be helpful.

Regards
Sriram

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Adding Support for SG,GSO,GRO
  2010-12-09 10:33 Adding Support for SG,GSO,GRO Govindarajan, Sriramakrishnan
@ 2010-12-09 10:50 ` Eric Dumazet
  2010-12-09 14:49   ` Govindarajan, Sriramakrishnan
  2010-12-09 15:02 ` Ben Hutchings
  1 sibling, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2010-12-09 10:50 UTC (permalink / raw)
  To: Govindarajan, Sriramakrishnan; +Cc: netdev

Le jeudi 09 décembre 2010 à 16:03 +0530, Govindarajan, Sriramakrishnan a
écrit :
> Hi
> We have a NAPI compliant driver(net/drivers/davinci_emac.c), that does well at 10/100Mbps loads. Now the same controller/driver is used for 1000Mbps
> mode as well, where the CPU gets saturated easily
> 
> Internally the module supports scatter gather DMA(which is currently not
> exercised) but there is no HW checksum support.
> 
> To specifically implement GRO, GSO support would it be sufficient to add
> SG support to the driver? Are there other means of increasing the throughput
> and decreasing the CPU loading?
> 
> Any pointers to reference implementation for adding SG/GRO/GSO support will be helpful.

Adding GRO is pretty easy, since you already are NAPI.

call 
	napi_gro_receive(&adapter->napi, skb) 
instead of 
	netif_receive_skb(skb);

(

Take commit 6a08d194ee4080 as an example of such conversion

http://git.kernel.org/?p=linux/kernel/git/davem/net-next-2.6.git;a=commitdiff;h=6a08d194ee40806e0ccd5f36ed768e64cbfc979f




^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: Adding Support for SG,GSO,GRO
  2010-12-09 10:50 ` Eric Dumazet
@ 2010-12-09 14:49   ` Govindarajan, Sriramakrishnan
  0 siblings, 0 replies; 16+ messages in thread
From: Govindarajan, Sriramakrishnan @ 2010-12-09 14:49 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Hi Eric,
> > Internally the module supports scatter gather DMA(which is currently not
> > exercised) but there is no HW checksum support.
> >
> > To specifically implement GRO, GSO support would it be sufficient to add
> > SG support to the driver? Are there other means of increasing the
> throughput
> > and decreasing the CPU loading?
> >
> > Any pointers to reference implementation for adding SG/GRO/GSO support
> will be helpful.
> 
> Adding GRO is pretty easy, since you already are NAPI.
> 
> call
> 	napi_gro_receive(&adapter->napi, skb)
> instead of
> 	netif_receive_skb(skb);
> 

[Sriram] I have tried this, but didn't see any improvement in performance.
For GRO and GSO, will adding SG capability alone suffice or the real gains
are realized only if HW checksum is also supported.

Can adding SG support alone yield any improvement?

Regards
Sriram

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Adding Support for SG,GSO,GRO
  2010-12-09 10:33 Adding Support for SG,GSO,GRO Govindarajan, Sriramakrishnan
  2010-12-09 10:50 ` Eric Dumazet
@ 2010-12-09 15:02 ` Ben Hutchings
  2010-12-09 18:29   ` David Miller
  2010-12-09 18:47   ` Michał Mirosław
  1 sibling, 2 replies; 16+ messages in thread
From: Ben Hutchings @ 2010-12-09 15:02 UTC (permalink / raw)
  To: Govindarajan, Sriramakrishnan; +Cc: netdev

On Thu, 2010-12-09 at 16:03 +0530, Govindarajan, Sriramakrishnan wrote:
> Hi
> We have a NAPI compliant driver(net/drivers/davinci_emac.c), that does
> well at 10/100Mbps loads. Now the same controller/driver is used for
> 1000Mbps
> mode as well, where the CPU gets saturated easily
> 
> Internally the module supports scatter gather DMA(which is currently not
> exercised) but there is no HW checksum support.
> 
> To specifically implement GRO, GSO support would it be sufficient to add
> SG support to the driver? Are there other means of increasing the throughput
> and decreasing the CPU loading?
[...]

On the RX side, the driver is allocating buffers and the stack doesn't
actually care whether you can do DMA scatter.  However GRO depends on
hardware checksum validation.

On the TX side, NETIF_F_SG means that the stack may include data in the
skb by reference to arbitrary pages *even if their contents are still
being changed* (think sendfile()), which means it depends on hardware
checksum generation.

So you really have no choice - you must implement hardware checksum
offload if you want any of the others.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Adding Support for SG,GSO,GRO
  2010-12-09 15:02 ` Ben Hutchings
@ 2010-12-09 18:29   ` David Miller
  2010-12-09 18:47   ` Michał Mirosław
  1 sibling, 0 replies; 16+ messages in thread
From: David Miller @ 2010-12-09 18:29 UTC (permalink / raw)
  To: bhutchings; +Cc: srk, netdev

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Thu, 09 Dec 2010 15:02:28 +0000

> So you really have no choice - you must implement hardware checksum
> offload if you want any of the others.

Right, advertising SG support is entirely pointless if you aren't
also advertising HW checksumming support.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Adding Support for SG,GSO,GRO
  2010-12-09 15:02 ` Ben Hutchings
  2010-12-09 18:29   ` David Miller
@ 2010-12-09 18:47   ` Michał Mirosław
  2010-12-09 18:58     ` Ben Hutchings
  2010-12-09 19:38     ` David Miller
  1 sibling, 2 replies; 16+ messages in thread
From: Michał Mirosław @ 2010-12-09 18:47 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Govindarajan, Sriramakrishnan, netdev

2010/12/9 Ben Hutchings <bhutchings@solarflare.com>:
> On Thu, 2010-12-09 at 16:03 +0530, Govindarajan, Sriramakrishnan wrote:
>> Hi
>> We have a NAPI compliant driver(net/drivers/davinci_emac.c), that does
>> well at 10/100Mbps loads. Now the same controller/driver is used for
>> 1000Mbps
>> mode as well, where the CPU gets saturated easily
>>
>> Internally the module supports scatter gather DMA(which is currently not
>> exercised) but there is no HW checksum support.
>>
>> To specifically implement GRO, GSO support would it be sufficient to add
>> SG support to the driver? Are there other means of increasing the throughput
>> and decreasing the CPU loading?
[...]
> On the TX side, NETIF_F_SG means that the stack may include data in the
> skb by reference to arbitrary pages *even if their contents are still
> being changed* (think sendfile()), which means it depends on hardware
> checksum generation.

Isn't that condition too broad? If the data could change after packet
is submitted to the driver then results would be unpredictable and
allow sending wrong data with correct (because hw-calculated)
checksum.

Right now NETIF_F_SG is removed from dev->features by
netdev_fix_features() if no checksum offloads are enabled.

Just an idea: would driver with NETIF_F_SG|NETIF_F_HW_CSUM using
skb_checksum_help() in xmit path work? This would allow to use DMA
scatter-gather without hardware checksumming (and avoid copying the
packet's data before sending).

Best Regards,
Michał Mirosław

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Adding Support for SG,GSO,GRO
  2010-12-09 18:47   ` Michał Mirosław
@ 2010-12-09 18:58     ` Ben Hutchings
  2010-12-10  8:27       ` Michał Mirosław
  2010-12-09 19:38     ` David Miller
  1 sibling, 1 reply; 16+ messages in thread
From: Ben Hutchings @ 2010-12-09 18:58 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: Govindarajan, Sriramakrishnan, netdev

On Thu, 2010-12-09 at 19:47 +0100, Michał Mirosław wrote:
> 2010/12/9 Ben Hutchings <bhutchings@solarflare.com>:
> > On Thu, 2010-12-09 at 16:03 +0530, Govindarajan, Sriramakrishnan wrote:
> >> Hi
> >> We have a NAPI compliant driver(net/drivers/davinci_emac.c), that does
> >> well at 10/100Mbps loads. Now the same controller/driver is used for
> >> 1000Mbps
> >> mode as well, where the CPU gets saturated easily
> >>
> >> Internally the module supports scatter gather DMA(which is currently not
> >> exercised) but there is no HW checksum support.
> >>
> >> To specifically implement GRO, GSO support would it be sufficient to add
> >> SG support to the driver? Are there other means of increasing the throughput
> >> and decreasing the CPU loading?
> [...]
> > On the TX side, NETIF_F_SG means that the stack may include data in the
> > skb by reference to arbitrary pages *even if their contents are still
> > being changed* (think sendfile()), which means it depends on hardware
> > checksum generation.
> 
> Isn't that condition too broad? If the data could change after packet
> is submitted to the driver then results would be unpredictable and
> allow sending wrong data with correct (because hw-calculated)
> checksum.

This is not done for a regular send(), only for functions such as
sendfile() which are specified to read the data asynchronously.

> Right now NETIF_F_SG is removed from dev->features by
> netdev_fix_features() if no checksum offloads are enabled.
> 
> Just an idea: would driver with NETIF_F_SG|NETIF_F_HW_CSUM using
> skb_checksum_help() in xmit path work? This would allow to use DMA
> scatter-gather without hardware checksumming (and avoid copying the
> packet's data before sending).

No, you cannot calculate a checksum for the fragments without also
copying them to ensure the data doesn't change afterward and invalidate
the checksum.  You could in theory make a copy into multiple fragments,
but there's no point in doing that unless the frame size is larger than
a page.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Adding Support for SG,GSO,GRO
  2010-12-09 18:47   ` Michał Mirosław
  2010-12-09 18:58     ` Ben Hutchings
@ 2010-12-09 19:38     ` David Miller
  2010-12-10 14:18       ` Michał Mirosław
  1 sibling, 1 reply; 16+ messages in thread
From: David Miller @ 2010-12-09 19:38 UTC (permalink / raw)
  To: mirqus; +Cc: bhutchings, srk, netdev

From: Michał Mirosław <mirqus@gmail.com>
Date: Thu, 9 Dec 2010 19:47:57 +0100

> Isn't that condition too broad? If the data could change after packet
> is submitted to the driver then results would be unpredictable and
> allow sending wrong data with correct (because hw-calculated)
> checksum.

They are intentionally like that, without question.

Otherwise we'd need to interlock with all application mapped,
filesystem, and other page writes while sending any page over the
network.

We absolutely do not want to have to freeze every page we try to send
via sendfile() or similar, the cost is just too high.

If the application or networked filesystem needs such synchronization,
it provides it for itself.

For example, SAMBA only uses sendfile() when the file has an op-lock
held on it.

The checksum requirement for using SG is not going away, so continuing
to discuss along the lines of removing that requirement is not a good
use of your time I don't think.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Adding Support for SG,GSO,GRO
  2010-12-09 18:58     ` Ben Hutchings
@ 2010-12-10  8:27       ` Michał Mirosław
  0 siblings, 0 replies; 16+ messages in thread
From: Michał Mirosław @ 2010-12-10  8:27 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Govindarajan, Sriramakrishnan, netdev

W dniu 9 grudnia 2010 19:58 użytkownik Ben Hutchings
<bhutchings@solarflare.com> napisał:
> On Thu, 2010-12-09 at 19:47 +0100, Michał Mirosław wrote:
>> 2010/12/9 Ben Hutchings <bhutchings@solarflare.com>:
>> > On Thu, 2010-12-09 at 16:03 +0530, Govindarajan, Sriramakrishnan wrote:
>> >> Hi
>> >> We have a NAPI compliant driver(net/drivers/davinci_emac.c), that does
>> >> well at 10/100Mbps loads. Now the same controller/driver is used for
>> >> 1000Mbps
>> >> mode as well, where the CPU gets saturated easily
>> >>
>> >> Internally the module supports scatter gather DMA(which is currently not
>> >> exercised) but there is no HW checksum support.
>> >>
>> >> To specifically implement GRO, GSO support would it be sufficient to add
>> >> SG support to the driver? Are there other means of increasing the throughput
>> >> and decreasing the CPU loading?
>> [...]
>> > On the TX side, NETIF_F_SG means that the stack may include data in the
>> > skb by reference to arbitrary pages *even if their contents are still
>> > being changed* (think sendfile()), which means it depends on hardware
>> > checksum generation.
>>
>> Isn't that condition too broad? If the data could change after packet
>> is submitted to the driver then results would be unpredictable and
>> allow sending wrong data with correct (because hw-calculated)
>> checksum.
> This is not done for a regular send(), only for functions such as
> sendfile() which are specified to read the data asynchronously.
>
>> Right now NETIF_F_SG is removed from dev->features by
>> netdev_fix_features() if no checksum offloads are enabled.
>>
>> Just an idea: would driver with NETIF_F_SG|NETIF_F_HW_CSUM using
>> skb_checksum_help() in xmit path work? This would allow to use DMA
>> scatter-gather without hardware checksumming (and avoid copying the
>> packet's data before sending).
> No, you cannot calculate a checksum for the fragments without also
> copying them to ensure the data doesn't change afterward and invalidate
> the checksum.  You could in theory make a copy into multiple fragments,
> but there's no point in doing that unless the frame size is larger than
> a page.

Then at least couple of drivers are broken in this way (ibmveth,
chelsio for UDP, there are other hits on skb_checksum_help).

I'm not familiar with splice/sendfile code, but I would expect, that
pages transferred with it have to be submitted after "data ready"
notification and pinned in memory by DMA mapping (with hardware
checksumming, CPU does not need to see the data, but still it has to
be there already) until the frame is transferred to the network
device. So, even if the contents of the page changes (because further
data is transferred), the part that is submitted to be sent should
not.

If that's really not the case, then splice/sendfile to network is
useless as it means random data corruption. Googling around shows
reports about sendfile() correllated corruption - could this the
cause?

Best Regards,
Michał Mirosław

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Adding Support for SG,GSO,GRO
  2010-12-09 19:38     ` David Miller
@ 2010-12-10 14:18       ` Michał Mirosław
  2010-12-10 14:31         ` David Lamparter
  0 siblings, 1 reply; 16+ messages in thread
From: Michał Mirosław @ 2010-12-10 14:18 UTC (permalink / raw)
  To: David Miller; +Cc: bhutchings, srk, netdev

2010/12/9 David Miller <davem@davemloft.net>:
> From: Michał Mirosław <mirqus@gmail.com>
> Date: Thu, 9 Dec 2010 19:47:57 +0100
>> Isn't that condition too broad? If the data could change after packet
>> is submitted to the driver then results would be unpredictable and
>> allow sending wrong data with correct (because hw-calculated)
>> checksum.
> They are intentionally like that, without question.
>
> Otherwise we'd need to interlock with all application mapped,
> filesystem, and other page writes while sending any page over the
> network.
>
> We absolutely do not want to have to freeze every page we try to send
> via sendfile() or similar, the cost is just too high.
>
> If the application or networked filesystem needs such synchronization,
> it provides it for itself.
>
> For example, SAMBA only uses sendfile() when the file has an op-lock
> held on it.
>
> The checksum requirement for using SG is not going away, so continuing
> to discuss along the lines of removing that requirement is not a good
> use of your time I don't think.

I'm trying to understand the dependency because it looks artificial for me.

Unless I totally misunderstood, you say that we accept bogus data to
being sent using sendfile(). If yes, then we might as well allow
broken checksum (CPU calculated, before data changed and then was sent
to network).

If the splice/sendfile is taken out of the picture, are there any
other scenarios, when data pages could be changed after
ndo_start_xmit() entry and before TX DMA completion (between dma_map
.. dma_unmap)? And is it really what can happen with splice/sendfile?

Best Regards,
Michał Mirosław

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Adding Support for SG,GSO,GRO
  2010-12-10 14:18       ` Michał Mirosław
@ 2010-12-10 14:31         ` David Lamparter
  2010-12-10 16:01           ` Michał Mirosław
  0 siblings, 1 reply; 16+ messages in thread
From: David Lamparter @ 2010-12-10 14:31 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: David Miller, bhutchings, srk, netdev

On Fri, Dec 10, 2010 at 03:18:11PM +0100, Michał Mirosław wrote:
> I'm trying to understand the dependency because it looks artificial for me.

You have the data you want to send in the RAM, somewhere, possibly
scattered. The application calls sendfile(). The kernel puts the
transmission in the network card's queue, which might already have lots
of entries.

A millisecond later - an eternity for the CPU - the card decides to do
the transmission.

However, the data might have changed in the meantime.

sendfile() is defined so that it works asynchronously, that means if you
change the data while it is in the queue, you get unpredictable results.

But, what you should NOT get is packets with an invalid checksum.
Whatever data you are sending, it needs to have a correct checksum.

Now, if the card does the checksum itself, everything is fine. But what
are you supposed to do if the card can't checksum? Call back the kernel
at the point where the card does the TX? That's pointless (and racy).
Pre-calculate the Checksum at submission time? Doesn't work, you would
have to make a copy of the data, so it doesn't change anymore, so the
checksum stays correct. But not copying the data is the whole point of
sendfile().


You see why SG without HW checksum is useless here?


-David


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Adding Support for SG,GSO,GRO
  2010-12-10 14:31         ` David Lamparter
@ 2010-12-10 16:01           ` Michał Mirosław
  2010-12-10 16:20             ` Ben Hutchings
                               ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Michał Mirosław @ 2010-12-10 16:01 UTC (permalink / raw)
  To: David Lamparter; +Cc: David Miller, bhutchings, srk, netdev, Jens Axboe

W dniu 10 grudnia 2010 15:31 użytkownik David Lamparter
<equinox@diac24.net> napisał:
> On Fri, Dec 10, 2010 at 03:18:11PM +0100, Michał Mirosław wrote:
>> I'm trying to understand the dependency because it looks artificial for me.
>
> You have the data you want to send in the RAM, somewhere, possibly
> scattered. The application calls sendfile(). The kernel puts the
> transmission in the network card's queue, which might already have lots
> of entries.
>
> A millisecond later - an eternity for the CPU - the card decides to do
> the transmission.
>
> However, the data might have changed in the meantime.
>
> sendfile() is defined so that it works asynchronously, that means if you
> change the data while it is in the queue, you get unpredictable results.
>
> But, what you should NOT get is packets with an invalid checksum.
> Whatever data you are sending, it needs to have a correct checksum.
>
> Now, if the card does the checksum itself, everything is fine. But what
> are you supposed to do if the card can't checksum? Call back the kernel
> at the point where the card does the TX? That's pointless (and racy).
> Pre-calculate the Checksum at submission time? Doesn't work, you would
> have to make a copy of the data, so it doesn't change anymore, so the
> checksum stays correct. But not copying the data is the whole point of
> sendfile().

The question is do we really want good checksum for bogus data? I
think that what matters is that good data (it had not changed between
queuing and sending, and so the checksum does not depend on whether
hardware or software calculated it) need to be accompanied by good
checksum. For broken data it would be even better to not send
anything, but that's not always possible. Bad checksum in this case is
actually a good thing as it clearly shows that something is broken in
the sender and avoids accepting the data as valid at the receiving
end.

sendfile() is supposed to replace read()/write() loops to avoid
copying data buffers. Whatever the optimizations, it has no additional
cavat that it might corrupt the transfer. Fixing the checksum in case
data gets corrupted is not the right thing, I think.

The change of file data might happen if sendfile() submits page from
pagecache when something writes to the file, or when an application
modifies vmsplice()-submitted memory. In current kernel sendfile() is
a wrapper around splice(), and so has a kernel pipe buffer between
file and the socket. Can this hide the possible data changes?

Best Regards,
Michał Mirosław

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Adding Support for SG,GSO,GRO
  2010-12-10 16:01           ` Michał Mirosław
@ 2010-12-10 16:20             ` Ben Hutchings
  2010-12-10 18:05               ` David Miller
  2010-12-10 16:23             ` David Lamparter
  2010-12-10 16:26             ` Eric Dumazet
  2 siblings, 1 reply; 16+ messages in thread
From: Ben Hutchings @ 2010-12-10 16:20 UTC (permalink / raw)
  To: Michał Mirosław
  Cc: David Lamparter, David Miller, srk, netdev, Jens Axboe

On Fri, 2010-12-10 at 17:01 +0100, Michał Mirosław wrote:
[...]
> The question is do we really want good checksum for bogus data?
[...]

It's not bogus data.  It's a snapshot of the file contents at some
arbitrary point in time.

Now please stop wasting your own time and that of the networking
maintainers, and remember to check for checksum offload next time you
need to select a network controller.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Adding Support for SG,GSO,GRO
  2010-12-10 16:01           ` Michał Mirosław
  2010-12-10 16:20             ` Ben Hutchings
@ 2010-12-10 16:23             ` David Lamparter
  2010-12-10 16:26             ` Eric Dumazet
  2 siblings, 0 replies; 16+ messages in thread
From: David Lamparter @ 2010-12-10 16:23 UTC (permalink / raw)
  To: Michał Mirosław
  Cc: David Lamparter, David Miller, bhutchings, srk, netdev, Jens Axboe

On Fri, Dec 10, 2010 at 05:01:33PM +0100, Michał Mirosław wrote:
> W dniu 10 grudnia 2010 15:31 użytkownik David Lamparter
> <equinox@diac24.net> napisał:
> > On Fri, Dec 10, 2010 at 03:18:11PM +0100, Michał Mirosław wrote:
> >> I'm trying to understand the dependency because it looks artificial for me.
> > sendfile() is defined so that it works asynchronously, that means if you
> > change the data while it is in the queue, you get unpredictable results.
> 
> The question is do we really want good checksum for bogus data?

The data isn't neccessarily bogus. It will be in some state inbetween
old and new. What that means is up to the application.

>                                          Bad checksum in this case is
> actually a good thing as it clearly shows that something is broken in
> the sender and avoids accepting the data as valid at the receiving
> end.

No, because nothing is broken. sendfile() is working as advertised. The
specification of sendfile() is that it sends out data, and it that it
grabs that data at some more or less random point in time. The data is
valid. It is some data that corresponds to what the application wanted
written. It might be a "future" version of the data, but it will not be
random. Unpredictable, yes, in that you won't know where it will choose
old and where new data. But not random or broken.


-David


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Adding Support for SG,GSO,GRO
  2010-12-10 16:01           ` Michał Mirosław
  2010-12-10 16:20             ` Ben Hutchings
  2010-12-10 16:23             ` David Lamparter
@ 2010-12-10 16:26             ` Eric Dumazet
  2 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2010-12-10 16:26 UTC (permalink / raw)
  To: Michał Mirosław
  Cc: David Lamparter, David Miller, bhutchings, srk, netdev, Jens Axboe

Le vendredi 10 décembre 2010 à 17:01 +0100, Michał Mirosław a écrit :
> W dniu 10 grudnia 2010 15:31 użytkownik David Lamparter
> <equinox@diac24.net> napisał:
> > On Fri, Dec 10, 2010 at 03:18:11PM +0100, Michał Mirosław wrote:
> >> I'm trying to understand the dependency because it looks artificial for me.
> >
> > You have the data you want to send in the RAM, somewhere, possibly
> > scattered. The application calls sendfile(). The kernel puts the
> > transmission in the network card's queue, which might already have lots
> > of entries.
> >
> > A millisecond later - an eternity for the CPU - the card decides to do
> > the transmission.
> >
> > However, the data might have changed in the meantime.
> >
> > sendfile() is defined so that it works asynchronously, that means if you
> > change the data while it is in the queue, you get unpredictable results.
> >
> > But, what you should NOT get is packets with an invalid checksum.
> > Whatever data you are sending, it needs to have a correct checksum.
> >
> > Now, if the card does the checksum itself, everything is fine. But what
> > are you supposed to do if the card can't checksum? Call back the kernel
> > at the point where the card does the TX? That's pointless (and racy).
> > Pre-calculate the Checksum at submission time? Doesn't work, you would
> > have to make a copy of the data, so it doesn't change anymore, so the
> > checksum stays correct. But not copying the data is the whole point of
> > sendfile().
> 
> The question is do we really want good checksum for bogus data?

A frame must be sent with good checksum, or you violate very basic
transport rule. You mix several layers here.

>  I
> think that what matters is that good data (it had not changed between
> queuing and sending, and so the checksum does not depend on whether
> hardware or software calculated it) need to be accompanied by good
> checksum. For broken data it would be even better to not send
> anything, but that's not always possible. Bad checksum in this case is
> actually a good thing as it clearly shows that something is broken in
> the sender and avoids accepting the data as valid at the receiving
> end.
> 

a crc checksum is very lazy, and not guarantee the file you received is
consistent. Is a "per frame" checksum, not a "per file" checksum. If you
want, add a MD5 sum and all be fine.

> sendfile() is supposed to replace read()/write() loops to avoid
> copying data buffers. Whatever the optimizations, it has no additional
> cavat that it might corrupt the transfer. Fixing the checksum in case
> data gets corrupted is not the right thing, I think.
> 

You cannot fix the dam thing, or you MUST play VM games, that we cant to
avoid at the very beginning.

> The change of file data might happen if sendfile() submits page from
> pagecache when something writes to the file, or when an application
> modifies vmsplice()-submitted memory. In current kernel sendfile() is
> a wrapper around splice(), and so has a kernel pipe buffer between
> file and the socket. Can this hide the possible data changes?

No. Nothing in sendfile() API states that underlying file cannot change.

If we said : "We dont allow page content to change", then we _must_ add
special logic to detect a buggy application wont change the page while
we compute checksum or between this computation and final DMA to device.
 
That means playing VM games, and that is expensive, particularly in
multi threaded apps.

We instead say that : As we dont want to make this expensive checks, an
application is allowed to write into the page (but what should it, it
would be very stupid no ???), and a sendfile() wont be stopped/stalled
because of invalid checksum (because checksums are done after the page
content is fetched by NIC hardware)

But but but : the receiver of sendfile() get a file with possible
inconsistent content. In order to avoid this, higher level should take
locks (samba does), or make sure file is readonly.

> Unless I totally misunderstood, you say that we accept bogus data to
> being sent using sendfile(). If yes, then we might as well allow
> broken checksum (CPU calculated, before data changed and then was sent
> to network).
> 
> 


> If the splice/sendfile is taken out of the picture, are there any
> other scenarios, when data pages could be changed after
> ndo_start_xmit() entry and before TX DMA completion (between dma_map
> .. dma_unmap)? And is it really what can happen with splice/sendfile?
> 

If you want to perform checksum before DMA to device, then you also
should copy the page into private storage (a bounce buffer), to make
sure checksum wont be invalid.

You add one copy. Your choice, but not ours.





^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Adding Support for SG,GSO,GRO
  2010-12-10 16:20             ` Ben Hutchings
@ 2010-12-10 18:05               ` David Miller
  0 siblings, 0 replies; 16+ messages in thread
From: David Miller @ 2010-12-10 18:05 UTC (permalink / raw)
  To: bhutchings; +Cc: mirqus, equinox, srk, netdev, axboe

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Fri, 10 Dec 2010 16:20:28 +0000

> Now please stop wasting your own time and that of the networking
> maintainers, and remember to check for checksum offload next time you
> need to select a network controller.

Seriously, I'm incredibly glad someone other than me said this
because this is how I've felt since the first two postings in
this thread.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2010-12-10 18:04 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-09 10:33 Adding Support for SG,GSO,GRO Govindarajan, Sriramakrishnan
2010-12-09 10:50 ` Eric Dumazet
2010-12-09 14:49   ` Govindarajan, Sriramakrishnan
2010-12-09 15:02 ` Ben Hutchings
2010-12-09 18:29   ` David Miller
2010-12-09 18:47   ` Michał Mirosław
2010-12-09 18:58     ` Ben Hutchings
2010-12-10  8:27       ` Michał Mirosław
2010-12-09 19:38     ` David Miller
2010-12-10 14:18       ` Michał Mirosław
2010-12-10 14:31         ` David Lamparter
2010-12-10 16:01           ` Michał Mirosław
2010-12-10 16:20             ` Ben Hutchings
2010-12-10 18:05               ` David Miller
2010-12-10 16:23             ` David Lamparter
2010-12-10 16:26             ` Eric Dumazet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.