* CHECKSUM_COMPLETE question
From: Grygorii Strashko @ 2020-02-12 9:52 UTC (permalink / raw)
To: netdev, David S . Miller; +Cc: Hideaki YOSHIFUJI
Hi All,
I'd like to ask for expert opinion and clarify a few points about HW RX checksum offload.
1) CHECKSUM_COMPLETE - from description in <linux/skbuff.h>
* CHECKSUM_COMPLETE:
*
* This is the most generic way. The device supplied checksum of the _whole_
* packet as seen by netif_rx() and fills out in skb->csum. Meaning, the
* hardware doesn't need to parse L3/L4 headers to implement this.
My understanding from the above is that HW, to be fully compatible with Linux, should produce a CSUM
starting from the first byte after the EtherType field:
(6 DST_MAC) (6 SRC_MAC) (2 EtherType) (...            ...)
                                       ^                ^
                                       | start csum     | end csum
and ending at the last byte of the Ethernet frame data.
- if the packet is VLAN tagged, then the VLAN TCI and the real EtherType are included in the CSUM,
  but the first VLAN TPID is not
- pad bytes may or may not be included in the csum
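To make the coverage described above concrete, here is a minimal userspace sketch (not kernel code) of a 16-bit ones'-complement sum that starts right after the EtherType of an untagged frame. The names sketch_csum(), sketch_hw_rx_csum() and SKETCH_ETH_HLEN are hypothetical, and the sketch ignores the kernel's __wsum representation and byte-order details.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed untagged Ethernet header: 6 DST_MAC + 6 SRC_MAC + 2 EtherType. */
#define SKETCH_ETH_HLEN 14

/*
 * 16-bit ones'-complement sum over len bytes, taken as big-endian words.
 * An odd trailing byte is padded with zero. The 32-bit accumulator is only
 * folded at the end, which is fine for frame-sized buffers.
 */
static uint16_t sketch_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Hypothetical helper: the sum over everything after the EtherType field. */
static uint16_t sketch_hw_rx_csum(const uint8_t *frame, size_t frame_len)
{
	if (frame_len <= SKETCH_ETH_HLEN)
		return 0;
	return sketch_csum(frame + SKETCH_ETH_HLEN,
			   frame_len - SKETCH_ETH_HLEN);
}
```

With this layout the MAC addresses and the EtherType never contribute to the result; whether trailing pad bytes contribute depends only on the frame_len passed in.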
2) I've found a difference between IPv4 and IPv6 csum processing of fragmented packets.
Fragmented IPv4 UDP packet:
- the driver fills skb->csum and sets skb->ip_summed = CHECKSUM_COMPLETE for every fragment
- in ip_frag_queue() the ip_hdr is parsed and pulled, and the packet is queued, but
  there is *no* csum correction in this function for the pulled ip_hdr (no csum_sub() or similar calls)
- as a result, in inet_frag_reasm_finish() the skb->csum field is seen unmodified
- if the same packet is sent over VLAN, the skb->csum in inet_frag_reasm_finish() is seen as modified due to
  skb_vlan_untag()->skb_pull_rcsum()
Fragmented IPv6 UDP packet:
- the driver fills skb->csum and sets skb->ip_summed = CHECKSUM_COMPLETE for every fragment
- in ip6_frag_queue() the ipv6_hdr is parsed and pulled, and the packet is queued,
  *and the csum is corrected*
ip6_frag_queue()
	...
	if (skb->ip_summed == CHECKSUM_COMPLETE) {
		const unsigned char *nh = skb_network_header(skb);
		skb->csum = csum_sub(skb->csum,
				     csum_partial(nh, (u8 *)(fhdr + 1) - nh,
						  0));
	}
Is there any reason for this difference between IPv4 and IPv6?
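For reference, the arithmetic behind the csum_sub() correction in the IPv6 path can be sketched in userspace as below. This is only an illustration on folded 16-bit values, not the kernel implementation (which works on 32-bit __wsum values); all sketch_* names are hypothetical, and the pulled header length is assumed to be even.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones'-complement sum over len bytes (big-endian words). */
static uint16_t sketch_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones'-complement addition with end-around carry. */
static uint16_t sketch_csum16_add(uint16_t a, uint16_t b)
{
	uint32_t s = (uint32_t)a + b;

	return (uint16_t)((s & 0xffff) + (s >> 16));
}

/* Ones'-complement subtraction: a - b is a + ~b. */
static uint16_t sketch_csum16_sub(uint16_t a, uint16_t b)
{
	return sketch_csum16_add(a, (uint16_t)~b);
}

/* Remove the contribution of a pulled header from a packet-wide sum. */
static uint16_t sketch_pull_rcsum(uint16_t csum, const uint8_t *hdr,
				  size_t hdr_len)
{
	assert(hdr_len % 2 == 0); /* keeps 16-bit word alignment intact */
	return sketch_csum16_sub(csum, sketch_csum(hdr, hdr_len));
}
```

After the subtraction the remaining value equals the sum over the bytes that follow the header. Note that 0x0000 and 0xFFFF both represent zero in ones'-complement arithmetic, so a plain equality test can differ in that corner case.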
3) A few words about the new HW I'm working with.
The HW can parse IPv4/IPv6 and UDP/TCP headers and generate a csum including the pseudo-header checksum,
which works well for non-fragmented packets.
For fragmented packets (UDP for example):
- The first fragment has the UDP header (including pseudo header) and data included in the count.
- Middle and last fragments have only data included in the count.
As a result, SUM_ALL(frag->csum) == 0xFFFF means the packet csum is correct.
The above does not seem to work out of the box, at least not without csum manipulations similar
to what is done in mellanox/mlx4/en_rx.c (check_csum(), get_fixed_ipv4_csum(), get_fixed_ipv6_csum()).
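The SUM_ALL(frag->csum) == 0xFFFF check could be sketched roughly as follows. This is a userspace illustration only; sketch_frags_csum_ok() is a hypothetical name, and the per-fragment values are assumed to be the folded 16-bit sums reported by the HW as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement addition of two folded 16-bit sums. */
static uint16_t sketch_csum16_add(uint16_t a, uint16_t b)
{
	uint32_t s = (uint32_t)a + b;

	return (uint16_t)((s & 0xffff) + (s >> 16));
}

/*
 * Hypothetical check: frag_csum[i] is the value reported for fragment i
 * (pseudo header + UDP header + data for the first one, data only for
 * the rest). Returns 1 when the combined sum is 0xFFFF.
 */
static int sketch_frags_csum_ok(const uint16_t *frag_csum, size_t n)
{
	uint16_t total = 0;
	size_t i;

	assert(frag_csum != NULL || n == 0);
	for (i = 0; i < n; i++)
		total = sketch_csum16_add(total, frag_csum[i]);
	return total == 0xffff;
}
```

Adding the per-fragment sums directly relies on every fragment except the last carrying a multiple of 8 payload bytes, so the 16-bit word alignment is the same in each fragment as in the reassembled datagram.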
Thank you.
--
Best regards,
grygorii
* Re: CHECKSUM_COMPLETE question
From: Grygorii Strashko @ 2020-02-19 17:03 UTC (permalink / raw)
To: netdev, David S . Miller, Alexey Kuznetsov, Eric Dumazet, John Fastabend
Cc: Hideaki YOSHIFUJI
Hi All,
On 12/02/2020 11:52, Grygorii Strashko wrote:
> Hi All,
>
> I'd like to ask expert opinion and clarify few points about HW RX checksum offload.
>
> 1) CHECKSUM_COMPLETE - from description in <linux/skbuff.h>
> * CHECKSUM_COMPLETE:
> *
> * This is the most generic way. The device supplied checksum of the _whole_
> * packet as seen by netif_rx() and fills out in skb->csum. Meaning, the
> * hardware doesn't need to parse L3/L4 headers to implement this.
>
> My understanding from above is that HW, to be fully compatible with Linux, should produce CSUM
> starting from first byte after EtherType field:
> (6 DST_MAC) (6 SRC_MAC) (2 EtherType) (... ...)
> ^ ^
> | start csum | end csum
> and ending at the last byte of Ethernet frame data.
> - if packet is VLAN tagged then VLAN TCI and real EtherType included in CSUM,
> but first VLAN TPID doesn't
> - pad bytes may/may not be included in csum
>
>
> 2) I've found some difference between IPv4 and IPv6 csum processing of fragmented packets
>
> Fragmented IPv4 UDP packet:
> - driver fills skb->csum and sets skb->ip_summed = CHECKSUM_COMPLETE for every fragment
> - in ip_frag_queue() the ip_hdr is parsed, and polled, and packet queued, but
> there is *no* csum correction in this function for polled ip_hdr (no csum_sub() or similar calls)
> ^^^^
> - as result, in inet_frag_reasm_finish() the skb->csum field can be seen unmodified
> - if the same packet is sent over VLAN the skb->csum in inet_frag_reasm_finish() will be seen as modified due to
> skb_vlan_untag()->skb_pull_rcsum()
>
> Fragmented IPv6 UDP packet:
> - driver fills skb->csum and sets skb->ip_summed = CHECKSUM_COMPLETE for every fragment
> - in ip6_frag_queue() the ipv6_hdr s parsed, and polled, and packet queued,
> *and csum corrected*
>
> ip6_frag_queue()
> ...
> if (skb->ip_summed == CHECKSUM_COMPLETE) {
> const unsigned char *nh = skb_network_header(skb);
> skb->csum = csum_sub(skb->csum,
> csum_partial(nh, (u8 *)(fhdr + 1) - nh,
>
> Are there any reasons for such difference between IPv4 and IPv6?
I'm very sorry for disturbing you, but could anybody help with the above two questions?
>
> 3) few words about new HW i'm working with.
> The HW can parse IP4/IP6 and UDP/TCP headers and generate csum including pseudo header checksum
> which is working pretty well for non fragmented packets.
>
> For fragmented packets (UDP for example):
> - First fragments have the UDP header (including pseudo header) and data included in the count.
> - Middle and last fragments have only data included in the count
>
> As result when SUM_ALL(frag->csum) == 0xFFFF means packet csum is correct.
>
> Above seems will not be working out of the box, at least not without csum manipulations simialar
> to what is done in mellanox/mlx4/en_rx.c (check_csum(),get_fixed_ipv4_csum(), get_fixed_ipv6_csum())
>
>
> Thanks you.
>
--
Best regards,
grygorii
* Re: CHECKSUM_COMPLETE question
From: Willem de Bruijn @ 2020-02-19 22:59 UTC (permalink / raw)
To: Grygorii Strashko
Cc: netdev, David S . Miller, Alexey Kuznetsov, Eric Dumazet,
John Fastabend, Hideaki YOSHIFUJI
On Wed, Feb 19, 2020 at 9:04 AM Grygorii Strashko
<grygorii.strashko@ti.com> wrote:
>
> Hi All,
>
> On 12/02/2020 11:52, Grygorii Strashko wrote:
> > Hi All,
> >
> > I'd like to ask expert opinion and clarify few points about HW RX checksum offload.
> >
> > 1) CHECKSUM_COMPLETE - from description in <linux/skbuff.h>
> > * CHECKSUM_COMPLETE:
> > *
> > * This is the most generic way. The device supplied checksum of the _whole_
> > * packet as seen by netif_rx() and fills out in skb->csum. Meaning, the
> > * hardware doesn't need to parse L3/L4 headers to implement this.
> >
> > My understanding from above is that HW, to be fully compatible with Linux, should produce CSUM
> > starting from first byte after EtherType field:
> > (6 DST_MAC) (6 SRC_MAC) (2 EtherType) (... ...)
> > ^ ^
> > | start csum | end csum
> > and ending at the last byte of Ethernet frame data.
> > - if packet is VLAN tagged then VLAN TCI and real EtherType included in CSUM,
> > but first VLAN TPID doesn't
> > - pad bytes may/may not be included in csum
Based on commit 88078d98d1bb ("net: pskb_trim_rcsum() and
CHECKSUM_COMPLETE are friends") these bytes are expected to be covered
by skb->csum.
Not sure about that ipv4 header pull without csum adjust.
* Re: CHECKSUM_COMPLETE question
From: Jakub Kicinski @ 2020-02-20 0:25 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Grygorii Strashko, netdev, David S . Miller, Alexey Kuznetsov,
Eric Dumazet, John Fastabend, Hideaki YOSHIFUJI
On Wed, 19 Feb 2020 14:59:16 -0800 Willem de Bruijn wrote:
> On Wed, Feb 19, 2020 at 9:04 AM Grygorii Strashko
> <grygorii.strashko@ti.com> wrote:
> >
> > Hi All,
> >
> > On 12/02/2020 11:52, Grygorii Strashko wrote:
> > > Hi All,
> > >
> > > I'd like to ask expert opinion and clarify few points about HW RX checksum offload.
> > >
> > > 1) CHECKSUM_COMPLETE - from description in <linux/skbuff.h>
> > > * CHECKSUM_COMPLETE:
> > > *
> > > * This is the most generic way. The device supplied checksum of the _whole_
> > > * packet as seen by netif_rx() and fills out in skb->csum. Meaning, the
> > > * hardware doesn't need to parse L3/L4 headers to implement this.
> > >
> > > My understanding from above is that HW, to be fully compatible with Linux, should produce CSUM
> > > starting from first byte after EtherType field:
> > > (6 DST_MAC) (6 SRC_MAC) (2 EtherType) (... ...)
> > > ^ ^
> > > | start csum | end csum
> > > and ending at the last byte of Ethernet frame data.
> > > - if packet is VLAN tagged then VLAN TCI and real EtherType included in CSUM,
> > > but first VLAN TPID doesn't
> > > - pad bytes may/may not be included in csum
>
> Based on commit 88078d98d1bb ("net: pskb_trim_rcsum() and
> CHECKSUM_COMPLETE are friends") these bytes are expected to be covered
> by skb->csum.
>
> Not sure about that ipv4 header pull without csum adjust.
Isn't it just because IPv4 has a header checksum and therefore what's
pulled off adds up to 0 anyway? IPv6 does not have a header csum, hence
the adjustment?
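As a rough userspace illustration of that point (not kernel code; the sketch_* names are hypothetical): an IPv4 header with a valid header checksum sums to 0xFFFF, which is zero in ones'-complement arithmetic, so a packet-wide sum is the same with or without the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones'-complement sum over len bytes (big-endian words). */
static uint16_t sketch_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Returns 1 when the header words, checksum field included, add up to
 * 0xFFFF, i.e. when the IPv4 header checksum is valid.
 */
static int sketch_ipv4_hdr_sums_to_zero(const uint8_t *iph, size_t ihl_bytes)
{
	assert(ihl_bytes >= 20 && ihl_bytes % 4 == 0);
	return sketch_csum(iph, ihl_bytes) == 0xffff;
}
```

For a sample 20-byte header such as 45 00 00 73 00 00 40 00 40 11 b8 61 c0 a8 00 01 c0 a8 00 c7, the header alone sums to 0xFFFF, and summing header plus payload gives the same folded value as summing the payload alone (except when the payload itself sums to 0x0000, where the two representations of zero differ). An IPv6 header carries no such checksum field, so its bytes have to be subtracted explicitly.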
* Re: CHECKSUM_COMPLETE question
From: Willem de Bruijn @ 2020-02-20 6:41 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Grygorii Strashko, netdev, David S . Miller, Alexey Kuznetsov,
Eric Dumazet, John Fastabend, Hideaki YOSHIFUJI
On Wed, Feb 19, 2020 at 4:26 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 19 Feb 2020 14:59:16 -0800 Willem de Bruijn wrote:
> > On Wed, Feb 19, 2020 at 9:04 AM Grygorii Strashko
> > <grygorii.strashko@ti.com> wrote:
> > >
> > > Hi All,
> > >
> > > On 12/02/2020 11:52, Grygorii Strashko wrote:
> > > > Hi All,
> > > >
> > > > I'd like to ask expert opinion and clarify few points about HW RX checksum offload.
> > > >
> > > > 1) CHECKSUM_COMPLETE - from description in <linux/skbuff.h>
> > > > * CHECKSUM_COMPLETE:
> > > > *
> > > > * This is the most generic way. The device supplied checksum of the _whole_
> > > > * packet as seen by netif_rx() and fills out in skb->csum. Meaning, the
> > > > * hardware doesn't need to parse L3/L4 headers to implement this.
> > > >
> > > > My understanding from above is that HW, to be fully compatible with Linux, should produce CSUM
> > > > starting from first byte after EtherType field:
> > > > (6 DST_MAC) (6 SRC_MAC) (2 EtherType) (... ...)
> > > > ^ ^
> > > > | start csum | end csum
> > > > and ending at the last byte of Ethernet frame data.
> > > > - if packet is VLAN tagged then VLAN TCI and real EtherType included in CSUM,
> > > > but first VLAN TPID doesn't
> > > > - pad bytes may/may not be included in csum
> >
> > Based on commit 88078d98d1bb ("net: pskb_trim_rcsum() and
> > CHECKSUM_COMPLETE are friends") these bytes are expected to be covered
> > by skb->csum.
> >
> > Not sure about that ipv4 header pull without csum adjust.
>
> Isn't it just because IPv4 has a header checksum and therefore what's
> pulled off adds up to 0 anyway? IPv6 does not have a header csum, hence
> the adjustment?
Ah yes, of course!