netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tom Herbert <tom@herbertland.com>
To: Davide Caratti <dcaratti@redhat.com>
Cc: David Laight <David.Laight@aculab.com>,
	"David S. Miller" <davem@davemloft.net>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	"linux-sctp@vger.kernel.org" <linux-sctp@vger.kernel.org>,
	Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Subject: Re: [RFC PATCH net-next 2/5] net: split skb_checksum_help
Date: Thu, 2 Feb 2017 10:08:00 -0800	[thread overview]
Message-ID: <CALx6S34KvvLYo_wrnX-da5ok3Z4pcdyJxiLi63iHzno9j9L4Fg@mail.gmail.com> (raw)
In-Reply-To: <1486048043.2556.4.camel@redhat.com>

On Thu, Feb 2, 2017 at 7:07 AM, Davide Caratti <dcaratti@redhat.com> wrote:
> hello Tom and David,
>
> thank you for the attention.
>
>> From: Tom Herbert
>> >
>> > Sent: 23 January 2017 21:00
>> ..
>> >
>> > skb_checksum_help is specific to the Internet checksum. For instance,
>> > CHECKSUM_COMPLETE can _only_ refer to Internet checksum calculation
>> > nothing else will work. Checksums and CRCs are very different things
>> > with very different processing. They are not interchangeable, have
>> > very different properties, and hence it is a mistake to try to shoe
>> > horn things so that they use a common infrastructure.
>> >
>
> true, we don't need to test CHECKSUM_COMPLETE on skbs carrying SCTP.
> So maybe we can simply replace patches 2/5 and 3/5 with the smaller one at
> the bottom of this message.
>
>> > It might make sense to create some CRC helper functions, but last time
>> > I checked there are so few users of CRC in skbufs I'm not even sure
>> > that would make sense.
>
> This is exactly the cause of issues I see with SCTP. These packets can be
> wrongly checksummed using skb_checksum_help, or simply not checksummed at
> all; and in both cases, the packet goes out from the NIC with wrong L4
> checksum.
>
Okay, makes sense. Please consider doing the following:

- Add a bit to skbuf called something like "csum_not_inet". When
ip_summed == CHECKSUM_PARTIAL and this bit is set that means we are
dealing with something other than an Internet checksum.
- At the top of skb_checksum_help (or maybe before the point where the
inet specific checksum start begins do something like:

   if (unlikely(skb->csum_not_inet))
       return skb_checksum_help_not_inet(...);

   The rest of skb_checksum_help should remained unchanged.

- Add a description of the new bit and how skb_checksum_help can work
to the comments for CHECKSUM_PARTIAL in skbuff.h
- Add FCOE to the list of protocol that can set CHECKSUM_UNNECESSARY
for a CRC/csum
- Add a note to CHECKSUM_COMPLETE section that it can only refer to an
Internet checksum

Thanks,
Tom

> For example: there are scenarios, even the trivial one below, where skb
> carrying SCTP packets are wrongly checksummed, because the originating
> socket read NETIF_F_SCTP_CRC bit in the underlying device features.
> Then, after the kernel forwards the skb, the final transmission
> happens on another device where CRC offload is not available: this
> typically leads to bad checksums on transmitted SCTP packets.
>
>
>
> namespace 1 |                   namespace 2
>             |
>             |                      br0
>             |         +------- Linux bridge -------+
>             |         |                            |
>             |         V                            V
> vethA <-----------> vethB                        eth0
>             |
>             |
>
> when a socket bound to vethA in namespace 1 generates an INIT packet,
> it's not checksummed since veth devices have NETIF_F_SCTP_CRC set [1].
> Then, after vethB receives the packet in namespace 2, linux bridge
> forwards it to eth0, and (depending on eth0 driver code), it will be
> transmitted with wrong CRC32c or simply dropped.
>
> On Tue, 2017-01-24 at 16:35 +0000, David Laight wrote:
>>
>> I can imagine horrid things happening if someone tries to encapsulate
>> SCTP/IP in UDP (or worse UDP/IP in SCTP).
>>
>> For UDP in UDP I suspect that CHECKSUM_COMPLETE on an inner UDP packet
>> allows the outer checksum be calculated by ignoring the inner packet
>> (since it sums to zero).
>> This just isn't true if SCTP is involved.
>> There are tricks to generate a crc of a longer packet, but they'd only
>> work for SCTP in SCTP.
>>
>> For non-encapsulated packets it is a different matter.
>
> If we limit the scope to skbs having ip_summed equal to CHECKSUM_PARTIAL,
> like it's done in patch 4, we only need checksumming the packet starting
> from csum_start to its end, and copy the computed value to csum_offset.
> The difficult thing is discriminating skbs that need CRC32c, namely SCTP,
> from the rest of the traffic (that will likely be checksummed by
> skb_checksum_help).
>
> Currently, the only way to fix wrong CRCs in the scenario above is to
> configure tc filter with "csum" action on eth0 egress, to compensate the
> missing capability of eth0 driver to deal with SCTP packets having
> ip_summed equal to CHECKSUM_PARTIAL [2].
>
> Patch 4 in the series is an attempt to solve the issue, both for
> encapsulated and non-encapsulated skbs, calling skb_csum_hwoffload_help()
> inside validate_xmit_skb. In order to look for unchecksummed SCTP packets,
> I took inspiration from a Linux-4.4 commit (6ae23ad36253 "net: Add driver
> helper functions ...) to implement skb_csum_hwoffload_help, then I called
> it in validate_xmit_skb() to fix situations that can't be recovered by the
> NIC driver (it's the case where NETIF_F_CSUM_MASK bits are all zero).
>
> Today most NICs can provide at least HW offload for Internet Checksum:
> that's why I'm a bit doubtful if it's ok to spend extra CPU cycles in
> validate_xmit_skb() to ensure correct CRC in some scenarios.
> Since this issue affects some (not all) NICs, maybe it's better to drop
> patch 4, or part of it, and provide a fix for individual drivers that
> don't currently handle non-checksummed SCTP packets. But to do that, we
> need at least patch 1 and the small code below.
>
> ------------------- 8< --------------------------
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -200,7 +200,8 @@
>   *     accordingly. Note the there is no indication in the skbuff that the
>   *     CHECKSUM_PARTIAL refers to an FCOE checksum, a driver that supports
>   *     both IP checksum offload and FCOE CRC offload must verify which offload
> - *     is configured for a packet presumably by inspecting packet headers.
> + *     is configured for a packet presumably by inspecting packet headers; in
> + *     case, skb_sctp_csum_help is provided to compute CRC on SCTP packets.
>   *
>   * E. Checksumming on output with GSO.
>   *
> diff --git a/net/core/dev.c b/net/core/dev.c
> index ad5959e..fa9be6d 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2580,6 +2580,42 @@ int skb_checksum_help(struct sk_buff *skb)
>  }
>  EXPORT_SYMBOL(skb_checksum_help);
>
> +int skb_sctp_csum_help(struct sk_buff *skb)
> +{
> +       __le32 crc32c_csum;
> +       int ret = 0, offset;
> +
> +       if (skb->ip_summed != CHECKSUM_PARTIAL)
> +               goto out;
> +       if (unlikely(skb_is_gso(skb)))
> +               goto out;
> +       if (skb_has_shared_frag(skb)) {
> +               ret = __skb_linearize(skb);
> +               if (ret)
> +                       goto out;
> +       }
> +
> +       offset = skb_checksum_start_offset(skb);
> +       crc32c_csum = cpu_to_le32(~__skb_checksum(skb, offset,
> +                                 skb->len - offset, ~(__u32)0,
> +                                 sctp_csum_stub));
> +
> +       offset += offsetof(struct sctphdr, checksum);
> +       BUG_ON(offset >= skb_headlen(skb));
> +
> +       if (skb_cloned(skb) &&
> +           !skb_clone_writable(skb, offset + sizeof(__le32))) {
> +               ret = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
> +               if (ret)
> +                       goto out;
> +       }
> +       *(__le32 *)(skb->data + offset) = crc32c_csum;
> +       skb->ip_summed = CHECKSUM_NONE;
> +out:
> +       return ret;
> +}
> +EXPORT_SYMBOL(skb_sctp_csum_help);
> +
>  __be16 skb_network_protocol(struct sk_buff *skb, int *depth)
>  {
>         __be16 type = skb->protocol;
> --
> 2.7.4
> ------------------- >8 --------------------------
>
> Thank you again for paying attention to this, and I would appreciate if
> you share your opinion.
>
> Notes:
>
> [1] see commit c80fafbbb59e ("veth: sctp: add NETIF_F_SCTP_CRC to device
> features")
> [2] see commit c008b33f3ef ("net/sched: act_csum: compute crc32c on SCTP
> packets").  We could also turn off NETIF_F_SCTP_CRC bit from vethA, but
> this would generate useless crc32c calculations if the SCTP server is not
> outside the physical node (e.g. it is bound to br0), leading to a
> throughput degradation.
>

  parent reply	other threads:[~2017-02-02 18:08 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-23 16:52 [RFC PATCH net-next 0/5] net: improve support for SCTP checksums Davide Caratti
2017-01-23 16:52 ` [RFC PATCH net-next 1/5] skbuff: add stub to help computing crc32c on SCTP packets Davide Caratti
2017-01-23 16:52 ` [RFC PATCH net-next 2/5] net: split skb_checksum_help Davide Caratti
2017-01-23 20:59   ` Tom Herbert
2017-01-24 16:35     ` David Laight
2017-02-02 15:07       ` Davide Caratti
2017-02-02 16:55         ` David Laight
2017-02-02 18:08         ` Tom Herbert [this message]
2017-02-27 13:39           ` Davide Caratti
2017-02-27 15:11             ` Tom Herbert
2017-02-28 10:31               ` Davide Caratti
2017-02-28 10:32             ` [PATCH RFC net-next v2 1/4] skbuff: add stub to help computing crc32c on SCTP packets Davide Caratti
2017-02-28 10:32               ` [PATCH RFC net-next v2 2/4] net: introduce skb_sctp_csum_help Davide Caratti
2017-02-28 10:32               ` [PATCH RFC net-next v2 3/4] net: more accurate checksumming in validate_xmit_skb Davide Caratti
2017-02-28 19:50                 ` Tom Herbert
2017-02-28 10:32               ` [PATCH RFC net-next v2 4/4] Documentation: update notes on checksum offloading Davide Caratti
2017-02-28 22:46               ` [PATCH RFC net-next v2 1/4] skbuff: add stub to help computing crc32c on SCTP packets Alexander Duyck
2017-03-01  3:17                 ` Tom Herbert
2017-03-01 10:53                 ` David Laight
2017-03-06 21:51                 ` Davide Caratti
2017-03-07 18:06                   ` Alexander Duyck
2017-03-18 13:17                     ` Davide Caratti
2017-03-18 22:35                       ` Tom Herbert
2017-04-07 14:16                         ` [PATCH RFC net-next v3 0/7] improve CRC32c in the forwarding path Davide Caratti
2017-04-07 14:16                           ` [PATCH RFC net-next v3 1/7] skbuff: add stub to help computing crc32c on SCTP packets Davide Caratti
2017-04-07 14:16                           ` [PATCH RFC net-next v3 2/7] net: introduce skb_crc32c_csum_help Davide Caratti
2017-04-07 14:16                           ` [PATCH RFC net-next v3 3/7] sk_buff: remove support for csum_bad in sk_buff Davide Caratti
2017-04-07 14:16                           ` [PATCH RFC net-next v3 4/7] net: use skb->csum_algo to identify packets needing crc32c Davide Caratti
2017-04-07 15:43                             ` Tom Herbert
2017-04-07 17:29                               ` Davide Caratti
2017-04-07 18:11                                 ` Tom Herbert
2017-04-13 10:36                                   ` Davide Caratti
2017-04-20 13:38                                   ` [PATCH RFC net-next v4 0/7] net: improve support for SCTP checksums Davide Caratti
2017-04-20 13:38                                     ` [PATCH RFC net-next v4 1/7] skbuff: add stub to help computing crc32c on SCTP packets Davide Caratti
2017-04-20 13:38                                     ` [PATCH RFC net-next v4 2/7] net: introduce skb_crc32c_csum_help Davide Caratti
2017-04-27 12:29                                       ` Marcelo Ricardo Leitner
2017-04-20 13:38                                     ` [PATCH RFC net-next v4 3/7] sk_buff: remove support for csum_bad in sk_buff Davide Caratti
2017-04-27  1:34                                       ` [sk_buff] 95510aef27: BUG:Bad_page_state_in_process kernel test robot
2017-04-29 20:21                                       ` [PATCH RFC net-next v4 3/7] sk_buff: remove support for csum_bad in sk_buff Tom Herbert
2017-04-20 13:38                                     ` [PATCH RFC net-next v4 4/7] net: use skb->csum_not_inet to identify packets needing crc32c Davide Caratti
2017-04-29 20:18                                       ` Tom Herbert
2017-04-20 13:38                                     ` [PATCH RFC net-next v4 5/7] net: more accurate checksumming in validate_xmit_skb() Davide Caratti
2017-04-20 13:38                                     ` [PATCH RFC net-next v4 6/7] openvswitch: more accurate checksumming in queue_userspace_packet() Davide Caratti
2017-04-20 13:38                                     ` [PATCH RFC net-next v4 7/7] sk_buff.h: improve description of CHECKSUM_{COMPLETE,UNNECESSARY} Davide Caratti
2017-04-29 20:20                                       ` Tom Herbert
2017-04-27 12:41                                     ` [PATCH RFC net-next v4 0/7] net: improve support for SCTP checksums Marcelo Ricardo Leitner
2017-04-07 14:16                           ` [PATCH RFC net-next v3 5/7] net: more accurate checksumming in validate_xmit_skb() Davide Caratti
2017-04-07 14:16                           ` [PATCH RFC net-next v3 6/7] openvswitch: more accurate checksumming in queue_userspace_packet() Davide Caratti
2017-04-07 14:16                           ` [PATCH RFC net-next v3 7/7] sk_buff.h: improve description of CHECKSUM_{COMPLETE,UNNECESSARY} Davide Caratti
2017-01-23 16:52 ` [RFC PATCH net-next 3/5] net: introduce skb_sctp_csum_help Davide Caratti
2017-01-23 16:52 ` [RFC PATCH net-next 4/5] net: more accurate checksumming in validate_xmit_skb Davide Caratti
2017-01-23 16:52 ` [RFC PATCH net-next 5/5] Documentation: add description of skb_sctp_csum_help Davide Caratti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALx6S34KvvLYo_wrnX-da5ok3Z4pcdyJxiLi63iHzno9j9L4Fg@mail.gmail.com \
    --to=tom@herbertland.com \
    --cc=David.Laight@aculab.com \
    --cc=davem@davemloft.net \
    --cc=dcaratti@redhat.com \
    --cc=linux-sctp@vger.kernel.org \
    --cc=marcelo.leitner@gmail.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).