From: Victor Julien <victor@inliniac.net>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Network Development <netdev@vger.kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Eric Dumazet <edumazet@google.com>,
	Mao Wenan <maowenan@huawei.com>, Arnd Bergmann <arnd@arndb.de>,
	Neil Horman <nhorman@tuxdriver.com>,
	linux-doc@vger.kernel.org,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Alexander Drozdov <al.drozdov@gmail.com>,
	Tom Herbert <tom@herbertland.com>
Subject: Re: [PATCH net-next v2] af-packet: new flag to indicate all csums are good
Date: Thu, 4 Jun 2020 11:46:51 +0200
Message-ID: <904a4ad6-650b-8097-deff-989f1936064b@inliniac.net>
In-Reply-To: <CA+FuTSdczH+i8+FO+eQ+OT4-bsRAKG+jacPiuRu3jMszpV_2XA@mail.gmail.com>

On 02-06-2020 22:18, Willem de Bruijn wrote:
> On Tue, Jun 2, 2020 at 4:05 PM Victor Julien <victor@inliniac.net> wrote:
>>
>> On 02-06-2020 21:38, Willem de Bruijn wrote:
>>> On Tue, Jun 2, 2020 at 3:22 PM Victor Julien <victor@inliniac.net> wrote:
>>>>
>>>> On 02-06-2020 21:03, Willem de Bruijn wrote:
>>>>> On Tue, Jun 2, 2020 at 2:31 PM Victor Julien <victor@inliniac.net> wrote:
>>>>>> On 02-06-2020 19:37, Willem de Bruijn wrote:
>>>>>>> On Tue, Jun 2, 2020 at 1:03 PM Victor Julien <victor@inliniac.net> wrote:
>>>>>>>>
>>>>>>>> On 02-06-2020 16:29, Willem de Bruijn wrote:
>>>>>>>>> On Tue, Jun 2, 2020 at 4:05 AM Victor Julien <victor@inliniac.net> wrote:
>>>>>>>>>>
>>>>>>>>>> Introduce a new flag (TP_STATUS_CSUM_UNNECESSARY) to indicate
>>>>>>>>>> that the driver has completely validated the checksums in the packet.
>>>>>>>>>>
>>>>>>>>>> The TP_STATUS_CSUM_UNNECESSARY flag differs from TP_STATUS_CSUM_VALID
>>>>>>>>>> in that the new flag will only be set if all the layers are valid,
>>>>>>>>>> while TP_STATUS_CSUM_VALID is set as well if only the IP layer is valid.
>>>>>>>>>
>>>>>>>>> transport, not ip checksum.
>>>>>>>>
>>>>>>>> Allow me a n00b question: what does transport refer to here? Things like
>>>>>>>> ethernet? It isn't clear to me from the doc.
>>>>>>>
>>>>>>> The TCP/UDP/.. transport protocol checksum.
>>>>>>
>>>>>> Hmm that is what I thought originally, but then it didn't seem to work.
>>>>>> Hence my patch.
>>>>>>
>>>>>> However I just redid my testing. I took the example tpacketv3 program
>>>>>> and added the status flag checks to the 'display()' func:
>>>>>>
>>>>>>                 if (ppd->tp_status & TP_STATUS_CSUM_VALID) {
>>>>>>                         printf("TP_STATUS_CSUM_VALID, ");
>>>>>>                 }
>>>>>>                 if (ppd->tp_status & (1<<8)) {
>>>>>>                         printf("TP_STATUS_CSUM_UNNECESSARY, ");
>>>>>>                 }
>>>>>>
>>>>>> Then using scapy sent some packets in 2 variants:
>>>>>> - default (good csums)
>>>>>> - deliberately bad csums
>>>>>> (then also added a few things like ip6 over ip)
>>>>>>
>>>>>>
>>>>>> srp1(Ether()/IP(src="1.2.3.4", dst="5.6.7.8")/IPv6()/TCP(),
>>>>>> iface="enp1s0") // good csums
>>>>>>
>>>>>> srp1(Ether()/IP(src="1.2.3.4", dst="5.6.7.8")/IPv6()/TCP(chksum=1),
>>>>>> iface="enp1s0") //bad tcp
>>>>>
>>>>> Is this a test between two machines? What is the device driver of the
>>>>> machine receiving and printing the packet? It would be helpful to know
>>>>> whether this uses CHECKSUM_COMPLETE or CHECKSUM_UNNECESSARY.
>>>>
>>>> Yes 2 machines, or actually 2 machines and a VM. The receiving Linux
>>>> sits in a kvm vm with network pass through and uses the virtio driver
>>>> (host uses e1000e). Based on a quick 'git grep CHECKSUM_UNNECESSARY'
>>>> virtio seems to support that.
>>>>
>>>> I've done some more tests. In a pcap replay that I know contains packets
>>>> with bad TCP csums (but good IP csums for those pkts), to a physical
>>>> host running Ubuntu Linux kernel 5.3:
>>>>
>>>> - receiver uses nfp (netronome) driver: TP_STATUS_CSUM_VALID set for
>>>> every packet, including the bad TCP ones
>>>> - receiver uses ixgbe driver: TP_STATUS_CSUM_VALID not set for the bad
>>>> packets.
>>>
>>> Great. Thanks a lot for running all these experiments.
>>>
>>> We might have to drop the TP_STATUS_CSUM_VALID with CHECKSUM_COMPLETE
>>> unless skb->csum_valid.
>>>
>>> For packets with multiple transport layer checksums,
>>> CHECKSUM_UNNECESSARY should mean that all have been verified.
>>>
>>> I believe that in the case of multiple transport headers, csum_valid
>>> similarly ensures all checksums up to csum_start are valid. Will need
>>> to double check.
>>>
>>> If so, there probably is no need for a separate new TP_STATUS.
>>> TP_STATUS_CSUM_VALID is reported only when all checksums are valid.
>>
>> So if I understand you correctly the key may be in the call to
>> `skb_csum_unnecessary`:
>>
>> That reads:
>>
>> static inline int skb_csum_unnecessary(const struct sk_buff *skb)
>> {
>>         return ((skb->ip_summed == CHECKSUM_UNNECESSARY) ||
>>                 skb->csum_valid ||
>>                 (skb->ip_summed == CHECKSUM_PARTIAL &&
>>                  skb_checksum_start_offset(skb) >= 0));
>> }
>>
>> But really only the first 2 conditions are reachable
> 
> .. from this codepath. That function is called in other codepaths as well.
> 
>> , as we already know
>> skb->ip_summed is not CHECKSUM_PARTIAL when we call it.
>>
>> So our unmodified check is:
>>
>>         else if (skb->pkt_type != PACKET_OUTGOING &&
>>                 (skb->ip_summed == CHECKSUM_COMPLETE ||
>>                  skb->ip_summed == CHECKSUM_UNNECESSARY ||
>>                  skb->csum_valid))
>>
>> Should this become something like:
>>
>>         else if (skb->pkt_type != PACKET_OUTGOING &&
>>                 ((skb->ip_summed == CHECKSUM_COMPLETE &&
>>                   skb->csum_valid) ||
>>                  skb->ip_summed == CHECKSUM_UNNECESSARY))
>>
>> Is this what you had in mind?
> 
> I probably wouldn't suggest modifying skb_csum_unnecessary. Certainly not
> until I've looked at all other callers of it.
> 
> But in case of packet sockets, yes, adding that csum_valid check is my
> first rough approximation.
> 
> That said, first let's give others more familiar with
> TP_STATUS_CSUM_VALID some time to comment.
> 
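
For reference, the status checks added to display() in the quoted test can
be written as a small standalone helper. This is only a minimal sketch: the
DEMO_ names and demo_classify() are hypothetical, and the bit values are
copied from include/uapi/linux/if_packet.h so the block builds without
kernel headers. It treats a missing TP_STATUS_CSUM_VALID as "not verified"
rather than "bad", which is an assumption about how a capture tool would
want to react.

```c
#include <stdint.h>

/* Bit values mirror include/uapi/linux/if_packet.h. The DEMO_ prefix marks
 * them as local copies for this sketch, not the real uapi macros. */
#define DEMO_TP_STATUS_CSUMNOTREADY (1u << 3)
#define DEMO_TP_STATUS_CSUM_VALID   (1u << 7)

/* Hypothetical classification a capture tool might derive from tp_status. */
enum demo_csum_state {
	DEMO_CSUM_UNKNOWN,   /* no claim made: verify in software */
	DEMO_CSUM_NOT_READY, /* checksum not filled in yet (CHECKSUM_PARTIAL) */
	DEMO_CSUM_VALID      /* kernel reports the checksum as validated */
};

static enum demo_csum_state demo_classify(uint32_t tp_status)
{
	/* CSUMNOTREADY is tested first, matching the if/else-if order
	 * in tpacket_rcv(). */
	if (tp_status & DEMO_TP_STATUS_CSUMNOTREADY)
		return DEMO_CSUM_NOT_READY;
	if (tp_status & DEMO_TP_STATUS_CSUM_VALID)
		return DEMO_CSUM_VALID;
	return DEMO_CSUM_UNKNOWN;
}
```

A caller would pass ppd->tp_status from the ring and fall back to software
validation for anything other than DEMO_CSUM_VALID.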

I did some more experiments, on real hw this time. I made the following
change to 5.7.0 (wasn't brave enough to remotely upgrade a box to net-next):

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 29bd405adbbd..3afb1913837a 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2216,8 +2216,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
        if (skb->ip_summed == CHECKSUM_PARTIAL)
                status |= TP_STATUS_CSUMNOTREADY;
        else if (skb->pkt_type != PACKET_OUTGOING &&
-                (skb->ip_summed == CHECKSUM_COMPLETE ||
-                 skb_csum_unnecessary(skb)))
+                ((skb->ip_summed == CHECKSUM_COMPLETE && skb->csum_valid) ||
+                  skb->ip_summed == CHECKSUM_UNNECESSARY))
                status |= TP_STATUS_CSUM_VALID;

        if (snaplen > res)

With this change it seems the TP_STATUS_CSUM_VALID flag is *never* set
for the nfp driver.

The capture on the ixgbe driver looks unchanged, but that seems to make
sense as I think it uses CHECKSUM_UNNECESSARY and not CHECKSUM_COMPLETE.
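
To make the difference between the two conditions easier to reason about,
here is a minimal userspace model of the status logic before and after the
change. It is a sketch, not kernel code: struct mock_skb, the MOCK_ values
and both function names are hypothetical stand-ins, and the "before" variant
uses the expanded form of skb_csum_unnecessary() without its
CHECKSUM_PARTIAL branch, which is not reachable from this path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mock stand-ins for the kernel's CHECKSUM_* states (illustrative only). */
enum mock_ip_summed {
	MOCK_CHECKSUM_NONE,
	MOCK_CHECKSUM_UNNECESSARY,
	MOCK_CHECKSUM_COMPLETE,
	MOCK_CHECKSUM_PARTIAL
};

/* Only the fields the status decision looks at. */
struct mock_skb {
	enum mock_ip_summed ip_summed;
	bool csum_valid;
	bool outgoing; /* stands in for pkt_type == PACKET_OUTGOING */
};

#define MOCK_TP_STATUS_CSUMNOTREADY (1u << 3)
#define MOCK_TP_STATUS_CSUM_VALID   (1u << 7)

/* Unmodified logic: CHECKSUM_COMPLETE alone is enough to report VALID. */
static uint32_t status_before(const struct mock_skb *skb)
{
	if (skb->ip_summed == MOCK_CHECKSUM_PARTIAL)
		return MOCK_TP_STATUS_CSUMNOTREADY;
	if (!skb->outgoing &&
	    (skb->ip_summed == MOCK_CHECKSUM_COMPLETE ||
	     skb->ip_summed == MOCK_CHECKSUM_UNNECESSARY ||
	     skb->csum_valid))
		return MOCK_TP_STATUS_CSUM_VALID;
	return 0;
}

/* Patched logic: CHECKSUM_COMPLETE additionally requires csum_valid. */
static uint32_t status_after(const struct mock_skb *skb)
{
	if (skb->ip_summed == MOCK_CHECKSUM_PARTIAL)
		return MOCK_TP_STATUS_CSUMNOTREADY;
	if (!skb->outgoing &&
	    ((skb->ip_summed == MOCK_CHECKSUM_COMPLETE && skb->csum_valid) ||
	     skb->ip_summed == MOCK_CHECKSUM_UNNECESSARY))
		return MOCK_TP_STATUS_CSUM_VALID;
	return 0;
}
```

In this model a CHECKSUM_COMPLETE packet whose csum_valid is still unset
reports VALID before the change and nothing after it, while
CHECKSUM_UNNECESSARY packets are unaffected. If nfp delivers
CHECKSUM_COMPLETE and csum_valid has not been set by the time tpacket_rcv()
runs (both assumptions on my side), that would match what I see.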

For the nfp driver I have these settings:

root@z820:~# ethtool -k ens3np0|grep rx
rx-checksumming: on
rx-vlan-offload: off [fixed]
rx-vlan-filter: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
rx-udp_tunnel-port-offload: on
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
rx-gro-list: off

Since ethtool reports 'rx-checksumming' as enabled for this driver, I'm
wondering how we can actually get a checksum validation result from it.
Jakub, do you know?

-- 
---------------------------------------------
Victor Julien
http://www.inliniac.net/
PGP: http://www.inliniac.net/victorjulien.asc
---------------------------------------------


