From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AE950C4332F for ; Thu, 17 Nov 2022 11:33:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234751AbiKQLde (ORCPT ); Thu, 17 Nov 2022 06:33:34 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41422 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234125AbiKQLdc (ORCPT ); Thu, 17 Nov 2022 06:33:32 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 244F84B9A6 for ; Thu, 17 Nov 2022 03:32:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1668684756; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=TqGQL7f1VO0oELXAbLB5ugC34/GyGd79wtlJNVZt8i4=; b=AfWJS63HRuee6RQKnnQqTxJD/a1ChM3XLv8GaW0x3XFmcZd5jLj9Seb19UHkLmlzTl5ZUg o8XWbB2hYOqrjmd1y0gvWc+uPQgFD2KXojhUeFMgLLSK1fA5Z44F0zGUFE+RDuhsyX0XE5 vrPrhdd9MU+D4rnlPRMPjsfQ0RrM1cc= Received: from mail-ed1-f70.google.com (mail-ed1-f70.google.com [209.85.208.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-493-6BGPQFPsOLyH2ELHFqHqwQ-1; Thu, 17 Nov 2022 06:32:35 -0500 X-MC-Unique: 6BGPQFPsOLyH2ELHFqHqwQ-1 Received: by mail-ed1-f70.google.com with SMTP id x18-20020a05640226d200b00461e027f704so1041037edd.7 for ; Thu, 17 Nov 2022 03:32:34 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=mime-version:message-id:date:references:in-reply-to:subject:cc:to :from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=TqGQL7f1VO0oELXAbLB5ugC34/GyGd79wtlJNVZt8i4=; b=GZTsXDUKQDYVbfHEbuVQpqVvTyn/FDetIWUB0Qjlp5ojmcMYkVrOkQN5+xLdLSqC8v lQY2NQy3ObcgAOh2DjDxivsXEznQdEZAspkKGxyS1W3d/cpbGiiIs22lW8c+6wlNUxPa SRy0ADg8Gt7ry69vb+6DFXGQ3nGYg4I/acLsBAftITlD0IY0fvUJM18lFlgvFCw4mzGH iVJhS1L6vwT9HiAZ8tle8OoH8Bgnht8dBDuR8YXV7jTBBbj6+YmdVG13YagpSsbwlazb H7vc9QYG0/9bCRY6wkOZA9ocBWalGUFltPEHSCPxDxLH4S8M+JQMnfYVAu6g8+3n0r+k 0Gcg== X-Gm-Message-State: ANoB5pnUp3vOQM9kHPsHa57PEEC7oxuV2bURH+PdaEzn9BRLn0Fmqz/j LxgrrK6n/jUW/4RyZkunMTF8o89dJb8S1nuUuOAXOU1aPrr/VIo57F5JhMcoujTkSYGYXIgpE2q 4GKac7/x0MKxV X-Received: by 2002:a17:907:7611:b0:7ae:bc3b:d9c6 with SMTP id jx17-20020a170907761100b007aebc3bd9c6mr1633132ejc.770.1668684753727; Thu, 17 Nov 2022 03:32:33 -0800 (PST) X-Google-Smtp-Source: AA0mqf4FDPSA2bFot9pEEWUNSs2DyZF1gHv0A8iE/vXrtKslN50mPd5zjItty36Nhjk3PExk7IhqkA== X-Received: by 2002:a17:907:7611:b0:7ae:bc3b:d9c6 with SMTP id jx17-20020a170907761100b007aebc3bd9c6mr1633108ejc.770.1668684753381; Thu, 17 Nov 2022 03:32:33 -0800 (PST) Received: from alrua-x1.borgediget.toke.dk ([45.145.92.2]) by smtp.gmail.com with ESMTPSA id bw10-20020a170906c1ca00b0078dce9984afsm238635ejb.220.2022.11.17.03.32.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 17 Nov 2022 03:32:32 -0800 (PST) Received: by alrua-x1.borgediget.toke.dk (Postfix, from userid 1000) id 638917A6F66; Thu, 17 Nov 2022 12:32:31 +0100 (CET) From: Toke =?utf-8?Q?H=C3=B8iland-J=C3=B8rgensen?= To: Stanislav Fomichev , Alexei Starovoitov Cc: John Fastabend , Martin KaFai Lau , bpf , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Song Liu , Yonghong Song , KP Singh , Hao Luo , Jiri Olsa , David Ahern , Jakub Kicinski , Willem de Bruijn , Jesper Dangaard Brouer , Anatoly Burakov , Alexander Lobakin , Magnus Karlsson , Maryam Tahhan , xdp-hints@xdp-project.net, Network Development Subject: Re: [xdp-hints] Re: [PATCH bpf-next 05/11] veth: Support rx timestamp metadata for xdp In-Reply-To: References: <20221115030210.3159213-1-sdf@google.com> <20221115030210.3159213-6-sdf@google.com> <87h6z0i449.fsf@toke.dk> <8735ajet05.fsf@toke.dk> <6374854883b22_5d64b208e3@john.notmuch> <34f89a95-a79e-751c-fdd2-93889420bf96@linux.dev> <878rkbjjnp.fsf@toke.dk> <6375340a6c284_66f16208aa@john.notmuch> <637576962dada_8cd03208b0@john.notmuch> X-Clacks-Overhead: GNU Terry Pratchett Date: Thu, 17 Nov 2022 12:32:31 +0100 Message-ID: <87wn7t4y0g.fsf@toke.dk> MIME-Version: 1.0 Content-Type: text/plain Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org Stanislav Fomichev writes: >> > Doesn't look like the descriptors are as nice as you're trying to >> > paint them (with clear hash/csum fields) :-) So not sure how much >> > CO-RE would help. >> > At least looking at mlx4 rx_csum, the driver consults three different >> > sets of flags to figure out the hash_type. Or am I just unlucky with >> > mlx4? >> >> Which part are you talking about ? >> hw_checksum = csum_unfold((__force __sum16)cqe->checksum); >> is trivial enough for bpf prog to do if it has access to 'cqe' pointer >> which is what John is proposing (I think). > > I'm talking about mlx4_en_process_rx_cq, the caller of that check_csum. > In particular: if (likely(dev->features & NETIF_F_RXCSUM)) branch > I'm assuming we want to have hash_type available to the progs? I agree we should expose the hash_type, but that doesn't actually look to be that complicated, see below. > But also, check_csum handles other corner cases: > - short_frame: we simply force all those small frames to skip checksum complete > - get_fixed_ipv6_csum: In IPv6 packets, hw_checksum lacks 6 bytes from > IPv6 header > - get_fixed_ipv4_csum: Although the stack expects checksum which > doesn't include the pseudo header, the HW adds it > > So it doesn't look like we can just unconditionally use cqe->checksum? > The driver does a lot of massaging around that field to make it > palatable. Poking around a bit in the other drivers, AFAICT it's only a subset of drivers that support CSUM_COMPLETE at all; for instance, the Intel drivers just set CHECKSUM_UNNECESSARY for TCP/UDP/SCTP. I think the CHECKSUM_UNNECESSARY is actually the most important bit we'd want to propagate? AFAICT, the drivers actually implementing CHECKSUM_COMPLETE need access to other data structures than the rx descriptor to determine the status of the checksum (mlx4 looks at priv->flags, mlx5 checks rq->state), so just exposing the rx descriptor to BPF as John is suggesting does not actually give the XDP program enough information to act on the checksum field on its own. We could still have a separate kfunc to just expose the hw checksum value (see below), but I think it probably needs to be paired with other kfuncs to be useful. Looking at the mlx4 code, I think the following mapping to kfuncs (in pseudo-C) would give the flexibility for XDP to access all the bits it needs, while inlining everything except getting the full checksum for non-TCP/UDP traffic. An (admittedly cursory) glance at some of the other drivers (mlx5, ice, i40e) indicates that this would work for those drivers as well. bpf_xdp_metadata_rx_hash_supported() { return dev->features & NETIF_F_RXHASH; } bpf_xdp_metadata_rx_hash() { return be32_to_cpu(cqe->immed_rss_invalid); } bpf_xdp_metdata_rx_hash_type() { if (likely(dev->features & NETIF_F_RXCSUM) && (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_TCP | MLX4_CQE_STATUS_UDP)) && (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPOK)) && cqe->checksum == cpu_to_be16(0xffff)) return PKT_HASH_TYPE_L4; return PKT_HASH_TYPE_L3; } bpf_xdp_metadata_rx_csum_supported() { return dev->features & NETIF_F_RXCSUM; } bpf_xdp_metadata_rx_csum_level() { if ((cqe->status & cpu_to_be16(MLX4_CQE_STATUS_TCP | MLX4_CQE_STATUS_UDP)) && (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPOK)) && cqe->checksum == cpu_to_be16(0xffff)) return CHECKSUM_UNNECESSARY; if (!(priv->flags & MLX4_EN_FLAG_RX_CSUM_NON_TCP_UDP && (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IP_ANY))) && !short_frame(len)) return CHECKSUM_COMPLETE; /* we could also omit this case entirely */ return CHECKSUM_NONE; } /* this one could be called by the metadata_to_skb code */ bpf_xdp_metadata_rx_csum_full() { return check_csum() /* BPF_CALL this after refactoring so it is skb-agnostic */ } /* this one would be for people like John who want to re-implement * check_csum() themselves */ bpf_xdp_metdata_rx_csum_raw() { return cqe->checksum; } -Toke