From: Toke Høiland-Jørgensen
To: Stanislav Fomichev, bpf@vger.kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 martin.lau@linux.dev, song@kernel.org, yhs@fb.com,
 john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com,
 haoluo@google.com, jolsa@kernel.org, David Ahern, Jakub Kicinski,
 Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
 Alexander Lobakin, Magnus Karlsson, Maryam Tahhan,
 xdp-hints@xdp-project.net, netdev@vger.kernel.org
Subject: Re:
 [xdp-hints] [PATCH bpf-next v4 01/15] bpf: Document XDP RX metadata
In-Reply-To: <20221213023605.737383-2-sdf@google.com>
References: <20221213023605.737383-1-sdf@google.com> <20221213023605.737383-2-sdf@google.com>
Date: Thu, 15 Dec 2022 00:46:02 +0100
Message-ID: <87tu1xeeh1.fsf@toke.dk>

Stanislav Fomichev writes:

> Document all current use-cases and assumptions.

Below is a set of slightly more constructive suggestions for how to edit
this so it's not confusing the metadata area description with the kfunc
list:

> Cc: John Fastabend
> Cc: David Ahern
> Cc: Martin KaFai Lau
> Cc: Jakub Kicinski
> Cc: Willem de Bruijn
> Cc: Jesper Dangaard Brouer
> Cc: Anatoly Burakov
> Cc: Alexander Lobakin
> Cc: Magnus Karlsson
> Cc: Maryam Tahhan
> Cc: xdp-hints@xdp-project.net
> Cc: netdev@vger.kernel.org
> Signed-off-by: Stanislav Fomichev
> ---
>  Documentation/bpf/xdp-rx-metadata.rst | 90 +++++++++++++++++++++++++++
>  1 file changed, 90 insertions(+)
>  create mode 100644 Documentation/bpf/xdp-rx-metadata.rst
>
> diff --git a/Documentation/bpf/xdp-rx-metadata.rst b/Documentation/bpf/xdp-rx-metadata.rst
> new file mode 100644
> index 000000000000..498eae718275
> --- /dev/null
> +++ b/Documentation/bpf/xdp-rx-metadata.rst
> @@ -0,0 +1,90 @@
> +===============
> +XDP RX Metadata
> +===============
> +
> +XDP programs support creating and passing custom metadata via
> +``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
> +entities:
> +
> +1. ``AF_XDP`` consumer.
> +2. Kernel core stack via ``XDP_PASS``.
> +3. Another device via ``bpf_redirect_map``.
> +4. Other BPF programs via ``bpf_tail_call``.

I'd replace the above with a short introduction, like:

"This document describes how an XDP program can access hardware metadata
related to a packet using a set of helper functions, and how it can pass
that metadata on to other consumers."

> +General Design
> +==============
> +
> +XDP has access to a set of kfuncs to manipulate the metadata. Every
> +device driver implements these kfuncs. The set of kfuncs is
> +declared in ``include/net/xdp.h`` via ``XDP_METADATA_KFUNC_xxx``.
> +
> +Currently, the following kfuncs are supported. In the future, as more
> +metadata is supported, this set will grow:
> +
> +- ``bpf_xdp_metadata_rx_timestamp_supported`` returns true/false to
> +  indicate whether the device supports RX timestamps
> +- ``bpf_xdp_metadata_rx_timestamp`` returns packet RX timestamp
> +- ``bpf_xdp_metadata_rx_hash_supported`` returns true/false to
> +  indicate whether the device supports RX hash
> +- ``bpf_xdp_metadata_rx_hash`` returns packet RX hash

Keep the above (with David's comments), then add a bit of extra text,
here:

"The XDP program can use these kfuncs to read the metadata into stack
variables for its own consumption. Or, to pass the metadata on to other
consumers, an XDP program can store it into the metadata area carried
ahead of the packet."

> +Within the XDP frame, the metadata layout is as follows::
> +
> +  +----------+-----------------+------+
> +  | headroom | custom metadata | data |
> +  +----------+-----------------+------+
> +             ^                 ^
> +             |                 |
> +   xdp_buff->data_meta   xdp_buff->data

Add:

"The XDP program can store individual metadata items into this data_meta
area in whichever format it chooses. Later consumers of the metadata
will have to agree on the format by some out of band contract (like for
the AF_XDP use case, see below)."

> +AF_XDP
> +======
> +
> +``AF_XDP`` use-case implies that there is a contract between the BPF program
> +that redirects XDP frames into the ``XSK`` and the final consumer.
> +Thus the BPF program manually allocates a fixed number of
> +bytes out of metadata via ``bpf_xdp_adjust_meta`` and calls a subset
> +of kfuncs to populate it. User-space ``XSK`` consumer, looks
> +at ``xsk_umem__get_data() - METADATA_SIZE`` to locate its metadata.
> +
> +Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer)::
> +
> +  +----------+-----------------+------+
> +  | headroom | custom metadata | data |
> +  +----------+-----------------+------+
> +                               ^
> +                               |
> +                        rx_desc->address
> +
> +XDP_PASS
> +========
> +
> +This is the path where the packets processed by the XDP program are passed
> +into the kernel. The kernel creates ``skb`` out of the ``xdp_buff`` contents.
> +Currently, every driver has a custom kernel code to parse the descriptors and
> +populate ``skb`` metadata when doing this ``xdp_buff->skb`` conversion.

Add: ", and the XDP metadata is not used by the kernel when building
skbs. However, TC-BPF programs can access the XDP metadata area using
the data_meta pointer."

> +In the future, we'd like to support a case where XDP program can override
> +some of that metadata.

s/some of that metadata/some of the metadata used for building skbs/.

> +The plan of record is to make this path similar to ``bpf_redirect_map``
> +so the program can control which metadata is passed to the skb layer.

I'm not sure we are quite agreed on this part, just drop for now (it's
sorta covered by the above)?

> +bpf_redirect_map
> +================
> +
> +``bpf_redirect_map`` can redirect the frame to a different device.
> +In this case we don't know ahead of time whether that final consumer
> +will further redirect to an ``XSK`` or pass it to the kernel via ``XDP_PASS``.
> +Additionally, the final consumer doesn't have access to the original
> +hardware descriptor and can't access any of the original metadata.

Replace this paragraph with:

"``bpf_redirect_map`` can redirect the frame to a different device.
Some devices (like virtual ethernet links) support running a second XDP
program after the redirect. However, the final consumer doesn't have
access to the original hardware descriptor and can't access any of the
original metadata. The same applies to XDP programs installed into
devmaps and cpumaps."

> +For this use-case, only custom metadata is currently supported. If
> +the frame is eventually passed to the kernel, the skb created from such
> +a frame won't have any skb metadata. The ``XSK`` consumer will only
> +have access to the custom metadata.

Reword as:

"This means that for redirected packets only custom metadata is
currently supported, which has to be prepared by the initial XDP program
before redirect. If the frame is eventually passed to the kernel, the
skb created from such a frame won't have any hardware metadata
populated. And if such a packet is later redirected into an ``XSK``,
that consumer will also only have access to the custom metadata."

> +bpf_tail_call
> +=============
> +
> +No special handling here. Tail-called program operates on the same context
> +as the original one.

Replace this with a statement that it is in fact *not* supported in tail
maps :)

-Toke
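
For illustration only, here is a minimal userspace sketch of the kind of
out-of-band contract discussed above for the AF_XDP case. It does not
use the real kfuncs, bpf_xdp_adjust_meta or xsk_umem__get_data; it only
mimics the pointer arithmetic on a plain byte buffer. The struct
xdp_meta layout, the META_HAS_* flags and the meta_store/meta_load
helpers are all hypothetical names invented for this example, and
METADATA_SIZE is assumed to be the size of that struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical format agreed on by producer and consumer. */
struct xdp_meta {
	uint64_t rx_timestamp;
	uint32_t rx_hash;
	uint32_t flags;		/* which of the fields above are valid */
};

#define META_HAS_TIMESTAMP	(1u << 0)
#define META_HAS_HASH		(1u << 1)
#define METADATA_SIZE		sizeof(struct xdp_meta)

/*
 * Stand-in for the XDP program side: reserve METADATA_SIZE bytes
 * directly in front of the packet data and copy the metadata there.
 * Returns the new data_meta pointer, or NULL if the headroom between
 * frame_start and data is too small.
 */
static uint8_t *meta_store(uint8_t *frame_start, uint8_t *data,
			   const struct xdp_meta *m)
{
	uint8_t *data_meta;

	if ((size_t)(data - frame_start) < METADATA_SIZE)
		return NULL;

	data_meta = data - METADATA_SIZE;
	memcpy(data_meta, m, METADATA_SIZE);
	return data_meta;
}

/*
 * Stand-in for the consumer side: it only knows where the packet data
 * starts, so it locates the metadata at data - METADATA_SIZE.
 */
static void meta_load(const uint8_t *data, struct xdp_meta *out)
{
	memcpy(out, data - METADATA_SIZE, METADATA_SIZE);
}
```

The consumer never sees a data_meta pointer here, which is why both
sides have to agree on the size and layout of the struct up front.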