From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E4EC5C433FE for ; Fri, 11 Nov 2022 01:26:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230232AbiKKB0Y (ORCPT ); Thu, 10 Nov 2022 20:26:24 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47330 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229536AbiKKB0X (ORCPT ); Thu, 10 Nov 2022 20:26:23 -0500 Received: from out2.migadu.com (out2.migadu.com [188.165.223.204]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4755F15FE2; Thu, 10 Nov 2022 17:26:20 -0800 (PST) Message-ID: <2e3c1e2d-bc60-b406-31e3-6e922eea3f9f@linux.dev> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1668129978; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=IpJQ800sy78+a8kmRcchPk+xzAsgLINxEGbosp0MZ6M=; b=WSGspNZV58xAl4fDEq8lwjV5YbmNymLaeb43fC9tHUvtseT+y+TdlzQ6Brk3zbB8YOCs8y xIcgIpXSoL/y9/2iHAwyh/us0pWvuVoQh7UuitPUl5mgHVzeBefGFwZ+Ye57ObMwJQ6W1l IGmLkWxrgiKXp9pECIFCvcbIfv23MxA= Date: Thu, 10 Nov 2022 17:26:12 -0800 MIME-Version: 1.0 Subject: Re: [xdp-hints] Re: [RFC bpf-next v2 06/14] xdp: Carry over xdp metadata into skb context Content-Language: en-US To: Stanislav Fomichev Cc: =?UTF-8?Q?Toke_H=c3=b8iland-J=c3=b8rgensen?= , ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, song@kernel.org, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, haoluo@google.com, jolsa@kernel.org, David Ahern , Jakub Kicinski , Willem de Bruijn , Jesper Dangaard Brouer , Anatoly Burakov , Alexander Lobakin , Magnus Karlsson , Maryam Tahhan , xdp-hints@xdp-project.net, netdev@vger.kernel.org, bpf@vger.kernel.org References: <20221104032532.1615099-1-sdf@google.com> <20221104032532.1615099-7-sdf@google.com> <187e89c3-d7de-7bec-c72e-d9d6eb5bcca0@linux.dev> <9a8fefe4-2fcb-95b7-cda0-06509feee78e@linux.dev> <6f57370f-7ec3-07dd-54df-04423cab6d1f@linux.dev> <87leokz8lq.fsf@toke.dk> <5a23b856-88a3-a57a-2191-b673f4160796@linux.dev> <32f81955-8296-6b9a-834a-5184c69d3aac@linux.dev> <87y1siyjf6.fsf@toke.dk> <87o7texv08.fsf@toke.dk> X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Martin KaFai Lau In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org On 11/10/22 4:57 PM, Stanislav Fomichev wrote: > On Thu, Nov 10, 2022 at 4:33 PM Martin KaFai Lau wrote: >> >> On 11/10/22 3:52 PM, Stanislav Fomichev wrote: >>> On Thu, Nov 10, 2022 at 3:14 PM Toke Høiland-Jørgensen wrote: >>>> >>>> Skipping to the last bit: >>>> >>>>>>>>> } else { >>>>>>>>> use kfuncs >>>>>>>>> } >>>>>>>>> >>>>>>>>> 5. Support the case where we keep program's metadata and kernel's >>>>>>>>> xdp_to_skb_metadata >>>>>>>>> - skb_metadata_import_from_xdp() will "consume" it by mem-moving the >>>>>>>>> rest of the metadata over it and adjusting the headroom >>>>>>>> >>>>>>>> I was thinking the kernel's xdp_to_skb_metadata is always before the program's >>>>>>>> metadata. xdp prog should usually work in this order also: read/write headers, >>>>>>>> write its own metadata, call bpf_xdp_metadata_export_to_skb(), and return >>>>>>>> XDP_PASS/XDP_REDIRECT. When it is XDP_PASS, the kernel just needs to pop the >>>>>>>> xdp_to_skb_metadata and pass the remaining program's metadata to the bpf-tc. >>>>>>>> >>>>>>>> For the kernel and xdp prog, I don't think it matters where the >>>>>>>> xdp_to_skb_metadata is. However, the xdp->data_meta (program's metadata) has to >>>>>>>> be before xdp->data because of the current data_meta and data comparison usage >>>>>>>> in the xdp prog. >>>>>>>> >>>>>>>> The order of the kernel's xdp_to_skb_metadata and the program's metadata >>>>>>>> probably only matters to the userspace AF_XDP. However, I don't see how AF_XDP >>>>>>>> supports the program's metadata now. afaict, it can only work now if there is >>>>>>>> some sort of contract between them or the AF_XDP currently does not use the >>>>>>>> program's metadata. Either way, we can do the mem-moving only for AF_XDP and it >>>>>>>> should be a no op if there is no program's metadata? This behavior could also >>>>>>>> be configurable through setsockopt? >>>>>>> >>>>>>> Agreed on all of the above. For now it seems like the safest thing to >>>>>>> do is to put xdp_to_skb_metadata last to allow af_xdp to properly >>>>>>> locate btf_id. >>>>>>> Let's see if Toke disagrees :-) >>>>>> >>>>>> As I replied to Martin, I'm not sure it's worth the complexity to >>>>>> logically split the SKB metadata from the program's own metadata (as >>>>>> opposed to just reusing the existing data_meta pointer)? >>>>> >>>>> I'd gladly keep my current requirement where it's either or, but not both :-) >>>>> We can relax it later if required? >>>> >>>> So the way I've been thinking about it is simply that the skb_metadata >>>> would live in the same place at the data_meta pointer (including >>>> adjusting that pointer to accommodate it), and just overriding the >>>> existing program metadata, if any exists. But looking at it now, I guess >>>> having the split makes it easier for a program to write its own custom >>>> metadata and still use the skb metadata. See below about the ordering. >>>> >>>>>> However, if we do, the layout that makes most sense to me is putting the >>>>>> skb metadata before the program metadata, like: >>>>>> >>>>>> -------------- >>>>>> | skb_metadata >>>>>> -------------- >>>>>> | data_meta >>>>>> -------------- >>>>>> | data >>>>>> -------------- >>>>>> >> >> Yeah, for the kernel and xdp prog (ie not AF_XDP), I meant this: >> >> | skb_metadata | custom metadata | data | >> >>>>>> Not sure if that's what you meant? :) >>>>> >>>>> I was suggesting the other way around: |custom meta|skb_metadata|data| >>>>> (but, as Martin points out, consuming skb_metadata in the kernel >>>>> becomes messier) >>>>> >>>>> af_xdp can check whether skb_metdata is present by looking at data - >>>>> offsetof(struct skb_metadata, btf_id). >>>>> progs that know how to handle custom metadata, will look at data - >>>>> sizeof(skb_metadata) >>>>> >>>>> Otherwise, if it's the other way around, how do we find skb_metadata >>>>> in a redirected frame? >>>>> Let's say we have |skb_metadata|custom meta|data|, how does the final >>>>> program find skb_metadata? >>>>> All the progs have to agree on the sizeof(tc/custom meta), right? >>>> >>>> Erm, maybe I'm missing something here, but skb_metadata is fixed size, >>>> right? So if the "skb_metadata is present" flag is set, we know that the >>>> sizeof(skb_metadata) bytes before the data_meta pointer contains the >>>> metadata, and if the flag is not set, we know those bytes are not valid >>>> metadata. >> >> right, so to get to the skb_metadata, it will be >> data_meta -= sizeof(skb_metadata); /* probably need alignment */ >> >>>> >>>> For AF_XDP, we'd need to transfer the flag as well, and it could apply >>>> the same logic (getting the size from the vmlinux BTF). >>>> >>>> By this logic, the BTF_ID should be the *first* entry of struct >>>> skb_metadata, since that will be the field AF_XDP programs can find >>>> right off the bat, no? > >>> The problem with AF_XDP is that, IIUC, it doesn't have a data_meta >>> pointer in the userspace. >> >> Yep. It is my understanding also. Missing data_meta pointer in the AF_XDP >> rx_desc is a potential problem. Having BTF_ID or not won't help. >> >>> >>> You get an rx descriptor where the address points to the 'data': >>> | 256 bytes headroom where metadata can go | data | >>> >>> So you have (at most) 256 bytes of headroom, some of that might be the >>> metadata, but you really don't know where it starts. But you know it >>> definitely ends where the data begins. >>> >>> So if we have the following, we can locate skb_metadata: >>> | 256-sizeof(skb_metadata) headroom | custom metadata | skb_metadata | data | >>> data - sizeof(skb_metadata) will get you there >>> >>> But if it's the other way around, the program has to know >>> sizeof(custom metadata) to locate skb_metadata: >>> | 256-sizeof(skb_metadata) headroom | skb_metadata | custom metadata | data | >> >> Right, this won't work if the AF_XDP user does not know how big the custom >> metadata is. The kernel then needs to swap the "skb_metadata" and "custom >> metadata" + setting a flag in the AF_XDP rx_desc->options to make it looks like >> this: >> | custom metadata | skb_metadata | data | >> >> However, since data_meta is missing from the rx_desc, may be we can safely >> assume the AF_XDP user always knows the size of the custom metadata or there is >> usually no "custom metadata" and no swap is needed? > > If we can assume they can share that info, can they also share more > info on what kind of metadata they would prefer to get? > If they can agree on the size, maybe they also can agree on the flows > that need skb_metdata vs the flows that need a custom one? > > Seems like we can start with supporting either one, but not both and > extend in the future once we have more understanding on whether it's > actually needed or not? > > bpf_xdp_metadata_export_to_skb: adjust data meta, add uses-skb-metadata flag > bpf_xdp_adjust_meta: unconditionally reset uses-skb-metadata flag hmm... I am thinking: bpf_xdp_adjust_meta: move the existing (if any) skb_metadata and adjust xdp->data_meta. bpf_xdp_metadata_export_to_skb: If skb_metadata exists, overwrites the existing one. If not exists, gets headroom before xdp->data_meta and writes hints.