From: Toke Høiland-Jørgensen
To: Alexei Starovoitov
Cc: Edward Cree, John Fastabend, Daniel Borkmann, Alexei Starovoitov,
 Martin KaFai Lau, Song Liu, Yonghong Song, Marek Majkowski,
 Lorenz Bauer, Alan Maguire, Jesper Dangaard Brouer, David Miller,
 netdev@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: static and dynamic linking.
 Was: [PATCH bpf-next v3 1/5] bpf: Support chain calling multiple BPF
In-Reply-To: <20191112195223.cp5kcmkko54dsfbg@ast-mbp.dhcp.thefacebook.com>
References: <5da4ab712043c_25f42addb7c085b83b@john-XPS-13-9370.notmuch>
 <87eezfi2og.fsf@toke.dk> <87r23egdua.fsf@toke.dk>
 <70142501-e2dd-1aed-992e-55acd5c30cfd@solarflare.com>
 <874l07fu61.fsf@toke.dk> <87eez4odqp.fsf@toke.dk>
 <20191112025112.bhzmrrh2pr76ssnh@ast-mbp.dhcp.thefacebook.com>
 <87h839oymg.fsf@toke.dk>
 <20191112195223.cp5kcmkko54dsfbg@ast-mbp.dhcp.thefacebook.com>
Date: Thu, 14 Nov 2019 16:41:01 +0100
Message-ID: <87y2wimpo2.fsf@toke.dk>
X-Mailing-List: bpf@vger.kernel.org

Alexei Starovoitov writes:

[...]

> Back to your question of how fw2 will get loaded.. I'm thinking the following:
> 1. Static linking:
>   obj = bpf_object__open("rootlet.o", "fw1.o", "fw2.o");
>   // libbpf adjusts call offsets and links into single loadable bpf_object
>   bpf_object__load(obj);
>   bpf_set_link_xdp_fd()
> No kernel changes are necessary to support program chaining via static linking.
>
> 2. Dynamic linking:
>   // assuming libxdp.so manages eth0
>   rootlet_fd = get_xdp_fd(eth0);
>   subprog_btf_id = libbpf_find_prog_btf_id("name_of_placeholder", roolet_fd);
>   // ^ this function is in patch 16/18 of trampoline
>   attr.attach_prog_fd = roolet_fd;
>   attr.attach_btf_id = subprog_btf_id;
>   // pair (prog_fd, btf_id) needs to be specified at load time
>   obj = bpf_object__open("fw2.o", attr);
>   bpf_object__load(obj);
>   prog = bpf_object__find_program_by_title(obj);
>   link = bpf_program__replace(prog); // similar to bpf_program__attach_trace()
>   // no extra arguments during 'replace'.
>   // Target (prog_fd, btf_id) already known to the kernel and verified

OK, this makes sense.

>> So the two component programs would still exist as kernel objects,
>> right?
>
> yes. Both fw1.o and fw2.o will be loaded and running instead of placeholders.
>
>> And the trampolines would keep individual stats for each one (if
>> BPF stats are enabled)?
>
> In case of dynamic linking both fw1.o and fw2.o will be seen as individual
> programs from 'bpftool p s' point of view. And both will have
> individual stats.

Right, this is important, and I think it's where my skepticism about
static linking comes from. With static linking, each XDP program will be
"reduced" to a subprog instead of a full stand-alone program. Which
means that its execution will be different depending on whether it is
just attached directly to an interface, or if it's been linked with a
rootlet before loading. I'll admit I don't know enough about how
subprograms actually work to know if it's a *meaningful* difference, so
I guess I'll go play around with it. If nothing else, experimenting with
static linking can be a way to hash out the semantics until dynamic
linking lands.

>> Could userspace also extract the prog IDs being
>> referenced by the "glue" proglet?
>
> Not sure I follow. Both fw1.o and fw2.o will have their own prog ids.
> fw1_prog->aux->linked_prog == rootlet_prog
> fw2_prog->aux->linked_prog == rootlet_prog
> Unloading and detaching fw1.o will make kernel to switch back to placeholder
> subprog in roolet_prog. I believe roolet_prog should not keep a list of progs
> that attached to it (or replaced its subprogs) to avoid circular
> dependency.

Well I did mean the link in the other direction. But thinking about it
some more, I don't think it really matters. The important bit is that
userspace can answer the question "given that rootlet ID X is currently
attached on eth0, which two program IDs Y and Z will actually run on
that interface?".
And if there's a link in the other direction, it could just
iterate over all loaded programs in the kernel to find them, so that is
OK; as long as we can also tell in which "slot" in the rootlet a given
program is currently attached.

> Due to that detaching roolet_prog from netdev will stop the flow of
> packets into fw1.o, but refcnt of rootlet_prog will not go to zero, so
> it will stay in memory until both fw1.o and fw2.o detach from
> rootlet.o.

OK, that is probably fine. I think we should teach most utilities to
deal with this anyway; in particular, iproute2 should know about
multi-progs (i.e., link against libxdp).

>> What about attaching a third program? Would that work by recursion (as
>> above, but with the old proglet as old_fd), or should the library build
>> a whole new sequence from the component programs?
>
> This choice is up to libxdp.so. It can have a number of placeholders
> ready to be replaced by new progs. Or it can re-generate rootlet.o
> every time new fwX.o comes along. Short term I would start development
> with auto-generated roolet.o and static linking done by libbpf
> while the policy and roolet are done by libxdp.so, since this work
> doesn't depend on any kernel changes. Long term auto-generation
> can stay in libxdp.so if it turns out to be sufficient.

Yes, as I said above this sounds like at least it's a start.

>> Finally, what happens if someone where to try to attach a retprobe to
>> one of the component programs? Could it be possible to do that even
>> while program is being run from proglet dispatch? That way we can still
>> debug an individual XDP program even though it's run as part of a chain.
>
> Right. The fentry/fexit tracing is orthogonal to static/dynamic linking.
> It will be available for all prog types after trampoline patches land.
> See fexit_bpf2bpf.c example in the last 18/18 patch.
> We will be able to debug XDP program regardless whether it's a rootlet
> or a subprogram.
> Doesn't matter whether linking was static or dynamic.

OK, that's great, and certainly resolved one point of skepticism :)

> With fentry/fexit we will be able to do different stats too.
> Right now bpf program stats are limited to cycles and I resisted a lot
> of pressure to add more hard coded stats. With fentry/fexit we can
> collect arbitrary counters per program. Like number of L1-cache misses
> or number of TLB misses in a given XDP prog.

Yeah, that makes a lot of sense, of course. Great!

>> Sounds reasonable. Any reason libxdp.so couldn't be part of libbpf?
>
> libxdp.so is a policy specifier while libbpf is a tool. It makes more
> sense for them to be separate. libbpf has strong api compatibility
> guarantees. While I don't think anyone knows at this point how libxdp
> api should look and it will take some time for it to mature.

Well, we'd want libxdp to have the same strong API guarantees,
eventually. Which would be a reason to just include it in libbpf. But
sure, I wasn't suggesting to do this from the get-go; we can start out
with something separate and decide later when/if it makes sense to
integrate. As long as libbpf can do the heavy lifting on the actual
linking that is fine with me.

-Toke
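
The bookkeeping question raised in this thread (given a rootlet ID,
which program IDs run in which slot?) can be illustrated with a small
userspace model. This is only a sketch in plain C under stated
assumptions: struct prog_info, its slot field and find_attached() are
hypothetical names invented for illustration, not libbpf or kernel
APIs, and the per-program slot index is an assumption about what the
kernel would have to expose alongside the linked_prog back-pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace view of one loaded BPF program. */
struct prog_info {
	uint32_t id;             /* program ID */
	uint32_t linked_prog_id; /* rootlet whose subprog it replaced, 0 if none */
	int slot;                /* placeholder index in that rootlet, -1 if none */
};

/*
 * Walk all loaded programs and record, per slot, the ID of the program
 * attached to rootlet_id. Slots still running their placeholder stay 0.
 * Returns the number of slots that are filled.
 */
static size_t find_attached(const struct prog_info *progs, size_t nprogs,
			    uint32_t rootlet_id,
			    uint32_t *slots, size_t nslots)
{
	size_t found = 0;

	assert(progs != NULL || nprogs == 0);

	for (size_t i = 0; i < nslots; i++)
		slots[i] = 0;

	for (size_t i = 0; i < nprogs; i++) {
		const struct prog_info *p = &progs[i];

		if (p->linked_prog_id != rootlet_id)
			continue; /* attached elsewhere, or stand-alone */
		if (p->slot < 0 || (size_t)p->slot >= nslots)
			continue; /* slot out of range, ignore the entry */
		if (slots[p->slot] == 0)
			found++;
		slots[p->slot] = p->id;
	}
	return found;
}
```

The rootlet keeps no list of its attached programs here; the answer is
derived purely from each program's back-pointer, which matches the
"iterate over all loaded programs" approach and avoids the circular
dependency mentioned in the quoted reply.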