From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.9 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AFB70C28CC6 for ; Mon, 3 Jun 2019 22:34:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 85BA224270 for ; Mon, 3 Jun 2019 22:34:20 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="D7qJebud" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726354AbfFCWeU (ORCPT ); Mon, 3 Jun 2019 18:34:20 -0400 Received: from mail-qt1-f194.google.com ([209.85.160.194]:43209 "EHLO mail-qt1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726025AbfFCWeT (ORCPT ); Mon, 3 Jun 2019 18:34:19 -0400 Received: by mail-qt1-f194.google.com with SMTP id z24so4347744qtj.10; Mon, 03 Jun 2019 15:34:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=hUDi3LpJDfbh5JFvsyRAzGmS70LuLctJXOtDXn0LQ2Q=; b=D7qJebudrGZeM1gOmc8jtIR6aIIHgpsJ8sYSHcLSX593cDg/MLokBdWE3Dc/4yxJLW TEAgKcBdl1OU2WlLxZ8ImKNN8dwvRe2vQDf83zOTc59Ydzpqv7nINiUw10YWeIHBjPM9 WUigYO2x0AfogSKwADhhcVGPkBZQlSZhAj9G1cunBakpKr4zn8uTZBvG+HyKAzXIy/tc ci07EH8yK3KtgJT0AQ+D4MIXpU7xPmwXT5F/FUbgZL4qHx5gOA5/DzZeYjJELHdj1LWD VfzNfmMRNVrnpcf7Kb1YLgYdhtBV683/zHPcjCUVHqCZwmzTPWtqn1qeU0Bcw9i6cs+R y0hA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=hUDi3LpJDfbh5JFvsyRAzGmS70LuLctJXOtDXn0LQ2Q=; b=g+uR6a6NS6U+EApyaRmIeRXLDDoLu1xaigD0UxXmUxnudVvx/9VoU1IO16opVodtnW prx/S/gn5jR7UB3cpLwaO16bR5ntTRzpqp3ld38TG/PIS2S4WH3Gv74eb2QsU8enC5Vc Bz9r1Ga2J8XyiIf4dd7NeDZjYVU0KU7NkZ+voKg1q6kh1iPcKxBjcLLhMg9zRQeuMmip 1RPUCXq8hcbsxSPsbfVYmr/XyM5oekyBQZxUg85Itp7p5e26MAHAPctSE5m7sVo+PDkM PhJF4LHzwdUM+W2e+7iVArDokGU83GAAGUDa+FY0Ht0F6TD/yZjk+kWEmkAbg3tD3pCT 0M2g== X-Gm-Message-State: APjAAAUvbb5RdsbBfhfHc7YsSrKdv0jPLO1kQVR8yY2sgfpdji92p3yh Ab8j2tUFioDP+OcdVroDdYutcYuu5FnjDHIHZDn2rZTxWQE= X-Google-Smtp-Source: APXvYqzRDhbCUq4oamjTp7IFBzoFqATLRDyx7vn9DWCXD+P/G7Mm39rg8ZHzRHYYYBY9yKRu+AczaS5kivZNODkB7ew= X-Received: by 2002:ac8:2a63:: with SMTP id l32mr5360414qtl.117.1559601258098; Mon, 03 Jun 2019 15:34:18 -0700 (PDT) MIME-Version: 1.0 References: <20190531202132.379386-1-andriin@fb.com> <20190531202132.379386-7-andriin@fb.com> In-Reply-To: <20190531202132.379386-7-andriin@fb.com> From: Andrii Nakryiko Date: Mon, 3 Jun 2019 15:34:06 -0700 Message-ID: Subject: Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF To: Andrii Nakryiko Cc: Networking , bpf , Alexei Starovoitov , Daniel Borkmann , Kernel Team Content-Type: text/plain; charset="UTF-8" Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Fri, May 31, 2019 at 1:21 PM Andrii Nakryiko wrote: > > This patch adds support for a new way to define BPF maps. It relies on > BTF to describe mandatory and optional attributes of a map, as well as > captures type information of key and value naturally. This eliminates > the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are > always in sync with the key/value type. > > Relying on BTF, this approach allows for both forward and backward > compatibility w.r.t. extending supported map definition features. Old > libbpf implementation will ignore fields it doesn't recognize, while new > implementations will parse and recognize new optional attributes. > > The outline of the new map definition (short, BTF-defined maps) is as follows: > 1. All the maps should be defined in .maps ELF section. It's possible to > have both "legacy" map definitions in `maps` sections and BTF-defined > maps in .maps sections. Everything will still work transparently. > 2. The map declaration and initialization is done through > a global/static variable of a struct type with few mandatory and > extra optional fields: > - type field is mandatory and specified type of BPF map; > - key/value fields are mandatory and capture key/value type/size information; > - max_entries attribute is optional; if max_entries is not specified or > initialized, it has to be provided in runtime through libbpf API > before loading bpf_object; > - map_flags is optional and if not defined, will be assumed to be 0. > 3. Key/value fields should be **a pointer** to a type describing > key/value. The pointee type is assumed (and will be recorded as such > and used for size determination) to be a type describing key/value of > the map. This is done to save excessive amounts of space allocated in > corresponding ELF sections for key/value of big size. > 4. As some maps disallow having BTF type ID associated with key/value, > it's possible to specify key/value size explicitly without > associating BTF type ID with it. Use key_size and value_size fields > to do that (see example below). > > Here's an example of simple ARRAY map defintion: > > struct my_value { int x, y, z; }; > > struct { > int type; > int max_entries; > int *key; > struct my_value *value; > } btf_map SEC(".maps") = { > .type = BPF_MAP_TYPE_ARRAY, > .max_entries = 16, > }; > > This will define BPF ARRAY map 'btf_map' with 16 elements. The key will > be of type int and thus key size will be 4 bytes. The value is struct > my_value of size 12 bytes. This map can be used from C code exactly the > same as with existing maps defined through struct bpf_map_def. > > Here's an example of STACKMAP definition (which currently disallows BTF type > IDs for key/value): > > struct { > __u32 type; > __u32 max_entries; > __u32 map_flags; > __u32 key_size; > __u32 value_size; > } stackmap SEC(".maps") = { > .type = BPF_MAP_TYPE_STACK_TRACE, > .max_entries = 128, > .map_flags = BPF_F_STACK_BUILD_ID, > .key_size = sizeof(__u32), > .value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id), > }; > > This approach is naturally extended to support map-in-map, by making a value > field to be another struct that describes inner map. This feature is not > implemented yet. It's also possible to incrementally add features like pinning > with full backwards and forward compatibility. So I wanted to elaborate a bit more on what I'm planning to add, once we agree on the approach. Those are the features that are currently supported by iproute2 loader and here's how I was thinking to support them with BTF-defined maps. Once all this is implemented, there should be just a mechanical field rename to switch BPF apps relying on iproute2 loader (size_key -> key_size, size_value -> value_size, max_elem -> max_entries) for most maps. For more complicated cases described below, I hope we can agree it's easy to migrate and end result might even look better (because more explicit). 1. Pinning. This one is simple: - add pinning attribute, that will either be "no pinning", "global pinning", "object-scope pinning". - by default pinning root will be "/sys/fs/bpf", but one will be able to override this per-object using extra options (so that "/sys/fs/bpf/tc" can be specified). 2. Map-in-map declaration: As outlined at LSF/MM, we can extend value type to be another map definition, specifying a prototype for inner map: struct { int type; int max_entries; struct outer_key *key; struct { /* this is definition of inner map */ int type; int max_entries; struct inner_key *key; struct inner_value *value; } value; } my_hash_of_arrays BPF_MAP = { .type = BPF_MAP_TYPE_HASH_OF_MAPS, .max_entries = 1024, .value = { .type = BPF_MAP_TYPE_ARRAY, .max_entries = 64, }, }; This would declare a hash_of_maps, where inner maps are arrays of 64 elements each. Notice, that struct defining inner map can be declared outside and shared with other maps: struct inner_map_t { int type; int max_entries; struct inner_key *key; struct inner_value *value; }; struct { int type; int max_entries; struct outer_key *key; struct inner_map_t value; } my_hash_of_arrays BPF_MAP = { .type = BPF_MAP_TYPE_HASH_OF_MAPS, .max_entries = 1024, .value = { .type = BPF_MAP_TYPE_ARRAY, .max_entries = 64, }, }; 3. Initialization of prog array. Iproute2 supports a convention-driven initialization of BPF_MAP_TYPE_PROG_ARRAY using special section names (wrapped into __section_tail(ID, IDX)): struct bpf_elf_map SEC("maps") POLICY_CALL_MAP = { .type = BPF_MAP_TYPE_PROG_ARRAY, .id = MAP_ID, .size_key = sizeof(__u32), .size_value = sizeof(__u32), .max_elem = 16, }; __section_tail(MAP_ID, MAP_IDX) int handle_policy(struct __sk_buff *skb) { ... } For each such program, iproute2 will put its FD (for later tail-calling) into a corresponding MAP with id == MAP_ID at index MAP_IDX. Here's how I see this supported in BTF-defined maps case. typedef int (* skbuff_tailcall_fn)(struct __sk_buff *); struct { int type; int max_entries; int *key; skbuff_tailcall_fb value[]; } POLICY_CALL_MAP SEC(".maps") = { .type = BPF_MAP_TYPE_PROG_ARRAY, .max_entries = 16, .value = { &handle_policy, NULL, &handle_some_other_policy, }, }; libbpf loader will greate BPF_MAP_TYPE_PROG_ARRAY map with 16 elements and will initialize first and third entries with FDs of handle_policy and handle_some_other_policy programs. As an added nice bonus, compiler should also warn on signature mismatch. ;) 4. We can extend this idea into ARRAY_OF_MAPS initialization. This is currently implemented in iproute2 using .id, .inner_id, and .inner_idx fields. struct inner_map_t { int type; int max_entries; struct inner_key *key; struct inner_value *value; }; struct inner_map_t map1 = {...}; struct inner_map_t map2 = {...}; struct { int type; int max_entries; struct outer_key *key; struct inner_map_t value[]; } my_hash_of_arrays BPF_MAP = { .type = BPF_MAP_TYPE_ARRAY_OF_MAPS, .max_entries = 2, .value = { &map1, &map2, }, }; There are a bunch of slight variations we might consider (e.g., value vs values, when there is inline initialization, is it an array of structs or an array of pointers to structs, etc), but the overall idea stays the same. So when all this is implemented and supported, from looking at Cilium, it seems like conversion of iproute2 to libbpf should be rather simple and painless. I'd be curious to hear what Cilium folks are thinking about that. > > Signed-off-by: Andrii Nakryiko > ---