bpf.vger.kernel.org archive mirror
* [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h
@ 2022-10-25 22:27 Eduard Zingerman
  2022-10-25 22:27 ` [RFC bpf-next 01/12] libbpf: Deduplicate unambigous standalone forward declarations Eduard Zingerman
                   ` (14 more replies)
  0 siblings, 15 replies; 46+ messages in thread
From: Eduard Zingerman @ 2022-10-25 22:27 UTC (permalink / raw)
  To: bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, arnaldo.melo, Eduard Zingerman

Hi BPF community,

AFAIK there is a long-standing feature request to use kernel headers
alongside `vmlinux.h` generated by `bpftool`. For example, significant
effort was put into adding a `bpf_dominating_decl` attribute to clang
(see [1]); unfortunately, that effort stalled due to concerns regarding
C language semantics.

After some discussion with Alexei and Yonghong I'd like to request
your comments regarding a somewhat brittle and partial solution to
this issue that relies on adding `#ifndef FOO_H ... #endif` guards in
the generated `vmlinux.h`.

The basic idea
---

The goal of the patch set is to allow usage of header files from
`include/uapi` alongside `vmlinux.h` as follows:

  #include <uapi/linux/tcp.h>
  #include "vmlinux.h"

This goal is achieved by adding `#ifndef ... #endif` guards in
`vmlinux.h` around definitions that originate from the `include/uapi`
headers. The guards emitted match the guards used in the original
headers. E.g. as follows:

include/uapi/linux/tcp.h:

  #ifndef _UAPI_LINUX_TCP_H
  #define _UAPI_LINUX_TCP_H
  ...
  union tcp_word_hdr {
	struct tcphdr hdr;
	__be32        words[5];
  };
  ...
  #endif /* _UAPI_LINUX_TCP_H */

vmlinux.h:

  ...
  #ifndef _UAPI_LINUX_TCP_H

  union tcp_word_hdr {
	struct tcphdr hdr;
	__be32 words[5];
  };

  #endif /* _UAPI_LINUX_TCP_H */
  ...

To get to this state the following steps are necessary:
- a "header guard" name should be identified for each header file;
- the correspondence between a data type and its header guard has to be
  encoded in BTF;
- `bpftool` should be adjusted to emit `#ifndef FOO_H ... #endif`
  brackets.
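
The guard mechanism itself can be tried in isolation. The following is
a minimal sketch, not part of the patch-set: both fragments live in one
file, the first standing in for the uapi header and the second for the
guarded part of `vmlinux.h`. The `TCP_WORD_HDR_ORIGIN` macro and the
`tcp_word_hdr_origin()` helper are hypothetical and exist only to make
the effect observable.

```c
/* Stand-in for a uapi header such as <uapi/linux/tcp.h>. */
#ifndef _UAPI_LINUX_TCP_H
#define _UAPI_LINUX_TCP_H
union tcp_word_hdr {
	unsigned int words[5];
};
#define TCP_WORD_HDR_ORIGIN "uapi header"	/* hypothetical marker */
#endif /* _UAPI_LINUX_TCP_H */

/* Stand-in for the guarded fragment of vmlinux.h. */
#ifndef _UAPI_LINUX_TCP_H
union tcp_word_hdr {
	unsigned int words[5];
};
#define TCP_WORD_HDR_ORIGIN "vmlinux.h"		/* hypothetical marker */
#endif /* _UAPI_LINUX_TCP_H */

/* Reports which of the two definitions the compiler actually saw. */
static const char *tcp_word_hdr_origin(void)
{
	return TCP_WORD_HDR_ORIGIN;
}
```

Because the first fragment defines `_UAPI_LINUX_TCP_H`, the second
fragment is skipped and the union is defined exactly once. Note that
the order matters: the uapi header has to come first, as in the
`#include` example above.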

It is not possible to identify header guard names for all uapi headers
based on the file name alone. However, a simple script can be devised to
identify the guards based on the file name and its content. Thus it
is possible to obtain a list of header names with their corresponding
header guards.
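
To illustrate what "identifying the guard from the content" can mean,
here is a rough sketch in C. It is an assumption made for illustration
only and is not the logic of `infer_header_guards.pl`: the hypothetical
helper `infer_header_guard()` simply looks for the first `#ifndef NAME`
line that is followed by `#define NAME`.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan `text` line by line for "#ifndef NAME"
 * followed on the next line by "#define NAME" and copy NAME to `guard`.
 * Returns 0 on success, -1 if no such pair is found.
 */
static int infer_header_guard(const char *text, char *guard, size_t guard_sz)
{
	char name[128], def[128];
	const char *line = text;

	while (line && *line) {
		const char *next = strchr(line, '\n');

		/* Start of the following line, or NULL on the last line. */
		next = next ? next + 1 : NULL;
		if (next &&
		    sscanf(line, " # ifndef %127s", name) == 1 &&
		    sscanf(next, " # define %127s", def) == 1 &&
		    strcmp(name, def) == 0 && strlen(name) < guard_sz) {
			strcpy(guard, name);
			return 0;
		}
		line = next;
	}
	return -1;
}
```

A real script has to handle more layouts than this (for example guards
that do not follow the `#ifndef`/`#define` pairing), which is why a
test over all uapi headers is useful.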

The correspondence between a type and its declaration file (header) is
available in DWARF as the `DW_AT_decl_file` attribute. The
`DW_AT_decl_file` can be matched with the list of header guards
described above to obtain the header guard name for a specific type.

`pahole` generates BTF from DWARF. It is possible to modify
`pahole` to accept the header guards list as an additional parameter
and to encode the header guard names in BTF.

Implementation details
---

The present patch-set implements these ideas as follows:
- A parameter `--header_guards_db` is added to `pahole`. If present, it
  points to a file with a list of `<header> <guard>` records.
- `pahole` uses the DWARF `DW_AT_decl_file` value to look up the header
  guard for each type emitted to BTF. If a header guard is present, it
  is encoded alongside the type.
- Header guards are encoded in BTF as `BTF_DECL_TAG` records with a
  special prefix: "header_guard:" is prepended to the value of
  such tags. (`BTF_DECL_TAG` is used here to avoid BTF binary format
  changes.)
- A special script `infer_header_guards.pl` is added as a part of
  kbuild; it can infer header guard names for each UAPI header based
  on the header content.
- This script is invoked from `link-vmlinux.sh` prior to BTF
  generation during the kernel build. The output of the script is saved
  to a file, and the file is passed to `pahole` as the
  `--header_guards_db` parameter.
- `libbpf` is modified to aggregate `BTF_DECL_TAG` records for each
  type and to emit `#ifndef FOO_H ... #endif` brackets when a
  "header_guard:" tag is present for a type.

Details for each patch in a set:
- libbpf: Deduplicate unambigous standalone forward declarations
- selftests/bpf: Tests for standalone forward BTF declarations deduplication

  There is a small number (63 for defconfig) of forward declarations
  that are not de-duplicated with the main type declaration under
  certain conditions. This hinders the header guard brackets
  generation. This patch addresses this de-duplication issue.

- libbpf: Support for BTF_DECL_TAG dump in C format
- selftests/bpf: Tests for BTF_DECL_TAG dump in C format

  Currently libbpf does not process BTF_DECL_TAG when btf is dumped in
  C format. This patch adds a hash table matching btf type ids with a
  list of decl tags to the struct btf_dump.
  The `btf_dump_emit_decl_tags` is not necessary for the overall
  patch-set to function but simplifies testing a bit.

- libbpf: Header guards for selected data structures in vmlinux.h
- selftests/bpf: Tests for header guards printing in BTF dump

  Adds option `emit_header_guards` to `struct btf_dump_opts`.
  When enabled the `btf_dump__dump_type` prints `#ifndef ... #endif`
  brackets around types for which header guard information is present
  in BTF.

- bpftool: Enable header guards generation

  Unconditionally enables `emit_header_guards` for BTF dump in C format.

- kbuild: Script to infer header guard values for uapi headers
- kbuild: Header guards for types from include/uapi/*.h in kernel BTF

  Adds `scripts/infer_header_guards.pl` and integrates it with
  `link-vmlinux.sh`.

- selftests/bpf: Script to verify uapi headers usage with vmlinux.h

  Adds a script `test_uapi_headers.py` that tests header guards with
  vmlinux.h by compiling a simple C snippet. The snippet looks as
  follows:
  
    #include <some_uapi_header.h>
    #include "vmlinux.h"
  
    __attribute__((section("tc"), used))
    int syncookie_tc(struct __sk_buff *skb) { return 0; }
  
  The list of headers to test comes from
  `tools/testing/selftests/bpf/good_uapi_headers.txt`.

- selftests/bpf: Known good uapi headers for test_uapi_headers.py

  The list of uapi headers that could be included alongside vmlinux.h.
  The headers are picked from the following locations:
  - <headers-export-dir>/linux/*.h
  - <headers-export-dir>/linux/**/*.h
  This choice of locations is somewhat arbitrary.

- selftests/bpf: script for infer_header_guards.pl testing

  The test case for `scripts/infer_header_guards.pl`, verifies that
  header guards can be inferred for all uapi headers.

- There is also a patch for dwarves that adds `--header_guards_db`
  option (see [2]).
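
The "header_guard:" tag convention and the resulting brackets can be
sketched as follows. This is an illustration under stated assumptions,
not the libbpf code from this series: `decl_tag_header_guard()` and
`format_guarded()` are hypothetical helpers that operate on plain
strings instead of BTF objects.

```c
#include <stdio.h>
#include <string.h>

#define HEADER_GUARD_PREFIX "header_guard:"

/*
 * Hypothetical helper: return the guard name stored in a decl tag
 * value, or NULL if the value does not carry the special prefix.
 */
static const char *decl_tag_header_guard(const char *tag_value)
{
	size_t len = strlen(HEADER_GUARD_PREFIX);

	if (strncmp(tag_value, HEADER_GUARD_PREFIX, len) != 0)
		return NULL;
	return tag_value + len;
}

/*
 * Hypothetical helper: write `decl` to `buf`, wrapped in
 * #ifndef ... #endif brackets when `tag_value` names a header guard.
 * Returns what snprintf() returns.
 */
static int format_guarded(char *buf, size_t sz,
			  const char *tag_value, const char *decl)
{
	const char *guard = decl_tag_header_guard(tag_value);

	if (!guard)
		return snprintf(buf, sz, "%s\n", decl);
	return snprintf(buf, sz, "#ifndef %s\n\n%s\n\n#endif /* %s */\n",
			guard, decl, guard);
}
```

Ordinary decl tags (without the prefix) leave the declaration
unchanged, so the convention does not interfere with other users of
`BTF_DECL_TAG`.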

The `test_uapi_headers.py` script is important as it demonstrates the
necessary compiler flags:

clang ...                                  \
      -D__x86_64__                         \
      -Xclang -fwchar-type=short           \
      -Xclang -fno-signed-wchar            \
      -I{exported_kernel_headers}/include/ \
      ...

- `-fwchar-type=short` and `-fno-signed-wchar` had to be added because
  the BPF target uses `int` for `wchar_t` by default, which differs from
  the `vmlinux.h` definition of the type (at least for x86_64).
- `__x86_64__` had to be defined for uapi headers that include
  `stddef.h` (the one that is supplied by clang itself), in order to
  get correct sizes for `size_t` and `ptrdiff_t`.
- The `{exported_kernel_headers}` stands for exported kernel headers
  directory (the headers obtained by `make headers_install` or via
  distribution package).

When it works
---

The mechanism described above works for a significant number of UAPI
headers. For example, for the test case above I chose the headers from
the following locations:
- linux/*.h
- linux/**/*.h
There are 759 such headers and for 677 of them the test described
above passes.

I excluded the headers from the following sub-directories as
potentially not interesting:

  asm          rdma   video xen
  asm-generic  misc   scsi
  drm          mtd    sound

This saves some time for both discussion and CI, but the choice is
somewhat arbitrary. If I run `test_uapi_headers.py --test '*'` (all
headers), the test passes for 834 out of 972 headers.

When it breaks
---

There are several scenarios in which this mechanism breaks.
Specifically, I found the following cases:
- When a uapi header includes some system header that conflicts with
  vmlinux.h.
- When a uapi header itself conflicts with vmlinux.h.

Below are examples for both cases.

Conflict with system headers
----

The following uapi headers:
- linux/atmbr2684.h
- linux/bpfilter.h
- linux/gsmmux.h
- linux/icmp.h
- linux/if.h
- linux/if_arp.h
- linux/if_bonding.h
- linux/if_pppox.h
- linux/if_tunnel.h
- linux/ip6_tunnel.h
- linux/llc.h
- linux/mctp.h
- linux/mptcp.h
- linux/netdevice.h
- linux/netfilter/xt_RATEEST.h
- linux/netfilter/xt_hashlimit.h
- linux/netfilter/xt_physdev.h
- linux/netfilter/xt_rateest.h
- linux/netfilter_arp/arp_tables.h
- linux/netfilter_arp/arpt_mangle.h
- linux/netfilter_bridge.h
- linux/netfilter_bridge/ebtables.h
- linux/netfilter_ipv4/ip_tables.h
- linux/netfilter_ipv6/ip6_tables.h
- linux/route.h
- linux/wireless.h

Include the following system header:
- /usr/include/sys/socket.h (all via linux/if.h)

The sys/socket.h conflicts with vmlinux.h in:
- types: struct iovec, struct sockaddr, struct msghdr, ...
- constants: SOCK_STREAM, SOCK_DGRAM, ...

However, only two types are actually used:
- struct sockaddr
- struct sockaddr_storage (used only in linux/mptcp.h)

In 'vmlinux.h' this type originates from 'kernel/include/socket.h'
(non UAPI header), thus does not have a header guard.

The only workaround that I see is to:
- define a stub sys/socket.h as follows:

    #ifndef __BPF_SOCKADDR__
    #define __BPF_SOCKADDR__
    
    /* For __kernel_sa_family_t */
    #include <linux/socket.h>
    
    struct sockaddr {
        __kernel_sa_family_t sa_family;
        char sa_data[14];
    };
    
    #endif

- hardcode generation of __BPF_SOCKADDR__ bracket for
  'struct sockaddr' in vmlinux.h.

Another possibility is to move the definition of 'struct sockaddr'
from 'kernel/include/socket.h' to 'kernel/include/uapi/linux/socket.h',
but I expect that this won't fly with the mainline as it might break
the programs that include both 'linux/socket.h' and 'sys/socket.h'.

Conflict with vmlinux.h
----

Uapi header:
- linux/signal.h

Conflict with vmlinux.h in definition of 'struct sigaction'.
Defined in:
- vmlinux.h: kernel/include/linux/signal_types.h
- uapi:      kernel/arch/x86/include/asm/signal.h

Uapi headers:
- linux/tipc_sockets_diag.h
- linux/sock_diag.h

Conflict with vmlinux.h in definition of 'SOCK_DESTROY'.
Defined in:
- vmlinux.h: kernel/include/net/sock.h
- uapi:      kernel/include/uapi/linux/sock_diag.h
Constants seem to be unrelated.

And so on... I have details for many other headers but omit those for
brevity.

In conclusion
---

Apart from the general feasibility, I have a few questions:
- What UAPI headers are the candidates for such use? If there are some
  interesting headers currently not working with this patch-set, some
  hacks have to be added (e.g. like with `linux/if.h`).
- Is it ok to encode header guards as special `BTF_DECL_TAG` records,
  or should I change the BTF format a bit to save some bytes?

Thanks,
Eduard


[1] https://reviews.llvm.org/D111307
    [clang] __attribute__ bpf_dominating_decl
[2] https://lore.kernel.org/dwarves/20221025220729.2293891-1-eddyz87@gmail.com/T/
    [RFC dwarves] pahole: Save header guard names when
                  --header_guards_db is passed

Eduard Zingerman (12):
  libbpf: Deduplicate unambigous standalone forward declarations
  selftests/bpf: Tests for standalone forward BTF declarations
    deduplication
  libbpf: Support for BTF_DECL_TAG dump in C format
  selftests/bpf: Tests for BTF_DECL_TAG dump in C format
  libbpf: Header guards for selected data structures in vmlinux.h
  selftests/bpf: Tests for header guards printing in BTF dump
  bpftool: Enable header guards generation
  kbuild: Script to infer header guard values for uapi headers
  kbuild: Header guards for types from include/uapi/*.h in kernel BTF
  selftests/bpf: Script to verify uapi headers usage with vmlinux.h
  selftests/bpf: Known good uapi headers for test_uapi_headers.py
  selftests/bpf: script for infer_header_guards.pl testing

 scripts/infer_header_guards.pl                | 191 +++++
 scripts/link-vmlinux.sh                       |  13 +-
 tools/bpf/bpftool/btf.c                       |   4 +-
 tools/lib/bpf/btf.c                           | 178 ++++-
 tools/lib/bpf/btf.h                           |   7 +-
 tools/lib/bpf/btf_dump.c                      | 232 +++++-
 .../selftests/bpf/good_uapi_headers.txt       | 677 ++++++++++++++++++
 tools/testing/selftests/bpf/prog_tests/btf.c  | 152 ++++
 .../selftests/bpf/prog_tests/btf_dump.c       |  11 +-
 .../bpf/progs/btf_dump_test_case_decl_tag.c   |  39 +
 .../progs/btf_dump_test_case_header_guards.c  |  94 +++
 .../bpf/test_uapi_header_guards_infer.sh      |  33 +
 .../selftests/bpf/test_uapi_headers.py        | 197 +++++
 13 files changed, 1816 insertions(+), 12 deletions(-)
 create mode 100755 scripts/infer_header_guards.pl
 create mode 100644 tools/testing/selftests/bpf/good_uapi_headers.txt
 create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_decl_tag.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_header_guards.c
 create mode 100755 tools/testing/selftests/bpf/test_uapi_header_guards_infer.sh
 create mode 100755 tools/testing/selftests/bpf/test_uapi_headers.py

-- 
2.34.1



* [RFC bpf-next 01/12] libbpf: Deduplicate unambigous standalone forward declarations
  2022-10-25 22:27 [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h Eduard Zingerman
@ 2022-10-25 22:27 ` Eduard Zingerman
  2022-10-27 22:07   ` Andrii Nakryiko
  2022-10-25 22:27 ` [RFC bpf-next 02/12] selftests/bpf: Tests for standalone forward BTF declarations deduplication Eduard Zingerman
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 46+ messages in thread
From: Eduard Zingerman @ 2022-10-25 22:27 UTC (permalink / raw)
  To: bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, arnaldo.melo, Eduard Zingerman

Deduplicate forward declarations that don't take part in type graph
comparisons if the declaration name is unambiguous. Example:

CU #1:

struct foo;              // standalone forward declaration
struct foo *some_global;

CU #2:

struct foo { int x; };
struct foo *another_global;

The `struct foo` from CU #1 is not a part of any definition that is
compared against another definition while `btf_dedup_struct_types`
processes structural types. After `btf_dedup_struct_types` the BTF
looks as follows:

[1] STRUCT 'foo' size=4 vlen=1 ...
[2] INT 'int' size=4 ...
[3] PTR '(anon)' type_id=1
[4] FWD 'foo' fwd_kind=struct
[5] PTR '(anon)' type_id=4

This commit adds a new pass, `btf_dedup_standalone_fwds`, that maps
such forward declarations to structs or unions with an identical name,
provided the name is not ambiguous.

The pass is positioned before `btf_dedup_ref_types` so that types
[3] and [5] can be merged into the same type after [1] and [4] are
merged. The final result for the example above looks as follows:

[1] STRUCT 'foo' size=4 vlen=1
	'x' type_id=2 bits_offset=0
[2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[3] PTR '(anon)' type_id=1

For defconfig kernel with BTF enabled this removes 63 forward
declarations. Examples of removed declarations: `pt_regs`, `in6_addr`.
The running time of `btf__dedup` function is increased by about 3%.
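
The mapping rule can be sketched on a toy model. The code below is an
illustration only: `struct toy_type`, `enum toy_kind` and
`toy_dedup_standalone_fwds()` are hypothetical, type ids are plain
0-based array indices, and a quadratic scan replaces the hash map used
by the actual pass.

```c
#include <string.h>

enum toy_kind { TOY_STRUCT, TOY_UNION, TOY_FWD_STRUCT, TOY_FWD_UNION, TOY_OTHER };

struct toy_type {
	enum toy_kind kind;
	const char *name;	/* NULL for anonymous types */
};

static int toy_is_composite(enum toy_kind k)
{
	return k == TOY_STRUCT || k == TOY_UNION;
}

/*
 * map[i] holds the canonical id of type i (map[i] == i if type i is
 * canonical). A canonical FWD is remapped to a struct/union only when
 * exactly one canonical struct or union carries the same name and the
 * kinds agree. FWD entries are assumed to always have a name.
 */
static void toy_dedup_standalone_fwds(const struct toy_type *types, int n, int *map)
{
	int i, j;

	for (i = 0; i < n; i++) {
		int cand = -1, matches = 0;
		int is_fwd = types[i].kind == TOY_FWD_STRUCT ||
			     types[i].kind == TOY_FWD_UNION;

		if (!is_fwd || map[i] != i)
			continue;

		for (j = 0; j < n; j++) {
			if (!toy_is_composite(types[j].kind) || map[j] != j)
				continue;
			if (!types[j].name || strcmp(types[j].name, types[i].name) != 0)
				continue;
			cand = j;
			matches++;
		}

		/* No candidate, or the name is ambiguous: leave the FWD alone. */
		if (matches != 1)
			continue;

		if ((types[i].kind == TOY_FWD_STRUCT && types[cand].kind == TOY_STRUCT) ||
		    (types[i].kind == TOY_FWD_UNION && types[cand].kind == TOY_UNION))
			map[i] = cand;
	}
}
```

For the example above (with ids shifted to start at 0), the FWD `foo`
at index 3 is mapped to the STRUCT `foo` at index 0, after which the
two pointer types become candidates for merging in the reference types
pass.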

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 tools/lib/bpf/btf.c | 178 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 174 insertions(+), 4 deletions(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index d88647da2c7f..c34c68d8e8a0 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -2881,6 +2881,7 @@ static int btf_dedup_strings(struct btf_dedup *d);
 static int btf_dedup_prim_types(struct btf_dedup *d);
 static int btf_dedup_struct_types(struct btf_dedup *d);
 static int btf_dedup_ref_types(struct btf_dedup *d);
+static int btf_dedup_standalone_fwds(struct btf_dedup *d);
 static int btf_dedup_compact_types(struct btf_dedup *d);
 static int btf_dedup_remap_types(struct btf_dedup *d);
 
@@ -2988,15 +2989,16 @@ static int btf_dedup_remap_types(struct btf_dedup *d);
  * Algorithm summary
  * =================
  *
- * Algorithm completes its work in 6 separate passes:
+ * Algorithm completes its work in 7 separate passes:
  *
  * 1. Strings deduplication.
  * 2. Primitive types deduplication (int, enum, fwd).
  * 3. Struct/union types deduplication.
- * 4. Reference types deduplication (pointers, typedefs, arrays, funcs, func
+ * 4. Standalone fwd declarations deduplication.
+ * 5. Reference types deduplication (pointers, typedefs, arrays, funcs, func
  *    protos, and const/volatile/restrict modifiers).
- * 5. Types compaction.
- * 6. Types remapping.
+ * 6. Types compaction.
+ * 7. Types remapping.
  *
  * Algorithm determines canonical type descriptor, which is a single
  * representative type for each truly unique type. This canonical type is the
@@ -3060,6 +3062,11 @@ int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts)
 		pr_debug("btf_dedup_struct_types failed:%d\n", err);
 		goto done;
 	}
+	err = btf_dedup_standalone_fwds(d);
+	if (err < 0) {
+		pr_debug("btf_dedup_standalone_fwd failed:%d\n", err);
+		goto done;
+	}
 	err = btf_dedup_ref_types(d);
 	if (err < 0) {
 		pr_debug("btf_dedup_ref_types failed:%d\n", err);
@@ -4525,6 +4532,169 @@ static int btf_dedup_ref_types(struct btf_dedup *d)
 	return 0;
 }
 
+/*
+ * `name_off_map` maps name offsets to type ids (essentially __u32 -> __u32).
+ *
+ * The __u32 key/value representations are cast to `void *` before passing
+ * to `hashmap__*` functions. These pseudo-pointers are never dereferenced.
+ *
+ */
+static struct hashmap *name_off_map__new(void)
+{
+	return hashmap__new(btf_dedup_identity_hash_fn,
+			    btf_dedup_equal_fn,
+			    NULL);
+}
+
+static int name_off_map__find(struct hashmap *map, __u32 name_off, __u32 *type_id)
+{
+	/* This has to be sizeof(void *) in order to be passed to hashmap__find */
+	void *tmp;
+	int found = hashmap__find(map, (void *)(ptrdiff_t)name_off, &tmp);
+	/*
+	 * __u64 cast is necessary to avoid pointer to integer conversion size warning.
+	 * It is fine to get rid of this warning as `void *` is used as an integer value.
+	 */
+	if (found)
+		*type_id = (__u64)tmp;
+	return found;
+}
+
+static int name_off_map__set(struct hashmap *map, __u32 name_off, __u32 type_id)
+{
+	return hashmap__set(map, (void *)(size_t)name_off, (void *)(size_t)type_id,
+			    NULL, NULL);
+}
+
+/*
+ * Collect a `name_off_map` that maps type names to type ids for all
+ * canonical structs and unions. If the same name is shared by several
+ * canonical types use a special value 0 to indicate this fact.
+ */
+static int btf_dedup_fill_unique_names_map(struct btf_dedup *d, struct hashmap *names_map)
+{
+	int i, err = 0;
+	__u32 type_id, collision_id;
+	__u16 kind;
+	struct btf_type *t;
+
+	for (i = 0; i < d->btf->nr_types; i++) {
+		type_id = d->btf->start_id + i;
+		t = btf_type_by_id(d->btf, type_id);
+		kind = btf_kind(t);
+
+		if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION)
+			continue;
+
+		/* Skip non-canonical types */
+		if (type_id != d->map[type_id])
+			continue;
+
+		err = 0;
+		if (name_off_map__find(names_map, t->name_off, &collision_id)) {
+			/* Mark non-unique names with 0 */
+			if (collision_id != 0 && collision_id != type_id)
+				err = name_off_map__set(names_map, t->name_off, 0);
+		} else {
+			err = name_off_map__set(names_map, t->name_off, type_id);
+		}
+
+		if (err < 0)
+			return err;
+	}
+
+	return 0;
+}
+
+static int btf_dedup_standalone_fwd(struct btf_dedup *d,
+				    struct hashmap *names_map,
+				    __u32 type_id)
+{
+	struct btf_type *t = btf_type_by_id(d->btf, type_id);
+	__u16 kind = btf_kind(t);
+	enum btf_fwd_kind fwd_kind = BTF_INFO_KFLAG(t->info);
+
+	struct btf_type *cand_t;
+	__u16 cand_kind;
+	__u32 cand_id = 0;
+
+	if (kind != BTF_KIND_FWD)
+		return 0;
+
+	/* Skip if this FWD already has a mapping */
+	if (type_id != d->map[type_id])
+		return 0;
+
+	name_off_map__find(names_map, t->name_off, &cand_id);
+	if (!cand_id)
+		return 0;
+
+	cand_t = btf_type_by_id(d->btf, cand_id);
+	cand_kind = btf_kind(cand_t);
+	if (!(cand_kind == BTF_KIND_STRUCT && fwd_kind == BTF_FWD_STRUCT) &&
+	    !(cand_kind == BTF_KIND_UNION && fwd_kind == BTF_FWD_UNION))
+		return 0;
+
+	d->map[type_id] = cand_id;
+
+	return 0;
+}
+
+/*
+ * Standalone fwd declarations deduplication.
+ *
+ * The lion's share of all FWD declarations is resolved during
+ * `btf_dedup_struct_types` phase when different type graphs are
+ * compared against each other. However, if in some compilation unit a
+ * FWD declaration is not a part of a type graph compared against
+ * another type graph that declaration's canonical type would not be
+ * changed. Example:
+ *
+ * CU #1:
+ *
+ * struct foo;
+ * struct foo *some_global;
+ *
+ * CU #2:
+ *
+ * struct foo { int u; };
+ * struct foo *another_global;
+ *
+ * After `btf_dedup_struct_types` the BTF looks as follows:
+ *
+ * [1] STRUCT 'foo' size=4 vlen=1 ...
+ * [2] INT 'int' size=4 ...
+ * [3] PTR '(anon)' type_id=1
+ * [4] FWD 'foo' fwd_kind=struct
+ * [5] PTR '(anon)' type_id=4
+ *
+ * This pass assumes that such FWD declarations should be mapped to
+ * structs or unions with identical name in case if the name is not
+ * ambiguous.
+ */
+static int btf_dedup_standalone_fwds(struct btf_dedup *d)
+{
+	int i, err;
+	struct hashmap *names_map = name_off_map__new();
+
+	if (!names_map)
+		return -ENOMEM;
+
+	err = btf_dedup_fill_unique_names_map(d, names_map);
+	if (err < 0)
+		goto exit;
+
+	for (i = 0; i < d->btf->nr_types; i++) {
+		err = btf_dedup_standalone_fwd(d, names_map, d->btf->start_id + i);
+		if (err < 0)
+			goto exit;
+	}
+
+exit:
+	hashmap__free(names_map);
+	return err;
+}
+
 /*
  * Compact types.
  *
-- 
2.34.1



* [RFC bpf-next 02/12] selftests/bpf: Tests for standalone forward BTF declarations deduplication
  2022-10-25 22:27 [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h Eduard Zingerman
  2022-10-25 22:27 ` [RFC bpf-next 01/12] libbpf: Deduplicate unambigous standalone forward declarations Eduard Zingerman
@ 2022-10-25 22:27 ` Eduard Zingerman
  2022-10-25 22:27 ` [RFC bpf-next 03/12] libbpf: Support for BTF_DECL_TAG dump in C format Eduard Zingerman
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 46+ messages in thread
From: Eduard Zingerman @ 2022-10-25 22:27 UTC (permalink / raw)
  To: bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, arnaldo.melo, Eduard Zingerman

Tests to verify the following behavior of `btf_dedup_standalone_fwds`:
- remapping for struct forward declarations;
- remapping for union forward declarations;
- no remapping if forward declaration kind does not match similarly
  named struct or union declaration;
- no remapping if forward declaration name is ambiguous.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 tools/testing/selftests/bpf/prog_tests/btf.c | 152 +++++++++++++++++++
 1 file changed, 152 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/btf.c b/tools/testing/selftests/bpf/prog_tests/btf.c
index 127b8caa3dc1..f14020d51ab9 100644
--- a/tools/testing/selftests/bpf/prog_tests/btf.c
+++ b/tools/testing/selftests/bpf/prog_tests/btf.c
@@ -7598,6 +7598,158 @@ static struct btf_dedup_test dedup_tests[] = {
 		BTF_STR_SEC("\0e1\0e1_val"),
 	},
 },
+{
+	.descr = "dedup: standalone fwd declaration struct",
+	/*
+	 * // CU 1:
+	 * struct foo { int x; };
+	 * struct foo *a;
+	 *
+	 * // CU 2:
+	 * struct foo;
+	 * struct foo *b;
+	 */
+	.input = {
+		.raw_types = {
+			BTF_STRUCT_ENC(NAME_NTH(1), 1, 4),             /* [1] */
+			BTF_MEMBER_ENC(NAME_NTH(2), 2, 0),
+			BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [2] */
+			BTF_PTR_ENC(1),                                /* [3] */
+			BTF_FWD_ENC(NAME_TBD, 0),                      /* [4] */
+			BTF_PTR_ENC(4),                                /* [5] */
+			BTF_END_RAW,
+		},
+		BTF_STR_SEC("\0foo\0x"),
+	},
+	.expect = {
+		.raw_types = {
+			BTF_STRUCT_ENC(NAME_NTH(1), 1, 4),             /* [1] */
+			BTF_MEMBER_ENC(NAME_NTH(2), 2, 0),
+			BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [2] */
+			BTF_PTR_ENC(1),                                /* [3] */
+			BTF_END_RAW,
+		},
+		BTF_STR_SEC("\0foo\0x"),
+	},
+},
+{
+	.descr = "dedup: standalone fwd declaration union",
+	/*
+	 * // CU 1:
+	 * union foo { int x; };
+	 * union foo *another_global;
+	 *
+	 * // CU 2:
+	 * union foo;
+	 * union foo *some_global;
+	 */
+	.input = {
+		.raw_types = {
+			BTF_UNION_ENC(NAME_NTH(1), 1, 4),              /* [1] */
+			BTF_MEMBER_ENC(NAME_NTH(2), 2, 0),
+			BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [2] */
+			BTF_PTR_ENC(1),                                /* [3] */
+			BTF_FWD_ENC(NAME_TBD, 1),                      /* [4] */
+			BTF_PTR_ENC(4),                                /* [5] */
+			BTF_END_RAW,
+		},
+		BTF_STR_SEC("\0foo\0x"),
+	},
+	.expect = {
+		.raw_types = {
+			BTF_UNION_ENC(NAME_NTH(1), 1, 4),              /* [1] */
+			BTF_MEMBER_ENC(NAME_NTH(2), 2, 0),
+			BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [2] */
+			BTF_PTR_ENC(1),                                /* [3] */
+			BTF_END_RAW,
+		},
+		BTF_STR_SEC("\0foo\0x"),
+	},
+},
+{
+	.descr = "dedup: standalone fwd declaration wrong kind",
+	/*
+	 * // CU 1:
+	 * struct foo { int x; };
+	 * struct foo *b;
+	 *
+	 * // CU 2:
+	 * union foo;
+	 * union foo *a;
+	 */
+	.input = {
+		.raw_types = {
+			BTF_STRUCT_ENC(NAME_NTH(1), 1, 4),             /* [1] */
+			BTF_MEMBER_ENC(NAME_NTH(2), 2, 0),
+			BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [2] */
+			BTF_PTR_ENC(1),                                /* [3] */
+			BTF_FWD_ENC(NAME_TBD, 1),                      /* [4] */
+			BTF_PTR_ENC(4),                                /* [5] */
+			BTF_END_RAW,
+		},
+		BTF_STR_SEC("\0foo\0x"),
+	},
+	.expect = {
+		.raw_types = {
+			BTF_STRUCT_ENC(NAME_NTH(1), 1, 4),             /* [1] */
+			BTF_MEMBER_ENC(NAME_NTH(2), 2, 0),
+			BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [2] */
+			BTF_PTR_ENC(1),                                /* [3] */
+			BTF_FWD_ENC(NAME_TBD, 1),                      /* [4] */
+			BTF_PTR_ENC(4),                                /* [5] */
+			BTF_END_RAW,
+		},
+		BTF_STR_SEC("\0foo\0x"),
+	},
+},
+{
+	.descr = "dedup: standalone fwd declaration name conflict",
+	/*
+	 * // CU 1:
+	 * struct foo { int x; };
+	 * struct foo *a;
+	 *
+	 * // CU 2:
+	 * struct foo;
+	 * struct foo *b;
+	 *
+	 * // CU 3:
+	 * struct foo { int x; int y; };
+	 * struct foo *c;
+	 */
+	.input = {
+		.raw_types = {
+			BTF_STRUCT_ENC(NAME_NTH(1), 1, 4),             /* [1] */
+			BTF_MEMBER_ENC(NAME_NTH(2), 2, 0),
+			BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [2] */
+			BTF_PTR_ENC(1),                                /* [3] */
+			BTF_FWD_ENC(NAME_TBD, 0),                      /* [4] */
+			BTF_PTR_ENC(4),                                /* [5] */
+			BTF_STRUCT_ENC(NAME_NTH(1), 2, 8),             /* [6] */
+			BTF_MEMBER_ENC(NAME_NTH(2), 2, 0),
+			BTF_MEMBER_ENC(NAME_NTH(3), 2, 0),
+			BTF_PTR_ENC(6),                                /* [7] */
+			BTF_END_RAW,
+		},
+		BTF_STR_SEC("\0foo\0x\0y"),
+	},
+	.expect = {
+		.raw_types = {
+			BTF_STRUCT_ENC(NAME_NTH(1), 1, 4),             /* [1] */
+			BTF_MEMBER_ENC(NAME_NTH(2), 2, 0),
+			BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [2] */
+			BTF_PTR_ENC(1),                                /* [3] */
+			BTF_FWD_ENC(NAME_TBD, 0),                      /* [4] */
+			BTF_PTR_ENC(4),                                /* [5] */
+			BTF_STRUCT_ENC(NAME_NTH(1), 2, 8),             /* [6] */
+			BTF_MEMBER_ENC(NAME_NTH(2), 2, 0),
+			BTF_MEMBER_ENC(NAME_NTH(3), 2, 0),
+			BTF_PTR_ENC(6),                                /* [7] */
+			BTF_END_RAW,
+		},
+		BTF_STR_SEC("\0foo\0x\0y"),
+	},
+},
 
 };
 
-- 
2.34.1



* [RFC bpf-next 03/12] libbpf: Support for BTF_DECL_TAG dump in C format
  2022-10-25 22:27 [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h Eduard Zingerman
  2022-10-25 22:27 ` [RFC bpf-next 01/12] libbpf: Deduplicate unambigous standalone forward declarations Eduard Zingerman
  2022-10-25 22:27 ` [RFC bpf-next 02/12] selftests/bpf: Tests for standalone forward BTF declarations deduplication Eduard Zingerman
@ 2022-10-25 22:27 ` Eduard Zingerman
  2022-10-27 22:36   ` Andrii Nakryiko
  2022-10-25 22:27 ` [RFC bpf-next 04/12] selftests/bpf: Tests " Eduard Zingerman
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 46+ messages in thread
From: Eduard Zingerman @ 2022-10-25 22:27 UTC (permalink / raw)
  To: bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, arnaldo.melo, Eduard Zingerman

At C level BTF_DECL_TAGs are represented as __attribute__
declarations, e.g.:

struct foo {
	...;
} __attribute__((btf_decl_tag("bar")));

This commit covers only decl tags attached to structs and unions.

The BTF documentation says that BTF_DECL_TAGs should follow a target
type, but this is not enforced and tests don't honor this restriction.
This commit uses a hash table to map types to the list of decl tags.
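
Code that contains such attributes only compiles cleanly with a
compiler that knows `btf_decl_tag`. A common way to keep a source file
portable is to hide the attribute behind a macro. The sketch below is
an assumption for illustration, not something this patch emits, and
`DECL_TAG` is a hypothetical macro name.

```c
/* DECL_TAG is a hypothetical helper macro, not part of the patch. */
#if defined(__has_attribute)
#  if __has_attribute(btf_decl_tag)
#    define DECL_TAG(x) __attribute__((btf_decl_tag(x)))
#  endif
#endif
#ifndef DECL_TAG
#  define DECL_TAG(x)	/* attribute unavailable: expand to nothing */
#endif

/* Same shape as the example above: the tag is attached to the struct. */
struct foo {
	int x;
} DECL_TAG("bar");
```

With a compiler that lacks the attribute the tag is silently dropped,
so this form is only suitable where losing the tag is acceptable.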

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 tools/lib/bpf/btf_dump.c | 143 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 142 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/btf_dump.c b/tools/lib/bpf/btf_dump.c
index bf0cc0e986dd..9bfe2a4ae277 100644
--- a/tools/lib/bpf/btf_dump.c
+++ b/tools/lib/bpf/btf_dump.c
@@ -75,6 +75,15 @@ struct btf_dump_data {
 	bool is_array_char;
 };
 
+/*
+ * An array of ids of BTF_DECL_TAG objects associated with a specific type.
+ */
+struct decl_tag_array {
+	__u16 cnt;
+	__u16 cap;
+	__u32 tag_ids[0];
+};
+
 struct btf_dump {
 	const struct btf *btf;
 	btf_dump_printf_fn_t printf_fn;
@@ -111,6 +120,11 @@ struct btf_dump {
 	 * name occurrences
 	 */
 	struct hashmap *ident_names;
+	/*
+	 * maps type id to decl_tag_array, assume that relatively small
+	 * fraction of types has btf_decl_tag's attached
+	 */
+	struct hashmap *decl_tags;
 	/*
 	 * data for typed display; allocated if needed.
 	 */
@@ -127,6 +141,26 @@ static bool str_equal_fn(const void *a, const void *b, void *ctx)
 	return strcmp(a, b) == 0;
 }
 
+static size_t int_hash_fn(const void *key, void *ctx)
+{
+	int i;
+	size_t h = 0;
+	char *bytes = (char *)key;
+
+	for (i = 0; i < 4; ++i)
+		h = h * 31 + bytes[i];
+
+	return h;
+}
+
+static bool int_equal_fn(const void *a, const void *b, void *ctx)
+{
+	int *ia = (int *)a;
+	int *ib = (int *)b;
+
+	return *ia == *ib;
+}
+
 static const char *btf_name_of(const struct btf_dump *d, __u32 name_off)
 {
 	return btf__name_by_offset(d->btf, name_off);
@@ -143,6 +177,7 @@ static void btf_dump_printf(const struct btf_dump *d, const char *fmt, ...)
 
 static int btf_dump_mark_referenced(struct btf_dump *d);
 static int btf_dump_resize(struct btf_dump *d);
+static int btf_dump_assign_decl_tags(struct btf_dump *d);
 
 struct btf_dump *btf_dump__new(const struct btf *btf,
 			       btf_dump_printf_fn_t printf_fn,
@@ -179,11 +214,24 @@ struct btf_dump *btf_dump__new(const struct btf *btf,
 		d->ident_names = NULL;
 		goto err;
 	}
+	d->decl_tags = hashmap__new(int_hash_fn, int_equal_fn, NULL);
+	if (IS_ERR(d->decl_tags)) {
+		err = PTR_ERR(d->decl_tags);
+		d->decl_tags = NULL;
+		goto err;
+	}
 
 	err = btf_dump_resize(d);
 	if (err)
 		goto err;
 
+	err = btf_dump_assign_decl_tags(d);
+	if (err)
+		goto err;
+
+	if (err)
+		goto err;
+
 	return d;
 err:
 	btf_dump__free(d);
@@ -232,7 +280,8 @@ static void btf_dump_free_names(struct hashmap *map)
 
 void btf_dump__free(struct btf_dump *d)
 {
-	int i;
+	int i, bkt;
+	struct hashmap_entry *cur;
 
 	if (IS_ERR_OR_NULL(d))
 		return;
@@ -250,6 +299,9 @@ void btf_dump__free(struct btf_dump *d)
 	free(d->decl_stack);
 	btf_dump_free_names(d->type_names);
 	btf_dump_free_names(d->ident_names);
+	hashmap__for_each_entry(d->decl_tags, cur, bkt)
+		free(cur->value);
+	hashmap__free(d->decl_tags);
 
 	free(d);
 }
@@ -373,6 +425,77 @@ static int btf_dump_mark_referenced(struct btf_dump *d)
 	return 0;
 }
 
+static struct decl_tag_array *btf_dump_find_decl_tags(struct btf_dump *d, __u32 id)
+{
+	struct decl_tag_array *decl_tags = NULL;
+
+	hashmap__find(d->decl_tags, &id, (void **)&decl_tags);
+
+	return decl_tags;
+}
+
+static struct decl_tag_array *realloc_decl_tags(struct decl_tag_array *tags, __u16 new_cap)
+{
+	size_t new_size = sizeof(struct decl_tag_array) + new_cap * sizeof(__u32);
+	struct decl_tag_array *new_tags = (tags
+					   ? realloc(tags, new_size)
+					   : calloc(1, new_size));
+
+	if (!new_tags)
+		return NULL;
+
+	new_tags->cap = new_cap;
+
+	return new_tags;
+}
+
+/*
+ * Scans all BTF objects looking for BTF_KIND_DECL_TAG entries.
+ * The id's of the entries are stored in the `btf_dump.decl_tags` table,
+ * grouped by a target type.
+ */
+static int btf_dump_assign_decl_tags(struct btf_dump *d)
+{
+	int err;
+	__u32 id;
+	__u32 n = btf__type_cnt(d->btf);
+	__u32 new_capacity;
+	const struct btf_type *t;
+	struct decl_tag_array *decl_tags;
+
+	for (id = 0; id < n; id++) {
+		t = btf__type_by_id(d->btf, id);
+
+		if (btf_kind(t) != BTF_KIND_DECL_TAG)
+			continue;
+
+		decl_tags = btf_dump_find_decl_tags(d, t->type);
+		if (!decl_tags) {
+			decl_tags = realloc_decl_tags(NULL, 1);
+			if (!decl_tags)
+				return -ENOMEM;
+			err = hashmap__insert(d->decl_tags, &t->type, decl_tags,
+					      HASHMAP_SET, NULL, NULL);
+			if (err)
+				return err;
+		} else if (decl_tags->cnt == decl_tags->cap) {
+			new_capacity = decl_tags->cap * 2;
+			if (new_capacity > 0xffff)
+				return -ERANGE;
+			decl_tags = realloc_decl_tags(decl_tags, new_capacity);
+			if (!decl_tags)
+				return -ENOMEM;
+			decl_tags->cap = new_capacity;
+			err = hashmap__update(d->decl_tags, &t->type, decl_tags, NULL, NULL);
+			if (err)
+				return err;
+		}
+		decl_tags->tag_ids[decl_tags->cnt++] = id;
+	}
+
+	return 0;
+}
+
 static int btf_dump_add_emit_queue_id(struct btf_dump *d, __u32 id)
 {
 	__u32 *new_queue;
@@ -899,6 +1022,23 @@ static void btf_dump_emit_bit_padding(const struct btf_dump *d,
 	}
 }
 
+static void btf_dump_emit_decl_tags(struct btf_dump *d, __u32 id)
+{
+	struct decl_tag_array *decl_tags = btf_dump_find_decl_tags(d, id);
+	struct btf_type *decl_tag_t;
+	const char *decl_tag_text;
+	__u32 i;
+
+	if (!decl_tags)
+		return;
+
+	for (i = 0; i < decl_tags->cnt; ++i) {
+		decl_tag_t = btf_type_by_id(d->btf, decl_tags->tag_ids[i]);
+		decl_tag_text = btf__name_by_offset(d->btf, decl_tag_t->name_off);
+		btf_dump_printf(d, " __attribute__((btf_decl_tag(\"%s\")))", decl_tag_text);
+	}
+}
+
 static void btf_dump_emit_struct_fwd(struct btf_dump *d, __u32 id,
 				     const struct btf_type *t)
 {
@@ -964,6 +1104,7 @@ static void btf_dump_emit_struct_def(struct btf_dump *d,
 	btf_dump_printf(d, "%s}", pfx(lvl));
 	if (packed)
 		btf_dump_printf(d, " __attribute__((packed))");
+	btf_dump_emit_decl_tags(d, id);
 }
 
 static const char *missing_base_types[][2] = {
-- 
2.34.1



* [RFC bpf-next 04/12] selftests/bpf: Tests for BTF_DECL_TAG dump in C format
  2022-10-25 22:27 [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h Eduard Zingerman
                   ` (2 preceding siblings ...)
  2022-10-25 22:27 ` [RFC bpf-next 03/12] libbpf: Support for BTF_DECL_TAG dump in C format Eduard Zingerman
@ 2022-10-25 22:27 ` Eduard Zingerman
  2022-10-25 22:27 ` [RFC bpf-next 05/12] libbpf: Header guards for selected data structures in vmlinux.h Eduard Zingerman
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 46+ messages in thread
From: Eduard Zingerman @ 2022-10-25 22:27 UTC (permalink / raw)
  To: bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, arnaldo.melo, Eduard Zingerman

Covers the following cases:
- `__attribute__((btf_decl_tag("...")))` could be applied to structs
  and unions;
- decl tag applied to an empty struct is printed on a single line;
- decl tags with the same name could be applied to several structs;
- several decl tags could be applied to the same struct;
- attribute `packed` works fine with decl tags (it is a separate
  branch in `tools/lib/bpf/btf_dump.c:btf_dump_emit_attributes`).

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 .../selftests/bpf/prog_tests/btf_dump.c       |  1 +
 .../bpf/progs/btf_dump_test_case_decl_tag.c   | 39 +++++++++++++++++++
 2 files changed, 40 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_decl_tag.c

diff --git a/tools/testing/selftests/bpf/prog_tests/btf_dump.c b/tools/testing/selftests/bpf/prog_tests/btf_dump.c
index 24da335482d4..5f6ce7f1a801 100644
--- a/tools/testing/selftests/bpf/prog_tests/btf_dump.c
+++ b/tools/testing/selftests/bpf/prog_tests/btf_dump.c
@@ -21,6 +21,7 @@ static struct btf_dump_test_case {
 	{"btf_dump: bitfields", "btf_dump_test_case_bitfields", true},
 	{"btf_dump: multidim", "btf_dump_test_case_multidim", false},
 	{"btf_dump: namespacing", "btf_dump_test_case_namespacing", false},
+	{"btf_dump: decl_tag", "btf_dump_test_case_decl_tag", true},
 };
 
 static int btf_dump_all_types(const struct btf *btf, void *ctx)
diff --git a/tools/testing/selftests/bpf/progs/btf_dump_test_case_decl_tag.c b/tools/testing/selftests/bpf/progs/btf_dump_test_case_decl_tag.c
new file mode 100644
index 000000000000..470bbbb530dc
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/btf_dump_test_case_decl_tag.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+
+/*
+ * BTF-to-C dumper test for __attribute__((btf_decl_tag("..."))).
+ */
+/* ----- START-EXPECTED-OUTPUT ----- */
+struct empty_with_tag {} __attribute__((btf_decl_tag("a")));
+
+struct one_tag {
+	int x;
+} __attribute__((btf_decl_tag("b")));
+
+struct same_tag {
+	int x;
+} __attribute__((btf_decl_tag("b")));
+
+struct two_tags {
+	int x;
+} __attribute__((btf_decl_tag("a"))) __attribute__((btf_decl_tag("b")));
+
+struct packed {
+	int x;
+	short y;
+} __attribute__((packed)) __attribute__((btf_decl_tag("another_name")));
+
+struct root_struct {
+	struct empty_with_tag a;
+	struct one_tag b;
+	struct same_tag c;
+	struct two_tags d;
+	struct packed e;
+};
+
+/* ------ END-EXPECTED-OUTPUT ------ */
+
+int f(struct root_struct *s)
+{
+	return 0;
+}
-- 
2.34.1



* [RFC bpf-next 05/12] libbpf: Header guards for selected data structures in vmlinux.h
  2022-10-25 22:27 [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h Eduard Zingerman
                   ` (3 preceding siblings ...)
  2022-10-25 22:27 ` [RFC bpf-next 04/12] selftests/bpf: Tests " Eduard Zingerman
@ 2022-10-25 22:27 ` Eduard Zingerman
  2022-10-27 22:44   ` Andrii Nakryiko
  2022-10-25 22:27 ` [RFC bpf-next 06/12] selftests/bpf: Tests for header guards printing in BTF dump Eduard Zingerman
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 46+ messages in thread
From: Eduard Zingerman @ 2022-10-25 22:27 UTC (permalink / raw)
  To: bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, arnaldo.melo, Eduard Zingerman

The goal of the patch is to allow usage of header files from
`include/uapi` alongside `vmlinux.h`. E.g. as follows:

  #include <uapi/linux/tcp.h>
  #include "vmlinux.h"

This goal is achieved by adding #ifndef / #endif guards in vmlinux.h
around definitions that originate from the `include/uapi` headers. The
guards emitted match the guards used in the original headers.
E.g. as follows:

include/uapi/linux/tcp.h:

  #ifndef _UAPI_LINUX_TCP_H
  #define _UAPI_LINUX_TCP_H
  ...
  union tcp_word_hdr {
	struct tcphdr hdr;
	__be32        words[5];
  };
  ...
  #endif /* _UAPI_LINUX_TCP_H */

vmlinux.h:

  ...
  #ifndef _UAPI_LINUX_TCP_H

  union tcp_word_hdr {
	struct tcphdr hdr;
	__be32 words[5];
  };

  #endif
  ...
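
To illustrate why a guard that is only tested, and never defined, in
`vmlinux.h` is sufficient, here is a minimal single-file sketch. The two
`#ifndef` blocks stand in for `include/uapi/linux/tcp.h` and for the
guarded region of `vmlinux.h`. The union is simplified, and the `SEEN_*`
macros and the helper functions are hypothetical names added only to
make the effect observable:

```c
#include <assert.h>

/* Stand-in for include/uapi/linux/tcp.h: the header defines its guard. */
#ifndef _UAPI_LINUX_TCP_H
#define _UAPI_LINUX_TCP_H
union tcp_word_hdr {
	unsigned int words[5];	/* simplified, the real union also has a struct tcphdr member */
};
#define SEEN_UAPI_COPY 1	/* hypothetical marker, demo only */
#endif /* _UAPI_LINUX_TCP_H */

/* Stand-in for the guarded region of vmlinux.h: tests the guard, never defines it. */
#ifndef _UAPI_LINUX_TCP_H
union tcp_word_hdr {
	unsigned int words[5];
};
#define SEEN_VMLINUX_COPY 1	/* hypothetical marker, demo only */
#endif

/* Returns 1 if the vmlinux.h stand-in was compiled, 0 if the guard skipped it. */
static int vmlinux_copy_compiled(void)
{
#ifdef SEEN_VMLINUX_COPY
	return 1;
#else
	return 0;
#endif
}

/* Hypothetical self-check: the guard set by the first block suppresses the second. */
static void check_guard_demo(void)
{
	assert(vmlinux_copy_compiled() == 0);
}
```

Without the second `#ifndef` the file would fail to compile because of
the duplicate union definition. Note that this only helps when the uapi
header comes first, as in the include order shown above; with the order
reversed the guard is still undefined when `vmlinux.h` is processed.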

The problem of identifying data structures from uapi and selecting
proper guard names is delegated to pahole. When configured, pahole
generates fake `BTF_DECL_TAG` records with header guards information.
The fake tag is distinguished from a real tag by a prefix
"header_guard:" in its value. These tags could be present for unions,
structures, enums and typedefs, e.g.:

[24139] STRUCT 'tcphdr' size=20 vlen=17
  ...
[24296] DECL_TAG 'header_guard:_UAPI_LINUX_TCP_H' type_id=24139 ...

This patch adds an option `emit_header_guards` to `struct btf_dump_opts`.
When this option is present the function `btf_dump__dump_type` emits
header guards for top-level declarations. The header guards are
identified by inspecting fake `BTF_DECL_TAG` records described above.
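
The tag inspection can be sketched in isolation as follows. This is a
standalone approximation that works on plain strings rather than on BTF
records; `header_guard_from_tag`, `find_header_guard` and
`check_header_guard_demo` are hypothetical names used for illustration,
not libbpf functions:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HEADER_GUARD_TAG "header_guard:"

/*
 * Returns the guard name if the decl tag value carries the
 * "header_guard:" prefix, NULL for an ordinary decl tag.
 */
static const char *header_guard_from_tag(const char *tag_value)
{
	size_t prefix_len = strlen(HEADER_GUARD_TAG);

	if (strncmp(tag_value, HEADER_GUARD_TAG, prefix_len) != 0)
		return NULL;

	return tag_value + prefix_len;
}

/* Returns the first guard found among the tags attached to a type, or NULL. */
static const char *find_header_guard(const char *const *tags, size_t cnt)
{
	size_t i;

	for (i = 0; i < cnt; i++) {
		const char *guard = header_guard_from_tag(tags[i]);

		if (guard)
			return guard;
	}

	return NULL;
}

/* Hypothetical self-check with one real tag and one fake header guard tag. */
static void check_header_guard_demo(void)
{
	const char *const tags[] = {
		"some_real_tag",
		"header_guard:_UAPI_LINUX_TCP_H",
	};

	assert(header_guard_from_tag(tags[0]) == NULL);
	assert(strcmp(find_header_guard(tags, 2), "_UAPI_LINUX_TCP_H") == 0);
}
```

A type whose tags carry no such prefix yields NULL, in which case no
`#ifndef` / `#endif` pair is printed for it.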

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 tools/lib/bpf/btf.h      |  7 +++-
 tools/lib/bpf/btf_dump.c | 89 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 94 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 8e6880d91c84..dcfb3f168750 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -235,8 +235,13 @@ struct btf_dump;
 
 struct btf_dump_opts {
 	size_t sz;
+	/*
+	 * if set to true #ifndef X ... #endif brackets would be printed
+	 * for types marked by BTF_DECL_TAG with "header_guard:" prefix
+	 */
+	bool emit_header_guards;
 };
-#define btf_dump_opts__last_field sz
+#define btf_dump_opts__last_field emit_header_guards
 
 typedef void (*btf_dump_printf_fn_t)(void *ctx, const char *fmt, va_list args);
 
diff --git a/tools/lib/bpf/btf_dump.c b/tools/lib/bpf/btf_dump.c
index 9bfe2a4ae277..30b995d209c0 100644
--- a/tools/lib/bpf/btf_dump.c
+++ b/tools/lib/bpf/btf_dump.c
@@ -113,6 +113,8 @@ struct btf_dump {
 	int decl_stack_cap;
 	int decl_stack_cnt;
 
+	bool emit_header_guards;
+
 	/* maps struct/union/enum name to a number of name occurrences */
 	struct hashmap *type_names;
 	/*
@@ -202,6 +204,8 @@ struct btf_dump *btf_dump__new(const struct btf *btf,
 	d->cb_ctx = ctx;
 	d->ptr_sz = btf__pointer_size(btf) ? : sizeof(void *);
 
+	d->emit_header_guards = OPTS_GET(opts, emit_header_guards, false);
+
 	d->type_names = hashmap__new(str_hash_fn, str_equal_fn, NULL);
 	if (IS_ERR(d->type_names)) {
 		err = PTR_ERR(d->type_names);
@@ -347,6 +351,8 @@ int btf_dump__dump_type(struct btf_dump *d, __u32 id)
 	return 0;
 }
 
+static const char *btf_dump_is_header_guard_tag(struct btf_dump *d, const struct btf_type *t);
+
 /*
  * Mark all types that are referenced from any other type. This is used to
  * determine top-level anonymous enums that need to be emitted as an
@@ -384,11 +390,15 @@ static int btf_dump_mark_referenced(struct btf_dump *d)
 		case BTF_KIND_TYPEDEF:
 		case BTF_KIND_FUNC:
 		case BTF_KIND_VAR:
-		case BTF_KIND_DECL_TAG:
 		case BTF_KIND_TYPE_TAG:
 			d->type_states[t->type].referenced = 1;
 			break;
 
+		case BTF_KIND_DECL_TAG:
+			if (!btf_dump_is_header_guard_tag(d, t))
+				d->type_states[t->type].referenced = 1;
+			break;
+
 		case BTF_KIND_ARRAY: {
 			const struct btf_array *a = btf_array(t);
 
@@ -449,6 +459,40 @@ static struct decl_tag_array *realloc_decl_tags(struct decl_tag_array *tags, __u
 	return new_tags;
 }
 
+#define HEADER_GUARD_TAG "header_guard:"
+
+static const char *btf_dump_is_header_guard_tag(struct btf_dump *d, const struct btf_type *t)
+{
+	const char *tag_value;
+	int tag_len = strlen(HEADER_GUARD_TAG);
+
+	tag_value = btf__str_by_offset(d->btf, t->name_off);
+	if (strncmp(tag_value, HEADER_GUARD_TAG, tag_len))
+		return NULL;
+
+	return &tag_value[tag_len];
+}
+
+static const char *btf_dump_find_header_guard(struct btf_dump *d, __u32 id)
+{
+	struct decl_tag_array *decl_tags = btf_dump_find_decl_tags(d, id);
+	const struct btf_type *t;
+	const char *guard;
+	int i;
+
+	if (!decl_tags)
+		return NULL;
+
+	for (i = 0; i < decl_tags->cnt; ++i) {
+		t = btf__type_by_id(d->btf, decl_tags->tag_ids[i]);
+		guard = btf_dump_is_header_guard_tag(d, t);
+		if (guard)
+			return guard;
+	}
+
+	return NULL;
+}
+
 /*
  * Scans all BTF objects looking for BTF_KIND_DECL_TAG entries.
  * The id's of the entries are stored in the `btf_dump.decl_tags` table,
@@ -770,6 +814,8 @@ static const char *btf_dump_type_name(struct btf_dump *d, __u32 id);
 static const char *btf_dump_ident_name(struct btf_dump *d, __u32 id);
 static size_t btf_dump_name_dups(struct btf_dump *d, struct hashmap *name_map,
 				 const char *orig_name);
+static void btf_dump_emit_guard_start(struct btf_dump *d, __u32 id);
+static void btf_dump_emit_guard_end(struct btf_dump *d, __u32 id);
 
 static bool btf_dump_is_blacklisted(struct btf_dump *d, __u32 id)
 {
@@ -835,8 +881,10 @@ static void btf_dump_emit_type(struct btf_dump *d, __u32 id, __u32 cont_id)
 					id);
 				return;
 			}
+			btf_dump_emit_guard_start(d, id);
 			btf_dump_emit_struct_fwd(d, id, t);
 			btf_dump_printf(d, ";\n\n");
+			btf_dump_emit_guard_end(d, id);
 			tstate->fwd_emitted = 1;
 			break;
 		case BTF_KIND_TYPEDEF:
@@ -846,8 +894,10 @@ static void btf_dump_emit_type(struct btf_dump *d, __u32 id, __u32 cont_id)
 			 * references through pointer only, not for embedding
 			 */
 			if (!btf_dump_is_blacklisted(d, id)) {
+				btf_dump_emit_guard_start(d, id);
 				btf_dump_emit_typedef_def(d, id, t, 0);
 				btf_dump_printf(d, ";\n\n");
+				btf_dump_emit_guard_end(d, id);
 			}
 			tstate->fwd_emitted = 1;
 			break;
@@ -868,8 +918,10 @@ static void btf_dump_emit_type(struct btf_dump *d, __u32 id, __u32 cont_id)
 	case BTF_KIND_ENUM:
 	case BTF_KIND_ENUM64:
 		if (top_level_def) {
+			btf_dump_emit_guard_start(d, id);
 			btf_dump_emit_enum_def(d, id, t, 0);
 			btf_dump_printf(d, ";\n\n");
+			btf_dump_emit_guard_end(d, id);
 		}
 		tstate->emit_state = EMITTED;
 		break;
@@ -884,8 +936,10 @@ static void btf_dump_emit_type(struct btf_dump *d, __u32 id, __u32 cont_id)
 		btf_dump_emit_type(d, btf_array(t)->type, cont_id);
 		break;
 	case BTF_KIND_FWD:
+		btf_dump_emit_guard_start(d, id);
 		btf_dump_emit_fwd_def(d, id, t);
 		btf_dump_printf(d, ";\n\n");
+		btf_dump_emit_guard_end(d, id);
 		tstate->emit_state = EMITTED;
 		break;
 	case BTF_KIND_TYPEDEF:
@@ -899,8 +953,10 @@ static void btf_dump_emit_type(struct btf_dump *d, __u32 id, __u32 cont_id)
 		 * emit typedef as a forward declaration
 		 */
 		if (!tstate->fwd_emitted && !btf_dump_is_blacklisted(d, id)) {
+			btf_dump_emit_guard_start(d, id);
 			btf_dump_emit_typedef_def(d, id, t, 0);
 			btf_dump_printf(d, ";\n\n");
+			btf_dump_emit_guard_end(d, id);
 		}
 		tstate->emit_state = EMITTED;
 		break;
@@ -923,14 +979,18 @@ static void btf_dump_emit_type(struct btf_dump *d, __u32 id, __u32 cont_id)
 			for (i = 0; i < vlen; i++, m++)
 				btf_dump_emit_type(d, m->type, new_cont_id);
 		} else if (!tstate->fwd_emitted && id != cont_id) {
+			btf_dump_emit_guard_start(d, id);
 			btf_dump_emit_struct_fwd(d, id, t);
 			btf_dump_printf(d, ";\n\n");
+			btf_dump_emit_guard_end(d, id);
 			tstate->fwd_emitted = 1;
 		}
 
 		if (top_level_def) {
+			btf_dump_emit_guard_start(d, id);
 			btf_dump_emit_struct_def(d, id, t, 0);
 			btf_dump_printf(d, ";\n\n");
+			btf_dump_emit_guard_end(d, id);
 			tstate->emit_state = EMITTED;
 		} else {
 			tstate->emit_state = NOT_EMITTED;
@@ -1034,6 +1094,8 @@ static void btf_dump_emit_decl_tags(struct btf_dump *d, __u32 id)
 
 	for (i = 0; i < decl_tags->cnt; ++i) {
 		decl_tag_t = btf_type_by_id(d->btf, decl_tags->tag_ids[i]);
+		if (btf_dump_is_header_guard_tag(d, decl_tag_t))
+			continue;
 		decl_tag_text = btf__name_by_offset(d->btf, decl_tag_t->name_off);
 		btf_dump_printf(d, " __attribute__((btf_decl_tag(\"%s\")))", decl_tag_text);
 	}
@@ -1672,6 +1734,31 @@ static void btf_dump_emit_type_cast(struct btf_dump *d, __u32 id,
 		btf_dump_printf(d, ")");
 }
 
+static void btf_dump_emit_guard_start(struct btf_dump *d, __u32 id)
+{
+	const char *header_guard;
+
+	if (!d->emit_header_guards)
+		return;
+
+	header_guard = btf_dump_find_header_guard(d, id);
+	if (!header_guard)
+		return;
+
+	btf_dump_printf(d, "#ifndef %s\n\n", header_guard);
+}
+
+static void btf_dump_emit_guard_end(struct btf_dump *d, __u32 id)
+{
+	if (!d->emit_header_guards)
+		return;
+
+	if (!btf_dump_find_header_guard(d, id))
+		return;
+
+	btf_dump_printf(d, "#endif\n\n");
+}
+
 /* return number of duplicates (occurrences) of a given name */
 static size_t btf_dump_name_dups(struct btf_dump *d, struct hashmap *name_map,
 				 const char *orig_name)
-- 
2.34.1



* [RFC bpf-next 06/12] selftests/bpf: Tests for header guards printing in BTF dump
  2022-10-25 22:27 [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h Eduard Zingerman
                   ` (4 preceding siblings ...)
  2022-10-25 22:27 ` [RFC bpf-next 05/12] libbpf: Header guards for selected data structures in vmlinux.h Eduard Zingerman
@ 2022-10-25 22:27 ` Eduard Zingerman
  2022-10-25 22:27 ` [RFC bpf-next 07/12] bpftool: Enable header guards generation Eduard Zingerman
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 46+ messages in thread
From: Eduard Zingerman @ 2022-10-25 22:27 UTC (permalink / raw)
  To: bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, arnaldo.melo, Eduard Zingerman

Verify that `btf_dump__dump_type` emits header guard brackets for
various types when `btf_dump_opts.emit_header_guards` is set to true.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 .../selftests/bpf/prog_tests/btf_dump.c       | 10 +-
 .../progs/btf_dump_test_case_header_guards.c  | 94 +++++++++++++++++++
 2 files changed, 101 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_header_guards.c

diff --git a/tools/testing/selftests/bpf/prog_tests/btf_dump.c b/tools/testing/selftests/bpf/prog_tests/btf_dump.c
index 5f6ce7f1a801..a3db352e61c7 100644
--- a/tools/testing/selftests/bpf/prog_tests/btf_dump.c
+++ b/tools/testing/selftests/bpf/prog_tests/btf_dump.c
@@ -13,6 +13,7 @@ static struct btf_dump_test_case {
 	const char *name;
 	const char *file;
 	bool known_ptr_sz;
+	bool emit_header_guards;
 } btf_dump_test_cases[] = {
 	{"btf_dump: syntax", "btf_dump_test_case_syntax", true},
 	{"btf_dump: ordering", "btf_dump_test_case_ordering", false},
@@ -22,15 +23,18 @@ static struct btf_dump_test_case {
 	{"btf_dump: multidim", "btf_dump_test_case_multidim", false},
 	{"btf_dump: namespacing", "btf_dump_test_case_namespacing", false},
 	{"btf_dump: decl_tag", "btf_dump_test_case_decl_tag", true},
+	{"btf_dump: header guards", "btf_dump_test_case_header_guards", true, true},
 };
 
-static int btf_dump_all_types(const struct btf *btf, void *ctx)
+static int btf_dump_all_types(const struct btf *btf, void *ctx, struct btf_dump_test_case *t)
 {
 	size_t type_cnt = btf__type_cnt(btf);
+	LIBBPF_OPTS(btf_dump_opts, opts);
 	struct btf_dump *d;
 	int err = 0, id;
 
-	d = btf_dump__new(btf, btf_dump_printf, ctx, NULL);
+	opts.emit_header_guards = t->emit_header_guards;
+	d = btf_dump__new(btf, btf_dump_printf, ctx, &opts);
 	err = libbpf_get_error(d);
 	if (err)
 		return err;
@@ -87,7 +91,7 @@ static int test_btf_dump_case(int n, struct btf_dump_test_case *t)
 		goto done;
 	}
 
-	err = btf_dump_all_types(btf, f);
+	err = btf_dump_all_types(btf, f, t);
 	fclose(f);
 	close(fd);
 	if (CHECK(err, "btf_dump", "failure during C dumping: %d\n", err)) {
diff --git a/tools/testing/selftests/bpf/progs/btf_dump_test_case_header_guards.c b/tools/testing/selftests/bpf/progs/btf_dump_test_case_header_guards.c
new file mode 100644
index 000000000000..3ee8aaba9e0a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/btf_dump_test_case_header_guards.c
@@ -0,0 +1,94 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+
+/*
+ * BTF-to-C dumper test for header guards.
+ */
+struct hg_struct {
+	int x;
+} __attribute__((btf_decl_tag("header_guard:S")));
+
+union hg_union {
+	int x;
+} __attribute__((btf_decl_tag("header_guard:U")));
+
+typedef int hg_typedef __attribute__((btf_decl_tag("header_guard:T")));
+
+struct hg_fwd_a;
+
+struct hg_fwd_b {
+	struct hg_fwd_a *loop;
+} __attribute__((btf_decl_tag("header_guard:FWD")));
+
+struct hg_fwd_a {
+	struct hg_fwd_b *loop;
+} __attribute__((btf_decl_tag("header_guard:FWD")));
+
+struct root_struct {
+	struct hg_struct a;
+	union hg_union b;
+	hg_typedef c;
+	struct hg_fwd_a d;
+	struct hg_fwd_b e;
+};
+
+/* ----- START-EXPECTED-OUTPUT ----- */
+/*
+ *#ifndef S
+ *
+ *struct hg_struct {
+ *	int x;
+ *};
+ *
+ *#endif
+ *
+ *#ifndef U
+ *
+ *union hg_union {
+ *	int x;
+ *};
+ *
+ *#endif
+ *
+ *#ifndef T
+ *
+ *typedef int hg_typedef;
+ *
+ *#endif
+ *
+ *#ifndef FWD
+ *
+ *struct hg_fwd_b;
+ *
+ *#endif
+ *
+ *#ifndef FWD
+ *
+ *struct hg_fwd_a {
+ *	struct hg_fwd_b *loop;
+ *};
+ *
+ *#endif
+ *
+ *#ifndef FWD
+ *
+ *struct hg_fwd_b {
+ *	struct hg_fwd_a *loop;
+ *};
+ *
+ *#endif
+ *
+ *struct root_struct {
+ *	struct hg_struct a;
+ *	union hg_union b;
+ *	hg_typedef c;
+ *	struct hg_fwd_a d;
+ *	struct hg_fwd_b e;
+ *};
+ *
+ */
+/* ------ END-EXPECTED-OUTPUT ------ */
+
+int f(struct root_struct *s)
+{
+	return 0;
+}
-- 
2.34.1



* [RFC bpf-next 07/12] bpftool: Enable header guards generation
  2022-10-25 22:27 [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h Eduard Zingerman
                   ` (5 preceding siblings ...)
  2022-10-25 22:27 ` [RFC bpf-next 06/12] selftests/bpf: Tests for header guards printing in BTF dump Eduard Zingerman
@ 2022-10-25 22:27 ` Eduard Zingerman
  2022-10-25 22:27 ` [RFC bpf-next 08/12] kbuild: Script to infer header guard values for uapi headers Eduard Zingerman
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 46+ messages in thread
From: Eduard Zingerman @ 2022-10-25 22:27 UTC (permalink / raw)
  To: bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, arnaldo.melo, Eduard Zingerman

Enables header guards generation for BTF dumps in C format, e.g.
vmlinux.h would be printed as follows:

  ...
  #ifndef _UAPI_LINUX_TCP_H

  enum {
    TCP_NO_QUEUE = 0,
    TCP_RECV_QUEUE = 1,
    TCP_SEND_QUEUE = 2,
    TCP_QUEUES_NR = 3,
  };

  #endif
  ...

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 tools/bpf/bpftool/btf.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c
index 68a70ac03c80..f8e8946b61a7 100644
--- a/tools/bpf/bpftool/btf.c
+++ b/tools/bpf/bpftool/btf.c
@@ -466,7 +466,9 @@ static int dump_btf_c(const struct btf *btf,
 	struct btf_dump *d;
 	int err = 0, i;
 
-	d = btf_dump__new(btf, btf_dump_printf, NULL, NULL);
+	LIBBPF_OPTS(btf_dump_opts, opts);
+	opts.emit_header_guards = true;
+	d = btf_dump__new(btf, btf_dump_printf, NULL, &opts);
 	err = libbpf_get_error(d);
 	if (err)
 		return err;
-- 
2.34.1



* [RFC bpf-next 08/12] kbuild: Script to infer header guard values for uapi headers
  2022-10-25 22:27 [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h Eduard Zingerman
                   ` (6 preceding siblings ...)
  2022-10-25 22:27 ` [RFC bpf-next 07/12] bpftool: Enable header guards generation Eduard Zingerman
@ 2022-10-25 22:27 ` Eduard Zingerman
  2022-10-27 22:51   ` Andrii Nakryiko
  2022-10-25 22:27 ` [RFC bpf-next 09/12] kbuild: Header guards for types from include/uapi/*.h in kernel BTF Eduard Zingerman
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 46+ messages in thread
From: Eduard Zingerman @ 2022-10-25 22:27 UTC (permalink / raw)
  To: bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, arnaldo.melo, Eduard Zingerman

The script infers header guard defines in headers from
include/uapi/**/*.h. E.g. the header guard for
`include/uapi/linux/tcp.h` is `_UAPI_LINUX_TCP_H`:

    include/uapi/linux/tcp.h:

      #ifndef _UAPI_LINUX_TCP_H
      #define _UAPI_LINUX_TCP_H
      ...
      union tcp_word_hdr {
            struct tcphdr hdr;
            __be32        words[5];
      };
      ...
      #endif /* _UAPI_LINUX_TCP_H */

The output of the script could be used as an input to pahole's
`--header_guards_db` parameter. This information is necessary to
repeat the same header guards in the `vmlinux.h` generated from BTF.

It is not possible to infer the guard names from header file names
alone; the file content has to be analyzed. The following heuristic is
used to infer the guard for a specific file:
- All pairs `#ifndef <candidate>` / `#define <candidate>` are collected;
- If there is a unique candidate matching regex
  `${headername}.*_H(EADER)?`, it is selected;
- Otherwise, if there is a unique candidate matching regex
  `_H(EADER)?_`, it is selected;
- Otherwise, if there is a unique candidate matching regex
  `_H(EADER)?$`, it is selected.
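
The selection step can be sketched in C with POSIX regular expressions.
This is an approximation of the Perl subroutine `select_guard` from the
patch, not a replacement for it; `unique_match` is a hypothetical
helper, and, like the script, the sketch does not escape regex
metacharacters in the header name:

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns the only candidate matching `pattern` (case-insensitive),
 * or NULL if no candidate or more than one candidate matches.
 */
static const char *unique_match(const char *pattern,
				const char *const *cands, size_t cnt)
{
	const char *found = NULL;
	size_t matches = 0;
	regex_t re;
	size_t i;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
		return NULL;

	for (i = 0; i < cnt; i++) {
		if (regexec(&re, cands[i], 0, NULL, 0) == 0) {
			found = cands[i];
			matches++;
		}
	}
	regfree(&re);

	return matches == 1 ? found : NULL;
}

/* Tries the three patterns in order, the first unique match wins. */
static const char *select_guard(const char *basename,
				const char *const *cands, size_t cnt)
{
	char by_name[256];
	const char *patterns[3];
	size_t i;

	snprintf(by_name, sizeof(by_name), "%s.*_H(EADER)?", basename);
	patterns[0] = by_name;
	patterns[1] = "_H(EADER)?_";
	patterns[2] = "_H(EADER)?$";

	for (i = 0; i < 3; i++) {
		const char *guard = unique_match(patterns[i], cands, cnt);

		if (guard)
			return guard;
	}

	return NULL;
}
```

For example, with the candidates `_UAPI_LINUX_TCP_H` and
`TCP_MSS_DEFAULT` and the header name `tcp`, only the first candidate
matches the first pattern, so it is selected. Two candidates that both
end in `_H` and match nothing else yield no guard.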

There is also a small list of headers that can't be handled by the
rules above, 16 in total: 7 with guards that don't follow common naming
rules and 9 that should be ignored. These headers and corresponding
guard values are listed in the `%OVERRIDES` hash table.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 scripts/infer_header_guards.pl | 191 +++++++++++++++++++++++++++++++++
 1 file changed, 191 insertions(+)
 create mode 100755 scripts/infer_header_guards.pl

diff --git a/scripts/infer_header_guards.pl b/scripts/infer_header_guards.pl
new file mode 100755
index 000000000000..201008ac83f3
--- /dev/null
+++ b/scripts/infer_header_guards.pl
@@ -0,0 +1,191 @@
+#!/usr/bin/env perl
+# SPDX-License-Identifier: GPL-2.0
+
+# This script scans the passed directory for header files (files ending with ".h").
+# For each header file it tries to infer the name of a C pre-processor
+# variable used as a double include guard (dubbed as "header guard").
+# For example:
+#
+#   #ifndef __MY_HEADER__  // <-- "header guard"
+#     ...
+#   #endif
+#
+# The inferred guards are printed to stdout in the following format:
+#
+#   <header-file> <header-guard>
+#
+# This is an expected format for pahole --header_guards_db parameter.
+# Intended usage is to infer header guards for Linux UAPI headers.
+# The collected information is further used in BTF embedded into kernel.
+#
+# The following inference logic is used for each file:
+# - find all pairs `#ifndef <name> #define <name>`
+# - if there is a unique <name> that matches a pattern - use this
+#   <name> as a header guard, (see subroutine `select_guard` for the
+#   list of the patterns);
+# - files containing only #include directives are safe to ignore.
+#
+# There are a few UAPI header files that don't fit in such logic,
+# header guards for these files are hard-coded in %OVERRIDES hash.
+#
+# The script reports inference failures only when --report-failures flag is
+# passed. This flag is intended for BPF tests.
+#
+# See subroutine `help` for usage info.
+
+use strict;
+use warnings;
+use File::Basename;
+use File::Find;
+use Getopt::Long;
+
+sub help {
+	my $message = << "EOM";
+Usage:
+  $0 [--report-failures] directory-or-file...
+  $0 --help
+
+For a specific file or for each .h file in a directory infer the name
+of a C pre-processor variable used as a double include guard.
+
+Options:
+  --report-failures   Report inference errors to stderr,
+                      exit with non-zero code if guards were not inferred
+                      for some files.
+  --help              Print this message and exit.
+EOM
+	print $message;
+}
+
+my %OVERRIDES = (
+	# Header guards that don't follow common naming rules
+	"include/uapi/linux/cciss_ioctl.h" => "_UAPICCISS_IOCTLH",
+	"include/uapi/linux/hpet.h" => "_UAPI__HPET__",
+	"include/uapi/linux/if_ppp.h" => "_PPP_IOCTL_H",
+	"include/uapi/linux/netfilter/xt_NFLOG.h" => "_XT_NFLOG_TARGET",
+	"include/uapi/linux/netfilter_ipv6/ip6t_NPT.h" => "__NETFILTER_IP6T_NPT",
+	"include/uapi/linux/quota.h" => "_UAPI_LINUX_QUOTA_",
+	"include/uapi/linux/v4l2-common.h" => "__V4L2_COMMON__",
+	# Headers that should be ignored
+	"arch/x86/include/uapi/asm/hw_breakpoint.h" => undef,
+	"arch/x86/include/uapi/asm/posix_types.h" => undef,
+	"arch/x86/include/uapi/asm/setup.h" => undef,
+	"include/generated/uapi/linux/version.h" => undef,
+	"include/uapi/asm-generic/bitsperlong.h" => undef,
+	"include/uapi/asm-generic/kvm_para.h" => undef,
+	"include/uapi/asm-generic/unistd.h" => undef,
+	"include/uapi/linux/irqnr.h" => undef,
+	"include/uapi/linux/zorro_ids.h" => undef,
+	);
+
+sub get_basename {
+	my ($filename) = @_;
+	my $basename = fileparse($filename, qr/\.[^.]*/);
+	return $basename;
+}
+
+sub find_bracket_candidates {
+	my ($filename) = @_;
+	my @candidates = ();
+	my $guard_candidate = undef;
+	my $safe_to_ignore = 1;
+
+	open my $file, $filename or die "Can't open file $filename: $!";
+	while (my $line = <$file>) {
+		if (not($line =~ "^#include")) {
+			$safe_to_ignore = 0;
+		}
+		if ($line =~ "^#ifndef[ \t]+([a-zA-Z0-9_]+)") {
+			$guard_candidate = $1;
+		} elsif ($guard_candidate && $line =~ "^#define[ \t]+${guard_candidate}") {
+			push(@candidates, $guard_candidate);
+			$guard_candidate = undef;
+		}
+	}
+	close $file;
+
+	return ($safe_to_ignore, @candidates);
+}
+
+sub select_guard {
+	my ($filename, @candidates) = @_;
+	my $basename = get_basename($filename);
+	my @regexes = ("$basename.*_H(EADER)?",
+		       "_H(EADER)?_",
+		       "_H(EADER)?\$");
+	foreach my $re (@regexes) {
+		my @filtered = grep(/$re/i, @candidates);
+		if (scalar(@filtered) == 1) {
+			return $filtered[0];
+		}
+	}
+
+	return undef;
+}
+
+sub collect_headers {
+	my ($dir) = @_;
+	my @headers = ();
+
+	find(sub { /\.h$/ && push(@headers, $File::Find::name); }, $dir);
+
+	return @headers;
+}
+
+my $report_failures = 0;
+my $options_parsed = GetOptions(
+	"report-failures" => \$report_failures,
+	"help" => sub { help(); exit 0; },
+    );
+
+if (!$options_parsed || scalar @ARGV == 0) {
+	help();
+	exit 1;
+}
+
+my @headers;
+
+foreach my $dir_or_file (@ARGV) {
+	if (-f $dir_or_file) {
+		push(@headers, $dir_or_file);
+	} elsif (-d $dir_or_file) {
+		push(@headers, collect_headers($dir_or_file));
+	} else {
+		print("'$dir_or_file' is not a file or directory.\n");
+		help();
+		exit 1;
+	}
+}
+
+my $rc = 0;
+
+foreach my $header (@headers) {
+	my $basename = get_basename($header);
+	my $guard;
+
+	if (exists $OVERRIDES{$header}) {
+		$guard = $OVERRIDES{$header};
+	} else {
+		my ($safe_to_ignore, @candidates) = find_bracket_candidates($header);
+		$guard = select_guard($header, @candidates);
+		if ((not $guard) && (not $safe_to_ignore) && $report_failures) {
+			print STDERR "Can't select guard for $header, candidates:\n";
+			print STDERR "  ";
+			if (scalar(@candidates)) {
+				print STDERR join(", ", @candidates);
+			} else {
+				print STDERR "<no candidates>"
+			}
+			print STDERR "\n";
+			$rc = 1;
+		}
+	}
+	if ($guard) {
+		# Remove the _UAPI prefix/suffix the same way
+		# scripts/headers_install.sh does it.
+		$guard =~ s/_UAPI//;
+		print("$header $guard\n");
+	}
+}
+
+exit $rc;
-- 
2.34.1



* [RFC bpf-next 09/12] kbuild: Header guards for types from include/uapi/*.h in kernel BTF
  2022-10-25 22:27 [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h Eduard Zingerman
                   ` (7 preceding siblings ...)
  2022-10-25 22:27 ` [RFC bpf-next 08/12] kbuild: Script to infer header guard values for uapi headers Eduard Zingerman
@ 2022-10-25 22:27 ` Eduard Zingerman
  2022-10-27 18:43   ` Yonghong Song
  2022-10-25 22:27 ` [RFC bpf-next 10/12] selftests/bpf: Script to verify uapi headers usage with vmlinux.h Eduard Zingerman
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 46+ messages in thread
From: Eduard Zingerman @ 2022-10-25 22:27 UTC (permalink / raw)
  To: bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, arnaldo.melo, Eduard Zingerman

Use the pahole --header_guards_db flag to enable encoding of header guard
information in kernel BTF. The actual correspondence between a header
file and its guard string is computed by scripts/infer_header_guards.pl.

The encoded header guard information could be used to restore the
original guards in the vmlinux.h, e.g.:

    include/uapi/linux/tcp.h:

      #ifndef _UAPI_LINUX_TCP_H
      #define _UAPI_LINUX_TCP_H
      ...
      union tcp_word_hdr {
    	struct tcphdr hdr;
    	__be32        words[5];
      };
      ...
      #endif /* _UAPI_LINUX_TCP_H */

    vmlinux.h:

      ...
      #ifndef _UAPI_LINUX_TCP_H

      union tcp_word_hdr {
    	struct tcphdr hdr;
    	__be32 words[5];
      };

      #endif /* _UAPI_LINUX_TCP_H */
      ...
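
The restoration step can be illustrated with a minimal Python sketch.
This is not the pahole or bpftool implementation. It assumes the guard
database is a text file with one "<header path> <guard>" pair per line,
which matches what scripts/infer_header_guards.pl prints. The helper
names parse_guards_db and wrap_with_guard are hypothetical.

```python
def parse_guards_db(text):
    """Parse '<header> <guard>' lines into a dict (hypothetical helper)."""
    guards = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The guard is the last whitespace-separated field on the line.
        header, guard = line.rsplit(maxsplit=1)
        guards[header] = guard
    return guards


def wrap_with_guard(definition, guard):
    """Surround a C type definition with #ifndef GUARD ... #endif."""
    return f"#ifndef {guard}\n\n{definition}\n\n#endif /* {guard} */\n"


db = parse_guards_db("include/uapi/linux/tcp.h _LINUX_TCP_H\n")
union_def = "union tcp_word_hdr {\n\tstruct tcphdr hdr;\n\t__be32 words[5];\n};"
print(wrap_with_guard(union_def, db["include/uapi/linux/tcp.h"]))
```

The guard value in the sample line has the _UAPI part already removed,
as the inference script does before printing.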

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 scripts/link-vmlinux.sh | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 918470d768e9..f57f621eda1f 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -110,6 +110,7 @@ vmlinux_link()
 gen_btf()
 {
 	local pahole_ver
+	local extra_flags
 
 	if ! [ -x "$(command -v ${PAHOLE})" ]; then
 		echo >&2 "BTF: ${1}: pahole (${PAHOLE}) is not available"
@@ -122,10 +123,20 @@ gen_btf()
 		return 1
 	fi
 
+	if [ "${pahole_ver}" -ge "124" ]; then
+		scripts/infer_header_guards.pl \
+			include/uapi \
+			include/generated/uapi \
+			arch/${SRCARCH}/include/uapi \
+			arch/${SRCARCH}/include/generated/uapi \
+			> .btf.uapi_header_guards || return 1;
+		extra_flags="--header_guards_db .btf.uapi_header_guards"
+	fi
+
 	vmlinux_link ${1}
 
 	info "BTF" ${2}
-	LLVM_OBJCOPY="${OBJCOPY}" ${PAHOLE} -J ${PAHOLE_FLAGS} ${1}
+	LLVM_OBJCOPY="${OBJCOPY}" ${PAHOLE} -J ${PAHOLE_FLAGS} ${extra_flags} ${1}
 
 	# Create ${2} which contains just .BTF section but no symbols. Add
 	# SHF_ALLOC because .BTF will be part of the vmlinux image. --strip-all
-- 
2.34.1



* [RFC bpf-next 10/12] selftests/bpf: Script to verify uapi headers usage with vmlinux.h
  2022-10-25 22:27 [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h Eduard Zingerman
                   ` (8 preceding siblings ...)
  2022-10-25 22:27 ` [RFC bpf-next 09/12] kbuild: Header guards for types from include/uapi/*.h in kernel BTF Eduard Zingerman
@ 2022-10-25 22:27 ` Eduard Zingerman
  2022-10-25 22:28 ` [RFC bpf-next 11/12] selftests/bpf: Known good uapi headers for test_uapi_headers.py Eduard Zingerman
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 46+ messages in thread
From: Eduard Zingerman @ 2022-10-25 22:27 UTC (permalink / raw)
  To: bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, arnaldo.melo, Eduard Zingerman

A script to test header guards in vmlinux.h by compiling a simple C
snippet for a set of selected UAPI headers. The snippet being
compiled looks as follows:

  #include <some_uapi_header.h>
  #include "vmlinux.h"

  __attribute__((section("tc"), used))
  int syncookie_tc(struct __sk_buff *skb) { return 0; }

If header guards are placed correctly in vmlinux.h the snippet
should compile w/o errors.

The list of known good headers is supposed to be located in
`tools/testing/selftests/bpf/good_uapi_headers.txt` added as a
separate commit.
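
As a rough illustration, the per-header snippet could be produced like
this. It is a sketch only: make_snippet is a hypothetical helper name,
and the default path "vmlinux.h" is an assumption for the example.

```python
def make_snippet(header, vmlinuxh="vmlinux.h"):
    """Return C source that includes a uapi header before vmlinux.h."""
    return (
        f'#include <{header}>\n'
        f'#include "{vmlinuxh}"\n'
        '\n'
        '__attribute__((section("tc"), used))\n'
        'int syncookie_tc(struct __sk_buff *skb) { return 0; }\n'
    )


print(make_snippet("linux/tcp.h"))
```

The uapi header comes first on purpose: its guard macro must already be
defined when vmlinux.h is processed, otherwise the guarded definitions
in vmlinux.h would not be skipped.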

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 .../selftests/bpf/test_uapi_headers.py        | 197 ++++++++++++++++++
 1 file changed, 197 insertions(+)
 create mode 100755 tools/testing/selftests/bpf/test_uapi_headers.py

diff --git a/tools/testing/selftests/bpf/test_uapi_headers.py b/tools/testing/selftests/bpf/test_uapi_headers.py
new file mode 100755
index 000000000000..1740c4fe0625
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_uapi_headers.py
@@ -0,0 +1,197 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+
+# A script to test header guards in vmlinux.h by compiling a simple C
+# snippet for a set of selected UAPI headers. The snippet being
+# compiled looks as follows:
+#
+#   #include <some_uapi_header.h>
+#   #include "vmlinux.h"
+#
+#   __attribute__((section("tc"), used))
+#   int syncookie_tc(struct __sk_buff *skb) { return 0; }
+#
+# If header guards are placed correctly in vmlinux.h the snippet
+# should compile w/o errors.
+#
+# The script could be used in two modes:
+# - interactive BPF testing and CI;
+# - debug mode.
+#
+# * Interactive BPF testing and CI
+#
+# Run script as follows:
+#
+#   ./test_uapi_headers.py
+#
+# In this mode the following actions are performed:
+# - kernel headers are installed to a temporary directory;
+# - a list of known good uapi headers is read from ./good_uapi_headers.txt;
+# - the snippet above is compiled by clang using BPF target for each header;
+# - if shell is interactive the progress / ETA are reported during execution;
+# - pass / fail statistics are reported at the end;
+# - headers temporary directory is deleted;
+# - script exit code is 0 if snippet could be compiled for all headers.
+#
+# The vmlinux.h processing time is significant (~700ms using Intel i7-4710HQ),
+# thus the headers are processed in parallel.
+#
+# * Debug mode
+#
+# The following parameters are available for debugging:
+#
+#   test_uapi_headers.py \
+#            [-h] [--kheaders KHEADERS] [--vmlinuxh VMLINUXH] [--test TEST]
+#
+#   options:
+#     -h, --help           show this help message and exit
+#     --kheaders KHEADERS  path to exported kernel headers
+#     --vmlinuxh VMLINUXH  path to vmlinux.h
+#     --test TEST          name of the header -or-
+#                          file with header names -or-
+#                          special value '*'
+#
+# When --kheaders is specified the temporary directory is not created
+# and KHEADERS is used instead. It is assumed that headers are already
+# installed to KHEADERS.
+#
+# When TEST names a header (e.g. 'linux/tcp.h') it is the header to test.
+# When TEST names a file this file should contain a list of
+# headers to test one per line.
+# When TEST is '*' all exported headers are tested.
+#
+# The simplest way to debug an issue with a single header is:
+#
+#   ./test_uapi_headers.py --test linux/tcp.h
+
+import subprocess
+import concurrent.futures
+import pathlib
+import time
+import os
+import sys
+import argparse
+import tempfile
+import shutil
+import atexit
+from dataclasses import dataclass
+
+@dataclass
+class Result:
+    header: pathlib.Path
+    returncode: int
+    stderr: str
+
+def run_one(header, kheaders, vmlinuxh):
+    code=f'''
+#include <{header}>
+#include "{vmlinuxh}"
+
+__attribute__((section("tc"), used))
+int syncookie_tc(struct __sk_buff *skb)
+{{
+    return 0;
+}}
+    '''
+    command = f'''
+{os.getenv('CLANG', 'clang')} \
+    -g -Werror -mlittle-endian \
+    -D__x86_64__ \
+    -Xclang -fwchar-type=short \
+    -Xclang -fno-signed-wchar \
+    -I{kheaders}/include/ \
+    -Wno-compare-distinct-pointer-types \
+    -mcpu=v3 \
+    -O2 \
+    -target bpf \
+    -x c \
+    -o /dev/null \
+    -fsyntax-only \
+    -
+'''
+    proc = subprocess.run(command, input=code, capture_output=True,
+                          shell=True, encoding='utf8')
+    return Result(header=header,
+                  returncode=proc.returncode,
+                  stderr=proc.stderr)
+
+def run_all(headers, kheaders, vmlinuxh):
+    start_time = time.time()
+    ok = 0
+    fail = 0
+    failures = []
+    remain = len(headers)
+    print_progress = sys.stdout.isatty()
+    print(f'Processing {remain} headers.')
+    with concurrent.futures.ThreadPoolExecutor(max_workers=os.cpu_count()) as executor:
+        for result in executor.map(lambda header: run_one(header, kheaders, vmlinuxh),
+                                   headers):
+            if result.returncode == 0:
+                print(f"{result.header:<60}   ok")
+                ok += 1
+            else:
+                print(f"{result.header:<60} fail")
+                fail += 1
+                failures.append(result)
+            remain -= 1
+            if print_progress:
+                elapsed = time.time() - start_time
+                processed = ok + fail
+                time_per_header = elapsed / processed
+                eta = int(remain * time_per_header)
+                # keep this shorter than header ok/fail line
+                line = f"Ok {ok: >4} Fail {fail: >4} Remain {remain: >4} ETA {eta: >4}s"
+                print(line, end="\r")
+    if print_progress:
+        print('')
+    elapsed = int(time.time() - start_time)
+    if fail == 0:
+        print(f"Done in {elapsed}s, all {len(headers)} ok.")
+    else:
+        print('----- Failure details -----')
+        for result in failures:
+            print(f'{result.header}: rc = {result.returncode}')
+            for line in result.stderr.split('\n'):
+                print(f"{result.header}: {line}")
+        print(f"Done in {elapsed}s, {fail} out of {len(headers)} failed.")
+    return fail == 0
+
+def main(argv):
+    bpf_test_dir = pathlib.Path(__file__).resolve().parent
+    default_vmlinuxh = bpf_test_dir / './tools/include/vmlinux.h'
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--kheaders", type=str, help='path to exported kernel headers')
+    parser.add_argument("--vmlinuxh", type=str, default=default_vmlinuxh,
+                        help='path to vmlinux.h')
+    parser.add_argument("--test", type=str,
+                        default='./good_uapi_headers.txt',
+                        help="name of the header | file with header names | special value '*'")
+    args = parser.parse_args(argv)
+
+    if args.kheaders is None:
+        kheaders = tempfile.mkdtemp(prefix='kheaders')
+        atexit.register(lambda: shutil.rmtree(kheaders))
+        kernel_dir = bpf_test_dir / '../../../../'
+        # Capture both stdout and stderr as stdout to simplify CI logging
+        subprocess.run(f'make -C {kernel_dir} INSTALL_HDR_PATH={kheaders} headers_install',
+                       stdout=sys.stdout, stderr=sys.stdout,
+                       check=True, shell=True)
+    else:
+        kheaders = args.kheaders
+
+    if os.path.exists(args.test):
+        with open(args.test, 'r') as list_file:
+            headers = [line.strip() for line in list_file]
+    elif args.test == '*':
+        headers = [p.relative_to(f'{kheaders}/include').as_posix()
+                   for p in pathlib.Path(kheaders).rglob("*.h")]
+    else:
+        headers = [args.test]
+
+    if run_all(headers, kheaders, args.vmlinuxh):
+        sys.exit(0)
+    else:
+        sys.exit(1)
+
+if __name__ == '__main__':
+    main(sys.argv[1:])
-- 
2.34.1



* [RFC bpf-next 11/12] selftests/bpf: Known good uapi headers for test_uapi_headers.py
  2022-10-25 22:27 [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h Eduard Zingerman
                   ` (9 preceding siblings ...)
  2022-10-25 22:27 ` [RFC bpf-next 10/12] selftests/bpf: Script to verify uapi headers usage with vmlinux.h Eduard Zingerman
@ 2022-10-25 22:28 ` Eduard Zingerman
  2022-10-25 22:28 ` [RFC bpf-next 12/12] selftests/bpf: script for infer_header_guards.pl testing Eduard Zingerman
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 46+ messages in thread
From: Eduard Zingerman @ 2022-10-25 22:28 UTC (permalink / raw)
  To: bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, arnaldo.melo, Eduard Zingerman

The list of uapi headers that could be included alongside vmlinux.h.
The headers are picked from the following locations:
- <headers-export-dir>/linux/*.h
- <headers-export-dir>/linux/**/*.h
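
For reference, a candidate list covering these two patterns could be
gathered with a sketch like the one here. The function name
list_linux_headers is hypothetical, and the temporary tree only stands
in for a real <headers-export-dir>.

```python
import pathlib
import tempfile


def list_linux_headers(export_dir):
    """Return sorted 'linux/...' header paths relative to export_dir."""
    root = pathlib.Path(export_dir)
    # rglob covers both linux/*.h and linux/**/*.h
    return sorted(p.relative_to(root).as_posix()
                  for p in (root / "linux").rglob("*.h"))


# Demo on a throwaway tree standing in for <headers-export-dir>.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("linux/tcp.h", "linux/can/raw.h", "asm/types.h"):
        path = pathlib.Path(tmp) / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    print(list_linux_headers(tmp))  # prints ['linux/can/raw.h', 'linux/tcp.h']
```

Such a listing yields only candidates; whether a header is "known good"
still has to be established by compiling it alongside vmlinux.h.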

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 .../selftests/bpf/good_uapi_headers.txt       | 677 ++++++++++++++++++
 1 file changed, 677 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/good_uapi_headers.txt

diff --git a/tools/testing/selftests/bpf/good_uapi_headers.txt b/tools/testing/selftests/bpf/good_uapi_headers.txt
new file mode 100644
index 000000000000..d2a7a6e3de74
--- /dev/null
+++ b/tools/testing/selftests/bpf/good_uapi_headers.txt
@@ -0,0 +1,677 @@
+linux/a.out.h
+linux/acct.h
+linux/acrn.h
+linux/adb.h
+linux/adfs_fs.h
+linux/affs_hardblocks.h
+linux/aio_abi.h
+linux/amt.h
+linux/android/binder.h
+linux/android/binderfs.h
+linux/apm_bios.h
+linux/arcfb.h
+linux/arm_sdei.h
+linux/aspeed-lpc-ctrl.h
+linux/aspeed-p2a-ctrl.h
+linux/atalk.h
+linux/atm.h
+linux/atm_eni.h
+linux/atm_he.h
+linux/atm_idt77105.h
+linux/atm_nicstar.h
+linux/atm_tcp.h
+linux/atm_zatm.h
+linux/atmapi.h
+linux/atmarp.h
+linux/atmclip.h
+linux/atmdev.h
+linux/atmioc.h
+linux/atmlec.h
+linux/atmmpc.h
+linux/atmppp.h
+linux/atmsap.h
+linux/atmsvc.h
+linux/audit.h
+linux/auxvec.h
+linux/ax25.h
+linux/batadv_packet.h
+linux/batman_adv.h
+linux/baycom.h
+linux/bcm933xx_hcs.h
+linux/bfs_fs.h
+linux/binfmts.h
+linux/blkpg.h
+linux/blktrace_api.h
+linux/blkzoned.h
+linux/bpf.h
+linux/bpf_common.h
+linux/bpqether.h
+linux/bsg.h
+linux/bt-bmc.h
+linux/btf.h
+linux/btrfs.h
+linux/btrfs_tree.h
+linux/byteorder/big_endian.h
+linux/byteorder/little_endian.h
+linux/cachefiles.h
+linux/caif/caif_socket.h
+linux/caif/if_caif.h
+linux/can.h
+linux/can/bcm.h
+linux/can/error.h
+linux/can/gw.h
+linux/can/isotp.h
+linux/can/j1939.h
+linux/can/netlink.h
+linux/can/raw.h
+linux/can/vxcan.h
+linux/capability.h
+linux/capi.h
+linux/cciss_defs.h
+linux/cciss_ioctl.h
+linux/ccs.h
+linux/cdrom.h
+linux/cfm_bridge.h
+linux/cgroupstats.h
+linux/chio.h
+linux/cifs/cifs_mount.h
+linux/cifs/cifs_netlink.h
+linux/close_range.h
+linux/cm4000_cs.h
+linux/cn_proc.h
+linux/coff.h
+linux/comedi.h
+linux/connector.h
+linux/const.h
+linux/coresight-stm.h
+linux/counter.h
+linux/cramfs_fs.h
+linux/cryptouser.h
+linux/cuda.h
+linux/cxl_mem.h
+linux/cycx_cfm.h
+linux/dcbnl.h
+linux/dccp.h
+linux/devlink.h
+linux/dlm.h
+linux/dlm_device.h
+linux/dlm_netlink.h
+linux/dlm_plock.h
+linux/dlmconstants.h
+linux/dm-ioctl.h
+linux/dm-log-userspace.h
+linux/dma-heap.h
+linux/dns_resolver.h
+linux/dqblk_xfs.h
+linux/dvb/audio.h
+linux/dvb/ca.h
+linux/dvb/frontend.h
+linux/dvb/net.h
+linux/dvb/osd.h
+linux/dvb/version.h
+linux/dw100.h
+linux/edd.h
+linux/efs_fs_sb.h
+linux/elf-em.h
+linux/elf-fdpic.h
+linux/elf.h
+linux/errno.h
+linux/erspan.h
+linux/eventpoll.h
+linux/f2fs.h
+linux/fadvise.h
+linux/falloc.h
+linux/fanotify.h
+linux/fb.h
+linux/fcntl.h
+linux/fd.h
+linux/fdreg.h
+linux/fib_rules.h
+linux/fiemap.h
+linux/filter.h
+linux/firewire-cdev.h
+linux/firewire-constants.h
+linux/fou.h
+linux/fpga-dfl.h
+linux/fs.h
+linux/fscrypt.h
+linux/fsi.h
+linux/fsl_hypervisor.h
+linux/fsl_mc.h
+linux/fsmap.h
+linux/fsverity.h
+linux/futex.h
+linux/gameport.h
+linux/gen_stats.h
+linux/genetlink.h
+linux/genwqe/genwqe_card.h
+linux/gfs2_ondisk.h
+linux/gpio.h
+linux/gtp.h
+linux/hash_info.h
+linux/hdlc.h
+linux/hdlcdrv.h
+linux/hdreg.h
+linux/hid.h
+linux/hiddev.h
+linux/hidraw.h
+linux/hsi/cs-protocol.h
+linux/hsi/hsi_char.h
+linux/hsr_netlink.h
+linux/hw_breakpoint.h
+linux/hyperv.h
+linux/i2c-dev.h
+linux/i2c.h
+linux/i2o-dev.h
+linux/i8k.h
+linux/icmpv6.h
+linux/if_addr.h
+linux/if_addrlabel.h
+linux/if_alg.h
+linux/if_arcnet.h
+linux/if_bridge.h
+linux/if_cablemodem.h
+linux/if_eql.h
+linux/if_ether.h
+linux/if_fc.h
+linux/if_fddi.h
+linux/if_hippi.h
+linux/if_infiniband.h
+linux/if_link.h
+linux/if_ltalk.h
+linux/if_macsec.h
+linux/if_packet.h
+linux/if_phonet.h
+linux/if_plip.h
+linux/if_ppp.h
+linux/if_pppol2tp.h
+linux/if_slip.h
+linux/if_team.h
+linux/if_tun.h
+linux/if_vlan.h
+linux/if_x25.h
+linux/if_xdp.h
+linux/ife.h
+linux/igmp.h
+linux/iio/buffer.h
+linux/iio/events.h
+linux/iio/types.h
+linux/ila.h
+linux/in.h
+linux/in6.h
+linux/in_route.h
+linux/inet_diag.h
+linux/inotify.h
+linux/input-event-codes.h
+linux/io_uring.h
+linux/ioam6.h
+linux/ioam6_genl.h
+linux/ioam6_iptunnel.h
+linux/ioctl.h
+linux/iommu.h
+linux/ioprio.h
+linux/ip.h
+linux/ip_vs.h
+linux/ipc.h
+linux/ipmi.h
+linux/ipmi_bmc.h
+linux/ipmi_msgdefs.h
+linux/ipsec.h
+linux/ipv6.h
+linux/ipv6_route.h
+linux/irqnr.h
+linux/isdn/capicmd.h
+linux/iso_fs.h
+linux/isst_if.h
+linux/ivtvfb.h
+linux/jffs2.h
+linux/kcm.h
+linux/kcmp.h
+linux/kcov.h
+linux/kd.h
+linux/kdev_t.h
+linux/kernel-page-flags.h
+linux/kernel.h
+linux/kernelcapi.h
+linux/keyboard.h
+linux/keyctl.h
+linux/kfd_sysfs.h
+linux/kvm_para.h
+linux/l2tp.h
+linux/landlock.h
+linux/libc-compat.h
+linux/limits.h
+linux/lirc.h
+linux/loadpin.h
+linux/loop.h
+linux/lp.h
+linux/lwtunnel.h
+linux/magic.h
+linux/major.h
+linux/map_to_14segment.h
+linux/map_to_7segment.h
+linux/max2175.h
+linux/media-bus-format.h
+linux/media.h
+linux/mei.h
+linux/membarrier.h
+linux/memfd.h
+linux/mempolicy.h
+linux/meye.h
+linux/minix_fs.h
+linux/misc/bcm_vk.h
+linux/mman.h
+linux/mmc/ioctl.h
+linux/mmtimer.h
+linux/module.h
+linux/mount.h
+linux/mpls.h
+linux/mpls_iptunnel.h
+linux/mqueue.h
+linux/mroute.h
+linux/mroute6.h
+linux/mrp_bridge.h
+linux/msdos_fs.h
+linux/msg.h
+linux/mtio.h
+linux/nbd-netlink.h
+linux/nbd.h
+linux/ncsi.h
+linux/ndctl.h
+linux/neighbour.h
+linux/net.h
+linux/net_dropmon.h
+linux/net_namespace.h
+linux/net_tstamp.h
+linux/netconf.h
+linux/netfilter.h
+linux/netfilter/ipset/ip_set.h
+linux/netfilter/ipset/ip_set_bitmap.h
+linux/netfilter/ipset/ip_set_hash.h
+linux/netfilter/ipset/ip_set_list.h
+linux/netfilter/nf_conntrack_common.h
+linux/netfilter/nf_conntrack_ftp.h
+linux/netfilter/nf_conntrack_sctp.h
+linux/netfilter/nf_conntrack_tcp.h
+linux/netfilter/nf_conntrack_tuple_common.h
+linux/netfilter/nf_log.h
+linux/netfilter/nf_nat.h
+linux/netfilter/nf_synproxy.h
+linux/netfilter/nf_tables.h
+linux/netfilter/nf_tables_compat.h
+linux/netfilter/nfnetlink.h
+linux/netfilter/nfnetlink_acct.h
+linux/netfilter/nfnetlink_compat.h
+linux/netfilter/nfnetlink_conntrack.h
+linux/netfilter/nfnetlink_cthelper.h
+linux/netfilter/nfnetlink_cttimeout.h
+linux/netfilter/nfnetlink_hook.h
+linux/netfilter/nfnetlink_log.h
+linux/netfilter/nfnetlink_osf.h
+linux/netfilter/nfnetlink_queue.h
+linux/netfilter/x_tables.h
+linux/netfilter/xt_AUDIT.h
+linux/netfilter/xt_CHECKSUM.h
+linux/netfilter/xt_CLASSIFY.h
+linux/netfilter/xt_CONNMARK.h
+linux/netfilter/xt_CONNSECMARK.h
+linux/netfilter/xt_CT.h
+linux/netfilter/xt_DSCP.h
+linux/netfilter/xt_HMARK.h
+linux/netfilter/xt_IDLETIMER.h
+linux/netfilter/xt_LED.h
+linux/netfilter/xt_LOG.h
+linux/netfilter/xt_MARK.h
+linux/netfilter/xt_NFQUEUE.h
+linux/netfilter/xt_SECMARK.h
+linux/netfilter/xt_SYNPROXY.h
+linux/netfilter/xt_TCPMSS.h
+linux/netfilter/xt_TCPOPTSTRIP.h
+linux/netfilter/xt_TEE.h
+linux/netfilter/xt_TPROXY.h
+linux/netfilter/xt_addrtype.h
+linux/netfilter/xt_bpf.h
+linux/netfilter/xt_cgroup.h
+linux/netfilter/xt_cluster.h
+linux/netfilter/xt_comment.h
+linux/netfilter/xt_connbytes.h
+linux/netfilter/xt_connlabel.h
+linux/netfilter/xt_connlimit.h
+linux/netfilter/xt_connmark.h
+linux/netfilter/xt_conntrack.h
+linux/netfilter/xt_cpu.h
+linux/netfilter/xt_dccp.h
+linux/netfilter/xt_devgroup.h
+linux/netfilter/xt_dscp.h
+linux/netfilter/xt_ecn.h
+linux/netfilter/xt_esp.h
+linux/netfilter/xt_helper.h
+linux/netfilter/xt_ipcomp.h
+linux/netfilter/xt_iprange.h
+linux/netfilter/xt_ipvs.h
+linux/netfilter/xt_l2tp.h
+linux/netfilter/xt_length.h
+linux/netfilter/xt_limit.h
+linux/netfilter/xt_mac.h
+linux/netfilter/xt_mark.h
+linux/netfilter/xt_multiport.h
+linux/netfilter/xt_nfacct.h
+linux/netfilter/xt_osf.h
+linux/netfilter/xt_owner.h
+linux/netfilter/xt_pkttype.h
+linux/netfilter/xt_policy.h
+linux/netfilter/xt_quota.h
+linux/netfilter/xt_realm.h
+linux/netfilter/xt_recent.h
+linux/netfilter/xt_rpfilter.h
+linux/netfilter/xt_sctp.h
+linux/netfilter/xt_set.h
+linux/netfilter/xt_socket.h
+linux/netfilter/xt_state.h
+linux/netfilter/xt_statistic.h
+linux/netfilter/xt_string.h
+linux/netfilter/xt_tcpmss.h
+linux/netfilter/xt_tcpudp.h
+linux/netfilter/xt_time.h
+linux/netfilter/xt_u32.h
+linux/netfilter_arp.h
+linux/netfilter_bridge/ebt_among.h
+linux/netfilter_bridge/ebt_arp.h
+linux/netfilter_bridge/ebt_arpreply.h
+linux/netfilter_bridge/ebt_ip.h
+linux/netfilter_bridge/ebt_ip6.h
+linux/netfilter_bridge/ebt_limit.h
+linux/netfilter_bridge/ebt_log.h
+linux/netfilter_bridge/ebt_mark_m.h
+linux/netfilter_bridge/ebt_mark_t.h
+linux/netfilter_bridge/ebt_nat.h
+linux/netfilter_bridge/ebt_nflog.h
+linux/netfilter_bridge/ebt_pkttype.h
+linux/netfilter_bridge/ebt_redirect.h
+linux/netfilter_bridge/ebt_stp.h
+linux/netfilter_bridge/ebt_vlan.h
+linux/netfilter_ipv4.h
+linux/netfilter_ipv4/ipt_CLUSTERIP.h
+linux/netfilter_ipv4/ipt_ECN.h
+linux/netfilter_ipv4/ipt_LOG.h
+linux/netfilter_ipv4/ipt_REJECT.h
+linux/netfilter_ipv4/ipt_TTL.h
+linux/netfilter_ipv4/ipt_ah.h
+linux/netfilter_ipv4/ipt_ecn.h
+linux/netfilter_ipv4/ipt_ttl.h
+linux/netfilter_ipv6.h
+linux/netfilter_ipv6/ip6t_HL.h
+linux/netfilter_ipv6/ip6t_LOG.h
+linux/netfilter_ipv6/ip6t_NPT.h
+linux/netfilter_ipv6/ip6t_REJECT.h
+linux/netfilter_ipv6/ip6t_ah.h
+linux/netfilter_ipv6/ip6t_frag.h
+linux/netfilter_ipv6/ip6t_hl.h
+linux/netfilter_ipv6/ip6t_ipv6header.h
+linux/netfilter_ipv6/ip6t_mh.h
+linux/netfilter_ipv6/ip6t_opts.h
+linux/netfilter_ipv6/ip6t_rt.h
+linux/netfilter_ipv6/ip6t_srh.h
+linux/netlink.h
+linux/netlink_diag.h
+linux/netrom.h
+linux/nexthop.h
+linux/nfc.h
+linux/nfs.h
+linux/nfs2.h
+linux/nfs3.h
+linux/nfs4.h
+linux/nfs4_mount.h
+linux/nfs_fs.h
+linux/nfs_idmap.h
+linux/nfs_mount.h
+linux/nfsacl.h
+linux/nfsd/cld.h
+linux/nfsd/debug.h
+linux/nfsd/export.h
+linux/nfsd/stats.h
+linux/nilfs2_api.h
+linux/nilfs2_ondisk.h
+linux/nitro_enclaves.h
+linux/nl80211-vnd-intel.h
+linux/nl80211.h
+linux/nsfs.h
+linux/nubus.h
+linux/nvme_ioctl.h
+linux/nvram.h
+linux/oom.h
+linux/openat2.h
+linux/openvswitch.h
+linux/packet_diag.h
+linux/param.h
+linux/parport.h
+linux/pci.h
+linux/pci_regs.h
+linux/pcitest.h
+linux/perf_event.h
+linux/personality.h
+linux/pfkeyv2.h
+linux/pfrut.h
+linux/pg.h
+linux/phantom.h
+linux/pidfd.h
+linux/pkt_cls.h
+linux/pkt_sched.h
+linux/pktcdvd.h
+linux/pmu.h
+linux/poll.h
+linux/posix_acl.h
+linux/posix_types.h
+linux/ppdev.h
+linux/ppp-comp.h
+linux/ppp-ioctl.h
+linux/ppp_defs.h
+linux/pps.h
+linux/pr.h
+linux/prctl.h
+linux/psample.h
+linux/psci.h
+linux/psp-sev.h
+linux/ptp_clock.h
+linux/qemu_fw_cfg.h
+linux/qnx4_fs.h
+linux/qnxtypes.h
+linux/qrtr.h
+linux/radeonfb.h
+linux/raid/md_p.h
+linux/raid/md_u.h
+linux/random.h
+linux/rds.h
+linux/reboot.h
+linux/reiserfs_fs.h
+linux/reiserfs_xattr.h
+linux/remoteproc_cdev.h
+linux/resource.h
+linux/rfkill.h
+linux/rio_cm_cdev.h
+linux/rio_mport_cdev.h
+linux/rkisp1-config.h
+linux/romfs_fs.h
+linux/rose.h
+linux/rpl.h
+linux/rpl_iptunnel.h
+linux/rpmsg.h
+linux/rpmsg_types.h
+linux/rseq.h
+linux/rtc.h
+linux/rtnetlink.h
+linux/rxrpc.h
+linux/scc.h
+linux/sched.h
+linux/sched/types.h
+linux/scif_ioctl.h
+linux/screen_info.h
+linux/seccomp.h
+linux/securebits.h
+linux/sed-opal.h
+linux/seg6.h
+linux/seg6_genl.h
+linux/seg6_hmac.h
+linux/seg6_iptunnel.h
+linux/seg6_local.h
+linux/selinux_netlink.h
+linux/sem.h
+linux/serial_reg.h
+linux/serio.h
+linux/sev-guest.h
+linux/shm.h
+linux/signalfd.h
+linux/smc.h
+linux/smc_diag.h
+linux/smiapp.h
+linux/snmp.h
+linux/socket.h
+linux/sockios.h
+linux/sonet.h
+linux/sonypi.h
+linux/sound.h
+linux/soundcard.h
+linux/spi/spi.h
+linux/spi/spidev.h
+linux/stat.h
+linux/stddef.h
+linux/stm.h
+linux/string.h
+linux/sunrpc/debug.h
+linux/surface_aggregator/cdev.h
+linux/surface_aggregator/dtx.h
+linux/suspend_ioctls.h
+linux/swab.h
+linux/switchtec_ioctl.h
+linux/sync_file.h
+linux/sysinfo.h
+linux/target_core_user.h
+linux/taskstats.h
+linux/tc_act/tc_bpf.h
+linux/tc_act/tc_csum.h
+linux/tc_act/tc_ct.h
+linux/tc_act/tc_defact.h
+linux/tc_act/tc_gact.h
+linux/tc_act/tc_gate.h
+linux/tc_act/tc_ife.h
+linux/tc_act/tc_ipt.h
+linux/tc_act/tc_mirred.h
+linux/tc_act/tc_mpls.h
+linux/tc_act/tc_nat.h
+linux/tc_act/tc_pedit.h
+linux/tc_act/tc_sample.h
+linux/tc_act/tc_skbedit.h
+linux/tc_act/tc_skbmod.h
+linux/tc_act/tc_tunnel_key.h
+linux/tc_act/tc_vlan.h
+linux/tc_ematch/tc_em_cmp.h
+linux/tc_ematch/tc_em_ipt.h
+linux/tc_ematch/tc_em_meta.h
+linux/tc_ematch/tc_em_nbyte.h
+linux/tc_ematch/tc_em_text.h
+linux/tcp.h
+linux/tcp_metrics.h
+linux/tee.h
+linux/termios.h
+linux/thermal.h
+linux/time.h
+linux/time_types.h
+linux/timerfd.h
+linux/times.h
+linux/timex.h
+linux/tiocl.h
+linux/tipc.h
+linux/tipc_config.h
+linux/tipc_netlink.h
+linux/tls.h
+linux/toshiba.h
+linux/tty.h
+linux/tty_flags.h
+linux/types.h
+linux/ublk_cmd.h
+linux/udf_fs_i.h
+linux/udmabuf.h
+linux/udp.h
+linux/uio.h
+linux/uleds.h
+linux/ultrasound.h
+linux/um_timetravel.h
+linux/un.h
+linux/unistd.h
+linux/unix_diag.h
+linux/usb/cdc-wdm.h
+linux/usb/ch11.h
+linux/usb/ch9.h
+linux/usb/charger.h
+linux/usb/functionfs.h
+linux/usb/g_printer.h
+linux/usb/g_uvc.h
+linux/usb/gadgetfs.h
+linux/usb/midi.h
+linux/usb/raw_gadget.h
+linux/usb/tmc.h
+linux/usb/video.h
+linux/usbdevice_fs.h
+linux/usbip.h
+linux/userfaultfd.h
+linux/userio.h
+linux/utime.h
+linux/utsname.h
+linux/uuid.h
+linux/uvcvideo.h
+linux/v4l2-common.h
+linux/v4l2-controls.h
+linux/v4l2-dv-timings.h
+linux/vbox_err.h
+linux/vbox_vmmdev_types.h
+linux/vboxguest.h
+linux/vdpa.h
+linux/vduse.h
+linux/veth.h
+linux/vfio.h
+linux/vfio_ccw.h
+linux/vfio_zdev.h
+linux/virtio_9p.h
+linux/virtio_balloon.h
+linux/virtio_blk.h
+linux/virtio_bt.h
+linux/virtio_config.h
+linux/virtio_console.h
+linux/virtio_crypto.h
+linux/virtio_fs.h
+linux/virtio_gpio.h
+linux/virtio_gpu.h
+linux/virtio_i2c.h
+linux/virtio_ids.h
+linux/virtio_input.h
+linux/virtio_iommu.h
+linux/virtio_mem.h
+linux/virtio_mmio.h
+linux/virtio_net.h
+linux/virtio_pci.h
+linux/virtio_pcidev.h
+linux/virtio_pmem.h
+linux/virtio_rng.h
+linux/virtio_scmi.h
+linux/virtio_scsi.h
+linux/virtio_snd.h
+linux/virtio_types.h
+linux/virtio_vsock.h
+linux/vm_sockets_diag.h
+linux/vmcore.h
+linux/vsockmon.h
+linux/vt.h
+linux/vtpm_proxy.h
+linux/wait.h
+linux/watch_queue.h
+linux/watchdog.h
+linux/wireguard.h
+linux/wmi.h
+linux/wwan.h
+linux/x25.h
+linux/xattr.h
+linux/xdp_diag.h
+linux/xfrm.h
+linux/xilinx-v4l2-controls.h
+linux/zorro.h
+linux/zorro_ids.h
-- 
2.34.1



* [RFC bpf-next 12/12] selftests/bpf: script for infer_header_guards.pl testing
  2022-10-25 22:27 [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h Eduard Zingerman
                   ` (10 preceding siblings ...)
  2022-10-25 22:28 ` [RFC bpf-next 11/12] selftests/bpf: Known good uapi headers for test_uapi_headers.py Eduard Zingerman
@ 2022-10-25 22:28 ` Eduard Zingerman
  2022-10-25 23:46 ` [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h Alexei Starovoitov
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 46+ messages in thread
From: Eduard Zingerman @ 2022-10-25 22:28 UTC (permalink / raw)
  To: bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, arnaldo.melo, Eduard Zingerman

This script verifies that patterns for header guard inference
specified in scripts/infer_header_guards.pl cover all uapi headers.
To achieve this the infer_header_guards.pl is invoked the same way it
is invoked from link-vmlinux.sh but with --report-failures flag.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 .../bpf/test_uapi_header_guards_infer.sh      | 33 +++++++++++++++++++
 1 file changed, 33 insertions(+)
 create mode 100755 tools/testing/selftests/bpf/test_uapi_header_guards_infer.sh

diff --git a/tools/testing/selftests/bpf/test_uapi_header_guards_infer.sh b/tools/testing/selftests/bpf/test_uapi_header_guards_infer.sh
new file mode 100755
index 000000000000..bd332db100f3
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_uapi_header_guards_infer.sh
@@ -0,0 +1,33 @@
+#!/bin/bash
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+
+# This script verifies that patterns for header guard inference
+# specified in scripts/infer_header_guards.pl cover all uapi headers.
+# To achieve this the infer_header_guards.pl is invoked the same way
+# it is invoked from link-vmlinux.sh but with --report-failures flag.
+
+kernel_dir=$(dirname $0)/../../../../
+
+# The SRCARCH is defined in tools/scripts/Makefile.arch, thus use a
+# temporary makefile to get access to this variable.
+fake_makefile=$(cat <<EOF
+include tools/scripts/Makefile.arch
+default:
+	scripts/infer_header_guards.pl --report-failures \
+		include/uapi \
+		include/generated/uapi \
+		arch/\$(SRCARCH)/include/uapi \
+		arch/\$(SRCARCH)/include/generated/uapi 1>/dev/null
+EOF
+)
+
+# The infer_header_guards.pl script prints inferred guards to stdout,
+# redirecting stdout to /dev/null to see only error messages.
+echo "$fake_makefile" | make -C $kernel_dir -f - 1>/dev/null
+if [ "$?" == "0" ]; then
+	echo "all good"
+	exit 0
+fi
+
+# Failures are already reported by infer_header_guards.pl
+exit 1
-- 
2.34.1



* Re: [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h
  2022-10-25 22:27 [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h Eduard Zingerman
                   ` (11 preceding siblings ...)
  2022-10-25 22:28 ` [RFC bpf-next 12/12] selftests/bpf: script for infer_header_guards.pl testing Eduard Zingerman
@ 2022-10-25 23:46 ` Alexei Starovoitov
  2022-10-26 22:46   ` Eduard Zingerman
  2022-10-26 11:10 ` Alan Maguire
  2022-10-27 23:14 ` Andrii Nakryiko
  14 siblings, 1 reply; 46+ messages in thread
From: Alexei Starovoitov @ 2022-10-25 23:46 UTC (permalink / raw)
  To: Eduard Zingerman; +Cc: bpf, ast, andrii, daniel, kernel-team, yhs, arnaldo.melo

On Wed, Oct 26, 2022 at 01:27:49AM +0300, Eduard Zingerman wrote:
> 
> Include the following system header:
> - /usr/include/sys/socket.h (all via linux/if.h)
> 
> The sys/socket.h conflicts with vmlinux.h in:
> - types: struct iovec, struct sockaddr, struct msghdr, ...
> - constants: SOCK_STREAM, SOCK_DGRAM, ...
> 
> However, only two types are actually used:
> - struct sockaddr
> - struct sockaddr_storage (used only in linux/mptcp.h)
> 
> In 'vmlinux.h' this type originates from 'kernel/include/socket.h'
> (non UAPI header), thus does not have a header guard.
> 
> The only workaround that I see is to:
> - define a stub sys/socket.h as follows:
> 
>     #ifndef __BPF_SOCKADDR__
>     #define __BPF_SOCKADDR__
>     
>     /* For __kernel_sa_family_t */
>     #include <linux/socket.h>
>     
>     struct sockaddr {
>         __kernel_sa_family_t sa_family;
>         char sa_data[14];
>     };
>     
>     #endif
> 
> - hardcode generation of __BPF_SOCKADDR__ bracket for
>   'struct sockaddr' in vmlinux.h.

we don't need to hack sys/socket.h and can hardcode
#ifdef _SYS_SOCKET_H as header guard for sockaddr instead, right?
bits/socket.h has this:
#ifndef _SYS_SOCKET_H
# error "Never include <bits/socket.h> directly; use <sys/socket.h> instead."

So that ifdef is kinda stable.

> Another possibility is to move the definition of 'struct sockaddr'
> from 'kernel/include/socket.h' to 'kernel/include/uapi/linux/socket.h',
> but I expect that this won't fly with the mainline as it might break
> the programs that include both 'linux/socket.h' and 'sys/socket.h'.
> 
> Conflict with vmlinux.h
> ----
> 
> Uapi header:
> - linux/signal.h
> 
> Conflict with vmlinux.h in definition of 'struct sigaction'.
> Defined in:
> - vmlinux.h: kernel/include/linux/signal_types.h
> - uapi:      kernel/arch/x86/include/asm/signal.h
> 
> Uapi headers:
> - linux/tipc_sockets_diag.h
> - linux/sock_diag.h
> 
> Conflict with vmlinux.h in definition of 'SOCK_DESTROY'.

Interesting one!
I think we can hard code '#undef SOCK_DESTROY' in vmlinux.h

The goal is not to be able to mix arbitrary uapi header with
vmlinux.h, but only those that could be useful out of bpf progs.
Currently it's tcp.h and few other network related headers
because they have #define-s in them that are useful inside bpf progs.
As long as the solution covers this small subset we're fine.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h
  2022-10-25 22:27 [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h Eduard Zingerman
                   ` (12 preceding siblings ...)
  2022-10-25 23:46 ` [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h Alexei Starovoitov
@ 2022-10-26 11:10 ` Alan Maguire
  2022-10-26 23:54   ` Eduard Zingerman
  2022-10-27 23:14 ` Andrii Nakryiko
  14 siblings, 1 reply; 46+ messages in thread
From: Alan Maguire @ 2022-10-26 11:10 UTC (permalink / raw)
  To: Eduard Zingerman, bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, arnaldo.melo

On 25/10/2022 23:27, Eduard Zingerman wrote:
> Hi BPF community,
> 
> AFAIK there is a long standing feature request to use kernel headers
> alongside `vmlinux.h` generated by `bpftool`. For example significant
> effort was put to add an attribute `bpf_dominating_decl` (see [1]) to
> clang, unfortunately this effort was stuck due to concerns regarding C
> language semantics.
> 

Thanks for the details! It's a tricky problem to solve.

Before diving into this, can I ask if there's another way round this;
is there no way we could teach vmlinux.h generation which types to
skip via some kind of bootstrapping process?

For a bpf object foo.bpf.c that wants to include linux/tcp.h and 
vmlinux.h, something like this seems possible;

1. run the preprocessor (gcc -E) over the BPF program to generate 
a bootstrap header foo.bpf.exclude_types.h consisting of all the types mentioned
in it and associated includes, but not those in vmlinux.h. It would need -D__VMLINUX_H__
to avoid including vmlinux.h definitions and -D__BPF_EXCLUDE_TYPE_BOOTSTRAP__ to
skip the programs themselves, which would need a guard around them I think:

#include <stddef.h>
#include <stdbool.h>
#include <vmlinux.h>
#include <linux/tcp.h>

#ifndef __BPF_EXCLUDE_TYPE_BOOTSTRAP__
//rest of program here
#endif

So as a result of this, we now have a single header file that contains all the types
that non-vmlinux.h include files define.

2. now to generate vmlinux.h, pass foo.bpf.exclude_types.h into "bpftool btf dump" as an
exclude header:

bpftool btf dump --exclude /tmp/foo.bpf.exclude_types.h file /sys/kernel/btf/vmlinux format c > vmlinux.h

bpftool would have to parse the exclude header for actual type definitions, spotting struct,
enum and typedef definitions. This is likely made easier by running the preprocessor
at least since formatting is probably quite uniform. vmlinux.h could simply emit forward declarations
for types described both in vmlinux BTF and in the exclude file.
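
To make the parsing step a bit more concrete, here is a minimal Python
sketch of spotting type definitions in preprocessed output. It is
regex-based rather than a real C parser, the function names are made up
for illustration, and it deliberately ignores anonymous
`typedef struct { ... } name;` forms and function-pointer typedefs:

```python
import re

# 'struct foo {', 'union foo {' or 'enum foo {' opening a definition.
TAGGED_DEF = re.compile(r"\b(struct|union|enum)\s+([A-Za-z_]\w*)\s*\{")
# Simple typedefs without braces or parentheses, e.g. 'typedef unsigned int __u32;'.
SIMPLE_TYPEDEF = re.compile(r"\btypedef\b[^;{}()]*?\b([A-Za-z_]\w*)\s*;")


def excluded_types(preprocessed):
    """Return (kind, name) pairs for types defined in preprocessed C text."""
    found = [(m.group(1), m.group(2)) for m in TAGGED_DEF.finditer(preprocessed)]
    found += [("typedef", m.group(1)) for m in SIMPLE_TYPEDEF.finditer(preprocessed)]
    return found


def forward_declarations(types):
    """Emit forward declarations for struct/union; other kinds are skipped."""
    return [f"{kind} {name};" for kind, name in types if kind in ("struct", "union")]


sample = """
typedef unsigned int __u32;
typedef __u32 __be32;
struct tcphdr { __be32 seq; };
union tcp_word_hdr {
    struct tcphdr hdr;
    __be32 words[5];
};
"""
print(forward_declarations(excluded_types(sample)))
```

Member declarations such as `struct tcphdr hdr;` are not counted as
definitions because no opening brace follows the tag name.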

So the build process would be

- start with empty vmlinux.h
- bootstrap a header consisting of the set of types to exclude via c preprocessor
- generate new vmlinux.h based on above
- build bpf program

Build processes for BPF objects already have bootstrapping elements like
this for generating vmlinux.h and skeletons, so while it's not perfect it might
be a simpler approach. There may be problems with this I'm not seeing though?

Thanks!

Alan

> After some discussion with Alexei and Yonghong I'd like to request
> your comments regarding a somewhat brittle and partial solution to
> this issue that relies on adding `#ifndef FOO_H ... #endif` guards in
> the generated `vmlinux.h`.
> 
> The basic idea
> ---
> 
> The goal of the patch set is to allow usage of header files from
> `include/uapi` alongside `vmlinux.h` as follows:
> 
>   #include <uapi/linux/tcp.h>
>   #include "vmlinux.h"
> 
> This goal is achieved by adding `#ifndef ... #endif` guards in
> `vmlinux.h` around definitions that originate from the `include/uapi`
> headers. The guards emitted match the guards used in the original
> headers. E.g. as follows:
> 
> include/uapi/linux/tcp.h:
> 
>   #ifndef _UAPI_LINUX_TCP_H
>   #define _UAPI_LINUX_TCP_H
>   ...
>   union tcp_word_hdr {
> 	struct tcphdr hdr;
> 	__be32        words[5];
>   };
>   ...
>   #endif /* _UAPI_LINUX_TCP_H */
> 
> vmlinux.h:
> 
>   ...
>   #ifndef _UAPI_LINUX_TCP_H
> 
>   union tcp_word_hdr {
> 	struct tcphdr hdr;
> 	__be32 words[5];
>   };
> 
>   #endif /* _UAPI_LINUX_TCP_H */
>   ...
> 
> To get to this state the following steps are necessary:
> - "header guard" name should be identified for each header file;
> - the correspondence between data type and its header guard has to be
>   encoded in BTF;
> - `bpftool` should be adjusted to emit `#ifndef FOO_H ... #endif`
>   brackets.
> 
> It is not possible to identify header guard names for all uapi headers
> based only on the file name. However a simple script could be devised to
> identify the guards based on the file name and its content. Thus it
> is possible to obtain the list of header names with corresponding
> header guards.
> 
> The correspondence between type and its declaration file (header) is
> available in DWARF as `DW_AT_decl_file` attribute. The
> `DW_AT_decl_file` can be matched with the list of header guards
> described above to obtain the header guard name for a specific type.
> 
> The `pahole` generates BTF using DWARF. It is possible to modify
> `pahole` to accept the header guards list as an additional parameter
> and to encode the header guard names in BTF.
> 
> Implementation details
> ---
> 
> Present patch-set implements these ideas as follows:
> - A parameter `--header_guards_db` is added to `pahole`. If present it
>   points to a file with a list of `<header> <guard>` records.
> - `pahole` uses DWARF `DW_AT_decl_file` value to lookup the header
>   guard for each type emitted to BTF. If header guard is present it is
>   encoded alongside the type.
> - Header guards are encoded in BTF as `BTF_DECL_TAG` records with a
>   special prefix. The prefix "header_guard:" is added to a value of
>   such tags. (Here `BTF_DECL_TAG` is used to avoid BTF binary format
>   changes).
> - A special script `infer_header_guards.pl` is added as a part of
>   kbuild, it can infer header guard names for each UAPI header basing
>   on the header content.
> - This script is invoked from `link-vmlinux.sh` prior to BTF
>   generation during kernel build. The output of the script is saved to
>   a file, the file is passed to `pahole` as `--header_guards_db`
>   parameter.
> - `libbpf` is modified to aggregate `BTF_DECL_TAG` records for each
>   type and to emit `#ifndef FOO_H ... #endif` brackets when
>   "header_guard:" tag is present for a type.
> 
> Details for each patch in a set:
> - libbpf: Deduplicate unambigous standalone forward declarations
> - selftests/bpf: Tests for standalone forward BTF declarations deduplication
> 
>   There is a small number (63 for defconfig) of forward declarations
>   that are not de-duplicated with the main type declaration under
>   certain conditions. This hinders the header guard brackets
>   generation. This patch addresses this de-duplication issue.
> 
> - libbpf: Support for BTF_DECL_TAG dump in C format
> - selftests/bpf: Tests for BTF_DECL_TAG dump in C format
> 
>   Currently libbpf does not process BTF_DECL_TAG when btf is dumped in
>   C format. This patch adds a hash table matching btf type ids with a
>   list of decl tags to the struct btf_dump.
>   The `btf_dump_emit_decl_tags` is not necessary for the overall
>   patch-set to function but simplifies testing a bit.
> 
> - libbpf: Header guards for selected data structures in vmlinux.h
> - selftests/bpf: Tests for header guards printing in BTF dump
> 
>   Adds option `emit_header_guards` to `struct btf_dump_opts`.
>   When enabled the `btf_dump__dump_type` prints `#ifndef ... #endif`
>   brackets around types for which header guard information is present
>   in BTF.
> 
> - bpftool: Enable header guards generation
> 
>   Unconditionally enables `emit_header_guards` for BTF dump in C format.
> 
> - kbuild: Script to infer header guard values for uapi headers
> - kbuild: Header guards for types from include/uapi/*.h in kernel BTF
> 
>   Adds `scripts/infer_header_guards.pl` and integrates it with
>   `link-vmlinux.sh`.
> 
> - selftests/bpf: Script to verify uapi headers usage with vmlinux.h
> 
>   Adds a script `test_uapi_headers.py` that tests header guards with
>   vmlinux.h by compiling a simple C snippet. The snippet looks as
>   follows:
>   
>     #include <some_uapi_header.h>
>     #include "vmlinux.h"
>   
>     __attribute__((section("tc"), used))
>     int syncookie_tc(struct __sk_buff *skb) { return 0; }
>   
>   The list of headers to test comes from
>   `tools/testing/selftests/bpf/good_uapi_headers.txt`.
> 
> - selftests/bpf: Known good uapi headers for test_uapi_headers.py
> 
>   The list of uapi headers that could be included alongside vmlinux.h.
>   The headers are picked from the following locations:
>   - <headers-export-dir>/linux/*.h
>   - <headers-export-dir>/linux/**/*.h
>   This choice of locations is somewhat arbitrary.
> 
> - selftests/bpf: script for infer_header_guards.pl testing
> 
>   The test case for `scripts/infer_header_guards.pl`, verifies that
>   header guards can be inferred for all uapi headers.
> 
> - There is also a patch for dwarves that adds `--header_guards_db`
>   option (see [2]).
> 
> The `test_uapi_headers.py` is important as it demonstrates the
> necessary compiler flags:
> 
> clang ...                                  \
>       -D__x86_64__                         \
>       -Xclang -fwchar-type=short           \
>       -Xclang -fno-signed-wchar            \
>       -I{exported_kernel_headers}/include/ \
>       ...
> 
> - `-fwchar-type=short` and `-fno-signed-wchar` had to be added because
>   BPF target uses `int` for `wchar_t` by default and this differs from
>   `vmlinux.h` definition of the type (at least for x86_64).
> - `__x86_64__` had to be added for uapi headers that include
>   `stddef.h` (the one that is supplied by CLANG itself), in order to
>   define correct sizes for `size_t` and `ptrdiff_t`.
> - The `{exported_kernel_headers}` stands for exported kernel headers
>   directory (the headers obtained by `make headers_install` or via
>   distribution package).
> 
> When it works
> ---
> 
> The mechanics described above works for a significant number of UAPI
> headers. For example, for the test case above I chose the headers from
> the following locations:
> - linux/*.h
> - linux/**/*.h
> There are 759 such headers and for 677 of them the test described
> above passes.
> 
> I excluded the headers from the following sub-directories as
> potentially not interesting:
> 
>   asm          rdma   video xen
>   asm-generic  misc   scsi
>   drm          mtd    sound
> 
> Thus saving some time for both discussion and CI but the choice is
> somewhat arbitrary. If I run `test_uapi_headers.py --test '*'` (all
> headers) test passes for 834 out of 972 headers.
> 
> When it breaks
> ---
> 
> There are several scenarios in which this mechanics breaks.
> Specifically I found the following cases:
> - When uapi header includes some system header that conflicts with
>   vmlinux.h.
> - When uapi header itself conflicts with vmlinux.h.
> 
> Below are examples for both cases.
> 
> Conflict with system headers
> ----
> 
> The following uapi headers:
> - linux/atmbr2684.h
> - linux/bpfilter.h
> - linux/gsmmux.h
> - linux/icmp.h
> - linux/if.h
> - linux/if_arp.h
> - linux/if_bonding.h
> - linux/if_pppox.h
> - linux/if_tunnel.h
> - linux/ip6_tunnel.h
> - linux/llc.h
> - linux/mctp.h
> - linux/mptcp.h
> - linux/netdevice.h
> - linux/netfilter/xt_RATEEST.h
> - linux/netfilter/xt_hashlimit.h
> - linux/netfilter/xt_physdev.h
> - linux/netfilter/xt_rateest.h
> - linux/netfilter_arp/arp_tables.h
> - linux/netfilter_arp/arpt_mangle.h
> - linux/netfilter_bridge.h
> - linux/netfilter_bridge/ebtables.h
> - linux/netfilter_ipv4/ip_tables.h
> - linux/netfilter_ipv6/ip6_tables.h
> - linux/route.h
> - linux/wireless.h
> 
> Include the following system header:
> - /usr/include/sys/socket.h (all via linux/if.h)
> 
> The sys/socket.h conflicts with vmlinux.h in:
> - types: struct iovec, struct sockaddr, struct msghdr, ...
> - constants: SOCK_STREAM, SOCK_DGRAM, ...
> 
> However, only two types are actually used:
> - struct sockaddr
> - struct sockaddr_storage (used only in linux/mptcp.h)
> 
> In 'vmlinux.h' this type originates from 'kernel/include/socket.h'
> (non UAPI header), thus does not have a header guard.
> 
> The only workaround that I see is to:
> - define a stub sys/socket.h as follows:
> 
>     #ifndef __BPF_SOCKADDR__
>     #define __BPF_SOCKADDR__
>     
>     /* For __kernel_sa_family_t */
>     #include <linux/socket.h>
>     
>     struct sockaddr {
>         __kernel_sa_family_t sa_family;
>         char sa_data[14];
>     };
>     
>     #endif
> 
> - hardcode generation of __BPF_SOCKADDR__ bracket for
>   'struct sockaddr' in vmlinux.h.
> 
> Another possibility is to move the definition of 'struct sockaddr'
> from 'kernel/include/socket.h' to 'kernel/include/uapi/linux/socket.h',
> but I expect that this won't fly with the mainline as it might break
> the programs that include both 'linux/socket.h' and 'sys/socket.h'.
> 
> Conflict with vmlinux.h
> ----
> 
> Uapi header:
> - linux/signal.h
> 
> Conflict with vmlinux.h in definition of 'struct sigaction'.
> Defined in:
> - vmlinux.h: kernel/include/linux/signal_types.h
> - uapi:      kernel/arch/x86/include/asm/signal.h
> 
> Uapi headers:
> - linux/tipc_sockets_diag.h
> - linux/sock_diag.h
> 
> Conflict with vmlinux.h in definition of 'SOCK_DESTROY'.
> Defined in:
> - vmlinux.h: kernel/include/net/sock.h
> - uapi:      kernel/include/uapi/linux/sock_diag.h
> Constants seem to be unrelated.
> 
> And so on... I have details for many other headers but omit those for
> brevity.
> 
> In conclusion
> ---
> 
> Apart from the general feasibility I have a few questions:
> - What UAPI headers are the candidates for such use? If there are some
>   interesting headers currently not working with this patch-set some
>   hacks have to be added (e.g. like with `linux/if.h`).
> - Is it ok to encode header guards as special `BTF_DECL_TAG` or should
>   I change the BTF format a bit to save some bytes.
> 
> Thanks,
> Eduard
> 
> 
> [1] https://reviews.llvm.org/D111307
>     [clang] __attribute__ bpf_dominating_decl
> [2] https://lore.kernel.org/dwarves/20221025220729.2293891-1-eddyz87@gmail.com/T/
>     [RFC dwarves] pahole: Save header guard names when
>                   --header_guards_db is passed
> 
> Eduard Zingerman (12):
>   libbpf: Deduplicate unambigous standalone forward declarations
>   selftests/bpf: Tests for standalone forward BTF declarations
>     deduplication
>   libbpf: Support for BTF_DECL_TAG dump in C format
>   selftests/bpf: Tests for BTF_DECL_TAG dump in C format
>   libbpf: Header guards for selected data structures in vmlinux.h
>   selftests/bpf: Tests for header guards printing in BTF dump
>   bpftool: Enable header guards generation
>   kbuild: Script to infer header guard values for uapi headers
>   kbuild: Header guards for types from include/uapi/*.h in kernel BTF
>   selftests/bpf: Script to verify uapi headers usage with vmlinux.h
>   selftests/bpf: Known good uapi headers for test_uapi_headers.py
>   selftests/bpf: script for infer_header_guards.pl testing
> 
>  scripts/infer_header_guards.pl                | 191 +++++
>  scripts/link-vmlinux.sh                       |  13 +-
>  tools/bpf/bpftool/btf.c                       |   4 +-
>  tools/lib/bpf/btf.c                           | 178 ++++-
>  tools/lib/bpf/btf.h                           |   7 +-
>  tools/lib/bpf/btf_dump.c                      | 232 +++++-
>  .../selftests/bpf/good_uapi_headers.txt       | 677 ++++++++++++++++++
>  tools/testing/selftests/bpf/prog_tests/btf.c  | 152 ++++
>  .../selftests/bpf/prog_tests/btf_dump.c       |  11 +-
>  .../bpf/progs/btf_dump_test_case_decl_tag.c   |  39 +
>  .../progs/btf_dump_test_case_header_guards.c  |  94 +++
>  .../bpf/test_uapi_header_guards_infer.sh      |  33 +
>  .../selftests/bpf/test_uapi_headers.py        | 197 +++++
>  13 files changed, 1816 insertions(+), 12 deletions(-)
>  create mode 100755 scripts/infer_header_guards.pl
>  create mode 100644 tools/testing/selftests/bpf/good_uapi_headers.txt
>  create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_decl_tag.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_header_guards.c
>  create mode 100755 tools/testing/selftests/bpf/test_uapi_header_guards_infer.sh
>  create mode 100755 tools/testing/selftests/bpf/test_uapi_headers.py
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h
  2022-10-25 23:46 ` [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h Alexei Starovoitov
@ 2022-10-26 22:46   ` Eduard Zingerman
  0 siblings, 0 replies; 46+ messages in thread
From: Eduard Zingerman @ 2022-10-26 22:46 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, ast, andrii, daniel, kernel-team, yhs, arnaldo.melo

On Tue, 2022-10-25 at 16:46 -0700, Alexei Starovoitov wrote:
> On Wed, Oct 26, 2022 at 01:27:49AM +0300, Eduard Zingerman wrote:
> > 
> > Include the following system header:
> > - /usr/include/sys/socket.h (all via linux/if.h)
> > 
> > The sys/socket.h conflicts with vmlinux.h in:
> > - types: struct iovec, struct sockaddr, struct msghdr, ...
> > - constants: SOCK_STREAM, SOCK_DGRAM, ...
> > 
> > However, only two types are actually used:
> > - struct sockaddr
> > - struct sockaddr_storage (used only in linux/mptcp.h)
> > 
> > In 'vmlinux.h' this type originates from 'kernel/include/socket.h'
> > (non UAPI header), thus does not have a header guard.
> > 
> > The only workaround that I see is to:
> > - define a stub sys/socket.h as follows:
> > 
> >     #ifndef __BPF_SOCKADDR__
> >     #define __BPF_SOCKADDR__
> >     
> >     /* For __kernel_sa_family_t */
> >     #include <linux/socket.h>
> >     
> >     struct sockaddr {
> >         __kernel_sa_family_t sa_family;
> >         char sa_data[14];
> >     };
> >     
> >     #endif
> > 
> > - hardcode generation of __BPF_SOCKADDR__ bracket for
> >   'struct sockaddr' in vmlinux.h.
> 
> we don't need to hack sys/socket.h and can hardcode
> #ifdef _SYS_SOCKET_H as header guard for sockaddr instead, right?
> bits/socket.h has this:
> #ifndef _SYS_SOCKET_H
> # error "Never include <bits/socket.h> directly; use <sys/socket.h> instead."
> 
> So that ifdef is kinda stable.

The `if.h` only uses two types from `sys/socket.h`, namely:
- `struct sockaddr`
- `struct sockaddr_storage`

However `sys/socket.h` itself defines more types, here is a complete
list of types from `sys/socket.h` that conflict with `vmlinux.h`
(generated for my x86_64 laptop):

Type name       System header
iovec           /usr/include/bits/types/struct_iovec.h
loff_t          /usr/include/sys/types.h
dev_t           /usr/include/sys/types.h
nlink_t         /usr/include/sys/types.h
timer_t         /usr/include/bits/types/timer_t.h
int16_t         /usr/include/bits/stdint-intn.h
int32_t         /usr/include/bits/stdint-intn.h
int64_t         /usr/include/bits/stdint-intn.h
u_int64_t       /usr/include/sys/types.h
sigset_t        /usr/include/bits/types/sigset_t.h
fd_set          /usr/include/sys/select.h
blkcnt_t        /usr/include/sys/types.h
SOCK_STREAM     /usr/include/bits/socket_type.h
SOCK_DGRAM      /usr/include/bits/socket_type.h
SOCK_RAW        /usr/include/bits/socket_type.h
SOCK_RDM        /usr/include/bits/socket_type.h
SOCK_SEQPACKET  /usr/include/bits/socket_type.h
SOCK_DCCP       /usr/include/bits/socket_type.h
SOCK_PACKET     /usr/include/bits/socket_type.h
sockaddr        /usr/include/bits/socket.h
msghdr          /usr/include/bits/socket.h
cmsghdr         /usr/include/bits/socket.h
linger          /usr/include/bits/socket.h
SHUT_RD         /usr/include/sys/socket.h
SHUT_WR         /usr/include/sys/socket.h
SHUT_RDWR       /usr/include/sys/socket.h

It would be safe to wrap the corresponding types in the vmlinux.h with
_SYS_SOCKET_H / _SYS_TYPES_H guards if the definitions above match
between libc and kernel. To my surprise not all of them match. Here is
the list of genuine conflicts (for typedefs I skip the intermediate
definitions and print the last typedef in the chain):

Type: dev_t
typedef unsigned int __u32                                vmlinux.h
typedef unsigned long int __dev_t         /usr/include/bits/types.h

Type: nlink_t
typedef unsigned int __u32                                vmlinux.h
typedef unsigned long int __nlink_t       /usr/include/bits/types.h

Type: timer_t
typedef int __kernel_timer_t                              vmlinux.h
typedef void *__timer_t                   /usr/include/bits/types.h

Type: sigset_t
typedef struct                                            vmlinux.h
{
  long unsigned int sig[1];
} sigset_t

typedef struct                 /usr/include/bits/types/__sigset_t.h
{
  unsigned long int __val[1024 / (8 * (sizeof(unsigned long int)))];
} __sigset_t

Type: fd_set
typedef struct                                            vmlinux.h
{
  long unsigned int fds_bits[16];
} __kernel_fd_set

typedef struct                            /usr/include/sys/select.h
{
  __fd_mask __fds_bits[1024 / (8 * ((int) (sizeof(__fd_mask))))];
} fd_set

Type: msghdr
struct msghdr                                             vmlinux.h
{
  void *msg_name;
  int msg_namelen;
  int msg_inq;
  struct iov_iter msg_iter;
  union 
  {
    void *msg_control;
    void *msg_control_user;
  };
  bool msg_control_is_user : 1;
  bool msg_get_inq : 1;
  unsigned int msg_flags;
  __kernel_size_t msg_controllen;
  struct kiocb *msg_iocb;
  struct ubuf_info *msg_ubuf;
  int (*sg_from_iter)(struct sock *, struct sk_buff *, struct iov_iter *, size_t);
}

struct msghdr                            /usr/include/bits/socket.h
{
  void *msg_name;
  socklen_t msg_namelen;
  struct iovec *msg_iov;
  size_t msg_iovlen;
  void *msg_control;
  size_t msg_controllen;
  int msg_flags;
}
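
For reference, the comparison boils down to something like the following
Python sketch. This is a simplification, not the script that produced the
lists above: it compares definition text after whitespace normalization
only, so it does not resolve typedef chains and would flag
`long unsigned int` against `unsigned long int` as a mismatch. The function
names and the sample dictionaries are illustrative:

```python
def normalize(defn):
    """Collapse whitespace so pure formatting differences are not reported."""
    return " ".join(defn.split())


def genuine_conflicts(kernel_defs, libc_defs):
    """Names defined on both sides whose normalized definitions differ."""
    shared = sorted(set(kernel_defs) & set(libc_defs))
    return [n for n in shared if normalize(kernel_defs[n]) != normalize(libc_defs[n])]


# Illustrative inputs: type name -> definition text from each side.
kernel = {
    "dev_t": "typedef unsigned int dev_t;",
    "int32_t": "typedef int   int32_t;",
}
libc = {
    "dev_t": "typedef unsigned long int dev_t;",
    "int32_t": "typedef int int32_t;",
}
print(genuine_conflicts(kernel, libc))
```

Only names present on both sides are compared; a type defined on just one
side cannot conflict.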

> 
> > Another possibility is to move the definition of 'struct sockaddr'
> > from 'kernel/include/socket.h' to 'kernel/include/uapi/linux/socket.h',
> > but I expect that this won't fly with the mainline as it might break
> > the programs that include both 'linux/socket.h' and 'sys/socket.h'.
> > 
> > Conflict with vmlinux.h
> > ----
> > 
> > Uapi header:
> > - linux/signal.h
> > 
> > Conflict with vmlinux.h in definition of 'struct sigaction'.
> > Defined in:
> > - vmlinux.h: kernel/include/linux/signal_types.h
> > - uapi:      kernel/arch/x86/include/asm/signal.h
> > 
> > Uapi headers:
> > - linux/tipc_sockets_diag.h
> > - linux/sock_diag.h
> > 
> > Conflict with vmlinux.h in definition of 'SOCK_DESTROY'.
> 
> Interesting one!
> I think we can hard code '#undef SOCK_DESTROY' in vmlinux.h
> 
> The goal is not to be able to mix arbitrary uapi header with
> vmlinux.h, but only those that could be useful out of bpf progs.
> Currently it's tcp.h and few other network related headers
> because they have #define-s in them that are useful inside bpf progs.
> As long as the solution covers this small subset we're fine.

Well, tcp.h works :) It would be great if someone could list the
interesting headers.

Thanks,
Eduard

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h
  2022-10-26 11:10 ` Alan Maguire
@ 2022-10-26 23:54   ` Eduard Zingerman
  0 siblings, 0 replies; 46+ messages in thread
From: Eduard Zingerman @ 2022-10-26 23:54 UTC (permalink / raw)
  To: Alan Maguire, bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, arnaldo.melo

On Wed, 2022-10-26 at 12:10 +0100, Alan Maguire wrote:
> On 25/10/2022 23:27, Eduard Zingerman wrote:
> > Hi BPF community,
> > 
> > AFAIK there is a long standing feature request to use kernel headers
> > alongside `vmlinux.h` generated by `bpftool`. For example significant
> > effort was put to add an attribute `bpf_dominating_decl` (see [1]) to
> > clang, unfortunately this effort was stuck due to concerns regarding C
> > language semantics.
> > 
> 
> Thanks for the details! It's a tricky problem to solve.
> 
> Before diving into this, can I ask if there's another way round this;
> is there no way we could teach vmlinux.h generation which types to
> skip via some kind of bootstrapping process?
> 
> For a bpf object foo.bpf.c that wants to include linux/tcp.h and 
> vmlinux.h, something like this seems possible;
> 
> 1. run the preprocessor (gcc -E) over the BPF program to generate 
> a bootstrap header foo.bpf.exclude_types.h consisting of all the types mentioned
> in it and associated includes, but not in vmlinux.h  It would need to -D__VMLINUX_H__ 
> to avoid including vmlinux.h definitions and -D__BPF_EXCLUDE_TYPE_BOOTSTRAP__ to 
> skip the programs themselves, which would need a guard around them I think:
> 
> #include <stddef.h>
> #include <stdbool.h>
> #include <vmlinux.h>
> #include <linux/tcp.h>
> 
> #ifndef __BPF_EXCLUDE_TYPE_BOOTSTRAP__
> //rest of program here
> #endif
> 
> So as a result of this, we now have a single header file that contains all the types
> that non-vmlinux.h include files define.
> 
> 2. now to generate vmlinux.h, pass foo.bpf.exclude_types.h into "bpftool btf dump" as an
> exclude header:
> 
> bpftool btf dump --exclude /tmp/foo.bpf.types.h file /sys/kernel/btf/vmlinux format c > vmlinux.h
> 
> bpftool would have to parse the exclude header for actual type definitions, spotting struct,
> enum and typedef definitions. This is likely made easier by running the preprocessor
> at least since formatting is probably quite uniform. vmlinux.h could simply emit forward declarations
> for types described both in vmlinux BTF and in the exclude file.
> 
> So the build process would be
> 
> - start with empty vmlinux.h
> - bootstrap a header consisting of the set of types to exclude via c preprocessor
> - generate new vmlinux.h based on above
> - build bpf program
> 
> Build processes for BPF objects already has bootstrapping elements like
> this for generating vmlinux.h and skeletons, so while it's not perfect it might
> be a simpler approach. There may be problems with this I'm not seeing though?

I like the tool-based approach more, but I heard there were some
reservations about a separate tool complicating the build process.
Points in favor of the tool:
- the tool does not require changes to the kbuild;
- the tool is not limited to uapi headers;
- the tool could imitate the dominating declarations attribute and
  address the issue with definition mismatches between vmlinux.h and
  system headers (see my reply to Alexei in the parallel sub-thread).
  To support this point it would have to work in a somewhat different
  order:
  - read/pre-process the input file;
  - remove from it the definitions that are also present in kernel BTF;
  - prepend vmlinux.h to the beginning of the file.

On the other hand I think that implementation would need a C parser /
type analysis step. Thus it would be harder to implement it as a part
of bpftool. But a prototype using something like [1] should be simple.
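
To illustrate the order of operations, here is a toy Python sketch. It
uses regular expressions and brace matching instead of a real C parser, so
it only handles plain `struct NAME { ... };` / `union NAME { ... };`
definitions, and the function names are invented for this example:

```python
import re

TAGGED_DEF = re.compile(r"\b(?:struct|union)\s+([A-Za-z_]\w*)\s*\{")
TRAILING_SEMI = re.compile(r"\s*;")


def strip_known_definitions(source, btf_names):
    """Drop 'struct/union NAME { ... };' bodies for names present in BTF."""
    out, pos = [], 0
    while True:
        m = TAGGED_DEF.search(source, pos)
        if m is None:
            out.append(source[pos:])
            return "".join(out)
        if m.group(1) not in btf_names:
            # Keep this definition header and continue scanning after it.
            out.append(source[pos:m.end()])
            pos = m.end()
            continue
        # Find the closing brace that matches the opening one.
        depth, i = 1, m.end()
        while i < len(source) and depth:
            depth += {"{": 1, "}": -1}.get(source[i], 0)
            i += 1
        tail = TRAILING_SEMI.match(source, i)
        out.append(source[pos:m.start()])
        pos = tail.end() if tail else i


def rewrite(source, btf_names):
    """Remove definitions known to kernel BTF, then prepend vmlinux.h."""
    return '#include "vmlinux.h"\n' + strip_known_definitions(source, btf_names)


src = "struct sockaddr {\n  int x;\n};\nstruct mine { struct sockaddr a; };\n"
print(rewrite(src, {"sockaddr"}))
```

Uses of a removed type, such as the `struct sockaddr a;` member, are left
alone and resolve against the definition that vmlinux.h provides.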

Thanks,
Eduard


[1] https://github.com/inducer/pycparserext

> 
> Thanks!
> 
> Alan
> 
> > After some discussion with Alexei and Yonghong I'd like to request
> > your comments regarding a somewhat brittle and partial solution to
> > this issue that relies on adding `#ifndef FOO_H ... #endif` guards in
> > the generated `vmlinux.h`.
> > 
> > The basic idea
> > ---
> > 
> > The goal of the patch set is to allow usage of header files from
> > `include/uapi` alongside `vmlinux.h` as follows:
> > 
> >   #include <uapi/linux/tcp.h>
> >   #include "vmlinux.h"
> > 
> > This goal is achieved by adding `#ifndef ... #endif` guards in
> > `vmlinux.h` around definitions that originate from the `include/uapi`
> > headers. The guards emitted match the guards used in the original
> > headers. E.g. as follows:
> > 
> > include/uapi/linux/tcp.h:
> > 
> >   #ifndef _UAPI_LINUX_TCP_H
> >   #define _UAPI_LINUX_TCP_H
> >   ...
> >   union tcp_word_hdr {
> > 	struct tcphdr hdr;
> > 	__be32        words[5];
> >   };
> >   ...
> >   #endif /* _UAPI_LINUX_TCP_H */
> > 
> > vmlinux.h:
> > 
> >   ...
> >   #ifndef _UAPI_LINUX_TCP_H
> > 
> >   union tcp_word_hdr {
> > 	struct tcphdr hdr;
> > 	__be32 words[5];
> >   };
> > 
> >   #endif /* _UAPI_LINUX_TCP_H */
> >   ...
> > 
> > To get to this state the following steps are necessary:
> > - "header guard" name should be identified for each header file;
> > - the correspondence between data type and its header guard has to be
> >   encoded in BTF;
> > - `bpftool` should be adjusted to emit `#ifndef FOO_H ... #endif`
> >   brackets.
> > 
> > It is not possible to identify header guard names for all uapi headers
> > based only on the file name. However a simple script could be devised to
> > identify the guards based on the file name and its content. Thus it
> > is possible to obtain the list of header names with corresponding
> > header guards.
> > 
> > The correspondence between type and its declaration file (header) is
> > available in DWARF as `DW_AT_decl_file` attribute. The
> > `DW_AT_decl_file` can be matched with the list of header guards
> > described above to obtain the header guard name for a specific type.
> > 
> > The `pahole` generates BTF using DWARF. It is possible to modify
> > `pahole` to accept the header guards list as an additional parameter
> > and to encode the header guard names in BTF.
> > 
> > Implementation details
> > ---
> > 
> > Present patch-set implements these ideas as follows:
> > - A parameter `--header_guards_db` is added to `pahole`. If present it
> >   points to a file with a list of `<header> <guard>` records.
> > - `pahole` uses DWARF `DW_AT_decl_file` value to lookup the header
> >   guard for each type emitted to BTF. If header guard is present it is
> >   encoded alongside the type.
> > - Header guards are encoded in BTF as `BTF_DECL_TAG` records with a
> >   special prefix. The prefix "header_guard:" is added to a value of
> >   such tags. (Here `BTF_DECL_TAG` is used to avoid BTF binary format
> >   changes).
> > - A special script `infer_header_guards.pl` is added as a part of
> >   kbuild, it can infer header guard names for each UAPI header basing
> >   on the header content.
> > - This script is invoked from `link-vmlinux.sh` prior to BTF
> >   generation during kernel build. The output of the script is saved to
> >   a file, the file is passed to `pahole` as `--header_guards_db`
> >   parameter.
> > - `libbpf` is modified to aggregate `BTF_DECL_TAG` records for each
> >   type and to emit `#ifndef FOO_H ... #endif` brackets when
> >   "header_guard:" tag is present for a type.
> > 
> > Details for each patch in a set:
> > - libbpf: Deduplicate unambigous standalone forward declarations
> > - selftests/bpf: Tests for standalone forward BTF declarations deduplication
> > 
> >   There is a small number (63 for defconfig) of forward declarations
> >   that are not de-duplicated with the main type declaration under
> >   certain conditions. This hinders the header guard brackets
> >   generation. This patch addresses this de-duplication issue.
> > 
> > - libbpf: Support for BTF_DECL_TAG dump in C format
> > - selftests/bpf: Tests for BTF_DECL_TAG dump in C format
> > 
> >   Currently libbpf does not process BTF_DECL_TAG when btf is dumped in
> >   C format. This patch adds a hash table matching btf type ids with a
> >   list of decl tags to the struct btf_dump.
> >   The `btf_dump_emit_decl_tags` is not necessary for the overall
> >   patch-set to function but simplifies testing a bit.
> > 
> > - libbpf: Header guards for selected data structures in vmlinux.h
> > - selftests/bpf: Tests for header guards printing in BTF dump
> > 
> >   Adds option `emit_header_guards` to `struct btf_dump_opts`.
> >   When enabled the `btf_dump__dump_type` prints `#ifndef ... #endif`
> >   brackets around types for which header guard information is present
> >   in BTF.
> > 
> > - bpftool: Enable header guards generation
> > 
> >   Unconditionally enables `emit_header_guards` for BTF dump in C format.
> > 
> > - kbuild: Script to infer header guard values for uapi headers
> > - kbuild: Header guards for types from include/uapi/*.h in kernel BTF
> > 
> >   Adds `scripts/infer_header_guards.pl` and integrates it with
> >   `link-vmlinux.sh`.
> > 
> > - selftests/bpf: Script to verify uapi headers usage with vmlinux.h
> > 
> >   Adds a script `test_uapi_headers.py` that tests header guards with
> >   vmlinux.h by compiling a simple C snippet. The snippet looks as
> >   follows:
> >   
> >     #include <some_uapi_header.h>
> >     #include "vmlinux.h"
> >   
> >     __attribute__((section("tc"), used))
> >     int syncookie_tc(struct __sk_buff *skb) { return 0; }
> >   
> >   The list of headers to test comes from
> >   `tools/testing/selftests/bpf/good_uapi_headers.txt`.
> > 
> > - selftests/bpf: Known good uapi headers for test_uapi_headers.py
> > 
> >   The list of uapi headers that could be included alongside vmlinux.h.
> >   The headers are picked from the following locations:
> >   - <headers-export-dir>/linux/*.h
> >   - <headers-export-dir>/linux/**/*.h
> >   This choice of locations is somewhat arbitrary.
> > 
> > - selftests/bpf: script for infer_header_guards.pl testing
> > 
> >   The test case for `scripts/infer_header_guards.pl`, verifies that
> >   header guards can be inferred for all uapi headers.
> > 
> > - There is also a patch for dwarves that adds `--header_guards_db`
> >   option (see [2]).
> > 
> > The `test_uapi_headers.py` is important as it demonstrates
> > the necessary compiler flags:
> > 
> > clang ...                                  \
> >       -D__x86_64__                         \
> >       -Xclang -fwchar-type=short           \
> >       -Xclang -fno-signed-wchar            \
> >       -I{exported_kernel_headers}/include/ \
> >       ...
> > 
> > - `-fwchar-type=short` and `-fno-signed-wchar` had to be added because
> >   BPF target uses `int` for `wchar_t` by default and this differs from
> >   `vmlinux.h` definition of the type (at least for x86_64).
> > - `__x86_64__` had to be added for uapi headers that include
> >   `stddef.h` (the one that is supplied by Clang itself), in order to
> >   define correct sizes for `size_t` and `ptrdiff_t`.
> > - The `{exported_kernel_headers}` stands for exported kernel headers
> >   directory (the headers obtained by `make headers_install` or via
> >   distribution package).
> > 
> > When it works
> > ---
> > 
> > The mechanism described above works for a significant number of UAPI
> > headers. For example, for the test case above I chose the headers from
> > the following locations:
> > - linux/*.h
> > - linux/**/*.h
> > There are 759 such headers and for 677 of them the test described
> > above passes.
> > 
> > I excluded the headers from the following sub-directories as
> > potentially not interesting:
> > 
> >   asm          rdma   video xen
> >   asm-generic  misc   scsi
> >   drm          mtd    sound
> > 
> > Thus saving some time for both discussion and CI but the choice is
> > somewhat arbitrary. If I run `test_uapi_headers.py --test '*'` (all
> > headers) test passes for 834 out of 972 headers.
> > 
> > When it breaks
> > ---
> > 
> > There are several scenarios in which this mechanism breaks.
> > Specifically I found the following cases:
> > - When uapi header includes some system header that conflicts with
> >   vmlinux.h.
> > - When uapi header itself conflicts with vmlinux.h.
> > 
> > Below are examples for both cases.
> > 
> > Conflict with system headers
> > ----
> > 
> > The following uapi headers:
> > - linux/atmbr2684.h
> > - linux/bpfilter.h
> > - linux/gsmmux.h
> > - linux/icmp.h
> > - linux/if.h
> > - linux/if_arp.h
> > - linux/if_bonding.h
> > - linux/if_pppox.h
> > - linux/if_tunnel.h
> > - linux/ip6_tunnel.h
> > - linux/llc.h
> > - linux/mctp.h
> > - linux/mptcp.h
> > - linux/netdevice.h
> > - linux/netfilter/xt_RATEEST.h
> > - linux/netfilter/xt_hashlimit.h
> > - linux/netfilter/xt_physdev.h
> > - linux/netfilter/xt_rateest.h
> > - linux/netfilter_arp/arp_tables.h
> > - linux/netfilter_arp/arpt_mangle.h
> > - linux/netfilter_bridge.h
> > - linux/netfilter_bridge/ebtables.h
> > - linux/netfilter_ipv4/ip_tables.h
> > - linux/netfilter_ipv6/ip6_tables.h
> > - linux/route.h
> > - linux/wireless.h
> > 
> > Include the following system header:
> > - /usr/include/sys/socket.h (all via linux/if.h)
> > 
> > The sys/socket.h conflicts with vmlinux.h in:
> > - types: struct iovec, struct sockaddr, struct msghdr, ...
> > - constants: SOCK_STREAM, SOCK_DGRAM, ...
> > 
> > However, only two types are actually used:
> > - struct sockaddr
> > - struct sockaddr_storage (used only in linux/mptcp.h)
> > 
> > In 'vmlinux.h' this type originates from 'kernel/include/socket.h'
> > (non UAPI header), thus does not have a header guard.
> > 
> > The only workaround that I see is to:
> > - define a stub sys/socket.h as follows:
> > 
> >     #ifndef __BPF_SOCKADDR__
> >     #define __BPF_SOCKADDR__
> >     
> >     /* For __kernel_sa_family_t */
> >     #include <linux/socket.h>
> >     
> >     struct sockaddr {
> >         __kernel_sa_family_t sa_family;
> >         char sa_data[14];
> >     };
> >     
> >     #endif
> > 
> > - hardcode generation of __BPF_SOCKADDR__ bracket for
> >   'struct sockaddr' in vmlinux.h.
> > 
> > Another possibility is to move the definition of 'struct sockaddr'
> > from 'kernel/include/socket.h' to 'kernel/include/uapi/linux/socket.h',
> > but I expect that this won't fly with the mainline as it might break
> > the programs that include both 'linux/socket.h' and 'sys/socket.h'.
> > 
> > Conflict with vmlinux.h
> > ----
> > 
> > Uapi header:
> > - linux/signal.h
> > 
> > Conflict with vmlinux.h in definition of 'struct sigaction'.
> > Defined in:
> > - vmlinux.h: kernel/include/linux/signal_types.h
> > - uapi:      kernel/arch/x86/include/asm/signal.h
> > 
> > Uapi headers:
> > - linux/tipc_sockets_diag.h
> > - linux/sock_diag.h
> > 
> > Conflict with vmlinux.h in definition of 'SOCK_DESTROY'.
> > Defined in:
> > - vmlinux.h: kernel/include/net/sock.h
> > - uapi:      kernel/include/uapi/linux/sock_diag.h
> > Constants seem to be unrelated.
> > 
> > And so on... I have details for many other headers but omit those for
> > brevity.
> > 
> > In conclusion
> > ---
> > 
> > Apart from general feasibility, I have a few questions:
> > - What UAPI headers are the candidates for such use? If there are some
> >   interesting headers currently not working with this patch-set some
> >   hacks have to be added (e.g. like with `linux/if.h`).
> > - Is it ok to encode header guards as special `BTF_DECL_TAG` or should
> >   I change the BTF format a bit to save some bytes?
> > 
> > Thanks,
> > Eduard
> > 
> > 
> > [1] https://reviews.llvm.org/D111307
> >     [clang] __attribute__ bpf_dominating_decl
> > [2] https://lore.kernel.org/dwarves/20221025220729.2293891-1-eddyz87@gmail.com/T/
> >     [RFC dwarves] pahole: Save header guard names when
> >                   --header_guards_db is passed
> > 
> > Eduard Zingerman (12):
> >   libbpf: Deduplicate unambigous standalone forward declarations
> >   selftests/bpf: Tests for standalone forward BTF declarations
> >     deduplication
> >   libbpf: Support for BTF_DECL_TAG dump in C format
> >   selftests/bpf: Tests for BTF_DECL_TAG dump in C format
> >   libbpf: Header guards for selected data structures in vmlinux.h
> >   selftests/bpf: Tests for header guards printing in BTF dump
> >   bpftool: Enable header guards generation
> >   kbuild: Script to infer header guard values for uapi headers
> >   kbuild: Header guards for types from include/uapi/*.h in kernel BTF
> >   selftests/bpf: Script to verify uapi headers usage with vmlinux.h
> >   selftests/bpf: Known good uapi headers for test_uapi_headers.py
> >   selftests/bpf: script for infer_header_guards.pl testing
> > 
> >  scripts/infer_header_guards.pl                | 191 +++++
> >  scripts/link-vmlinux.sh                       |  13 +-
> >  tools/bpf/bpftool/btf.c                       |   4 +-
> >  tools/lib/bpf/btf.c                           | 178 ++++-
> >  tools/lib/bpf/btf.h                           |   7 +-
> >  tools/lib/bpf/btf_dump.c                      | 232 +++++-
> >  .../selftests/bpf/good_uapi_headers.txt       | 677 ++++++++++++++++++
> >  tools/testing/selftests/bpf/prog_tests/btf.c  | 152 ++++
> >  .../selftests/bpf/prog_tests/btf_dump.c       |  11 +-
> >  .../bpf/progs/btf_dump_test_case_decl_tag.c   |  39 +
> >  .../progs/btf_dump_test_case_header_guards.c  |  94 +++
> >  .../bpf/test_uapi_header_guards_infer.sh      |  33 +
> >  .../selftests/bpf/test_uapi_headers.py        | 197 +++++
> >  13 files changed, 1816 insertions(+), 12 deletions(-)
> >  create mode 100755 scripts/infer_header_guards.pl
> >  create mode 100644 tools/testing/selftests/bpf/good_uapi_headers.txt
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_decl_tag.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_header_guards.c
> >  create mode 100755 tools/testing/selftests/bpf/test_uapi_header_guards_infer.sh
> >  create mode 100755 tools/testing/selftests/bpf/test_uapi_headers.py
> > 



* Re: [RFC bpf-next 09/12] kbuild: Header guards for types from include/uapi/*.h in kernel BTF
  2022-10-25 22:27 ` [RFC bpf-next 09/12] kbuild: Header guards for types from include/uapi/*.h in kernel BTF Eduard Zingerman
@ 2022-10-27 18:43   ` Yonghong Song
  2022-10-27 18:55     ` Yonghong Song
  0 siblings, 1 reply; 46+ messages in thread
From: Yonghong Song @ 2022-10-27 18:43 UTC (permalink / raw)
  To: Eduard Zingerman, bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, arnaldo.melo



On 10/25/22 3:27 PM, Eduard Zingerman wrote:
> Use pahole --header_guards_db flag to enable encoding of header guard
> information in kernel BTF. The actual correspondence between header
> file and guard string is computed by the scripts/infer_header_guards.pl.
> 
> The encoded header guard information could be used to restore the
> original guards in the vmlinux.h, e.g.:
> 
>      include/uapi/linux/tcp.h:
> 
>        #ifndef _UAPI_LINUX_TCP_H
>        #define _UAPI_LINUX_TCP_H
>        ...
>        union tcp_word_hdr {
>      	struct tcphdr hdr;
>      	__be32        words[5];
>        };
>        ...
>        #endif /* _UAPI_LINUX_TCP_H */
> 
>      vmlinux.h:
> 
>        ...
>        #ifndef _UAPI_LINUX_TCP_H
> 
>        union tcp_word_hdr {
>      	struct tcphdr hdr;
>      	__be32 words[5];
>        };
> 
>        #endif /* _UAPI_LINUX_TCP_H */
>        ...
> 
> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
> ---
>   scripts/link-vmlinux.sh | 13 ++++++++++++-
>   1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> index 918470d768e9..f57f621eda1f 100755
> --- a/scripts/link-vmlinux.sh
> +++ b/scripts/link-vmlinux.sh
> @@ -110,6 +110,7 @@ vmlinux_link()
>   gen_btf()
>   {
>   	local pahole_ver
> +	local extra_flags
>   
>   	if ! [ -x "$(command -v ${PAHOLE})" ]; then
>   		echo >&2 "BTF: ${1}: pahole (${PAHOLE}) is not available"
> @@ -122,10 +123,20 @@ gen_btf()
>   		return 1
>   	fi
>   
> +	if [ "${pahole_ver}" -ge "124" ]; then
> +		scripts/infer_header_guards.pl \

We should have full path like
	${srctree}/scripts/infer_header_guards.pl
so it can work if build directory is different from source directory.

> +			include/uapi \
> +			include/generated/uapi \
> +			arch/${SRCARCH}/include/uapi \
> +			arch/${SRCARCH}/include/generated/uapi \
> +			> .btf.uapi_header_guards || return 1;
> +		extra_flags="--header_guards_db .btf.uapi_header_guards"
> +	fi
> +
>   	vmlinux_link ${1}
>   
>   	info "BTF" ${2}
> -	LLVM_OBJCOPY="${OBJCOPY}" ${PAHOLE} -J ${PAHOLE_FLAGS} ${1}
> +	LLVM_OBJCOPY="${OBJCOPY}" ${PAHOLE} -J ${PAHOLE_FLAGS} ${extra_flags} ${1}
>   
>   	# Create ${2} which contains just .BTF section but no symbols. Add
>   	# SHF_ALLOC because .BTF will be part of the vmlinux image. --strip-all


* Re: [RFC bpf-next 09/12] kbuild: Header guards for types from include/uapi/*.h in kernel BTF
  2022-10-27 18:43   ` Yonghong Song
@ 2022-10-27 18:55     ` Yonghong Song
  2022-10-27 22:44       ` Yonghong Song
  0 siblings, 1 reply; 46+ messages in thread
From: Yonghong Song @ 2022-10-27 18:55 UTC (permalink / raw)
  To: Eduard Zingerman, bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, arnaldo.melo



On 10/27/22 11:43 AM, Yonghong Song wrote:
> 
> 
> On 10/25/22 3:27 PM, Eduard Zingerman wrote:
>> Use pahole --header_guards_db flag to enable encoding of header guard
>> information in kernel BTF. The actual correspondence between header
>> file and guard string is computed by the scripts/infer_header_guards.pl.
>>
>> The encoded header guard information could be used to restore the
>> original guards in the vmlinux.h, e.g.:
>>
>>      include/uapi/linux/tcp.h:
>>
>>        #ifndef _UAPI_LINUX_TCP_H
>>        #define _UAPI_LINUX_TCP_H
>>        ...
>>        union tcp_word_hdr {
>>          struct tcphdr hdr;
>>          __be32        words[5];
>>        };
>>        ...
>>        #endif /* _UAPI_LINUX_TCP_H */
>>
>>      vmlinux.h:
>>
>>        ...
>>        #ifndef _UAPI_LINUX_TCP_H
>>
>>        union tcp_word_hdr {
>>          struct tcphdr hdr;
>>          __be32 words[5];
>>        };
>>
>>        #endif /* _UAPI_LINUX_TCP_H */
>>        ...
>>
>> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
>> ---
>>   scripts/link-vmlinux.sh | 13 ++++++++++++-
>>   1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
>> index 918470d768e9..f57f621eda1f 100755
>> --- a/scripts/link-vmlinux.sh
>> +++ b/scripts/link-vmlinux.sh
>> @@ -110,6 +110,7 @@ vmlinux_link()
>>   gen_btf()
>>   {
>>       local pahole_ver
>> +    local extra_flags
>>       if ! [ -x "$(command -v ${PAHOLE})" ]; then
>>           echo >&2 "BTF: ${1}: pahole (${PAHOLE}) is not available"
>> @@ -122,10 +123,20 @@ gen_btf()
>>           return 1
>>       fi
>> +    if [ "${pahole_ver}" -ge "124" ]; then
>> +        scripts/infer_header_guards.pl \
> 
> We should have full path like
>      ${srctree}/scripts/infer_header_guards.pl
> so it can work if build directory is different from source directory.

handling arguments for infer_header_guards.pl should also take
care of full file path.

+ /home/yhs/work/bpf-next/scripts/infer_header_guards.pl include/uapi include/generated/uapi arch/x86/include/uapi arch/x86/include/generated/uapi
+ return 1

> 
>> +            include/uapi \
>> +            include/generated/uapi \
>> +            arch/${SRCARCH}/include/uapi \
>> +            arch/${SRCARCH}/include/generated/uapi \
>> +            > .btf.uapi_header_guards || return 1;
>> +        extra_flags="--header_guards_db .btf.uapi_header_guards"
>> +    fi
>> +
>>       vmlinux_link ${1}
>>       info "BTF" ${2}
>> -    LLVM_OBJCOPY="${OBJCOPY}" ${PAHOLE} -J ${PAHOLE_FLAGS} ${1}
>> +    LLVM_OBJCOPY="${OBJCOPY}" ${PAHOLE} -J ${PAHOLE_FLAGS} ${extra_flags} ${1}
>>       # Create ${2} which contains just .BTF section but no symbols. Add
>>       # SHF_ALLOC because .BTF will be part of the vmlinux image. --strip-all


* Re: [RFC bpf-next 01/12] libbpf: Deduplicate unambigous standalone forward declarations
  2022-10-25 22:27 ` [RFC bpf-next 01/12] libbpf: Deduplicate unambigous standalone forward declarations Eduard Zingerman
@ 2022-10-27 22:07   ` Andrii Nakryiko
  2022-10-31  1:00     ` Eduard Zingerman
  2022-10-31 15:49     ` Eduard Zingerman
  0 siblings, 2 replies; 46+ messages in thread
From: Andrii Nakryiko @ 2022-10-27 22:07 UTC (permalink / raw)
  To: Eduard Zingerman, Alan Maguire
  Cc: bpf, ast, andrii, daniel, kernel-team, yhs, arnaldo.melo

On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> Deduplicate forward declarations that don't take part in type graphs
> comparisons if declaration name is unambiguous. Example:
>
> CU #1:
>
> struct foo;              // standalone forward declaration
> struct foo *some_global;
>
> CU #2:
>
> struct foo { int x; };
> struct foo *another_global;
>
> The `struct foo` from CU #1 is not a part of any definition that is
> compared against another definition while `btf_dedup_struct_types`
> processes structural types. After `btf_dedup_struct_types`
> the BTF looks as follows:
>
> [1] STRUCT 'foo' size=4 vlen=1 ...
> [2] INT 'int' size=4 ...
> [3] PTR '(anon)' type_id=1
> [4] FWD 'foo' fwd_kind=struct
> [5] PTR '(anon)' type_id=4
>
> This commit adds a new pass `btf_dedup_standalone_fwds`, that maps
> such forward declarations to structs or unions with an identical name,
> provided the name is not ambiguous.
>
> The pass is positioned before `btf_dedup_ref_types` so that types
> [3] and [5] could be merged as a same type after [1] and [4] are merged.
> The final result for the example above looks as follows:
>
> [1] STRUCT 'foo' size=4 vlen=1
>         'x' type_id=2 bits_offset=0
> [2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
> [3] PTR '(anon)' type_id=1
>
> For defconfig kernel with BTF enabled this removes 63 forward
> declarations. Examples of removed declarations: `pt_regs`, `in6_addr`.
> The running time of `btf__dedup` function is increased by about 3%.
>

What about modules, can you share stats for module BTFs?

Also cc Alan as he was looking at BTF dedup improvements for kernel
module BTF dedup.

> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
> ---
>  tools/lib/bpf/btf.c | 178 +++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 174 insertions(+), 4 deletions(-)
>
> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> index d88647da2c7f..c34c68d8e8a0 100644
> --- a/tools/lib/bpf/btf.c
> +++ b/tools/lib/bpf/btf.c
> @@ -2881,6 +2881,7 @@ static int btf_dedup_strings(struct btf_dedup *d);
>  static int btf_dedup_prim_types(struct btf_dedup *d);
>  static int btf_dedup_struct_types(struct btf_dedup *d);
>  static int btf_dedup_ref_types(struct btf_dedup *d);
> +static int btf_dedup_standalone_fwds(struct btf_dedup *d);
>  static int btf_dedup_compact_types(struct btf_dedup *d);
>  static int btf_dedup_remap_types(struct btf_dedup *d);
>
> @@ -2988,15 +2989,16 @@ static int btf_dedup_remap_types(struct btf_dedup *d);
>   * Algorithm summary
>   * =================
>   *
> - * Algorithm completes its work in 6 separate passes:
> + * Algorithm completes its work in 7 separate passes:
>   *
>   * 1. Strings deduplication.
>   * 2. Primitive types deduplication (int, enum, fwd).
>   * 3. Struct/union types deduplication.
> - * 4. Reference types deduplication (pointers, typedefs, arrays, funcs, func
> + * 4. Standalone fwd declarations deduplication.

Let's call this "Resolve unambiguous forward declarations", we don't
really deduplicate anything. And call the function
btf_dedup_resolve_fwds()?

> + * 5. Reference types deduplication (pointers, typedefs, arrays, funcs, func
>   *    protos, and const/volatile/restrict modifiers).
> - * 5. Types compaction.
> - * 6. Types remapping.
> + * 6. Types compaction.
> + * 7. Types remapping.
>   *
>   * Algorithm determines canonical type descriptor, which is a single
>   * representative type for each truly unique type. This canonical type is the
> @@ -3060,6 +3062,11 @@ int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts)
>                 pr_debug("btf_dedup_struct_types failed:%d\n", err);
>                 goto done;
>         }
> +       err = btf_dedup_standalone_fwds(d);
> +       if (err < 0) {
> +               pr_debug("btf_dedup_standalone_fwd failed:%d\n", err);
> +               goto done;
> +       }
>         err = btf_dedup_ref_types(d);
>         if (err < 0) {
>                 pr_debug("btf_dedup_ref_types failed:%d\n", err);
> @@ -4525,6 +4532,169 @@ static int btf_dedup_ref_types(struct btf_dedup *d)
>         return 0;
>  }
>
> +/*
> + * `name_off_map` maps name offsets to type ids (essentially __u32 -> __u32).
> + *
> + * The __u32 key/value representations are cast to `void *` before passing
> + * to `hashmap__*` functions. These pseudo-pointers are never dereferenced.
> + *
> + */
> +static struct hashmap *name_off_map__new(void)
> +{
> +       return hashmap__new(btf_dedup_identity_hash_fn,
> +                           btf_dedup_equal_fn,
> +                           NULL);
> +}

Is there a point in the name_off_map__new and name_off_map__find wrappers,
except to add one extra function to jump through when reading the
code? If you look at other uses of hashmaps in this file, we use them
directly. Let's drop those.

> +
> +static int name_off_map__find(struct hashmap *map, __u32 name_off, __u32 *type_id)
> +{
> +       /* This has to be sizeof(void *) in order to be passed to hashmap__find */
> +       void *tmp;
> +       int found = hashmap__find(map, (void *)(ptrdiff_t)name_off, &tmp);

but this (void *) casting everything was an error in API design, mea
culpa. I've been wanting to switch hashmap to use long as key/value
type for a long while, maybe let's do it now, as we are adding even
more code that looks weird? It seems like accepting long will make
hashmap API usage cleaner in most cases. There are not a lot of places
where we use hashmap APIs in libbpf, but we'll also need to fix up
bpftool usage, and I believe perf copy/pasted hashmap.h (cc Arnaldo),
so we'd need to make sure to not break all that. But good thing it's
all in the same repo and we can convert them at the same time with no
breakage.

WDYT?
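
For reference, this is roughly what every caller has to do today to push
a 32-bit id through a `void *` slot. It is a standalone sketch, not
libbpf code; `u32_to_slot` and `slot_to_u32` are hypothetical names used
only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack a 32-bit id into a pointer-sized slot.
 * The resulting pointer is only a container and is never dereferenced. */
static inline void *u32_to_slot(uint32_t v)
{
	return (void *)(uintptr_t)v;
}

/* Hypothetical helper: unpack an id previously stored by u32_to_slot(). */
static inline uint32_t slot_to_u32(const void *slot)
{
	uintptr_t raw = (uintptr_t)slot;

	assert(raw <= UINT32_MAX); /* anything larger was not a packed u32 */
	return (uint32_t)raw;
}
```

With an integer-typed key/value API both helpers (and the double casts
they hide) would presumably go away, since a `__u32` converts to `long`
implicitly.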

> +       /*
> +        * __u64 cast is necessary to avoid pointer to integer conversion size warning.
> +        * It is fine to get rid of this warning as `void *` is used as an integer value.
> +        */
> +       if (found)
> +               *type_id = (__u64)tmp;
> +       return found;
> +}
> +
> +static int name_off_map__set(struct hashmap *map, __u32 name_off, __u32 type_id)
> +{
> +       return hashmap__set(map, (void *)(size_t)name_off, (void *)(size_t)type_id,
> +                           NULL, NULL);
> +}

this function will also be completely unnecessary with longs

> +
> +/*
> + * Collect a `name_off_map` that maps type names to type ids for all
> + * canonical structs and unions. If the same name is shared by several
> + * canonical types use a special value 0 to indicate this fact.
> + */
> +static int btf_dedup_fill_unique_names_map(struct btf_dedup *d, struct hashmap *names_map)
> +{
> +       int i, err = 0;
> +       __u32 type_id, collision_id;
> +       __u16 kind;
> +       struct btf_type *t;
> +
> +       for (i = 0; i < d->btf->nr_types; i++) {
> +               type_id = d->btf->start_id + i;
> +               t = btf_type_by_id(d->btf, type_id);
> +               kind = btf_kind(t);
> +
> +               if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION)
> +                       continue;

let's also do ENUM FWD resolution. ENUM FWD is just ENUM with vlen=0

> +
> +               /* Skip non-canonical types */
> +               if (type_id != d->map[type_id])
> +                       continue;
> +
> +               err = 0;
> +               if (name_off_map__find(names_map, t->name_off, &collision_id)) {
> +                       /* Mark non-unique names with 0 */
> +                       if (collision_id != 0 && collision_id != type_id)
> +                               err = name_off_map__set(names_map, t->name_off, 0);
> +               } else {
> +                       err = name_off_map__set(names_map, t->name_off, type_id);
> +               }

err = hashmap__add(..., t->name_off, type_id);
if (err == -EEXIST) {
    hashmap__set(..., 0);
    return 0;
}

see comment for hashmap_insert_strategy in hashmap.h
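
To spell the pattern out, here is a self-contained sketch with a toy
array-backed map standing in for the real hashmap. Everything named
`toy_map*`, `record_name` and `lookup_name` is hypothetical; only the
"add fails with -EEXIST, then overwrite with 0" flow is the point:

```c
#include <errno.h>
#include <stdint.h>

#define TOY_MAP_CAP 64

/* Toy fixed-size map; a stand-in for a real hash map. */
struct toy_map {
	uint32_t keys[TOY_MAP_CAP];
	uint32_t vals[TOY_MAP_CAP];
	int cnt;
};

/* Return the slot index holding key, or -1 if the key is absent. */
static int toy_map_find(const struct toy_map *m, uint32_t key)
{
	for (int i = 0; i < m->cnt; i++)
		if (m->keys[i] == key)
			return i;
	return -1;
}

/* Insert only if the key is absent; -EEXIST otherwise ("add" semantics). */
static int toy_map_add(struct toy_map *m, uint32_t key, uint32_t val)
{
	if (toy_map_find(m, key) >= 0)
		return -EEXIST;
	if (m->cnt == TOY_MAP_CAP)
		return -ENOMEM;
	m->keys[m->cnt] = key;
	m->vals[m->cnt] = val;
	m->cnt++;
	return 0;
}

/* Record name_off -> type_id; a name seen twice is marked ambiguous (0). */
static int record_name(struct toy_map *m, uint32_t name_off, uint32_t type_id)
{
	int err = toy_map_add(m, name_off, type_id);

	if (err == -EEXIST) {
		m->vals[toy_map_find(m, name_off)] = 0;
		return 0;
	}
	return err;
}

/* Return the unique type id for name_off, or 0 if unknown or ambiguous. */
static uint32_t lookup_name(const struct toy_map *m, uint32_t name_off)
{
	int idx = toy_map_find(m, name_off);

	return idx < 0 ? 0 : m->vals[idx];
}
```

This relies on 0 (VOID) never being a valid struct/union id, so it can
double as the "ambiguous" marker, and on each canonical type being
visited exactly once.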

> +
> +               if (err < 0)
> +                       return err;
> +       }
> +
> +       return 0;
> +}
> +
> +static int btf_dedup_standalone_fwd(struct btf_dedup *d,
> +                                   struct hashmap *names_map,
> +                                   __u32 type_id)
> +{
> +       struct btf_type *t = btf_type_by_id(d->btf, type_id);
> +       __u16 kind = btf_kind(t);
> +       enum btf_fwd_kind fwd_kind = BTF_INFO_KFLAG(t->info);
> +

nit: don't break variables block in two parts, there shouldn't be empty lines

also please use btf_kflag(t)


> +       struct btf_type *cand_t;
> +       __u16 cand_kind;
> +       __u32 cand_id = 0;
> +
> +       if (kind != BTF_KIND_FWD)
> +               return 0;
> +
> +       /* Skip if this FWD already has a mapping */
> +       if (type_id != d->map[type_id])
> +               return 0;
> +

[...]


* Re: [RFC bpf-next 03/12] libbpf: Support for BTF_DECL_TAG dump in C format
  2022-10-25 22:27 ` [RFC bpf-next 03/12] libbpf: Support for BTF_DECL_TAG dump in C format Eduard Zingerman
@ 2022-10-27 22:36   ` Andrii Nakryiko
  0 siblings, 0 replies; 46+ messages in thread
From: Andrii Nakryiko @ 2022-10-27 22:36 UTC (permalink / raw)
  To: Eduard Zingerman; +Cc: bpf, ast, andrii, daniel, kernel-team, yhs, arnaldo.melo

On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> At C level BTF_DECL_TAGs are represented as __attribute__
> declarations, e.g.:
>
> struct foo {
>         ...;
> } __attribute__((btf_decl_tag("bar")));
>
> This commit covers only decl tags attached to structs and unions.
>
> BTF doc says that BTF_DECL_TAGs should follow a target type but this
> is not enforced and tests don't honor this restriction.
> This commit uses hash table to map types to the list of decl tags.
>
> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
> ---
>  tools/lib/bpf/btf_dump.c | 143 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 142 insertions(+), 1 deletion(-)
>
> diff --git a/tools/lib/bpf/btf_dump.c b/tools/lib/bpf/btf_dump.c
> index bf0cc0e986dd..9bfe2a4ae277 100644
> --- a/tools/lib/bpf/btf_dump.c
> +++ b/tools/lib/bpf/btf_dump.c
> @@ -75,6 +75,15 @@ struct btf_dump_data {
>         bool is_array_char;
>  };
>
> +/*
> + * An array of ids of BTF_DECL_TAG objects associated with a specific type.
> + */
> +struct decl_tag_array {
> +       __u16 cnt;
> +       __u16 cap;
> +       __u32 tag_ids[0];
> +};
> +
>  struct btf_dump {
>         const struct btf *btf;
>         btf_dump_printf_fn_t printf_fn;
> @@ -111,6 +120,11 @@ struct btf_dump {
>          * name occurrences
>          */
>         struct hashmap *ident_names;
> +       /*
> +        * maps type id to decl_tag_array, assume that relatively small
> +        * fraction of types has btf_decl_tag's attached
> +        */
> +       struct hashmap *decl_tags;
>         /*
>          * data for typed display; allocated if needed.
>          */
> @@ -127,6 +141,26 @@ static bool str_equal_fn(const void *a, const void *b, void *ctx)
>         return strcmp(a, b) == 0;
>  }
>
> +static size_t int_hash_fn(const void *key, void *ctx)
> +{
> +       int i;
> +       size_t h = 0;
> +       char *bytes = (char *)key;
> +
> +       for (i = 0; i < 4; ++i)
> +               h = h * 31 + bytes[i];
> +
> +       return h;
> +}

no need, you can just do what btf_dedup_identity_hash_fn() is doing
and pass int/long/size_t as is, hashmap implementation does additional
multiplicative hashing on top to find a bucket

> +
> +static bool int_equal_fn(const void *a, const void *b, void *ctx)
> +{
> +       int *ia = (int *)a;
> +       int *ib = (int *)b;
> +
> +       return *ia == *ib;
> +}

see btf_dedup_equal_fn(), no need for casting, just return a == b;
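
A minimal sketch of what such by-value callbacks could look like, using
the same callback signatures as the quoted `int_hash_fn` and
`int_equal_fn`. The names `id_hash_fn` and `id_equal_fn` are made up for
this example, and it assumes the caller passes the id itself as the key
(cast to a pointer), not the address of a variable holding it:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The key *is* the integer, carried in the pointer value, so hashing is
 * just handing that value back; the map can mix it further if it wants. */
static size_t id_hash_fn(const void *key, void *ctx)
{
	(void)ctx; /* unused */
	return (size_t)(uintptr_t)key;
}

/* Two keys are equal when the integers stored in the pointers match.
 * Nothing is dereferenced. */
static bool id_equal_fn(const void *a, const void *b, void *ctx)
{
	(void)ctx; /* unused */
	return a == b;
}
```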

> +
>  static const char *btf_name_of(const struct btf_dump *d, __u32 name_off)
>  {
>         return btf__name_by_offset(d->btf, name_off);
> @@ -143,6 +177,7 @@ static void btf_dump_printf(const struct btf_dump *d, const char *fmt, ...)
>
>  static int btf_dump_mark_referenced(struct btf_dump *d);
>  static int btf_dump_resize(struct btf_dump *d);
> +static int btf_dump_assign_decl_tags(struct btf_dump *d);
>
>  struct btf_dump *btf_dump__new(const struct btf *btf,
>                                btf_dump_printf_fn_t printf_fn,
> @@ -179,11 +214,24 @@ struct btf_dump *btf_dump__new(const struct btf *btf,
>                 d->ident_names = NULL;
>                 goto err;
>         }
> +       d->decl_tags = hashmap__new(int_hash_fn, int_equal_fn, NULL);
> +       if (IS_ERR(d->decl_tags)) {
> +               err = PTR_ERR(d->decl_tags);
> +               d->decl_tags = NULL;
> +               goto err;
> +       }
>
>         err = btf_dump_resize(d);
>         if (err)
>                 goto err;
>
> +       err = btf_dump_assign_decl_tags(d);
> +       if (err)
> +               goto err;
> +
> +       if (err)
> +               goto err;
> +

I like the bullet-proof error checking, but checking just once should
be enough ;)

>         return d;
>  err:
>         btf_dump__free(d);
> @@ -232,7 +280,8 @@ static void btf_dump_free_names(struct hashmap *map)
>
>  void btf_dump__free(struct btf_dump *d)
>  {
> -       int i;
> +       int i, bkt;
> +       struct hashmap_entry *cur;
>
>         if (IS_ERR_OR_NULL(d))
>                 return;
> @@ -250,6 +299,9 @@ void btf_dump__free(struct btf_dump *d)
>         free(d->decl_stack);
>         btf_dump_free_names(d->type_names);
>         btf_dump_free_names(d->ident_names);
> +       hashmap__for_each_entry(d->decl_tags, cur, bkt)
> +               free(cur->value);
> +       hashmap__free(d->decl_tags);
>
>         free(d);
>  }
> @@ -373,6 +425,77 @@ static int btf_dump_mark_referenced(struct btf_dump *d)
>         return 0;
>  }
>
> +static struct decl_tag_array *btf_dump_find_decl_tags(struct btf_dump *d, __u32 id)

do we really need this wrapper?


> +{
> +       struct decl_tag_array *decl_tags = NULL;
> +
> +       hashmap__find(d->decl_tags, &id, (void **)&decl_tags);

this &id also made me realize that this is all broken: you are
remembering random pointers in the hashmap (they point onto the stack,
which gets reused once this function returns; but the hashmap remembers
them, so on the next lookup or update we are going to be reading random
values in int_equal_fn?)

you should be passing (void *)(long)id instead (and better yet let's
refactor hashmap API as I suggested in previous patch)

either I'm missing something, or this works by accident, which
suggests that tests could be improved maybe?..
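
The general hazard can be shown without undefined behaviour by reusing a
live variable instead of a dead stack slot. This is only a sketch of the
failure mode with a toy map (`ptr_map` and all helpers are hypothetical);
whether the patch actually hits it depends on where its key pointers
point:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CAP 8

/* Toy map that stores whatever key pointer it is given. */
struct ptr_map {
	const void *keys[TOY_CAP];
	int vals[TOY_CAP];
	int cnt;
};

static void ptr_map_put(struct ptr_map *m, const void *key, int val)
{
	if (m->cnt < TOY_CAP) {
		m->keys[m->cnt] = key;
		m->vals[m->cnt] = val;
		m->cnt++;
	}
}

/* Equality by dereference: keys are addresses of 32-bit ids. */
static bool deref_equal(const void *a, const void *b)
{
	return *(const uint32_t *)a == *(const uint32_t *)b;
}

/* Equality by value: keys are the ids themselves, cast to pointers. */
static bool value_equal(const void *a, const void *b)
{
	return a == b;
}

static int ptr_map_get(const struct ptr_map *m, const void *key,
		       bool (*eq)(const void *, const void *), int missing)
{
	for (int i = 0; i < m->cnt; i++)
		if (eq(m->keys[i], key))
			return m->vals[i];
	return missing;
}

/* Can key 1 still be found after the variable that held it was reused? */
static bool by_address_survives_reuse(void)
{
	struct ptr_map m = { .cnt = 0 };
	uint32_t id, probe = 1;

	id = 1;
	ptr_map_put(&m, &id, 100);
	id = 2; /* the first stored key now reads as 2 as well */
	ptr_map_put(&m, &id, 200);

	return ptr_map_get(&m, &probe, deref_equal, -1) == 100;
}

static bool by_value_survives_reuse(void)
{
	struct ptr_map m = { .cnt = 0 };
	uint32_t id;

	id = 1;
	ptr_map_put(&m, (void *)(uintptr_t)id, 100);
	id = 2; /* no effect on the key already stored */
	ptr_map_put(&m, (void *)(uintptr_t)id, 200);

	return ptr_map_get(&m, (void *)(uintptr_t)1, value_equal, -1) == 100;
}
```

With by-address keys the entry for id 1 is lost as soon as the variable
is overwritten; with by-value keys it is still found.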

> +
> +       return decl_tags;
> +}
> +
> +static struct decl_tag_array *realloc_decl_tags(struct decl_tag_array *tags, __u16 new_cap)
> +{
> +       size_t new_size = sizeof(struct decl_tag_array) + new_cap * sizeof(__u32);
> +       struct decl_tag_array *new_tags = (tags
> +                                          ? realloc(tags, new_size)
> +                                          : calloc(1, new_size));

realloc allocates if passed NULL, so no need for calloc, assuming
proper initialization

but let's use libbpf_reallocarray(), we'll waste a few bytes on size_t,
but given we expect few tags, it's not a big deal

> +
> +       if (!new_tags)
> +               return NULL;
> +
> +       new_tags->cap = new_cap;
> +
> +       return new_tags;
> +}
> +
> +/*
> + * Scans all BTF objects looking for BTF_KIND_DECL_TAG entries.
> + * The id's of the entries are stored in the `btf_dump.decl_tags` table,
> + * grouped by a target type.
> + */
> +static int btf_dump_assign_decl_tags(struct btf_dump *d)
> +{
> +       int err;
> +       __u32 id;
> +       __u32 n = btf__type_cnt(d->btf);
> +       __u32 new_capacity;
> +       const struct btf_type *t;
> +       struct decl_tag_array *decl_tags;

few nits: generally, for new code try to do reverse Christmas tree
style, where widest line is at the top, shortest at the bottom

but also here you can have id, new_capacity, and n on same line

and s/new_capacity/new_cap/

> +
> +       for (id = 0; id < n; id++) {

0 is VOID, we never really need to process it, just start with id = 1

> +               t = btf__type_by_id(d->btf, id);
> +
> +               if (btf_kind(t) != BTF_KIND_DECL_TAG)
> +                       continue;

if (!btf_is_decl_tag(t))
    continue;

> +
> +               decl_tags = btf_dump_find_decl_tags(d, t->type);
> +               if (!decl_tags) {
> +                       decl_tags = realloc_decl_tags(NULL, 1);
> +                       if (!decl_tags)
> +                               return -ENOMEM;
> +                       err = hashmap__insert(d->decl_tags, &t->type, decl_tags,
> +                                             HASHMAP_SET, NULL, NULL);
> +                       if (err)
> +                               return err;
> +               } else if (decl_tags->cnt == decl_tags->cap) {
> +                       new_capacity = decl_tags->cap * 2;
> +                       if (new_capacity > 0xffff)
> +                               return -ERANGE;
> +                       decl_tags = realloc_decl_tags(decl_tags, new_capacity);
> +                       if (!decl_tags)
> +                               return -ENOMEM;
> +                       decl_tags->cap = new_capacity;
> +                       err = hashmap__update(d->decl_tags, &t->type, decl_tags, NULL, NULL);
> +                       if (err)
> +                               return err;
> +               }

really, let's just use libbpf_reallocarray? I was going to suggest
libbpf_ensure_mem, but it allocates at least 16 elements, which seems
like an overkill. But also given we don't expect a lot of tags per
type, realloc()'ing with + 1 (no * 2 strategy) seems reasonable.
Modern allocators either way use differently sized buckets, so when
realloc size increment is small, allocator basically will do nothing.
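
A rough sketch of the "+ 1" growth strategy, assuming a simplified
array without a `cap` field. `tag_array` and `tag_array_append` are
hypothetical names, and plain realloc() is used as a stand-in because
libbpf_reallocarray() is internal to libbpf:

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for the patch's decl_tag_array: no `cap` field,
 * because the array always grows by exactly one element. */
struct tag_array {
	uint16_t cnt;
	uint32_t tag_ids[];
};

/* Append `id`, growing the allocation by one element.
 * `*tags` may be NULL: realloc(NULL, size) behaves like malloc(size).
 * Returns 0 on success, -1 on overflow or allocation failure; on failure
 * the old array is left untouched. */
static int tag_array_append(struct tag_array **tags, uint32_t id)
{
	uint16_t cnt = *tags ? (*tags)->cnt : 0;
	struct tag_array *new_tags;

	if (cnt == UINT16_MAX)
		return -1;

	new_tags = realloc(*tags, sizeof(*new_tags) +
				  ((size_t)cnt + 1) * sizeof(uint32_t));
	if (!new_tags)
		return -1;

	new_tags->cnt = cnt;	/* initializes cnt on the first allocation */
	new_tags->tag_ids[new_tags->cnt++] = id;
	*tags = new_tags;
	return 0;
}
```

In the real code the pointer stored in the hashmap would still have to
be refreshed after each call, since realloc() may move the array.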

> +               decl_tags->tag_ids[decl_tags->cnt++] = id;
> +       }
> +
> +       return 0;
> +}
> +
>  static int btf_dump_add_emit_queue_id(struct btf_dump *d, __u32 id)
>  {
>         __u32 *new_queue;
> @@ -899,6 +1022,23 @@ static void btf_dump_emit_bit_padding(const struct btf_dump *d,
>         }
>  }
>
> +static void btf_dump_emit_decl_tags(struct btf_dump *d, __u32 id)
> +{
> +       struct decl_tag_array *decl_tags = btf_dump_find_decl_tags(d, id);
> +       struct btf_type *decl_tag_t;
> +       const char *decl_tag_text;
> +       __u32 i;
> +
> +       if (!decl_tags)
> +               return;
> +
> +       for (i = 0; i < decl_tags->cnt; ++i) {
> +               decl_tag_t = btf_type_by_id(d->btf, decl_tags->tag_ids[i]);
> +               decl_tag_text = btf__name_by_offset(d->btf, decl_tag_t->name_off);
> +               btf_dump_printf(d, " __attribute__((btf_decl_tag(\"%s\")))", decl_tag_text);
> +       }
> +}

I'm wondering if we should anticipate that some compilers won't know
about btf_decl_tag attribute? It seems a bit off for btf_dump to worry
about this, but if we don't do something like:

#if __has_attribute(btf_decl_tag)
#define __btf_decl_tag(x) __attribute__((btf_decl_tag(#x)))
#else
#define __btf_decl_tag(x)
#endif

.
.
.

struct my_struct {
     ...
} __btf_decl_tag(awesomeness);


it will be hard for users to use resulting vmlinux.h with slightly older Clang?
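
A slightly fuller sketch of that idea follows. The `__has_attribute`
fallback stub is an addition for compilers that lack the builtin
altogether, not something proposed in this thread, and `my_struct_sum`
is a hypothetical helper that only exists to show the type is usable
either way:

```c
/* Compilers that lack __has_attribute get a stub that always answers "no". */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

#if __has_attribute(btf_decl_tag)
#define __btf_decl_tag(x) __attribute__((btf_decl_tag(#x)))
#else
#define __btf_decl_tag(x)
#endif

/* Hypothetical type: tagged where supported, a plain struct elsewhere. */
struct my_struct {
	int a;
	int b;
} __btf_decl_tag(awesomeness);

static int my_struct_sum(const struct my_struct *s)
{
	return s->a + s->b;
}
```

The tag does not change the layout of the struct, so code using it
behaves the same whether or not the attribute is recognized.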

> +
>  static void btf_dump_emit_struct_fwd(struct btf_dump *d, __u32 id,
>                                      const struct btf_type *t)
>  {
> @@ -964,6 +1104,7 @@ static void btf_dump_emit_struct_def(struct btf_dump *d,
>         btf_dump_printf(d, "%s}", pfx(lvl));
>         if (packed)
>                 btf_dump_printf(d, " __attribute__((packed))");
> +       btf_dump_emit_decl_tags(d, id);
>  }
>
>  static const char *missing_base_types[][2] = {
> --
> 2.34.1
>


* Re: [RFC bpf-next 09/12] kbuild: Header guards for types from include/uapi/*.h in kernel BTF
  2022-10-27 18:55     ` Yonghong Song
@ 2022-10-27 22:44       ` Yonghong Song
  2022-10-28  0:00         ` Eduard Zingerman
  0 siblings, 1 reply; 46+ messages in thread
From: Yonghong Song @ 2022-10-27 22:44 UTC (permalink / raw)
  To: Eduard Zingerman, bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, arnaldo.melo



On 10/27/22 11:55 AM, Yonghong Song wrote:
> 
> 
> On 10/27/22 11:43 AM, Yonghong Song wrote:
>>
>>
>> On 10/25/22 3:27 PM, Eduard Zingerman wrote:
>>> Use pahole --header_guards_db flag to enable encoding of header guard
>>> information in kernel BTF. The actual correspondence between header
>>> file and guard string is computed by the scripts/infer_header_guards.pl.
>>>
>>> The encoded header guard information could be used to restore the
>>> original guards in the vmlinux.h, e.g.:
>>>
>>>      include/uapi/linux/tcp.h:
>>>
>>>        #ifndef _UAPI_LINUX_TCP_H
>>>        #define _UAPI_LINUX_TCP_H
>>>        ...
>>>        union tcp_word_hdr {
>>>          struct tcphdr hdr;
>>>          __be32        words[5];
>>>        };
>>>        ...
>>>        #endif /* _UAPI_LINUX_TCP_H */
>>>
>>>      vmlinux.h:
>>>
>>>        ...
>>>        #ifndef _UAPI_LINUX_TCP_H
>>>
>>>        union tcp_word_hdr {
>>>          struct tcphdr hdr;
>>>          __be32 words[5];
>>>        };
>>>
>>>        #endif /* _UAPI_LINUX_TCP_H */
>>>        ...
>>>
>>> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
>>> ---
>>>   scripts/link-vmlinux.sh | 13 ++++++++++++-
>>>   1 file changed, 12 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
>>> index 918470d768e9..f57f621eda1f 100755
>>> --- a/scripts/link-vmlinux.sh
>>> +++ b/scripts/link-vmlinux.sh
>>> @@ -110,6 +110,7 @@ vmlinux_link()
>>>   gen_btf()
>>>   {
>>>       local pahole_ver
>>> +    local extra_flags
>>>       if ! [ -x "$(command -v ${PAHOLE})" ]; then
>>>           echo >&2 "BTF: ${1}: pahole (${PAHOLE}) is not available"
>>> @@ -122,10 +123,20 @@ gen_btf()
>>>           return 1
>>>       fi
>>> +    if [ "${pahole_ver}" -ge "124" ]; then
>>> +        scripts/infer_header_guards.pl \
>>
>> We should have full path like
>>      ${srctree}/scripts/infer_header_guards.pl
>> so it can work if build directory is different from source directory.
> 
> handling arguments for infer_header_guards.pl should also take
> care of full file path.
> 
> + /home/yhs/work/bpf-next/scripts/infer_header_guards.pl include/uapi include/generated/uapi arch/x86/include/uapi arch/x86/include/generated/uapi
> + return 1

Also, please pay attention to bpf selftest result. I see quite a
few selftest failures with this patch set.

>>
>>> +            include/uapi \
>>> +            include/generated/uapi \
>>> +            arch/${SRCARCH}/include/uapi \
>>> +            arch/${SRCARCH}/include/generated/uapi \
>>> +            > .btf.uapi_header_guards || return 1;
>>> +        extra_flags="--header_guards_db .btf.uapi_header_guards"
>>> +    fi
>>> +
>>>       vmlinux_link ${1}
>>>       info "BTF" ${2}
>>> -    LLVM_OBJCOPY="${OBJCOPY}" ${PAHOLE} -J ${PAHOLE_FLAGS} ${1}
>>> +    LLVM_OBJCOPY="${OBJCOPY}" ${PAHOLE} -J ${PAHOLE_FLAGS} ${extra_flags} ${1}
>>>       # Create ${2} which contains just .BTF section but no symbols. Add
>>>       # SHF_ALLOC because .BTF will be part of the vmlinux image. --strip-all


* Re: [RFC bpf-next 05/12] libbpf: Header guards for selected data structures in vmlinux.h
  2022-10-25 22:27 ` [RFC bpf-next 05/12] libbpf: Header guards for selected data structures in vmlinux.h Eduard Zingerman
@ 2022-10-27 22:44   ` Andrii Nakryiko
  0 siblings, 0 replies; 46+ messages in thread
From: Andrii Nakryiko @ 2022-10-27 22:44 UTC (permalink / raw)
  To: Eduard Zingerman; +Cc: bpf, ast, andrii, daniel, kernel-team, yhs, arnaldo.melo

On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> The goal of the patch is to allow usage of header files from
> `include/uapi` alongside with `vmlinux.h`. E.g. as follows:
>
>   #include <uapi/linux/tcp.h>
>   #include "vmlinux.h"
>
> This goal is achieved by adding #ifndef / #endif guards in vmlinux.h
> around definitions that originate from the `include/uapi` headers. The
> guards emitted match the guards used in the original headers.
> E.g. as follows:
>
> include/uapi/linux/tcp.h:
>
>   #ifndef _UAPI_LINUX_TCP_H
>   #define _UAPI_LINUX_TCP_H
>   ...
>   union tcp_word_hdr {
>         struct tcphdr hdr;
>         __be32        words[5];
>   };
>   ...
>   #endif /* _UAPI_LINUX_TCP_H */
>
> vmlinux.h:
>
>   ...
>   #ifndef _UAPI_LINUX_TCP_H
>
>   union tcp_word_hdr {
>         struct tcphdr hdr;
>         __be32 words[5];
>   };
>
>   #endif /* _UAPI_LINUX_TCP_H */
>   ...
>
> The problem of identifying data structures from uapi and selecting
> proper guard names is delegated to pahole. When configured pahole
> generates fake `BTF_DECL_TAG` records with header guards information.
> The fake tag is distinguished from a real tag by a prefix
> "header_guard:" in its value. These tags could be present for unions,
> structures, enums and typedefs, e.g.:
>
> [24139] STRUCT 'tcphdr' size=20 vlen=17
>   ...
> [24296] DECL_TAG 'header_guard:_UAPI_LINUX_TCP_H' type_id=24139 ...
>
> This patch adds an option `emit_header_guards` to `struct btf_dump_opts`.
> When this option is present the function `btf_dump__dump_type` emits
> header guards for top-level declarations. The header guards are
> identified by inspecting fake `BTF_DECL_TAG` records described above.

This looks like a completely arbitrary convention that libbpf has no
business knowing or caring about. I think bpftool should be emitting
these guards when generating vmlinux.h. Let's solve this somehow
differently.
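
For reference, the "header_guard:" convention from the quoted commit
message boils down to a prefix check on the tag value, wherever that
check ends up living. A minimal sketch, with hypothetical helper names
(`header_guard_from_tag`, `emit_guarded`) that are not part of libbpf
or bpftool:

```c
#include <stdio.h>
#include <string.h>

#define HEADER_GUARD_PREFIX "header_guard:"

/* Return the guard name if `tag` carries the "header_guard:" prefix
 * followed by a non-empty name, else NULL. */
static const char *header_guard_from_tag(const char *tag)
{
	size_t plen = strlen(HEADER_GUARD_PREFIX);

	if (strncmp(tag, HEADER_GUARD_PREFIX, plen) != 0 || tag[plen] == '\0')
		return NULL;
	return tag + plen;
}

/* Write `def` (an already formatted type definition) into `buf`, wrapped
 * in #ifndef/#endif when `tag` names a guard. Returns snprintf()'s result. */
static int emit_guarded(char *buf, size_t buf_sz, const char *tag,
			const char *def)
{
	const char *guard = header_guard_from_tag(tag);

	if (!guard)
		return snprintf(buf, buf_sz, "%s\n", def);
	return snprintf(buf, buf_sz, "#ifndef %s\n\n%s\n\n#endif /* %s */\n",
			guard, def, guard);
}
```

A tag such as 'header_guard:_UAPI_LINUX_TCP_H' yields the guard
_UAPI_LINUX_TCP_H; any other decl tag leaves the definition unwrapped.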

>
> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
> ---
>  tools/lib/bpf/btf.h      |  7 +++-
>  tools/lib/bpf/btf_dump.c | 89 +++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 94 insertions(+), 2 deletions(-)
>

[...]


* Re: [RFC bpf-next 08/12] kbuild: Script to infer header guard values for uapi headers
  2022-10-25 22:27 ` [RFC bpf-next 08/12] kbuild: Script to infer header guard values for uapi headers Eduard Zingerman
@ 2022-10-27 22:51   ` Andrii Nakryiko
  0 siblings, 0 replies; 46+ messages in thread
From: Andrii Nakryiko @ 2022-10-27 22:51 UTC (permalink / raw)
  To: Eduard Zingerman; +Cc: bpf, ast, andrii, daniel, kernel-team, yhs, arnaldo.melo

On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> The script infers header guard defines in headers from
> include/uapi/**/*.h . E.g. header guard for the
> `include/uapi/linux/tcp.h` is `_UAPI_LINUX_TCP_H`:
>
>     include/uapi/linux/tcp.h:
>
>       #ifndef _UAPI_LINUX_TCP_H
>       #define _UAPI_LINUX_TCP_H
>       ...
>       union tcp_word_hdr {
>             struct tcphdr hdr;
>             __be32        words[5];
>       };
>       ...
>       #endif /* _UAPI_LINUX_TCP_H */
>
> The output of the script could be used as an input to pahole's
> `--header_guards_db` parameter. This information is necessary to
> repeat the same header guards in the `vmlinux.h` generated from BTF.
>
> It is not possible to infer the guard names from header file names
> alone, the file content has to be analyzed. The following heuristic is
> used to infer guard for a specific file:
> - All pairs `#ifndef <candidate>` / `#define <candidate>` are collected;
> - If a unique candidate matches the regex `${headername}.*_H(EADER)?`, it
>   is selected;
> - If a unique candidate matches the regex `_H(EADER)?_`, it is selected;
> - If a unique candidate matches the regex `_H(EADER)?$`, it is selected.
>
> There is also a small list of headers that can't be caught by the
> rules above, 15 in total. These headers and corresponding guard values
> are listed in the `%OVERRIDES` hash table.
>

Instead of expecting naming pattern, why can't we just expect

/* some comments here */

#ifndef XXX
#define XXX
....
#endif

and extract XXX from such a pattern?

The harder part is skipping comments (but awk might help do this
easier), or we can just ignore all the lines before the first #ifndef.

WDYT?
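
A sketch of the simpler variant (ignore everything before the first
matching pair), written in C purely for illustration; the actual script
in the series is Perl. `extract_header_guard` and its helpers are
hypothetical names. The sketch does not parse comments, so an `#ifndef`
/ `#define` pair inside a block comment would fool it:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Skip spaces and tabs, but not newlines. */
static const char *skip_blanks(const char *p)
{
	while (*p == ' ' || *p == '\t')
		p++;
	return p;
}

/* Return a pointer to the start of the next line, or to the final NUL. */
static const char *next_line(const char *p)
{
	while (*p && *p != '\n')
		p++;
	return *p ? p + 1 : p;
}

/* If the line at `p` is "#<directive> <identifier>", return a pointer to
 * the identifier and store its length in *len; otherwise return NULL. */
static const char *parse_directive(const char *p, const char *directive,
				   size_t *len)
{
	size_t dlen = strlen(directive);
	const char *id;

	p = skip_blanks(p);
	if (*p != '#')
		return NULL;
	p = skip_blanks(p + 1);
	if (strncmp(p, directive, dlen) != 0)
		return NULL;
	p += dlen;
	if (*p != ' ' && *p != '\t')
		return NULL;
	id = p = skip_blanks(p);
	while (isalnum((unsigned char)*p) || *p == '_')
		p++;
	*len = (size_t)(p - id);
	return *len ? id : NULL;
}

/* Copy the name X of the first "#ifndef X" line that is immediately
 * followed by "#define X" into `out`. Returns 0 on success, -1 if no such
 * pair exists or `out` is too small. */
static int extract_header_guard(const char *text, char *out, size_t out_sz)
{
	const char *line, *ifndef_id, *define_id;
	size_t ifndef_len, define_len;

	for (line = text; *line; line = next_line(line)) {
		ifndef_id = parse_directive(line, "ifndef", &ifndef_len);
		if (!ifndef_id)
			continue;	/* comments, blank lines, anything else */

		define_id = parse_directive(next_line(line), "define",
					    &define_len);
		if (!define_id || define_len != ifndef_len ||
		    strncmp(ifndef_id, define_id, ifndef_len) != 0)
			continue;
		if (ifndef_len + 1 > out_sz)
			return -1;
		memcpy(out, ifndef_id, ifndef_len);
		out[ifndef_len] = '\0';
		return 0;
	}
	return -1;
}
```

Headers whose first pair is not the include guard would still need an
override list, similar in spirit to the `%OVERRIDES` table in the patch.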

> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
> ---
>  scripts/infer_header_guards.pl | 191 +++++++++++++++++++++++++++++++++
>  1 file changed, 191 insertions(+)
>  create mode 100755 scripts/infer_header_guards.pl
>

[...]


* Re: [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h
  2022-10-25 22:27 [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h Eduard Zingerman
                   ` (13 preceding siblings ...)
  2022-10-26 11:10 ` Alan Maguire
@ 2022-10-27 23:14 ` Andrii Nakryiko
  2022-10-28  1:33   ` Yonghong Song
  14 siblings, 1 reply; 46+ messages in thread
From: Andrii Nakryiko @ 2022-10-27 23:14 UTC (permalink / raw)
  To: Eduard Zingerman; +Cc: bpf, ast, andrii, daniel, kernel-team, yhs, arnaldo.melo

On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> Hi BPF community,
>
> AFAIK there is a long standing feature request to use kernel headers
> alongside `vmlinux.h` generated by `bpftool`. For example significant
> effort was put to add an attribute `bpf_dominating_decl` (see [1]) to
> clang, unfortunately this effort was stuck due to concerns regarding C
> language semantics.
>

Maybe we should make another attempt to implement bpf_dominating_decl?
That seems like a more elegant solution than any other implemented or
discussed alternative. Yonghong, WDYT?

BTW, I suggest splitting libbpf btf_dedup and btf_dump changes into a
separate series and sending them as non-RFC sooner. Those improvements
are independent of all the header guards stuff, let's get them landed
sooner.

> After some discussion with Alexei and Yonghong I'd like to request
> your comments regarding a somewhat brittle and partial solution to
> this issue that relies on adding `#ifndef FOO_H ... #endif` guards in
> the generated `vmlinux.h`.
>

[...]

> Eduard Zingerman (12):
>   libbpf: Deduplicate unambigous standalone forward declarations
>   selftests/bpf: Tests for standalone forward BTF declarations
>     deduplication
>   libbpf: Support for BTF_DECL_TAG dump in C format
>   selftests/bpf: Tests for BTF_DECL_TAG dump in C format
>   libbpf: Header guards for selected data structures in vmlinux.h
>   selftests/bpf: Tests for header guards printing in BTF dump
>   bpftool: Enable header guards generation
>   kbuild: Script to infer header guard values for uapi headers
>   kbuild: Header guards for types from include/uapi/*.h in kernel BTF
>   selftests/bpf: Script to verify uapi headers usage with vmlinux.h
>   selftests/bpf: Known good uapi headers for test_uapi_headers.py
>   selftests/bpf: script for infer_header_guards.pl testing
>
>  scripts/infer_header_guards.pl                | 191 +++++
>  scripts/link-vmlinux.sh                       |  13 +-
>  tools/bpf/bpftool/btf.c                       |   4 +-
>  tools/lib/bpf/btf.c                           | 178 ++++-
>  tools/lib/bpf/btf.h                           |   7 +-
>  tools/lib/bpf/btf_dump.c                      | 232 +++++-
>  .../selftests/bpf/good_uapi_headers.txt       | 677 ++++++++++++++++++
>  tools/testing/selftests/bpf/prog_tests/btf.c  | 152 ++++
>  .../selftests/bpf/prog_tests/btf_dump.c       |  11 +-
>  .../bpf/progs/btf_dump_test_case_decl_tag.c   |  39 +
>  .../progs/btf_dump_test_case_header_guards.c  |  94 +++
>  .../bpf/test_uapi_header_guards_infer.sh      |  33 +
>  .../selftests/bpf/test_uapi_headers.py        | 197 +++++
>  13 files changed, 1816 insertions(+), 12 deletions(-)
>  create mode 100755 scripts/infer_header_guards.pl
>  create mode 100644 tools/testing/selftests/bpf/good_uapi_headers.txt
>  create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_decl_tag.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_header_guards.c
>  create mode 100755 tools/testing/selftests/bpf/test_uapi_header_guards_infer.sh
>  create mode 100755 tools/testing/selftests/bpf/test_uapi_headers.py
>
> --
> 2.34.1
>


* Re: [RFC bpf-next 09/12] kbuild: Header guards for types from include/uapi/*.h in kernel BTF
  2022-10-27 22:44       ` Yonghong Song
@ 2022-10-28  0:00         ` Eduard Zingerman
  2022-10-28  0:14           ` Mykola Lysenko
  2022-10-28  1:21           ` Yonghong Song
  0 siblings, 2 replies; 46+ messages in thread
From: Eduard Zingerman @ 2022-10-28  0:00 UTC (permalink / raw)
  To: Yonghong Song, bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, arnaldo.melo

On Thu, 2022-10-27 at 15:44 -0700, Yonghong Song wrote:
> 
> On 10/27/22 11:55 AM, Yonghong Song wrote:
> > 
> > 
> > On 10/27/22 11:43 AM, Yonghong Song wrote:
> > > 
> > > 
> > > On 10/25/22 3:27 PM, Eduard Zingerman wrote:
> > > > Use pahole --header_guards_db flag to enable encoding of header guard
> > > > information in kernel BTF. The actual correspondence between header
> > > > file and guard string is computed by the scripts/infer_header_guards.pl.
> > > > 
> > > > The encoded header guard information could be used to restore the
> > > > original guards in the vmlinux.h, e.g.:
> > > > 
> > > >      include/uapi/linux/tcp.h:
> > > > 
> > > >        #ifndef _UAPI_LINUX_TCP_H
> > > >        #define _UAPI_LINUX_TCP_H
> > > >        ...
> > > >        union tcp_word_hdr {
> > > >          struct tcphdr hdr;
> > > >          __be32        words[5];
> > > >        };
> > > >        ...
> > > >        #endif /* _UAPI_LINUX_TCP_H */
> > > > 
> > > >      vmlinux.h:
> > > > 
> > > >        ...
> > > >        #ifndef _UAPI_LINUX_TCP_H
> > > > 
> > > >        union tcp_word_hdr {
> > > >          struct tcphdr hdr;
> > > >          __be32 words[5];
> > > >        };
> > > > 
> > > >        #endif /* _UAPI_LINUX_TCP_H */
> > > >        ...
> > > > 
> > > > Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
> > > > ---
> > > >   scripts/link-vmlinux.sh | 13 ++++++++++++-
> > > >   1 file changed, 12 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> > > > index 918470d768e9..f57f621eda1f 100755
> > > > --- a/scripts/link-vmlinux.sh
> > > > +++ b/scripts/link-vmlinux.sh
> > > > @@ -110,6 +110,7 @@ vmlinux_link()
> > > >   gen_btf()
> > > >   {
> > > >       local pahole_ver
> > > > +    local extra_flags
> > > >       if ! [ -x "$(command -v ${PAHOLE})" ]; then
> > > >           echo >&2 "BTF: ${1}: pahole (${PAHOLE}) is not available"
> > > > @@ -122,10 +123,20 @@ gen_btf()
> > > >           return 1
> > > >       fi
> > > > +    if [ "${pahole_ver}" -ge "124" ]; then
> > > > +        scripts/infer_header_guards.pl \
> > > 
> > > We should have full path like
> > >      ${srctree}/scripts/infer_header_guards.pl
> > > so it can work if build directory is different from source directory.
> > 
> > handling arguments for infer_header_guards.pl should also take
> > care of full file path.
> > 
> > + /home/yhs/work/bpf-next/scripts/infer_header_guards.pl include/uapi 
> > include/generated/uapi arch/x86/include/uapi 
> > arch/x86/include/generated/uapi
> > + return 1
> 
> Also, please pay attention to bpf selftest result. I see quite a
> few selftest failures with this patch set.

Hi Yonghong,

Could you please copy-paste some of the error reports? I just re-ran
the selftests locally, and test_maps, test_verifier, test_progs
and test_progs-no_alu32 all pass.

Thanks,
Eduard

> 
> > > 
> > > > +            include/uapi \
> > > > +            include/generated/uapi \
> > > > +            arch/${SRCARCH}/include/uapi \
> > > > +            arch/${SRCARCH}/include/generated/uapi \
> > > > +            > .btf.uapi_header_guards || return 1;
> > > > +        extra_flags="--header_guards_db .btf.uapi_header_guards"
> > > > +    fi
> > > > +
> > > >       vmlinux_link ${1}
> > > >       info "BTF" ${2}
> > > > -    LLVM_OBJCOPY="${OBJCOPY}" ${PAHOLE} -J ${PAHOLE_FLAGS} ${1}
> > > > +    LLVM_OBJCOPY="${OBJCOPY}" ${PAHOLE} -J ${PAHOLE_FLAGS} 
> > > > ${extra_flags} ${1}
> > > >       # Create ${2} which contains just .BTF section but no symbols. Add
> > > >       # SHF_ALLOC because .BTF will be part of the vmlinux image. 
> > > > --strip-all



* Re: [RFC bpf-next 09/12] kbuild: Header guards for types from include/uapi/*.h in kernel BTF
  2022-10-28  0:00         ` Eduard Zingerman
@ 2022-10-28  0:14           ` Mykola Lysenko
  2022-10-28  1:23             ` Yonghong Song
  2022-10-28  1:21           ` Yonghong Song
  1 sibling, 1 reply; 46+ messages in thread
From: Mykola Lysenko @ 2022-10-28  0:14 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Mykola Lysenko, bpf, Eduard Zingerman, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Kernel Team, Yonghong Song,
	arnaldo.melo

Yonghong,

the build will fail without the pahole changes merged.

> On Oct 27, 2022, at 5:00 PM, Eduard Zingerman <eddyz87@gmail.com> wrote:
> 
> On Thu, 2022-10-27 at 15:44 -0700, Yonghong Song wrote:
>> 
>> On 10/27/22 11:55 AM, Yonghong Song wrote:
>>> 
>>> 
>>> On 10/27/22 11:43 AM, Yonghong Song wrote:
>>>> 
>>>> 
>>>> On 10/25/22 3:27 PM, Eduard Zingerman wrote:
>>>>> Use pahole --header_guards_db flag to enable encoding of header guard
>>>>> information in kernel BTF. The actual correspondence between header
>>>>> file and guard string is computed by the scripts/infer_header_guards.pl.
>>>>> 
>>>>> The encoded header guard information could be used to restore the
>>>>> original guards in the vmlinux.h, e.g.:
>>>>> 
>>>>>      include/uapi/linux/tcp.h:
>>>>> 
>>>>>        #ifndef _UAPI_LINUX_TCP_H
>>>>>        #define _UAPI_LINUX_TCP_H
>>>>>        ...
>>>>>        union tcp_word_hdr {
>>>>>          struct tcphdr hdr;
>>>>>          __be32        words[5];
>>>>>        };
>>>>>        ...
>>>>>        #endif /* _UAPI_LINUX_TCP_H */
>>>>> 
>>>>>      vmlinux.h:
>>>>> 
>>>>>        ...
>>>>>        #ifndef _UAPI_LINUX_TCP_H
>>>>> 
>>>>>        union tcp_word_hdr {
>>>>>          struct tcphdr hdr;
>>>>>          __be32 words[5];
>>>>>        };
>>>>> 
>>>>>        #endif /* _UAPI_LINUX_TCP_H */
>>>>>        ...
>>>>> 
>>>>> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
>>>>> ---
>>>>>   scripts/link-vmlinux.sh | 13 ++++++++++++-
>>>>>   1 file changed, 12 insertions(+), 1 deletion(-)
>>>>> 
>>>>> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
>>>>> index 918470d768e9..f57f621eda1f 100755
>>>>> --- a/scripts/link-vmlinux.sh
>>>>> +++ b/scripts/link-vmlinux.sh
>>>>> @@ -110,6 +110,7 @@ vmlinux_link()
>>>>>   gen_btf()
>>>>>   {
>>>>>       local pahole_ver
>>>>> +    local extra_flags
>>>>>       if ! [ -x "$(command -v ${PAHOLE})" ]; then
>>>>>           echo >&2 "BTF: ${1}: pahole (${PAHOLE}) is not available"
>>>>> @@ -122,10 +123,20 @@ gen_btf()
>>>>>           return 1
>>>>>       fi
>>>>> +    if [ "${pahole_ver}" -ge "124" ]; then
>>>>> +        scripts/infer_header_guards.pl \
>>>> 
>>>> We should have full path like
>>>>      ${srctree}/scripts/infer_header_guards.pl
>>>> so it can work if build directory is different from source directory.
>>> 
>>> handling arguments for infer_header_guards.pl should also take
>>> care of full file path.
>>> 
>>> + /home/yhs/work/bpf-next/scripts/infer_header_guards.pl include/uapi 
>>> include/generated/uapi arch/x86/include/uapi 
>>> arch/x86/include/generated/uapi
>>> + return 1
>> 
>> Also, please pay attention to bpf selftest result. I see quite a
>> few selftest failures with this patch set.
> 
> Hi Yonghong,
> 
> Could you please copy-paste some of the error reports? I just re-run
> the selftests locally and have test_maps, test_verifier, test_progs
> and test_progs-no_alu32 passing.
> 
> Thanks,
> Eduard
> 
>> 
>>>> 
>>>>> +            include/uapi \
>>>>> +            include/generated/uapi \
>>>>> +            arch/${SRCARCH}/include/uapi \
>>>>> +            arch/${SRCARCH}/include/generated/uapi \
>>>>> +            > .btf.uapi_header_guards || return 1;
>>>>> +        extra_flags="--header_guards_db .btf.uapi_header_guards"
>>>>> +    fi
>>>>> +
>>>>>       vmlinux_link ${1}
>>>>>       info "BTF" ${2}
>>>>> -    LLVM_OBJCOPY="${OBJCOPY}" ${PAHOLE} -J ${PAHOLE_FLAGS} ${1}
>>>>> +    LLVM_OBJCOPY="${OBJCOPY}" ${PAHOLE} -J ${PAHOLE_FLAGS} 
>>>>> ${extra_flags} ${1}
>>>>>       # Create ${2} which contains just .BTF section but no symbols. Add
>>>>>       # SHF_ALLOC because .BTF will be part of the vmlinux image. 
>>>>> --strip-all



* Re: [RFC bpf-next 09/12] kbuild: Header guards for types from include/uapi/*.h in kernel BTF
  2022-10-28  0:00         ` Eduard Zingerman
  2022-10-28  0:14           ` Mykola Lysenko
@ 2022-10-28  1:21           ` Yonghong Song
  1 sibling, 0 replies; 46+ messages in thread
From: Yonghong Song @ 2022-10-28  1:21 UTC (permalink / raw)
  To: Eduard Zingerman, bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, arnaldo.melo



On 10/27/22 5:00 PM, Eduard Zingerman wrote:
> On Thu, 2022-10-27 at 15:44 -0700, Yonghong Song wrote:
>>
>> On 10/27/22 11:55 AM, Yonghong Song wrote:
>>>
>>>
>>> On 10/27/22 11:43 AM, Yonghong Song wrote:
>>>>
>>>>
>>>> On 10/25/22 3:27 PM, Eduard Zingerman wrote:
>>>>> Use pahole --header_guards_db flag to enable encoding of header guard
>>>>> information in kernel BTF. The actual correspondence between header
>>>>> file and guard string is computed by the scripts/infer_header_guards.pl.
>>>>>
>>>>> The encoded header guard information could be used to restore the
>>>>> original guards in the vmlinux.h, e.g.:
>>>>>
>>>>>       include/uapi/linux/tcp.h:
>>>>>
>>>>>         #ifndef _UAPI_LINUX_TCP_H
>>>>>         #define _UAPI_LINUX_TCP_H
>>>>>         ...
>>>>>         union tcp_word_hdr {
>>>>>           struct tcphdr hdr;
>>>>>           __be32        words[5];
>>>>>         };
>>>>>         ...
>>>>>         #endif /* _UAPI_LINUX_TCP_H */
>>>>>
>>>>>       vmlinux.h:
>>>>>
>>>>>         ...
>>>>>         #ifndef _UAPI_LINUX_TCP_H
>>>>>
>>>>>         union tcp_word_hdr {
>>>>>           struct tcphdr hdr;
>>>>>           __be32 words[5];
>>>>>         };
>>>>>
>>>>>         #endif /* _UAPI_LINUX_TCP_H */
>>>>>         ...
>>>>>
>>>>> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
>>>>> ---
>>>>>    scripts/link-vmlinux.sh | 13 ++++++++++++-
>>>>>    1 file changed, 12 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
>>>>> index 918470d768e9..f57f621eda1f 100755
>>>>> --- a/scripts/link-vmlinux.sh
>>>>> +++ b/scripts/link-vmlinux.sh
>>>>> @@ -110,6 +110,7 @@ vmlinux_link()
>>>>>    gen_btf()
>>>>>    {
>>>>>        local pahole_ver
>>>>> +    local extra_flags
>>>>>        if ! [ -x "$(command -v ${PAHOLE})" ]; then
>>>>>            echo >&2 "BTF: ${1}: pahole (${PAHOLE}) is not available"
>>>>> @@ -122,10 +123,20 @@ gen_btf()
>>>>>            return 1
>>>>>        fi
>>>>> +    if [ "${pahole_ver}" -ge "124" ]; then
>>>>> +        scripts/infer_header_guards.pl \
>>>>
>>>> We should have full path like
>>>>       ${srctree}/scripts/infer_header_guards.pl
>>>> so it can work if build directory is different from source directory.
>>>
>>> handling arguments for infer_header_guards.pl should also take
>>> care of full file path.
>>>
>>> + /home/yhs/work/bpf-next/scripts/infer_header_guards.pl include/uapi
>>> include/generated/uapi arch/x86/include/uapi
>>> arch/x86/include/generated/uapi
>>> + return 1
>>
>> Also, please pay attention to bpf selftest result. I see quite a
>> few selftest failures with this patch set.
> 
> Hi Yonghong,
> 
> Could you please copy-paste some of the error reports? I just re-run
> the selftests locally and have test_maps, test_verifier, test_progs
> and test_progs-no_alu32 passing.

Sorry about the noise, it is my fault. My default build is out of the
source tree, with KBUILD_OUTPUT=<path>. Since the current patch set
won't work with that, I built an in-tree vmlinux but forgot to adjust
the selftest build, which still had KBUILD_OUTPUT=<path>, and that
caused some selftest failures. Consistently doing in-tree builds for
both vmlinux and the selftests results in the same success/failure
rate with and without this patch set.

> 
> Thanks,
> Eduard
> 
>>
>>>>
>>>>> +            include/uapi \
>>>>> +            include/generated/uapi \
>>>>> +            arch/${SRCARCH}/include/uapi \
>>>>> +            arch/${SRCARCH}/include/generated/uapi \
>>>>> +            > .btf.uapi_header_guards || return 1;
>>>>> +        extra_flags="--header_guards_db .btf.uapi_header_guards"
>>>>> +    fi
>>>>> +
>>>>>        vmlinux_link ${1}
>>>>>        info "BTF" ${2}
>>>>> -    LLVM_OBJCOPY="${OBJCOPY}" ${PAHOLE} -J ${PAHOLE_FLAGS} ${1}
>>>>> +    LLVM_OBJCOPY="${OBJCOPY}" ${PAHOLE} -J ${PAHOLE_FLAGS}
>>>>> ${extra_flags} ${1}
>>>>>        # Create ${2} which contains just .BTF section but no symbols. Add
>>>>>        # SHF_ALLOC because .BTF will be part of the vmlinux image.
>>>>> --strip-all
> 


* Re: [RFC bpf-next 09/12] kbuild: Header guards for types from include/uapi/*.h in kernel BTF
  2022-10-28  0:14           ` Mykola Lysenko
@ 2022-10-28  1:23             ` Yonghong Song
  0 siblings, 0 replies; 46+ messages in thread
From: Yonghong Song @ 2022-10-28  1:23 UTC (permalink / raw)
  To: Mykola Lysenko
  Cc: bpf, Eduard Zingerman, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Kernel Team, arnaldo.melo



On 10/27/22 5:14 PM, Mykola Lysenko wrote:
> Yonghong,
> 
> build will be failing without merged pahole changes.

Yes, I do have the pahole changes. My failures were due to mixing an
in-tree vmlinux build (KBUILD_OUTPUT= ), needed because the current
patch set doesn't support out-of-tree builds, with an out-of-tree
selftest build (KBUILD_OUTPUT=<path>).
Using the same in-tree build for both fixed the problem.

> 
>> On Oct 27, 2022, at 5:00 PM, Eduard Zingerman <eddyz87@gmail.com> wrote:
>>
>> On Thu, 2022-10-27 at 15:44 -0700, Yonghong Song wrote:
>>>
>>> On 10/27/22 11:55 AM, Yonghong Song wrote:
>>>>
>>>>
>>>> On 10/27/22 11:43 AM, Yonghong Song wrote:
>>>>>
>>>>>
>>>>> On 10/25/22 3:27 PM, Eduard Zingerman wrote:
>>>>>> Use pahole --header_guards_db flag to enable encoding of header guard
>>>>>> information in kernel BTF. The actual correspondence between header
>>>>>> file and guard string is computed by the scripts/infer_header_guards.pl.
>>>>>>
>>>>>> The encoded header guard information could be used to restore the
>>>>>> original guards in the vmlinux.h, e.g.:
>>>>>>
>>>>>>       include/uapi/linux/tcp.h:
>>>>>>
>>>>>>         #ifndef _UAPI_LINUX_TCP_H
>>>>>>         #define _UAPI_LINUX_TCP_H
>>>>>>         ...
>>>>>>         union tcp_word_hdr {
>>>>>>           struct tcphdr hdr;
>>>>>>           __be32        words[5];
>>>>>>         };
>>>>>>         ...
>>>>>>         #endif /* _UAPI_LINUX_TCP_H */
>>>>>>
>>>>>>       vmlinux.h:
>>>>>>
>>>>>>         ...
>>>>>>         #ifndef _UAPI_LINUX_TCP_H
>>>>>>
>>>>>>         union tcp_word_hdr {
>>>>>>           struct tcphdr hdr;
>>>>>>           __be32 words[5];
>>>>>>         };
>>>>>>
>>>>>>         #endif /* _UAPI_LINUX_TCP_H */
>>>>>>         ...
>>>>>>
>>>>>> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
>>>>>> ---
>>>>>>    scripts/link-vmlinux.sh | 13 ++++++++++++-
>>>>>>    1 file changed, 12 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
>>>>>> index 918470d768e9..f57f621eda1f 100755
>>>>>> --- a/scripts/link-vmlinux.sh
>>>>>> +++ b/scripts/link-vmlinux.sh
>>>>>> @@ -110,6 +110,7 @@ vmlinux_link()
>>>>>>    gen_btf()
>>>>>>    {
>>>>>>        local pahole_ver
>>>>>> +    local extra_flags
>>>>>>        if ! [ -x "$(command -v ${PAHOLE})" ]; then
>>>>>>            echo >&2 "BTF: ${1}: pahole (${PAHOLE}) is not available"
>>>>>> @@ -122,10 +123,20 @@ gen_btf()
>>>>>>            return 1
>>>>>>        fi
>>>>>> +    if [ "${pahole_ver}" -ge "124" ]; then
>>>>>> +        scripts/infer_header_guards.pl \
>>>>>
>>>>> We should use the full path, like
>>>>>       ${srctree}/scripts/infer_header_guards.pl
>>>>> so that it works when the build directory differs from the source directory.
>>>>
>>>> The arguments passed to infer_header_guards.pl should also use
>>>> full file paths.
>>>>
>>>> + /home/yhs/work/bpf-next/scripts/infer_header_guards.pl include/uapi include/generated/uapi arch/x86/include/uapi arch/x86/include/generated/uapi
>>>> + return 1
>>>
>>> Also, please pay attention to bpf selftest result. I see quite a
>>> few selftest failures with this patch set.
>>
>> Hi Yonghong,
>>
>> Could you please copy-paste some of the error reports? I just re-ran
>> the selftests locally and have test_maps, test_verifier, test_progs
>> and test_progs-no_alu32 passing.
>>
>> Thanks,
>> Eduard
>>
>>>
>>>>>
>>>>>> +            include/uapi \
>>>>>> +            include/generated/uapi \
>>>>>> +            arch/${SRCARCH}/include/uapi \
>>>>>> +            arch/${SRCARCH}/include/generated/uapi \
>>>>>> +            > .btf.uapi_header_guards || return 1;
>>>>>> +        extra_flags="--header_guards_db .btf.uapi_header_guards"
>>>>>> +    fi
>>>>>> +
>>>>>>        vmlinux_link ${1}
>>>>>>        info "BTF" ${2}
>>>>>> -    LLVM_OBJCOPY="${OBJCOPY}" ${PAHOLE} -J ${PAHOLE_FLAGS} ${1}
>>>>>> +    LLVM_OBJCOPY="${OBJCOPY}" ${PAHOLE} -J ${PAHOLE_FLAGS} ${extra_flags} ${1}
>>>>>>        # Create ${2} which contains just .BTF section but no symbols. Add
>>>>>>        # SHF_ALLOC because .BTF will be part of the vmlinux image. --strip-all
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h
  2022-10-27 23:14 ` Andrii Nakryiko
@ 2022-10-28  1:33   ` Yonghong Song
  2022-10-28 17:13     ` Andrii Nakryiko
  0 siblings, 1 reply; 46+ messages in thread
From: Yonghong Song @ 2022-10-28  1:33 UTC (permalink / raw)
  To: Andrii Nakryiko, Eduard Zingerman
  Cc: bpf, ast, andrii, daniel, kernel-team, yhs, arnaldo.melo



On 10/27/22 4:14 PM, Andrii Nakryiko wrote:
> On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>>
>> Hi BPF community,
>>
>> AFAIK there is a long standing feature request to use kernel headers
>> alongside `vmlinux.h` generated by `bpftool`. For example significant
>> effort was put to add an attribute `bpf_dominating_decl` (see [1]) to
>> clang, unfortunately this effort was stuck due to concerns regarding C
>> language semantics.
>>
> 
> Maybe we should make another attempt to implement bpf_dominating_decl?
> That seems like a more elegant solution than any other implemented or
> discussed alternative. Yonghong, WDYT?

I would say it would be very difficult to get upstream to agree to
bpf_dominating_decl. We have already had lots of discussions, and we
likely won't be able to satisfy Aaron, who wants us to emit adequate
diagnostics, which would involve a lot of additional work. He also
thinks this is too far from the C standard and wants us to implement
it in an llvm/clang tool, which is not what we want.

> 
> BTW, I suggest splitting libbpf btf_dedup and btf_dump changes into a
> separate series and sending them as non-RFC sooner. Those improvements
> are independent of all the header guards stuff, let's get them landed
> sooner.
> 
>> After some discussion with Alexei and Yonghong I'd like to request
>> your comments regarding a somewhat brittle and partial solution to
>> this issue that relies on adding `#ifndef FOO_H ... #endif` guards in
>> the generated `vmlinux.h`.
>>
> 
> [...]
> 
>> Eduard Zingerman (12):
>>    libbpf: Deduplicate unambigous standalone forward declarations
>>    selftests/bpf: Tests for standalone forward BTF declarations
>>      deduplication
>>    libbpf: Support for BTF_DECL_TAG dump in C format
>>    selftests/bpf: Tests for BTF_DECL_TAG dump in C format
>>    libbpf: Header guards for selected data structures in vmlinux.h
>>    selftests/bpf: Tests for header guards printing in BTF dump
>>    bpftool: Enable header guards generation
>>    kbuild: Script to infer header guard values for uapi headers
>>    kbuild: Header guards for types from include/uapi/*.h in kernel BTF
>>    selftests/bpf: Script to verify uapi headers usage with vmlinux.h
>>    selftests/bpf: Known good uapi headers for test_uapi_headers.py
>>    selftests/bpf: script for infer_header_guards.pl testing
>>
>>   scripts/infer_header_guards.pl                | 191 +++++
>>   scripts/link-vmlinux.sh                       |  13 +-
>>   tools/bpf/bpftool/btf.c                       |   4 +-
>>   tools/lib/bpf/btf.c                           | 178 ++++-
>>   tools/lib/bpf/btf.h                           |   7 +-
>>   tools/lib/bpf/btf_dump.c                      | 232 +++++-
>>   .../selftests/bpf/good_uapi_headers.txt       | 677 ++++++++++++++++++
>>   tools/testing/selftests/bpf/prog_tests/btf.c  | 152 ++++
>>   .../selftests/bpf/prog_tests/btf_dump.c       |  11 +-
>>   .../bpf/progs/btf_dump_test_case_decl_tag.c   |  39 +
>>   .../progs/btf_dump_test_case_header_guards.c  |  94 +++
>>   .../bpf/test_uapi_header_guards_infer.sh      |  33 +
>>   .../selftests/bpf/test_uapi_headers.py        | 197 +++++
>>   13 files changed, 1816 insertions(+), 12 deletions(-)
>>   create mode 100755 scripts/infer_header_guards.pl
>>   create mode 100644 tools/testing/selftests/bpf/good_uapi_headers.txt
>>   create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_decl_tag.c
>>   create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_header_guards.c
>>   create mode 100755 tools/testing/selftests/bpf/test_uapi_header_guards_infer.sh
>>   create mode 100755 tools/testing/selftests/bpf/test_uapi_headers.py
>>
>> --
>> 2.34.1
>>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h
  2022-10-28  1:33   ` Yonghong Song
@ 2022-10-28 17:13     ` Andrii Nakryiko
  2022-10-28 18:56       ` Yonghong Song
  0 siblings, 1 reply; 46+ messages in thread
From: Andrii Nakryiko @ 2022-10-28 17:13 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Eduard Zingerman, bpf, ast, andrii, daniel, kernel-team, yhs,
	arnaldo.melo

On Thu, Oct 27, 2022 at 6:33 PM Yonghong Song <yhs@meta.com> wrote:
>
>
>
> On 10/27/22 4:14 PM, Andrii Nakryiko wrote:
> > On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >>
> >> Hi BPF community,
> >>
> >> AFAIK there is a long standing feature request to use kernel headers
> >> alongside `vmlinux.h` generated by `bpftool`. For example significant
> >> effort was put to add an attribute `bpf_dominating_decl` (see [1]) to
> >> clang, unfortunately this effort was stuck due to concerns regarding C
> >> language semantics.
> >>
> >
> > Maybe we should make another attempt to implement bpf_dominating_decl?
> > That seems like a more elegant solution than any other implemented or
> > discussed alternative. Yonghong, WDYT?
>
> I would say it would be very difficult for upstream to agree with
> bpf_dominating_decl. We already have lots of discussions and we
> likely won't be able to satisfy Aaron who wants us to emit
> adequate diagnostics which will involve lots of other work
> and he also thinks this is too far away from C standard and he
> wants us to implement in a llvm/clang tool which is not what
> we want.

Ok, could we change the problem to detecting whether some type is defined?
Would it be possible to have something like

#if !__is_type_defined(struct abc)
struct abc {
};
#endif

I think we talked about this and there were problems with this
approach, but I don't remember details and how insurmountable the
problem is. Having a way to check whether some type is defined would
be very useful even outside of -target bpf parlance, though, so maybe
it's a problem worth attacking?
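
For reference, the guard-based workaround from the cover letter, which a
type-existence check would replace, can be sketched in plain C. This is
only an illustration: `EXAMPLE_ABC_H`, the fields of `struct abc` and
`abc_size()` are made up for the example and do not come from the patch
set.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a uapi header included first (names are hypothetical). */
#ifndef EXAMPLE_ABC_H
#define EXAMPLE_ABC_H
struct abc {
	int x;
};
#endif /* EXAMPLE_ABC_H */

/* Stand-in for the matching region of a generated vmlinux.h. */
#ifndef EXAMPLE_ABC_H
struct abc {		/* skipped: the guard macro is already defined */
	long y;
	long z;
};
#endif /* EXAMPLE_ABC_H */

/* Only the first definition is visible to the compiler. */
static_assert(sizeof(struct abc) == sizeof(int),
	      "guarded duplicate definition must be skipped");

static size_t abc_size(void)
{
	return sizeof(struct abc);
}
```

The guard only works when the second copy knows the macro name used by
the first one, which is why the RFC has to record guard names in BTF.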

>
> >
> > BTW, I suggest splitting libbpf btf_dedup and btf_dump changes into a
> > separate series and sending them as non-RFC sooner. Those improvements
> > are independent of all the header guards stuff, let's get them landed
> > sooner.
> >
> >> After some discussion with Alexei and Yonghong I'd like to request
> >> your comments regarding a somewhat brittle and partial solution to
> >> this issue that relies on adding `#ifndef FOO_H ... #endif` guards in
> >> the generated `vmlinux.h`.
> >>
> >
> > [...]
> >

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h
  2022-10-28 17:13     ` Andrii Nakryiko
@ 2022-10-28 18:56       ` Yonghong Song
  2022-10-28 21:35         ` Andrii Nakryiko
  2022-11-11 21:55         ` Eduard Zingerman
  0 siblings, 2 replies; 46+ messages in thread
From: Yonghong Song @ 2022-10-28 18:56 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Eduard Zingerman, bpf, ast, andrii, daniel, kernel-team, yhs,
	arnaldo.melo



On 10/28/22 10:13 AM, Andrii Nakryiko wrote:
> On Thu, Oct 27, 2022 at 6:33 PM Yonghong Song <yhs@meta.com> wrote:
>>
>>
>>
>> On 10/27/22 4:14 PM, Andrii Nakryiko wrote:
>>> On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>>>>
>>>> Hi BPF community,
>>>>
>>>> AFAIK there is a long standing feature request to use kernel headers
>>>> alongside `vmlinux.h` generated by `bpftool`. For example significant
>>>> effort was put to add an attribute `bpf_dominating_decl` (see [1]) to
>>>> clang, unfortunately this effort was stuck due to concerns regarding C
>>>> language semantics.
>>>>
>>>
>>> Maybe we should make another attempt to implement bpf_dominating_decl?
>>> That seems like a more elegant solution than any other implemented or
>>> discussed alternative. Yonghong, WDYT?
>>
>> I would say it would be very difficult for upstream to agree with
>> bpf_dominating_decl. We already have lots of discussions and we
>> likely won't be able to satisfy Aaron who wants us to emit
>> adequate diagnostics which will involve lots of other work
>> and he also thinks this is too far away from C standard and he
>> wants us to implement in a llvm/clang tool which is not what
>> we want.
> 
> Ok, could we change the problem to detecting if some type is defined.
> Would it be possible to have something like
> 
> #if !__is_type_defined(struct abc)
> struct abc {
> };
> #endif
> 
> I think we talked about this and there were problems with this
> approach, but I don't remember details and how insurmountable the
> problem is. Having a way to check whether some type is defined would
> be very useful even outside of -target bpf parlance, though, so maybe
> it's the problem worth attacking?

Yes, we discussed this before. This will require additional work
in the preprocessor. I just opened a discussion topic on the LLVM Discourse:

https://discourse.llvm.org/t/add-a-type-checking-macro-is-type-defined-type/66268

Let us see whether we can get some upstream agreement or not.

> 
>>
>>>
>>> BTW, I suggest splitting libbpf btf_dedup and btf_dump changes into a
>>> separate series and sending them as non-RFC sooner. Those improvements
>>> are independent of all the header guards stuff, let's get them landed
>>> sooner.
>>>
>>>> After some discussion with Alexei and Yonghong I'd like to request
>>>> your comments regarding a somewhat brittle and partial solution to
>>>> this issue that relies on adding `#ifndef FOO_H ... #endif` guards in
>>>> the generated `vmlinux.h`.
>>>>
>>>
>>> [...]
>>>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h
  2022-10-28 18:56       ` Yonghong Song
@ 2022-10-28 21:35         ` Andrii Nakryiko
  2022-11-01 16:01           ` Alan Maguire
  2022-11-11 21:55         ` Eduard Zingerman
  1 sibling, 1 reply; 46+ messages in thread
From: Andrii Nakryiko @ 2022-10-28 21:35 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Eduard Zingerman, bpf, ast, andrii, daniel, kernel-team, yhs,
	arnaldo.melo

On Fri, Oct 28, 2022 at 11:57 AM Yonghong Song <yhs@meta.com> wrote:
>
>
>
> On 10/28/22 10:13 AM, Andrii Nakryiko wrote:
> > On Thu, Oct 27, 2022 at 6:33 PM Yonghong Song <yhs@meta.com> wrote:
> >>
> >>
> >>
> >> On 10/27/22 4:14 PM, Andrii Nakryiko wrote:
> >>> On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >>>>
> >>>> Hi BPF community,
> >>>>
> >>>> AFAIK there is a long standing feature request to use kernel headers
> >>>> alongside `vmlinux.h` generated by `bpftool`. For example significant
> >>>> effort was put to add an attribute `bpf_dominating_decl` (see [1]) to
> >>>> clang, unfortunately this effort was stuck due to concerns regarding C
> >>>> language semantics.
> >>>>
> >>>
> >>> Maybe we should make another attempt to implement bpf_dominating_decl?
> >>> That seems like a more elegant solution than any other implemented or
> >>> discussed alternative. Yonghong, WDYT?
> >>
> >> I would say it would be very difficult for upstream to agree with
> >> bpf_dominating_decl. We already have lots of discussions and we
> >> likely won't be able to satisfy Aaron who wants us to emit
> >> adequate diagnostics which will involve lots of other work
> >> and he also thinks this is too far away from C standard and he
> >> wants us to implement in a llvm/clang tool which is not what
> >> we want.
> >
> > Ok, could we change the problem to detecting if some type is defined.
> > Would it be possible to have something like
> >
> > #if !__is_type_defined(struct abc)
> > struct abc {
> > };
> > #endif
> >
> > I think we talked about this and there were problems with this
> > approach, but I don't remember details and how insurmountable the
> > problem is. Having a way to check whether some type is defined would
> > be very useful even outside of -target bpf parlance, though, so maybe
> > it's the problem worth attacking?
>
> Yes, we discussed this before. This will need to add additional work
> in preprocessor. I just made a discussion topic in llvm discourse
>
> https://discourse.llvm.org/t/add-a-type-checking-macro-is-type-defined-type/66268
>
> Let us see whether we can get some upstream agreement or not.
>

Thanks for starting the conversation! I'll be following along.

> >
> >>
> >>>
> >>> BTW, I suggest splitting libbpf btf_dedup and btf_dump changes into a
> >>> separate series and sending them as non-RFC sooner. Those improvements
> >>> are independent of all the header guards stuff, let's get them landed
> >>> sooner.
> >>>
> >>>> After some discussion with Alexei and Yonghong I'd like to request
> >>>> your comments regarding a somewhat brittle and partial solution to
> >>>> this issue that relies on adding `#ifndef FOO_H ... #endif` guards in
> >>>> the generated `vmlinux.h`.
> >>>>
> >>>
> >>> [...]
> >>>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 01/12] libbpf: Deduplicate unambigous standalone forward declarations
  2022-10-27 22:07   ` Andrii Nakryiko
@ 2022-10-31  1:00     ` Eduard Zingerman
  2022-10-31 15:49     ` Eduard Zingerman
  1 sibling, 0 replies; 46+ messages in thread
From: Eduard Zingerman @ 2022-10-31  1:00 UTC (permalink / raw)
  To: Andrii Nakryiko, Alan Maguire
  Cc: bpf, ast, andrii, daniel, kernel-team, yhs, arnaldo.melo

On Thu, 2022-10-27 at 15:07 -0700, Andrii Nakryiko wrote:
> On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > Deduplicate forward declarations that don't take part in type graph
> > comparisons if the declaration name is unambiguous. Example:
> > 
> > CU #1:
> > 
> > struct foo;              // standalone forward declaration
> > struct foo *some_global;
> > 
> > CU #2:
> > 
> > struct foo { int x; };
> > struct foo *another_global;
> > 
> > The `struct foo` from CU #1 is not a part of any definition that is
> > compared against another definition while `btf_dedup_struct_types`
> > processes structural types. Thus the BTF after `btf_dedup_struct_types`
> > looks as follows:
> > 
> > [1] STRUCT 'foo' size=4 vlen=1 ...
> > [2] INT 'int' size=4 ...
> > [3] PTR '(anon)' type_id=1
> > [4] FWD 'foo' fwd_kind=struct
> > [5] PTR '(anon)' type_id=4
> > 
> > This commit adds a new pass `btf_dedup_standalone_fwds` that maps
> > such forward declarations to structs or unions with an identical name,
> > provided the name is not ambiguous.
> > 
> > The pass is positioned before `btf_dedup_ref_types` so that types
> > [3] and [5] can be merged into the same type once [1] and [4] are merged.
> > The final result for the example above looks as follows:
> > 
> > [1] STRUCT 'foo' size=4 vlen=1
> >         'x' type_id=2 bits_offset=0
> > [2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
> > [3] PTR '(anon)' type_id=1
> > 
> > For a defconfig kernel with BTF enabled this removes 63 forward
> > declarations. Examples of removed declarations: `pt_regs`, `in6_addr`.
> > The running time of `btf__dedup` function is increased by about 3%.
> > 
> 
> What about modules, can you share stats for module BTFs?
> 
> Also cc Alan as he was looking at BTF dedup improvements for kernel
> module BTF dedup.
> 
> > Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
> > ---
> >  tools/lib/bpf/btf.c | 178 +++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 174 insertions(+), 4 deletions(-)
> > 
> > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> > index d88647da2c7f..c34c68d8e8a0 100644
> > --- a/tools/lib/bpf/btf.c
> > +++ b/tools/lib/bpf/btf.c
> > @@ -2881,6 +2881,7 @@ static int btf_dedup_strings(struct btf_dedup *d);
> >  static int btf_dedup_prim_types(struct btf_dedup *d);
> >  static int btf_dedup_struct_types(struct btf_dedup *d);
> >  static int btf_dedup_ref_types(struct btf_dedup *d);
> > +static int btf_dedup_standalone_fwds(struct btf_dedup *d);
> >  static int btf_dedup_compact_types(struct btf_dedup *d);
> >  static int btf_dedup_remap_types(struct btf_dedup *d);
> > 
> > @@ -2988,15 +2989,16 @@ static int btf_dedup_remap_types(struct btf_dedup *d);
> >   * Algorithm summary
> >   * =================
> >   *
> > - * Algorithm completes its work in 6 separate passes:
> > + * Algorithm completes its work in 7 separate passes:
> >   *
> >   * 1. Strings deduplication.
> >   * 2. Primitive types deduplication (int, enum, fwd).
> >   * 3. Struct/union types deduplication.
> > - * 4. Reference types deduplication (pointers, typedefs, arrays, funcs, func
> > + * 4. Standalone fwd declarations deduplication.
> 
> Let's call this "Resolve unambiguous forward declarations", we don't
> really deduplicate anything. And call the function
> btf_dedup_resolve_fwds()?
> 
> > + * 5. Reference types deduplication (pointers, typedefs, arrays, funcs, func
> >   *    protos, and const/volatile/restrict modifiers).
> > - * 5. Types compaction.
> > - * 6. Types remapping.
> > + * 6. Types compaction.
> > + * 7. Types remapping.
> >   *
> >   * Algorithm determines canonical type descriptor, which is a single
> >   * representative type for each truly unique type. This canonical type is the
> > @@ -3060,6 +3062,11 @@ int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts)
> >                 pr_debug("btf_dedup_struct_types failed:%d\n", err);
> >                 goto done;
> >         }
> > +       err = btf_dedup_standalone_fwds(d);
> > +       if (err < 0) {
> > +               pr_debug("btf_dedup_standalone_fwd failed:%d\n", err);
> > +               goto done;
> > +       }
> >         err = btf_dedup_ref_types(d);
> >         if (err < 0) {
> >                 pr_debug("btf_dedup_ref_types failed:%d\n", err);
> > @@ -4525,6 +4532,169 @@ static int btf_dedup_ref_types(struct btf_dedup *d)
> >         return 0;
> >  }
> > 
> > +/*
> > + * `name_off_map` maps name offsets to type ids (essentially __u32 -> __u32).
> > + *
> > + * The __u32 key/value representations are cast to `void *` before passing
> > + * to `hashmap__*` functions. These pseudo-pointers are never dereferenced.
> > + *
> > + */
> > +static struct hashmap *name_off_map__new(void)
> > +{
> > +       return hashmap__new(btf_dedup_identity_hash_fn,
> > +                           btf_dedup_equal_fn,
> > +                           NULL);
> > +}
> 
> is there a point in name_off_map__new and name_off_map__find wrappers
> except to add one extra function to jump through when reading the
> code? If you look at other uses of hashmaps in this file, we use them
> directly. Let's drop those.
> 
> > +
> > +static int name_off_map__find(struct hashmap *map, __u32 name_off, __u32 *type_id)
> > +{
> > +       /* This has to be sizeof(void *) in order to be passed to hashmap__find */
> > +       void *tmp;
> > +       int found = hashmap__find(map, (void *)(ptrdiff_t)name_off, &tmp);
> 
> but casting everything to (void *) was an error in API design, mea
> culpa. I've been wanting to switch hashmap to use long as key/value
> type for a long while, maybe let's do it now, as we are adding even
> more code that looks weird? It seems like accepting long will make
> hashmap API usage cleaner in most cases. There are not a lot of places
> where we use hashmap APIs in libbpf, but we'll also need to fix up
> bpftool usage, and I believe perf copy/pasted hashmap.h (cc Arnaldo),
> so we'd need to make sure to not break all that. But good thing it's
> all in the same repo and we can convert them at the same time with no
> breakage.
> 
> WDYT?

Well, I made the change; excluding tests it amounts to:
- 15 files changed, 114 insertions(+), 137 deletions(-);
- 45 casts removed;
- 30 casts added.

TBH, it seems like I should just use "u32_as_hash_field" and be done
with it. In any case I'll post this as a part of v1 series for
"libbpf: Resolve unambigous forward declarations".

To account for the case where the map has to store pointers and
pointers are 32 bits wide, I chose to update the map interface to be
"uintptr_t -> uintptr_t". Had it been "u64 -> u64", an additional
temporary variable would be necessary for "old" values, e.g. when
working with hashmap__insert.
(Contrary to what we discussed on Friday.)
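
As a rough illustration of why a pointer-sized integer slot is
convenient here, the sketch below round-trips both a 32-bit id and a
pointer through `uintptr_t`. It is not the hashmap code itself: all
helper names (`u32_to_slot`, `slot_set`, ...) are hypothetical, and it
assumes the usual 32- and 64-bit targets where `uintptr_t` is at least
32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: a 32-bit id always fits in a slot. */
static_assert(sizeof(uintptr_t) >= sizeof(uint32_t),
	      "uintptr_t must be able to hold a 32-bit id");

/* Integer ids need only a widening/narrowing conversion. */
static uintptr_t u32_to_slot(uint32_t v)
{
	return (uintptr_t)v;
}

static uint32_t slot_to_u32(uintptr_t s)
{
	return (uint32_t)s;
}

/* Pointers round-trip through uintptr_t without loss. */
static uintptr_t ptr_to_slot(void *p)
{
	return (uintptr_t)p;
}

static void *slot_to_ptr(uintptr_t s)
{
	return (void *)s;
}

/* Toy one-slot store: replace the value and report the previous one. */
static void slot_set(uintptr_t *slot, uintptr_t value, uintptr_t *old_value)
{
	if (old_value)
		*old_value = *slot;
	*slot = value;
}
```

With a slot type of exactly pointer size, the "old value" reported by
`slot_set()` can be converted straight back to a pointer on both 32-bit
and 64-bit builds.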

> 
> > +       /*
> > +        * __u64 cast is necessary to avoid pointer to integer conversion size warning.
> > +        * It is fine to get rid of this warning as `void *` is used as an integer value.
> > +        */
> > +       if (found)
> > +               *type_id = (__u64)tmp;
> > +       return found;
> > +}
> > +
> > +static int name_off_map__set(struct hashmap *map, __u32 name_off, __u32 type_id)
> > +{
> > +       return hashmap__set(map, (void *)(size_t)name_off, (void *)(size_t)type_id,
> > +                           NULL, NULL);
> > +}
> 
> this function will also be completely unnecessary with longs
> 
> > +
> > +/*
> > + * Collect a `name_off_map` that maps type names to type ids for all
> > + * canonical structs and unions. If the same name is shared by several
> > + * canonical types use a special value 0 to indicate this fact.
> > + */
> > +static int btf_dedup_fill_unique_names_map(struct btf_dedup *d, struct hashmap *names_map)
> > +{
> > +       int i, err = 0;
> > +       __u32 type_id, collision_id;
> > +       __u16 kind;
> > +       struct btf_type *t;
> > +
> > +       for (i = 0; i < d->btf->nr_types; i++) {
> > +               type_id = d->btf->start_id + i;
> > +               t = btf_type_by_id(d->btf, type_id);
> > +               kind = btf_kind(t);
> > +
> > +               if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION)
> > +                       continue;
> 
> let's also do ENUM FWD resolution. ENUM FWD is just ENUM with vlen=0
> 
> > +
> > +               /* Skip non-canonical types */
> > +               if (type_id != d->map[type_id])
> > +                       continue;
> > +
> > +               err = 0;
> > +               if (name_off_map__find(names_map, t->name_off, &collision_id)) {
> > +                       /* Mark non-unique names with 0 */
> > +                       if (collision_id != 0 && collision_id != type_id)
> > +                               err = name_off_map__set(names_map, t->name_off, 0);
> > +               } else {
> > +                       err = name_off_map__set(names_map, t->name_off, type_id);
> > +               }
> 
> err = hashmap__add(..., t->name_off, type_id);
> if (err == -EEXIST) {
>     /* name already seen: mark it as ambiguous and move on */
>     hashmap__set(..., 0);
>     continue;
> }
> 
> see comment for hashmap_insert_strategy in hashmap.h
> 
> > +
> > +               if (err < 0)
> > +                       return err;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > +static int btf_dedup_standalone_fwd(struct btf_dedup *d,
> > +                                   struct hashmap *names_map,
> > +                                   __u32 type_id)
> > +{
> > +       struct btf_type *t = btf_type_by_id(d->btf, type_id);
> > +       __u16 kind = btf_kind(t);
> > +       enum btf_fwd_kind fwd_kind = BTF_INFO_KFLAG(t->info);
> > +
> 
> nit: don't break variables block in two parts, there shouldn't be empty lines
> 
> also please use btf_kflag(t)
> 
> 
> > +       struct btf_type *cand_t;
> > +       __u16 cand_kind;
> > +       __u32 cand_id = 0;
> > +
> > +       if (kind != BTF_KIND_FWD)
> > +               return 0;
> > +
> > +       /* Skip if this FWD already has a mapping */
> > +       if (type_id != d->map[type_id])
> > +               return 0;
> > +
> 
> [...]
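
The add-then-mark-on-collision pattern suggested above can be sketched
in isolation roughly as follows. This is a toy model only: the
toy_names_map type and the toy_* helpers are hypothetical stand-ins
written for this illustration, not the libbpf hashmap API. The sketch
assumes that 0 is never a valid type id, so it can act as the "name is
ambiguous" sentinel, and it uses `continue`-like semantics (return 0
from the per-type helper) on a collision.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy map: name offset -> type id; value 0 marks an ambiguous name. */
#define TOY_MAP_CAP 64

struct toy_names_map {
	unsigned int keys[TOY_MAP_CAP];
	unsigned int vals[TOY_MAP_CAP];
	size_t cnt;
};

/* Add-only insert: fails with -EEXIST if the key is already present. */
static int toy_map_add(struct toy_names_map *m, unsigned int key, unsigned int val)
{
	size_t i;

	for (i = 0; i < m->cnt; i++)
		if (m->keys[i] == key)
			return -EEXIST;
	if (m->cnt == TOY_MAP_CAP)
		return -ENOSPC;
	m->keys[m->cnt] = key;
	m->vals[m->cnt] = val;
	m->cnt++;
	return 0;
}

/* Overwrite the value of a key that must already be present. */
static void toy_map_set(struct toy_names_map *m, unsigned int key, unsigned int val)
{
	size_t i;

	for (i = 0; i < m->cnt; i++) {
		if (m->keys[i] == key) {
			m->vals[i] = val;
			return;
		}
	}
	assert(!"toy_map_set: key must already be present");
}

/* Lookup: returns 0 for both "absent" and "ambiguous". */
static unsigned int toy_map_get(const struct toy_names_map *m, unsigned int key)
{
	size_t i;

	for (i = 0; i < m->cnt; i++)
		if (m->keys[i] == key)
			return m->vals[i];
	return 0;
}

/* Record one canonical struct/union: first sighting wins, a second one marks the name with 0. */
static int toy_record_name(struct toy_names_map *m, unsigned int name_off, unsigned int type_id)
{
	int err = toy_map_add(m, name_off, type_id);

	if (err == -EEXIST) {
		toy_map_set(m, name_off, 0);
		return 0;
	}
	return err;
}
```

With this shape a single failed add replaces the find-then-set pair,
and any name seen twice ends up mapped to 0 regardless of how many
further collisions follow.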



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 01/12] libbpf: Deduplicate unambigous standalone forward declarations
  2022-10-27 22:07   ` Andrii Nakryiko
  2022-10-31  1:00     ` Eduard Zingerman
@ 2022-10-31 15:49     ` Eduard Zingerman
  2022-11-01 17:08       ` Alan Maguire
  1 sibling, 1 reply; 46+ messages in thread
From: Eduard Zingerman @ 2022-10-31 15:49 UTC (permalink / raw)
  To: Andrii Nakryiko, Alan Maguire
  Cc: bpf, ast, andrii, daniel, kernel-team, yhs, arnaldo.melo

On Thu, 2022-10-27 at 15:07 -0700, Andrii Nakryiko wrote:
> On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
[...] 
> > +
> > +/*
> > + * Collect a `name_off_map` that maps type names to type ids for all
> > + * canonical structs and unions. If the same name is shared by several
> > + * canonical types use a special value 0 to indicate this fact.
> > + */
> > +static int btf_dedup_fill_unique_names_map(struct btf_dedup *d, struct hashmap *names_map)
> > +{
> > +       int i, err = 0;
> > +       __u32 type_id, collision_id;
> > +       __u16 kind;
> > +       struct btf_type *t;
> > +
> > +       for (i = 0; i < d->btf->nr_types; i++) {
> > +               type_id = d->btf->start_id + i;
> > +               t = btf_type_by_id(d->btf, type_id);
> > +               kind = btf_kind(t);
> > +
> > +               if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION)
> > +                       continue;
> 
> let's also do ENUM FWD resolution. ENUM FWD is just ENUM with vlen=0

Interestingly this is necessary only for mixed enum / enum64 case.
Forward enum declarations are resolved by bpf/btf.c:btf_dedup_prim_type:

	case BTF_KIND_ENUM:
		h = btf_hash_enum(t);
		for_each_dedup_cand(d, hash_entry, h) {
			cand_id = (__u32)(long)hash_entry->value;
			cand = btf_type_by_id(d->btf, cand_id);
			if (btf_equal_enum(t, cand)) {
				new_id = cand_id;
				break;
			}
			if (btf_compat_enum(t, cand)) {
				if (btf_is_enum_fwd(t)) {
					/* resolve fwd to full enum */
					new_id = cand_id;
					break;
				}
				/* resolve canonical enum fwd to full enum */
				d->map[cand_id] = type_id;
			}
		}
		break;
    // ... similar logic for ENUM64 ...

- btf_hash_enum ignores vlen when hashing;
- btf_compat_enum compares only names and sizes.

So, if forward and main declaration kinds match (either BTF_KIND_ENUM
or BTF_KIND_ENUM64) the forward declaration would be removed. But if
the kinds are different the forward declaration would remain. E.g.:

CU #1:
enum foo;
enum foo *a;

CU #2:
enum foo { x = 0xfffffffff };
enum foo *b;

BTF:
[1] ENUM64 'foo' encoding=UNSIGNED size=8 vlen=1
	'x' val=68719476735ULL
[2] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
[3] PTR '(anon)' type_id=1
[4] ENUM 'foo' encoding=UNSIGNED size=4 vlen=0
[5] PTR '(anon)' type_id=4

BTF_KIND_FWDs are unified during btf_dedup_struct_types but enum
forward declarations are not. So it would be incorrect to add enum
forward declaration unification logic to btf_dedup_resolve_fwds,
because the following case would not be covered:

CU #1:
enum foo;
struct s { enum foo *a; } *a;

CU #2:
enum foo { x = 0xfffffffff };
struct s { enum foo *a; } *b;

Currently STRUCTs 's' are not de-duplicated.

I think that btf_dedup_prim_type should be adjusted to handle this case.
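
To make the mixed-kind case concrete, here is a small self-contained
model of it. The toy_* types and helpers are made up for this sketch
and are not libbpf code; toy_compat_same_kind only mirrors the
"same kind, same name, same size" behaviour described above, and
toy_compat_any_kind is one hypothetical way the check could be relaxed
for forward declarations, not a proposal for the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum toy_kind { TOY_ENUM, TOY_ENUM64 };

/* Toy stand-in for the fields of an enum type that matter here. */
struct toy_enum {
	enum toy_kind kind;
	const char *name;
	unsigned int size;
	unsigned int vlen;	/* 0 means forward declaration */
};

static bool toy_is_enum_fwd(const struct toy_enum *t)
{
	return t->vlen == 0;
}

/* Models the current behaviour: only same-kind types with equal name and size match. */
static bool toy_compat_same_kind(const struct toy_enum *a, const struct toy_enum *b)
{
	assert(a && b);
	return a->kind == b->kind &&
	       strcmp(a->name, b->name) == 0 &&
	       a->size == b->size;
}

/*
 * Hypothetical relaxed check: a forward declaration matches a full
 * definition of either kind as long as the names agree.
 */
static bool toy_compat_any_kind(const struct toy_enum *fwd, const struct toy_enum *full)
{
	assert(fwd && full);
	return toy_is_enum_fwd(fwd) && !toy_is_enum_fwd(full) &&
	       strcmp(fwd->name, full->name) == 0;
}
```

For the example above, `enum foo;` is modelled as
{ TOY_ENUM, "foo", 4, 0 } and the full definition as
{ TOY_ENUM64, "foo", 8, 1 }: the same-kind check rejects the pair,
which is why type [4] survives in the dump.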

[...] 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h
  2022-10-28 21:35         ` Andrii Nakryiko
@ 2022-11-01 16:01           ` Alan Maguire
  2022-11-01 18:35             ` Alexei Starovoitov
  0 siblings, 1 reply; 46+ messages in thread
From: Alan Maguire @ 2022-11-01 16:01 UTC (permalink / raw)
  To: Andrii Nakryiko, Yonghong Song
  Cc: Eduard Zingerman, bpf, ast, andrii, daniel, kernel-team, yhs,
	arnaldo.melo

On 28/10/2022 22:35, Andrii Nakryiko wrote:
> On Fri, Oct 28, 2022 at 11:57 AM Yonghong Song <yhs@meta.com> wrote:
>>
>>
>>
>> On 10/28/22 10:13 AM, Andrii Nakryiko wrote:
>>> On Thu, Oct 27, 2022 at 6:33 PM Yonghong Song <yhs@meta.com> wrote:
>>>>
>>>>
>>>>
>>>> On 10/27/22 4:14 PM, Andrii Nakryiko wrote:
>>>>> On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>>>>>>
>>>>>> Hi BPF community,
>>>>>>
>>>>>> AFAIK there is a long standing feature request to use kernel headers
>>>>>> alongside `vmlinux.h` generated by `bpftool`. For example significant
>>>>>> effort was put to add an attribute `bpf_dominating_decl` (see [1]) to
>>>>>> clang, unfortunately this effort was stuck due to concerns regarding C
>>>>>> language semantics.
>>>>>>
>>>>>
>>>>> Maybe we should make another attempt to implement bpf_dominating_decl?
>>>>> That seems like a more elegant solution than any other implemented or
>>>>> discussed alternative. Yonghong, WDYT?
>>>>
>>>> I would say it would be very difficult for upstream to agree with
>>>> bpf_dominating_decl. We already have lots of discussions and we
>>>> likely won't be able to satisfy Aaron who wants us to emit
>>>> adequate diagnostics which will involve lots of other work
>>>> and he also thinks this is too far away from C standard and he
>>>> wants us to implement in a llvm/clang tool which is not what
>>>> we want.
>>>
>>> Ok, could we change the problem to detecting if some type is defined.
>>> Would it be possible to have something like
>>>
>>> #if !__is_type_defined(struct abc)
>>> struct abc {
>>> };
>>> #endif
>>>
>>> I think we talked about this and there were problems with this
>>> approach, but I don't remember details and how insurmountable the
>>> problem is. Having a way to check whether some type is defined would
>>> be very useful even outside of -target bpf parlance, though, so maybe
>>> it's the problem worth attacking?
>>
>> Yes, we discussed this before. This will need to add additional work
>> in preprocessor. I just made a discussion topic in llvm discourse
>>
>> https://discourse.llvm.org/t/add-a-type-checking-macro-is-type-defined-type/66268
>>
>> Let us see whether we can get some upstream agreement or not.
>>
> 
> Thanks for starting the conversation! I'll be following along.
>


I think this sort of approach assumes that vmlinux.h is included after
any uapi headers, and would guard type definitions with 

#if !type_is_defined(struct foo)
struct foo {

};
#endif

...is that right? My concern is that the vmlinux.h definitions have
the CO-RE attributes. From a BPF perspective, would we like the vmlinux.h
definitions to dominate over UAPI definitions do you think, or does it
matter?

I was wondering if there might be yet another way to crack this;
if we did want the vmlinux.h type definitions to be authoritative
because they have the preserve access index attribute, and because
bpftool knows all vmlinux types, it could use that info to selectively
redefine those type names such that we avoid name clashes when later
including UAPI headers. Something like

#ifdef __VMLINUX_H__
//usual vmlinux.h type definitions
#endif /* __VMLINUX_H__ */

#ifdef __VMLINUX_ALIAS__
#if !defined(timespec64)
#define timespec64 __VMLINUX_ALIAS__timespec64
#endif
// rest of the types define aliases here
#undef __VMLINUX_ALIAS__
#else /* unalias */
#if defined(timespec64)
#undef timespec64
#endif
// rest of types undef aliases here
#endif /* __VMLINUX_ALIAS__ */


Then the consumer does this:

#define __VMLINUX_ALIAS__
#include "vmlinux.h"
// include uapi headers
#include "vmlinux.h"

(the latter include of vmlinux.h is needed to undef all the type aliases)
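
Collapsed into a single translation unit, one possible reading of the
scheme looks roughly like the sketch below. The struct layouts are
invented for illustration and the "includes" are simulated inline, so
this only demonstrates the macro mechanics, not what bpftool would
actually emit.

```c
#include <assert.h>
#include <stddef.h>

/* 1st include of the (hypothetical) vmlinux.h: the usual definition ... */
struct timespec64 {
	long long tv_sec;
	long tv_nsec;
};

/* ... followed by the alias, since __VMLINUX_ALIAS__ was defined by the consumer. */
#define timespec64 __VMLINUX_ALIAS__timespec64

/* Stand-in for a UAPI header: the tag below is rewritten by the macro. */
struct timespec64 {
	long tv_sec;
};

/* 2nd include of vmlinux.h: drop the alias again. */
#undef timespec64

/* Both definitions now coexist as distinct types. */
static_assert(sizeof(struct timespec64) !=
	      sizeof(struct __VMLINUX_ALIAS__timespec64),
	      "vmlinux.h and UAPI copies are separate types");

/* From here on the plain name refers to the vmlinux.h definition. */
static size_t toy_vmlinux_copy_size(void)
{
	return sizeof(struct timespec64);
}

static size_t toy_uapi_copy_size(void)
{
	return sizeof(struct __VMLINUX_ALIAS__timespec64);
}
```

The point of the sketch is that after the final #undef the plain name
resolves to the vmlinux.h type, while anything the UAPI header defined
lives on under the aliased tag.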

I tried hacking up bpftool to support this aliasing scheme and while 
it is kind of hacky it does seem to work, aside from some issues with 
IPPROTO_* definitions - for the enumerated IPPROTO_ values linux/in.h does
this:

enum {
  IPPROTO_IP = 0,               /* Dummy protocol for TCP               */
#define IPPROTO_IP              IPPROTO_IP
  IPPROTO_ICMP = 1,             /* Internet Control Message Protocol    */
#define IPPROTO_ICMP            IPPROTO_ICMP


...so our enum value definitions for IPPROTO_ values clash with the above 
definitions. These could be individually ifdef-guarded if needed though I think.
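
A minimal sketch of what such per-enumerator guards could look like,
assuming the UAPI header has already been seen so that its
self-referential macros are defined. The second enum stands in for
vmlinux.h output; TOY_VMLINUX_ONLY is a made-up enumerator that keeps
the enum non-empty once every guarded value has been skipped.

```c
#include <assert.h>

/* Stand-in for the linux/in.h excerpt above. */
enum {
	IPPROTO_IP = 0,
#define IPPROTO_IP		IPPROTO_IP
	IPPROTO_ICMP = 1,
#define IPPROTO_ICMP		IPPROTO_ICMP
};

/* Stand-in for the vmlinux.h enum, with hypothetical per-value guards. */
enum {
#ifndef IPPROTO_IP
	IPPROTO_IP = 0,
#endif
#ifndef IPPROTO_ICMP
	IPPROTO_ICMP = 1,
#endif
	TOY_VMLINUX_ONLY = 255,	/* illustration only, not a real protocol number */
};

/* The UAPI values stay in effect; nothing was redeclared. */
static_assert(IPPROTO_ICMP == 1, "UAPI enumerator is still visible");

static int toy_icmp_proto(void)
{
	return IPPROTO_ICMP;
}
```

Note that this only works because in.h happens to define a macro per
enumerator; enums without such macros would need a different guard.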

I can send the proof-of-concept patch if it would help, I just wanted to 
check in case that might be a workable path too, since it just requires 
changes to bpftool (and changes to in.h).

Thanks!

Alan

 
>>>
>>>>
>>>>>
>>>>> BTW, I suggest splitting libbpf btf_dedup and btf_dump changes into a
>>>>> separate series and sending them as non-RFC sooner. Those improvements
>>>>> are independent of all the header guards stuff, let's get them landed
>>>>> sooner.
>>>>>
>>>>>> After some discussion with Alexei and Yonghong I'd like to request
>>>>>> your comments regarding a somewhat brittle and partial solution to
>>>>>> this issue that relies on adding `#ifndef FOO_H ... #endif` guards in
>>>>>> the generated `vmlinux.h`.
>>>>>>
>>>>>
>>>>> [...]
>>>>>
>>>>>> Eduard Zingerman (12):
>>>>>>     libbpf: Deduplicate unambigous standalone forward declarations
>>>>>>     selftests/bpf: Tests for standalone forward BTF declarations
>>>>>>       deduplication
>>>>>>     libbpf: Support for BTF_DECL_TAG dump in C format
>>>>>>     selftests/bpf: Tests for BTF_DECL_TAG dump in C format
>>>>>>     libbpf: Header guards for selected data structures in vmlinux.h
>>>>>>     selftests/bpf: Tests for header guards printing in BTF dump
>>>>>>     bpftool: Enable header guards generation
>>>>>>     kbuild: Script to infer header guard values for uapi headers
>>>>>>     kbuild: Header guards for types from include/uapi/*.h in kernel BTF
>>>>>>     selftests/bpf: Script to verify uapi headers usage with vmlinux.h
>>>>>>     selftests/bpf: Known good uapi headers for test_uapi_headers.py
>>>>>>     selftests/bpf: script for infer_header_guards.pl testing
>>>>>>
>>>>>>    scripts/infer_header_guards.pl                | 191 +++++
>>>>>>    scripts/link-vmlinux.sh                       |  13 +-
>>>>>>    tools/bpf/bpftool/btf.c                       |   4 +-
>>>>>>    tools/lib/bpf/btf.c                           | 178 ++++-
>>>>>>    tools/lib/bpf/btf.h                           |   7 +-
>>>>>>    tools/lib/bpf/btf_dump.c                      | 232 +++++-
>>>>>>    .../selftests/bpf/good_uapi_headers.txt       | 677 ++++++++++++++++++
>>>>>>    tools/testing/selftests/bpf/prog_tests/btf.c  | 152 ++++
>>>>>>    .../selftests/bpf/prog_tests/btf_dump.c       |  11 +-
>>>>>>    .../bpf/progs/btf_dump_test_case_decl_tag.c   |  39 +
>>>>>>    .../progs/btf_dump_test_case_header_guards.c  |  94 +++
>>>>>>    .../bpf/test_uapi_header_guards_infer.sh      |  33 +
>>>>>>    .../selftests/bpf/test_uapi_headers.py        | 197 +++++
>>>>>>    13 files changed, 1816 insertions(+), 12 deletions(-)
>>>>>>    create mode 100755 scripts/infer_header_guards.pl
>>>>>>    create mode 100644 tools/testing/selftests/bpf/good_uapi_headers.txt
>>>>>>    create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_decl_tag.c
>>>>>>    create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_header_guards.c
>>>>>>    create mode 100755 tools/testing/selftests/bpf/test_uapi_header_guards_infer.sh
>>>>>>    create mode 100755 tools/testing/selftests/bpf/test_uapi_headers.py
>>>>>>
>>>>>> --
>>>>>> 2.34.1
>>>>>>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 01/12] libbpf: Deduplicate unambigous standalone forward declarations
  2022-10-31 15:49     ` Eduard Zingerman
@ 2022-11-01 17:08       ` Alan Maguire
  2022-11-01 17:37         ` Eduard Zingerman
  0 siblings, 1 reply; 46+ messages in thread
From: Alan Maguire @ 2022-11-01 17:08 UTC (permalink / raw)
  To: Eduard Zingerman, Andrii Nakryiko
  Cc: bpf, ast, andrii, daniel, kernel-team, yhs, arnaldo.melo

On 31/10/2022 15:49, Eduard Zingerman wrote:
> On Thu, 2022-10-27 at 15:07 -0700, Andrii Nakryiko wrote:
>> On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> [...] 
>>> +
>>> +/*
>>> + * Collect a `name_off_map` that maps type names to type ids for all
>>> + * canonical structs and unions. If the same name is shared by several
>>> + * canonical types use a special value 0 to indicate this fact.
>>> + */
>>> +static int btf_dedup_fill_unique_names_map(struct btf_dedup *d, struct hashmap *names_map)
>>> +{
>>> +       int i, err = 0;
>>> +       __u32 type_id, collision_id;
>>> +       __u16 kind;
>>> +       struct btf_type *t;
>>> +
>>> +       for (i = 0; i < d->btf->nr_types; i++) {
>>> +               type_id = d->btf->start_id + i;
>>> +               t = btf_type_by_id(d->btf, type_id);
>>> +               kind = btf_kind(t);
>>> +
>>> +               if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION)
>>> +                       continue;
>>
>> let's also do ENUM FWD resolution. ENUM FWD is just ENUM with vlen=0
> 
> Interestingly this is necessary only for mixed enum / enum64 case.
> Forward enum declarations are resolved by bpf/btf.c:btf_dedup_prim_type:
>

Ah, great catch! A forward can look like an enum to one CU but another CU can
specify values that make it an enum64.

> 	case BTF_KIND_ENUM:
> 		h = btf_hash_enum(t);
> 		for_each_dedup_cand(d, hash_entry, h) {
> 			cand_id = (__u32)(long)hash_entry->value;
> 			cand = btf_type_by_id(d->btf, cand_id);
> 			if (btf_equal_enum(t, cand)) {
> 				new_id = cand_id;
> 				break;
> 			}
> 			if (btf_compat_enum(t, cand)) {
> 				if (btf_is_enum_fwd(t)) {
> 					/* resolve fwd to full enum */
> 					new_id = cand_id;
> 					break;
> 				}
> 				/* resolve canonical enum fwd to full enum */
> 				d->map[cand_id] = type_id;
> 			}
> 		}
> 		break;
>     // ... similar logic for ENUM64 ...
> 
> - btf_hash_enum ignores vlen when hashing;
> - btf_compat_enum compares only names and sizes.
> 
> So, if forward and main declaration kinds match (either BTF_KIND_ENUM
> or BTF_KIND_ENUM64) the forward declaration would be removed. But if
> the kinds are different the forward declaration would remain. E.g.:
> 
> CU #1:
> enum foo;
> enum foo *a;
> 
> CU #2:
> enum foo { x = 0xfffffffff };
> enum foo *b;
> 
> BTF:
> [1] ENUM64 'foo' encoding=UNSIGNED size=8 vlen=1
> 	'x' val=68719476735ULL
> [2] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
> [3] PTR '(anon)' type_id=1
> [4] ENUM 'foo' encoding=UNSIGNED size=4 vlen=0
> [5] PTR '(anon)' type_id=4
> 
> BTF_KIND_FWDs are unified during btf_dedup_struct_types but enum
> forward declarations are not. So it would be incorrect to add enum
> forward declaration unification logic to btf_dedup_resolve_fwds,
> because the following case would not be covered:
> 
> CU #1:
> enum foo;
> struct s { enum foo *a; } *a;
> 
> CU #2:
> enum foo { x = 0xfffffffff };
> struct s { enum foo *a; } *b;
> 
> Currently STRUCTs 's' are not de-duplicated.
> 

What if CU#1 is in base BTF and CU#2 in split module BTF? I think we'd explicitly
want to avoid deduping "struct s" then since we can't be sure that it is the
same enum they are pointing at.  That's the logic we employ for structs at 
least, based upon the rationale that we can't feed back knowledge of types
from module to kernel BTF since the latter is now fixed (Andrii, do correct me
if I have this wrong). In such a case the enum is no longer standalone; it
serves the purpose of allowing us to define a pointer to a module-specific
type. We recently found some examples of this sort of thing with structs,
where the struct was defined in module BTF, making dedup fail for some core
kernel data types, but the problem was restricted to modules which _did_
define the type so wasn't a major driver of dedup failures. Not sure if
there are many (any?) enum cases of this in practice.

I suppose if we could guarantee the dedup happened within the same object
(kernel or module) we could relax this constraint though?

Alan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 01/12] libbpf: Deduplicate unambigous standalone forward declarations
  2022-11-01 17:08       ` Alan Maguire
@ 2022-11-01 17:37         ` Eduard Zingerman
  0 siblings, 0 replies; 46+ messages in thread
From: Eduard Zingerman @ 2022-11-01 17:37 UTC (permalink / raw)
  To: Alan Maguire, Andrii Nakryiko
  Cc: bpf, ast, andrii, daniel, kernel-team, yhs, arnaldo.melo

On Tue, 2022-11-01 at 17:08 +0000, Alan Maguire wrote:
> On 31/10/2022 15:49, Eduard Zingerman wrote:
> > On Thu, 2022-10-27 at 15:07 -0700, Andrii Nakryiko wrote:
> > > On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > [...] 
> > > > +
> > > > +/*
> > > > + * Collect a `name_off_map` that maps type names to type ids for all
> > > > + * canonical structs and unions. If the same name is shared by several
> > > > + * canonical types use a special value 0 to indicate this fact.
> > > > + */
> > > > +static int btf_dedup_fill_unique_names_map(struct btf_dedup *d, struct hashmap *names_map)
> > > > +{
> > > > +       int i, err = 0;
> > > > +       __u32 type_id, collision_id;
> > > > +       __u16 kind;
> > > > +       struct btf_type *t;
> > > > +
> > > > +       for (i = 0; i < d->btf->nr_types; i++) {
> > > > +               type_id = d->btf->start_id + i;
> > > > +               t = btf_type_by_id(d->btf, type_id);
> > > > +               kind = btf_kind(t);
> > > > +
> > > > +               if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION)
> > > > +                       continue;
> > > 
> > > let's also do ENUM FWD resolution. ENUM FWD is just ENUM with vlen=0
> > 
> > Interestingly this is necessary only for mixed enum / enum64 case.
> > Forward enum declarations are resolved by bpf/btf.c:btf_dedup_prim_type:
> > 
> 
> Ah, great catch! A forward can look like an enum to one CU but another CU can
> specify values that make it an enum64.
> 
> > 	case BTF_KIND_ENUM:
> > 		h = btf_hash_enum(t);
> > 		for_each_dedup_cand(d, hash_entry, h) {
> > 			cand_id = (__u32)(long)hash_entry->value;
> > 			cand = btf_type_by_id(d->btf, cand_id);
> > 			if (btf_equal_enum(t, cand)) {
> > 				new_id = cand_id;
> > 				break;
> > 			}
> > 			if (btf_compat_enum(t, cand)) {
> > 				if (btf_is_enum_fwd(t)) {
> > 					/* resolve fwd to full enum */
> > 					new_id = cand_id;
> > 					break;
> > 				}
> > 				/* resolve canonical enum fwd to full enum */
> > 				d->map[cand_id] = type_id;
> > 			}
> > 		}
> > 		break;
> >     // ... similar logic for ENUM64 ...
> > 
> > - btf_hash_enum ignores vlen when hashing;
> > - btf_compat_enum compares only names and sizes.
> > 
> > So, if forward and main declaration kinds match (either BTF_KIND_ENUM
> > or BTF_KIND_ENUM64) the forward declaration would be removed. But if
> > the kinds are different the forward declaration would remain. E.g.:
> > 
> > CU #1:
> > enum foo;
> > enum foo *a;
> > 
> > CU #2:
> > enum foo { x = 0xfffffffff };
> > enum foo *b;
> > 
> > BTF:
> > [1] ENUM64 'foo' encoding=UNSIGNED size=8 vlen=1
> > 	'x' val=68719476735ULL
> > [2] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
> > [3] PTR '(anon)' type_id=1
> > [4] ENUM 'foo' encoding=UNSIGNED size=4 vlen=0
> > [5] PTR '(anon)' type_id=4
> > 
> > BTF_KIND_FWDs are unified during btf_dedup_struct_types but enum
> > forward declarations are not. So it would be incorrect to add enum
> > forward declaration unification logic to btf_dedup_resolve_fwds,
> > because the following case would not be covered:
> > 
> > CU #1:
> > enum foo;
> > struct s { enum foo *a; } *a;
> > 
> > CU #2:
> > enum foo { x = 0xfffffffff };
> > struct s { enum foo *a; } *b;
> > 
> > Currently STRUCTs 's' are not de-duplicated.
> > 
> 
> What if CU#1 is in base BTF and CU#2 in split module BTF? I think we'd explicitly
> want to avoid deduping "struct s" then since we can't be sure that it is the
> same enum they are pointing at.  That's the logic we employ for structs at 
> least, based upon the rationale that we can't feed back knowledge of types
> from module to kernel BTF since the latter is now fixed (Andrii, do correct me
> if I have this wrong). In such a case the enum is no longer standalone; it
> serves the purpose of allowing us to define a pointer to a module-specific
> type. We recently found some examples of this sort of thing with structs,
> where the struct was defined in module BTF, making dedup fail for some core
> kernel data types, but the problem was restricted to modules which _did_
> define the type so wasn't a major driver of dedup failures. Not sure if
> there are many (any?) enum cases of this in practice.

Hi Alan,

As far as I understand the loop in `btf_dedup_prim_types` guarantees
that only ids from the split module would be remapped:

	struct btf {
		...
		/* BTF type ID of the first type in this BTF instance:
		 *   - for base BTF it's equal to 1;
		 *   - for split BTF it's equal to biggest type ID of base BTF plus 1.
		 */
		int start_id;
		...
	};

	...
	for (i = 0; i < d->btf->nr_types; i++) {
		err = btf_dedup_prim_type(d, d->btf->start_id + i);
		if (err)
			return err;
	}

Thus CU1:foo won't be updated to be CU2:foo and CU1:s will not be the
same as CU2:s. Is that right or am I confused?
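
To illustrate the ID ranges I have in mind, here is a toy model (the
toy_* names are invented for this sketch, this is not libbpf code). It
only shows that a loop of the shape above never writes map entries
below start_id, i.e. never touches base BTF ids.

```c
#include <assert.h>

/* Toy stand-in for the two fields of struct btf that matter here. */
struct toy_btf {
	int start_id;	/* 1 for base BTF, last base id + 1 for split BTF */
	int nr_types;	/* number of types owned by this instance */
};

/*
 * Remap every id owned by `btf` to `target`, using the same loop
 * shape as above. Entries below start_id are never written.
 */
static void toy_remap_own_ids(const struct toy_btf *btf, int *map, int map_len, int target)
{
	int i, id;

	for (i = 0; i < btf->nr_types; i++) {
		id = btf->start_id + i;
		assert(id < map_len);
		map[id] = target;
	}
}
```

E.g. with base ids 1..3 and a split BTF of two types (start_id = 4),
only map[4] and map[5] can change.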

Thanks,
Eduard

> 
> I suppose if we could guarantee the dedup happened within the same object
> (kernel or module) we could relax this constraint though?
> 
> Alan


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h
  2022-11-01 16:01           ` Alan Maguire
@ 2022-11-01 18:35             ` Alexei Starovoitov
  2022-11-01 19:21               ` Eduard Zingerman
  0 siblings, 1 reply; 46+ messages in thread
From: Alexei Starovoitov @ 2022-11-01 18:35 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Andrii Nakryiko, Yonghong Song, Eduard Zingerman, bpf,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Kernel Team, Yonghong Song, Arnaldo Carvalho de Melo

On Tue, Nov 1, 2022 at 9:02 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> >> Yes, we discussed this before. This will need to add additional work
> >> in preprocessor. I just made a discussion topic in llvm discourse
> >>
> >> https://discourse.llvm.org/t/add-a-type-checking-macro-is-type-defined-type/66268

That would be a great clang feature.
Thanks Yonghong!

> >>
> >> Let us see whether we can get some upstream agreement or not.
> >>
> >
> > Thanks for starting the conversation! I'll be following along.
> >
>
>
> I think this sort of approach assumes that vmlinux.h is included after
> any uapi headers, and would guard type definitions with
>
> #if !type_is_defined(struct foo)
> struct foo {
>
> };
> #endif
>
> ...is that right? My concern is that the vmlinux.h definitions have
> the CO-RE attributes. From a BPF perspective, would we like the vmlinux.h
> definitions to dominate over UAPI definitions do you think, or does it
> matter?

I think it's totally fine to require #include "vmlinux.h" to be last.
The attr(preserve_access_index) is only useful for kernel internal
structs. uapi structs don't need it.

>
> I was wondering if there might be yet another way to crack this;
> if we did want the vmlinux.h type definitions to be authoritative
> because they have the preserve access index attribute, and because
> bpftool knows all vmlinux types, it could use that info to selectively
> redefine those type names such that we avoid name clashes when later
> including UAPI headers. Something like
>
> #ifdef __VMLINUX_H__
> //usual vmlinux.h type definitions
> #endif /* __VMLINUX_H__ */
>
> #ifdef __VMLINUX_ALIAS__
> #if !defined(timespec64)
> #define timespec64 __VMLINUX_ALIAS__timespec64
> #endif
> // rest of the types define aliases here
> #undef __VMLINUX_ALIAS__
> #else /* unalias */
> #if defined(timespec64)
> #undef timespec64
> #endif
> // rest of types undef aliases here
> #endif /* __VMLINUX_ALIAS__ */
>
>
> Then the consumer does this:
>
> #define __VMLINUX_ALIAS__
> #include "vmlinux.h"
> // include uapi headers
> #include "vmlinux.h"
>
> (the latter include of vmlinux.h is needed to undef all the type aliases)

Sounds like a bunch of complexity for the use case that is not
clear to me.

>
> I tried hacking up bpftool to support this aliasing scheme and while
> it is kind of hacky it does seem to work, aside from some issues with
> IPPROTO_* definitions - for the enumerated IPPROTO_ values linux/in.h does
> this:
>
> enum {
>   IPPROTO_IP = 0,               /* Dummy protocol for TCP               */
> #define IPPROTO_IP              IPPROTO_IP
>   IPPROTO_ICMP = 1,             /* Internet Control Message Protocol    */
> #define IPPROTO_ICMP            IPPROTO_ICMP
>
>
> ...so our enum value definitions for IPPROTO_ values clash with the above
> definitions. These could be individually ifdef-guarded if needed though I think.

Including vmlinux.h last won't have these enum conflicts, right?

> I can send the proof-of-concept patch if it would help, I just wanted to
> check in case that might be a workable path too, since it just requires
> changes to bpftool (and changes to in.h).

I think changing the uapi header like in.h is no-go.
Touching anything in uapi is too much of a risk.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h
  2022-11-01 18:35             ` Alexei Starovoitov
@ 2022-11-01 19:21               ` Eduard Zingerman
  2022-11-01 19:44                 ` Alexei Starovoitov
  0 siblings, 1 reply; 46+ messages in thread
From: Eduard Zingerman @ 2022-11-01 19:21 UTC (permalink / raw)
  To: Alexei Starovoitov, Alan Maguire
  Cc: Andrii Nakryiko, Yonghong Song, bpf, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Kernel Team, Yonghong Song,
	Arnaldo Carvalho de Melo

On Tue, 2022-11-01 at 11:35 -0700, Alexei Starovoitov wrote:
> On Tue, Nov 1, 2022 at 9:02 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> > 
> > > > Yes, we discussed this before. This will need to add additional work
> > > > in preprocessor. I just made a discussion topic in llvm discourse
> > > > 
> > > > https://discourse.llvm.org/t/add-a-type-checking-macro-is-type-defined-type/66268
> 
> That would be a great clang feature.
> Thanks Yonghong!
> 
> > > > 
> > > > Let us see whether we can get some upstream agreement or not.
> > > > 
> > > 
> > > Thanks for starting the conversation! I'll be following along.
> > > 
> > 
> > 
> > I think this sort of approach assumes that vmlinux.h is included after
> > any uapi headers, and would guard type definitions with
> > 
> > #if !type_is_defined(struct foo)
> > struct foo {
> > 
> > };
> > #endif
> > 
> > ...is that right? My concern is that the vmlinux.h definitions have
> > the CO-RE attributes. From a BPF perspective, would we like the vmlinux.h
> > definitions to dominate over UAPI definitions do you think, or does it
> > matter?
> 
> I think it's totally fine to require #include "vmlinux.h" to be last.
> The attr(preserve_access_index) is only useful for kernel internal
> structs. uapi structs don't need it.
> 
> > 
> > I was wondering if there might be yet another way to crack this;
> > if we did want the vmlinux.h type definitions to be authoritative
> > because they have the preserve access index attribute, and because
> > bpftool knows all vmlinux types, it could use that info to selectively
> > redefine those type names such that we avoid name clashes when later
> > including UAPI headers. Something like
> > 
> > #ifdef __VMLINUX_H__
> > //usual vmlinux.h type definitions
> > #endif /* __VMLINUX_H__ */
> > 
> > #ifdef __VMLINUX_ALIAS__
> > #if !defined(timespec64)
> > #define timespec64 __VMLINUX_ALIAS__timespec64
> > #endif
> > // rest of the types define aliases here
> > #undef __VMLINUX_ALIAS__
> > #else /* unalias */
> > #if defined(timespec64)
> > #undef timespec64
> > #endif
> > // rest of types undef aliases here
> > #endif /* __VMLINUX_ALIAS__ */
> > 
> > 
> > Then the consumer does this:
> > 
> > #define __VMLINUX_ALIAS__
> > #include "vmlinux.h"
> > // include uapi headers
> > #include "vmlinux.h"
> > 
> > (the latter include of vmlinux.h is needed to undef all the type aliases)
> 
> Sounds like a bunch of complexity for the use case that is not
> clear to me.

Well, my RFC is not shy of complexity :)
What Alan suggests should solve the conflicts described in [1] or any
other conflicts of this kind.

[1] https://lore.kernel.org/bpf/999da51bdf050f155ba299500061b3eb6e0dcd0d.camel@gmail.com/


> > 
> > I tried hacking up bpftool to support this aliasing scheme and while
> > it is kind of hacky it does seem to work, aside from some issues with
> > IPPROTO_* definitions - for the enumerated IPPROTO_ values linux/in.h does
> > this:
> > 
> > enum {
> >   IPPROTO_IP = 0,               /* Dummy protocol for TCP               */
> > #define IPPROTO_IP              IPPROTO_IP
> >   IPPROTO_ICMP = 1,             /* Internet Control Message Protocol    */
> > #define IPPROTO_ICMP            IPPROTO_ICMP
> > 
> > 
> > ...so our enum value definitions for IPPROTO_ values clash with the above
> > definitions. These could be individually ifdef-guarded if needed though I think.
> 
> Including vmlinux.h last won't have these enum conflicts, right?
> 
> > I can send the proof-of-concept patch if it would help, I just wanted to
> > check in case that might be a workable path too, since it just requires
> > changes to bpftool (and changes to in.h).
> 
> I think changing the uapi header like in.h is no-go.
> Touching anything in uapi is too much of a risk.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h
  2022-11-01 19:21               ` Eduard Zingerman
@ 2022-11-01 19:44                 ` Alexei Starovoitov
  0 siblings, 0 replies; 46+ messages in thread
From: Alexei Starovoitov @ 2022-11-01 19:44 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alan Maguire, Andrii Nakryiko, Yonghong Song, bpf,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Kernel Team, Yonghong Song, Arnaldo Carvalho de Melo

On Tue, Nov 1, 2022 at 12:21 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Tue, 2022-11-01 at 11:35 -0700, Alexei Starovoitov wrote:
> > On Tue, Nov 1, 2022 at 9:02 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> > >
> > > > > Yes, we discussed this before. This will need to add additional work
> > > > > in preprocessor. I just made a discussion topic in llvm discourse
> > > > >
> > > > > https://discourse.llvm.org/t/add-a-type-checking-macro-is-type-defined-type/66268
> >
> > That would be a great clang feature.
> > Thanks Yonghong!
> >
> > > > >
> > > > > Let us see whether we can get some upstream agreement or not.
> > > > >
> > > >
> > > > Thanks for starting the conversation! I'll be following along.
> > > >
> > >
> > >
> > > I think this sort of approach assumes that vmlinux.h is included after
> > > any uapi headers, and would guard type definitions with
> > >
> > > #if !type_is_defined(struct foo)
> > > struct foo {
> > >
> > > };
> > > #endif
> > >
> > > ...is that right? My concern is that the vmlinux.h definitions have
> > > the CO-RE attributes. From a BPF perspective, would we like the vmlinux.h
> > > definitions to dominate over UAPI definitions do you think, or does it
> > > matter?
> >
> > I think it's totally fine to require #include "vmlinux.h" to be last.
> > The attr(preserve_access_index) is only useful for kernel internal
> > structs. uapi structs don't need it.
> >
> > >
> > > I was wondering if there might be yet another way to crack this;
> > > if we did want the vmlinux.h type definitions to be authoritative
> > > because they have the preserve access index attribute, and because
> > > bpftool knows all vmlinux types, it could use that info to selectively
> > > redefine those type names such that we avoid name clashes when later
> > > including UAPI headers. Something like
> > >
> > > #ifdef __VMLINUX_H__
> > > //usual vmlinux.h type definitions
> > > #endif /* __VMLINUX_H__ */
> > >
> > > #ifdef __VMLINUX_ALIAS__
> > > #if !defined(timespec64)
> > > #define timespec64 __VMLINUX_ALIAS__timespec64
> > > #endif
> > > // rest of the types define aliases here
> > > #undef __VMLINUX_ALIAS__
> > > #else /* unalias */
> > > #if defined(timespec64)
> > > #undef timespec64
> > > #endif
> > > // rest of types undef aliases here
> > > #endif /* __VMLINUX_ALIAS__ */
> > >
> > >
> > > Then the consumer does this:
> > >
> > > #define __VMLINUX_ALIAS__
> > > #include "vmlinux.h"
> > > // include uapi headers
> > > #include "vmlinux.h"
> > >
> > > (the latter include of vmlinux.h is needed to undef all the type aliases)
> >
> > Sounds like a bunch of complexity for the use case that is not
> > clear to me.
>
> Well, my RFC is not shy of complexity :)
> What Alan suggests should solve the conflicts described in [1] or any
> other conflicts of this kind.
>
> [1] https://lore.kernel.org/bpf/999da51bdf050f155ba299500061b3eb6e0dcd0d.camel@gmail.com/

I don't quite see how renaming all types in the 1st vmlinux.h
will help with name collisions inside enum {}.
The enums will conflict with 2nd vmlinux.h too.
Unless the proposal is to rename them as well,
but then what's the point?
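
A minimal sketch of the concern above, not code from the patch set: the
enum and enumerator names are made up for illustration, and only the
__VMLINUX_ALIAS__ prefix is taken from the quoted proposal. Aliasing the
tag with a macro keeps the two `enum color` definitions apart, but
enumerators are ordinary identifiers, so they collide unless they are
renamed as well:

```c
/* Stand-in for the first, aliased include of vmlinux.h (hypothetical). */
#define color __VMLINUX_ALIAS__color
#ifdef SHOW_ENUMERATOR_CLASH
enum color { RED = 1 };		/* same enumerator name as the uapi copy */
#else
enum color { VMLINUX_RED = 1 };	/* enumerator renamed by hand */
#endif
#undef color

/* Stand-in for a uapi header that defines the same enum. */
enum color { RED = 1 };

/* The aliased tag names a type distinct from the uapi one. */
int aliased_enum_value(void)
{
	enum __VMLINUX_ALIAS__color c = VMLINUX_RED;

	return (int)c;
}

int uapi_enum_value(void)
{
	enum color c = RED;

	return (int)c;
}
```

Building with -DSHOW_ENUMERATOR_CLASH should make the compiler reject the
second RED as a redefinition, even though the two enum tags differ.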


* Re: [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h
  2022-10-28 18:56       ` Yonghong Song
  2022-10-28 21:35         ` Andrii Nakryiko
@ 2022-11-11 21:55         ` Eduard Zingerman
  2022-11-14  7:52           ` Yonghong Song
  1 sibling, 1 reply; 46+ messages in thread
From: Eduard Zingerman @ 2022-11-11 21:55 UTC (permalink / raw)
  To: Yonghong Song, Andrii Nakryiko
  Cc: bpf, ast, andrii, daniel, kernel-team, yhs, arnaldo.melo

On Fri, 2022-10-28 at 11:56 -0700, Yonghong Song wrote:
> > > [...]
> > 
> > Ok, could we change the problem to detecting if some type is defined.
> > Would it be possible to have something like
> > 
> > #if !__is_type_defined(struct abc)
> > struct abc {
> > };
> > #endif
> > 
> > I think we talked about this and there were problems with this
> > approach, but I don't remember details and how insurmountable the
> > problem is. Having a way to check whether some type is defined would
> > be very useful even outside of -target bpf parlance, though, so maybe
> > it's the problem worth attacking?
> 
> Yes, we discussed this before. This will need to add additional work
> in preprocessor. I just made a discussion topic in llvm discourse
> 
> https://discourse.llvm.org/t/add-a-type-checking-macro-is-type-defined-type/66268
> 
> Let us see whether we can get some upstream agreement or not.

I did a small investigation of this feature.

The main prerequisite is construction of the symbol table during
source code pre-processing, which implies the need to parse the
source code at the same time. This is technically possible in clang, as
lexing, pre-processing and AST construction happen at the same time
in compilation mode.

The prototype is available here [1], it includes:
- Change in the pre-processor that adds an optional callback
  "IsTypeDefinedFn" & necessary parsing of __is_type_defined
  construct.
- Change in Sema module (responsible for parsing/AST & symbol table)
  that installs the appropriate "IsTypeDefinedFn" in the pre-processor
  instance.

However, this prototype builds a backward dependency between
pre-processor and semantic analysis. There are currently no such
dependencies in the clang code base.

This makes it impossible to do pre-processing and compilation
separately, e.g. consider the following example:

$ cat test.c

  struct foo { int x; };
  
  #if __is_type_defined(foo)
    const int x = 1;
  #else
    const int x = 2;
  #endif
  
$ clang -cc1 -ast-print test.c -o -

  struct foo {
      int x;
  };
  const int x = 1;

$ clang -E test.c -o -

  # ... some line directives ...
  struct foo { int x; };
  const int x = 2;

Note that __is_type_defined evaluates to different values in the
first and second invocations. This is because semantic analysis (AST,
symbol table) is not done for -E.

It also breaks the C11 standard, which clearly separates the
pre-processing and semantic analysis phases, see [2] 5.1.1.2.

So, my conclusion is as follows: this is technically possible in clang
but has no chance to reach llvm upstream.
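
As a small illustration of that phase separation in standard C (a sketch
with made-up helper names, not part of the prototype): the conditions of
`#if` / `#ifdef` can only observe macros, so a struct tag declared earlier
in the same file is invisible to them, while the compiler proper sees it
without trouble.

```c
struct foo { int x; };

/* defined() only knows about macros; the struct tag above is invisible. */
#if defined(foo)
#define FOO_SEEN_BY_DEFINED 1
#else
#define FOO_SEEN_BY_DEFINED 0
#endif

/* In #if, an identifier that is not a macro is replaced with 0. */
#if foo
#define FOO_NONZERO_IN_IF 1
#else
#define FOO_NONZERO_IN_IF 0
#endif

int foo_seen_by_defined(void) { return FOO_SEEN_BY_DEFINED; }
int foo_nonzero_in_if(void) { return FOO_NONZERO_IN_IF; }

/* The type is of course known once semantic analysis runs. */
int foo_seen_by_compiler(void) { return sizeof(struct foo) == sizeof(int); }
```

This is why the header guard approach can work with an unmodified
compiler: guards are macros, so the pre-processor can test them.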

Thanks,
Eduard

[1] https://github.com/llvm/llvm-project/compare/main...eddyz87:llvm-project:is-type-defined-experiment
[2] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf


> 
> > 
> > > 
> > > > 
> > > > BTW, I suggest splitting libbpf btf_dedup and btf_dump changes into a
> > > > separate series and sending them as non-RFC sooner. Those improvements
> > > > are independent of all the header guards stuff, let's get them landed
> > > > sooner.
> > > > 
> > > > > After some discussion with Alexei and Yonghong I'd like to request
> > > > > your comments regarding a somewhat brittle and partial solution to
> > > > > this issue that relies on adding `#ifndef FOO_H ... #endif` guards in
> > > > > the generated `vmlinux.h`.
> > > > > 
> > > > 
> > > > [...]
> > > > 
> > > > > Eduard Zingerman (12):
> > > > >     libbpf: Deduplicate unambigous standalone forward declarations
> > > > >     selftests/bpf: Tests for standalone forward BTF declarations
> > > > >       deduplication
> > > > >     libbpf: Support for BTF_DECL_TAG dump in C format
> > > > >     selftests/bpf: Tests for BTF_DECL_TAG dump in C format
> > > > >     libbpf: Header guards for selected data structures in vmlinux.h
> > > > >     selftests/bpf: Tests for header guards printing in BTF dump
> > > > >     bpftool: Enable header guards generation
> > > > >     kbuild: Script to infer header guard values for uapi headers
> > > > >     kbuild: Header guards for types from include/uapi/*.h in kernel BTF
> > > > >     selftests/bpf: Script to verify uapi headers usage with vmlinux.h
> > > > >     selftests/bpf: Known good uapi headers for test_uapi_headers.py
> > > > >     selftests/bpf: script for infer_header_guards.pl testing
> > > > > 
> > > > >    scripts/infer_header_guards.pl                | 191 +++++
> > > > >    scripts/link-vmlinux.sh                       |  13 +-
> > > > >    tools/bpf/bpftool/btf.c                       |   4 +-
> > > > >    tools/lib/bpf/btf.c                           | 178 ++++-
> > > > >    tools/lib/bpf/btf.h                           |   7 +-
> > > > >    tools/lib/bpf/btf_dump.c                      | 232 +++++-
> > > > >    .../selftests/bpf/good_uapi_headers.txt       | 677 ++++++++++++++++++
> > > > >    tools/testing/selftests/bpf/prog_tests/btf.c  | 152 ++++
> > > > >    .../selftests/bpf/prog_tests/btf_dump.c       |  11 +-
> > > > >    .../bpf/progs/btf_dump_test_case_decl_tag.c   |  39 +
> > > > >    .../progs/btf_dump_test_case_header_guards.c  |  94 +++
> > > > >    .../bpf/test_uapi_header_guards_infer.sh      |  33 +
> > > > >    .../selftests/bpf/test_uapi_headers.py        | 197 +++++
> > > > >    13 files changed, 1816 insertions(+), 12 deletions(-)
> > > > >    create mode 100755 scripts/infer_header_guards.pl
> > > > >    create mode 100644 tools/testing/selftests/bpf/good_uapi_headers.txt
> > > > >    create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_decl_tag.c
> > > > >    create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_header_guards.c
> > > > >    create mode 100755 tools/testing/selftests/bpf/test_uapi_header_guards_infer.sh
> > > > >    create mode 100755 tools/testing/selftests/bpf/test_uapi_headers.py
> > > > > 
> > > > > --
> > > > > 2.34.1
> > > > > 



* Re: [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h
  2022-11-11 21:55         ` Eduard Zingerman
@ 2022-11-14  7:52           ` Yonghong Song
  2022-11-14 21:13             ` Eduard Zingerman
  0 siblings, 1 reply; 46+ messages in thread
From: Yonghong Song @ 2022-11-14  7:52 UTC (permalink / raw)
  To: Eduard Zingerman, Andrii Nakryiko
  Cc: bpf, ast, andrii, daniel, kernel-team, yhs, arnaldo.melo



On 11/11/22 1:55 PM, Eduard Zingerman wrote:
> On Fri, 2022-10-28 at 11:56 -0700, Yonghong Song wrote:
>>>> [...]
>>>
>>> Ok, could we change the problem to detecting if some type is defined.
>>> Would it be possible to have something like
>>>
>>> #if !__is_type_defined(struct abc)
>>> struct abc {
>>> };
>>> #endif
>>>
>>> I think we talked about this and there were problems with this
>>> approach, but I don't remember details and how insurmountable the
>>> problem is. Having a way to check whether some type is defined would
>>> be very useful even outside of -target bpf parlance, though, so maybe
>>> it's the problem worth attacking?
>>
>> Yes, we discussed this before. This will need to add additional work
>> in preprocessor. I just made a discussion topic in llvm discourse
>>
>> https://discourse.llvm.org/t/add-a-type-checking-macro-is-type-defined-type/66268
>>
>> Let us see whether we can get some upstream agreement or not.
> 
> I did a small investigation of this feature.
> 
> The main pre-requirement is construction of the symbol table during
> source code pre-processing, which implies necessity to parse the
> source code at the same time. It is technically possible in clang, as
> lexing, pre-processing and AST construction happens at the same time
> when in compilation mode.
> 
> The prototype is available here [1], it includes:
> - Change in the pre-processor that adds an optional callback
>    "IsTypeDefinedFn" & necessary parsing of __is_type_defined
>    construct.
> - Change in Sema module (responsible for parsing/AST & symbol table)
>    that installs the appropriate "IsTypeDefinedFn" in the pre-processor
>    instance.
> 
> However, this prototype builds a backward dependency between
> pre-processor and semantic analysis. There are currently no such
> dependencies in the clang code base.
> 
> This makes it impossible to do pre-processing and compilation
> separately, e.g. consider the following example:
> 
> $ cat test.c
> 
>    struct foo { int x; };
>    
>    #if __is_type_defined(foo)
>      const int x = 1;
>    #else
>      const int x = 2;
>    #endif
>    
> $ clang -cc1 -ast-print test.c -o -
> 
>    struct foo {
>        int x;
>    };
>    const int x = 1;
> 
> $ clang -E test.c -o -
> 
>    # ... some line directives ...
>    struct foo { int x; };
>    const int x = 2;

Is there any chance '-E' could produce the same output as '-cc1 -ast-print'?
That is, even with -E we could do some semantic analysis
as well, using either the current clang semantic analysis or creating
a minimal version of sema analysis in the preprocessor itself?

> 
> Note that __is_type_defined is computed to different value in the
> first and second calls. This is so because semantic analysis (AST,
> symbol table) is not done for -E.
> 
> It also breaks that C11 standard which clearly separates
> pre-processing and semantic analysis phases, see [2] 5.1.1.2.
> 
> So, my conclusion is as follows: this is technically possible in clang
> but has no chance to reach llvm upstream.
> 
> Thanks,
> Eduard
> 
> [1] https://github.com/llvm/llvm-project/compare/main...eddyz87:llvm-project:is-type-defined-experiment
> [2] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf
> 
> 


* Re: [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h
  2022-11-14  7:52           ` Yonghong Song
@ 2022-11-14 21:13             ` Eduard Zingerman
  2022-11-14 21:50               ` Alexei Starovoitov
  0 siblings, 1 reply; 46+ messages in thread
From: Eduard Zingerman @ 2022-11-14 21:13 UTC (permalink / raw)
  To: Yonghong Song, Andrii Nakryiko
  Cc: bpf, ast, andrii, daniel, kernel-team, yhs, arnaldo.melo

On Sun, 2022-11-13 at 23:52 -0800, Yonghong Song wrote:
> 
> On 11/11/22 1:55 PM, Eduard Zingerman wrote:
> > On Fri, 2022-10-28 at 11:56 -0700, Yonghong Song wrote:
> > > > > [...]
> > > > 
> > > > Ok, could we change the problem to detecting if some type is defined.
> > > > Would it be possible to have something like
> > > > 
> > > > #if !__is_type_defined(struct abc)
> > > > struct abc {
> > > > };
> > > > #endif
> > > > 
> > > > I think we talked about this and there were problems with this
> > > > approach, but I don't remember details and how insurmountable the
> > > > problem is. Having a way to check whether some type is defined would
> > > > be very useful even outside of -target bpf parlance, though, so maybe
> > > > it's the problem worth attacking?
> > > 
> > > Yes, we discussed this before. This will need to add additional work
> > > in preprocessor. I just made a discussion topic in llvm discourse
> > > 
> > > https://discourse.llvm.org/t/add-a-type-checking-macro-is-type-defined-type/66268
> > > 
> > > Let us see whether we can get some upstream agreement or not.
> > 
> > I did a small investigation of this feature.
> > 
> > The main pre-requirement is construction of the symbol table during
> > source code pre-processing, which implies necessity to parse the
> > source code at the same time. It is technically possible in clang, as
> > lexing, pre-processing and AST construction happens at the same time
> > when in compilation mode.
> > 
> > The prototype is available here [1], it includes:
> > - Change in the pre-processor that adds an optional callback
> >    "IsTypeDefinedFn" & necessary parsing of __is_type_defined
> >    construct.
> > - Change in Sema module (responsible for parsing/AST & symbol table)
> >    that installs the appropriate "IsTypeDefinedFn" in the pre-processor
> >    instance.
> > 
> > However, this prototype builds a backward dependency between
> > pre-processor and semantic analysis. There are currently no such
> > dependencies in the clang code base.
> > 
> > This makes it impossible to do pre-processing and compilation
> > separately, e.g. consider the following example:
> > 
> > $ cat test.c
> > 
> >    struct foo { int x; };
> >    
> >    #if __is_type_defined(foo)
> >      const int x = 1;
> >    #else
> >      const int x = 2;
> >    #endif
> >    
> > $ clang -cc1 -ast-print test.c -o -
> > 
> >    struct foo {
> >        int x;
> >    };
> >    const int x = 1;
> > 
> > $ clang -E test.c -o -
> > 
> >    # ... some line directives ...
> >    struct foo { int x; };
> >    const int x = 2;
> 
> Is it any chance '-E' could output the same one as '-cc1 -ast-print'?
> That is, even with -E we could do some semantics analysis
> as well, using either current clang semantics analysis or creating
> an minimal version of sema analysis in preprocessor itself?

Sema drives consumption of tokens from the Preprocessor. Calls to the
Preprocessor are made during recursive-descent parsing. Extracting a
stream of tokens would require an incremental parser instead.

A minimal version of such a parser is possible to implement for C.
Matching opening / closing braces and identifiers following the
'struct' / 'union' / 'enum' keywords might be almost sufficient, but
I need to try it to be sure (e.g. it is more complex for 'typedef').
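
A rough sketch of what such a minimal scanner could look like, under
strong simplifying assumptions: it works on a plain string rather than
on clang tokens, ignores comments, string literals, macros and typedefs,
and all function names are hypothetical. It only answers whether a
keyword is followed by a given tag name and an opening brace:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Skip whitespace; return a pointer to the next non-space character. */
static const char *skip_ws(const char *p)
{
	while (*p && isspace((unsigned char)*p))
		p++;
	return p;
}

/* Copy the identifier at p into buf (truncating if needed).
 * Returns a pointer past it, or p itself if no identifier starts here. */
static const char *read_ident(const char *p, char *buf, size_t cap)
{
	size_t n = 0;

	buf[0] = '\0';
	if (!(isalpha((unsigned char)*p) || *p == '_'))
		return p;
	while (isalnum((unsigned char)*p) || *p == '_') {
		if (n + 1 < cap)
			buf[n++] = *p;
		p++;
	}
	buf[n] = '\0';
	return p;
}

/* Return 1 if src contains "<kw> <name> {", i.e. a definition rather
 * than a forward declaration or a use of the type; 0 otherwise. */
int tag_is_defined(const char *src, const char *kw, const char *name)
{
	char tok[128];
	const char *p = src;

	while (*p) {
		const char *q = read_ident(p, tok, sizeof(tok));

		if (q == p) {		/* not an identifier: skip one char */
			p++;
			continue;
		}
		p = q;
		if (strcmp(tok, kw) != 0)
			continue;
		q = read_ident(skip_ws(p), tok, sizeof(tok));
		if (strcmp(tok, name) == 0 && *skip_ws(q) == '{')
			return 1;
	}
	return 0;
}
```

For example, tag_is_defined("struct foo { int x; };", "struct", "foo")
returns 1, while a forward declaration such as "struct foo;" yields 0.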

I can work on it but I don't think there is a chance to upstream this work.

Thanks,
Eduard




* Re: [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h
  2022-11-14 21:13             ` Eduard Zingerman
@ 2022-11-14 21:50               ` Alexei Starovoitov
  2022-11-16  2:01                 ` Eduard Zingerman
  0 siblings, 1 reply; 46+ messages in thread
From: Alexei Starovoitov @ 2022-11-14 21:50 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Yonghong Song, Andrii Nakryiko, bpf, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Kernel Team, Yonghong Song,
	Arnaldo Carvalho de Melo

On Mon, Nov 14, 2022 at 1:13 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Sun, 2022-11-13 at 23:52 -0800, Yonghong Song wrote:
> >
> > On 11/11/22 1:55 PM, Eduard Zingerman wrote:
> > > On Fri, 2022-10-28 at 11:56 -0700, Yonghong Song wrote:
> > > > > > [...]
> > > > >
> > > > > Ok, could we change the problem to detecting if some type is defined.
> > > > > Would it be possible to have something like
> > > > >
> > > > > #if !__is_type_defined(struct abc)
> > > > > struct abc {
> > > > > };
> > > > > #endif
> > > > >
> > > > > I think we talked about this and there were problems with this
> > > > > approach, but I don't remember details and how insurmountable the
> > > > > problem is. Having a way to check whether some type is defined would
> > > > > be very useful even outside of -target bpf parlance, though, so maybe
> > > > > it's the problem worth attacking?
> > > >
> > > > Yes, we discussed this before. This will need to add additional work
> > > > in preprocessor. I just made a discussion topic in llvm discourse
> > > >
> > > > https://discourse.llvm.org/t/add-a-type-checking-macro-is-type-defined-type/66268
> > > >
> > > > Let us see whether we can get some upstream agreement or not.
> > >
> > > I did a small investigation of this feature.
> > >
> > > The main pre-requirement is construction of the symbol table during
> > > source code pre-processing, which implies necessity to parse the
> > > source code at the same time. It is technically possible in clang, as
> > > lexing, pre-processing and AST construction happens at the same time
> > > when in compilation mode.
> > >
> > > The prototype is available here [1], it includes:
> > > - Change in the pre-processor that adds an optional callback
> > >    "IsTypeDefinedFn" & necessary parsing of __is_type_defined
> > >    construct.
> > > - Change in Sema module (responsible for parsing/AST & symbol table)
> > >    that installs the appropriate "IsTypeDefinedFn" in the pre-processor
> > >    instance.
> > >
> > > However, this prototype builds a backward dependency between
> > > pre-processor and semantic analysis. There are currently no such
> > > dependencies in the clang code base.
> > >
> > > This makes it impossible to do pre-processing and compilation
> > > separately, e.g. consider the following example:
> > >
> > > $ cat test.c
> > >
> > >    struct foo { int x; };
> > >
> > >    #if __is_type_defined(foo)
> > >      const int x = 1;
> > >    #else
> > >      const int x = 2;
> > >    #endif
> > >
> > > $ clang -cc1 -ast-print test.c -o -
> > >
> > >    struct foo {
> > >        int x;
> > >    };
> > >    const int x = 1;
> > >
> > > $ clang -E test.c -o -
> > >
> > >    # ... some line directives ...
> > >    struct foo { int x; };
> > >    const int x = 2;
> >
> > Is it any chance '-E' could output the same one as '-cc1 -ast-print'?
> > That is, even with -E we could do some semantics analysis
> > as well, using either current clang semantics analysis or creating
> > an minimal version of sema analysis in preprocessor itself?
>
> Sema drives consumption of tokens from Preprocessor. Calls to
> Preprocessor are done on a parsing recursive descent. Extracting a
> stream of tokens would require an incremental parser instead.
>
> A minimal version of such parser is possible to implement for C.
> It might be the case that matching open / closing braces and
> identifiers following 'struct' / 'union' / 'enum' keywords might be
> almost sufficient but I need to try to be sure (e.g. it is more
> complex for 'typedef').
>
> I can work on it but I don't think there is a chance to upstream this work.

Right. It's going to be C only.
C++ with namespaces and nested class decls won't work with a simple
type parser.

On the other hand, if we're asking the preprocessor to look for
'struct foo' and remember that 'foo' is a type,
maybe we can add a regex search instead?
It would be a bit more generic and would work for a basic
union/struct foo definition?
Something like instead of:
#if __is_type_defined(foo)
use:
#if regex(struct[\t]+foo)

enums are harder in this approach, but higher chance to land?

regex() would mean "search for this pattern in the file until this line".
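
The intended semantics can be approximated outside the compiler with
POSIX <regex.h>. The sketch below is only an illustration: the function
name is hypothetical and nothing like it exists in clang. It reports
whether a pattern matches anywhere in the lines preceding a given
1-based line. Note that inside a POSIX bracket expression a backslash is
a literal character, so the tab has to be passed as a real tab (here via
the C string escape) rather than as the two characters `\t`.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if 'pattern' (POSIX extended regex) matches somewhere in the
 * lines of 'src' that precede line number 'line' (1-based), 0 if it
 * does not, and -1 on allocation or regex compilation failure. */
int regex_seen_before_line(const char *src, const char *pattern, int line)
{
	const char *end = src;
	regex_t re;
	char *prefix;
	size_t len;
	int found;

	/* Advance 'end' to the first character of the requested line. */
	for (int cur = 1; cur < line && *end; end++)
		if (*end == '\n')
			cur++;

	/* regexec() needs a NUL-terminated string, so copy the prefix. */
	len = (size_t)(end - src);
	prefix = malloc(len + 1);
	if (!prefix)
		return -1;
	memcpy(prefix, src, len);
	prefix[len] = '\0';

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
		free(prefix);
		return -1;
	}
	found = regexec(&re, prefix, 0, NULL, 0) == 0;
	regfree(&re);
	free(prefix);
	return found;
}
```

With a three-line input whose first line defines struct foo and whose
third line defines struct bar, the pattern "struct[ \t]+bar" is not seen
before line 3 but is seen before line 4.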

Or some other preprocessor "language" tricks?

For example:
The preprocessor would grep for 'struct *' in a single line
while processing a file and emit #define __secret_prefix_##$1
where $1 would be a capture from "single line regex".
Then later in the same file instead of:
#if __is_type_defined(foo)
use:
#ifdef __secret_prefix_foo

This "single line regex" may look like:
#if regex_in_any_later_line(struct[\t]+[a-zA-Z_]+) define __secret_prefix_$2
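
The effect of the marker-macro idea can be emulated by hand in plain C,
which may help to see what the consumer side would look like. In this
sketch the `#define` lines are written manually where the proposed
preprocessor feature would emit them automatically; the struct names and
helper functions are made up, only the __secret_prefix_ naming comes
from the description above.

```c
/* First definition, e.g. from a uapi header. The marker macro below is
 * what the hypothetical preprocessor pass would emit on seeing it. */
struct foo { int x; };
#define __secret_prefix_foo

/* Second definition, e.g. from vmlinux.h: skipped, since the marker
 * for 'foo' is already defined. */
#ifndef __secret_prefix_foo
struct foo { int x; long y; };
#endif

/* No marker for 'bar' yet, so this definition is kept. */
#ifndef __secret_prefix_bar
struct bar { int z; };
#define __secret_prefix_bar
#endif

unsigned long foo_size(void) { return sizeof(struct foo); }
unsigned long bar_size(void) { return sizeof(struct bar); }
```

Only the first definition of struct foo survives, so foo_size() reports
the size of a struct holding a single int.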


* Re: [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h
  2022-11-14 21:50               ` Alexei Starovoitov
@ 2022-11-16  2:01                 ` Eduard Zingerman
  0 siblings, 0 replies; 46+ messages in thread
From: Eduard Zingerman @ 2022-11-16  2:01 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Yonghong Song, Andrii Nakryiko, bpf, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Kernel Team, Yonghong Song,
	Arnaldo Carvalho de Melo

On Mon, 2022-11-14 at 13:50 -0800, Alexei Starovoitov wrote:
> On Mon, Nov 14, 2022 at 1:13 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > On Sun, 2022-11-13 at 23:52 -0800, Yonghong Song wrote:
> > > 
> > > On 11/11/22 1:55 PM, Eduard Zingerman wrote:
> > > > On Fri, 2022-10-28 at 11:56 -0700, Yonghong Song wrote:
> > > > > > > [...]
> > > > > > 
> > > > > > Ok, could we change the problem to detecting if some type is defined.
> > > > > > Would it be possible to have something like
> > > > > > 
> > > > > > #if !__is_type_defined(struct abc)
> > > > > > struct abc {
> > > > > > };
> > > > > > #endif
> > > > > > 
> > > > > > I think we talked about this and there were problems with this
> > > > > > approach, but I don't remember details and how insurmountable the
> > > > > > problem is. Having a way to check whether some type is defined would
> > > > > > be very useful even outside of -target bpf parlance, though, so maybe
> > > > > > it's the problem worth attacking?
> > > > > 
> > > > > Yes, we discussed this before. This will need to add additional work
> > > > > in preprocessor. I just made a discussion topic in llvm discourse
> > > > > 
> > > > > https://discourse.llvm.org/t/add-a-type-checking-macro-is-type-defined-type/66268
> > > > > 
> > > > > Let us see whether we can get some upstream agreement or not.
> > > > 
> > > > I did a small investigation of this feature.
> > > > 
> > > > The main pre-requirement is construction of the symbol table during
> > > > source code pre-processing, which implies necessity to parse the
> > > > source code at the same time. It is technically possible in clang, as
> > > > lexing, pre-processing and AST construction happens at the same time
> > > > when in compilation mode.
> > > > 
> > > > The prototype is available here [1], it includes:
> > > > - Change in the pre-processor that adds an optional callback
> > > >    "IsTypeDefinedFn" & necessary parsing of __is_type_defined
> > > >    construct.
> > > > - Change in Sema module (responsible for parsing/AST & symbol table)
> > > >    that installs the appropriate "IsTypeDefinedFn" in the pre-processor
> > > >    instance.
> > > > 
> > > > However, this prototype builds a backward dependency between
> > > > pre-processor and semantic analysis. There are currently no such
> > > > dependencies in the clang code base.
> > > > 
> > > > This makes it impossible to do pre-processing and compilation
> > > > separately, e.g. consider the following example:
> > > > 
> > > > $ cat test.c
> > > > 
> > > >    struct foo { int x; };
> > > > 
> > > >    #if __is_type_defined(foo)
> > > >      const int x = 1;
> > > >    #else
> > > >      const int x = 2;
> > > >    #endif
> > > > 
> > > > $ clang -cc1 -ast-print test.c -o -
> > > > 
> > > >    struct foo {
> > > >        int x;
> > > >    };
> > > >    const int x = 1;
> > > > 
> > > > $ clang -E test.c -o -
> > > > 
> > > >    # ... some line directives ...
> > > >    struct foo { int x; };
> > > >    const int x = 2;
> > > 
> > > Is it any chance '-E' could output the same one as '-cc1 -ast-print'?
> > > That is, even with -E we could do some semantics analysis
> > > as well, using either current clang semantics analysis or creating
> > > an minimal version of sema analysis in preprocessor itself?
> > 
> > Sema drives consumption of tokens from Preprocessor. Calls to
> > Preprocessor are done on a parsing recursive descent. Extracting a
> > stream of tokens would require an incremental parser instead.
> > 
> > A minimal version of such a parser is possible to implement for C.
> > Matching opening / closing braces and identifiers following the
> > 'struct' / 'union' / 'enum' keywords might be almost sufficient,
> > but I need to try it to be sure (e.g. it is more complex for
> > 'typedef').
> > 
> > I can work on it but I don't think there is a chance to upstream this work.
> 
> Right. It's going to be C only.
> C++ with namespaces and nested class decls won't work with a simple
> type parser.
> 
> On the other side if we're asking preprocessor to look for
> 'struct foo' and remember that 'foo' is a type
> maybe we can add a regex-search instead?
> It would be a bit more generic and will work for basic
> union/struct foo definition?
> Something like instead of:
> #if __is_type_defined(foo)
> use:
> #if regex(struct[\t]+foo)
> 
> enums are harder in this approach, but higher chance to land?
> 
> regex() would mean "search for this pattern in the file until this line".
> 
> Or some other preprocessor "language" tricks?
> 
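
For reference, the "search for this pattern in the file until this line"
semantics can be approximated outside of clang. Below is a minimal sketch
using POSIX <regex.h>. The helper name type_seen_before() and its
parameters are hypothetical, nothing like this exists in clang, and it
assumes `name` is a plain C identifier:

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns true if `struct <name>` or `union <name>`
 * is mentioned on any line of `src` strictly before line `before_line`
 * (1-based). `name` is assumed to be a plain C identifier. */
bool type_seen_before(const char *src, int before_line, const char *name)
{
	char pattern[256];
	regex_t re;
	bool found = false;
	int line = 1;
	int n;

	assert(src && name);
	/* POSIX ERE has no portable \b, so spell out identifier boundaries. */
	n = snprintf(pattern, sizeof(pattern),
		     "(^|[^A-Za-z0-9_])(struct|union)[ \t]+%s([^A-Za-z0-9_]|$)",
		     name);
	if (n < 0 || (size_t)n >= sizeof(pattern))
		return false;
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return false;

	/* Match line by line, stopping before the line that asks the question. */
	while (*src && line < before_line && !found) {
		size_t len = strcspn(src, "\n");
		char *buf = malloc(len + 1);

		if (!buf)
			break;
		memcpy(buf, src, len);
		buf[len] = '\0';
		found = regexec(&re, buf, 0, NULL, 0) == 0;
		free(buf);

		src += len;
		if (*src == '\n')
			src++;
		line++;
	}
	regfree(&re);
	return found;
}
```

Note that a pattern like this matches any mention of the type, including
forward declarations and text inside comments, so it is weaker than
__is_type_defined.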

I talked to Yonghong today and he suggested investigating whether pre-processor
changes could be made BPF target specific. E.g. there are extension points
in the clang pre-processor right now, but those are for tooling. There might be
a way to extend this mechanism to allow target specific pre-processor behavior.
I'll take a look and write another email here.
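
Regarding the minimal C-only parser discussed above, the "identifier
following a 'struct' / 'union' / 'enum' keyword" heuristic could look
roughly like the sketch below. The function has_tagged_definition() is
hypothetical and is not part of the prototype. It only checks for an
opening brace after the tag name; it does not match closing braces and
it ignores comments, string literals, macros and 'typedef':

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

static bool is_ident_char(char c)
{
	return isalnum((unsigned char)c) || c == '_';
}

/* Hypothetical helper: returns true if `src` contains `<kw> <name> {`,
 * i.e. a tagged type with a body, where `kw` is "struct", "union" or
 * "enum". Forward declarations (`struct foo;`) are not counted. */
bool has_tagged_definition(const char *src, const char *kw, const char *name)
{
	assert(src && kw && name);
	size_t kw_len = strlen(kw), name_len = strlen(name);
	const char *p = src;

	if (kw_len == 0 || name_len == 0)
		return false;

	while ((p = strstr(p, kw)) != NULL) {
		const char *q = p + kw_len;
		/* The keyword must not be the tail of a longer identifier. */
		bool starts_token = (p == src) || !is_ident_char(p[-1]);

		p = q; /* resume the search after this occurrence */
		if (!starts_token || !isspace((unsigned char)*q))
			continue;
		while (isspace((unsigned char)*q))
			q++;
		/* The tag must be exactly `name`, not a longer identifier. */
		if (strncmp(q, name, name_len) != 0 || is_ident_char(q[name_len]))
			continue;
		q += name_len;
		while (isspace((unsigned char)*q))
			q++;
		if (*q == '{')
			return true;
	}
	return false;
}
```

For the test.c example above, has_tagged_definition(text, "struct", "foo")
would report the definition, while a bare `struct foo;` would not count.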

> For example:
> The preprocessor would grep for 'struct *' in a single line
> while processing a file and emit #define __secret_prefix_##$1
> where $1 would be a capture from "single line regex".
> Then later in the same file instead of:
> #if __is_type_defined(foo)
> use:
> #ifdef __secret_prefix_foo
> 
> This "single line regex" may look like:
> #if regex_in_any_later_line(struct[\t]+[a-zA-Z_]+) define __secret_prefix_$2
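
To make the "single line regex" idea concrete, here is a small sketch of
the marker generation step written as a standalone helper on top of POSIX
<regex.h>. The name emit_marker() is hypothetical, and the
__secret_prefix_ spelling is just taken from the example above. A real
implementation would have to live inside the pre-processor:

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if `line` mentions `struct <ident>`, write
 * "#define __secret_prefix_<ident>" into `out` and return true.
 * Returns false when there is no match or `out` is too small. */
bool emit_marker(const char *line, char *out, size_t out_sz)
{
	static const char prefix[] = "#define __secret_prefix_";
	const size_t prefix_len = sizeof(prefix) - 1;
	regex_t re;
	regmatch_t m[3];
	bool ok = false;

	assert(line && out);
	/* Group 2 captures the identifier that follows `struct`. */
	if (regcomp(&re,
		    "(^|[^A-Za-z0-9_])struct[ \t]+([A-Za-z_][A-Za-z0-9_]*)",
		    REG_EXTENDED) != 0)
		return false;

	if (regexec(&re, line, 3, m, 0) == 0) {
		size_t len = (size_t)(m[2].rm_eo - m[2].rm_so);

		if (prefix_len + len + 1 <= out_sz) {
			memcpy(out, prefix, prefix_len);
			memcpy(out + prefix_len, line + m[2].rm_so, len);
			out[prefix_len + len] = '\0';
			ok = true;
		}
	}
	regfree(&re);
	return ok;
}
```

With such markers in place, `#ifdef __secret_prefix_foo` is an ordinary
pre-processor check that does not need any information from Sema.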



end of thread, other threads:[~2022-11-16  2:01 UTC | newest]

Thread overview: 46+ messages
2022-10-25 22:27 [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h Eduard Zingerman
2022-10-25 22:27 ` [RFC bpf-next 01/12] libbpf: Deduplicate unambigous standalone forward declarations Eduard Zingerman
2022-10-27 22:07   ` Andrii Nakryiko
2022-10-31  1:00     ` Eduard Zingerman
2022-10-31 15:49     ` Eduard Zingerman
2022-11-01 17:08       ` Alan Maguire
2022-11-01 17:37         ` Eduard Zingerman
2022-10-25 22:27 ` [RFC bpf-next 02/12] selftests/bpf: Tests for standalone forward BTF declarations deduplication Eduard Zingerman
2022-10-25 22:27 ` [RFC bpf-next 03/12] libbpf: Support for BTF_DECL_TAG dump in C format Eduard Zingerman
2022-10-27 22:36   ` Andrii Nakryiko
2022-10-25 22:27 ` [RFC bpf-next 04/12] selftests/bpf: Tests " Eduard Zingerman
2022-10-25 22:27 ` [RFC bpf-next 05/12] libbpf: Header guards for selected data structures in vmlinux.h Eduard Zingerman
2022-10-27 22:44   ` Andrii Nakryiko
2022-10-25 22:27 ` [RFC bpf-next 06/12] selftests/bpf: Tests for header guards printing in BTF dump Eduard Zingerman
2022-10-25 22:27 ` [RFC bpf-next 07/12] bpftool: Enable header guards generation Eduard Zingerman
2022-10-25 22:27 ` [RFC bpf-next 08/12] kbuild: Script to infer header guard values for uapi headers Eduard Zingerman
2022-10-27 22:51   ` Andrii Nakryiko
2022-10-25 22:27 ` [RFC bpf-next 09/12] kbuild: Header guards for types from include/uapi/*.h in kernel BTF Eduard Zingerman
2022-10-27 18:43   ` Yonghong Song
2022-10-27 18:55     ` Yonghong Song
2022-10-27 22:44       ` Yonghong Song
2022-10-28  0:00         ` Eduard Zingerman
2022-10-28  0:14           ` Mykola Lysenko
2022-10-28  1:23             ` Yonghong Song
2022-10-28  1:21           ` Yonghong Song
2022-10-25 22:27 ` [RFC bpf-next 10/12] selftests/bpf: Script to verify uapi headers usage with vmlinux.h Eduard Zingerman
2022-10-25 22:28 ` [RFC bpf-next 11/12] selftests/bpf: Known good uapi headers for test_uapi_headers.py Eduard Zingerman
2022-10-25 22:28 ` [RFC bpf-next 12/12] selftests/bpf: script for infer_header_guards.pl testing Eduard Zingerman
2022-10-25 23:46 ` [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h Alexei Starovoitov
2022-10-26 22:46   ` Eduard Zingerman
2022-10-26 11:10 ` Alan Maguire
2022-10-26 23:54   ` Eduard Zingerman
2022-10-27 23:14 ` Andrii Nakryiko
2022-10-28  1:33   ` Yonghong Song
2022-10-28 17:13     ` Andrii Nakryiko
2022-10-28 18:56       ` Yonghong Song
2022-10-28 21:35         ` Andrii Nakryiko
2022-11-01 16:01           ` Alan Maguire
2022-11-01 18:35             ` Alexei Starovoitov
2022-11-01 19:21               ` Eduard Zingerman
2022-11-01 19:44                 ` Alexei Starovoitov
2022-11-11 21:55         ` Eduard Zingerman
2022-11-14  7:52           ` Yonghong Song
2022-11-14 21:13             ` Eduard Zingerman
2022-11-14 21:50               ` Alexei Starovoitov
2022-11-16  2:01                 ` Eduard Zingerman
