bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next v3 0/3] libbpf: Support symbol versioning for uprobe
@ 2023-09-11  1:50 Hengqi Chen
  2023-09-11  1:50 ` [PATCH bpf-next v3 1/3] libbpf: Resolve symbol conflicts at the same offset " Hengqi Chen
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Hengqi Chen @ 2023-09-11  1:50 UTC (permalink / raw)
  To: bpf; +Cc: ast, daniel, andrii, alan.maguire, olsajiri, hengqi.chen

Dynamic symbols in shared library may have the same name, for example:

    $ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
    000000000009b1a0 T __pthread_rwlock_wrlock@GLIBC_2.2.5
    000000000009b1a0 T pthread_rwlock_wrlock@@GLIBC_2.34
    000000000009b1a0 T pthread_rwlock_wrlock@GLIBC_2.2.5

    $ readelf -W --dyn-syms /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
      706: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 __pthread_rwlock_wrlock@GLIBC_2.2.5
      2568: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@@GLIBC_2.34
      2571: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@GLIBC_2.2.5

There are two pthread_rwlock_wrlock symbols in libc.so .dynsym section.
The one with @@ is the default version, the other is hidden.
Note that the version info is stored in .gnu.version and .gnu.version_d
sections of libc and the two symbols are at the _same_ offset.

Currently, specify `pthread_rwlock_wrlock`, `pthread_rwlock_wrlock@@GLIBC_2.34`
or `pthread_rwlock_wrlock@GLIBC_2.2.5` in bpf_uprobe_opts::func_name won't work.
Because there are two `pthread_rwlock_wrlock` in .dynsym sections without the
version suffix and both are global bind.

We could solve this by introducing symbol versioning ([0]). So that users can
specify func, func@LIB_VERSION or func@@LIB_VERSION to attach a uprobe.

This patchset resolves symbol conflicts and add symbol versioning for uprobe.
  - Patch 1 resolves symbol conflicts at the same offset
  - Patch 2 adds symbol versioning for dynsym
  - Patch 3 adds selftests for the above changes

Changes from v2:
  - Add uretprobe selfttest (Alan)
  - Check symbol exact match (Alan)
  - Fix typo (Jiri)

Changes from v1:
  - Address comments from Alan and Jiri
  - Add selftests (Someone reminds me that there is an attempt at [1]
    and part of the selftest code from Andrii is taken from there)

  [0]: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/symversion.html
  [1]: https://lore.kernel.org/lkml/CAEf4BzZTrjjyyOm3ak9JsssPSh6T_ZmGd677a2rt5e5rBLUrpQ@mail.gmail.com/

Hengqi Chen (3):
  libbpf: Resolve symbol conflicts at the same offset for uprobe
  libbpf: Support symbol versioning for uprobe
  selftests/bpf: Add tests for symbol versioning for uprobe

 tools/lib/bpf/elf.c                           | 150 ++++++++++++++++--
 tools/lib/bpf/libbpf.c                        |   2 +-
 tools/testing/selftests/bpf/Makefile          |   5 +-
 .../testing/selftests/bpf/liburandom_read.map |  15 ++
 .../testing/selftests/bpf/prog_tests/uprobe.c |  95 +++++++++++
 .../testing/selftests/bpf/progs/test_uprobe.c |  61 +++++++
 tools/testing/selftests/bpf/urandom_read.c    |   9 ++
 .../testing/selftests/bpf/urandom_read_lib1.c |  41 +++++
 8 files changed, 361 insertions(+), 17 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/liburandom_read.map
 create mode 100644 tools/testing/selftests/bpf/prog_tests/uprobe.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_uprobe.c

--
2.34.1

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH bpf-next v3 1/3] libbpf: Resolve symbol conflicts at the same offset for uprobe
  2023-09-11  1:50 [PATCH bpf-next v3 0/3] libbpf: Support symbol versioning for uprobe Hengqi Chen
@ 2023-09-11  1:50 ` Hengqi Chen
  2023-09-11  1:50 ` [PATCH bpf-next v3 2/3] libbpf: Support symbol versioning " Hengqi Chen
  2023-09-11  1:50 ` [PATCH bpf-next v3 3/3] selftests/bpf: Add tests for " Hengqi Chen
  2 siblings, 0 replies; 10+ messages in thread
From: Hengqi Chen @ 2023-09-11  1:50 UTC (permalink / raw)
  To: bpf; +Cc: ast, daniel, andrii, alan.maguire, olsajiri, hengqi.chen, Jiri Olsa

Dynamic symbols in shared library may have the same name, for example:

    $ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
    000000000009b1a0 T __pthread_rwlock_wrlock@GLIBC_2.2.5
    000000000009b1a0 T pthread_rwlock_wrlock@@GLIBC_2.34
    000000000009b1a0 T pthread_rwlock_wrlock@GLIBC_2.2.5

    $ readelf -W --dyn-syms /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
     706: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 __pthread_rwlock_wrlock@GLIBC_2.2.5
    2568: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@@GLIBC_2.34
    2571: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@GLIBC_2.2.5

Currently, users can't attach a uprobe to pthread_rwlock_wrlock because
there are two symbols named pthread_rwlock_wrlock and both are global
bind. And libbpf considers it as a conflict.

Since both of them are at the same offset we could accept one of them
harmlessly. Note that we already does this in elf_resolve_syms_offsets.

Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
---
 tools/lib/bpf/elf.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/elf.c b/tools/lib/bpf/elf.c
index 9d0296c1726a..5c9e588b17da 100644
--- a/tools/lib/bpf/elf.c
+++ b/tools/lib/bpf/elf.c
@@ -214,7 +214,10 @@ long elf_find_func_offset(Elf *elf, const char *binary_path, const char *name)
 
 			if (ret > 0) {
 				/* handle multiple matches */
-				if (last_bind != STB_WEAK && cur_bind != STB_WEAK) {
+				if (elf_sym_offset(sym) == ret) {
+					/* same offset, no problem */
+					continue;
+				} else if (last_bind != STB_WEAK && cur_bind != STB_WEAK) {
 					/* Only accept one non-weak bind. */
 					pr_warn("elf: ambiguous match for '%s', '%s' in '%s'\n",
 						sym->name, name, binary_path);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH bpf-next v3 2/3] libbpf: Support symbol versioning for uprobe
  2023-09-11  1:50 [PATCH bpf-next v3 0/3] libbpf: Support symbol versioning for uprobe Hengqi Chen
  2023-09-11  1:50 ` [PATCH bpf-next v3 1/3] libbpf: Resolve symbol conflicts at the same offset " Hengqi Chen
@ 2023-09-11  1:50 ` Hengqi Chen
  2023-09-12 23:14   ` Andrii Nakryiko
  2023-09-11  1:50 ` [PATCH bpf-next v3 3/3] selftests/bpf: Add tests for " Hengqi Chen
  2 siblings, 1 reply; 10+ messages in thread
From: Hengqi Chen @ 2023-09-11  1:50 UTC (permalink / raw)
  To: bpf; +Cc: ast, daniel, andrii, alan.maguire, olsajiri, hengqi.chen, Jiri Olsa

In current implementation, we assume that symbol found in .dynsym section
would have a version suffix and use it to compare with symbol user supplied.
According to the spec ([0]), this assumption is incorrect, the version info
of dynamic symbols are stored in .gnu.version and .gnu.version_d sections
of ELF objects. For example:

    $ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
    000000000009b1a0 T __pthread_rwlock_wrlock@GLIBC_2.2.5
    000000000009b1a0 T pthread_rwlock_wrlock@@GLIBC_2.34
    000000000009b1a0 T pthread_rwlock_wrlock@GLIBC_2.2.5

    $ readelf -W --dyn-syms /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
      706: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 __pthread_rwlock_wrlock@GLIBC_2.2.5
      2568: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@@GLIBC_2.34
      2571: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@GLIBC_2.2.5

In this case, specify pthread_rwlock_wrlock@@GLIBC_2.34 or
pthread_rwlock_wrlock@GLIBC_2.2.5 in bpf_uprobe_opts::func_name won't work.
Because the qualified name does NOT match `pthread_rwlock_wrlock` (without
version suffix) in .dynsym sections.

This commit implements the symbol versioning for dynsym and allows user to
specify symbol in the following forms:
  - func
  - func@LIB_VERSION
  - func@@LIB_VERSION

In case of symbol conflicts, error out and users should resolve it by
specifying a qualified name.

  [0]: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/symversion.html

Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
---
 tools/lib/bpf/elf.c    | 146 +++++++++++++++++++++++++++++++++++++----
 tools/lib/bpf/libbpf.c |   2 +-
 2 files changed, 134 insertions(+), 14 deletions(-)

diff --git a/tools/lib/bpf/elf.c b/tools/lib/bpf/elf.c
index 5c9e588b17da..825da903a34c 100644
--- a/tools/lib/bpf/elf.c
+++ b/tools/lib/bpf/elf.c
@@ -1,5 +1,8 @@
 // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
 
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
 #include <libelf.h>
 #include <gelf.h>
 #include <fcntl.h>
@@ -10,6 +13,17 @@
 
 #define STRERR_BUFSIZE  128
 
+/* A SHT_GNU_versym section holds 16-bit words. This bit is set if
+ * the symbol is hidden and can only be seen when referenced using an
+ * explicit version number. This is a GNU extension.
+ */
+#define VERSYM_HIDDEN	0x8000
+
+/* This is the mask for the rest of the data in a word read from a
+ * SHT_GNU_versym section.
+ */
+#define VERSYM_VERSION	0x7fff
+
 int elf_open(const char *binary_path, struct elf_fd *elf_fd)
 {
 	char errmsg[STRERR_BUFSIZE];
@@ -64,11 +78,14 @@ struct elf_sym {
 	const char *name;
 	GElf_Sym sym;
 	GElf_Shdr sh;
+	int ver;
+	bool hidden;
 };
 
 struct elf_sym_iter {
 	Elf *elf;
 	Elf_Data *syms;
+	Elf_Data *versyms;
 	size_t nr_syms;
 	size_t strtabidx;
 	size_t next_sym_idx;
@@ -111,6 +128,18 @@ static int elf_sym_iter_new(struct elf_sym_iter *iter,
 	iter->nr_syms = iter->syms->d_size / sh.sh_entsize;
 	iter->elf = elf;
 	iter->st_type = st_type;
+
+	/* Version symbol table is meaningful to dynsym only */
+	if (sh_type != SHT_DYNSYM)
+		return 0;
+
+	scn = elf_find_next_scn_by_type(elf, SHT_GNU_versym, NULL);
+	if (!scn)
+		return 0;
+	if (!gelf_getshdr(scn, &sh))
+		return -EINVAL;
+	iter->versyms = elf_getdata(scn, 0);
+
 	return 0;
 }
 
@@ -119,6 +148,7 @@ static struct elf_sym *elf_sym_iter_next(struct elf_sym_iter *iter)
 	struct elf_sym *ret = &iter->sym;
 	GElf_Sym *sym = &ret->sym;
 	const char *name = NULL;
+	GElf_Versym versym;
 	Elf_Scn *sym_scn;
 	size_t idx;
 
@@ -138,12 +168,113 @@ static struct elf_sym *elf_sym_iter_next(struct elf_sym_iter *iter)
 
 		iter->next_sym_idx = idx + 1;
 		ret->name = name;
+		ret->ver = 0;
+		ret->hidden = false;
+
+		if (iter->versyms) {
+			if (!gelf_getversym(iter->versyms, idx, &versym))
+				continue;
+			ret->ver = versym & VERSYM_VERSION;
+			ret->hidden = versym & VERSYM_HIDDEN;
+		}
 		return ret;
 	}
 
 	return NULL;
 }
 
+static const char *elf_get_vername(Elf *elf, int ver)
+{
+	GElf_Verdaux verdaux;
+	GElf_Verdef verdef;
+	Elf_Data *verdefs;
+	size_t strtabidx;
+	GElf_Shdr sh;
+	Elf_Scn *scn;
+	int offset;
+
+	scn = elf_find_next_scn_by_type(elf, SHT_GNU_verdef, NULL);
+	if (!scn)
+		return NULL;
+	if (!gelf_getshdr(scn, &sh))
+		return NULL;
+	strtabidx = sh.sh_link;
+	verdefs =  elf_getdata(scn, 0);
+
+	offset = 0;
+	while (gelf_getverdef(verdefs, offset, &verdef)) {
+		if (verdef.vd_ndx != ver) {
+			if (!verdef.vd_next)
+				break;
+
+			offset += verdef.vd_next;
+			continue;
+		}
+
+		if (!gelf_getverdaux(verdefs, offset + verdef.vd_aux, &verdaux))
+			break;
+
+		return elf_strptr(elf, strtabidx, verdaux.vda_name);
+
+	}
+	return NULL;
+}
+
+static bool symbol_match(Elf *elf, int sh_type, struct elf_sym *sym, const char *name)
+{
+	size_t name_len, sname_len;
+	bool is_name_qualified;
+	const char *ver;
+	char *sname;
+	int ret;
+
+	name_len = strlen(name);
+	/* Does name specify "@LIB" or "@@LIB" ? */
+	is_name_qualified = strstr(name, "@") != NULL;
+
+	/* If user specify a qualified name, for dynamic symbol,
+	 * it is in form of func, NOT func@LIB_VER or func@@LIB_VER.
+	 * So construct a full quailified symbol name using versym info
+	 * for comparison.
+	 */
+	if (is_name_qualified && sh_type == SHT_DYNSYM) {
+		/* Make sure func match func@LIB_VER */
+		sname_len = strlen(sym->name);
+		if (strncmp(sym->name, name, sname_len) != 0)
+			return false;
+
+		/* But not func2@LIB_VER */
+		if (name[sname_len] != '@')
+			return false;
+
+		ver = elf_get_vername(elf, sym->ver);
+		if (!ver)
+			return false;
+
+		ret = asprintf(&sname, "%s%s%s", sym->name,
+			       sym->hidden ? "@" : "@@", ver);
+		if (ret == -1)
+			return false;
+
+		sname_len = ret;
+		ret = strncmp(sname, name, sname_len);
+		free(sname);
+		return ret == 0;
+	}
+
+	/* Otherwise, for normal symbols or non-qualified names
+	 * User can specify func, func@@LIB or func@@LIB_VERSION.
+	 */
+	if (strncmp(sym->name, name, name_len) != 0)
+		return false;
+	/* ...but we don't want a search for "foo" to match 'foo2" also, so any
+	 * additional characters in sname should be of the form "@LIB" or "@@LIB".
+	 */
+	if (!is_name_qualified && sym->name[name_len] != '\0' && sym->name[name_len] != '@')
+		return false;
+
+	return true;
+}
 
 /* Transform symbol's virtual address (absolute for binaries and relative
  * for shared libs) into file offset, which is what kernel is expecting
@@ -166,9 +297,8 @@ static unsigned long elf_sym_offset(struct elf_sym *sym)
 long elf_find_func_offset(Elf *elf, const char *binary_path, const char *name)
 {
 	int i, sh_types[2] = { SHT_DYNSYM, SHT_SYMTAB };
-	bool is_shared_lib, is_name_qualified;
+	bool is_shared_lib;
 	long ret = -ENOENT;
-	size_t name_len;
 	GElf_Ehdr ehdr;
 
 	if (!gelf_getehdr(elf, &ehdr)) {
@@ -179,10 +309,6 @@ long elf_find_func_offset(Elf *elf, const char *binary_path, const char *name)
 	/* for shared lib case, we do not need to calculate relative offset */
 	is_shared_lib = ehdr.e_type == ET_DYN;
 
-	name_len = strlen(name);
-	/* Does name specify "@@LIB"? */
-	is_name_qualified = strstr(name, "@@") != NULL;
-
 	/* Search SHT_DYNSYM, SHT_SYMTAB for symbol. This search order is used because if
 	 * a binary is stripped, it may only have SHT_DYNSYM, and a fully-statically
 	 * linked binary may not have SHT_DYMSYM, so absence of a section should not be
@@ -201,13 +327,7 @@ long elf_find_func_offset(Elf *elf, const char *binary_path, const char *name)
 			goto out;
 
 		while ((sym = elf_sym_iter_next(&iter))) {
-			/* User can specify func, func@@LIB or func@@LIB_VERSION. */
-			if (strncmp(sym->name, name, name_len) != 0)
-				continue;
-			/* ...but we don't want a search for "foo" to match 'foo2" also, so any
-			 * additional characters in sname should be of the form "@@LIB".
-			 */
-			if (!is_name_qualified && sym->name[name_len] != '\0' && sym->name[name_len] != '@')
+			if (!symbol_match(elf, sh_types[i], sym, name))
 				continue;
 
 			cur_bind = GELF_ST_BIND(sym->sym.st_info);
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 96ff1aa4bf6a..30b8f96820a7 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -11512,7 +11512,7 @@ static int attach_uprobe(const struct bpf_program *prog, long cookie, struct bpf
 
 	*link = NULL;
 
-	n = sscanf(prog->sec_name, "%m[^/]/%m[^:]:%m[a-zA-Z0-9_.]+%li",
+	n = sscanf(prog->sec_name, "%m[^/]/%m[^:]:%m[a-zA-Z0-9_.@]+%li",
 		   &probe_type, &binary_path, &func_name, &offset);
 	switch (n) {
 	case 1:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH bpf-next v3 3/3] selftests/bpf: Add tests for symbol versioning for uprobe
  2023-09-11  1:50 [PATCH bpf-next v3 0/3] libbpf: Support symbol versioning for uprobe Hengqi Chen
  2023-09-11  1:50 ` [PATCH bpf-next v3 1/3] libbpf: Resolve symbol conflicts at the same offset " Hengqi Chen
  2023-09-11  1:50 ` [PATCH bpf-next v3 2/3] libbpf: Support symbol versioning " Hengqi Chen
@ 2023-09-11  1:50 ` Hengqi Chen
  2023-09-12 23:19   ` Andrii Nakryiko
  2 siblings, 1 reply; 10+ messages in thread
From: Hengqi Chen @ 2023-09-11  1:50 UTC (permalink / raw)
  To: bpf; +Cc: ast, daniel, andrii, alan.maguire, olsajiri, hengqi.chen, Jiri Olsa

This exercises the newly added dynsym symbol versioning logics.
Now we accept symbols in form of func, func@LIB_VERSION or
func@@LIB_VERSION.

The test rely on liburandom_read.so. For liburandom_read.so, we have:

    $ nm -D liburandom_read.so
                     w __cxa_finalize@GLIBC_2.17
                     w __gmon_start__
                     w _ITM_deregisterTMCloneTable
                     w _ITM_registerTMCloneTable
    0000000000000000 A LIBURANDOM_READ_1.0.0
    0000000000000000 A LIBURANDOM_READ_2.0.0
    000000000000081c T urandlib_api@@LIBURANDOM_READ_2.0.0
    0000000000000814 T urandlib_api@LIBURANDOM_READ_1.0.0
    0000000000000824 T urandlib_api_sameoffset@LIBURANDOM_READ_1.0.0
    0000000000000824 T urandlib_api_sameoffset@@LIBURANDOM_READ_2.0.0
    000000000000082c T urandlib_read_without_sema@@LIBURANDOM_READ_1.0.0
    00000000000007c4 T urandlib_read_with_sema@@LIBURANDOM_READ_1.0.0
    0000000000011018 D urandlib_read_with_sema_semaphore@@LIBURANDOM_READ_1.0.0

For `urandlib_api`, specifying `urandlib_api` will cause a conflict because
there are two symbols named urandlib_api and both are global bind.
For `urandlib_api_sameoffset`, there are also two symbols in the .so, but
both are at the same offset and essentially they refer to the same function
so no conflict.

Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
---
 tools/testing/selftests/bpf/Makefile          |  5 +-
 .../testing/selftests/bpf/liburandom_read.map | 15 +++
 .../testing/selftests/bpf/prog_tests/uprobe.c | 95 +++++++++++++++++++
 .../testing/selftests/bpf/progs/test_uprobe.c | 61 ++++++++++++
 tools/testing/selftests/bpf/urandom_read.c    |  9 ++
 .../testing/selftests/bpf/urandom_read_lib1.c | 41 ++++++++
 6 files changed, 224 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/liburandom_read.map
 create mode 100644 tools/testing/selftests/bpf/prog_tests/uprobe.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_uprobe.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index caede9b574cb..47365161b6fc 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -196,11 +196,12 @@ endif
 
 # Filter out -static for liburandom_read.so and its dependent targets so that static builds
 # do not fail. Static builds leave urandom_read relying on system-wide shared libraries.
-$(OUTPUT)/liburandom_read.so: urandom_read_lib1.c urandom_read_lib2.c
+$(OUTPUT)/liburandom_read.so: urandom_read_lib1.c urandom_read_lib2.c liburandom_read.map
 	$(call msg,LIB,,$@)
 	$(Q)$(CLANG) $(filter-out -static,$(CFLAGS) $(LDFLAGS))   \
-		     $^ $(filter-out -static,$(LDLIBS))	     \
+		     $(filter %.c,$^) $(filter-out -static,$(LDLIBS)) \
 		     -fuse-ld=$(LLD) -Wl,-znoseparate-code -Wl,--build-id=sha1 \
+		     -Wl,--version-script=liburandom_read.map \
 		     -fPIC -shared -o $@
 
 $(OUTPUT)/urandom_read: urandom_read.c urandom_read_aux.c $(OUTPUT)/liburandom_read.so
diff --git a/tools/testing/selftests/bpf/liburandom_read.map b/tools/testing/selftests/bpf/liburandom_read.map
new file mode 100644
index 000000000000..38a97a419a04
--- /dev/null
+++ b/tools/testing/selftests/bpf/liburandom_read.map
@@ -0,0 +1,15 @@
+LIBURANDOM_READ_1.0.0 {
+	global:
+		urandlib_api;
+		urandlib_api_sameoffset;
+		urandlib_read_without_sema;
+		urandlib_read_with_sema;
+		urandlib_read_with_sema_semaphore;
+	local:
+		*;
+};
+
+LIBURANDOM_READ_2.0.0 {
+	global:
+		urandlib_api;
+} LIBURANDOM_READ_1.0.0;
diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe.c b/tools/testing/selftests/bpf/prog_tests/uprobe.c
new file mode 100644
index 000000000000..cf3e0e7a64fa
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe.c
@@ -0,0 +1,95 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Hengqi Chen */
+
+#include <test_progs.h>
+#include "test_uprobe.skel.h"
+
+static FILE *urand_spawn(int *pid)
+{
+	FILE *f;
+
+	/* urandom_read's stdout is wired into f */
+	f = popen("./urandom_read 1 report-pid", "r");
+	if (!f)
+		return NULL;
+
+	if (fscanf(f, "%d", pid) != 1) {
+		pclose(f);
+		errno = EINVAL;
+		return NULL;
+	}
+
+	return f;
+}
+
+static int urand_trigger(FILE **urand_pipe)
+{
+	int exit_code;
+
+	/* pclose() waits for child process to exit and returns their exit code */
+	exit_code = pclose(*urand_pipe);
+	*urand_pipe = NULL;
+
+	return exit_code;
+}
+
+void test_uprobe(void)
+{
+	LIBBPF_OPTS(bpf_uprobe_opts, uprobe_opts);
+	struct test_uprobe *skel;
+	FILE *urand_pipe = NULL;
+	int urand_pid = 0, err;
+
+	skel = test_uprobe__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return;
+
+	urand_pipe = urand_spawn(&urand_pid);
+	if (!ASSERT_OK_PTR(urand_pipe, "urand_spawn"))
+		goto cleanup;
+
+	skel->bss->my_pid = urand_pid;
+
+	/* Manual attach uprobe to urandlib_api
+	 * There are two `urandlib_api` symbols in .dynsym section:
+	 *   - urandlib_api@LIBURANDOM_READ_1.0.0
+	 *   - urandlib_api@@LIBURANDOM_READ_2.0.0
+	 * Both are global bind and would cause a conflict if user
+	 * specify the symbol name without a version suffix
+	 */
+	uprobe_opts.func_name = "urandlib_api";
+	skel->links.test4 = bpf_program__attach_uprobe_opts(skel->progs.test4,
+							    urand_pid,
+							    "./liburandom_read.so",
+							    0 /* offset */,
+							    &uprobe_opts);
+	if (!ASSERT_ERR_PTR(skel->links.test4, "urandlib_api_attach_conflict"))
+		goto cleanup;
+
+	uprobe_opts.func_name = "urandlib_api@LIBURANDOM_READ_1.0.0";
+	skel->links.test4 = bpf_program__attach_uprobe_opts(skel->progs.test4,
+							    urand_pid,
+							    "./liburandom_read.so",
+							    0 /* offset */,
+							    &uprobe_opts);
+	if (!ASSERT_OK_PTR(skel->links.test4, "urandlib_api_attach_ok"))
+		goto cleanup;
+
+	/* Auto attach 3 u[ret]probes to urandlib_api_sameoffset */
+	err = test_uprobe__attach(skel);
+	if (!ASSERT_OK(err, "skel_attach"))
+		goto cleanup;
+
+	/* trigger urandom_read */
+	ASSERT_OK(urand_trigger(&urand_pipe), "urand_exit_code");
+
+	ASSERT_EQ(skel->bss->test1_result, 1, "urandlib_api_sameoffset");
+	ASSERT_EQ(skel->bss->test2_result, 1, "urandlib_api_sameoffset@v1");
+	ASSERT_EQ(skel->bss->test3_result, 3, "urandlib_api_sameoffset@@v2");
+	ASSERT_EQ(skel->bss->test4_result, 1, "urandlib_api");
+
+cleanup:
+	if (urand_pipe)
+		pclose(urand_pipe);
+	test_uprobe__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_uprobe.c b/tools/testing/selftests/bpf/progs/test_uprobe.c
new file mode 100644
index 000000000000..896c88a4960d
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_uprobe.c
@@ -0,0 +1,61 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Hengqi Chen */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+pid_t my_pid = 0;
+
+int test1_result = 0;
+int test2_result = 0;
+int test3_result = 0;
+int test4_result = 0;
+
+SEC("uprobe/./liburandom_read.so:urandlib_api_sameoffset")
+int BPF_UPROBE(test1)
+{
+	pid_t pid = bpf_get_current_pid_tgid() >> 32;
+
+	if (pid != my_pid)
+		return 0;
+
+	test1_result = 1;
+	return 0;
+}
+
+SEC("uprobe/./liburandom_read.so:urandlib_api_sameoffset@LIBURANDOM_READ_1.0.0")
+int BPF_UPROBE(test2)
+{
+	pid_t pid = bpf_get_current_pid_tgid() >> 32;
+
+	if (pid != my_pid)
+		return 0;
+
+	test2_result = 1;
+	return 0;
+}
+
+SEC("uretprobe/./liburandom_read.so:urandlib_api_sameoffset@@LIBURANDOM_READ_2.0.0")
+int BPF_URETPROBE(test3, int ret)
+{
+	pid_t pid = bpf_get_current_pid_tgid() >> 32;
+
+	if (pid != my_pid)
+		return 0;
+
+	test3_result = ret;
+	return 0;
+}
+
+SEC("uprobe")
+int BPF_UPROBE(test4)
+{
+	pid_t pid = bpf_get_current_pid_tgid() >> 32;
+
+	if (pid != my_pid)
+		return 0;
+
+	test4_result = 1;
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/urandom_read.c b/tools/testing/selftests/bpf/urandom_read.c
index e92644d0fa75..b28e910a8fbb 100644
--- a/tools/testing/selftests/bpf/urandom_read.c
+++ b/tools/testing/selftests/bpf/urandom_read.c
@@ -21,6 +21,11 @@ void urand_read_without_sema(int iter_num, int iter_cnt, int read_sz);
 void urandlib_read_with_sema(int iter_num, int iter_cnt, int read_sz);
 void urandlib_read_without_sema(int iter_num, int iter_cnt, int read_sz);
 
+int urandlib_api(void);
+__asm__(".symver urandlib_api_old,urandlib_api@LIBURANDOM_READ_1.0.0");
+int urandlib_api_old(void);
+int urandlib_api_sameoffset(void);
+
 unsigned short urand_read_with_sema_semaphore SEC(".probes");
 
 static __attribute__((noinline))
@@ -83,6 +88,10 @@ int main(int argc, char *argv[])
 
 	urandom_read(fd, count);
 
+	urandlib_api();
+	urandlib_api_old();
+	urandlib_api_sameoffset();
+
 	close(fd);
 	return 0;
 }
diff --git a/tools/testing/selftests/bpf/urandom_read_lib1.c b/tools/testing/selftests/bpf/urandom_read_lib1.c
index 86186e24b740..403b0735e223 100644
--- a/tools/testing/selftests/bpf/urandom_read_lib1.c
+++ b/tools/testing/selftests/bpf/urandom_read_lib1.c
@@ -11,3 +11,44 @@ void urandlib_read_with_sema(int iter_num, int iter_cnt, int read_sz)
 {
 	STAP_PROBE3(urandlib, read_with_sema, iter_num, iter_cnt, read_sz);
 }
+
+/* Symbol versioning is different between static and shared library.
+ * Properly versioned symbols are needed for shared library, but
+ * only the symbol of the new version is needed for static library.
+ * Starting with GNU C 10, use symver attribute instead of .symver assembler
+ * directive, which works better with GCC LTO builds.
+ */
+#if defined(__GNUC__) && __GNUC__ >= 10
+
+#define DEFAULT_VERSION(internal_name, api_name, version) \
+	__attribute__((symver(#api_name "@@" #version)))
+#define COMPAT_VERSION(internal_name, api_name, version) \
+	__attribute__((symver(#api_name "@" #version)))
+
+#else
+
+#define COMPAT_VERSION(internal_name, api_name, version) \
+	asm(".symver " #internal_name "," #api_name "@" #version);
+#define DEFAULT_VERSION(internal_name, api_name, version) \
+	asm(".symver " #internal_name "," #api_name "@@" #version);
+
+#endif
+
+COMPAT_VERSION(urandlib_api_v1, urandlib_api, LIBURANDOM_READ_1.0.0)
+int urandlib_api_v1(void)
+{
+	return 1;
+}
+
+DEFAULT_VERSION(urandlib_api_v2, urandlib_api, LIBURANDOM_READ_2.0.0)
+int urandlib_api_v2(void)
+{
+	return 2;
+}
+
+COMPAT_VERSION(urandlib_api_sameoffset, urandlib_api_sameoffset, LIBURANDOM_READ_1.0.0)
+DEFAULT_VERSION(urandlib_api_sameoffset, urandlib_api_sameoffset, LIBURANDOM_READ_2.0.0)
+int urandlib_api_sameoffset(void)
+{
+	return 3;
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next v3 2/3] libbpf: Support symbol versioning for uprobe
  2023-09-11  1:50 ` [PATCH bpf-next v3 2/3] libbpf: Support symbol versioning " Hengqi Chen
@ 2023-09-12 23:14   ` Andrii Nakryiko
  2023-09-14 12:36     ` Hengqi Chen
  0 siblings, 1 reply; 10+ messages in thread
From: Andrii Nakryiko @ 2023-09-12 23:14 UTC (permalink / raw)
  To: Hengqi Chen; +Cc: bpf, ast, daniel, andrii, alan.maguire, olsajiri, Jiri Olsa

On Sun, Sep 10, 2023 at 6:51 PM Hengqi Chen <hengqi.chen@gmail.com> wrote:
>
> In current implementation, we assume that symbol found in .dynsym section
> would have a version suffix and use it to compare with symbol user supplied.
> According to the spec ([0]), this assumption is incorrect, the version info
> of dynamic symbols are stored in .gnu.version and .gnu.version_d sections
> of ELF objects. For example:
>
>     $ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
>     000000000009b1a0 T __pthread_rwlock_wrlock@GLIBC_2.2.5
>     000000000009b1a0 T pthread_rwlock_wrlock@@GLIBC_2.34
>     000000000009b1a0 T pthread_rwlock_wrlock@GLIBC_2.2.5
>
>     $ readelf -W --dyn-syms /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
>       706: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 __pthread_rwlock_wrlock@GLIBC_2.2.5
>       2568: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@@GLIBC_2.34
>       2571: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@GLIBC_2.2.5
>
> In this case, specify pthread_rwlock_wrlock@@GLIBC_2.34 or
> pthread_rwlock_wrlock@GLIBC_2.2.5 in bpf_uprobe_opts::func_name won't work.
> Because the qualified name does NOT match `pthread_rwlock_wrlock` (without
> version suffix) in .dynsym sections.
>
> This commit implements the symbol versioning for dynsym and allows user to
> specify symbol in the following forms:
>   - func
>   - func@LIB_VERSION
>   - func@@LIB_VERSION
>
> In case of symbol conflicts, error out and users should resolve it by
> specifying a qualified name.
>
>   [0]: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/symversion.html
>
> Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> Acked-by: Jiri Olsa <jolsa@kernel.org>
> Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
> ---
>  tools/lib/bpf/elf.c    | 146 +++++++++++++++++++++++++++++++++++++----
>  tools/lib/bpf/libbpf.c |   2 +-
>  2 files changed, 134 insertions(+), 14 deletions(-)
>
> diff --git a/tools/lib/bpf/elf.c b/tools/lib/bpf/elf.c
> index 5c9e588b17da..825da903a34c 100644
> --- a/tools/lib/bpf/elf.c
> +++ b/tools/lib/bpf/elf.c
> @@ -1,5 +1,8 @@
>  // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
>
> +#ifndef _GNU_SOURCE
> +#define _GNU_SOURCE
> +#endif
>  #include <libelf.h>
>  #include <gelf.h>
>  #include <fcntl.h>
> @@ -10,6 +13,17 @@
>
>  #define STRERR_BUFSIZE  128
>
> +/* A SHT_GNU_versym section holds 16-bit words. This bit is set if
> + * the symbol is hidden and can only be seen when referenced using an
> + * explicit version number. This is a GNU extension.
> + */
> +#define VERSYM_HIDDEN  0x8000
> +
> +/* This is the mask for the rest of the data in a word read from a
> + * SHT_GNU_versym section.
> + */
> +#define VERSYM_VERSION 0x7fff
> +
>  int elf_open(const char *binary_path, struct elf_fd *elf_fd)
>  {
>         char errmsg[STRERR_BUFSIZE];
> @@ -64,11 +78,14 @@ struct elf_sym {
>         const char *name;
>         GElf_Sym sym;
>         GElf_Shdr sh;
> +       int ver;
> +       bool hidden;
>  };
>
>  struct elf_sym_iter {
>         Elf *elf;
>         Elf_Data *syms;
> +       Elf_Data *versyms;
>         size_t nr_syms;
>         size_t strtabidx;
>         size_t next_sym_idx;
> @@ -111,6 +128,18 @@ static int elf_sym_iter_new(struct elf_sym_iter *iter,
>         iter->nr_syms = iter->syms->d_size / sh.sh_entsize;
>         iter->elf = elf;
>         iter->st_type = st_type;
> +
> +       /* Version symbol table is meaningful to dynsym only */
> +       if (sh_type != SHT_DYNSYM)
> +               return 0;
> +
> +       scn = elf_find_next_scn_by_type(elf, SHT_GNU_versym, NULL);
> +       if (!scn)
> +               return 0;
> +       if (!gelf_getshdr(scn, &sh))
> +               return -EINVAL;
> +       iter->versyms = elf_getdata(scn, 0);
> +
>         return 0;
>  }
>
> @@ -119,6 +148,7 @@ static struct elf_sym *elf_sym_iter_next(struct elf_sym_iter *iter)
>         struct elf_sym *ret = &iter->sym;
>         GElf_Sym *sym = &ret->sym;
>         const char *name = NULL;
> +       GElf_Versym versym;
>         Elf_Scn *sym_scn;
>         size_t idx;
>
> @@ -138,12 +168,113 @@ static struct elf_sym *elf_sym_iter_next(struct elf_sym_iter *iter)
>
>                 iter->next_sym_idx = idx + 1;
>                 ret->name = name;
> +               ret->ver = 0;
> +               ret->hidden = false;
> +
> +               if (iter->versyms) {
> +                       if (!gelf_getversym(iter->versyms, idx, &versym))
> +                               continue;
> +                       ret->ver = versym & VERSYM_VERSION;
> +                       ret->hidden = versym & VERSYM_HIDDEN;
> +               }
>                 return ret;
>         }
>
>         return NULL;
>  }
>
> +static const char *elf_get_vername(Elf *elf, int ver)
> +{
> +       GElf_Verdaux verdaux;
> +       GElf_Verdef verdef;
> +       Elf_Data *verdefs;
> +       size_t strtabidx;
> +       GElf_Shdr sh;
> +       Elf_Scn *scn;
> +       int offset;
> +
> +       scn = elf_find_next_scn_by_type(elf, SHT_GNU_verdef, NULL);

so this is a linear search, right? And we'll be doing it for every
.dynsym symbol. Let's do this once at the creation time and remember a
pointer inside struct Elf?

> +       if (!scn)
> +               return NULL;
> +       if (!gelf_getshdr(scn, &sh))
> +               return NULL;
> +       strtabidx = sh.sh_link;
> +       verdefs =  elf_getdata(scn, 0);
> +
> +       offset = 0;
> +       while (gelf_getverdef(verdefs, offset, &verdef)) {
> +               if (verdef.vd_ndx != ver) {
> +                       if (!verdef.vd_next)
> +                               break;
> +
> +                       offset += verdef.vd_next;
> +                       continue;
> +               }
> +
> +               if (!gelf_getverdaux(verdefs, offset + verdef.vd_aux, &verdaux))
> +                       break;
> +
> +               return elf_strptr(elf, strtabidx, verdaux.vda_name);
> +
> +       }
> +       return NULL;
> +}
> +
> +static bool symbol_match(Elf *elf, int sh_type, struct elf_sym *sym, const char *name)
> +{
> +       size_t name_len, sname_len;
> +       bool is_name_qualified;
> +       const char *ver;
> +       char *sname;
> +       int ret;
> +
> +       name_len = strlen(name);
> +       /* Does name specify "@LIB" or "@@LIB" ? */
> +       is_name_qualified = strstr(name, "@") != NULL;
> +
> +       /* If user specify a qualified name, for dynamic symbol,
> +        * it is in form of func, NOT func@LIB_VER or func@@LIB_VER.
> +        * So construct a full quailified symbol name using versym info

gmail points out typo: qualified

> +        * for comparison.
> +        */
> +       if (is_name_qualified && sh_type == SHT_DYNSYM) {
> +               /* Make sure func match func@LIB_VER */
> +               sname_len = strlen(sym->name);
> +               if (strncmp(sym->name, name, sname_len) != 0)
> +                       return false;
> +
> +               /* But not func2@LIB_VER */
> +               if (name[sname_len] != '@')
> +                       return false;
> +
> +               ver = elf_get_vername(elf, sym->ver);
> +               if (!ver)
> +                       return false;
> +
> +               ret = asprintf(&sname, "%s%s%s", sym->name,
> +                              sym->hidden ? "@" : "@@", ver);
> +               if (ret == -1)

nit: ret < 0, I've spent enough time switching all users of libbpf to
not rely on exact -1 return, let's not show a bad example ;)

> +                       return false;
> +
> +               sname_len = ret;
> +               ret = strncmp(sname, name, sname_len);

why is this strncmp? shouldn't the match be exact? both name is
version-qualified, and current ELF symbol is version-qualified. They
have to exactly match, no?

> +               free(sname);
> +               return ret == 0;
> +       }
> +
> +       /* Otherwise, for normal symbols or non-qualified names
> +        * User can specify func, func@@LIB or func@@LIB_VERSION.
> +        */
> +       if (strncmp(sym->name, name, name_len) != 0)
> +               return false;
> +       /* ...but we don't want a search for "foo" to match 'foo2" also, so any
> +        * additional characters in sname should be of the form "@LIB" or "@@LIB".
> +        */
> +       if (!is_name_qualified && sym->name[name_len] != '\0' && sym->name[name_len] != '@')
> +               return false;
> +
> +       return true;
> +}
>
>  /* Transform symbol's virtual address (absolute for binaries and relative
>   * for shared libs) into file offset, which is what kernel is expecting
> @@ -166,9 +297,8 @@ static unsigned long elf_sym_offset(struct elf_sym *sym)
>  long elf_find_func_offset(Elf *elf, const char *binary_path, const char *name)
>  {
>         int i, sh_types[2] = { SHT_DYNSYM, SHT_SYMTAB };
> -       bool is_shared_lib, is_name_qualified;
> +       bool is_shared_lib;
>         long ret = -ENOENT;
> -       size_t name_len;
>         GElf_Ehdr ehdr;
>
>         if (!gelf_getehdr(elf, &ehdr)) {
> @@ -179,10 +309,6 @@ long elf_find_func_offset(Elf *elf, const char *binary_path, const char *name)
>         /* for shared lib case, we do not need to calculate relative offset */
>         is_shared_lib = ehdr.e_type == ET_DYN;
>
> -       name_len = strlen(name);
> -       /* Does name specify "@@LIB"? */
> -       is_name_qualified = strstr(name, "@@") != NULL;
> -
>         /* Search SHT_DYNSYM, SHT_SYMTAB for symbol. This search order is used because if
>          * a binary is stripped, it may only have SHT_DYNSYM, and a fully-statically
>          * linked binary may not have SHT_DYMSYM, so absence of a section should not be
> @@ -201,13 +327,7 @@ long elf_find_func_offset(Elf *elf, const char *binary_path, const char *name)
>                         goto out;
>
>                 while ((sym = elf_sym_iter_next(&iter))) {
> -                       /* User can specify func, func@@LIB or func@@LIB_VERSION. */
> -                       if (strncmp(sym->name, name, name_len) != 0)
> -                               continue;
> -                       /* ...but we don't want a search for "foo" to match 'foo2" also, so any
> -                        * additional characters in sname should be of the form "@@LIB".
> -                        */
> -                       if (!is_name_qualified && sym->name[name_len] != '\0' && sym->name[name_len] != '@')
> +                       if (!symbol_match(elf, sh_types[i], sym, name))

ok, so let's consider what we are doing here. While previously we did
a relatively expensive strstr() operation once, now we are doing it
for every symbol in ELF. This might add up.

Plus, we then do dynamic allocations with asprintf, which also is kind
of unfortunate.

But let's take a step back. Why we don't determine if the name is
qualified once. Remember what is the length of unqualified name, where
does the version part starts, and pass all that to symbol_match in a
prepared form. Then we don't need to construct "fully qualified" form
of an ELF symbol. We can compare unqual name and version name
separately.

No allocation, no wasted work.

Not sure if we need to care whether we had "@" or "@@" in the
requested symbol, but that's a detail.

>                                 continue;
>
>                         cur_bind = GELF_ST_BIND(sym->sym.st_info);
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 96ff1aa4bf6a..30b8f96820a7 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -11512,7 +11512,7 @@ static int attach_uprobe(const struct bpf_program *prog, long cookie, struct bpf
>
>         *link = NULL;
>
> -       n = sscanf(prog->sec_name, "%m[^/]/%m[^:]:%m[a-zA-Z0-9_.]+%li",
> +       n = sscanf(prog->sec_name, "%m[^/]/%m[^:]:%m[a-zA-Z0-9_.@]+%li",

BTW, while you are at it. Arnaldo was trying to use this SEC("uprobe")
stuff for tracing Go functions. Go doesn't seem to do any mangling, so
function names can have lots of "interesting" symbols ([], @, etc).

If you get a chance, would you mind updating this partsing logic to be
able to accommodate such crazy function names as well? Thanks!

>                    &probe_type, &binary_path, &func_name, &offset);
>         switch (n) {
>         case 1:
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next v3 3/3] selftests/bpf: Add tests for symbol versioning for uprobe
  2023-09-11  1:50 ` [PATCH bpf-next v3 3/3] selftests/bpf: Add tests for " Hengqi Chen
@ 2023-09-12 23:19   ` Andrii Nakryiko
  0 siblings, 0 replies; 10+ messages in thread
From: Andrii Nakryiko @ 2023-09-12 23:19 UTC (permalink / raw)
  To: Hengqi Chen; +Cc: bpf, ast, daniel, andrii, alan.maguire, olsajiri, Jiri Olsa

On Sun, Sep 10, 2023 at 6:51 PM Hengqi Chen <hengqi.chen@gmail.com> wrote:
>
> This exercises the newly added dynsym symbol versioning logics.
> Now we accept symbols in form of func, func@LIB_VERSION or
> func@@LIB_VERSION.
>
> The test rely on liburandom_read.so. For liburandom_read.so, we have:
>
>     $ nm -D liburandom_read.so
>                      w __cxa_finalize@GLIBC_2.17
>                      w __gmon_start__
>                      w _ITM_deregisterTMCloneTable
>                      w _ITM_registerTMCloneTable
>     0000000000000000 A LIBURANDOM_READ_1.0.0
>     0000000000000000 A LIBURANDOM_READ_2.0.0
>     000000000000081c T urandlib_api@@LIBURANDOM_READ_2.0.0
>     0000000000000814 T urandlib_api@LIBURANDOM_READ_1.0.0
>     0000000000000824 T urandlib_api_sameoffset@LIBURANDOM_READ_1.0.0
>     0000000000000824 T urandlib_api_sameoffset@@LIBURANDOM_READ_2.0.0
>     000000000000082c T urandlib_read_without_sema@@LIBURANDOM_READ_1.0.0
>     00000000000007c4 T urandlib_read_with_sema@@LIBURANDOM_READ_1.0.0
>     0000000000011018 D urandlib_read_with_sema_semaphore@@LIBURANDOM_READ_1.0.0
>
> For `urandlib_api`, specifying `urandlib_api` will cause a conflict because
> there are two symbols named urandlib_api and both are global bind.
> For `urandlib_api_sameoffset`, there are also two symbols in the .so, but
> both are at the same offset and essentially they refer to the same function
> so no conflict.
>
> Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> Acked-by: Jiri Olsa <jolsa@kernel.org>
> Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
> ---
>  tools/testing/selftests/bpf/Makefile          |  5 +-
>  .../testing/selftests/bpf/liburandom_read.map | 15 +++
>  .../testing/selftests/bpf/prog_tests/uprobe.c | 95 +++++++++++++++++++
>  .../testing/selftests/bpf/progs/test_uprobe.c | 61 ++++++++++++
>  tools/testing/selftests/bpf/urandom_read.c    |  9 ++
>  .../testing/selftests/bpf/urandom_read_lib1.c | 41 ++++++++
>  6 files changed, 224 insertions(+), 2 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/liburandom_read.map
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/uprobe.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_uprobe.c
>
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index caede9b574cb..47365161b6fc 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -196,11 +196,12 @@ endif
>
>  # Filter out -static for liburandom_read.so and its dependent targets so that static builds
>  # do not fail. Static builds leave urandom_read relying on system-wide shared libraries.
> -$(OUTPUT)/liburandom_read.so: urandom_read_lib1.c urandom_read_lib2.c
> +$(OUTPUT)/liburandom_read.so: urandom_read_lib1.c urandom_read_lib2.c liburandom_read.map
>         $(call msg,LIB,,$@)
>         $(Q)$(CLANG) $(filter-out -static,$(CFLAGS) $(LDFLAGS))   \
> -                    $^ $(filter-out -static,$(LDLIBS))      \
> +                    $(filter %.c,$^) $(filter-out -static,$(LDLIBS)) \
>                      -fuse-ld=$(LLD) -Wl,-znoseparate-code -Wl,--build-id=sha1 \
> +                    -Wl,--version-script=liburandom_read.map \
>                      -fPIC -shared -o $@
>
>  $(OUTPUT)/urandom_read: urandom_read.c urandom_read_aux.c $(OUTPUT)/liburandom_read.so
> diff --git a/tools/testing/selftests/bpf/liburandom_read.map b/tools/testing/selftests/bpf/liburandom_read.map
> new file mode 100644
> index 000000000000..38a97a419a04
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/liburandom_read.map
> @@ -0,0 +1,15 @@
> +LIBURANDOM_READ_1.0.0 {
> +       global:
> +               urandlib_api;
> +               urandlib_api_sameoffset;
> +               urandlib_read_without_sema;
> +               urandlib_read_with_sema;
> +               urandlib_read_with_sema_semaphore;
> +       local:
> +               *;
> +};
> +
> +LIBURANDOM_READ_2.0.0 {
> +       global:
> +               urandlib_api;
> +} LIBURANDOM_READ_1.0.0;
> diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe.c b/tools/testing/selftests/bpf/prog_tests/uprobe.c
> new file mode 100644
> index 000000000000..cf3e0e7a64fa
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/uprobe.c
> @@ -0,0 +1,95 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2023 Hengqi Chen */
> +
> +#include <test_progs.h>
> +#include "test_uprobe.skel.h"
> +
> +static FILE *urand_spawn(int *pid)
> +{
> +       FILE *f;
> +
> +       /* urandom_read's stdout is wired into f */
> +       f = popen("./urandom_read 1 report-pid", "r");
> +       if (!f)
> +               return NULL;
> +
> +       if (fscanf(f, "%d", pid) != 1) {
> +               pclose(f);
> +               errno = EINVAL;
> +               return NULL;
> +       }
> +
> +       return f;
> +}
> +
> +static int urand_trigger(FILE **urand_pipe)
> +{
> +       int exit_code;
> +
> +       /* pclose() waits for child process to exit and returns their exit code */
> +       exit_code = pclose(*urand_pipe);
> +       *urand_pipe = NULL;
> +
> +       return exit_code;
> +}
> +
> +void test_uprobe(void)
> +{
> +       LIBBPF_OPTS(bpf_uprobe_opts, uprobe_opts);
> +       struct test_uprobe *skel;
> +       FILE *urand_pipe = NULL;
> +       int urand_pid = 0, err;
> +
> +       skel = test_uprobe__open_and_load();
> +       if (!ASSERT_OK_PTR(skel, "skel_open"))
> +               return;
> +
> +       urand_pipe = urand_spawn(&urand_pid);
> +       if (!ASSERT_OK_PTR(urand_pipe, "urand_spawn"))
> +               goto cleanup;
> +
> +       skel->bss->my_pid = urand_pid;
> +
> +       /* Manual attach uprobe to urandlib_api
> +        * There are two `urandlib_api` symbols in .dynsym section:
> +        *   - urandlib_api@LIBURANDOM_READ_1.0.0
> +        *   - urandlib_api@@LIBURANDOM_READ_2.0.0
> +        * Both are global bind and would cause a conflict if user
> +        * specify the symbol name without a version suffix
> +        */
> +       uprobe_opts.func_name = "urandlib_api";
> +       skel->links.test4 = bpf_program__attach_uprobe_opts(skel->progs.test4,
> +                                                           urand_pid,
> +                                                           "./liburandom_read.so",
> +                                                           0 /* offset */,
> +                                                           &uprobe_opts);
> +       if (!ASSERT_ERR_PTR(skel->links.test4, "urandlib_api_attach_conflict"))
> +               goto cleanup;
> +
> +       uprobe_opts.func_name = "urandlib_api@LIBURANDOM_READ_1.0.0";
> +       skel->links.test4 = bpf_program__attach_uprobe_opts(skel->progs.test4,
> +                                                           urand_pid,
> +                                                           "./liburandom_read.so",
> +                                                           0 /* offset */,
> +                                                           &uprobe_opts);
> +       if (!ASSERT_OK_PTR(skel->links.test4, "urandlib_api_attach_ok"))
> +               goto cleanup;
> +
> +       /* Auto attach 3 u[ret]probes to urandlib_api_sameoffset */
> +       err = test_uprobe__attach(skel);
> +       if (!ASSERT_OK(err, "skel_attach"))
> +               goto cleanup;
> +
> +       /* trigger urandom_read */
> +       ASSERT_OK(urand_trigger(&urand_pipe), "urand_exit_code");
> +
> +       ASSERT_EQ(skel->bss->test1_result, 1, "urandlib_api_sameoffset");
> +       ASSERT_EQ(skel->bss->test2_result, 1, "urandlib_api_sameoffset@v1");
> +       ASSERT_EQ(skel->bss->test3_result, 3, "urandlib_api_sameoffset@@v2");
> +       ASSERT_EQ(skel->bss->test4_result, 1, "urandlib_api");
> +
> +cleanup:
> +       if (urand_pipe)
> +               pclose(urand_pipe);
> +       test_uprobe__destroy(skel);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/test_uprobe.c b/tools/testing/selftests/bpf/progs/test_uprobe.c
> new file mode 100644
> index 000000000000..896c88a4960d
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_uprobe.c
> @@ -0,0 +1,61 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2023 Hengqi Chen */
> +
> +#include "vmlinux.h"
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +
> +pid_t my_pid = 0;
> +
> +int test1_result = 0;
> +int test2_result = 0;
> +int test3_result = 0;
> +int test4_result = 0;
> +
> +SEC("uprobe/./liburandom_read.so:urandlib_api_sameoffset")
> +int BPF_UPROBE(test1)
> +{
> +       pid_t pid = bpf_get_current_pid_tgid() >> 32;
> +
> +       if (pid != my_pid)
> +               return 0;
> +
> +       test1_result = 1;
> +       return 0;
> +}
> +
> +SEC("uprobe/./liburandom_read.so:urandlib_api_sameoffset@LIBURANDOM_READ_1.0.0")
> +int BPF_UPROBE(test2)
> +{
> +       pid_t pid = bpf_get_current_pid_tgid() >> 32;
> +
> +       if (pid != my_pid)
> +               return 0;
> +
> +       test2_result = 1;
> +       return 0;
> +}
> +
> +SEC("uretprobe/./liburandom_read.so:urandlib_api_sameoffset@@LIBURANDOM_READ_2.0.0")
> +int BPF_URETPROBE(test3, int ret)
> +{
> +       pid_t pid = bpf_get_current_pid_tgid() >> 32;
> +
> +       if (pid != my_pid)
> +               return 0;
> +
> +       test3_result = ret;
> +       return 0;
> +}
> +
> +SEC("uprobe")
> +int BPF_UPROBE(test4)
> +{
> +       pid_t pid = bpf_get_current_pid_tgid() >> 32;
> +
> +       if (pid != my_pid)
> +               return 0;
> +
> +       test4_result = 1;
> +       return 0;
> +}
> diff --git a/tools/testing/selftests/bpf/urandom_read.c b/tools/testing/selftests/bpf/urandom_read.c
> index e92644d0fa75..b28e910a8fbb 100644
> --- a/tools/testing/selftests/bpf/urandom_read.c
> +++ b/tools/testing/selftests/bpf/urandom_read.c
> @@ -21,6 +21,11 @@ void urand_read_without_sema(int iter_num, int iter_cnt, int read_sz);
>  void urandlib_read_with_sema(int iter_num, int iter_cnt, int read_sz);
>  void urandlib_read_without_sema(int iter_num, int iter_cnt, int read_sz);
>
> +int urandlib_api(void);
> +__asm__(".symver urandlib_api_old,urandlib_api@LIBURANDOM_READ_1.0.0");

any reason to not use COMPAT_VERSION() macro here?

> +int urandlib_api_old(void);
> +int urandlib_api_sameoffset(void);
> +
>  unsigned short urand_read_with_sema_semaphore SEC(".probes");
>
>  static __attribute__((noinline))
> @@ -83,6 +88,10 @@ int main(int argc, char *argv[])
>
>         urandom_read(fd, count);
>
> +       urandlib_api();
> +       urandlib_api_old();
> +       urandlib_api_sameoffset();
> +
>         close(fd);
>         return 0;
>  }
> diff --git a/tools/testing/selftests/bpf/urandom_read_lib1.c b/tools/testing/selftests/bpf/urandom_read_lib1.c
> index 86186e24b740..403b0735e223 100644
> --- a/tools/testing/selftests/bpf/urandom_read_lib1.c
> +++ b/tools/testing/selftests/bpf/urandom_read_lib1.c
> @@ -11,3 +11,44 @@ void urandlib_read_with_sema(int iter_num, int iter_cnt, int read_sz)
>  {
>         STAP_PROBE3(urandlib, read_with_sema, iter_num, iter_cnt, read_sz);
>  }
> +
> +/* Symbol versioning is different between static and shared library.
> + * Properly versioned symbols are needed for shared library, but
> + * only the symbol of the new version is needed for static library.
> + * Starting with GNU C 10, use symver attribute instead of .symver assembler
> + * directive, which works better with GCC LTO builds.
> + */
> +#if defined(__GNUC__) && __GNUC__ >= 10
> +
> +#define DEFAULT_VERSION(internal_name, api_name, version) \
> +       __attribute__((symver(#api_name "@@" #version)))
> +#define COMPAT_VERSION(internal_name, api_name, version) \
> +       __attribute__((symver(#api_name "@" #version)))
> +
> +#else
> +
> +#define COMPAT_VERSION(internal_name, api_name, version) \
> +       asm(".symver " #internal_name "," #api_name "@" #version);
> +#define DEFAULT_VERSION(internal_name, api_name, version) \
> +       asm(".symver " #internal_name "," #api_name "@@" #version);
> +
> +#endif

maybe let's just use libbpf_internal.h instead of copy/pasting this
barely maintainable piece of magic?

> +
> +COMPAT_VERSION(urandlib_api_v1, urandlib_api, LIBURANDOM_READ_1.0.0)
> +int urandlib_api_v1(void)
> +{
> +       return 1;
> +}
> +
> +DEFAULT_VERSION(urandlib_api_v2, urandlib_api, LIBURANDOM_READ_2.0.0)
> +int urandlib_api_v2(void)
> +{
> +       return 2;
> +}
> +
> +COMPAT_VERSION(urandlib_api_sameoffset, urandlib_api_sameoffset, LIBURANDOM_READ_1.0.0)
> +DEFAULT_VERSION(urandlib_api_sameoffset, urandlib_api_sameoffset, LIBURANDOM_READ_2.0.0)
> +int urandlib_api_sameoffset(void)
> +{
> +       return 3;
> +}
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next v3 2/3] libbpf: Support symbol versioning for uprobe
  2023-09-12 23:14   ` Andrii Nakryiko
@ 2023-09-14 12:36     ` Hengqi Chen
  2023-09-14 17:12       ` Andrii Nakryiko
  0 siblings, 1 reply; 10+ messages in thread
From: Hengqi Chen @ 2023-09-14 12:36 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, daniel, andrii, alan.maguire, olsajiri, Jiri Olsa

On Wed, Sep 13, 2023 at 7:14 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Sun, Sep 10, 2023 at 6:51 PM Hengqi Chen <hengqi.chen@gmail.com> wrote:
> >
> > In current implementation, we assume that symbol found in .dynsym section
> > would have a version suffix and use it to compare with symbol user supplied.
> > According to the spec ([0]), this assumption is incorrect, the version info
> > of dynamic symbols are stored in .gnu.version and .gnu.version_d sections
> > of ELF objects. For example:
> >
> >     $ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
> >     000000000009b1a0 T __pthread_rwlock_wrlock@GLIBC_2.2.5
> >     000000000009b1a0 T pthread_rwlock_wrlock@@GLIBC_2.34
> >     000000000009b1a0 T pthread_rwlock_wrlock@GLIBC_2.2.5
> >
> >     $ readelf -W --dyn-syms /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
> >       706: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 __pthread_rwlock_wrlock@GLIBC_2.2.5
> >       2568: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@@GLIBC_2.34
> >       2571: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@GLIBC_2.2.5
> >
> > In this case, specify pthread_rwlock_wrlock@@GLIBC_2.34 or
> > pthread_rwlock_wrlock@GLIBC_2.2.5 in bpf_uprobe_opts::func_name won't work.
> > Because the qualified name does NOT match `pthread_rwlock_wrlock` (without
> > version suffix) in .dynsym sections.
> >
> > This commit implements the symbol versioning for dynsym and allows user to
> > specify symbol in the following forms:
> >   - func
> >   - func@LIB_VERSION
> >   - func@@LIB_VERSION
> >
> > In case of symbol conflicts, error out and users should resolve it by
> > specifying a qualified name.
> >
> >   [0]: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/symversion.html
> >
> > Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> > Acked-by: Jiri Olsa <jolsa@kernel.org>
> > Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
> > ---
> >  tools/lib/bpf/elf.c    | 146 +++++++++++++++++++++++++++++++++++++----
> >  tools/lib/bpf/libbpf.c |   2 +-
> >  2 files changed, 134 insertions(+), 14 deletions(-)
> >
> > diff --git a/tools/lib/bpf/elf.c b/tools/lib/bpf/elf.c
> > index 5c9e588b17da..825da903a34c 100644
> > --- a/tools/lib/bpf/elf.c
> > +++ b/tools/lib/bpf/elf.c
> > @@ -1,5 +1,8 @@
> >  // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> >
> > +#ifndef _GNU_SOURCE
> > +#define _GNU_SOURCE
> > +#endif
> >  #include <libelf.h>
> >  #include <gelf.h>
> >  #include <fcntl.h>
> > @@ -10,6 +13,17 @@
> >
> >  #define STRERR_BUFSIZE  128
> >
> > +/* A SHT_GNU_versym section holds 16-bit words. This bit is set if
> > + * the symbol is hidden and can only be seen when referenced using an
> > + * explicit version number. This is a GNU extension.
> > + */
> > +#define VERSYM_HIDDEN  0x8000
> > +
> > +/* This is the mask for the rest of the data in a word read from a
> > + * SHT_GNU_versym section.
> > + */
> > +#define VERSYM_VERSION 0x7fff
> > +
> >  int elf_open(const char *binary_path, struct elf_fd *elf_fd)
> >  {
> >         char errmsg[STRERR_BUFSIZE];
> > @@ -64,11 +78,14 @@ struct elf_sym {
> >         const char *name;
> >         GElf_Sym sym;
> >         GElf_Shdr sh;
> > +       int ver;
> > +       bool hidden;
> >  };
> >
> >  struct elf_sym_iter {
> >         Elf *elf;
> >         Elf_Data *syms;
> > +       Elf_Data *versyms;
> >         size_t nr_syms;
> >         size_t strtabidx;
> >         size_t next_sym_idx;
> > @@ -111,6 +128,18 @@ static int elf_sym_iter_new(struct elf_sym_iter *iter,
> >         iter->nr_syms = iter->syms->d_size / sh.sh_entsize;
> >         iter->elf = elf;
> >         iter->st_type = st_type;
> > +
> > +       /* Version symbol table is meaningful to dynsym only */
> > +       if (sh_type != SHT_DYNSYM)
> > +               return 0;
> > +
> > +       scn = elf_find_next_scn_by_type(elf, SHT_GNU_versym, NULL);
> > +       if (!scn)
> > +               return 0;
> > +       if (!gelf_getshdr(scn, &sh))
> > +               return -EINVAL;
> > +       iter->versyms = elf_getdata(scn, 0);
> > +
> >         return 0;
> >  }
> >
> > @@ -119,6 +148,7 @@ static struct elf_sym *elf_sym_iter_next(struct elf_sym_iter *iter)
> >         struct elf_sym *ret = &iter->sym;
> >         GElf_Sym *sym = &ret->sym;
> >         const char *name = NULL;
> > +       GElf_Versym versym;
> >         Elf_Scn *sym_scn;
> >         size_t idx;
> >
> > @@ -138,12 +168,113 @@ static struct elf_sym *elf_sym_iter_next(struct elf_sym_iter *iter)
> >
> >                 iter->next_sym_idx = idx + 1;
> >                 ret->name = name;
> > +               ret->ver = 0;
> > +               ret->hidden = false;
> > +
> > +               if (iter->versyms) {
> > +                       if (!gelf_getversym(iter->versyms, idx, &versym))
> > +                               continue;
> > +                       ret->ver = versym & VERSYM_VERSION;
> > +                       ret->hidden = versym & VERSYM_HIDDEN;
> > +               }
> >                 return ret;
> >         }
> >
> >         return NULL;
> >  }
> >
> > +static const char *elf_get_vername(Elf *elf, int ver)
> > +{
> > +       GElf_Verdaux verdaux;
> > +       GElf_Verdef verdef;
> > +       Elf_Data *verdefs;
> > +       size_t strtabidx;
> > +       GElf_Shdr sh;
> > +       Elf_Scn *scn;
> > +       int offset;
> > +
> > +       scn = elf_find_next_scn_by_type(elf, SHT_GNU_verdef, NULL);
>
> so this is a linear search, right? And we'll be doing it for every
> .dynsym symbol. Let's do this once at the creation time and remember a
> pointer inside struct Elf?
>

We reach here only when the symbol part match, and likely we get the
desired one.
if we store the pointers in struct Elf, then we have to involve
dynamic allocations.

> > +       if (!scn)
> > +               return NULL;
> > +       if (!gelf_getshdr(scn, &sh))
> > +               return NULL;
> > +       strtabidx = sh.sh_link;
> > +       verdefs =  elf_getdata(scn, 0);
> > +
> > +       offset = 0;
> > +       while (gelf_getverdef(verdefs, offset, &verdef)) {
> > +               if (verdef.vd_ndx != ver) {
> > +                       if (!verdef.vd_next)
> > +                               break;
> > +
> > +                       offset += verdef.vd_next;
> > +                       continue;
> > +               }
> > +
> > +               if (!gelf_getverdaux(verdefs, offset + verdef.vd_aux, &verdaux))
> > +                       break;
> > +
> > +               return elf_strptr(elf, strtabidx, verdaux.vda_name);
> > +
> > +       }
> > +       return NULL;
> > +}
> > +
> > +static bool symbol_match(Elf *elf, int sh_type, struct elf_sym *sym, const char *name)
> > +{
> > +       size_t name_len, sname_len;
> > +       bool is_name_qualified;
> > +       const char *ver;
> > +       char *sname;
> > +       int ret;
> > +
> > +       name_len = strlen(name);
> > +       /* Does name specify "@LIB" or "@@LIB" ? */
> > +       is_name_qualified = strstr(name, "@") != NULL;
> > +
> > +       /* If user specify a qualified name, for dynamic symbol,
> > +        * it is in form of func, NOT func@LIB_VER or func@@LIB_VER.
> > +        * So construct a full quailified symbol name using versym info
>
> gmail points out typo: qualified
>
> > +        * for comparison.
> > +        */
> > +       if (is_name_qualified && sh_type == SHT_DYNSYM) {
> > +               /* Make sure func match func@LIB_VER */
> > +               sname_len = strlen(sym->name);
> > +               if (strncmp(sym->name, name, sname_len) != 0)
> > +                       return false;
> > +
> > +               /* But not func2@LIB_VER */
> > +               if (name[sname_len] != '@')
> > +                       return false;
> > +
> > +               ver = elf_get_vername(elf, sym->ver);
> > +               if (!ver)
> > +                       return false;
> > +
> > +               ret = asprintf(&sname, "%s%s%s", sym->name,
> > +                              sym->hidden ? "@" : "@@", ver);
> > +               if (ret == -1)
>
> nit: ret < 0, I've spent enough time switching all users of libbpf to
> not rely on exact -1 return, let's not show a bad example ;)
>
> > +                       return false;
> > +
> > +               sname_len = ret;
> > +               ret = strncmp(sname, name, sname_len);
>
> why is this strncmp? shouldn't the match be exact? both name is
> version-qualified, and current ELF symbol is version-qualified. They
> have to exactly match, no?
>
> > +               free(sname);
> > +               return ret == 0;
> > +       }
> > +
> > +       /* Otherwise, for normal symbols or non-qualified names
> > +        * User can specify func, func@@LIB or func@@LIB_VERSION.
> > +        */
> > +       if (strncmp(sym->name, name, name_len) != 0)
> > +               return false;
> > +       /* ...but we don't want a search for "foo" to match 'foo2" also, so any
> > +        * additional characters in sname should be of the form "@LIB" or "@@LIB".
> > +        */
> > +       if (!is_name_qualified && sym->name[name_len] != '\0' && sym->name[name_len] != '@')
> > +               return false;
> > +
> > +       return true;
> > +}
> >
> >  /* Transform symbol's virtual address (absolute for binaries and relative
> >   * for shared libs) into file offset, which is what kernel is expecting
> > @@ -166,9 +297,8 @@ static unsigned long elf_sym_offset(struct elf_sym *sym)
> >  long elf_find_func_offset(Elf *elf, const char *binary_path, const char *name)
> >  {
> >         int i, sh_types[2] = { SHT_DYNSYM, SHT_SYMTAB };
> > -       bool is_shared_lib, is_name_qualified;
> > +       bool is_shared_lib;
> >         long ret = -ENOENT;
> > -       size_t name_len;
> >         GElf_Ehdr ehdr;
> >
> >         if (!gelf_getehdr(elf, &ehdr)) {
> > @@ -179,10 +309,6 @@ long elf_find_func_offset(Elf *elf, const char *binary_path, const char *name)
> >         /* for shared lib case, we do not need to calculate relative offset */
> >         is_shared_lib = ehdr.e_type == ET_DYN;
> >
> > -       name_len = strlen(name);
> > -       /* Does name specify "@@LIB"? */
> > -       is_name_qualified = strstr(name, "@@") != NULL;
> > -
> >         /* Search SHT_DYNSYM, SHT_SYMTAB for symbol. This search order is used because if
> >          * a binary is stripped, it may only have SHT_DYNSYM, and a fully-statically
> >          * linked binary may not have SHT_DYMSYM, so absence of a section should not be
> > @@ -201,13 +327,7 @@ long elf_find_func_offset(Elf *elf, const char *binary_path, const char *name)
> >                         goto out;
> >
> >                 while ((sym = elf_sym_iter_next(&iter))) {
> > -                       /* User can specify func, func@@LIB or func@@LIB_VERSION. */
> > -                       if (strncmp(sym->name, name, name_len) != 0)
> > -                               continue;
> > -                       /* ...but we don't want a search for "foo" to match 'foo2" also, so any
> > -                        * additional characters in sname should be of the form "@@LIB".
> > -                        */
> > -                       if (!is_name_qualified && sym->name[name_len] != '\0' && sym->name[name_len] != '@')
> > +                       if (!symbol_match(elf, sh_types[i], sym, name))
>
> ok, so let's consider what we are doing here. While previously we did
> a relatively expensive strstr() operation once, now we are doing it
> for every symbol in ELF. This might add up.
>
> Plus, we then do dynamic allocations with asprintf, which also is kind
> of unfortunate.
>
> But let's take a step back. Why we don't determine if the name is
> qualified once. Remember what is the length of unqualified name, where
> does the version part starts, and pass all that to symbol_match in a
> prepared form. Then we don't need to construct "fully qualified" form
> of an ELF symbol. We can compare unqual name and version name
> separately.
>
> No allocation, no wasted work.
>
> Not sure if we need to care whether we had "@" or "@@" in the
> requested symbol, but that's a detail.
>

Sounds good, will do.

> >                                 continue;
> >
> >                         cur_bind = GELF_ST_BIND(sym->sym.st_info);
> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > index 96ff1aa4bf6a..30b8f96820a7 100644
> > --- a/tools/lib/bpf/libbpf.c
> > +++ b/tools/lib/bpf/libbpf.c
> > @@ -11512,7 +11512,7 @@ static int attach_uprobe(const struct bpf_program *prog, long cookie, struct bpf
> >
> >         *link = NULL;
> >
> > -       n = sscanf(prog->sec_name, "%m[^/]/%m[^:]:%m[a-zA-Z0-9_.]+%li",
> > +       n = sscanf(prog->sec_name, "%m[^/]/%m[^:]:%m[a-zA-Z0-9_.@]+%li",
>
> BTW, while you are at it. Arnaldo was trying to use this SEC("uprobe")
> stuff for tracing Go functions. Go doesn't seem to do any mangling, so
> function names can have lots of "interesting" symbols ([], @, etc).
>

Go symbols are complicated, I will try in another patch.

> If you get a chance, would you mind updating this partsing logic to be
> able to accommodate such crazy function names as well? Thanks!
>
> >                    &probe_type, &binary_path, &func_name, &offset);
> >         switch (n) {
> >         case 1:
> > --
> > 2.34.1
> >

Cheers,
---
Hengqi

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next v3 2/3] libbpf: Support symbol versioning for uprobe
  2023-09-14 12:36     ` Hengqi Chen
@ 2023-09-14 17:12       ` Andrii Nakryiko
  2023-09-15  7:30         ` Hengqi Chen
  0 siblings, 1 reply; 10+ messages in thread
From: Andrii Nakryiko @ 2023-09-14 17:12 UTC (permalink / raw)
  To: Hengqi Chen; +Cc: bpf, ast, daniel, andrii, alan.maguire, olsajiri, Jiri Olsa

On Thu, Sep 14, 2023 at 5:37 AM Hengqi Chen <hengqi.chen@gmail.com> wrote:
>
> On Wed, Sep 13, 2023 at 7:14 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Sun, Sep 10, 2023 at 6:51 PM Hengqi Chen <hengqi.chen@gmail.com> wrote:
> > >
> > > In current implementation, we assume that symbol found in .dynsym section
> > > would have a version suffix and use it to compare with symbol user supplied.
> > > According to the spec ([0]), this assumption is incorrect, the version info
> > > of dynamic symbols are stored in .gnu.version and .gnu.version_d sections
> > > of ELF objects. For example:
> > >
> > >     $ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
> > >     000000000009b1a0 T __pthread_rwlock_wrlock@GLIBC_2.2.5
> > >     000000000009b1a0 T pthread_rwlock_wrlock@@GLIBC_2.34
> > >     000000000009b1a0 T pthread_rwlock_wrlock@GLIBC_2.2.5
> > >
> > >     $ readelf -W --dyn-syms /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
> > >       706: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 __pthread_rwlock_wrlock@GLIBC_2.2.5
> > >       2568: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@@GLIBC_2.34
> > >       2571: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@GLIBC_2.2.5
> > >
> > > In this case, specify pthread_rwlock_wrlock@@GLIBC_2.34 or
> > > pthread_rwlock_wrlock@GLIBC_2.2.5 in bpf_uprobe_opts::func_name won't work.
> > > Because the qualified name does NOT match `pthread_rwlock_wrlock` (without
> > > version suffix) in .dynsym sections.
> > >
> > > This commit implements the symbol versioning for dynsym and allows user to
> > > specify symbol in the following forms:
> > >   - func
> > >   - func@LIB_VERSION
> > >   - func@@LIB_VERSION
> > >
> > > In case of symbol conflicts, error out and users should resolve it by
> > > specifying a qualified name.
> > >
> > >   [0]: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/symversion.html
> > >
> > > Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> > > Acked-by: Jiri Olsa <jolsa@kernel.org>
> > > Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
> > > ---
> > >  tools/lib/bpf/elf.c    | 146 +++++++++++++++++++++++++++++++++++++----
> > >  tools/lib/bpf/libbpf.c |   2 +-
> > >  2 files changed, 134 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/tools/lib/bpf/elf.c b/tools/lib/bpf/elf.c
> > > index 5c9e588b17da..825da903a34c 100644
> > > --- a/tools/lib/bpf/elf.c
> > > +++ b/tools/lib/bpf/elf.c
> > > @@ -1,5 +1,8 @@
> > >  // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> > >
> > > +#ifndef _GNU_SOURCE
> > > +#define _GNU_SOURCE
> > > +#endif
> > >  #include <libelf.h>
> > >  #include <gelf.h>
> > >  #include <fcntl.h>
> > > @@ -10,6 +13,17 @@
> > >
> > >  #define STRERR_BUFSIZE  128
> > >
> > > +/* A SHT_GNU_versym section holds 16-bit words. This bit is set if
> > > + * the symbol is hidden and can only be seen when referenced using an
> > > + * explicit version number. This is a GNU extension.
> > > + */
> > > +#define VERSYM_HIDDEN  0x8000
> > > +
> > > +/* This is the mask for the rest of the data in a word read from a
> > > + * SHT_GNU_versym section.
> > > + */
> > > +#define VERSYM_VERSION 0x7fff
> > > +
> > >  int elf_open(const char *binary_path, struct elf_fd *elf_fd)
> > >  {
> > >         char errmsg[STRERR_BUFSIZE];
> > > @@ -64,11 +78,14 @@ struct elf_sym {
> > >         const char *name;
> > >         GElf_Sym sym;
> > >         GElf_Shdr sh;
> > > +       int ver;
> > > +       bool hidden;
> > >  };
> > >
> > >  struct elf_sym_iter {
> > >         Elf *elf;
> > >         Elf_Data *syms;
> > > +       Elf_Data *versyms;
> > >         size_t nr_syms;
> > >         size_t strtabidx;
> > >         size_t next_sym_idx;
> > > @@ -111,6 +128,18 @@ static int elf_sym_iter_new(struct elf_sym_iter *iter,
> > >         iter->nr_syms = iter->syms->d_size / sh.sh_entsize;
> > >         iter->elf = elf;
> > >         iter->st_type = st_type;
> > > +
> > > +       /* Version symbol table is meaningful to dynsym only */
> > > +       if (sh_type != SHT_DYNSYM)
> > > +               return 0;
> > > +
> > > +       scn = elf_find_next_scn_by_type(elf, SHT_GNU_versym, NULL);
> > > +       if (!scn)
> > > +               return 0;
> > > +       if (!gelf_getshdr(scn, &sh))
> > > +               return -EINVAL;
> > > +       iter->versyms = elf_getdata(scn, 0);
> > > +
> > >         return 0;
> > >  }
> > >
> > > @@ -119,6 +148,7 @@ static struct elf_sym *elf_sym_iter_next(struct elf_sym_iter *iter)
> > >         struct elf_sym *ret = &iter->sym;
> > >         GElf_Sym *sym = &ret->sym;
> > >         const char *name = NULL;
> > > +       GElf_Versym versym;
> > >         Elf_Scn *sym_scn;
> > >         size_t idx;
> > >
> > > @@ -138,12 +168,113 @@ static struct elf_sym *elf_sym_iter_next(struct elf_sym_iter *iter)
> > >
> > >                 iter->next_sym_idx = idx + 1;
> > >                 ret->name = name;
> > > +               ret->ver = 0;
> > > +               ret->hidden = false;
> > > +
> > > +               if (iter->versyms) {
> > > +                       if (!gelf_getversym(iter->versyms, idx, &versym))
> > > +                               continue;
> > > +                       ret->ver = versym & VERSYM_VERSION;
> > > +                       ret->hidden = versym & VERSYM_HIDDEN;
> > > +               }
> > >                 return ret;
> > >         }
> > >
> > >         return NULL;
> > >  }
> > >
> > > +static const char *elf_get_vername(Elf *elf, int ver)
> > > +{
> > > +       GElf_Verdaux verdaux;
> > > +       GElf_Verdef verdef;
> > > +       Elf_Data *verdefs;
> > > +       size_t strtabidx;
> > > +       GElf_Shdr sh;
> > > +       Elf_Scn *scn;
> > > +       int offset;
> > > +
> > > +       scn = elf_find_next_scn_by_type(elf, SHT_GNU_verdef, NULL);
> >
> > so this is a linear search, right? And we'll be doing it for every
> > .dynsym symbol. Let's do this once at the creation time and remember a
> > pointer inside struct Elf?
> >
>
> We reach here only when the symbol part match, and likely we get the
> desired one.

sure, but you can have multiple versions, so multiple hits of this.
And the ELF itself could be pretty big with lots of sections. So I
think we should try to minimize number of linear searches over ELF
sections, if possible.

> if we store the pointers in struct Elf, then we have to involve
> dynamic allocations.

I'm not sure why dynamic allocations are needed?

But also I had struct elf_sym_iter in mind, where we already cache
Elf_Data *versyms, so why not do that with verdefs as well? I think
all these helpers (symbol_match and elf_get_vername) are called only
from elf_sym_iter stuff right now, right?


>
> > > +       if (!scn)
> > > +               return NULL;
> > > +       if (!gelf_getshdr(scn, &sh))
> > > +               return NULL;
> > > +       strtabidx = sh.sh_link;
> > > +       verdefs =  elf_getdata(scn, 0);
> > > +
> > > +       offset = 0;
> > > +       while (gelf_getverdef(verdefs, offset, &verdef)) {
> > > +               if (verdef.vd_ndx != ver) {
> > > +                       if (!verdef.vd_next)
> > > +                               break;
> > > +
> > > +                       offset += verdef.vd_next;
> > > +                       continue;
> > > +               }
> > > +
> > > +               if (!gelf_getverdaux(verdefs, offset + verdef.vd_aux, &verdaux))
> > > +                       break;
> > > +
> > > +               return elf_strptr(elf, strtabidx, verdaux.vda_name);
> > > +
> > > +       }
> > > +       return NULL;
> > > +}
> > > +
> > > +static bool symbol_match(Elf *elf, int sh_type, struct elf_sym *sym, const char *name)
> > > +{
> > > +       size_t name_len, sname_len;
> > > +       bool is_name_qualified;
> > > +       const char *ver;
> > > +       char *sname;
> > > +       int ret;
> > > +
> > > +       name_len = strlen(name);
> > > +       /* Does name specify "@LIB" or "@@LIB" ? */
> > > +       is_name_qualified = strstr(name, "@") != NULL;
> > > +
> > > +       /* If user specify a qualified name, for dynamic symbol,
> > > +        * it is in form of func, NOT func@LIB_VER or func@@LIB_VER.
> > > +        * So construct a full quailified symbol name using versym info
> >
> > gmail points out typo: qualified
> >
> > > +        * for comparison.
> > > +        */
> > > +       if (is_name_qualified && sh_type == SHT_DYNSYM) {
> > > +               /* Make sure func match func@LIB_VER */
> > > +               sname_len = strlen(sym->name);
> > > +               if (strncmp(sym->name, name, sname_len) != 0)
> > > +                       return false;
> > > +
> > > +               /* But not func2@LIB_VER */
> > > +               if (name[sname_len] != '@')
> > > +                       return false;
> > > +
> > > +               ver = elf_get_vername(elf, sym->ver);
> > > +               if (!ver)
> > > +                       return false;
> > > +
> > > +               ret = asprintf(&sname, "%s%s%s", sym->name,
> > > +                              sym->hidden ? "@" : "@@", ver);
> > > +               if (ret == -1)
> >
> > nit: ret < 0, I've spent enough time switching all users of libbpf to
> > not rely on exact -1 return, let's not show a bad example ;)
> >
> > > +                       return false;
> > > +
> > > +               sname_len = ret;
> > > +               ret = strncmp(sname, name, sname_len);
> >
> > why is this strncmp? shouldn't the match be exact? both name is
> > version-qualified, and current ELF symbol is version-qualified. They
> > have to exactly match, no?
> >
> > > +               free(sname);
> > > +               return ret == 0;
> > > +       }
> > > +
> > > +       /* Otherwise, for normal symbols or non-qualified names
> > > +        * User can specify func, func@@LIB or func@@LIB_VERSION.
> > > +        */
> > > +       if (strncmp(sym->name, name, name_len) != 0)
> > > +               return false;
> > > +       /* ...but we don't want a search for "foo" to match 'foo2" also, so any
> > > +        * additional characters in sname should be of the form "@LIB" or "@@LIB".
> > > +        */
> > > +       if (!is_name_qualified && sym->name[name_len] != '\0' && sym->name[name_len] != '@')
> > > +               return false;
> > > +
> > > +       return true;
> > > +}
> > >
> > >  /* Transform symbol's virtual address (absolute for binaries and relative
> > >   * for shared libs) into file offset, which is what kernel is expecting
> > > @@ -166,9 +297,8 @@ static unsigned long elf_sym_offset(struct elf_sym *sym)
> > >  long elf_find_func_offset(Elf *elf, const char *binary_path, const char *name)
> > >  {
> > >         int i, sh_types[2] = { SHT_DYNSYM, SHT_SYMTAB };
> > > -       bool is_shared_lib, is_name_qualified;
> > > +       bool is_shared_lib;
> > >         long ret = -ENOENT;
> > > -       size_t name_len;
> > >         GElf_Ehdr ehdr;
> > >
> > >         if (!gelf_getehdr(elf, &ehdr)) {
> > > @@ -179,10 +309,6 @@ long elf_find_func_offset(Elf *elf, const char *binary_path, const char *name)
> > >         /* for shared lib case, we do not need to calculate relative offset */
> > >         is_shared_lib = ehdr.e_type == ET_DYN;
> > >
> > > -       name_len = strlen(name);
> > > -       /* Does name specify "@@LIB"? */
> > > -       is_name_qualified = strstr(name, "@@") != NULL;
> > > -
> > >         /* Search SHT_DYNSYM, SHT_SYMTAB for symbol. This search order is used because if
> > >          * a binary is stripped, it may only have SHT_DYNSYM, and a fully-statically
> > >          * linked binary may not have SHT_DYMSYM, so absence of a section should not be
> > > @@ -201,13 +327,7 @@ long elf_find_func_offset(Elf *elf, const char *binary_path, const char *name)
> > >                         goto out;
> > >
> > >                 while ((sym = elf_sym_iter_next(&iter))) {
> > > -                       /* User can specify func, func@@LIB or func@@LIB_VERSION. */
> > > -                       if (strncmp(sym->name, name, name_len) != 0)
> > > -                               continue;
> > > -                       /* ...but we don't want a search for "foo" to match 'foo2" also, so any
> > > -                        * additional characters in sname should be of the form "@@LIB".
> > > -                        */
> > > -                       if (!is_name_qualified && sym->name[name_len] != '\0' && sym->name[name_len] != '@')
> > > +                       if (!symbol_match(elf, sh_types[i], sym, name))
> >
> > ok, so let's consider what we are doing here. While previously we did
> > a relatively expensive strstr() operation once, now we are doing it
> > for every symbol in ELF. This might add up.
> >
> > Plus, we then do dynamic allocations with asprintf, which also is kind
> > of unfortunate.
> >
> > But let's take a step back. Why we don't determine if the name is
> > qualified once. Remember what is the length of unqualified name, where
> > does the version part starts, and pass all that to symbol_match in a
> > prepared form. Then we don't need to construct "fully qualified" form
> > of an ELF symbol. We can compare unqual name and version name
> > separately.
> >
> > No allocation, no wasted work.
> >
> > Not sure if we need to care whether we had "@" or "@@" in the
> > requested symbol, but that's a detail.
> >
>
> Sounds good, will do.
>
> > >                                 continue;
> > >
> > >                         cur_bind = GELF_ST_BIND(sym->sym.st_info);
> > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > > index 96ff1aa4bf6a..30b8f96820a7 100644
> > > --- a/tools/lib/bpf/libbpf.c
> > > +++ b/tools/lib/bpf/libbpf.c
> > > @@ -11512,7 +11512,7 @@ static int attach_uprobe(const struct bpf_program *prog, long cookie, struct bpf
> > >
> > >         *link = NULL;
> > >
> > > -       n = sscanf(prog->sec_name, "%m[^/]/%m[^:]:%m[a-zA-Z0-9_.]+%li",
> > > +       n = sscanf(prog->sec_name, "%m[^/]/%m[^:]:%m[a-zA-Z0-9_.@]+%li",
> >
> > BTW, while you are at it. Arnaldo was trying to use this SEC("uprobe")
> > stuff for tracing Go functions. Go doesn't seem to do any mangling, so
> > function names can have lots of "interesting" symbols ([], @, etc).
> >
>
> Go symbols are complicated, I will try in another patch.

indeed, thanks!

>
> > If you get a chance, would you mind updating this partsing logic to be
> > able to accommodate such crazy function names as well? Thanks!
> >
> > >                    &probe_type, &binary_path, &func_name, &offset);
> > >         switch (n) {
> > >         case 1:
> > > --
> > > 2.34.1
> > >
>
> Cheers,
> ---
> Hengqi

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next v3 2/3] libbpf: Support symbol versioning for uprobe
  2023-09-14 17:12       ` Andrii Nakryiko
@ 2023-09-15  7:30         ` Hengqi Chen
  2023-09-15 20:20           ` Andrii Nakryiko
  0 siblings, 1 reply; 10+ messages in thread
From: Hengqi Chen @ 2023-09-15  7:30 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, daniel, andrii, alan.maguire, olsajiri, Jiri Olsa

On Fri, Sep 15, 2023 at 1:12 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Sep 14, 2023 at 5:37 AM Hengqi Chen <hengqi.chen@gmail.com> wrote:
> >
> > On Wed, Sep 13, 2023 at 7:14 AM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Sun, Sep 10, 2023 at 6:51 PM Hengqi Chen <hengqi.chen@gmail.com> wrote:
> > > >
> > > > In current implementation, we assume that symbol found in .dynsym section
> > > > would have a version suffix and use it to compare with symbol user supplied.
> > > > According to the spec ([0]), this assumption is incorrect, the version info
> > > > of dynamic symbols are stored in .gnu.version and .gnu.version_d sections
> > > > of ELF objects. For example:
> > > >
> > > >     $ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
> > > >     000000000009b1a0 T __pthread_rwlock_wrlock@GLIBC_2.2.5
> > > >     000000000009b1a0 T pthread_rwlock_wrlock@@GLIBC_2.34
> > > >     000000000009b1a0 T pthread_rwlock_wrlock@GLIBC_2.2.5
> > > >
> > > >     $ readelf -W --dyn-syms /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
> > > >       706: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 __pthread_rwlock_wrlock@GLIBC_2.2.5
> > > >       2568: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@@GLIBC_2.34
> > > >       2571: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@GLIBC_2.2.5
> > > >
> > > > In this case, specify pthread_rwlock_wrlock@@GLIBC_2.34 or
> > > > pthread_rwlock_wrlock@GLIBC_2.2.5 in bpf_uprobe_opts::func_name won't work.
> > > > Because the qualified name does NOT match `pthread_rwlock_wrlock` (without
> > > > version suffix) in .dynsym sections.
> > > >
> > > > This commit implements the symbol versioning for dynsym and allows user to
> > > > specify symbol in the following forms:
> > > >   - func
> > > >   - func@LIB_VERSION
> > > >   - func@@LIB_VERSION
> > > >
> > > > In case of symbol conflicts, error out and users should resolve it by
> > > > specifying a qualified name.
> > > >
> > > >   [0]: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/symversion.html
> > > >
> > > > Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> > > > Acked-by: Jiri Olsa <jolsa@kernel.org>
> > > > Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
> > > > ---
> > > >  tools/lib/bpf/elf.c    | 146 +++++++++++++++++++++++++++++++++++++----
> > > >  tools/lib/bpf/libbpf.c |   2 +-
> > > >  2 files changed, 134 insertions(+), 14 deletions(-)
> > > >
> > > > diff --git a/tools/lib/bpf/elf.c b/tools/lib/bpf/elf.c
> > > > index 5c9e588b17da..825da903a34c 100644
> > > > --- a/tools/lib/bpf/elf.c
> > > > +++ b/tools/lib/bpf/elf.c
> > > > @@ -1,5 +1,8 @@
> > > >  // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> > > >
> > > > +#ifndef _GNU_SOURCE
> > > > +#define _GNU_SOURCE
> > > > +#endif
> > > >  #include <libelf.h>
> > > >  #include <gelf.h>
> > > >  #include <fcntl.h>
> > > > @@ -10,6 +13,17 @@
> > > >
> > > >  #define STRERR_BUFSIZE  128
> > > >
> > > > +/* A SHT_GNU_versym section holds 16-bit words. This bit is set if
> > > > + * the symbol is hidden and can only be seen when referenced using an
> > > > + * explicit version number. This is a GNU extension.
> > > > + */
> > > > +#define VERSYM_HIDDEN  0x8000
> > > > +
> > > > +/* This is the mask for the rest of the data in a word read from a
> > > > + * SHT_GNU_versym section.
> > > > + */
> > > > +#define VERSYM_VERSION 0x7fff
> > > > +
> > > >  int elf_open(const char *binary_path, struct elf_fd *elf_fd)
> > > >  {
> > > >         char errmsg[STRERR_BUFSIZE];
> > > > @@ -64,11 +78,14 @@ struct elf_sym {
> > > >         const char *name;
> > > >         GElf_Sym sym;
> > > >         GElf_Shdr sh;
> > > > +       int ver;
> > > > +       bool hidden;
> > > >  };
> > > >
> > > >  struct elf_sym_iter {
> > > >         Elf *elf;
> > > >         Elf_Data *syms;
> > > > +       Elf_Data *versyms;
> > > >         size_t nr_syms;
> > > >         size_t strtabidx;
> > > >         size_t next_sym_idx;
> > > > @@ -111,6 +128,18 @@ static int elf_sym_iter_new(struct elf_sym_iter *iter,
> > > >         iter->nr_syms = iter->syms->d_size / sh.sh_entsize;
> > > >         iter->elf = elf;
> > > >         iter->st_type = st_type;
> > > > +
> > > > +       /* Version symbol table is meaningful to dynsym only */
> > > > +       if (sh_type != SHT_DYNSYM)
> > > > +               return 0;
> > > > +
> > > > +       scn = elf_find_next_scn_by_type(elf, SHT_GNU_versym, NULL);
> > > > +       if (!scn)
> > > > +               return 0;
> > > > +       if (!gelf_getshdr(scn, &sh))
> > > > +               return -EINVAL;
> > > > +       iter->versyms = elf_getdata(scn, 0);
> > > > +
> > > >         return 0;
> > > >  }
> > > >
> > > > @@ -119,6 +148,7 @@ static struct elf_sym *elf_sym_iter_next(struct elf_sym_iter *iter)
> > > >         struct elf_sym *ret = &iter->sym;
> > > >         GElf_Sym *sym = &ret->sym;
> > > >         const char *name = NULL;
> > > > +       GElf_Versym versym;
> > > >         Elf_Scn *sym_scn;
> > > >         size_t idx;
> > > >
> > > > @@ -138,12 +168,113 @@ static struct elf_sym *elf_sym_iter_next(struct elf_sym_iter *iter)
> > > >
> > > >                 iter->next_sym_idx = idx + 1;
> > > >                 ret->name = name;
> > > > +               ret->ver = 0;
> > > > +               ret->hidden = false;
> > > > +
> > > > +               if (iter->versyms) {
> > > > +                       if (!gelf_getversym(iter->versyms, idx, &versym))
> > > > +                               continue;
> > > > +                       ret->ver = versym & VERSYM_VERSION;
> > > > +                       ret->hidden = versym & VERSYM_HIDDEN;
> > > > +               }
> > > >                 return ret;
> > > >         }
> > > >
> > > >         return NULL;
> > > >  }
> > > >
> > > > +static const char *elf_get_vername(Elf *elf, int ver)
> > > > +{
> > > > +       GElf_Verdaux verdaux;
> > > > +       GElf_Verdef verdef;
> > > > +       Elf_Data *verdefs;
> > > > +       size_t strtabidx;
> > > > +       GElf_Shdr sh;
> > > > +       Elf_Scn *scn;
> > > > +       int offset;
> > > > +
> > > > +       scn = elf_find_next_scn_by_type(elf, SHT_GNU_verdef, NULL);
> > >
> > > so this is a linear search, right? And we'll be doing it for every
> > > .dynsym symbol. Let's do this once at the creation time and remember a
> > > pointer inside struct Elf?
> > >
> >
> > We reach here only when the symbol part match, and likely we get the
> > desired one.
>
> sure, but you can have multiple versions, so multiple hits of this.
> And the ELF itself could be pretty big with lots of sections. So I
> think we should try to minimize number of linear searches over ELF
> sections, if possible.
>
> > if we store the pointers in struct Elf, then we have to involve
> > dynamic allocations.
>
> I'm not sure why dynamic allocations are needed?
>

Your last comment says remember those version name pointer in struct Elf,
I thought this would be allocate an array and save those pointers
(i.e. store a mapping from version index to version name)

> But also I had struct elf_sym_iter in mind, where we already cache
> Elf_Data *versyms, so why not do that with verdefs as well? I think

verdef is not per symbol.

> all these helpers (symbol_match and elf_get_vername) are called only
> from elf_sym_iter stuff right now, right?
>
>
> >
> > > > +       if (!scn)
> > > > +               return NULL;
> > > > +       if (!gelf_getshdr(scn, &sh))
> > > > +               return NULL;
> > > > +       strtabidx = sh.sh_link;
> > > > +       verdefs =  elf_getdata(scn, 0);
> > > > +
> > > > +       offset = 0;
> > > > +       while (gelf_getverdef(verdefs, offset, &verdef)) {
> > > > +               if (verdef.vd_ndx != ver) {
> > > > +                       if (!verdef.vd_next)
> > > > +                               break;
> > > > +
> > > > +                       offset += verdef.vd_next;
> > > > +                       continue;
> > > > +               }
> > > > +
> > > > +               if (!gelf_getverdaux(verdefs, offset + verdef.vd_aux, &verdaux))
> > > > +                       break;
> > > > +
> > > > +               return elf_strptr(elf, strtabidx, verdaux.vda_name);
> > > > +
> > > > +       }
> > > > +       return NULL;
> > > > +}
> > > > +
> > > > +static bool symbol_match(Elf *elf, int sh_type, struct elf_sym *sym, const char *name)
> > > > +{
> > > > +       size_t name_len, sname_len;
> > > > +       bool is_name_qualified;
> > > > +       const char *ver;
> > > > +       char *sname;
> > > > +       int ret;
> > > > +
> > > > +       name_len = strlen(name);
> > > > +       /* Does name specify "@LIB" or "@@LIB" ? */
> > > > +       is_name_qualified = strstr(name, "@") != NULL;
> > > > +
> > > > +       /* If user specify a qualified name, for dynamic symbol,
> > > > +        * it is in form of func, NOT func@LIB_VER or func@@LIB_VER.
> > > > +        * So construct a full quailified symbol name using versym info
> > >
> > > gmail points out typo: qualified
> > >
> > > > +        * for comparison.
> > > > +        */
> > > > +       if (is_name_qualified && sh_type == SHT_DYNSYM) {
> > > > +               /* Make sure func match func@LIB_VER */
> > > > +               sname_len = strlen(sym->name);
> > > > +               if (strncmp(sym->name, name, sname_len) != 0)
> > > > +                       return false;
> > > > +
> > > > +               /* But not func2@LIB_VER */
> > > > +               if (name[sname_len] != '@')
> > > > +                       return false;
> > > > +
> > > > +               ver = elf_get_vername(elf, sym->ver);
> > > > +               if (!ver)
> > > > +                       return false;
> > > > +
> > > > +               ret = asprintf(&sname, "%s%s%s", sym->name,
> > > > +                              sym->hidden ? "@" : "@@", ver);
> > > > +               if (ret == -1)
> > >
> > > nit: ret < 0, I've spent enough time switching all users of libbpf to
> > > not rely on exact -1 return, let's not show a bad example ;)
> > >
> > > > +                       return false;
> > > > +
> > > > +               sname_len = ret;
> > > > +               ret = strncmp(sname, name, sname_len);
> > >
> > > why is this strncmp? shouldn't the match be exact? both name is
> > > version-qualified, and current ELF symbol is version-qualified. They
> > > have to exactly match, no?
> > >
> > > > +               free(sname);
> > > > +               return ret == 0;
> > > > +       }
> > > > +
> > > > +       /* Otherwise, for normal symbols or non-qualified names
> > > > +        * User can specify func, func@@LIB or func@@LIB_VERSION.
> > > > +        */
> > > > +       if (strncmp(sym->name, name, name_len) != 0)
> > > > +               return false;
> > > > +       /* ...but we don't want a search for "foo" to match 'foo2" also, so any
> > > > +        * additional characters in sname should be of the form "@LIB" or "@@LIB".
> > > > +        */
> > > > +       if (!is_name_qualified && sym->name[name_len] != '\0' && sym->name[name_len] != '@')
> > > > +               return false;
> > > > +
> > > > +       return true;
> > > > +}
> > > >
> > > >  /* Transform symbol's virtual address (absolute for binaries and relative
> > > >   * for shared libs) into file offset, which is what kernel is expecting
> > > > @@ -166,9 +297,8 @@ static unsigned long elf_sym_offset(struct elf_sym *sym)
> > > >  long elf_find_func_offset(Elf *elf, const char *binary_path, const char *name)
> > > >  {
> > > >         int i, sh_types[2] = { SHT_DYNSYM, SHT_SYMTAB };
> > > > -       bool is_shared_lib, is_name_qualified;
> > > > +       bool is_shared_lib;
> > > >         long ret = -ENOENT;
> > > > -       size_t name_len;
> > > >         GElf_Ehdr ehdr;
> > > >
> > > >         if (!gelf_getehdr(elf, &ehdr)) {
> > > > @@ -179,10 +309,6 @@ long elf_find_func_offset(Elf *elf, const char *binary_path, const char *name)
> > > >         /* for shared lib case, we do not need to calculate relative offset */
> > > >         is_shared_lib = ehdr.e_type == ET_DYN;
> > > >
> > > > -       name_len = strlen(name);
> > > > -       /* Does name specify "@@LIB"? */
> > > > -       is_name_qualified = strstr(name, "@@") != NULL;
> > > > -
> > > >         /* Search SHT_DYNSYM, SHT_SYMTAB for symbol. This search order is used because if
> > > >          * a binary is stripped, it may only have SHT_DYNSYM, and a fully-statically
> > > >          * linked binary may not have SHT_DYMSYM, so absence of a section should not be
> > > > @@ -201,13 +327,7 @@ long elf_find_func_offset(Elf *elf, const char *binary_path, const char *name)
> > > >                         goto out;
> > > >
> > > >                 while ((sym = elf_sym_iter_next(&iter))) {
> > > > -                       /* User can specify func, func@@LIB or func@@LIB_VERSION. */
> > > > -                       if (strncmp(sym->name, name, name_len) != 0)
> > > > -                               continue;
> > > > -                       /* ...but we don't want a search for "foo" to match 'foo2" also, so any
> > > > -                        * additional characters in sname should be of the form "@@LIB".
> > > > -                        */
> > > > -                       if (!is_name_qualified && sym->name[name_len] != '\0' && sym->name[name_len] != '@')
> > > > +                       if (!symbol_match(elf, sh_types[i], sym, name))
> > >
> > > ok, so let's consider what we are doing here. While previously we did
> > > a relatively expensive strstr() operation once, now we are doing it
> > > for every symbol in ELF. This might add up.
> > >
> > > Plus, we then do dynamic allocations with asprintf, which also is kind
> > > of unfortunate.
> > >
> > > But let's take a step back. Why we don't determine if the name is
> > > qualified once. Remember what is the length of unqualified name, where
> > > does the version part starts, and pass all that to symbol_match in a
> > > prepared form. Then we don't need to construct "fully qualified" form
> > > of an ELF symbol. We can compare unqual name and version name
> > > separately.
> > >
> > > No allocation, no wasted work.
> > >
> > > Not sure if we need to care whether we had "@" or "@@" in the
> > > requested symbol, but that's a detail.
> > >
> >
> > Sounds good, will do.
> >
> > > >                                 continue;
> > > >
> > > >                         cur_bind = GELF_ST_BIND(sym->sym.st_info);
> > > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > > > index 96ff1aa4bf6a..30b8f96820a7 100644
> > > > --- a/tools/lib/bpf/libbpf.c
> > > > +++ b/tools/lib/bpf/libbpf.c
> > > > @@ -11512,7 +11512,7 @@ static int attach_uprobe(const struct bpf_program *prog, long cookie, struct bpf
> > > >
> > > >         *link = NULL;
> > > >
> > > > -       n = sscanf(prog->sec_name, "%m[^/]/%m[^:]:%m[a-zA-Z0-9_.]+%li",
> > > > +       n = sscanf(prog->sec_name, "%m[^/]/%m[^:]:%m[a-zA-Z0-9_.@]+%li",
> > >
> > > BTW, while you are at it. Arnaldo was trying to use this SEC("uprobe")
> > > stuff for tracing Go functions. Go doesn't seem to do any mangling, so
> > > function names can have lots of "interesting" symbols ([], @, etc).
> > >
> >
> > Go symbols are complicated, I will try in another patch.
>
> indeed, thanks!
>
> >
> > > If you get a chance, would you mind updating this partsing logic to be
> > > able to accommodate such crazy function names as well? Thanks!
> > >
> > > >                    &probe_type, &binary_path, &func_name, &offset);
> > > >         switch (n) {
> > > >         case 1:
> > > > --
> > > > 2.34.1
> > > >
> >
> > Cheers,
> > ---
> > Hengqi

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next v3 2/3] libbpf: Support symbol versioning for uprobe
  2023-09-15  7:30         ` Hengqi Chen
@ 2023-09-15 20:20           ` Andrii Nakryiko
  0 siblings, 0 replies; 10+ messages in thread
From: Andrii Nakryiko @ 2023-09-15 20:20 UTC (permalink / raw)
  To: Hengqi Chen; +Cc: bpf, ast, daniel, andrii, alan.maguire, olsajiri, Jiri Olsa

On Fri, Sep 15, 2023 at 12:30 AM Hengqi Chen <hengqi.chen@gmail.com> wrote:
>
> On Fri, Sep 15, 2023 at 1:12 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Sep 14, 2023 at 5:37 AM Hengqi Chen <hengqi.chen@gmail.com> wrote:
> > >
> > > On Wed, Sep 13, 2023 at 7:14 AM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Sun, Sep 10, 2023 at 6:51 PM Hengqi Chen <hengqi.chen@gmail.com> wrote:
> > > > >
> > > > > In current implementation, we assume that symbol found in .dynsym section
> > > > > would have a version suffix and use it to compare with symbol user supplied.
> > > > > According to the spec ([0]), this assumption is incorrect, the version info
> > > > > of dynamic symbols are stored in .gnu.version and .gnu.version_d sections
> > > > > of ELF objects. For example:
> > > > >
> > > > >     $ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
> > > > >     000000000009b1a0 T __pthread_rwlock_wrlock@GLIBC_2.2.5
> > > > >     000000000009b1a0 T pthread_rwlock_wrlock@@GLIBC_2.34
> > > > >     000000000009b1a0 T pthread_rwlock_wrlock@GLIBC_2.2.5
> > > > >
> > > > >     $ readelf -W --dyn-syms /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
> > > > >       706: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 __pthread_rwlock_wrlock@GLIBC_2.2.5
> > > > >       2568: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@@GLIBC_2.34
> > > > >       2571: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@GLIBC_2.2.5
> > > > >
> > > > > In this case, specify pthread_rwlock_wrlock@@GLIBC_2.34 or
> > > > > pthread_rwlock_wrlock@GLIBC_2.2.5 in bpf_uprobe_opts::func_name won't work.
> > > > > Because the qualified name does NOT match `pthread_rwlock_wrlock` (without
> > > > > version suffix) in .dynsym sections.
> > > > >
> > > > > This commit implements the symbol versioning for dynsym and allows user to
> > > > > specify symbol in the following forms:
> > > > >   - func
> > > > >   - func@LIB_VERSION
> > > > >   - func@@LIB_VERSION
> > > > >
> > > > > In case of symbol conflicts, error out and users should resolve it by
> > > > > specifying a qualified name.
> > > > >
> > > > >   [0]: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/symversion.html
> > > > >
> > > > > Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> > > > > Acked-by: Jiri Olsa <jolsa@kernel.org>
> > > > > Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
> > > > > ---
> > > > >  tools/lib/bpf/elf.c    | 146 +++++++++++++++++++++++++++++++++++++----
> > > > >  tools/lib/bpf/libbpf.c |   2 +-
> > > > >  2 files changed, 134 insertions(+), 14 deletions(-)
> > > > >
> > > > > diff --git a/tools/lib/bpf/elf.c b/tools/lib/bpf/elf.c
> > > > > index 5c9e588b17da..825da903a34c 100644
> > > > > --- a/tools/lib/bpf/elf.c
> > > > > +++ b/tools/lib/bpf/elf.c
> > > > > @@ -1,5 +1,8 @@
> > > > >  // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> > > > >
> > > > > +#ifndef _GNU_SOURCE
> > > > > +#define _GNU_SOURCE
> > > > > +#endif
> > > > >  #include <libelf.h>
> > > > >  #include <gelf.h>
> > > > >  #include <fcntl.h>
> > > > > @@ -10,6 +13,17 @@
> > > > >
> > > > >  #define STRERR_BUFSIZE  128
> > > > >
> > > > > +/* A SHT_GNU_versym section holds 16-bit words. This bit is set if
> > > > > + * the symbol is hidden and can only be seen when referenced using an
> > > > > + * explicit version number. This is a GNU extension.
> > > > > + */
> > > > > +#define VERSYM_HIDDEN  0x8000
> > > > > +
> > > > > +/* This is the mask for the rest of the data in a word read from a
> > > > > + * SHT_GNU_versym section.
> > > > > + */
> > > > > +#define VERSYM_VERSION 0x7fff
> > > > > +
> > > > >  int elf_open(const char *binary_path, struct elf_fd *elf_fd)
> > > > >  {
> > > > >         char errmsg[STRERR_BUFSIZE];
> > > > > @@ -64,11 +78,14 @@ struct elf_sym {
> > > > >         const char *name;
> > > > >         GElf_Sym sym;
> > > > >         GElf_Shdr sh;
> > > > > +       int ver;
> > > > > +       bool hidden;
> > > > >  };
> > > > >
> > > > >  struct elf_sym_iter {
> > > > >         Elf *elf;
> > > > >         Elf_Data *syms;
> > > > > +       Elf_Data *versyms;
> > > > >         size_t nr_syms;
> > > > >         size_t strtabidx;
> > > > >         size_t next_sym_idx;
> > > > > @@ -111,6 +128,18 @@ static int elf_sym_iter_new(struct elf_sym_iter *iter,
> > > > >         iter->nr_syms = iter->syms->d_size / sh.sh_entsize;
> > > > >         iter->elf = elf;
> > > > >         iter->st_type = st_type;
> > > > > +
> > > > > +       /* Version symbol table is meaningful to dynsym only */
> > > > > +       if (sh_type != SHT_DYNSYM)
> > > > > +               return 0;
> > > > > +
> > > > > +       scn = elf_find_next_scn_by_type(elf, SHT_GNU_versym, NULL);
> > > > > +       if (!scn)
> > > > > +               return 0;
> > > > > +       if (!gelf_getshdr(scn, &sh))
> > > > > +               return -EINVAL;
> > > > > +       iter->versyms = elf_getdata(scn, 0);
> > > > > +
> > > > >         return 0;
> > > > >  }
> > > > >
> > > > > @@ -119,6 +148,7 @@ static struct elf_sym *elf_sym_iter_next(struct elf_sym_iter *iter)
> > > > >         struct elf_sym *ret = &iter->sym;
> > > > >         GElf_Sym *sym = &ret->sym;
> > > > >         const char *name = NULL;
> > > > > +       GElf_Versym versym;
> > > > >         Elf_Scn *sym_scn;
> > > > >         size_t idx;
> > > > >
> > > > > @@ -138,12 +168,113 @@ static struct elf_sym *elf_sym_iter_next(struct elf_sym_iter *iter)
> > > > >
> > > > >                 iter->next_sym_idx = idx + 1;
> > > > >                 ret->name = name;
> > > > > +               ret->ver = 0;
> > > > > +               ret->hidden = false;
> > > > > +
> > > > > +               if (iter->versyms) {
> > > > > +                       if (!gelf_getversym(iter->versyms, idx, &versym))
> > > > > +                               continue;
> > > > > +                       ret->ver = versym & VERSYM_VERSION;
> > > > > +                       ret->hidden = versym & VERSYM_HIDDEN;
> > > > > +               }
> > > > >                 return ret;
> > > > >         }
> > > > >
> > > > >         return NULL;
> > > > >  }
> > > > >
> > > > > +static const char *elf_get_vername(Elf *elf, int ver)
> > > > > +{
> > > > > +       GElf_Verdaux verdaux;
> > > > > +       GElf_Verdef verdef;
> > > > > +       Elf_Data *verdefs;
> > > > > +       size_t strtabidx;
> > > > > +       GElf_Shdr sh;
> > > > > +       Elf_Scn *scn;
> > > > > +       int offset;
> > > > > +
> > > > > +       scn = elf_find_next_scn_by_type(elf, SHT_GNU_verdef, NULL);
> > > >
> > > > so this is a linear search, right? And we'll be doing it for every
> > > > .dynsym symbol. Let's do this once at the creation time and remember a
> > > > pointer inside struct Elf?
> > > >
> > >
> > > We reach here only when the symbol part match, and likely we get the
> > > desired one.
> >
> > sure, but you can have multiple versions, so multiple hits of this.
> > And the ELF itself could be pretty big with lots of sections. So I
> > think we should try to minimize number of linear searches over ELF
> > sections, if possible.
> >
> > > if we store the pointers in struct Elf, then we have to involve
> > > dynamic allocations.
> >
> > I'm not sure why dynamic allocations are needed?
> >
>
> Your last comment says remember those version name pointer in struct Elf,
> I thought this would be allocate an array and save those pointers
> (i.e. store a mapping from version index to version name)
>
> > But also I had struct elf_sym_iter in mind, where we already cache
> > Elf_Data *versyms, so why not do that with verdefs as well? I think
>
> verdef is not per symbol.

I meant to just not do elf_find_next_scn_by_type() search every time,
and just store Elf_Scn * pointer for SHT_GNU_verdef in iter state.


>
> > all these helpers (symbol_match and elf_get_vername) are called only
> > from elf_sym_iter stuff right now, right?
> >
> >
> > >
> > > > > +       if (!scn)
> > > > > +               return NULL;
> > > > > +       if (!gelf_getshdr(scn, &sh))
> > > > > +               return NULL;
> > > > > +       strtabidx = sh.sh_link;
> > > > > +       verdefs =  elf_getdata(scn, 0);
> > > > > +
> > > > > +       offset = 0;
> > > > > +       while (gelf_getverdef(verdefs, offset, &verdef)) {
> > > > > +               if (verdef.vd_ndx != ver) {
> > > > > +                       if (!verdef.vd_next)
> > > > > +                               break;
> > > > > +
> > > > > +                       offset += verdef.vd_next;
> > > > > +                       continue;
> > > > > +               }
> > > > > +
> > > > > +               if (!gelf_getverdaux(verdefs, offset + verdef.vd_aux, &verdaux))
> > > > > +                       break;
> > > > > +
> > > > > +               return elf_strptr(elf, strtabidx, verdaux.vda_name);
> > > > > +
> > > > > +       }
> > > > > +       return NULL;
> > > > > +}

[...]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-09-15 20:20 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-11  1:50 [PATCH bpf-next v3 0/3] libbpf: Support symbol versioning for uprobe Hengqi Chen
2023-09-11  1:50 ` [PATCH bpf-next v3 1/3] libbpf: Resolve symbol conflicts at the same offset " Hengqi Chen
2023-09-11  1:50 ` [PATCH bpf-next v3 2/3] libbpf: Support symbol versioning " Hengqi Chen
2023-09-12 23:14   ` Andrii Nakryiko
2023-09-14 12:36     ` Hengqi Chen
2023-09-14 17:12       ` Andrii Nakryiko
2023-09-15  7:30         ` Hengqi Chen
2023-09-15 20:20           ` Andrii Nakryiko
2023-09-11  1:50 ` [PATCH bpf-next v3 3/3] selftests/bpf: Add tests for " Hengqi Chen
2023-09-12 23:19   ` Andrii Nakryiko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).