From: Alexei Starovoitov
Date: Thu, 7 Jul 2022 12:36:33 -0700
Subject: Re: [PATCH RFC bpf-next 2/3] libbpf: add ksyscall/kretsyscall sections support for syscall kprobes
To: Andrii Nakryiko
Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team,
 Ilya Leoshkevich, Kenta Tada, Hengqi Chen
References: <20220707004118.298323-1-andrii@kernel.org> <20220707004118.298323-3-andrii@kernel.org>
X-Mailing-List: bpf@vger.kernel.org

On Thu, Jul 7, 2022 at 12:10 PM Andrii Nakryiko wrote:
>
> On Thu, Jul 7, 2022 at 10:23 AM Alexei Starovoitov wrote:
> >
> > On Wed, Jul 6, 2022 at 5:41 PM Andrii Nakryiko wrote:
> > >
> > > Add SEC("ksyscall")/SEC("ksyscall/<syscall_name>") and corresponding
> > > kretsyscall variants (for return kprobes) to allow users to kprobe
> > > syscall functions in kernel.
> > > These special sections allow users to ignore the complexities and
> > > differences between kernel versions and host architectures when it
> > > comes to the syscall wrapper and the corresponding __<arch>_sys_xxx
> > > vs __se_sys_xxx differences, depending on
> > > CONFIG_ARCH_HAS_SYSCALL_WRAPPER.
> > >
> > > Combined with the use of the BPF_KSYSCALL() macro, this allows users
> > > to just specify the intended syscall name and the expected input
> > > arguments and leave dealing with all the variations to libbpf.
> > >
> > > In addition to SEC("ksyscall+") and SEC("kretsyscall+"), add a
> > > bpf_program__attach_ksyscall() API which allows specifying the
> > > syscall name at runtime and providing an associated BPF cookie value.
> > >
> > > Signed-off-by: Andrii Nakryiko
> > > ---
> > >  tools/lib/bpf/libbpf.c          | 109 ++++++++++++++++++++++++++++++++
> > >  tools/lib/bpf/libbpf.h          |  16 +++++
> > >  tools/lib/bpf/libbpf.map        |   1 +
> > >  tools/lib/bpf/libbpf_internal.h |   2 +
> > >  4 files changed, 128 insertions(+)
> > >
> > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > > index cb49408eb298..4749fb84e33d 100644
> > > --- a/tools/lib/bpf/libbpf.c
> > > +++ b/tools/lib/bpf/libbpf.c
> > > @@ -4654,6 +4654,65 @@ static int probe_kern_btf_enum64(void)
> > >  					     strs, sizeof(strs)));
> > >  }
> > >
> > > +static const char *arch_specific_syscall_pfx(void)
> > > +{
> > > +#if defined(__x86_64__)
> > > +	return "x64";
> > > +#elif defined(__i386__)
> > > +	return "ia32";
> > > +#elif defined(__s390x__)
> > > +	return "s390x";
> > > +#elif defined(__s390__)
> > > +	return "s390";
> > > +#elif defined(__arm__)
> > > +	return "arm";
> > > +#elif defined(__aarch64__)
> > > +	return "arm64";
> > > +#elif defined(__mips__)
> > > +	return "mips";
> > > +#elif defined(__riscv)
> > > +	return "riscv";
> > > +#else
> > > +	return NULL;
> > > +#endif
> > > +}
> > > +
> > > +static int probe_kern_syscall_wrapper(void)
> > > +{
> > > +	/* available_filter_functions is a few times smaller than
> > > +	 * /proc/kallsyms and has simpler format, so we use it as a faster way
> > > +	 * to check that __<arch>_sys_bpf symbol exists, which is a sign that
> > > +	 * kernel was built with CONFIG_ARCH_HAS_SYSCALL_WRAPPER and uses
> > > +	 * syscall wrappers
> > > +	 */
> > > +	static const char *kprobes_file = "/sys/kernel/tracing/available_filter_functions";
> > > +	char func_name[128], syscall_name[128];
> > > +	const char *ksys_pfx;
> > > +	FILE *f;
> > > +	int cnt;
> > > +
> > > +	ksys_pfx = arch_specific_syscall_pfx();
> > > +	if (!ksys_pfx)
> > > +		return 0;
> > > +
> > > +	f = fopen(kprobes_file, "r");
> > > +	if (!f)
> > > +		return 0;
> > > +
> > > +	snprintf(syscall_name, sizeof(syscall_name), "__%s_sys_bpf", ksys_pfx);
> > > +
> > > +	/* check if bpf() syscall wrapper is listed as possible kprobe */
> > > +	while ((cnt = fscanf(f, "%127s%*[^\n]\n", func_name)) == 1) {
> > > +		if (strcmp(func_name, syscall_name) == 0) {
> > > +			fclose(f);
> > > +			return 1;
> > > +		}
> > > +	}
> >
> > Maybe we should do it the other way around?
> > cat /proc/kallsyms | grep sys_bpf
> >
> > and figure out the prefix from there?
> > Then we won't need to do the giant
> > #if defined(__x86_64__)
> > ...
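
Something like this, perhaps (a rough, untested sketch of that
kallsyms-based prefix detection; the helper name, buffer sizes and
filtering heuristics are made up just to show the idea, and it naively
takes the first match, which runs into the compat issue discussed
below):

#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan /proc/kallsyms for a "__<pfx>_sys_bpf"
 * symbol and copy "<pfx>" into the caller's buffer, instead of
 * hardcoding the prefix per architecture.
 */
static int guess_syscall_pfx(char *pfx, size_t pfx_sz)
{
	const char *suffix = "_sys_bpf";
	unsigned long long addr;
	char sym[256], type;
	FILE *f;

	f = fopen("/proc/kallsyms", "r");
	if (!f)
		return -errno;

	while (fscanf(f, "%llx %c %255s%*[^\n]\n", &addr, &type, sym) == 3) {
		size_t len = strlen(sym);

		/* want "__<pfx>_sys_bpf" with a non-empty <pfx>... */
		if (len <= strlen(suffix) + 2 || strncmp(sym, "__", 2) != 0)
			continue;
		if (strcmp(sym + len - strlen(suffix), suffix) != 0)
			continue;
		/* ...and not the __ksymtab_/__kstrtab_ entries */
		if (strstr(sym, "ksymtab") || strstr(sym, "kstrtab"))
			continue;

		/* copy whatever sits between "__" and "_sys_bpf" */
		snprintf(pfx, pfx_sz, "%.*s",
			 (int)(len - strlen(suffix) - 2), sym + 2);
		fclose(f);
		return 0;
	}

	fclose(f);
	return -ENOENT;
}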
>
> Unfortunately this won't work well due to compat and 32-bit APIs (and
> the bpf() syscall is particularly bad with also bpf_sys_bpf):
>
> $ sudo cat /proc/kallsyms | rg '_sys_bpf$'
> ffffffff811cb100 t __sys_bpf
> ffffffff811cd380 T bpf_sys_bpf
> ffffffff811cd520 T __x64_sys_bpf
> ffffffff811cd540 T __ia32_sys_bpf
> ffffffff8256fce0 r __ksymtab_bpf_sys_bpf
> ffffffff8259b5a2 r __kstrtabns_bpf_sys_bpf
> ffffffff8259bab9 r __kstrtab_bpf_sys_bpf
> ffffffff83abc400 t _eil_addr___ia32_sys_bpf
> ffffffff83abc410 t _eil_addr___x64_sys_bpf

That actually means that the current and proposed approaches are both
somewhat wrong, since they don't attach to both symbols, meaning all
syscalls done by 32-bit userspace will not be seen by the bpf prog.
Probably libbpf should attach to both: __x64_sys_bpf and __ia32_sys_bpf.

__ksym and __kstr are easy to filter out, since they are standard
prefixes. No idea what eil_addr is.

> $ sudo cat /proc/kallsyms | rg '_sys_mmap$'
> ffffffff81024480 T __x64_sys_mmap
> ffffffff810244c0 T __ia32_sys_mmap
> ffffffff83abae30 t _eil_addr___ia32_sys_mmap
> ffffffff83abae40 t _eil_addr___x64_sys_mmap
>
> We have similar arch-specific switches in a few other places (USDT and
> lib path detection, for example), so it's not a new precedent (for
> better or worse).
>
> > /proc/kallsyms has world read permissions:
> > proc_create("kallsyms", 0444, NULL, &kallsyms_proc_ops);
> > unlike available_filter_functions.
> >
> > Also tracefs might be mounted in a different dir than
> > /sys/kernel/tracing/, like
> > /sys/kernel/debug/tracing/
>
> Yeah, good point, I was trying to avoid parsing the more expensive
> kallsyms, but given it's done only once, it might not be a big deal.

Soon we'll have an iterator for them, so doing an in-kernel search for
sys_bpf would be fast.
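
For reference, this is roughly what the proposed sections would look
like from the BPF program side, per the commit message above; the
program names and the traced bpf(2) arguments are just for
illustration, and the exact BPF_KSYSCALL() expansion comes from the
patch series rather than from this sketch:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "Dual BSD/GPL";

/* BPF_KSYSCALL() lets the program be written against the syscall's own
 * arguments; the section plus the macro hide the __<arch>_sys_xxx vs
 * __se_sys_xxx wrapper differences.
 */
SEC("ksyscall/bpf")
int BPF_KSYSCALL(kprobe_sys_bpf, int cmd, union bpf_attr *uattr, unsigned int size)
{
	bpf_printk("bpf() entry: cmd=%d size=%u", cmd, size);
	return 0;
}

SEC("kretsyscall/bpf")
int BPF_KRETPROBE(kretprobe_sys_bpf, long ret)
{
	bpf_printk("bpf() exit: ret=%ld", ret);
	return 0;
}

And a user-space sketch of the "attach to both wrappers" idea, using
the existing bpf_program__attach_kprobe() API with the symbol names
from the kallsyms dump above (error handling is trimmed, and it assumes
libbpf 1.0 semantics where a failed attach returns NULL and sets
errno):

#include <errno.h>
#include <stdio.h>
#include <bpf/libbpf.h>

/* attach the same program to both the 64-bit and the compat syscall
 * wrappers, so that syscalls made by 32-bit userspace are traced too */
static int attach_both_wrappers(struct bpf_program *prog,
				struct bpf_link **link64,
				struct bpf_link **link32)
{
	*link64 = bpf_program__attach_kprobe(prog, false, "__x64_sys_bpf");
	if (!*link64)
		return -errno;

	/* the compat wrapper may be absent on kernels built without
	 * 32-bit syscall support, so treat this one as best-effort */
	*link32 = bpf_program__attach_kprobe(prog, false, "__ia32_sys_bpf");
	if (!*link32)
		fprintf(stderr, "no __ia32_sys_bpf, skipping (errno=%d)\n", errno);

	return 0;
}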