From: Andrii Nakryiko
Date: Fri, 26 Jun 2020 17:06:24 -0700
Subject: Re: [PATCH v2 bpf-next 2/4] bpf: introduce helper bpf_get_task_stack()
To: Song Liu
Cc: bpf, Networking, open list, Peter Ziljstra, Alexei Starovoitov, Daniel Borkmann, Kernel Team, john fastabend, KP Singh
References: <20200626001332.1554603-1-songliubraving@fb.com> <20200626001332.1554603-3-songliubraving@fb.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jun 26, 2020 at 4:47 PM Song Liu wrote:
>
> > On Jun 26, 2020, at 3:51 PM, Andrii Nakryiko wrote:
> >
> > On Fri, Jun 26, 2020 at 3:45 PM Song Liu wrote:
> >>
> >>> On Jun 26, 2020, at 1:17 PM, Andrii Nakryiko wrote:
> >>>
> >>> On Thu, Jun 25, 2020 at 5:14 PM Song Liu wrote:
> >>>>
> >>>> Introduce helper
> >>>> bpf_get_task_stack(), which dumps stack trace of given
> >>>> task. This is different to bpf_get_stack(), which gets stack trace of
> >>>> current task. One potential use case of bpf_get_task_stack() is to call
> >>>> it from bpf_iter__task and dump all /proc/<pid>/stack to a seq_file.
> >>>>
> >>>> bpf_get_task_stack() uses stack_trace_save_tsk() instead of
> >>>> get_perf_callchain() for kernel stack. The benefit of this choice is that
> >>>> stack_trace_save_tsk() doesn't require changes in arch/. The downside of
> >>>> using stack_trace_save_tsk() is that stack_trace_save_tsk() dumps the
> >>>> stack trace to unsigned long array. For 32-bit systems, we need to
> >>>> translate it to u64 array.
> >>>>
> >>>> Signed-off-by: Song Liu
> >>>> ---
> >>>
> >>> Looks great, I just think that there are cases where user doesn't
> >>> necessarily have valid task_struct pointer, just pid, so would be nice
> >>> to not artificially restrict such cases by having extra helper.
> >>>
> >>> Acked-by: Andrii Nakryiko
> >>
> >> Thanks!
> >>
> >>>
> >>>>  include/linux/bpf.h            |  1 +
> >>>>  include/uapi/linux/bpf.h       | 35 ++++++++++++++-
> >>>>  kernel/bpf/stackmap.c          | 79 ++++++++++++++++++++++++++++++++--
> >>>>  kernel/trace/bpf_trace.c       |  2 +
> >>>>  scripts/bpf_helpers_doc.py     |  2 +
> >>>>  tools/include/uapi/linux/bpf.h | 35 ++++++++++++++-
> >>>>  6 files changed, 149 insertions(+), 5 deletions(-)
> >>>>
> >>>
> >>> [...]
> >>>
> >>>> +       /* stack_trace_save_tsk() works on unsigned long array, while
> >>>> +        * perf_callchain_entry uses u64 array. For 32-bit systems, it is
> >>>> +        * necessary to fix this mismatch.
> >>>> +        */
> >>>> +       if (__BITS_PER_LONG != 64) {
> >>>> +               unsigned long *from = (unsigned long *) entry->ip;
> >>>> +               u64 *to = entry->ip;
> >>>> +               int i;
> >>>> +
> >>>> +               /* copy data from the end to avoid using extra buffer */
> >>>> +               for (i = entry->nr - 1; i >= (int)init_nr; i--)
> >>>> +                       to[i] = (u64)(from[i]);
> >>>
> >>> doing this forward would be just fine as well, no?
> >>> First iteration will cast and overwrite low 32-bits, all the subsequent
> >>> iterations won't even overlap.
> >>
> >> I think first iteration will write zeros to higher 32 bits, no?
> >
> > Oh, wait, I completely misread what this is doing. It up-converts from
> > 32-bit to 64-bit, sorry. Yeah, ignore me on this :)
> >
> > But then I have another question. How do you know that entry->ip has
> > enough space to keep the same number of 2x bigger entries?
>
> The buffer is sized for sysctl_perf_event_max_stack u64 numbers.
> stack_trace_save_tsk() will put at most sysctl_perf_event_max_stack unsigned
> long in it (init_nr == 0). So the buffer is big enough.
>

Awesome, thanks for clarification!

> Thanks,
> Song
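
As an illustration of the point settled above, here is a minimal userspace sketch of widening 32-bit entries to 64-bit in place, walking from the end so that no unread entry gets overwritten. It is not the code from the patch: widen_in_place(), store_narrow(), and the use of uint32_t as a stand-in for a 32-bit unsigned long are assumptions made for the example, and memcpy() replaces the kernel's pointer cast only to stay within userspace aliasing rules. It covers the init_nr == 0 case discussed in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Assumption for illustration: uint32_t stands in for "unsigned long"
 * on a 32-bit system, uint64_t for the kernel's u64.
 */

/* Store one 32-bit entry at index i of the packed 32-bit view of buf. */
static void store_narrow(uint64_t *buf, size_t i, uint32_t val)
{
	memcpy((unsigned char *)buf + i * sizeof(val), &val, sizeof(val));
}

/*
 * Widen nr packed 32-bit entries at the start of buf into nr 64-bit
 * entries in the same storage. buf has room for cap 64-bit slots.
 */
static void widen_in_place(uint64_t *buf, size_t cap, size_t nr)
{
	const unsigned char *bytes = (const unsigned char *)buf;
	size_t i;

	/* The widened array needs nr u64 slots, so nr must not exceed cap. */
	assert(nr <= cap);

	/*
	 * Walk from the last entry down to the first. Writing slot i covers
	 * the bytes of narrow entries 2i and 2i+1, which earlier iterations
	 * have already read (for i == 0, entry 0 is already held in "narrow").
	 */
	for (i = nr; i-- > 0; ) {
		uint32_t narrow;

		memcpy(&narrow, bytes + i * sizeof(narrow), sizeof(narrow));
		buf[i] = (uint64_t)narrow;
	}
}
```

Walking forward would not work in this layout: the write to slot 0 covers the bytes that still hold narrow entry 1. The capacity check mirrors the sizing argument in the reply: the buffer holds cap u64 slots and at most cap narrow entries are ever stored in it, so the widened array fits.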