From: Joe Burton
Date: Tue, 5 Oct 2021 14:47:34 -0700
Subject: Re: [RFC PATCH v2 00/13] Introduce BPF map tracing capability
To: Alexei Starovoitov
Cc: Joe Burton, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
    Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh,
    Petar Penkov, Stanislav Fomichev, Hao Luo, netdev@vger.kernel.org,
    bpf@vger.kernel.org
In-Reply-To: <20211005051306.4zbdqo3rnecj3hyv@ast-mbp>
References: <20210929235910.1765396-1-jevburton.kernel@gmail.com>
 <20211005051306.4zbdqo3rnecj3hyv@ast-mbp>

> It's a neat idea to use verifier powers for this job,
> but I wonder why simple tracepoint in map ops was not used instead?
My concern with tracepoints is that they execute for all map updates,
not just for a particular map. Ideally, upgrading program X should not
affect the performance characteristics of program Y. If N programs opt
into this model, then upgrading any one of them affects the performance
characteristics of all the others. There is also the (very remote)
possibility of multiple simultaneous upgrades tracing map updates at
the same time, causing an even greater performance hit.

> I don't think the "solution" for lookup operation is worth pursuing.
> The bpf prog that needs this map tracing is completely in your control.
> So just don't do writes after lookup.

I eventually want to support applications that use local storage. Those
APIs generally only allow updates via a pointer. For example,
bpf_sk_storage_get() only allows updates through the returned pointer
and via bpf_sk_storage_delete(). Since I eventually have to solve this
problem for local storage, it seems worth solving it for normal maps as
well; the two seem like isomorphic problems.
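
Concretely, the pattern I have in mind looks roughly like the sketch
below. The names (conn_stats, count_pkts) are made up for illustration;
the shape roughly mirrors the existing sk_storage selftests. The only
write to the storage happens through the pointer returned by
bpf_sk_storage_get(), so there is no update call for a tracing hook to
latch onto:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct conn_stats {
        __u64 pkts;
};

struct {
        __uint(type, BPF_MAP_TYPE_SK_STORAGE);
        __uint(map_flags, BPF_F_NO_PREALLOC);
        __type(key, int);
        __type(value, struct conn_stats);
} conn_stats_map SEC(".maps");

SEC("sockops")
int count_pkts(struct bpf_sock_ops *ctx)
{
        struct bpf_sock *sk = ctx->sk;
        struct conn_stats *stats;

        if (!sk)
                return 1;

        stats = bpf_sk_storage_get(&conn_stats_map, sk, NULL,
                                   BPF_SK_STORAGE_GET_F_CREATE);
        if (!stats)
                return 1;

        /* The update happens purely through the returned pointer;
         * there is no bpf_map_update_elem() call to instrument.
         */
        stats->pkts++;
        return 1;
}

char _license[] SEC("license") = "GPL";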
On Mon, Oct 4, 2021 at 10:13 PM Alexei Starovoitov wrote:
>
> On Wed, Sep 29, 2021 at 11:58:57PM +0000, Joe Burton wrote:
> > From: Joe Burton
> >
> > This patch introduces 'map tracing': the capability to execute a
> > tracing program after updating a map.
> >
> > Map tracing enables upgrades of stateful programs with fewer race
> > conditions than otherwise possible. We use a tracing program to
> > imbue a map with copy-on-write semantics, then use an iterator to
> > perform a bulk copy of data in the map. After bulk copying concludes,
> > updates to that map automatically propagate via the tracing
> > program, avoiding a class of race conditions. This use case is
> > demonstrated in the new 'real_world_example' selftest.
> >
> > Extend BPF_PROG_TYPE_TRACING with a new attach type, BPF_TRACE_MAP,
> > and allow linking these programs to arbitrary maps.
> >
> > Extend the verifier to invoke helper calls directly after
> > bpf_map_update_elem() and bpf_map_delete_elem(). The helpers have the
> > exact same signature as the functions they trace, and simply pass those
> > arguments to the list of tracing programs attached to the map.
>
> It's a neat idea to use verifier powers for this job,
> but I wonder why simple tracepoint in map ops was not used instead?
> With BTF the bpf progs see the actual types of raw tracepoints.
> If tracepoint has map, key, value pointers the prog will be able
> to access them in read-only mode.
> Such map pointer will be PTR_TO_BTF_ID, so the prog won't be able
> to recursively do lookup/update on this map pointer,
> but that's what you need anyway, right?
> If not we can extend this part of the tracepoint/verifier.
>
> Instead of tracepoint it could have been an empty noinline function
> and fentry/fexit would see all arguments as well.
>
> > One open question is how to handle pointer-based map updates. For
> > example:
> >         int *x = bpf_map_lookup_elem(...);
> >         if (...) (*x)++;
> >         if (...) (*x)--;
> > We can't just call a helper function right after the
> > bpf_map_lookup_elem(), since the updates occur later on. We also can't
> > determine where the last modification to the pointer occurs, due to
> > branch instructions. I would therefore consider a pattern where we
> > 'flush' pointers at the end of a BPF program:
> >         int *x = bpf_map_lookup_elem(...);
> >         ...
> >         /* Execute tracing programs for this cell in this map. */
> >         bpf_map_trace_pointer_update(x);
> >         return 0;
> > We can't necessarily do this in the verifier, since 'x' may no
> > longer be in a register or on the stack. Thus we might introduce a
> > helper to save pointers that should be flushed, then flush all
> > registered pointers at every exit point:
> >         int *x = bpf_map_lookup_elem(...);
> >         /*
> >          * Saves 'x' somewhere in kernel memory. Does nothing if no
> >          * corresponding tracing progs are attached to the map.
> >          */
> >         bpf_map_trace_register_pointer(x);
> >         ...
> >         /* flush all registered pointers */
> >         bpf_map_trace_pointer_update();
> >         return 0;
> > This should be easy to implement in the verifier.
>
> I don't think the "solution" for lookup operation is worth pursuing.
> The bpf prog that needs this map tracing is completely in your control.
> So just don't do writes after lookup.
>
> > In addition, we use the verifier to instrument certain map update
> > calls. This requires saving arguments onto the stack, which means that
> > a program using MAX_BPF_STACK bytes of stack could exceed the limit.
> > I don't know whether this actually causes any problems.
>
> Extra 8*4 bytes of stack is not a deal breaker.
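
P.S. To make the 'flush' approach above a little more concrete, the
program side could look roughly like the sketch below. To be clear,
bpf_map_trace_register_pointer() and bpf_map_trace_pointer_update() are
only the strawman helpers from the cover letter; they do not exist yet,
so the sketch declares placeholder prototypes with made-up helper IDs
and would not load on any current kernel:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Placeholder declarations for the proposed helpers. The IDs are
 * arbitrary; nothing here is implemented today.
 */
static long (*bpf_map_trace_register_pointer)(void *ptr) = (void *) 200;
static long (*bpf_map_trace_pointer_update)(void) = (void *) 201;

struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, __u64);
} counters SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_getpid")
int trace_getpid(void *ctx)
{
        __u32 key = 0;
        __u64 *x;

        x = bpf_map_lookup_elem(&counters, &key);
        if (!x)
                return 0;

        /* Save 'x' so it can be flushed later. This would be a no-op
         * if no tracing programs are attached to the map.
         */
        bpf_map_trace_register_pointer(x);

        (*x)++;

        /* Flush all registered pointers so attached tracing programs
         * observe the final values written through them.
         */
        bpf_map_trace_pointer_update();
        return 0;
}

char _license[] SEC("license") = "GPL";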