From: Alexei Starovoitov
Date: Mon, 3 Jun 2019 17:56:08 -0700
Subject: Re: [PATCH bpf v2] bpf: preallocate a perf_sample_data per event fd
To: Daniel Borkmann
Cc: Matt Mullins, netdev@vger.kernel.org, Andrew Hall, bpf@vger.kernel.org,
    ast@kernel.org, linux-kernel@vger.kernel.org, Martin Lau, Yonghong Song,
    rostedt@goodmis.org, mingo@redhat.com, Song Liu

On Mon, Jun 3, 2019 at 5:44 PM Daniel Borkmann wrote:
>
> On 06/04/2019 01:54 AM, Alexei Starovoitov wrote:
> > On Mon, Jun 3, 2019 at 4:48 PM Daniel Borkmann wrote:
> >> On 06/04/2019 01:27 AM, Alexei Starovoitov wrote:
> >>> On Mon, Jun 3, 2019 at 3:59 PM Matt Mullins wrote:
> >>>>
> >>>> If these are invariably non-nested, I can easily keep bpf_misc_sd when
> >>>> I resubmit.  There was no technical reason other than keeping the two
> >>>> codepaths as similar as possible.
> >>>>
> >>>> What resource gives you worry about doing this for the networking
> >>>> codepath?
> >>>
> >>> my preference would be to keep tracing and networking the same.
> >>> there is already minimal nesting in networking, and we'll probably see
> >>> more when reuseport progs start running from xdp and clsbpf
> >>>
> >>>>> Aside from that it's also really bad to miss events like this as exporting
> >>>>> through rb is critical. Why can't you have a per-CPU counter that selects a
> >>>>> sample data context based on nesting level in tracing? (I don't see a
> >>>>> discussion of this in your commit message.)
> >>>>
> >>>> This change would only drop messages if the same perf_event is
> >>>> attempted to be used recursively (i.e. the same CPU on the same
> >>>> PERF_EVENT_ARRAY map, as I haven't observed anything use index !=
> >>>> BPF_F_CURRENT_CPU in testing).
> >>>>
> >>>> I'll try to accomplish the same with a percpu nesting level and
> >>>> allocating 2 or 3 perf_sample_data per cpu.  I think that'll solve the
> >>>> same problem -- a local patch keeping track of the nesting level is how
> >>>> I got the above stack trace, too.
> >>>
> >>> I don't think the counter approach works.  The amount of nesting is unknown.
> >>> imo the approach taken in this patch is good.
> >>> I don't see any issue where event_outputs will be dropped for valid progs.
> >>> Only when the user called the helper incorrectly without BPF_F_CURRENT_CPU.
> >>> But that's an error anyway.
> >>
> >> My main worry with this xchg() trick is that we'll miss exporting crucial
> >> data when the EBUSY bail-out hits, especially given nesting could increase
> >> in the future as you state, so users might have a hard time debugging this
> >> kind of issue if they share the same perf event map among these programs,
> >> with no option to get to this data otherwise. Supporting nesting up to a
> >> certain level would still be better than a lost event which is also not
> >> reported through the usual way aka perf rb.
> >
> > I simply don't see this 'miss to export data' in all but contrived conditions.
> > Say two progs share the same perf event array.
> > One prog calls event_output, and while the rb logic is working,
> > another prog needs to start executing and use the same event array
>
> Correct.
>
> > slot. Today that's only possible for a tracing prog combined with networking,
> > but having two progs use the same event output array is pretty much
> > a user bug.  Just like not passing BPF_F_CURRENT_CPU.
>
> I don't see the user bug part, why should that be a user bug?

because I suspect that 'struct bpf_event_entry' is not reentrant
(even w/o the issues with 'struct perf_sample_data').
Even if we always use 'struct perf_sample_data' on stack,
I suspect the same 'struct bpf_event_entry' cannot be reused in a nested way.
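
To make concrete what the xchg() scheme being debated looks like, here is a
rough sketch -- illustration only, not the actual patch, and the struct and
function names are made up:

/*
 * Illustration only (hypothetical names).  Each event entry keeps one
 * preallocated perf_sample_data.  A call claims it with xchg(); a nested
 * call on the same entry sees NULL and bails out with -EBUSY instead of
 * scribbling over sample data that is still in flight.
 */
#include <linux/atomic.h>
#include <linux/errno.h>
#include <linux/perf_event.h>

struct event_entry_sketch {
	struct perf_event *event;
	struct perf_sample_data *sd;	/* preallocated, claimed via xchg() */
};

static int event_output_sketch(struct event_entry_sketch *ee,
			       void *data, u64 size)
{
	struct perf_sample_data *sd;
	int err = 0;

	sd = xchg(&ee->sd, NULL);
	if (!sd)
		return -EBUSY;		/* nested use of the same entry */

	/* ... fill in sd and call perf_event_output() on ee->event ... */

	xchg(&ee->sd, sd);		/* release the slot for the next caller */
	return err;
}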
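
For contrast, the per-CPU nesting-level variant Matt floated would look
roughly like the following (again just a sketch; the names and the depth of
3 are assumptions).  This is where the "amount of nesting is unknown"
concern bites: anything nested deeper than the fixed array still has to be
dropped.

/* Sketch of the per-CPU nesting-counter alternative (hypothetical names). */
#include <linux/errno.h>
#include <linux/percpu.h>
#include <linux/perf_event.h>

#define SAMPLE_NEST_MAX 3

struct sample_data_stack {
	struct perf_sample_data sds[SAMPLE_NEST_MAX];
};

static DEFINE_PER_CPU(struct sample_data_stack, sample_stack);
static DEFINE_PER_CPU(int, sample_nest_level);

static int event_output_nested_sketch(struct perf_event *event,
				      void *data, u64 size)
{
	struct perf_sample_data *sd;
	int nest, err = 0;

	nest = this_cpu_inc_return(sample_nest_level);
	if (nest > SAMPLE_NEST_MAX) {
		err = -EBUSY;		/* nested deeper than we budgeted for */
		goto out;
	}
	sd = &this_cpu_ptr(&sample_stack)->sds[nest - 1];

	/* ... fill in sd and call perf_event_output() on event ... */
out:
	this_cpu_dec(sample_nest_level);
	return err;
}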
> It's the same as if we said that sharing a BPF hash map between networking
> programs attached to different hooks, or between networking and tracing,
> would be a user bug, which it is not. One concrete example would be the
> cilium monitor, where we currently expose skb trace and drop events as well
> as debug data through the same rb. This should be usable from any program
> type that has the perf_event_output helper enabled (e.g. XDP and tc/BPF)
> w/o requiring user space to walk yet another per-CPU mmap rb.

sure, those are valid use cases, but in all of them bpf_event_output()
on a particular 'struct bpf_event_entry' will end before another one
begins, no?
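
For reference, the program-side pattern this thread assumes -- several
programs emitting into one shared PERF_EVENT_ARRAY with BPF_F_CURRENT_CPU so
each CPU writes to its own rb slot -- looks roughly like this (libbpf-style
sketch; the map, struct, and section names are made up):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* One perf event array shared by several programs (e.g. tc and tracing). */
struct {
	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
	__uint(key_size, sizeof(int));
	__uint(value_size, sizeof(int));
} events SEC(".maps");

struct trace_evt {
	__u32 ifindex;
	__u32 len;
};

SEC("tc")
int monitor_skb(struct __sk_buff *skb)
{
	struct trace_evt evt = {
		.ifindex = skb->ifindex,
		.len	 = skb->len,
	};

	/* BPF_F_CURRENT_CPU selects this CPU's slot of the array; passing a
	 * fixed index instead is the kind of misuse discussed above. */
	bpf_perf_event_output(skb, &events, BPF_F_CURRENT_CPU,
			      &evt, sizeof(evt));
	return 0;
}

char _license[] SEC("license") = "GPL";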