From mboxrd@z Thu Jan 1 00:00:00 1970
From: Joe Stringer
Date: Wed, 11 Aug 2021 14:03:07 -0700
Subject: Re: [RFC Patch bpf-next] bpf: introduce bpf timer
References: <20210402192823.bqwgipmky3xsucs5@ast-mbp>
 <20210402234500.by3wigegeluy5w7j@ast-mbp>
 <20210412230151.763nqvaadrrg77kd@ast-mbp.dhcp.thefacebook.com>
 <20210427020159.hhgyfkjhzjk3lxgs@ast-mbp.dhcp.thefacebook.com>
To: Cong Wang
Cc: Jamal Hadi Salim, Joe Stringer, Alexei Starovoitov, Linux Kernel
 Network Developers, bpf, Xiongchun Duan, Dongdong Wang, Muchun Song,
 Cong Wang, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
 Martin KaFai Lau, Song Liu, Yonghong Song, Pedro Tammela
List-ID: bpf@vger.kernel.org

Hi folks, apparently I never clicked 'send' on this email, but if you
wanted to continue the discussion I had some questions and thoughts.
This is also an interesting enough topic that it may be worth
considering a submission for the upcoming LPC Networking & BPF track
(submission deadline is this Friday, August 13; conference dates are
September 20-24).

On Thu, May 13, 2021 at 7:53 PM Cong Wang wrote:
>
> On Thu, May 13, 2021 at 11:46 AM Jamal Hadi Salim wrote:
> >
> > On 2021-05-12 6:43 p.m., Jamal Hadi Salim wrote:
> > >
> > > Will run some tests tomorrow to see the effect of batching vs. no
> > > batching, and capture the cost of syscalls and CPU.
> >
> > So here are some numbers:
> > Processor: Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
> > This machine is very similar to where a real deployment
> > would happen.
> >
> > Hyperthreading turned off so we can dedicate the core to the
> > dumping process, and performance mode on, so no frequency scaling
> > meddling.
> > Tests were run about 3 times each. Results were eyeballed to make
> > sure deviation was reasonable.
> > 100% of the one core was used just for dumping during each run.
>
> I checked with Cilium users here at Bytedance; they actually observed
> 100% CPU usage too.

Thanks for the feedback. Can you provide further details? For instance:

* Which version of Cilium?
* How long do you observe this 100% CPU usage?
* What size CT map is in use?
* How frequently do you intend for CT GC to run? (Do you use the
  default settings, or are they mismatched with your requirements for
  some reason? If so, can we learn more about those requirements and
  why?)
* Do you have a threshold in mind that would be sufficient?

If necessary we can take these discussions off-list if the details are
sensitive, but I'd prefer to continue the discussion here so that we
have some public examples we can use to motivate future work. We could
alternatively move the discussion to a Cilium GitHub issue if the
tradeoffs are more about the userspace implementation than the kernel
specifics, though I suspect some of the folks here would also like to
follow along, so I don't want to exclude the list from the discussion.

FWIW I'm not inherently against a timer; in fact, I've wondered for a
while what kind of interesting things we could build with such support.
At the same time, connection tracking entry management is a nuanced
topic, and it's easy to fix an issue in one area only to introduce a
problem in another.

> > bpftool does linear retrieval whereas our tool does batch dumping.
> > bpftool does print the dumped results; for our tool we just count
> > the number of entries retrieved (the cost would have been higher if
> > we actually printed). In any case, in the real setup there is
> > a processing cost which is much higher.
> >
> > Summary is: the dumping is problematic cost-wise as the number of
> > entries increases. While batching does improve things, it doesn't
> > solve our problem. (Like I said, we have up to 16M entries and most
> > of the time we are dumping useless things.)
>
> Thank you for sharing these numbers! Hopefully they can convince
> people here to accept the bpf timer. I will include your use case and
> performance numbers in my next update.

Yes, thanks Jamal for the numbers. It's very interesting: clearly batch
dumping is far more efficient, and we should enhance bpftool to take
advantage of it where applicable.
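To make that concrete, here's roughly what the batched dump loop looks
like from userspace via libbpf's bpf_map_lookup_batch() (the
BPF_MAP_LOOKUP_BATCH command, kernel 5.6+): one syscall per batch
instead of one per key. This is an untested sketch; the map fd, the
__u32/__u64 key/value types, and the batch size are placeholders:

/* Sketch: drain a BPF hash map of __u32 keys / __u64 values using the
 * batched lookup API. Types and BATCH_SZ are placeholders.
 */
#include <errno.h>
#include <stdio.h>
#include <bpf/bpf.h>

#define BATCH_SZ 4096

static void dump_map_batched(int map_fd)
{
	__u32 keys[BATCH_SZ];
	__u64 vals[BATCH_SZ];
	__u32 in_batch, out_batch, count;
	void *in = NULL;	/* NULL on the first call: start of map */
	int err;

	do {
		count = BATCH_SZ;
		err = bpf_map_lookup_batch(map_fd, in, &out_batch,
					   keys, vals, &count, NULL);
		if (err && errno != ENOENT)
			break;	/* ENOENT just means "end of map" */
		for (__u32 i = 0; i < count; i++)
			printf("key %u -> val %llu\n", keys[i],
			       (unsigned long long)vals[i]);
		in_batch = out_batch;	/* resume where the kernel left off */
		in = &in_batch;
	} while (!err);
}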
> Like I said, we have up to 16M entries and most
> of the time we are dumping useless things.

I'm curious whether there's a more intelligent way to figure out this
'dumping useless things' aspect. I can see how timers would eliminate
the cycles spent on the syscall side of this entirely (in favor of the
timer handling logic, which I'd guess is cheaper), but at some point,
if you're running certain logic on every entry in a map, then of course
it will scale linearly.

The use case is different from the CT problem we discussed above, but
if I look at the same question for the CT case, this is why I find LRU
useful: rather than firing off a number of timers linear in the size of
the map, the eviction logic is limited by the map insert rate, which
itself can be governed and rate-limited by logic running in eBPF. The
scan of the map then becomes less critical, so it can be run less
frequently and alleviate the CPU usage question that way.
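For reference, here's a rough sketch of what the per-entry timer
approach could look like on the BPF side: a conntrack-style map whose
values embed a timer, so each entry expires itself with no userspace
scan. I'm going off the helper names in the bpf_timer series under
discussion, so treat the signatures as illustrative; the value layout,
map sizing, and the 30-second timeout are mine, and error handling is
elided:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define CLOCK_MONOTONIC 1	/* avoid pulling in time.h for BPF */

struct ct_entry {
	__u64 packets;
	struct bpf_timer timer;
};

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 1 << 20);
	__type(key, __u32);		/* e.g. a flow hash */
	__type(value, struct ct_entry);
} ct_map SEC(".maps");

/* Timer callback: runs in kernel context when the entry goes idle. */
static int ct_expire(void *map, __u32 *key, struct ct_entry *val)
{
	bpf_map_delete_elem(map, key);
	return 0;
}

/* Call from the datapath after touching an entry: (re)arm its timer so
 * the entry expires 30s after the last packet. bpf_timer_init() fails
 * harmlessly if the timer was already initialized.
 */
static __always_inline void ct_touch(struct ct_entry *val)
{
	bpf_timer_init(&val->timer, &ct_map, CLOCK_MONOTONIC);
	bpf_timer_set_callback(&val->timer, ct_expire);
	bpf_timer_start(&val->timer, 30ULL * 1000000000ULL, 0);
}

char LICENSE[] SEC("license") = "GPL";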
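And for contrast, the LRU approach needs no per-entry machinery at all,
since eviction piggybacks on map insertion: when the map is full, an
insert evicts the least recently used entry, so eviction cost tracks
the insert rate rather than the map size. A minimal sketch, sizing
illustrative:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_LRU_HASH);
	__uint(max_entries, 1 << 20);
	__type(key, __u32);
	__type(value, __u64);	/* e.g. last-seen timestamp */
} ct_lru SEC(".maps");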