Subject: Re: [RFC PATCH bpf-next] bpf: Introduce bpf_timer
To: Alexei Starovoitov
Cc: Cong Wang, David Miller, Daniel Borkmann, Andrii Nakryiko, John Fastabend, Lorenz Bauer, Linux Kernel Network Developers, bpf, kernel-team, Pedro Tammela
References: <20210520185550.13688-1-alexei.starovoitov@gmail.com> <27dae780-b66b-4ee9-cff1-a3257e42070e@mojatatu.com>
From: Jamal Hadi Salim
Message-ID: <2dfc5180-40df-ae4c-7146-d64130be9ad4@mojatatu.com>
Date: Wed, 26 May 2021 11:34:04 -0400
X-Mailing-List: bpf@vger.kernel.org

On 2021-05-25 6:08 p.m., Alexei Starovoitov wrote:
> On Tue, May 25, 2021 at 2:09 PM Jamal Hadi Salim wrote:
>>
>> This is certainly a useful feature (for other reasons as well).
>> Does this include create/update/delete issued from user space?
>
> Right. Any kind of update/delete and create is a subset of update.
> The lookup is not included (yet, or maybe ever) since it doesn't
> have deterministic start/end points.
> The prog can do a lookup and update values in place while
> holding on to the element until prog execution ends.
>
> Whereas update/delete have precise points in hash/lru/lpm maps.
> Array is a different story.
>

I didn't follow why this wouldn't work the same way for arrays?

One interesting concept I see coming out of this is emulating
netlink-like event generation towards user space, i.e. a user
space app listening to changes to a map.

>>
>> The challenge we have in this case is that LRU makes the decision
>> about which entry to victimize.
>> We do have some entries we want to keep longer, even if they
>> are not seeing a lot of activity.
>
> Right. That's certainly an argument to make LRU eviction
> logic programmable.
> John/Joe/Daniel proposed it as a concept long ago.
> Design ideas are in demand to make further progress here :)
>

I would like to hear what the proposed ideas are.
I see this as a tricky problem to solve: you can make LRU
programmable to allow the variety of LRU replacement algorithms out
there, but that is not all-encompassing for custom or other types of
algorithms. The problem remains that LRU is very specific to evicting
entries that are least recently used. I can imagine that if I wanted
to do LIFO aging, for example, it could be done with some acrobatics
as an overlay on top of LRU with all sorts of tweaking.
It is sort of fitting a square peg into a round hole: you can
do it, but why the torture when you have a flexible architecture?
We need to provide the mechanisms (I don't see a disagreement on the
need for timers, at least).

>> You could just notify user space to re-add the entry, but then
>> you have sync challenges.
>> The timers do provide us a way to implement custom GC.
>
> My point is that time is always going to be a heuristic that will
> break under certain traffic conditions.
> I recommend to focus development effort on creating
> building blocks that are truly great instead of reimplementing
> old ideas in bpf with all of their shortcomings.
>

There are some basic mechanisms I don't think we can avoid.
Agreed on the general sentiment of what you are saying.

>> So a question (which may have already been discussed),
>> assuming the following setup:
>> - 2 programs: a) ingress b) egress
>> - sharing a conntrack map, and said map is pinned.
>> - a timer prog (with a map with just timers;
>> even a single timer would be enough in some cases).
>>
>> ingress and egress do std stuff like create/update;
>> timer prog does the deletes.
>> For simplicity's sake, assume
>> we just have one timer that does a foreach and iterates over
>> all entries.
>>
>> What happens when both ingress and egress are ejected?
>
> What is 'ejected'? Like a CD? ;)

I was going to use other verbs to describe this, but they may have
sounded obscene ;->

> I think you mean 'detached' ?

Yes.

> and then, I assume, the user space doesn't hold on to the prog FD?

Right. The pinning may still exist on the maps (therefore a refcount).
Note, this may be design intent.

> The kernel can choose to do different things with the timer here.
> One option is to cancel the outstanding timers and unload
> .text where the timer callback lives.
>
> Another option is to let the timer stay armed and auto unload
> .text of bpf function when it finishes executing.
>
> If timer callback decides to re-arm itself it can continue
> executing indefinitely.
> This patch is doing the latter.
> There could be a combination of both options.
> All options have their pros/cons.

A reasonable approach is to let the policy be defined from user space.
I may want the timer to keep polling a map that is not being updated
until the next program restarts and starts updating it.
I thought Cong's approach with timerids/maps was a good way to achieve
that control.

cheers,
jamal
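The custom-GC setup discussed in this thread (ingress/egress create and refresh conntrack entries, a single timer callback does a foreach over the map and deletes stale ones, and user space decides whether the timer keeps running after the programs are detached) can be modeled in a few lines. What follows is a minimal plain-C userspace sketch, assuming a fixed-size table and a caller-supplied clock. It is not BPF code and not the bpf_timer API from the RFC patch. The names `ct_table`, `ct_update`, `ct_gc_timer_cb`, `detach_policy`, the per-entry `keep` flag and the 30 s idle timeout are all hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: not BPF code, not the bpf_timer API. */
#define CT_SLOTS 8
#define IDLE_TIMEOUT_NS 30000000000ULL /* hypothetical 30 s idle timeout */

/* Toy conntrack entry; 'keep' marks flows that must survive even when idle. */
struct ct_entry {
	bool in_use;
	uint64_t last_seen_ns;
	bool keep;
};

/* Hypothetical policy, chosen from user space, for a detached datapath. */
enum detach_policy { CANCEL_ON_DETACH, STAY_ARMED };

struct ct_table {
	struct ct_entry slot[CT_SLOTS];
	bool progs_attached;       /* are ingress/egress still attached? */
	enum detach_policy policy; /* set from user space */
};

/* Datapath side (ingress/egress): create or refresh an entry. */
static void ct_update(struct ct_table *t, int idx, uint64_t now_ns, bool keep)
{
	assert(idx >= 0 && idx < CT_SLOTS);
	t->slot[idx].in_use = true;
	t->slot[idx].last_seen_ns = now_ns;
	t->slot[idx].keep = keep;
}

/*
 * Timer callback: walk every slot (the "foreach" from the thread), delete
 * idle entries that are not marked 'keep', and store the eviction count.
 * Returns true if the timer should re-arm itself for another period.
 */
static bool ct_gc_timer_cb(struct ct_table *t, uint64_t now_ns, int *evicted)
{
	*evicted = 0;
	for (int i = 0; i < CT_SLOTS; i++) {
		struct ct_entry *e = &t->slot[i];

		if (!e->in_use || e->keep)
			continue;
		/* Guard against a clock earlier than last_seen (no underflow). */
		if (now_ns >= e->last_seen_ns &&
		    now_ns - e->last_seen_ns > IDLE_TIMEOUT_NS) {
			e->in_use = false;
			(*evicted)++;
		}
	}
	/* Keep polling while the datapath is attached, or if policy says so. */
	return t->progs_attached || t->policy == STAY_ARMED;
}
```

Unlike LRU, the eviction rule here lives entirely in the callback, so a different aging rule (for example a LIFO-style or per-flow-class rule) would be a change to `ct_gc_timer_cb` rather than an overlay on LRU. The boolean return is the "let the policy be defined from user space" idea in miniature; it is only an illustration and does not describe what the RFC patch or the kernel actually does on detach.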