From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BFABFC433ED for ; Fri, 2 Apr 2021 23:45:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8173861179 for ; Fri, 2 Apr 2021 23:45:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234821AbhDBXpG (ORCPT ); Fri, 2 Apr 2021 19:45:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53516 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234161AbhDBXpG (ORCPT ); Fri, 2 Apr 2021 19:45:06 -0400 Received: from mail-pl1-x632.google.com (mail-pl1-x632.google.com [IPv6:2607:f8b0:4864:20::632]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 72F1EC0613E6; Fri, 2 Apr 2021 16:45:04 -0700 (PDT) Received: by mail-pl1-x632.google.com with SMTP id w11so3113246ply.6; Fri, 02 Apr 2021 16:45:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=+foaeO5OfbBiZpUT5n3bA81rc2IF+4U2NHdudGSQ/fQ=; b=W0iGzU/sa0O2IzTT0QKB0ljymjlr1O/f54n3daiUgE9cNJBUJXR0zQY74Y5Zzr4XmI E3P2jj1twZdSL7rX5E4L4+rNkLwdeE6YN4a/JxT7aOombbWiv2pWCLvsDhHd62mK1d6T XnJtdTLgv8e0PzjITQDHSqVWjNQ/3RT/LwiSGWV/0G2u1t+LZg7ZVaVzDphLMhLwZoKH 28wKapMAP9ubKHQ/lwBHM5D6ZJp0Qjqn048RIQZd6tp67JzVqKSMD5AMo273e4UuXcdT lL/WuYT9ixlkkei+xFRVuwkJsOjbAAoUFfw5JF9SLJOWemcDT5RbADn5Z/HxiOQywFaU HtTw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=+foaeO5OfbBiZpUT5n3bA81rc2IF+4U2NHdudGSQ/fQ=; b=JDkLlNXfkCuDHtKD7U33NecuN8Y7kjVBVfvS2AcXiGgB5UQ4TqaN0CsM73Ny2iPpMw O4623/3vbsL/dDcRwPnfui1GPBN9Jb8ZyPEOjKcoY6+w33IvPPUPWEzX59KIQFU4bzZv gZAIBRyHfu0AN4iH7vYUHQ8fbe4kKhC4PdFSscJKET4JoSko20xtJgg0N1cmwRRWnDuY 5oujKxAQCISBHjeTqTbNa4/g9JNkV/U9KpHEsVt6Gf8DfxIhY8/bW+4CqIQT+D0Pb6Fh WaqIGcvz3s776SntsqyvGx4bnL8iIM/ucSL2IQWH99lRekHXHVXn9Jy9Oc2IQfyJkd/k 8qHQ== X-Gm-Message-State: AOAM530aOAogwd4ERl23HHqu3Mkfz35aU/Kugd/Eh43yCjSLTn0kw0X0 2pRM7yUOw2ZMzsp83hKYmwU= X-Google-Smtp-Source: ABdhPJyNRBc3YLuMneuAvdM1Vbhm54FYK9b2yBg0/hxCbEd67mJFtplZLiZqgH6sYq/u9/XLn4efBw== X-Received: by 2002:a17:90b:1651:: with SMTP id il17mr16135021pjb.16.1617407103899; Fri, 02 Apr 2021 16:45:03 -0700 (PDT) Received: from ast-mbp ([2620:10d:c090:400::5:f671]) by smtp.gmail.com with ESMTPSA id w17sm8180430pfu.29.2021.04.02.16.45.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 02 Apr 2021 16:45:03 -0700 (PDT) Date: Fri, 2 Apr 2021 16:45:00 -0700 From: Alexei Starovoitov To: Cong Wang Cc: Linux Kernel Network Developers , bpf , duanxiongchun@bytedance.com, Dongdong Wang , Muchun Song , Cong Wang , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song Subject: Re: [RFC Patch bpf-next] bpf: introduce bpf timer Message-ID: <20210402234500.by3wigegeluy5w7j@ast-mbp> References: <20210401042635.19768-1-xiyou.wangcong@gmail.com> <20210402192823.bqwgipmky3xsucs5@ast-mbp> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org On Fri, Apr 02, 2021 at 02:24:51PM -0700, Cong Wang wrote: > > > where the key is the timer ID and the value is the timer expire > > > timer. > > > > The timer ID is unnecessary. We cannot introduce new IDR for every new > > bpf object. It doesn't scale. > > The IDR is per map, not per timer. Per-map is not acceptable. One IDR for all maps with timers is not acceptable either. We have 3 IDRs now: for progs, for maps, and for links. No other objects need IDRs. > > Here is how more general timers might look like: > > https://lore.kernel.org/bpf/20210310011905.ozz4xahpkqbfkkvd@ast-mbp.dhcp.thefacebook.com/ > > > > include/uapi/linux/bpf.h: > > struct bpf_timer { > > u64 opaque; > > }; > > The 'opaque' field contains a pointer to dynamically allocated struct timer_list and other data. > > This is my initial design as we already discussed, it does not work, > please see below. It does work. The perceived "issue" you referred to is a misunderstanding. See below. > > > > The prog would do: > > struct map_elem { > > int stuff; > > struct bpf_timer timer; > > }; > > > > struct { > > __uint(type, BPF_MAP_TYPE_HASH); > > __uint(max_entries, 1); > > __type(key, int); > > __type(value, struct map_elem); > > } hmap SEC(".maps"); > > > > static int timer_cb(struct map_elem *elem) > > { > > if (whatever && elem->stuff) > > bpf_timer_mod(&elem->timer, new_expire); > > } > > > > int bpf_timer_test(...) > > { > > struct map_elem *val; > > > > val = bpf_map_lookup_elem(&hmap, &key); > > if (val) { > > bpf_timer_init(&val->timer, timer_cb, flags); > > val->stuff = 123; > > bpf_timer_mod(&val->timer, expires); > > } > > } > > > > bpf_map_update_elem() either from bpf prog or from user space > > allocates map element and zeros 8 byte space for the timer pointer. > > bpf_timer_init() allocates timer_list and stores it into opaque if opaque == 0. > > The validation of timer_cb() is done by the verifier. > > bpf_map_delete_elem() either from bpf prog or from user space > > does del_timer() if elem->opaque != 0. > > If prog refers such hmap as above during prog free the kernel does > > for_each_map_elem {if (elem->opaque) del_timer().} > > I think that is the simplest way of prevent timers firing past the prog life time. > > There could be other ways to solve it (like prog_array and ref/uref). > > > > Pseudo code: > > int bpf_timer_init(struct bpf_timer *timer, void *timer_cb, int flags) > > { > > if (timer->opaque) > > return -EBUSY; > > t = alloc timer_list > > t->cb = timer_cb; > > t->.. > > timer->opaque = (long)t; > > } > > > > int bpf_timer_mod(struct bpf_timer *timer, u64 expires) > > { > > if (!time->opaque) > > return -EINVAL; > > t = (struct timer_list *)timer->opaque; > > mod_timer(t,..); > > } > > > > int bpf_timer_del(struct bpf_timer *timer) > > { > > if (!time->opaque) > > return -EINVAL; > > t = (struct timer_list *)timer->opaque; > > del_timer(t); > > } > > > > The verifier would need to check that 8 bytes occupied by bpf_timer and not accessed > > via load/store by the program. The same way it does it for bpf_spin_lock. > > This does not work, because bpf_timer_del() has to be matched > with bpf_timer_init(), otherwise we would leak timer resources. > For example: > > SEC("foo") > bad_ebpf_code() > { > struct bpf_timer t; > bpf_timer_init(&t, ...); // allocate a timer > bpf_timer_mod(&t, ..); > // end of BPF program > // now the timer is leaked, no one will delete it > } > > We can not enforce the matching in the verifier, because users would > have to call bpf_timer_del() before exiting, which is not what we want > either. ``` bad_ebpf_code() { struct bpf_timer t; ``` is not at all what was proposed. This kind of code will be rejected by the verifier. 'struct bpf_timer' has to be part of the map element and the verifier will enforce that just like it does so for bpf_spin_lock. Try writing the following program: ``` bad_ebpf_code() { struct bpf_spin_lock t; bpf_spin_lock(&t); } `` and then follow the code to see why the verifier rejects it. The implementation of what I'm proposing is straightforward. I certainly understand that it might look intimidating and "impossible", but it's really quite simple.