From: Andrii Nakryiko
Date: Thu, 23 Sep 2021 11:42:55 -0700
Subject: Re: [PATCH v3 bpf-next 1/5] bpf: Add bloom filter map implementation
To: Martin KaFai Lau
Cc: Joanne
 Koong, bpf <bpf@vger.kernel.org>, Kernel Team

On Wed, Sep 22, 2021 at 6:28 PM Martin KaFai Lau wrote:
>
> On Wed, Sep 22, 2021 at 04:07:52PM -0700, Andrii Nakryiko wrote:
> > > > Please see my RFC ([0]). I don't think there is much to
> > > > coordinate. It could be purely BPF-side code, or BPF + user-space
> > > > initialization code, depending on the need. It's a simple and
> > > > beautiful algorithm, which BPF is powerful enough to implement
> > > > easily as custom code.
> > > >
> > > > [0] https://lore.kernel.org/bpf/20210922203224.912809-1-andrii@kernel.org/T/#t
> > > In practice, the bloom filter will be populated only once by
> > > userspace.
> > >
> > > Future updates will be done by map-in-map, replacing the whole
> > > bloom filter. Maybe with more max_entries and more nr_hashes,
> > > maybe with fewer max_entries and fewer nr_hashes.
> > >
> > > Currently, the continuously running bpf prog using this bloom
> > > filter does not need to worry about any change in the newer bloom
> > > filter's configuration/setup.
> > >
> > > I wonder what that may look like for the custom bpf bloom filter
> > > in the bench prog with the map-in-map usage.
> >
> > You'd have to use BPF_MAP_TYPE_ARRAY for the map-in-map use case.
> Right, another map is needed. When user space generates a new bloom
> filter as the inner map, it is likely to have a different number of
> entries, so the map size is different.
>
> The old and new inner array maps need to at least have the same
> value_size, so a one-element array with a different value_size will
> not work.
>
> An inner array map with BPF_F_INNER_MAP can have different
> max_entries, but then there is no inline lookup code generation. It
> may not be too bad to call the lookup multiple times, considering
> array_map_lookup_elem will still be called directly, without a
> retpoline.

All true, of course, due to generic BPF limitations. In practice, I'd
decide on the maximum size of the bloom filter I'd need and use that
as the inner map definition. If I understand correctly, there is going
to be only one "active" Bloom filter map and it's generally not that
big (a few megabytes cover tons of "max_entries"), so I'd just work
with the maximum expected size. If I absolutely needed variable-sized
filters, I'd consider doing a multi-element array as you suggested,
but I'd expect lower performance, as you mentioned.

> The next part is how to learn those "const volatile __u32 bloom_*;"
> values of the new inner map. I think max_entries can be obtained from
> map_ptr->max_entries. Other vars (e.g. hash_cnt and seed) can be
> non-const globals, allowing updates, and a brief moment of
> inconsistency may be fine.

For a single-element array with fixed value_size I'd put those in the
first 8 bytes (see the fuller sketch below):

struct my_bloom {
	__u32 msk;
	__u32 seed;
	__u64 data[];
};

For a multi-element BPF_MAP_TYPE_ARRAY I'd put the mask and seed into
elem[0]. I'd expect hash_cnt to be just hard-coded, because, as I
mentioned before, it determines the probability of a false positive,
which is what end users probably care about the most and set upfront.
At least they should be coming at this from the perspective of "1% of
false positives is acceptable" rather than "hmm... 3 hash functions is
probably acceptable", no? But if not, the first two elements would be
taken.

> It all sounds doable, but all these small need-to-pay-attention
> things add up.
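To make it concrete, here is roughly what the fuller custom filter
could look like on the BPF side. This is a minimal sketch only: the
map/struct/function names, the hard-coded sizes, and the inline
FNV-1a-style hash are illustrative stand-ins, not an agreed-upon API;
a real version would more likely use jhash or the hashing helper from
the RFC.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define BLOOM_U64_CNT (1 << 15)	/* sized for the maximum expected filter */
#define NR_BITS (BLOOM_U64_CNT * 64)
#define HASH_CNT 3		/* hard-coded from the false-positive target */

struct my_bloom {
	__u32 msk;	/* nr_bits - 1, nr_bits being a power of 2 */
	__u32 seed;
	__u64 data[BLOOM_U64_CNT];
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, struct my_bloom);
} bloom SEC(".maps");

/* Illustrative seeded FNV-1a-style hash over a fixed 8-byte value. */
static __u32 hash32(__u64 val, __u32 seed)
{
	__u32 h = 2166136261u ^ seed;
	int i;

	for (i = 0; i < 8; i++) {
		h ^= (__u32)(val & 0xff);
		h *= 16777619u;
		val >>= 8;
	}
	return h;
}

/* Returns 0 if val is definitely not in the set, 1 if it may be. */
static int bloom_may_contain(__u64 val)
{
	__u32 zero = 0, i, bit;
	struct my_bloom *b;

	b = bpf_map_lookup_elem(&bloom, &zero);
	if (!b)
		return 0;

	for (i = 0; i < HASH_CNT; i++) {
		/* mixing i into the seed yields HASH_CNT "independent" hashes */
		bit = hash32(val, b->seed + i) & b->msk;
		/* compile-time clamp so the verifier can bound the index */
		bit &= NR_BITS - 1;
		if (!(b->data[bit / 64] & (1ULL << (bit & 63))))
			return 0;
	}
	return 1;
}

Userspace would populate msk, seed, and data once, and could swap in a
differently tuned filter via map-in-map as discussed above.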
Of course, there is always a tension between "make it simple and
provide a dedicated BPF helper/BPF map" and "let users implement it on
their own". I'm saying I'm not convinced that it has to be the former
in this case. A Bloom filter is just a glorified bit set, once you
have a hashing helper.

I don't think we've added BPF_MAP_TYPE_BITSET yet, though it would
probably be pretty useful in some cases, just like the Bloom filter.
Similarly, we don't have BPF_MAP_TYPE_HASHSET in addition to
BPF_MAP_TYPE_HASH. I've seen many cases where a hashmap is used as a
hashset (see the sketch below), yet we didn't add a dedicated map for
that either. I'm just curious where we draw the line on what should be
added to the kernel for BPF when there are reasonable ways to avoid it.
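For reference, the hashmap-as-hashset pattern is trivially expressible
today, which is part of the point. A minimal sketch (map and function
names made up for illustration; same includes as the sketch above):

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 10240);
	__type(key, __u64);	/* the set member */
	__type(value, __u8);	/* dummy byte; only key presence matters */
} seen SEC(".maps");

static int set_add(__u64 key)
{
	__u8 one = 1;

	/* BPF_ANY: insert or overwrite, either way the key is "in" */
	return bpf_map_update_elem(&seen, &key, &one, BPF_ANY);
}

static int set_contains(__u64 key)
{
	return bpf_map_lookup_elem(&seen, &key) ? 1 : 0;
}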