From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mauricio Vasquez Subject: Re: [PATCH bpf-next 1/3] bpf: add bpf queue map Date: Wed, 8 Aug 2018 22:08:47 -0500 Message-ID: References: <153356387977.6981.12236150594041620482.stgit@kernel> <153356390770.6981.4228793745105954649.stgit@kernel> <20180807144226.pmalhe3mjvc3a45y@ast-mbp.dhcp.thefacebook.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Cc: Alexei Starovoitov , Daniel Borkmann , netdev@vger.kernel.org To: Alexei Starovoitov Return-path: Received: from fm2nodo5.polito.it ([130.192.180.19]:58638 "EHLO fm2nodo5.polito.it" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725779AbeHIFbr (ORCPT ); Thu, 9 Aug 2018 01:31:47 -0400 In-Reply-To: <20180807144226.pmalhe3mjvc3a45y@ast-mbp.dhcp.thefacebook.com> Content-Language: en-US Sender: netdev-owner@vger.kernel.org List-ID: On 08/07/2018 09:42 AM, Alexei Starovoitov wrote: > On Mon, Aug 06, 2018 at 03:58:30PM +0200, Mauricio Vasquez B wrote: >> Bpf queue implements a LIFO/FIFO data containers for ebpf programs. > queue/stack datastructure would be a great addition. > >> It allows to push an element to the queue by using the update operation >> and to pop an element from the queue by using the lookup operation. >> >> A use case for this is to keep track of a pool of elements, like >> network ports in a SNAT. >> >> Signed-off-by: Mauricio Vasquez B >> --- >> include/linux/bpf_types.h | 1 >> include/uapi/linux/bpf.h | 5 + >> kernel/bpf/Makefile | 2 >> kernel/bpf/queuemap.c | 287 +++++++++++++++++++++++++++++++++++++++++++++ >> kernel/bpf/syscall.c | 61 +++++++--- >> kernel/bpf/verifier.c | 16 ++- >> 6 files changed, 353 insertions(+), 19 deletions(-) >> create mode 100644 kernel/bpf/queuemap.c >> >> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h >> index c5700c2d5549..6c7a62f3fe43 100644 >> --- a/include/linux/bpf_types.h >> +++ b/include/linux/bpf_types.h >> @@ -58,3 +58,4 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_CPUMAP, cpu_map_ops) >> BPF_MAP_TYPE(BPF_MAP_TYPE_XSKMAP, xsk_map_ops) >> #endif >> #endif >> +BPF_MAP_TYPE(BPF_MAP_TYPE_QUEUE, queue_map_ops) >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h >> index 0ebaaf7f3568..2c171c40eb45 100644 >> --- a/include/uapi/linux/bpf.h >> +++ b/include/uapi/linux/bpf.h >> @@ -120,6 +120,7 @@ enum bpf_map_type { >> BPF_MAP_TYPE_CPUMAP, >> BPF_MAP_TYPE_XSKMAP, >> BPF_MAP_TYPE_SOCKHASH, >> + BPF_MAP_TYPE_QUEUE, >> }; >> >> enum bpf_prog_type { >> @@ -255,6 +256,10 @@ enum bpf_attach_type { >> /* Flag for stack_map, store build_id+offset instead of pointer */ >> #define BPF_F_STACK_BUILD_ID (1U << 5) >> >> +/* Flags for queue_map, type of queue */ >> +#define BPF_F_QUEUE_FIFO (1U << 16) >> +#define BPF_F_QUEUE_LIFO (2U << 16) > the choice of flags looks odd. > Why start at bit 16 and why waste two bits? > It's either stack or queue. > May be instead define two map_types: > BPF_MAP_TYPE_QUEUE and BPF_MAP_TYPE_STACK > that share common implementation? I like this solution, I'll spin a v2 with two different types of maps. > And how about adding three new helpers: push/pop/peek as well? > Reusing lookup/update is neat, but does lookup == pop > or does lookup == peek ? > I suspect it will be confusing. > Three new helpers cost nothing, but will make bpf progs easier to read. I agree. I have one doubt here, update/lookup/delete is implemented in all map types, if the operation is not supported it returns -EINVAL. For push/pop/peek, should we implement it in all maps or is it better to check the map type before invoking map->ops->push/pop/seek? (Maybe checking if map->ops->xxx is NULL will also work) > Could you also add a flag for replacement policy? > In this implementation when max_entries limit is reached > the map_update_elem (aka push) will fail with e2big. > It would be useful to allow pushing and dropping elements from > the other side then such queue/stack can be used to keep > track of the last N events (the prog would unconditionally push > and user space would swap the queue for new one via map-in-map > and drain old queue). I like it, will do. > Speaking of map-in-map, please add a test to make sure > queue/stack works with hash/array_of_maps. Sure. > selftests in patch2 needs to have kernel side as well. > it's not enough to test syscall access only. > Will do.