From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mauricio Vasquez Subject: Re: [PATCH bpf-next 1/3] bpf: add bpf queue map Date: Thu, 9 Aug 2018 18:41:54 -0500 Message-ID: <4d92f158-8fea-cfeb-09c9-0353c0ffd8d5@polito.it> References: <153356387977.6981.12236150594041620482.stgit@kernel> <153356390770.6981.4228793745105954649.stgit@kernel> <20180807144226.pmalhe3mjvc3a45y@ast-mbp.dhcp.thefacebook.com> <20180809044806.bbvzyp3sqh6orcnl@ast-mbp> <867e95e1-cb45-54c8-ce66-d3e49161d5e6@polito.it> <20180809162328.rnyxihpiim2wz5dk@ast-mbp> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Cc: Daniel Borkmann , Alexei Starovoitov , netdev@vger.kernel.org To: Alexei Starovoitov Return-path: Received: from fm1nodo5.polito.it ([130.192.180.18]:59717 "EHLO fm1nodo5.polito.it" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727391AbeHJCJ2 (ORCPT ); Thu, 9 Aug 2018 22:09:28 -0400 In-Reply-To: <20180809162328.rnyxihpiim2wz5dk@ast-mbp> Content-Language: en-US Sender: netdev-owner@vger.kernel.org List-ID: On 08/09/2018 11:23 AM, Alexei Starovoitov wrote: > On Thu, Aug 09, 2018 at 09:51:49AM -0500, Mauricio Vasquez wrote: >>> Agree that existing ops are not the right alias, but deferring to user >>> space as inline function also doesn't really seem like a good fit, imho, >>> so I'd prefer rather to have something native. (Aside from that, the >>> above inline bpf_pop() would also race between CPUs.) >> I think we should have push/pop/peek syscalls as well, having a bpf_pop() >> that is race prone would create problems. Users expected maps operations to >> be safe, so having one that is not will confuse them. > agree the races are not acceptable. > How about a mixed solution: > - introduce bpf_push/pop/peak helpers that programs will use, so > they don't need to pass useless key=NULL > - introduce map->ops->lookup_and_delete and map->ops->lookup_or_init > that prog-side helpers can use and syscall has 1-1 mapping for I think if is a fair solution. > Native lookup_or_init() helper for programs and syscall is badly missing. > Most of the bcc scripts use it and bcc has a racy workaround. > Similarly lookup_and_delete() syscall is 1-1 to pop() for stack/queue > and useful for regular hash maps. > > At the end for stack/queue map the programs will use: > int bpf_push(map, value); Also flags should be passed here. > value_or_null = bpf_pop(map); // guaranteed non-racy for multi-cpu > value_or_null = bpf_peak(map); // racy if 2+ cpus doing it Is there any reason for it to be racy? > > from syscall: > bpf_map_lookup_elem(map, NULL, &value); // returns top of stack > bpf_map_lookup_and_delete_elem(map, NULL, &value); // returns top and deletes top atomically > bpf_map_update_elem(map, NULL, &value); // pushes new value into stack atomically > > Eventually hash and other maps will implement bpf_map_lookup_and_delete() > for both bpf progs and syscall. > > The main point that prog-side api doesn't have to match 1-1 to syscall-side, > since they're different enough already. > Like lookup_or_init() is badly needed for programs, but unnecessary for syscall. > > Thoughts? > I agree with the idea, if there are not more thoughts on this I'd proceed to the implementation.