Date: Thu, 24 Jan 2019 18:38:18 -0800
From: Alexei Starovoitov
To: Jann Horn
Cc: paulmck@linux.ibm.com, Peter Zijlstra, Alexei Starovoitov,
	"David S. Miller", Daniel Borkmann, jakub.kicinski@netronome.com,
	Network Development, kernel-team@fb.com, Ingo Molnar, Will Deacon
Subject: Re: [PATCH v4 bpf-next 1/9] bpf: introduce bpf_spin_lock
Message-ID: <20190125023816.zolpqls5bcsbqsga@ast-mbp.dhcp.thefacebook.com>
References: <20190124041403.2100609-1-ast@kernel.org>
	<20190124041403.2100609-2-ast@kernel.org>
	<20190124180109.GA27771@hirez.programming.kicks-ass.net>
	<20190124185652.GB17767@hirez.programming.kicks-ass.net>
	<20190124234232.GY4240@linux.ibm.com>
	<20190125000515.jizijxz4n735gclx@ast-mbp.dhcp.thefacebook.com>
	<20190125012224.GZ4240@linux.ibm.com>
X-Mailing-List: netdev@vger.kernel.org

On Fri, Jan 25, 2019 at 02:46:55AM +0100, Jann Horn wrote:
> On Fri, Jan 25, 2019 at 2:22 AM Paul E. McKenney wrote:
> > On Thu, Jan 24, 2019 at 04:05:16PM -0800, Alexei Starovoitov wrote:
> > > On Thu, Jan 24, 2019 at 03:42:32PM -0800, Paul E. McKenney wrote:
> > > > On Thu, Jan 24, 2019 at 07:56:52PM +0100, Peter Zijlstra wrote:
> > > > > On Thu, Jan 24, 2019 at 07:01:09PM +0100, Peter Zijlstra wrote:
> > > > > >
> > > > > > Thanks for having kernel/locking people on Cc...
> > > > > >
> > > > > > On Wed, Jan 23, 2019 at 08:13:55PM -0800, Alexei Starovoitov wrote:
> > > > > >
> > > > > > > Implementation details:
> > > > > > > - on !SMP bpf_spin_lock() becomes nop
> > > > > >
> > > > > > Because no BPF program is preemptible? I don't see any assertions or
> > > > > > even a comment that says this code is non-preemptible.
> > > > > >
> > > > > > AFAICT some of the BPF_RUN_PROG things are under rcu_read_lock() only,
> > > > > > which is not sufficient.
> > > > > >
> > > > > > > - on architectures that don't support queued_spin_lock trivial lock is used.
> > > > > > > Note that arch_spin_lock cannot be used, since not all archs agree that
> > > > > > > zero == unlocked and sizeof(arch_spinlock_t) != sizeof(__u32).
> > > > > >
> > > > > > I really don't much like direct usage of qspinlock; esp. not as a
> > > > > > surprise.
> > > >
> > > > Substituting the lightweight-reader SRCU as discussed earlier would allow
> > > > use of a more generic locking primitive, for example, one that allowed
> > > > blocking, at least in cases were the context allowed this.
> > > >
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
> > > > branch srcu-lr.2019.01.16a.
> > > >
> > > > One advantage of a more generic locking primitive would be keeping BPF
> > > > programs independent of internal changes to spinlock primitives.
> > >
> > > Let's keep "srcu in bpf" discussion separate from bpf_spin_lock discussion.
> > > bpf is not switching to srcu any time soon.
> > > If/when it happens it will be only for certain prog+map types
> > > like bpf syscall probes that need to be able to do copy_from_user
> > > from bpf prog.
> >
> > Hmmm... What prevents BPF programs from looping infinitely within an
> > RCU reader, and as you noted, preemption disabled?
> >
> > If BPF programs are in fact allowed to loop infinitely, it would be
> > very good for the health of the kernel to have preemption enabled.
> > And to be within an SRCU read-side critical section instead of an RCU
> > read-side critical section.
>
> The BPF verifier prevents loops; this is in push_insn() in
> kernel/bpf/verifier.c, which errors out with -EINVAL when a back edge
> is encountered. For non-root programs, that limits the maximum number
> of instructions per eBPF engine execution to
> BPF_MAXINSNS*MAX_TAIL_CALL_CNT==4096*32==131072 (but that includes
> call instructions, which can cause relatively expensive operations
> like hash table lookups).

correct.

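For anyone following along, the back-edge rejection boils down to a
depth-first walk that refuses any edge pointing at an instruction still
on the DFS stack. Below is a minimal stand-alone sketch of that idea.
It is not the code in kernel/bpf/verifier.c: struct toy_cfg, visit(),
toy_check_cfg() and MAX_INSNS are made-up names for illustration, and
the walk is recursive only to keep the example short.

```c
#include <assert.h>
#include <errno.h>

#define MAX_INSNS 16	/* hypothetical bound for this toy example */

enum { UNSEEN = 0, DISCOVERED, EXPLORED };

/*
 * Toy control-flow graph: instruction i can continue at succ[i][0]
 * and, if it is a conditional jump, also at succ[i][1].
 * A value of -1 means "no edge".
 */
struct toy_cfg {
	int n;
	int succ[MAX_INSNS][2];
};

/* An edge to a node that is still DISCOVERED (on the DFS stack) is a back edge. */
static int visit(const struct toy_cfg *g, int *state, int i)
{
	state[i] = DISCOVERED;
	for (int k = 0; k < 2; k++) {
		int t = g->succ[i][k];

		if (t < 0)
			continue;
		assert(t < g->n);
		if (state[t] == DISCOVERED)
			return -EINVAL;	/* back edge, i.e. a loop */
		if (state[t] == UNSEEN) {
			int err = visit(g, state, t);

			if (err)
				return err;
		}
	}
	state[i] = EXPLORED;
	return 0;
}

/* Returns 0 for a loop-free graph, -EINVAL if a back edge is reachable from insn 0. */
static int toy_check_cfg(const struct toy_cfg *g)
{
	int state[MAX_INSNS] = { 0 };

	assert(g->n > 0 && g->n <= MAX_INSNS);
	return visit(g, state, 0);
}
```

A straight-line or diamond-shaped graph passes; any jump back to an
instruction that is still being explored is rejected.
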
> For programs created with CAP_SYS_ADMIN,
> things get more tricky because you can create your own functions and
> call them repeatedly; I'm not sure whether the pessimal runtime there
> becomes exponential, or whether there is some check that catches this.

I think you're referring to bpf-to-bpf calls.
The limit is still the same: 4k instructions per program, including all calls.
Tail calls are not allowed when bpf-to-bpf calls are used,
so there is no 32x multiplier.

Note that classic bpf has the same 4k limit, and it can call
expensive functions too via SKF_AD extensions.
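
To spell out the arithmetic, here is a rough sketch of the two
worst-case budgets discussed above. The macro values are the ones
quoted in this thread, redefined locally so the example stands alone;
worst_case_insns() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/* Values as quoted in this thread, redefined here for a stand-alone example. */
#define BPF_MAXINSNS		4096
#define MAX_TAIL_CALL_CNT	32

/* Sanity check of the product quoted above. */
static_assert(BPF_MAXINSNS * MAX_TAIL_CALL_CNT == 131072,
	      "4096 * 32 should be 131072");

/*
 * Hypothetical helper: upper bound on instructions executed per
 * invocation, following the reasoning in this thread.
 */
static unsigned long worst_case_insns(int uses_bpf_to_bpf_calls)
{
	/* bpf-to-bpf calls exclude tail calls, so there is no multiplier. */
	if (uses_bpf_to_bpf_calls)
		return BPF_MAXINSNS;

	/* Otherwise apply the tail call multiplier to the 4k limit. */
	return (unsigned long)BPF_MAXINSNS * MAX_TAIL_CALL_CNT;
}
```

With these numbers, worst_case_insns(0) gives 131072 and
worst_case_insns(1) gives 4096.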