Date: Mon, 28 Jan 2019 13:37:12 -0800
From: Alexei Starovoitov
To: Peter Zijlstra
Cc: Alexei Starovoitov, davem@davemloft.net, daniel@iogearbox.net,
 jakub.kicinski@netronome.com, netdev@vger.kernel.org, kernel-team@fb.com,
 mingo@redhat.com, will.deacon@arm.com, Paul McKenney, jannh@google.com
Subject: Re: [PATCH v4 bpf-next 1/9] bpf: introduce bpf_spin_lock
Message-ID: <20190128213710.vjxnc2eq5rsisgfx@ast-mbp>
References: <20190124041403.2100609-1-ast@kernel.org>
 <20190124041403.2100609-2-ast@kernel.org>
 <20190124180109.GA27771@hirez.programming.kicks-ass.net>
 <20190124235857.xyb5xx2ufr6x5mbt@ast-mbp.dhcp.thefacebook.com>
 <20190125091057.GK17749@hirez.programming.kicks-ass.net>
 <20190125234241.soomtkrgp2i7m7ul@ast-mbp.dhcp.thefacebook.com>
 <20190128084310.GC28467@hirez.programming.kicks-ass.net>
In-Reply-To: <20190128084310.GC28467@hirez.programming.kicks-ass.net>
User-Agent: NeoMutt/20180223
X-Mailing-List: netdev@vger.kernel.org

On Mon, Jan 28, 2019 at 09:43:10AM +0100, Peter Zijlstra wrote:
> On Fri, Jan 25, 2019 at 03:42:43PM -0800, Alexei Starovoitov wrote:
> > On Fri, Jan 25, 2019 at 10:10:57AM +0100, Peter Zijlstra wrote:
>
> > > What about the progs that run from SoftIRQ ? Since that bpf_prog_active
> > > thing isn't inside BPF_PROG_RUN() what is to stop say:
> > >
> > >   reuseport_select_sock()
> > >     ...
> > >       BPF_PROG_RUN()
> > >         bpf_spin_lock()
> > >
> > >         ...
> > >         BPF_PROG_RUN()
> > >           bpf_spin_lock() // forever more
> > >
> > >
> > >
> > > Unless you stick that bpf_prog_active stuff inside BPF_PROG_RUN itself,
> > > I don't see how you can fundamentally avoid this happening (now or in
> > > the future).
>
> > But your issue above is valid.
>
> > We don't use bpf_prog_active for networking progs, since we allow
> > for one level of nesting due to the classic SKF_AD_PAY_OFFSET legacy.
> > Also we allow tracing progs to nest with networking progs.
> > People using this actively.
> > Typically it's not an issue, since in networking there is no
> > arbitrary nesting (unlike kprobe/nmi in tracing),
> > but for bpf_spin_lock it can be, since the same map can be shared
> > by networking and tracing progs and above deadlock would be possible:
> > (first BPF_PROG_RUN will be from networking prog, then kprobe+bpf's
> > BPF_PROG_RUN accessing the same map with bpf_spin_lock)
> >
> > So for now I'm going to allow bpf_spin_lock in networking progs only,
> > since there is no arbitrary nesting there.
>
> Isn't that still broken? AFAIU networking progs can happen in task
> context (TX) and SoftIRQ context (RX), which can nest.

Sure. sendmsg side of networking can be interrupted by napi receive.
Both can have bpf progs attached at different points, but napi won't run
while a bpf prog is running, because the bpf prog disables preemption.
Moreover, the whole networking stack can be recursive, and there is the
xmit_recursion counter to check for bad cases.
When bpf progs interact with networking they don't add to that recursion.
All of the *redirect*() helpers do so outside of the bpf preempt-disabled
context.

Also, there is no nesting of the same networking prog type:
xdp/tc/lwt/cgroup bpf progs cannot be called recursively by design.
There are no arbitrary entry points, unlike kprobe/tracepoint.
The only nesting is when a socket filter _classic_ bpf prog uses the
legacy SKF_AD_PAY_OFFSET. That calls the flow dissector, which may call
a flow dissector bpf prog.
Classic bpf doesn't use bpf maps, so there are no deadlock issues.

> > And once we figure out the safety concerns for kprobe/tracepoint progs
> > we can enable bpf_spin_lock there too.
> > NMI bpf progs will never have bpf_spin_lock.
>
> kprobe is like NMI, since it pokes an INT3 instruction which can trigger
> in the middle of IRQ-disabled or even in NMIs. Similar arguments can be
> made for tracepoints, they can happen 'anywhere'.

Exactly. That's why there is bpf_prog_active to protect the kernel in
general for tracing bpf progs.
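
To make the nesting argument concrete, here is a rough userspace model
of the two cases discussed in this thread: a prog that holds a
non-recursive lock and is interrupted on the same CPU by a second prog
taking the same lock, with and without a per-CPU "active" counter in
the spirit of bpf_prog_active. This is only a sketch under stated
assumptions, not kernel code: model_lock, prog_active, run_outer_prog()
and run_nested_prog() are hypothetical names, the counter is a plain
int rather than a per-CPU variable, and a try-lock stands in for the
spin so that the would-be deadlock can be reported instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; none of these names are kernel APIs. */
struct model_lock {
	bool held;
};

enum outcome { RAN_OK, SKIPPED, WOULD_DEADLOCK };

/* Stands in for a per-CPU "a prog is already running here" counter. */
static int prog_active;

/* Try-lock: a real non-recursive spin lock would spin forever instead. */
static bool model_trylock(struct model_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void model_unlock(struct model_lock *l)
{
	assert(l->held);
	l->held = false;
}

/*
 * A prog that may fire at an arbitrary point (the kprobe/tracepoint
 * case). With the guard enabled it refuses to run when another prog is
 * already active on this "CPU".
 */
static enum outcome run_nested_prog(struct model_lock *l, bool guarded)
{
	enum outcome ret = RAN_OK;

	if (guarded && ++prog_active != 1) {
		ret = SKIPPED;		/* nested run suppressed */
		goto out;
	}
	if (!model_trylock(l)) {
		ret = WOULD_DEADLOCK;	/* same-CPU holder never releases */
		goto out;
	}
	model_unlock(l);
out:
	if (guarded)
		prog_active--;
	return ret;
}

/*
 * A prog that takes the lock and is "interrupted" by the nested prog
 * while still holding it. Returns what happened to the nested prog.
 */
static enum outcome run_outer_prog(struct model_lock *l, bool guarded)
{
	enum outcome nested;

	if (guarded)
		prog_active++;
	model_trylock(l);
	nested = run_nested_prog(l, guarded);
	model_unlock(l);
	if (guarded)
		prog_active--;
	return nested;
}
```

In this model the unguarded nesting hits the already-held lock, while
the guarded variant skips the nested prog so the lock is never taken
twice on the same CPU. The model says nothing about the networking-only
case, where the argument is that such nesting does not occur in the
first place.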