From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.3 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_PASS, USER_AGENT_MUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D704BC282C3 for ; Thu, 24 Jan 2019 18:01:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 977BE2184C for ; Thu, 24 Jan 2019 18:01:40 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="kslPbxhj" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728823AbfAXSBj (ORCPT ); Thu, 24 Jan 2019 13:01:39 -0500 Received: from merlin.infradead.org ([205.233.59.134]:58512 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727573AbfAXSBj (ORCPT ); Thu, 24 Jan 2019 13:01:39 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=merlin.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=VincyZU80Mzs65HqnXMFYN2bnKkFVbqorrVxaTr5C1o=; b=kslPbxhjR2UQFBwq+Eo7jRlXe prDsK5yLe8ye5G+hr1bmxzqGf6aKt2HQKeqVPEmy38lg3vxejdEpqH5yDC29UkdMUD5BkCSrn3Xce wNPy0BOCXLaUiev5K2+PlWq+hoEJIvEn6WtMBVLdyv0xWgPUCMmO+oBv78ukpyBdGhdLtEhTq5buc hhTj8ntZWQHj9gLrjk1bZm7nOt/bvvNIq+sTLeokX4lGurT1myX9eDDHDbF5yooCBT/vfcnoK/1OF 29xJ3Am3iW8T3ctqNvcdJnFerq15rULOuwZ0Hk3UPw8nTaDyKIa4fbKqRS1q9pCketfofntNPLEzv CoT3xI5iw==; Received: from j217100.upc-j.chello.nl ([24.132.217.100] helo=hirez.programming.kicks-ass.net) by merlin.infradead.org with esmtpsa (Exim 4.90_1 #2 (Red Hat Linux)) id 1gmjJQ-0005vM-5K; Thu, 24 Jan 2019 18:01:12 +0000 Received: by hirez.programming.kicks-ass.net (Postfix, from userid 1000) id 8EF092392872B; Thu, 24 Jan 2019 19:01:09 +0100 (CET) Date: Thu, 24 Jan 2019 19:01:09 +0100 From: Peter Zijlstra To: Alexei Starovoitov Cc: davem@davemloft.net, daniel@iogearbox.net, jakub.kicinski@netronome.com, netdev@vger.kernel.org, kernel-team@fb.com, mingo@redhat.com, will.deacon@arm.com, Paul McKenney , jannh@google.com Subject: Re: [PATCH v4 bpf-next 1/9] bpf: introduce bpf_spin_lock Message-ID: <20190124180109.GA27771@hirez.programming.kicks-ass.net> References: <20190124041403.2100609-1-ast@kernel.org> <20190124041403.2100609-2-ast@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190124041403.2100609-2-ast@kernel.org> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Thanks for having kernel/locking people on Cc... On Wed, Jan 23, 2019 at 08:13:55PM -0800, Alexei Starovoitov wrote: > Implementation details: > - on !SMP bpf_spin_lock() becomes nop Because no BPF program is preemptible? I don't see any assertions or even a comment that says this code is non-preemptible. AFAICT some of the BPF_RUN_PROG things are under rcu_read_lock() only, which is not sufficient. > - on architectures that don't support queued_spin_lock trivial lock is used. > Note that arch_spin_lock cannot be used, since not all archs agree that > zero == unlocked and sizeof(arch_spinlock_t) != sizeof(__u32). I really don't much like direct usage of qspinlock; esp. not as a surprise. Why does it matter if 0 means unlocked; that's what __ARCH_SPIN_LOCK_UNLOCKED is for. I get the sizeof(__u32) thing, but why not key off of that? > Next steps: > - allow bpf_spin_lock in other map types (like cgroup local storage) > - introduce BPF_F_LOCK flag for bpf_map_update() syscall and helper > to request kernel to grab bpf_spin_lock before rewriting the value. > That will serialize access to map elements. So clearly this map stuff is shared between bpf proglets, otherwise there would not be a need for locking. But what happens if one is from task context and another from IRQ context? I don't see a local_irq_save()/restore() anywhere. What avoids the trivial lock inversion? > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > index a74972b07e74..2e98e4caf5aa 100644 > --- a/kernel/bpf/helpers.c > +++ b/kernel/bpf/helpers.c > @@ -221,6 +221,63 @@ const struct bpf_func_proto bpf_get_current_comm_proto = { > .arg2_type = ARG_CONST_SIZE, > }; > > +#ifndef CONFIG_QUEUED_SPINLOCKS > +struct dumb_spin_lock { > + atomic_t val; > +}; > +#endif > + > +notrace BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock) > +{ > +#if defined(CONFIG_SMP) > +#ifdef CONFIG_QUEUED_SPINLOCKS > + struct qspinlock *qlock = (void *)lock; > + > + BUILD_BUG_ON(sizeof(*qlock) != sizeof(*lock)); > + queued_spin_lock(qlock); > +#else > + struct dumb_spin_lock *qlock = (void *)lock; > + > + BUILD_BUG_ON(sizeof(*qlock) != sizeof(*lock)); > + do { > + while (atomic_read(&qlock->val) != 0) > + cpu_relax(); > + } while (atomic_cmpxchg(&qlock->val, 0, 1) != 0); > +#endif > +#endif > + return 0; > +} > + > +const struct bpf_func_proto bpf_spin_lock_proto = { > + .func = bpf_spin_lock, > + .gpl_only = false, > + .ret_type = RET_VOID, > + .arg1_type = ARG_PTR_TO_SPIN_LOCK, > +}; > + > +notrace BPF_CALL_1(bpf_spin_unlock, struct bpf_spin_lock *, lock) > +{ > +#if defined(CONFIG_SMP) > +#ifdef CONFIG_QUEUED_SPINLOCKS > + struct qspinlock *qlock = (void *)lock; > + > + queued_spin_unlock(qlock); > +#else > + struct dumb_spin_lock *qlock = (void *)lock; > + > + atomic_set(&qlock->val, 0); And this is broken... That should've been atomic_set_release() at the very least. And this would again be the moment where I go pester you about the BPF memory model :-) > +#endif > +#endif > + return 0; > +} > + > +const struct bpf_func_proto bpf_spin_unlock_proto = { > + .func = bpf_spin_unlock, > + .gpl_only = false, > + .ret_type = RET_VOID, > + .arg1_type = ARG_PTR_TO_SPIN_LOCK, > +};