From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=auZG=QG=vger.kernel.org=netdev-owner@kernel.org>
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id D0425C169C4
	for <netdev@archiver.kernel.org>; Wed, 30 Jan 2019 02:32:17 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id 8FCDB2175B
	for <netdev@archiver.kernel.org>; Wed, 30 Jan 2019 02:32:17 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="RaXaust5"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1729866AbfA3CcQ (ORCPT <rfc822;netdev@archiver.kernel.org>);
        Tue, 29 Jan 2019 21:32:16 -0500
Received: from mail-pg1-f195.google.com ([209.85.215.195]:38615 "EHLO
        mail-pg1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1727942AbfA3CcQ (ORCPT
        <rfc822;netdev@vger.kernel.org>); Tue, 29 Jan 2019 21:32:16 -0500
Received: by mail-pg1-f195.google.com with SMTP id g189so9662887pgc.5
        for <netdev@vger.kernel.org>; Tue, 29 Jan 2019 18:32:15 -0800 (PST)
Received: from ast-mbp.dhcp.thefacebook.com ([2620:10d:c090:200::5:6f31])
        by smtp.gmail.com with ESMTPSA id w10sm126356pgi.81.2019.01.29.18.32.13
        (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
        Tue, 29 Jan 2019 18:32:14 -0800 (PST)
Date:   Tue, 29 Jan 2019 18:32:13 -0800
From:   Alexei Starovoitov <alexei.starovoitov@gmail.com>
To:     Peter Zijlstra <peterz@infradead.org>
Cc:     Alexei Starovoitov <ast@kernel.org>, davem@davemloft.net,
        daniel@iogearbox.net, jakub.kicinski@netronome.com,
        netdev@vger.kernel.org, kernel-team@fb.com, mingo@redhat.com,
        will.deacon@arm.com, Paul McKenney <paulmck@linux.vnet.ibm.com>,
        jannh@google.com
Subject: Re: bpf memory model. Was: [PATCH v4 bpf-next 1/9] bpf: introduce
 bpf_spin_lock
Message-ID: <20190130023212.zs4d6hws5tsfl5uc@ast-mbp.dhcp.thefacebook.com>
References: <20190124041403.2100609-1-ast@kernel.org>
 <20190124041403.2100609-2-ast@kernel.org>
 <20190124180109.GA27771@hirez.programming.kicks-ass.net>
 <20190124235857.xyb5xx2ufr6x5mbt@ast-mbp.dhcp.thefacebook.com>
 <20190125102312.GC4500@hirez.programming.kicks-ass.net>
 <20190126001725.roqqfrpysyljqiqx@ast-mbp.dhcp.thefacebook.com>
 <20190128092408.GD28467@hirez.programming.kicks-ass.net>
 <20190128215623.6eqskzhklydhympa@ast-mbp>
 <20190129091654.GD28485@hirez.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190129091654.GD28485@hirez.programming.kicks-ass.net>
User-Agent: NeoMutt/20180223
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID: <netdev.vger.kernel.org>
X-Mailing-List: netdev@vger.kernel.org

On Tue, Jan 29, 2019 at 10:16:54AM +0100, Peter Zijlstra wrote:
> On Mon, Jan 28, 2019 at 01:56:24PM -0800, Alexei Starovoitov wrote:
> > On Mon, Jan 28, 2019 at 10:24:08AM +0100, Peter Zijlstra wrote:
> 
> > > Ah, but the loop won't be in the BPF program itself. The BPF program
> > > would only have had the BPF_SPIN_LOCK instruction; the JIT then emits
> > > code similar to queued_spin_lock()/queued_spin_unlock() (or calls to
> > > out-of-line versions of them).
> > 
> > As I said, we considered exactly that, and such an approach has a lot of downsides
> > compared with the helper approach.
> > Pretty much every time a new feature is added we evaluate whether it
> > should be a new instruction or a new helper. 99% of the time we go with a new helper.
> 
> Ah; it seems I'm confused on helper vs instruction. As in, I've no idea
> what a helper is.

A bpf helper is a normal kernel function that can be called from a bpf program.
In assembler it's a direct function call.
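
To make "direct function call" concrete, here is a minimal user-space sketch.
It is a toy model, not kernel code. The names (struct toy_vm, do_call,
helper_add, HELPER_ADD) are hypothetical, and the table lookup only stands in
for however the interpreter or JIT resolves the call target. The one real
detail it relies on is the eBPF calling convention: arguments in R1-R5, return
value in R0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: helpers take up to five u64 arguments (R1-R5) and return a u64 (R0). */
typedef uint64_t (*toy_helper_fn)(uint64_t, uint64_t, uint64_t, uint64_t, uint64_t);

/* Hypothetical helper: an ordinary C function, nothing BPF-specific about it. */
static uint64_t helper_add(uint64_t a, uint64_t b, uint64_t c, uint64_t d, uint64_t e)
{
    (void)c; (void)d; (void)e;
    return a + b;
}

/* Hypothetical helper IDs; 0 is deliberately left unused. */
enum { HELPER_ADD = 1, NR_HELPERS };

static const toy_helper_fn toy_helpers[NR_HELPERS] = {
    [HELPER_ADD] = helper_add,
};

/* Toy register file: r[0] holds the return value, r[1]..r[5] the arguments. */
struct toy_vm {
    uint64_t r[11];
};

/* What "call imm" boils down to: call the helper with R1-R5 and store the
 * result in R0. Returns -1 for an unknown ID (a real verifier would reject
 * such a program before it ever runs). */
int do_call(struct toy_vm *vm, int32_t imm)
{
    assert(vm != NULL);
    if (imm <= 0 || imm >= NR_HELPERS || toy_helpers[imm] == NULL)
        return -1;
    vm->r[0] = toy_helpers[imm](vm->r[1], vm->r[2], vm->r[3], vm->r[4], vm->r[5]);
    return 0;
}
```

With struct toy_vm vm = { .r = { [1] = 40, [2] = 2 } }, calling
do_call(&vm, HELPER_ADD) leaves 42 in vm.r[0]. A JIT can emit a native call
straight to the helper's address, so no table lookup remains in the generated
code.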

> > > There isn't anything that mandates the JIT uses the exact same locking
> > > routines the interpreter does, is there?
> > 
> > Sure. This bpf_spin_lock() helper can be optimized whichever way the kernel wants.
> > For example, the bpf_map_lookup_elem() call is _inlined_ by the verifier for certain map types.
> > JITs don't even need to do anything. It looks like a function call from the bpf prog's
> > point of view, but in JITed code it is a sequence of native instructions.
> > 
> > Say tomorrow we find out that bpf_prog->bpf_spin_lock()->queued_spin_lock()
> > takes too much time; then we can inline the fast path of queued_spin_lock
> > directly into the bpf prog and save the function call cost.
> 
> OK, so then the JIT can optimize helpers. Would it not make sense to
> have the simple test-and-set spinlock in the generic code and have the
> JITs use arch_spinlock_t where appropriate?

I think that's pretty much the same as what I have with qspinlock.
Instead of taking a risk on how JIT writers implement the bpf_spin_lock
optimization, I'm using qspinlock on architectures that are known to support it.
So instead of starting with a dumb test-and-set, there will be a faster
qspinlock from the start on x86, arm64 and a few other archs.
Those are the archs we care about the most anyway. Other archs can take
time to optimize it (if optimizations are necessary at all).
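
For comparison, the "simple test-and-set spinlock" baseline can be sketched in
user space with C11 atomics. This is only an illustration. It is neither the
kernel's arch_spinlock_t nor qspinlock, the names (tas_lock_t, tas_lock,
tas_unlock, tas_demo) are hypothetical, and the demo uses pthreads (build with
-pthread on older toolchains).

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical minimal test-and-set spinlock: one atomic flag, no fairness, no queueing. */
typedef struct {
    atomic_flag locked;
} tas_lock_t;

#define TAS_LOCK_INIT { ATOMIC_FLAG_INIT }

static void tas_lock(tas_lock_t *l)
{
    /* Acquire: critical-section accesses cannot be reordered before taking the lock. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ; /* every waiter keeps doing atomic RMWs on the same cache line */
}

static void tas_unlock(tas_lock_t *l)
{
    /* Release: critical-section writes become visible before the lock reads as free. */
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

enum { TAS_THREADS = 4, TAS_ITERS = 100000 };

static tas_lock_t demo_lock = TAS_LOCK_INIT;
static long demo_counter; /* protected by demo_lock */

static void *demo_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < TAS_ITERS; i++) {
        tas_lock(&demo_lock);
        demo_counter++;
        tas_unlock(&demo_lock);
    }
    return NULL;
}

/* Runs TAS_THREADS threads that each bump the counter TAS_ITERS times under the lock. */
long tas_demo(void)
{
    pthread_t tid[TAS_THREADS];

    demo_counter = 0;
    for (int i = 0; i < TAS_THREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, demo_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < TAS_THREADS; i++)
        pthread_join(tid[i], NULL);
    return demo_counter;
}
```

It is correct but naive. Under contention every waiter hammers the lock's
cache line and there is no FIFO order. qspinlock roughly avoids both by
queueing waiters MCS-style so that most of them spin on their own node.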

In general, hacking JITs is much harder and more error-prone than
changing the core and adding helpers. Hence we avoid touching JITs
as much as possible.
For example, the map_lookup inlining optimization is done only when the JIT
is on, and it is done purely in the generic core. See array_map_gen_lookup().
We generate bpf instructions only to feed them into JITs so they
can replace them with native asm. That is much easier to implement
correctly than if we were doing inlining in every JIT independently.
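
The lookup that array_map_gen_lookup() inlines is simple enough to write out
by hand. Below is a user-space sketch of what it computes, not the kernel code.
struct toy_array_map and both functions are hypothetical, the kernel emits bpf
instructions rather than C, and details such as index masking for unprivileged
programs are left out. The point is that the "call" collapses into a bounds
check, a multiply (or shift) and an add, which every JIT already knows how to
translate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for an array map: max_entries elements,
 * each elem_size bytes, stored back to back. */
struct toy_array_map {
    uint32_t max_entries;
    uint32_t elem_size; /* value size rounded up to 8 */
    char *value;        /* max_entries * elem_size bytes */
};

/* Out-of-line lookup: roughly what the helper call does for an array map. */
void *toy_lookup(const struct toy_array_map *map, const uint32_t *key)
{
    uint32_t index = *key;

    assert(map->elem_size % 8 == 0);
    if (index >= map->max_entries)
        return NULL;
    return map->value + (size_t)index * map->elem_size;
}

/* The same computation as straight-line steps, in the spirit of the
 * generated sequence (r0 = result, r1 = map, r2 = pointer to key). */
void *toy_lookup_inlined(const struct toy_array_map *map, const uint32_t *key)
{
    char *r1 = map->value;        /* base of the element storage */
    uint64_t r0 = *key;           /* load the 32-bit index */

    if (r0 >= map->max_entries)   /* out of bounds: result is NULL */
        return NULL;
    r0 *= map->elem_size;         /* index * elem_size (a shift when a power of 2) */
    return r1 + r0;               /* base + offset */
}
```

Both functions return the same pointer for every key. Only the second shape
is what a JIT sees after the verifier has rewritten the call.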