From mboxrd@z Thu Jan  1 00:00:00 1970
Date:   Wed, 30 Jan 2019 13:05:36 -0800
From:   "Paul E. McKenney" <paulmck@linux.ibm.com>
To:     Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc:     Will Deacon <will.deacon@arm.com>,
        Peter Zijlstra <peterz@infradead.org>,
        Alexei Starovoitov <ast@kernel.org>, davem@davemloft.net,
        daniel@iogearbox.net, jakub.kicinski@netronome.com,
        netdev@vger.kernel.org, kernel-team@fb.com, mingo@redhat.com,
        jannh@google.com
Subject: Re: bpf memory model. Was: [PATCH v4 bpf-next 1/9] bpf: introduce
 bpf_spin_lock
Reply-To: paulmck@linux.ibm.com
References: <20190124041403.2100609-2-ast@kernel.org>
 <20190124180109.GA27771@hirez.programming.kicks-ass.net>
 <20190124235857.xyb5xx2ufr6x5mbt@ast-mbp.dhcp.thefacebook.com>
 <20190125102312.GC4500@hirez.programming.kicks-ass.net>
 <20190126001725.roqqfrpysyljqiqx@ast-mbp.dhcp.thefacebook.com>
 <20190128092408.GD28467@hirez.programming.kicks-ass.net>
 <20190128215623.6eqskzhklydhympa@ast-mbp>
 <20190130181100.GA18558@fuggles.cambridge.arm.com>
 <20190130183618.GX4240@linux.ibm.com>
 <20190130195113.xyqre4sxasit6vpu@ast-mbp.dhcp.thefacebook.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190130195113.xyqre4sxasit6vpu@ast-mbp.dhcp.thefacebook.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Message-Id: <20190130210536.GY4240@linux.ibm.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID: <netdev.vger.kernel.org>
X-Mailing-List: netdev@vger.kernel.org

On Wed, Jan 30, 2019 at 11:51:14AM -0800, Alexei Starovoitov wrote:
> On Wed, Jan 30, 2019 at 10:36:18AM -0800, Paul E. McKenney wrote:
> > On Wed, Jan 30, 2019 at 06:11:00PM +0000, Will Deacon wrote:
> > > Hi Alexei,
> > > 
> > > On Mon, Jan 28, 2019 at 01:56:24PM -0800, Alexei Starovoitov wrote:
> > > > On Mon, Jan 28, 2019 at 10:24:08AM +0100, Peter Zijlstra wrote:
> > > > > On Fri, Jan 25, 2019 at 04:17:26PM -0800, Alexei Starovoitov wrote:
> > > > > > What I want to avoid is to define the whole execution ordering model upfront.
> > > > > > We cannot say that BPF ISA is weakly ordered like alpha.
> > > > > > Most of the bpf progs are written and running on x86. We shouldn't
> > > > > > twist bpf developer's arm by artificially relaxing memory model.
> > > > > > BPF memory model is equal to memory model of underlying architecture.
> > > > > > What we can do is to make it bpf progs a bit more portable with
> > > > > > smp_rmb instructions, but we must not force weak execution on the developer.
> > > > > 
> > > > > Well, I agree with only introducing bits you actually need, and my
> > > > > smp_rmb() example might have been poorly chosen, smp_load_acquire() /
> > > > > smp_store_release() might have been a far more useful example.
> > > > > 
> > > > > But I disagree with the last part; we have to pick a model now;
> > > > > otherwise you'll pain yourself into a corner.
> > > > > 
> > > > > Also; Alpha isn't very relevant these days; however ARM64 does seem to
> > > > > be gaining a lot of attention and that is very much a weak architecture.
> > > > > Adding strongly ordered assumptions to BPF now, will penalize them in
> > > > > the long run.
> > > > 
> > > > arm64 is gaining attention just like riscV is gaining it too.
> > > > BPF jit for arm64 is very solid, while BPF jit for riscV is being worked on.
> > > > BPF is not picking sides in CPU HW and ISA battles.
> > > 
> > > It's not about picking a side, it's about providing an abstraction of the
> > > various CPU architectures out there so that the programmer doesn't need to
> > > worry about where their program may run. Hell, even if you just said "eBPF
> > > follows x86 semantics" that would be better than saying nothing (and then we
> > > could have a discussion about whether x86 semantics are really what you
> > > want).
> > 
> > To reinforce this point, the Linux-kernel memory model (tools/memory-model)
> > is that abstraction for the Linux kernel.  Why not just use that for BPF?
> 
> I already answered this earlier in the thread.
> tldr: not going to sacrifice performance.

Understood.

But can we at least say that where there are no performance consequences,
BPF should follow LKMM?  You already mentioned smp_load_acquire()
and smp_store_release(), but the void atomics (e.g., atomic_inc())
should also work because they don't provide any ordering guarantees.
The _relaxed(), _release(), and _acquire() variants of the value-returning
atomics should be just fine as well.
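
As a rough userspace illustration of that subset, here is a sketch that
uses C11 atomics as stand-ins for the kernel primitives.  The mapping
(release store for smp_store_release(), acquire load for
smp_load_acquire(), relaxed fetch-add for atomic_inc()) is only an
approximation of LKMM, and the names payload, ready, events, publish(),
try_consume(), and count_event() are made up for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state; all names are illustrative only. */
static int payload;        /* plain data published by the producer */
static atomic_bool ready;  /* flag that carries the release/acquire ordering */
static atomic_long events; /* statistics counter that needs no ordering */

/* Producer: roughly smp_store_release(&ready, 1) after writing payload. */
static void publish(int value)
{
	payload = value;
	atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: roughly smp_load_acquire(&ready); false if nothing published. */
static bool try_consume(int *out)
{
	if (!atomic_load_explicit(&ready, memory_order_acquire))
		return false;
	*out = payload; /* ordered after the acquire load of ready */
	return true;
}

/* Roughly atomic_inc(): returns nothing and provides no ordering. */
static void count_event(void)
{
	atomic_fetch_add_explicit(&events, 1, memory_order_relaxed);
}
```

None of these three operations should require a full barrier on the
weakly ordered systems, which is why they look like the cheap part of
the model.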

The other value-returning atomics have strong ordering, which is fine
on many systems, but potentially suboptimal for the weakly ordered ones.
Though you have to have pretty good locality of reference to be able to
see the difference, because otherwise cache-miss overhead dominates.

Things like cmpxchg() don't seem to fit BPF because they are normally
used in spin loops, though there are some non-spinning use cases.
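
One non-spinning use case is a one-shot claim, sketched below with a
C11 compare-exchange standing in for cmpxchg().  The sequentially
consistent default is used only as an approximation of the kernel's
fully ordered cmpxchg(), and owner and try_claim() are hypothetical
names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical ownership word: 0 means unclaimed. */
static atomic_int owner;

/*
 * Single attempt, no retry loop: the caller either becomes the owner
 * or learns that someone else already claimed the resource.
 */
static bool try_claim(int id)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&owner, &expected, id);
}
```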

You correctly pointed out that READ_ONCE() and WRITE_ONCE() are suboptimal
on systems that don't support all sizes of loads, but I bet that there
are some sizes for which they are just fine across systems, for example,
pointer size and int size.
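
A userspace sketch of that idea follows.  The macros are simplified
approximations of the kernel's READ_ONCE() and WRITE_ONCE() and rely on
the GCC/Clang __typeof__ extension.  The lock-free checks are only a
proxy for "this size can be loaded and stored in a single access", and
counter, cursor, and advance() are made-up names.

```c
#include <stdatomic.h>

/* Simplified userspace approximations of the kernel macros. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* A value of 2 means "always lock-free" on this target. */
_Static_assert(ATOMIC_INT_LOCK_FREE == 2,
	       "int accesses are not always lock-free here");
_Static_assert(ATOMIC_POINTER_LOCK_FREE == 2,
	       "pointer accesses are not always lock-free here");

/* Hypothetical shared variables of the two "safe" sizes. */
static int counter;
static int *cursor;

/* Update both variables using single int-sized and pointer-sized accesses. */
static void advance(int *next)
{
	WRITE_ONCE(cursor, next);
	WRITE_ONCE(counter, READ_ONCE(counter) + 1);
}
```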

Does that help?  Or am I missing additional cases where performance
could be degraded?

							Thanx, Paul