Re: [PATCH v2 bpf-next 00/13] Atomics for eBPF

From: Yonghong Song <yhs@fb.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Brendan Jackman <jackmanb@google.com>, bpf <bpf@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	KP Singh <kpsingh@chromium.org>,
	Florent Revest <revest@chromium.org>,
	open list <linux-kernel@vger.kernel.org>,
	Jann Horn <jannh@google.com>
Subject: Re: [PATCH v2 bpf-next 00/13] Atomics for eBPF
Date: Wed, 2 Dec 2020 00:03:15 -0800	[thread overview]
Message-ID: <d423d325-b861-1954-a3ea-cd7b63aa02fc@fb.com> (raw)
In-Reply-To: <31a67edd-4837-cfd3-c9fe-a6942ebd87bb@fb.com>

On 12/1/20 9:05 PM, Yonghong Song wrote:
> 
> 
> On 12/1/20 6:00 PM, Andrii Nakryiko wrote:
>> On Mon, Nov 30, 2020 at 7:51 PM Yonghong Song <yhs@fb.com> wrote:
>>>
>>>
>>>
>>> On 11/30/20 9:22 AM, Yonghong Song wrote:
>>>>
>>>>
>>>> On 11/28/20 5:40 PM, Alexei Starovoitov wrote:
>>>>> On Fri, Nov 27, 2020 at 09:53:05PM -0800, Yonghong Song wrote:
>>>>>>
>>>>>>
>>>>>> On 11/27/20 9:57 AM, Brendan Jackman wrote:
>>>>>>> Status of the patches
>>>>>>> =====================
>>>>>>>
>>>>>>> Thanks for the reviews! Differences from v1->v2 [1]:
>>>>>>>
>>>>>>> * Fixed mistakes in the netronome driver
>>>>>>>
>>>>>>> * Addd sub, add, or, xor operations
>>>>>>>
>>>>>>> * The above led to some refactors to keep things readable. (Maybe I
>>>>>>>      should have just waited until I'd implemented these before 
>>>>>>> starting
>>>>>>>      the review...)
>>>>>>>
>>>>>>> * Replaced BPF_[CMP]SET | BPF_FETCH with just BPF_[CMP]XCHG, which
>>>>>>>      include the BPF_FETCH flag
>>>>>>>
>>>>>>> * Added a bit of documentation. Suggestions welcome for more places
>>>>>>>      to dump this info...
>>>>>>>
>>>>>>> The prog_test that's added depends on Clang/LLVM features added by
>>>>>>> Yonghong in
>>>>>>> https://reviews.llvm.org/D72184 
>>>>>>>
>>>>>>> This only includes a JIT implementation for x86_64 - I don't plan to
>>>>>>> implement JIT support myself for other architectures.
>>>>>>>
>>>>>>> Operations
>>>>>>> ==========
>>>>>>>
>>>>>>> This patchset adds atomic operations to the eBPF instruction set. 
>>>>>>> The
>>>>>>> use-case that motivated this work was a trivial and efficient way to
>>>>>>> generate globally-unique cookies in BPF progs, but I think it's
>>>>>>> obvious that these features are pretty widely applicable.  The
>>>>>>> instructions that are added here can be summarised with this list of
>>>>>>> kernel operations:
>>>>>>>
>>>>>>> * atomic[64]_[fetch_]add
>>>>>>> * atomic[64]_[fetch_]sub
>>>>>>> * atomic[64]_[fetch_]and
>>>>>>> * atomic[64]_[fetch_]or
>>>>>>
>>>>>> * atomic[64]_[fetch_]xor
>>>>>>
>>>>>>> * atomic[64]_xchg
>>>>>>> * atomic[64]_cmpxchg
>>>>>>
>>>>>> Thanks. Overall looks good to me but I did not check carefully
>>>>>> on jit part as I am not an expert in x64 assembly...
>>>>>>
>>>>>> This patch also introduced atomic[64]_{sub,and,or,xor}, similar to
>>>>>> xadd. I am not sure whether it is necessary. For one thing,
>>>>>> users can just use atomic[64]_fetch_{sub,and,or,xor} to ignore
>>>>>> return value and they will achieve the same result, right?
>>>>>>   From llvm side, there is no ready-to-use gcc builtin matching
>>>>>> atomic[64]_{sub,and,or,xor} which does not have return values.
>>>>>> If we go this route, we will need to invent additional bpf
>>>>>> specific builtins.
>>>>>
>>>>> I think bpf specific builtins are overkill.
>>>>> As you said the users can use atomic_fetch_xor() and ignore
>>>>> return value. I think llvm backend should be smart enough to use
>>>>> BPF_ATOMIC | BPF_XOR insn without BPF_FETCH bit in such case.
>>>>> But if it's too cumbersome to do at the moment we skip this
>>>>> optimization for now.
>>>>
>>>> We can initially all have BPF_FETCH bit as at that point we do not
>>>> have def-use chain. Later on we can add a
>>>> machine ssa IR phase and check whether the result of, say
>>>> atomic_fetch_or(), is used or not. If not, we can change the
>>>> instruction to atomic_or.
>>>
>>> Just implemented what we discussed above in llvm:
>>>     
>>> https://reviews.llvm.org/D72184 
>>> main change:
>>>     1. atomic_fetch_sub (and later atomic_sub) is gone. llvm will
>>>        transparently transforms it to negation followed by
>>>        atomic_fetch_add or atomic_add (xadd). Kernel can remove
>>>        atomic_fetch_sub/atomic_sub insns.
>>>     2. added new instructions for atomic_{and, or, xor}.
>>>     3. for gcc builtin e.g., __sync_fetch_and_or(), if return
>>>        value is used, atomic_fetch_or will be generated. Otherwise,
>>>        atomic_or will be generated.
>>
>> Great, this means that all existing valid uses of
>> __sync_fetch_and_add() will generate BPF_XADD instructions and will
>> work on old kernels, right?
> 
> That is correct.
> 
>>
>> If that's the case, do we still need cpu=v4? The new instructions are
>> *only* going to be generated if the user uses previously unsupported
>> __sync_fetch_xxx() intrinsics. So, in effect, the user consciously
>> opts into using new BPF instructions. cpu=v4 seems like an unnecessary
>> tautology then?
> 
> This is a very good question. Essentially this boils to when users can 
> use the new functionality including meaningful return value  of 
> __sync_fetch_and_add().
>    (1). user can write a small bpf program to test the feature. If user
>         gets a failed compilation (fatal error), it won't be supported.
>         Otherwise, it is supported.
>    (2). compiler provides some way to tell user it is safe to use, e.g.,
>         -mcpu=v4, or some clang macro suggested by Brendan earlier.
> 
> I guess since kernel already did a lot of feature discovery. Option (1)
> is probably fine.

Just pushed a new llvm version (https://reviews.llvm.org/D72184) which
removed -mcpu=v4. The new instructions will be generated by default
for 64bit type. For 32bit type, alu32 is required. Currently -mcpu=v3
already has alu32 as default and kernel supporting atomic insns should
have good alu32 support too. So I decided to have skip non-alu32 32bit
mode. But if people feel strongly to support non-alu32 32bit mode atomic
instructions, I can add them in llvm... The instruction encodings are
the same for alu32/non-alu32 32bit mode so the kernel will not be
impacted.