From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1755711AbdGXMYQ (ORCPT ); Mon, 24 Jul 2017 08:24:16 -0400
Received: from merlin.infradead.org ([205.233.59.134]:48108 "EHLO
        merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1752626AbdGXMYI (ORCPT );
        Mon, 24 Jul 2017 08:24:08 -0400
Date: Mon, 24 Jul 2017 14:23:27 +0200
From: Peter Zijlstra
To: Michael Ellerman
Cc: Kees Cook, Andrew Morton, Ingo Molnar, Josh Poimboeuf,
        Christoph Hellwig, "Eric W. Biederman", Jann Horn, Eric Biggers,
        Elena Reshetova, Hans Liljestrand, Greg KH, Alexey Dobriyan,
        "Serge E. Hallyn", arozansk@redhat.com, Davidlohr Bueso,
        Manfred Spraul, "axboe@kernel.dk", James Bottomley,
        "x86@kernel.org", Arnd Bergmann, "David S. Miller", Rik van Riel,
        LKML, linux-arch, "kernel-hardening@lists.openwall.com"
Subject: Re: [PATCH v6 0/2] x86: Implement fast refcount overflow protection
Message-ID: <20170724122327.z6p4w5yvirnbuvfd@hirez.programming.kicks-ass.net>
References: <1500422614-94821-1-git-send-email-keescook@chromium.org>
        <20170720091106.kigtr6zy7pjgk2s6@gmail.com>
        <20170721142255.586224f0db9cf0714e654859@linux-foundation.org>
        <87zibujq5t.fsf@concordia.ellerman.id.au>
        <20170724084455.o5asa55fckgyjri2@hirez.programming.kicks-ass.net>
        <87k22yjatf.fsf@concordia.ellerman.id.au>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <87k22yjatf.fsf@concordia.ellerman.id.au>
User-Agent: NeoMutt/20170609 (1.8.3)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jul 24, 2017 at 10:09:32PM +1000, Michael Ellerman wrote:
> Peter Zijlstra writes:

> > anyway, and the fact that your LL/SC is horrendously slow in any case.
>
> Boo :/

:-)

> Just kidding. I suspect you're right that we can probably pack a
> reasonable amount of tests in the body of the LL/SC and not notice.
>
> > Also, I still haven't seen an actual benchmark where our cmpxchg loop
> > actually regresses anything, just a lot of yelling about potential
> > regressions :/
>
> Heh yeah. Though I have looked at the code it generates on PPC and it's
> not sleek, though I guess that's not a benchmark is it :)

Oh for sure, GCC still can't sanely convert a cmpxchg loop (esp. if the
cmpxchg is implemented using asm) into a native LL/SC sequence, so the
generic code will end up looking pretty horrendous. A native
implementation of the same semantics should look loads better.

One thing that might help you is that refcount_dec_and_test() is weaker
than atomic_dec_and_test() wrt ordering, so that might help some
(RELEASE vs fully ordered).
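
To make the cmpxchg-loop and ordering points concrete, here is a rough
userspace sketch using C11 atomics. It is not the kernel code: the
sketch_* names and the sketch_refcount type are hypothetical, saturating
at UINT_MAX is an assumption about the generic semantics, and
memory_order_seq_cst is only an approximation of the kernel's "fully
ordered" atomics.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for refcount_t; not the kernel type. */
typedef struct {
	atomic_uint refs;
} sketch_refcount;

/*
 * Take a reference unless the count is already 0. The checks sit inside
 * the compare-and-swap loop, which is the shape a compiler struggles to
 * turn into a single native LL/SC sequence.
 */
static bool sketch_inc_not_zero(sketch_refcount *r)
{
	unsigned int old = atomic_load_explicit(&r->refs, memory_order_relaxed);

	do {
		if (old == 0)
			return false;	/* object is already on its way out */
		if (old == UINT_MAX)
			return true;	/* saturated: leave the count pinned */
	} while (!atomic_compare_exchange_weak_explicit(&r->refs, &old, old + 1,
							memory_order_relaxed,
							memory_order_relaxed));
	return true;
}

/*
 * Drop a reference with RELEASE ordering only. Returns true when this
 * call took the count from 1 to 0.
 */
static bool sketch_dec_and_test(sketch_refcount *r)
{
	unsigned int old = atomic_load_explicit(&r->refs, memory_order_relaxed);

	do {
		if (old == UINT_MAX)
			return false;	/* saturated: never drops to 0 */
		if (old == 0)
			return false;	/* underflow; real code would warn here */
	} while (!atomic_compare_exchange_weak_explicit(&r->refs, &old, old - 1,
							memory_order_release,
							memory_order_relaxed));
	/* On success 'old' still holds the value that was replaced. */
	return old == 1;
}

/* Plain atomic counterpart, with the strongest C11 ordering. */
static bool sketch_atomic_dec_and_test(atomic_uint *v)
{
	return atomic_fetch_sub_explicit(v, 1, memory_order_seq_cst) == 1;
}
```

On an LL/SC machine the weak compare-exchange may fail spuriously, hence
the loop. The only difference between the two dec_and_test variants that
matters for the point above is the memory order passed on success:
release for the refcount version, sequentially consistent for the atomic
one.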