From: Michael Ellerman
To: Peter Zijlstra
Cc: Linus Torvalds, Paul McKenney, Alan Stern, andrea.parri@amarulasolutions.com,
    Will Deacon, Akira Yokosawa, Boqun Feng, Daniel Lustig, David Howells,
    Jade Alglave, Luc Maranget, Nick Piggin, Linux Kernel Mailing List
Subject: Re: [PATCH v2] tools/memory-model: Add extra ordering for locks and remove it for ordinary release/acquire
Date: Tue, 17 Jul 2018 00:40:19 +1000
Message-ID: <87601fz1kc.fsf@concordia.ellerman.id.au>
In-Reply-To: <20180713164239.GZ2494@hirez.programming.kicks-ass.net>

Peter Zijlstra writes:
> On Fri, Jul 13, 2018 at 11:15:26PM +1000, Michael Ellerman wrote:
...
>> So 18-32% slower, or 23-47 cycles.
>
> Very good info. Note that another option is to put the SYNC in lock();
> it doesn't really matter which of the two primitives gets it. I don't
> suppose it really matters for timing either way around.

If the numbers can be trusted, it is actually slower to put the sync in
lock, at least on one of the machines:

                        Time
  lwsync_sync    84,932,987,977
  sync_lwsync    93,185,930,333

On the other machine sync-in-lock is also slower, but only by 0.1%,
which is slightly weird.

The other advantage of putting the sync in unlock is that we could get
rid of our SYNC_IO logic, which conditionally puts a sync in unlock to
order IO accesses vs the unlock. (A rough sketch of the two placements
is below.)
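In case it helps, the two variants look roughly like this. It's only a
sketch of the idea, not the real arch/powerpc code (the real lock is a
larx/stcx. loop, and the real unlock carries the SYNC_IO conditional
mentioned above); the toy_* names are made up for illustration, and the
inline asm obviously only builds on powerpc:

#include <stdatomic.h>

/* PowerPC barrier instructions. */
#define PPC_LWSYNC()	__asm__ __volatile__("lwsync" ::: "memory")
#define PPC_SYNC()	__asm__ __volatile__("sync"   ::: "memory")

typedef struct { atomic_int locked; } toy_spinlock_t;

/* "lwsync_sync": cheap barrier in lock(), full sync in unlock(). */
static void toy_lock_lwsync_sync(toy_spinlock_t *l)
{
	while (atomic_exchange_explicit(&l->locked, 1, memory_order_relaxed))
		;			/* spin until we take the lock */
	PPC_LWSYNC();			/* acquire: critical section stays below */
}

static void toy_unlock_lwsync_sync(toy_spinlock_t *l)
{
	PPC_SYNC();			/* full barrier: unlock+lock becomes RCsc */
	atomic_store_explicit(&l->locked, 0, memory_order_relaxed);
}

/* "sync_lwsync": full sync in lock(), cheap barrier in unlock(). */
static void toy_lock_sync_lwsync(toy_spinlock_t *l)
{
	while (atomic_exchange_explicit(&l->locked, 1, memory_order_relaxed))
		;			/* spin until we take the lock */
	PPC_SYNC();			/* full barrier on the acquire side instead */
}

static void toy_unlock_sync_lwsync(toy_spinlock_t *l)
{
	PPC_LWSYNC();			/* release: critical section stays above */
	atomic_store_explicit(&l->locked, 0, memory_order_relaxed);
}

Either placement gives the stronger unlock+lock ordering; the numbers
above are just measuring which side eats the sync more cheaply.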
>> Next week I can do some macro benchmarks, to see if it's actually
>> detectable at all.

I guess arguably it's not a very macro benchmark, but we have a
context_switch benchmark in the tree[1] which we often use to tune
things, and it degrades badly. It just spins up two threads and has
them ping-pong using yield (roughly the idea sketched at the end of
this mail). The numbers are context switch iterations, so more == better.

          | Before     | After      | Change     | Change %
----------+------------+------------+------------+----------
          | 35,601,160 | 32,371,164 | -3,229,996 |   -9.07%
          | 35,762,126 | 32,438,798 | -3,323,328 |   -9.29%
          | 35,690,870 | 32,353,676 | -3,337,194 |   -9.35%
          | 35,440,346 | 32,336,750 | -3,103,596 |   -8.76%
          | 35,614,868 | 32,676,378 | -2,938,490 |   -8.25%
          | 35,659,690 | 32,462,624 | -3,197,066 |   -8.97%
          | 35,594,058 | 32,403,922 | -3,190,136 |   -8.96%
          | 35,682,682 | 32,353,146 | -3,329,536 |   -9.33%
          | 35,954,454 | 32,306,168 | -3,648,286 |  -10.15%
          | 35,849,314 | 32,291,094 | -3,558,220 |   -9.93%
----------+------------+------------+------------+----------
Average   | 35,684,956 | 32,399,372 | -3,285,584 |   -9.21%
Std Dev   |    143,877 |    111,385 |            |
Std Dev % |      0.40% |      0.34% |            |

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/powerpc/benchmarks/context_switch.c

I'll do some kernbench runs tomorrow and see if it shows up there.

>> My personal preference would be to switch to sync, we don't want to be
>> the only arch finding (or not finding!) exotic ordering bugs.
>>
>> But we'd also rather not make our slow locks any slower than they have
>> to be.
>
> I completely understand, but I'll get you beer (lots) if you do manage
> to make SYNC happen :-)

:-)

Just so we're clear, Fosters is not beer :)

cheers
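PS: For the curious, the guts of the yield ping-pong are essentially
the following. It's a from-memory sketch, not the actual selftest (see
the link above for the real code):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static _Atomic unsigned long iterations;

static void *yielder(void *arg)
{
	cpu_set_t cpus;

	/* Pin both threads to one CPU so every yield is a context switch. */
	CPU_ZERO(&cpus);
	CPU_SET(0, &cpus);
	pthread_setaffinity_np(pthread_self(), sizeof(cpus), &cpus);

	for (;;) {
		sched_yield();
		atomic_fetch_add_explicit(&iterations, 1,
					  memory_order_relaxed);
	}
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, yielder, NULL);
	pthread_create(&b, NULL, yielder, NULL);

	sleep(30);		/* measurement window */
	printf("%lu iterations\n",
	       atomic_load_explicit(&iterations, memory_order_relaxed));
	return 0;
}

The context switch path takes and releases the runqueue lock on every
switch, so a loop like this is about the worst case for making lock or
unlock pay for a full sync, which is why the ~9% shows up so clearly.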