Subject: Re: [PATCH v2 2/2] locking/rwsem: Optimize down_read_trylock()
From: Waiman Long
To: Peter Zijlstra
Organization: Red Hat
Message-ID: <0dc32b77-9485-0800-695d-1a076923b1e6@redhat.com>
Date: Tue, 12 Feb 2019 13:38:43 -0500
Cc: linux-arch@vger.kernel.org, linux-xtensa@linux-xtensa.org,
    Davidlohr Bueso, linux-ia64@vger.kernel.org, Tim Chen, Arnd Bergmann,
    linux-sh@vger.kernel.org, linux-hexagon@vger.kernel.org, x86@kernel.org,
    Will Deacon, linux-kernel@vger.kernel.org, Linus Torvalds, Ingo Molnar,
    Borislav Petkov, "H. Peter Anvin", linux-alpha@vger.kernel.org,
    sparclinux@vger.kernel.org, Thomas Gleixner, linuxppc-dev@lists.ozlabs.org,
    Andrew Morton, linux-arm-kernel@lists.infradead.org

On 02/12/2019 01:36 PM, Waiman Long wrote:
> On 02/12/2019 08:25 AM, Peter Zijlstra wrote:
>> On Tue, Feb 12, 2019 at 02:24:04PM +0100, Peter Zijlstra wrote:
>>> On Mon, Feb 11, 2019 at 02:31:26PM -0500, Waiman Long wrote:
>>>> Modify __down_read_trylock() to make it generate slightly better code
>>>> (smaller and maybe a tiny bit faster).
>>>>
>>>> Before this patch, down_read_trylock:
>>>>
>>>>    0x0000000000000000 <+0>:     callq  0x5
>>>>    0x0000000000000005 <+5>:     jmp    0x18
>>>>    0x0000000000000007 <+7>:     lea    0x1(%rdx),%rcx
>>>>    0x000000000000000b <+11>:    mov    %rdx,%rax
>>>>    0x000000000000000e <+14>:    lock cmpxchg %rcx,(%rdi)
>>>>    0x0000000000000013 <+19>:    cmp    %rax,%rdx
>>>>    0x0000000000000016 <+22>:    je     0x23
>>>>    0x0000000000000018 <+24>:    mov    (%rdi),%rdx
>>>>    0x000000000000001b <+27>:    test   %rdx,%rdx
>>>>    0x000000000000001e <+30>:    jns    0x7
>>>>    0x0000000000000020 <+32>:    xor    %eax,%eax
>>>>    0x0000000000000022 <+34>:    retq
>>>>    0x0000000000000023 <+35>:    mov    %gs:0x0,%rax
>>>>    0x000000000000002c <+44>:    or     $0x3,%rax
>>>>    0x0000000000000030 <+48>:    mov    %rax,0x20(%rdi)
>>>>    0x0000000000000034 <+52>:    mov    $0x1,%eax
>>>>    0x0000000000000039 <+57>:    retq
>>>>
>>>> After patch, down_read_trylock:
>>>>
>>>>    0x0000000000000000 <+0>:     callq  0x5
>>>>    0x0000000000000005 <+5>:     mov    (%rdi),%rax
>>>>    0x0000000000000008 <+8>:     test   %rax,%rax
>>>>    0x000000000000000b <+11>:    js     0x2f
>>>>    0x000000000000000d <+13>:    lea    0x1(%rax),%rdx
>>>>    0x0000000000000011 <+17>:    lock cmpxchg %rdx,(%rdi)
>>>>    0x0000000000000016 <+22>:    jne    0x8
>>>>    0x0000000000000018 <+24>:    mov    %gs:0x0,%rax
>>>>    0x0000000000000021 <+33>:    or     $0x3,%rax
>>>>    0x0000000000000025 <+37>:    mov    %rax,0x20(%rdi)
>>>>    0x0000000000000029 <+41>:    mov    $0x1,%eax
>>>>    0x000000000000002e <+46>:    retq
>>>>    0x000000000000002f <+47>:    xor    %eax,%eax
>>>>    0x0000000000000031 <+49>:    retq
>>>>
>>>> By using a rwsem microbenchmark, the down_read_trylock() rate on a
>>>> x86-64 system before and after the patch were:
>>>>
>>>>                   Before Patch    After Patch
>>>>    # of Threads      rlock           rlock
>>>>    ------------      -----           -----
>>>>          1          27,787          28,259
>>>>          2           8,359           9,234
>>> From 1/2:
>>>
>>>          1     29,201  30,143  29,458      28,615  30,172  29,201
>>>          2      6,807  13,299   1,171       7,725  15,025   1,804
>> Argh, fat fingered and send before I was done typing.
>>
>> What I wanted to say was; those rlock numbers don't match up. What
>> gives?
>>
>> The before _this_ patch number of 27k787 should be the same as the after
>> first patch number of 30k172.
> The rlock number in patch 1 refers to down_read() which uses xadd. The
> number here in patch 2 refers specifically to down_read_trylock() which
> uses cmpxchg() as this patch changes only __down_read_trylock(). So the
> performance data differ.

Comparing the two sets of numbers, you can see that the performance would
be worse if down_read() used cmpxchg instead of xadd.

Cheers,
Longman
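The two trylock loop shapes, and why an xadd-based down_read() never has to retry, can be sketched in user-space C11. This is a hedged illustration, not the kernel's rwsem code: the count encoding (0 = unlocked, a positive value = number of readers, a negative value = writer-held) and the names READ_BIAS, WRITER_LOCKED, read_trylock_before(), read_trylock_after() and read_lock_xadd() are all hypothetical. The "before" loop re-reads the count after every failed compare-and-swap; the "after" loop reads it once and relies on a failed compare-exchange writing the observed value back into tmp, similar in spirit to the kernel's try_cmpxchg helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical count encoding for this sketch only (not the kernel's):
 * 0 = unlocked, >0 = number of readers, <0 = a writer holds the lock. */
#define READ_BIAS     1L
#define WRITER_LOCKED LONG_MIN

/* "Before" shape: reload the count with a separate load on every
 * iteration; the value a failed CAS observed is thrown away. */
static bool read_trylock_before(atomic_long *count)
{
    long tmp;

    while ((tmp = atomic_load_explicit(count, memory_order_relaxed)) >= 0) {
        long expected = tmp; /* scratch copy, discarded on failure */
        if (atomic_compare_exchange_strong_explicit(count, &expected,
                tmp + READ_BIAS, memory_order_acquire, memory_order_relaxed))
            return true;
    }
    return false;
}

/* "After" shape: read the count once; on failure, C11 compare-exchange
 * stores the value it observed into tmp, so no extra load is needed. */
static bool read_trylock_after(atomic_long *count)
{
    long tmp = atomic_load_explicit(count, memory_order_relaxed);

    while (tmp >= 0) {
        if (atomic_compare_exchange_weak_explicit(count, &tmp,
                tmp + READ_BIAS, memory_order_acquire, memory_order_relaxed))
            return true;
        /* tmp now holds the current count; the loop re-tests its sign. */
    }
    return false;
}

/* down_read()-style fast path: one unconditional fetch-and-add, which
 * x86-64 compilers typically emit as "lock xadd". It never retries. If a
 * writer holds the lock, the bias has still been added; a real rwsem
 * would deal with that in a slow path, this sketch only reports failure. */
static bool read_lock_xadd(atomic_long *count)
{
    return atomic_fetch_add_explicit(count, READ_BIAS,
                                     memory_order_acquire) >= 0;
}
```

Under contention either trylock variant may loop several times, while the fetch-and-add path always does exactly one atomic read-modify-write. That is consistent with, though not by itself proof of, the lower cmpxchg-based rlock numbers discussed in this thread.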