From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <5755E782.90800@hpe.com>
Date: Mon, 6 Jun 2016 17:13:38 -0400
From: Waiman Long
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20130109 Thunderbird/10.0.12
MIME-Version: 1.0
To: Linus Torvalds
CC: Dave Hansen, "Chen, Tim C", Ingo Molnar, Davidlohr Bueso,
 "Peter Zijlstra (Intel)", Jason Low, Michel Lespinasse,
 "Paul E. McKenney", Waiman Long, Al Viro, LKML
Subject: Re: performance delta after VFS i_mutex=>i_rwsem conversion
References: <5755D671.9070908@intel.com>
Content-Type: text/plain; charset="UTF-8"; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/06/2016 04:46 PM, Linus Torvalds wrote:
> On Mon, Jun 6, 2016 at 1:00 PM, Dave Hansen wrote:
>> I tracked this down to the differences between:
>>
>>     rwsem_spin_on_owner() - false roughly 1% of the time
>>     mutex_spin_on_owner() - false roughly 0.05% of the time
>>
>> The optimistic rwsem and mutex code look quite similar, but there is one
>> big difference: a hunk of code in rwsem_spin_on_owner() stops the
>> spinning for rwsems, but isn't present for mutexes in any form:
>>
>>>     if (READ_ONCE(sem->owner))
>>>         return true; /* new owner, continue spinning */
>>>
>>>     /*
>>>      * When the owner is not set, the lock could be free or
>>>      * held by readers. Check the counter to verify the
>>>      * state.
>>>      */
>>>     count = READ_ONCE(sem->count);
>>>     return (count == 0 || count == RWSEM_WAITING_BIAS);
>> If I hack this out, I end up with:
>>
>>     d9171b9(mutex-original): 689179
>>     9902af7(rwsem-hacked ): 671706 (-2.5%)
>>
>> I think it's safe to say that this accounts for the majority of the
>> difference in behavior.
> So my gut feel is that we do want to have the same heuristics for
> rwsems and mutexes (well, modulo possible actual semantic differences
> due to the whole shared-vs-exclusive issues).
>
> And I also suspect that the mutexes have gotten a lot more performance
> tuning done on them, so it's likely the correct thing to try to make
> the rwsem match the mutex code rather than the other way around.
>
> I think we had Jason and Davidlohr do mutex work last year, let's see
> if they agree on that "yes, the mutex case is the likely more tuned
> case" feeling.

It is probably true that the mutex code has been better tuned as it was
more widely used. Now that may be changing as a lot of mutexes have been
changed to rwsems.

> The fact that your performance improves when you do that obviously
> then also validates the assumption that the mutex spinning is the
> better optimized one.
>
>> So, as it stands today in 4.7-rc1, mutexes end up yielding higher
>> performance under contention. But, they don't let them system go very
>> idle, even under heavy contention, which seems rather wrong. Should we
>> be making rwsems spin more, or mutexes spin less?
> I think performance is what matters. The fact that it performs better
> with spinning is a big mark for spinning more.
>
> Being idle under load is _not_ something we should see as a good
> thing. Yes, yes, it would be lower power, but lock contention is *not*
> a low-power load. Being slow under lock contention just tends to make
> for more lock contention, and trying to increase idle time is almost
> certainly the wrong thing to do.
>
> Spinning behavior tends to have a secondary advantage too: it is a
> hell of a lot nicer to do performance analysis on. So if you get lock
> contention on real loads (as opposed to some extreme
> unlink-microbenchmark), I think a lot of people will be happier seeing
> the spinning behavior just because it helps pinpoint the problem in
> ways idling does not.
>
> So I think everything points to: "make rwsems do the same thing
> mutexes do". But I'll let it locking maintainers pipe up. Peter? Ingo?
>
> Linus

The tricky part about optimistic spinning in rwsem is that we don't know
for sure whether any of the lock-holding readers is running. So we don't
spin when readers have the lock. Currently, we use the state of the
owner variable as the heuristic to determine whether the lock owner is a
writer (owner) or a reader (!owner). However, it is also possible that a
writer gets the lock but hasn't set the owner field yet, and another
task samples the owner value in that interval, causing it to abort
optimistic spinning.
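
To make that window concrete, here is a minimal userspace sketch. It is
a toy model, not the kernel code: every toy_* name is made up for
illustration, the bias values are stand-ins rather than the real rwsem
definitions, and the steps are sequenced by hand in a single thread
instead of racing real tasks. The spin decision copies the hunk Dave
quoted; the writer's acquisition is split into its two visible steps
(take the count, then set the owner).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in bias values for this model only (assumed, not the kernel's). */
#define TOY_ACTIVE_BIAS        1LL
#define TOY_WAITING_BIAS       (-0x100000000LL)
#define TOY_ACTIVE_WRITE_BIAS  (TOY_WAITING_BIAS + TOY_ACTIVE_BIAS)

struct toy_rwsem {
	long long count;	/* 0 means the lock is free */
	const void *owner;	/* set by a writer after it wins the count */
};

/* Same decision as the quoted hunk: true means keep spinning. */
static bool toy_keep_spinning(const struct toy_rwsem *sem)
{
	if (sem->owner)
		return true;	/* a writer owns it, continue spinning */

	/* No owner: the lock is either free or held by readers. */
	return sem->count == 0 || sem->count == TOY_WAITING_BIAS;
}

/* Step 1 of a writer acquiring the lock: the count changes first. */
static void toy_writer_take_count(struct toy_rwsem *sem)
{
	sem->count += TOY_ACTIVE_WRITE_BIAS;
}

/* Step 2: the owner field becomes visible only afterwards. */
static void toy_writer_set_owner(struct toy_rwsem *sem, const void *task)
{
	sem->owner = task;
}

/* Returns true if a spinner sampling between the two steps gives up. */
static bool toy_spinner_aborts_in_window(void)
{
	struct toy_rwsem sem = { 0, NULL };
	int writer_task = 0;	/* its address stands in for a task pointer */
	bool in_window;

	assert(toy_keep_spinning(&sem));	/* free lock: spin */
	toy_writer_take_count(&sem);
	in_window = toy_keep_spinning(&sem);	/* sampled mid-acquire */
	toy_writer_set_owner(&sem, &writer_task);
	assert(toy_keep_spinning(&sem));	/* owner visible: spin */

	return !in_window;
}
```

In this model a spinner that samples between the two writer steps sees
no owner and a count that is neither 0 nor the waiting bias, so it takes
the lock to be reader-held and stops spinning, even though the holder is
a running writer.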

I do have a patchset that allows us to determine the state of the lock
owner more accurately:

    locking/rwsem: Add reader-owned state to the owner field
    http://www.spinics.net/lists/kernel/msg2258572.html

That should eliminate the performance gap between mutex and rwsem wrt
spinning when only writers are present. I am hoping that the patchset
can be queued for 4.8.

Cheers,
Longman