From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <57E43EF9.8000400@hpe.com>
Date: Thu, 22 Sep 2016 16:28:41 -0400
From: Waiman Long
To: Davidlohr Bueso
CC: Thomas Gleixner, Peter Zijlstra, Mike Galbraith, Ingo Molnar,
 Jonathan Corbet, Jason Low, Scott J Norton, Douglas Hatch
Subject: Re: [RFC PATCH v2 3/5] futex: Throughput-optimized (TO) futexes
References: <1474378963-15496-1-git-send-email-Waiman.Long@hpe.com>
 <1474378963-15496-4-git-send-email-Waiman.Long@hpe.com>
 <1474441172.27308.19.camel@gmail.com>
 <57E319BE.2050208@hpe.com>
 <20160922074932.GV5008@twins.programming.kicks-ass.net>
 <20160922144123.GB13358@linux-80c1.suse>
 <20160922151144.GC13358@linux-80c1.suse>
 <57E43A46.9080601@hpe.com>
In-Reply-To: <57E43A46.9080601@hpe.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 09/22/2016 04:08 PM, Waiman Long wrote:
> On 09/22/2016 11:11 AM, Davidlohr Bueso wrote:
>> On Thu, 22 Sep 2016, Thomas Gleixner wrote:
>>
>>> On Thu, 22 Sep 2016, Davidlohr Bueso wrote:
>>>> On Thu, 22 Sep 2016, Thomas Gleixner wrote:
>>>> > Also what's the reason that we can't do probabilistic spinning for
>>>> > FUTEX_WAIT and have to add yet another specialized variant of
>>>> > futexes?
>>>>
>>>> Where would this leave the respective FUTEX_WAKE? A nop? Probably
>>>> have to differentiate the fact that the queue was empty, but there
>>>> was a spinning, instead of straightforward returning 0.
>>>
>>> Sorry, but I really can't parse this answer.
>>>
>>> Can you folks please communicate with proper and coherent explanations
>>> instead of throwing a few gnawed off bones in my direction?
>>
>> I actually think that FUTEX_WAIT is the better/nicer approach. But my
>> immediate question above was how to handle the FUTEX_WAKE counter-part.
>> If we want to maintain current FIFO ordering for wakeups, now with WAIT
>> spinners this will create lock stealing scenarios (including if we even
>> guard against starvation). Or we could reduce the scope of spinners,
>> due to the restrictions, similar to the top-waiter only being able to
>> spin for rtmutexes. This of course will hurt the effectiveness of
>> spinning in FUTEX_WAIT in the first place.
>
> Actually, there can be a lot of lock stealing going on with the
> wait-wake futexes. If the critical section is short enough, many of
> the lock waiters can be waiting in the hash bucket spinlock queue and
> not sleeping yet while the futex value changes. As a result, they will
> exit the futex syscall and back to user space with EAGAIN where one of
> them may get the lock. So we can't assume that they will get the lock
> in the FIFO order anyway.

BTW, my initial attempt at the new futex was to use the same workflow
as the PI futexes, but with a mutex, which has optimistic spinning,
instead of an rt_mutex. That version can double the throughput compared
with PI futexes, but it still falls far short of what can be achieved
with a wait-wake futex. Looking at the performance figures from the
patch:

                wait-wake futex    PI futex   TO futex
                ---------------    --------   --------
max time                  3.49s      50.91s      2.65s
min time                  3.24s      50.84s      0.07s
average time              3.41s      50.90s      1.84s
sys time                7m22.4s      55.73s    2m32.9s
lock count            3,090,294   9,999,813    698,318
unlock count          3,268,896   9,999,814        134

The problem with a PI-futex-like version is that almost all the
lock/unlock operations were done in the kernel, which added overhead
and latency.
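
As a rough sanity check on the table above, the share of operations
that entered the kernel can be estimated with a few lines of Python.
This is only a sketch: the total number of lock operations in the run
is not stated here, so the PI futex counts are used as a stand-in for
the total, on the assumption that the PI futex does nearly every
operation in the kernel. The names kernel_share() and avg_time_ratio()
are made up for this illustration.

```python
# Kernel-side operation counts copied from the table above.
KERNEL_OPS = {
    "wait-wake": {"lock": 3_090_294, "unlock": 3_268_896},
    "PI": {"lock": 9_999_813, "unlock": 9_999_814},
    "TO": {"lock": 698_318, "unlock": 134},
}

# Average times in seconds, also copied from the table.
AVG_TIME_S = {"wait-wake": 3.41, "PI": 50.90, "TO": 1.84}


def kernel_share(kind: str, op: str) -> float:
    """Estimated fraction of `op` operations done in the kernel for `kind`.

    Assumption (not stated in the mail): the PI futex count approximates
    the total number of operations in the benchmark run.
    """
    return KERNEL_OPS[kind][op] / KERNEL_OPS["PI"][op]


def avg_time_ratio(kind: str, baseline: str) -> float:
    """Ratio of the baseline's average time to that of `kind`."""
    return AVG_TIME_S[baseline] / AVG_TIME_S[kind]


if __name__ == "__main__":
    for kind in KERNEL_OPS:
        lock = kernel_share(kind, "lock")
        unlock = kernel_share(kind, "unlock")
        print(f"{kind:>9}: lock {lock:.4f}  unlock {unlock:.6f}")
    print(f"PI/TO average time ratio: {avg_time_ratio('TO', 'PI'):.1f}")
```

Under that assumption, roughly 7% of the TO futex lock operations and
roughly 31% of the wait-wake futex lock operations went through the
kernel.
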
Now looking at the numbers for the TO futexes, less than 1/10 of the
lock operations were done in the kernel, and the number of unlocks done
there was insignificant. Locking was done mostly by lock stealing. This
is where most of the performance benefit comes from, not optimistic
spinning.

This is also the reason that a lock handoff mechanism is implemented:
it prevents lock starvation, which is likely to happen without one.

Cheers,
Longman