From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755636AbcE0AiG (ORCPT <rfc822;w@1wt.eu>);
	Thu, 26 May 2016 20:38:06 -0400
Received: from mail-bn1bon0095.outbound.protection.outlook.com ([157.56.111.95]:37632
	"EHLO na01-bn1-obe.outbound.protection.outlook.com"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S1755531AbcE0AiD (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Thu, 26 May 2016 20:38:03 -0400
Authentication-Results: arm.com; dkim=none (message not signed)
 header.d=none;arm.com; dmarc=none action=none header.from=caviumnetworks.com;
Date: Fri, 27 May 2016 03:37:53 +0300
From: Yury Norov <ynorov@caviumnetworks.com>
To: Catalin Marinas <catalin.marinas@arm.com>
CC: David Miller <davem@davemloft.net>, <arnd@arndb.de>,
        <linux-arm-kernel@lists.infradead.org>, <linux-kernel@vger.kernel.org>,
        <linux-doc@vger.kernel.org>, <linux-arch@vger.kernel.org>,
        <linux-s390@vger.kernel.org>, <libc-alpha@sourceware.org>,
        <schwidefsky@de.ibm.com>, <heiko.carstens@de.ibm.com>,
        <pinskia@gmail.com>, <broonie@kernel.org>, <joseph@codesourcery.com>,
        <christoph.muellner@theobroma-systems.com>,
        <bamvor.zhangjian@huawei.com>, <szabolcs.nagy@arm.com>,
        <klimov.linux@gmail.com>, <Nathan_Lynch@mentor.com>, <agraf@suse.de>,
        <Prasun.Kapoor@caviumnetworks.com>, <kilobyte@angband.pl>,
        <geert@linux-m68k.org>, <philipp.tomsich@theobroma-systems.com>
Subject: Re: [PATCH 01/23] all: syscall wrappers: add documentation
Message-ID: <20160527003753.GA14247@yury-N73SV>
References: <6293194.tGy03QJ9ME@wuerfel>
 <20160525.135039.244098606649448826.davem@davemloft.net>
 <6407614.fdv5XFSBue@wuerfel>
 <20160525.142821.1719403997976778673.davem@davemloft.net>
 <20160526204819.GA10274@yury-N73SV>
 <20160526222943.GA16729@MBP.local>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <20160526222943.GA16729@MBP.local>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, May 26, 2016 at 11:29:45PM +0100, Catalin Marinas wrote:
> On Thu, May 26, 2016 at 11:48:19PM +0300, Yury Norov wrote:
> > On Wed, May 25, 2016 at 02:28:21PM -0700, David Miller wrote:
> > > From: Arnd Bergmann <arnd@arndb.de>
> > > Date: Wed, 25 May 2016 23:01:06 +0200
> > > 
> > > > On Wednesday, May 25, 2016 1:50:39 PM CEST David Miller wrote:
> > > >> From: Arnd Bergmann <arnd@arndb.de>
> > > >> Date: Wed, 25 May 2016 22:47:33 +0200
> > > >> 
> > > >> > If we use the normal calling conventions, we could remove these overrides
> > > >> > along with the respective special-case handling in glibc. None of them
> > > >> > look particularly performance-sensitive, but I could be wrong there.
> > > >> 
> > > >> You could set the lowest bit in the system call entry pointer to indicate
> > > >> the upper-half clears should be elided.
> > > > 
> > > > Right, but that would introduce an extra conditional branch in the syscall
> > > > hotpath, and likely eliminate the gains from passing the loff_t arguments
> > > > in a single register instead of a pair.
> > > 
> > > Ok, then, how much are you really gaining from avoiding a 'shift' and
> > > an 'or' to build the full 64-bit value?  3 cycles?  Maybe 4?
> > 
> > 4 cycles in kernel and ~same cost in glibc to create a pair.
> 
> It would take a single instruction per argument in the kernel to do
> shift+or and maybe 1-2 more instructions to move the remaining arguments
> in place (we do this for a few wrappers in arch/arm64/kernel/entry32.S).
> And the glibc counterpart.
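
For reference, a minimal C sketch of the shift+or being discussed here. The
name join_halves() is hypothetical (it is not a kernel or glibc function), and
which register of the pair carries the low half is an assumption made only for
illustration:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the kernel: rebuild one 64-bit
 * value (e.g. an loff_t) from the two 32-bit halves that a 32-bit
 * caller passes in a register pair.
 */
static inline uint64_t join_halves(uint32_t lo, uint32_t hi)
{
	/* Widen before shifting so the high half is not shifted out. */
	return ((uint64_t)hi << 32) | lo;
}
```

Calling join_halves(0x89abcdef, 0x01234567) yields 0x0123456789abcdef. The
glibc side would do the reverse split before the syscall.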
> 
> > And 8 'mov's that exist for every syscall, even yield().
> > 
> > > And then executing the wrappers, those have a non-trivial cost too.
> > 
> > The cost is pretty trivial though. See kernel/compat_wrapper.o:
> > COMPAT_SYSCALL_WRAP2(creat, const char __user *, pathname, umode_t, mode);
> > 0:   a9bf7bfd        stp     x29, x30, [sp,#-16]!
> > 4:   910003fd        mov     x29, sp
> > 8:   2a0003e0        mov     w0, w0
> > c:   94000000        bl      0 <sys_creat>
> > 10:  a8c17bfd        ldp     x29, x30, [sp],#16
> > 14:  d65f03c0        ret
> 
> I would say the above could be more expensive than 8 movs (16 bytes to
> write, read, a branch and a ret). You can also add the I-cache locality,
> having wrappers for each syscall instead of a single place for zeroing
> the upper half (where no other wrapper is necessary).
> 
> Can we trick the compiler into doing a tail call optimisation? This
> could have simply been:
> 
> COMPAT_SYSCALL_WRAP2(creat, ...):
> 	mov	w0, w0
> 	b	<sys_creat>

What you describe was in my initial version, but Heiko insisted on keeping
all the wrappers together:
http://www.spinics.net/lists/linux-s390/msg11593.html

Search your mail archive for that discussion.
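
To make the wrapper's job concrete, here is a rough C sketch of what the
'mov w0, w0' plus call amounts to. Both function names are hypothetical
stand-ins, not the real COMPAT_SYSCALL_WRAP machinery, and the handler only
echoes its argument so the effect is visible:

```c
#include <stdint.h>

/* Hypothetical stand-in for the real syscall handler: echoes its argument. */
static uint64_t demo_handler(uint64_t arg)
{
	return arg;
}

/*
 * Hypothetical wrapper: a 32-bit caller may leave garbage in the upper
 * half of the 64-bit register, so keep only the low 32 bits (the C
 * equivalent of 'mov w0, w0') and pass the result on.
 */
static uint64_t demo_compat_wrapper(uint64_t raw_reg)
{
	/* The call is in tail position, so it is a candidate for a plain branch. */
	return demo_handler((uint32_t)raw_reg);
}
```

Whether a compiler turns that last call into a branch instead of a
call-and-return depends on its flags (GCC's -foptimize-sibling-calls, for
example), so the two-instruction form is not guaranteed.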

> 
> > > Cost wise, this seems like it all cancels out in the end, but what
> > > do I know?
> > 
> > I think you know something, and I also think Heiko and other s390 guys
> > know something as well. So I'd like to listen to their arguments here.
> > 
> > For me the sparc64 way looks reasonable only because it's really simple
> > and takes less coding. I'll try it on some branch and share the results here.
> 
> The kernel code will definitely look simpler ;). It would be good to see
> if there actually is any performance impact. Even with 16 more cycles on
> syscall entry, would they be lost in the noise? You don't need a full
> implementation, just some dummy mov x0, x0 on the entry path.
> 
> -- 
> Catalin
