From: Chris Wilson
To: linux-kernel@vger.kernel.org
Cc: x86@kernel.org, intel-gfx@lists.freedesktop.org
Subject: x86: Static optimisations for copy_user
Date: Thu, 1 Jun 2017 07:58:40 +0100
Message-Id: <20170601065843.2392-1-chris@chris-wilson.co.uk>

I was looking at the overhead of drmIoctl() in a microbenchmark that
repeatedly did a copy_from_user(.size=8) followed by a
copy_to_user(.size=8) as part of DRM_IOCTL_I915_GEM_BUSY. I found
that if I force-inlined get_user/put_user instead, the walltime of
the ioctl improved by about 20%. If copy_user_generic_unrolled was
used instead of copy_user_enhanced_fast_string, performance of the
microbenchmark improved by 10%.
Benchmarking on a few machines:

(Broadwell)
 benchmark_copy_user(hot):
       size   unrolled     string fast-string
          1        158         77          79
          2        306        154         158
          4        614        308         317
          6        926        462         476
          8       1344        298         635
         12       1773        482         952
         16       2797        602        1269
         24       4020        903        1906
         32       5055       1204        2540
         48       6150       1806        3810
         64       9564       2409        5082
         96      13583       3612        6483
        128      18108       4815        8434

(Broxton)
 benchmark_copy_user(hot):
       size   unrolled     string fast-string
          1        270         52          53
          2        364        106         109
          4        460        213         218
          6        486        305         312
          8       1250        253         437
         12       1009        332         625
         16       2059        514         897
         24       2624        672        1071
         32       3043       1014        1750
         48       3620       1499        2561
         64       7777       1971        3333
         96       7499       2876        4772
        128       9999       3733        6088

which says that in this cache-hot benchmark the rep mov microcode
noticeably underperforms, though once we pass a few cachelines, and
definitely after exceeding the L1 cache, rep mov is the clear winner.
From cold, there is no difference in timings.

I can improve the microbenchmark either by force-inlining the
raw_copy_*_user switches, or by switching to
copy_user_generic_unrolled. Both leave a sour taste. The switch is too
big to be inlined, and if called out-of-line the function call
overhead negates its benefits. Switching between fast-string and
unrolled makes a presumption on behaviour.

In the end, I limited this series to just adding a few extra
translations for statically known copy_*_user() sizes.
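For readers unfamiliar with the idea, here is a rough userspace sketch of
what a "translation for a statically known size" can look like. It is not
the code from this series: copy_sketch(), copy_generic_sketch() and struct
busy_args_sketch are hypothetical names, memcpy() stands in for the real
user-access primitives, fault handling is omitted entirely, and the set of
sizes in the switch is chosen only for illustration. The point is that when
the size is a compile-time constant the compiler can reduce the copy to one
or two moves, and otherwise it falls back to the out-of-line routine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the out-of-line generic copy routine.
 * Mirrors the copy_*_user() convention of returning the number of
 * bytes NOT copied (always 0 here, since nothing can fault).
 */
static unsigned long copy_generic_sketch(void *dst, const void *src,
					 unsigned long n)
{
	memcpy(dst, src, n);
	return 0;
}

/*
 * Sketch of static size dispatch. After inlining, a constant n lets
 * the compiler keep only the matching case, and a fixed-size
 * __builtin_memcpy() typically becomes a single load/store pair.
 * Sizes listed here are illustrative assumptions only.
 */
static inline __attribute__((always_inline)) unsigned long
copy_sketch(void *dst, const void *src, unsigned long n)
{
	if (__builtin_constant_p(n)) {
		switch (n) {
		case 1: __builtin_memcpy(dst, src, 1); return 0;
		case 2: __builtin_memcpy(dst, src, 2); return 0;
		case 4: __builtin_memcpy(dst, src, 4); return 0;
		case 8: __builtin_memcpy(dst, src, 8); return 0;
		}
	}
	/* Size unknown at compile time, or not special-cased. */
	return copy_generic_sketch(dst, src, n);
}

/* Hypothetical 8-byte ioctl argument, standing in for a real one. */
struct busy_args_sketch {
	uint32_t handle;
	uint32_t busy;
};

/* Mimics the copy-in, modify, copy-out pattern of a small ioctl. */
static int copy_sketch_selftest(void)
{
	struct busy_args_sketch user = { .handle = 7, .busy = 0 };
	struct busy_args_sketch kernel;

	if (copy_sketch(&kernel, &user, sizeof(kernel)))	/* copy in */
		return -1;
	kernel.busy = 1;
	if (copy_sketch(&user, &kernel, sizeof(user)))		/* copy out */
		return -1;

	return (user.handle == 7 && user.busy == 1) ? 0 : -1;
}
```

Both calls in copy_sketch_selftest() pass sizeof() of an 8-byte struct, so
with optimisation enabled they should resolve to the case 8 branch at
compile time. Without optimisation, __builtin_constant_p() may evaluate to
0 and the generic path is taken instead, with the same result.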
-Chris