From: Paolo Abeni <pabeni@redhat.com>
To: x86@kernel.org
Cc: Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", Al Viro, Kees Cook,
	Hannes Frederic Sowa, linux-kernel@vger.kernel.org
Subject: [PATCH] x86/uaccess: use unrolled string copy for short strings
Date: Wed, 21 Jun 2017 13:09:51 +0200
Message-Id: <63d913f28bc64bd4ea66a39a532f0b59ee015382.1498039056.git.pabeni@redhat.com>

The 'rep' prefix suffers from a significant setup cost; as a result, for
short strings, copies using unrolled loops are faster than even the
optimized 'rep'-based string copy variants.

This change updates copy_user_generic() to use the unrolled version for
small copy lengths. The threshold length for a short string - 64 bytes -
has been selected empirically as the largest value that still ensures a
measurable gain.

A micro-benchmark of __copy_from_user() with different lengths shows the
following:

string len	vanilla	patched	delta
bytes		ticks	ticks	ticks(%)

0		58	26	32(55%)
1		49	29	20(40%)
2		49	31	18(36%)
3		49	32	17(34%)
4		50	34	16(32%)
5		49	35	14(28%)
6		49	36	13(26%)
7		49	38	11(22%)
8		50	31	19(38%)
9		51	33	18(35%)
10		52	36	16(30%)
11		52	37	15(28%)
12		52	38	14(26%)
13		52	40	12(23%)
14		52	41	11(21%)
15		52	42	10(19%)
16		51	34	17(33%)
17		51	35	16(31%)
18		52	37	15(28%)
19		51	38	13(25%)
20		52	39	13(25%)
21		52	40	12(23%)
22		51	42	9(17%)
23		51	46	5(9%)
24		52	35	17(32%)
25		52	37	15(28%)
26		52	38	14(26%)
27		52	39	13(25%)
28		52	40	12(23%)
29		53	42	11(20%)
30		52	43	9(17%)
31		52	44	8(15%)
32		51	36	15(29%)
33		51	38	13(25%)
34		51	39	12(23%)
35		51	41	10(19%)
36		52	41	11(21%)
37		52	43	9(17%)
38		51	44	7(13%)
39		52	46	6(11%)
40		51	37	14(27%)
41		50	38	12(24%)
42		50	39	11(22%)
43		50	40	10(20%)
44		50	42	8(16%)
45		50	43	7(14%)
46		50	43	7(14%)
47		50	45	5(10%)
48		50	37	13(26%)
49		49	38	11(22%)
50		50	40	10(20%)
51		50	42	8(16%)
52		50	42	8(16%)
53		49	46	3(6%)
54		50	46	4(8%)
55		49	48	1(2%)
56		50	39	11(22%)
57		50	40	10(20%)
58		49	42	7(14%)
59		50	42	8(16%)
60		50	46	4(8%)
61		50	47	3(6%)
62		50	48	2(4%)
63		50	48	2(4%)
64		51	38	13(25%)

Above 64 bytes the gain fades away. Very similar values are collected for
__copy_to_user().

UDP receive performance under flood with small packets using recvfrom()
increases by ~5%.
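For reference, a userspace sketch along the following lines can make the
'rep' setup cost visible. This is not the benchmark used to collect the
table above; the helper names, the ITERS count and the RDTSC-based timing
are illustrative assumptions only (build e.g. with gcc -O2):

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <x86intrin.h>		/* __rdtsc() */

#define ITERS	100000		/* arbitrary; enough to average out noise */

/* 'rep movsb' copy: RDI = dst, RSI = src, RCX = count. */
static __attribute__((noinline))
void copy_rep_movsb(void *to, const void *from, size_t len)
{
	asm volatile("rep movsb"
		     : "+D" (to), "+S" (from), "+c" (len)
		     :
		     : "memory");
}

/*
 * Open-coded copy, 8 bytes per step plus a byte tail; a rough stand-in
 * for copy_user_generic_unrolled(), which works on larger blocks.
 */
static __attribute__((noinline))
void copy_unrolled(void *to, const void *from, size_t len)
{
	unsigned char *d = to;
	const unsigned char *s = from;

	for (; len >= 8; d += 8, s += 8, len -= 8)
		memcpy(d, s, 8);	/* lowered to a single 8-byte mov at -O2 */
	while (len--)
		*d++ = *s++;
}

static uint64_t cycles(void (*copy)(void *, const void *, size_t),
		       void *dst, const void *src, size_t len)
{
	uint64_t start = __rdtsc();

	for (int i = 0; i < ITERS; i++)
		copy(dst, src, len);
	return (__rdtsc() - start) / ITERS;
}

int main(void)
{
	static unsigned char src[64], dst[64];

	memset(src, 0x5a, sizeof(src));
	for (size_t len = 0; len <= 64; len += 8)
		printf("len %2zu: rep movsb %3llu ticks, unrolled %3llu ticks\n",
		       len,
		       (unsigned long long)cycles(copy_rep_movsb, dst, src, len),
		       (unsigned long long)cycles(copy_unrolled, dst, src, len));
	return 0;
}

On most x86-64 CPUs the 'rep movsb' column should stay roughly flat while
the open-coded copy grows with length, which is the fixed setup cost this
patch avoids for len <= 64.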
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 arch/x86/include/asm/uaccess_64.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index c5504b9..16a8871 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -28,6 +28,9 @@ copy_user_generic(void *to, const void *from, unsigned len)
 {
 	unsigned ret;

+	if (len <= 64)
+		return copy_user_generic_unrolled(to, from, len);
+
 	/*
 	 * If CPU has ERMS feature, use copy_user_enhanced_fast_string.
 	 * Otherwise, if CPU has rep_good feature, use copy_user_generic_string.
--
2.9.4
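For readers without the surrounding header at hand, a simplified C model of
the resulting dispatch is sketched below. The real copy_user_generic()
selects the large-copy routine at boot via alternatives patching (ERMS,
then rep_good, then unrolled), not via the function pointer used here, and
the model_* names are made up for illustration:

#include <string.h>

typedef unsigned long (*copy_fn)(void *to, const void *from, unsigned len);

/* Stand-in for copy_user_generic_unrolled(): open-coded copy loop. */
static unsigned long model_copy_unrolled(void *to, const void *from, unsigned len)
{
	unsigned char *d = to;
	const unsigned char *s = from;

	while (len--)
		*d++ = *s++;
	return 0;			/* 0 == nothing left uncopied */
}

/* Stand-in for the 'rep'-based ERMS/rep_good variants. */
static unsigned long model_copy_string(void *to, const void *from, unsigned len)
{
	memcpy(to, from, len);
	return 0;
}

/* Chosen once per boot by the alternatives machinery in the real code. */
static copy_fn model_copy_large = model_copy_string;

static unsigned long model_copy_user_generic(void *to, const void *from, unsigned len)
{
	/* New fast path added by this patch: short copies skip 'rep' entirely. */
	if (len <= 64)
		return model_copy_unrolled(to, from, len);

	return model_copy_large(to, from, len);
}

int main(void)
{
	char src[128] = "example payload", dst[128];

	model_copy_user_generic(dst, src, 16);	/* takes the unrolled path */
	model_copy_user_generic(dst, src, 128);	/* takes the 'rep' model */
	return 0;
}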