linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 1/4] selftests/powerpc: add test for 32 bits memcmp
@ 2018-06-08 10:20 Christophe Leroy
  2018-06-08 10:20 ` [PATCH v4 2/4] selftests/powerpc: Add test for strlen() Christophe Leroy
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Christophe Leroy @ 2018-06-08 10:20 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, wei.guo.simon
  Cc: linux-kernel, linuxppc-dev

This patch renames the memcmp test to memcmp_64 and adds
a memcmp_32 test for the 32-bit version of memcmp().

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 v4: new

 tools/testing/selftests/powerpc/stringloops/Makefile    |  14 +++++++++++---
 tools/testing/selftests/powerpc/stringloops/memcmp_32   | Bin 0 -> 9092 bytes
 tools/testing/selftests/powerpc/stringloops/memcmp_32.S |   1 +
 3 files changed, 12 insertions(+), 3 deletions(-)
 create mode 100755 tools/testing/selftests/powerpc/stringloops/memcmp_32
 create mode 120000 tools/testing/selftests/powerpc/stringloops/memcmp_32.S

diff --git a/tools/testing/selftests/powerpc/stringloops/Makefile b/tools/testing/selftests/powerpc/stringloops/Makefile
index 1125e489055e..1e7301d4bac9 100644
--- a/tools/testing/selftests/powerpc/stringloops/Makefile
+++ b/tools/testing/selftests/powerpc/stringloops/Makefile
@@ -1,10 +1,18 @@
 # SPDX-License-Identifier: GPL-2.0
 # The loops are all 64-bit code
-CFLAGS += -m64
 CFLAGS += -I$(CURDIR)
 
-TEST_GEN_PROGS := memcmp
-EXTRA_SOURCES := memcmp_64.S ../harness.c
+EXTRA_SOURCES := ../harness.c
+
+$(OUTPUT)/memcmp_64: memcmp.c
+$(OUTPUT)/memcmp_64: CFLAGS += -m64
+
+$(OUTPUT)/memcmp_32: memcmp.c
+$(OUTPUT)/memcmp_32: CFLAGS += -m32
+
+ASFLAGS = $(CFLAGS)
+
+TEST_GEN_PROGS := memcmp_32 memcmp_64
 
 include ../../lib.mk
 
diff --git a/tools/testing/selftests/powerpc/stringloops/memcmp_32 b/tools/testing/selftests/powerpc/stringloops/memcmp_32
new file mode 100755
index 0000000000000000000000000000000000000000..413a02181daf59b0bdd7f0e2e05b74c4c702e7ad
GIT binary patch
literal 9092
zcmcIqeQZ?MmA`MsU_(su07D&M+h^I?n%0aBm@r@x#(bK@5E_gW+H{|XnXx^~d}Zc2
z8F$CLfgx?3ZMRfTrINZpw^24(r6`s1N7aTjRg((TM9QY!6;jg3QCcP4@CQxliguIC
z{(kq}YsRl6L6!ExnS0N<=bm%!x#xa9yW9JB7y(1fq=0Bb*)YZ2Hsswxe2O_smZ%r)
zVuiRF*j!Rf1t?D|n1j3y5QK;(QvvV=3Rp*FQX$$4f)I;Vcp&C@HT+$)!ZbVp`|}V<
zyXsYa0QGUyO91_x0@_6#0HY21B>=Y+$hx@myUzi?R$_azGnG8pnTm%B+0bgm#C9~*
z*tuuFA_>?a{a_(q2xwM0GO?IHNb>=o08s8t0Lp0qECsMXOg93S0O+IZP4$Wqeirip
zHvk$HO=t8m$72zIw$t8grEhB@8IksRnb#2IWj=h&had9c>wWlFefZz|@Y6ngl@I^1
z#x1YVqH$&K9v^;#Da=PftHYqsr;q#UBR>4|K0I$4OWHtVdyd^2Uwzq!|BDar^Wi6E
z;fSpmBkNXS@4Cw_IPql09xWu|VrVEeS{M<DV$u=0L_VL*i`=MF5JTDgA(2j`qp9R@
zhNRd?G%unDvw24xN~TgGmrrJ#Az|BSl*!tuY%J;|vl+IDrE>x*vZIc~a$};9aB{=R
zxQM2r`Lrk`hodpcT*yZ=@oZW=5KTHc)Y~=<g?-VyW2d9Z3@F3tY(`ORv9o{ImaX=h
zQ1?u!OZGE>xgYTTW9$VmbNzHiB&qyWOTmI<lJkObeV#PnI(B~rT$sk<7ZL7q?s((0
zNX|9WXljQT&Li+N!Ipq7hapavBN$d)#yHK<<-M4qx_l3o8C@R4bk*e%Of_B3VQTAg
z5%%bE8Piagzl!kF<>Q$Cx{Uc|>hftU9lHEDmJMBg3d@5o{|L*2E<cB*LzmAXJZ1Ue
zmg=QaL*TLFhG?4nnP}K@O3ZQ31s>b{x-fcPU+`Y}lhrGou&no4(a>kyIMF8pEq!9%
zAIjh_H3T24piQ_%n0v~?+Eg~hra@DLk1Z78*!*~(sDGxk@`mC-i?}r`gz$6)sY~33
zW#z%g&4(egUl{*<LEAz7w1YO#Cg{K9KKE|Zk=x6ne7Ib<VWkL;xAyJ*?1Y#b`L1XO
zy`ggg_MH?fXw&%0;A6Axs<TL^ouq49p)>w*I;pRGvMfp!Yj5e;(p>4-jR50Yy+@(<
zAIgoL57p@1b-j8Yx(L0~^ttM5v+8TxWK=Xx9uX${Jo!D*#C}%}m(9sH#EOX?A=uBz
zNpUZ9uc$m*ZmxV#ZgT(q?Y&1f3*nw!n(H}Rt-2F$N5=2nSBy-FcKFKJ{~2@R_`Q2W
z-vrM%Y}`|>Ymxlm>z%4j-&y@&VuK}&6Oa!-wN+q)^dIb~zy|oOxdQ$0o5eAv-SnFr
z8~Am$PaiTr&Zii!w(-_Gi$_{#`qS(mAJ{YkK1+`4v(?{nOlkL>)#`<h(qH<wrT3MF
zmKQNTZKdC1tfz%24TVFcx^O7`mZ&QQMWi=Sy%aerHs!uAg1e6jtGyxuy~o6g($Gfm
zZ6sfyG}!#P-Cq_~KkCu`4BDR|jr|V~3S;*X=GAJAKVF{f70vKTH^$Gx*tH>s+c9n(
zlZG%STQIkx&@r~WSo(Hrv1iZ_tzR&Nd*YQQ_eZZZ^}x<POzfVjFx*oYB8cT@+~56g
z(S7utqFZ^h*k{atqjY$Av6cp3EOE~-8;g{yFS_TKkM$m{K3|G2FLM65Z=vmuUlrXK
z{#Xph%*Wm8D@~B`p!>*`2;yR@^;J>rsaC6-PlNZwyG=c3&5Q1azchVq3i`kQ4)q;y
zr_MzpC##px-}%6Q=w5iY$-M%elQTT)+>?-ZmFMf^>-ZGb4Yyj29IbNRbyv`T8Bg#*
z1wN|44;A<bvDJpyYDa8!Ahx=cZ>ej$(s>)oQ8nH#han8~GW{t1OrJuY-`{=4ytAc$
zqCbpw-3GKLN+n5a;`jl-uLln3IsIv~2flDO{}O$k_x_ZNzIlDR?8aX$hEHRRr2pp~
zK>cq@$sM`zmBz#SS4Kka@0<66cD_{l^BMQpn?*PFYVpqM`+w^AJM<s(5dD8r>6+6M
z`LwBYoT$|y<wK9Bqv20Ir-cQXqu0<ArCjP#vR|vwd8A_*bbemyoPwLWEAWl<`G@&?
zPzddFxn9T2%Y8<}w>*0re!#K7JbszJzs)iY#&OAAUMwe<w9Ujg#;!EgTI6^z-qC&)
z-~{57zMtwmS*;?DFT2nG5x#Ahpqz4Y@g~L`@Rsmt*l+~v{Lf{5OJ@b+fqBq#cHuVn
zuy`5k+&$pyRqI3@@Ygy|VV#4oxdwW1TK~gvS=>b5_Zju?!~QwQUzjbs{skr5=sj6|
ziu@(W2j5HOsA$N=g^Tsz>4DfUZpR$7Jbx_gK>dF3&QrYKqi>{dIgh#TeMk(e_Sh@y
zw|^u*<jo#~g(ljxTyL8VZNeB|S?^$tG_VggE4>4K#@z4HR;&*Lik1WIyrM<WC-{Hv
zpIxkvTvJg#%dy$8a&h>0Qi$UdqE3y!+@rV;mBzhwB^+CL9OKWmt(kjL=~OHHK6h2k
zzCQ(?XF$6dG$VX$aafMUF`N+w#f_j(Fh1b#0r$69qp;Rs&YbA1h!}N{hL|Y==RP;(
z*}n+BoO2$2X=AKN-@_*VzNGs4Yw*6rGG)P*Y2aUH+n(nze=u&8ajkt#wZ91dFJ7mv
zE8tx%d7php+Vx?wg;ANAPioI^S9^Z9+#4`Y<(`k&;+{XVrg7eJ&v>M@So+Cl;j91I
zkA1POf-{LbF}>HFT6!P$x~`rls_*q+?mT##sB;roPqtz0Z~bMF`zH4kj<b7enrrnm
z?EU!lL9NG(17RFZeb}Ck7--bzGn_@lgPW`Gbxz26->mm+P2)ZR-81Jftox0Ke|c{F
zs6AclM%>LhpTZYsDNn`&&URWBR;y{eKbWStsRJ;L8-n3oW~_X|8O>)BaqE_NhgBHK
zj;7+)Nc8?hZH3jI9U8(t$GQcTh2)nKEUcU_#4TNGik1*pFH>Hbmn83&1$;DfD3g64
zW62AcHI&b$Eqx^tEiD!<W3hzAYgaNeY~50TR6+VI3h>xS63XLASbA^_Mhs`7DcC%p
zytC`H!dMiyKp~t&!P%$;-`^cty?)iY@cLE5x329vxOPKg?ZI2uu2~gN<{e(bo#=33
zqcxm#?E4e>0<L@;J%g^^>V%N}Py!2wk~uAVD4I--<`Wtfh0$0nQGhtrCx{m~UQ}yy
zmU;j3?qI}-FKGvK;4KFEBHo)pTSQ#H-&i2=en)6D#hmAnFTfo=j=T<UG~1BRF+~ug
zfI%0=CFJz$ZvcGPFah$OgFFb`)5v*ed=J2TAm2hhVTwB3heSQ{r;s;59qZ-+8ne~v
z2cVacvp)v%OTcH|4M6)_Q_&PRT=-iX|9`Z@d~bp8>06n0Zr!@kYR9Y9N^4!HJJhA=
z{+w@hq~YStae~v0HigS~BQ;K}$A4MRaUxRZ;d1<l^UVqEJ^upd+ZXXNGS<ttE8rX_
z;utGg&+#M9F$2Gh6XJ4?aXnD=j0x1s_+h<#_X7T~s+aF*rdZUhaK5iGCRZq&?{LI*
z|D;^t9jcyhd93eJxO`>;E`17p>b+sHu9tq(a{~41e!h5GwdY$O<jFTB+RHbknFJhm
z8aHb~ov8i+WjO|<DZCN-rTwU&{mXz$`+?Jb&OK>AaN55P{_020i9&y^2mTy#jo-m}
z3)BEV?K!|XjJ!_aDO1#kfouL@;FoLgub84i_FKt63f!rwhZrv@`zK+aMG7it|7krw
zz&So=eD?no>;Bv1SMq<1@$IX@|C41VCI59(%tIU+gNpwxQ_RPB8IHm)dE7Jne^K+c
z&Z`eTit%HA8(<ImWi$Yf0RNi8_4=drX>kfSEZ8r{2Nmd_v6_1rnb00>jZLVR@dF(3
zV{G^F-vfQWuHnzfeGld$_%MFPVb(EO@;9&_{4)N4)80oU89&~esPO|I{!<_RA3pp|
z$eSh&!9;ui+gH!ERwgAM>z+6NfHz{47=tzSffgUW)`#yv{~3Q`VSp+F`*eSRpg(~@
zAAe5cmMjEZjeGV4zTv}9L0&g$kWYC}!rn)LsyE@lPkj6r5D%}R-xnbd{tmpsvj=#d
zRygOROrAai``eN(5@0XpPY|)^C1uY6%(pgB+PwBb#7FI$a&S$Jzp*|9Z&&MElj7ex
zTV8OV#w}S04r|=g7yObb7SR5F)&47x{~~$3{s$*fe-5aY|D><|&opkyLhx0Md-9F%
z%*KNkG2f4q=IIN(qt|QT^F=K06r9nap->F3##;yOy4&vGwQs<-fo+?G-LZA1DxSzE
zhLZ&+k++?+9ZO|1iGqk_)45c_NyJ0l8`i89aVML%lkp<4bs`o`rEJ@dXYJus_Fyz+
zv$|kMM~hz7P(GSY*zwVHdQ6+Kea|*&#g4mscW<{fMk~DXUx|#(i1ZGi=n+q7Hms9{
zSQLjgyvM`X0_x<ex;=uY^i%?G^n7Et)dPDXHMA<3NjlOeEZEU}K00P6GI(s?x2@OS
z(z|avJhbh=p5EQNw!-2XCix66?4A90ZRzc|@7l3r-}V7}pm$3@TX-+^vBIc5geLf`
zFtFRc>dk@OTj|Sz=)qJ%s8{|-M8Dp5sjmlDDc$Nr#8pa{`gTxDu@mvA6IGuWYAf`I
zidvfVLoLDkyirS0XY5*%iWNOb)aQMCke@|nbI$rQ!7nJa)jXTm;s}{SRt=K>Bg|iY
z_NZ-`AI;bt{aTlEkn~5HT3-LxnA$q;i%c!WwhtBxs(12Z3qShIVz}de`))Nqwx*(m
z0w#z0lvB$lV^f7Ati;%>z^<*Z(a?61=>$Iu&D<4g#mErS({WY+$<I!;tWvQ0(j-EK
zv9uFCh}_A0`G_vz6ICLg6QNAjNrZa0>{^A(iwKQG@x?0?AIqSjmpge88qSP{)Qx5q
zVxumfNJZH|m2xRZgya~6oJ0|2IR!%btPIakV#JoKfC$Nn3>BQuBJ__js>wQQq|#_Q
z8AE{qO`-^4Fwi2KPA4*sUihvrUk)|{fDMu7X?doW=TX(1bxf@Qo`cDUeS#n?fKGZp
zSq%cu)8yL*K7iobt?QW+)&qFXBi|+P5z4^XmO9CUtx9YL;K)h7U=Brqyk{YkN!zzm
z-Ic({bG$sucY#Ce#j%M;Gcmxs70cfFzOIZi`D*n76)^zYlJ6-L0rGpfmdCsikOr{4
z1$@}A<eiRZ57Ky!)_f13Oj~g&;5~sL`T;tTk7*pB_;`OH%pZ_7-jC+1T;=1rMxJfD
zuEB?|H&Pz@!TScmICc%bZ(il&Jw)C`v@WviGWS*5hkNbJ-#|2wK_$)iufR258{`4J
zdy|jvi5k8R@BwCYkyXn(Q^SXT%DYZC8C0S!rl$eCztO%p_yF>6EjYeRlt+8d0+f9`
zm&&v06lgl>KK&d7^3{9i6M05y(oD!HkB|qj4f&=a4<KV7B$>#=bPhm1*vIn~`DXnc
zgywq*bm~=OD)|-y*NJrm(l|d<`XpoOzchxNJWR|P|0*rM3^xq|;r!nSpk5QeyKIfT
NWdlI3fToj__kRhC;J5$)

literal 0
HcmV?d00001

diff --git a/tools/testing/selftests/powerpc/stringloops/memcmp_32.S b/tools/testing/selftests/powerpc/stringloops/memcmp_32.S
new file mode 120000
index 000000000000..056f2b3af789
--- /dev/null
+++ b/tools/testing/selftests/powerpc/stringloops/memcmp_32.S
@@ -0,0 +1 @@
+../../../../../arch/powerpc/lib/memcmp_32.S
\ No newline at end of file
-- 
2.13.3


* [PATCH v4 2/4] selftests/powerpc: Add test for strlen()
  2018-06-08 10:20 [PATCH v4 1/4] selftests/powerpc: add test for 32 bits memcmp Christophe Leroy
@ 2018-06-08 10:20 ` Christophe Leroy
  2018-06-08 10:20 ` [PATCH v4 3/4] powerpc/lib: implement strlen() in assembly Christophe Leroy
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Christophe Leroy @ 2018-06-08 10:20 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, wei.guo.simon
  Cc: linux-kernel, linuxppc-dev

This patch adds a test for strlen()

string.c contains a copy of strlen() from lib/string.c

The test first checks the correctness of strlen() by comparing
its result with that of libc strlen(), for every alignment.

It then measures the duration of an aligned strlen() on a 4-byte string,
on a 16-byte string and on a 256-byte string.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 v4: new

 .../testing/selftests/powerpc/stringloops/Makefile |   5 +-
 .../testing/selftests/powerpc/stringloops/string.c |  36 ++++++
 .../testing/selftests/powerpc/stringloops/strlen.c | 123 +++++++++++++++++++++
 3 files changed, 163 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/stringloops/string.c
 create mode 100644 tools/testing/selftests/powerpc/stringloops/strlen.c

diff --git a/tools/testing/selftests/powerpc/stringloops/Makefile b/tools/testing/selftests/powerpc/stringloops/Makefile
index 1e7301d4bac9..df663ee9ddb3 100644
--- a/tools/testing/selftests/powerpc/stringloops/Makefile
+++ b/tools/testing/selftests/powerpc/stringloops/Makefile
@@ -10,9 +10,12 @@ $(OUTPUT)/memcmp_64: CFLAGS += -m64
 $(OUTPUT)/memcmp_32: memcmp.c
 $(OUTPUT)/memcmp_32: CFLAGS += -m32
 
+$(OUTPUT)/strlen: strlen.c string.o
+$(OUTPUT)/string.o: string.c
+
 ASFLAGS = $(CFLAGS)
 
-TEST_GEN_PROGS := memcmp_32 memcmp_64
+TEST_GEN_PROGS := memcmp_32 memcmp_64 strlen
 
 include ../../lib.mk
 
diff --git a/tools/testing/selftests/powerpc/stringloops/string.c b/tools/testing/selftests/powerpc/stringloops/string.c
new file mode 100644
index 000000000000..d05200481017
--- /dev/null
+++ b/tools/testing/selftests/powerpc/stringloops/string.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *  linux/lib/string.c
+ *
+ *  Copyright (C) 1991, 1992  Linus Torvalds
+ */
+
+/*
+ * stupid library routines.. The optimized versions should generally be found
+ * as inline code in <asm-xx/string.h>
+ *
+ * These are buggy as well..
+ *
+ * * Fri Jun 25 1999, Ingo Oeser <ioe@informatik.tu-chemnitz.de>
+ * -  Added strsep() which will replace strtok() soon (because strsep() is
+ *    reentrant and should be faster). Use only strsep() in new code, please.
+ *
+ * * Sat Feb 09 2002, Jason Thomas <jason@topic.com.au>,
+ *                    Matthew Hawkins <matt@mh.dropbear.id.au>
+ * -  Kissed strtok() goodbye
+ */
+
+#include <stddef.h>
+
+/**
+ * strlen - Find the length of a string
+ * @s: The string to be sized
+ */
+size_t test_strlen(const char *s)
+{
+	const char *sc;
+
+	for (sc = s; *sc != '\0'; ++sc)
+		/* nothing */;
+	return sc - s;
+}
diff --git a/tools/testing/selftests/powerpc/stringloops/strlen.c b/tools/testing/selftests/powerpc/stringloops/strlen.c
new file mode 100644
index 000000000000..e87ca65ea156
--- /dev/null
+++ b/tools/testing/selftests/powerpc/stringloops/strlen.c
@@ -0,0 +1,123 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <malloc.h>
+#include <stdlib.h>
+#include <string.h>
+#include <time.h>
+#include "utils.h"
+
+#define SIZE 256
+#define ITERATIONS 1000
+#define ITERATIONS_BENCH 100000
+
+size_t test_strlen(const char *s);
+
+/* test all offsets and lengths */
+static void test_one(char *s)
+{
+	unsigned long offset;
+
+	for (offset = 0; offset < SIZE; offset++) {
+		int x, y;
+		unsigned long i;
+
+		y = strlen(s + offset);
+		x = test_strlen(s + offset);
+
+		if (x != y) {
+			printf("strlen() returned %d, should have returned %d (%p offset %ld)\n", x, y, s, offset);
+
+			for (i = offset; i < SIZE; i++)
+				printf("%02x ", s[i]);
+			printf("\n");
+		}
+	}
+}
+
+static int testcase(void)
+{
+	char *s;
+	unsigned long i;
+	struct timespec ts_start, ts_end;
+
+	s = memalign(128, SIZE);
+	if (!s) {
+		perror("memalign");
+		exit(1);
+	}
+
+	srandom(1);
+
+	memset(s, 0, SIZE);
+	for (i = 0; i < SIZE; i++) {
+		char c;
+
+		do {
+			c = random() & 0x7f;
+		} while (!c);
+		s[i] = c;
+		test_one(s);
+	}
+
+	for (i = 0; i < ITERATIONS; i++) {
+		unsigned long j;
+
+		for (j = 0; j < SIZE; j++) {
+			char c;
+
+			do {
+				c = random() & 0x7f;
+			} while (!c);
+			s[j] = c;
+		}
+		for (j = 0; j < sizeof(long); j++) {
+			s[SIZE - 1 - j] = 0;
+			test_one(s);
+		}
+	}
+
+	for (i = 0; i < SIZE; i++) {
+		char c;
+
+		do {
+			c = random() & 0x7f;
+		} while (!c);
+		s[i] = c;
+	}
+
+	clock_gettime(CLOCK_MONOTONIC, &ts_start);
+
+	s[SIZE - 1] = 0;
+	for (i = 0; i < ITERATIONS_BENCH; i++)
+		test_strlen(s);
+
+	clock_gettime(CLOCK_MONOTONIC, &ts_end);
+
+	printf("len %3.3d : time = %.6f\n", SIZE, ts_end.tv_sec - ts_start.tv_sec + (ts_end.tv_nsec - ts_start.tv_nsec) / 1e9);
+
+	clock_gettime(CLOCK_MONOTONIC, &ts_start);
+
+	s[16] = 0;
+	for (i = 0; i < ITERATIONS_BENCH; i++)
+		test_strlen(s);
+
+	clock_gettime(CLOCK_MONOTONIC, &ts_end);
+
+	printf("len 16  : time = %.6f\n", ts_end.tv_sec - ts_start.tv_sec + (ts_end.tv_nsec - ts_start.tv_nsec) / 1e9);
+
+	clock_gettime(CLOCK_MONOTONIC, &ts_start);
+
+	s[4] = 0;
+	for (i = 0; i < ITERATIONS_BENCH; i++)
+		test_strlen(s);
+
+	clock_gettime(CLOCK_MONOTONIC, &ts_end);
+
+	printf("len 4   : time = %.6f\n", ts_end.tv_sec - ts_start.tv_sec + (ts_end.tv_nsec - ts_start.tv_nsec) / 1e9);
+
+	return 0;
+}
+
+int main(void)
+{
+	return test_harness(testcase, "strlen");
+}
-- 
2.13.3


* [PATCH v4 3/4] powerpc/lib: implement strlen() in assembly
  2018-06-08 10:20 [PATCH v4 1/4] selftests/powerpc: add test for 32 bits memcmp Christophe Leroy
  2018-06-08 10:20 ` [PATCH v4 2/4] selftests/powerpc: Add test for strlen() Christophe Leroy
@ 2018-06-08 10:20 ` Christophe Leroy
  2018-06-08 11:45   ` Gabriel Paubert
  2018-06-08 10:20 ` [PATCH v4 4/4] selftests/powerpc: update strlen() test to test the new assembly function Christophe Leroy
  2018-06-08 10:26 ` [PATCH v4 1/4] selftests/powerpc: add test for 32 bits memcmp Christophe Leroy
  3 siblings, 1 reply; 7+ messages in thread
From: Christophe Leroy @ 2018-06-08 10:20 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, wei.guo.simon
  Cc: linux-kernel, linuxppc-dev

The generic implementation of strlen() reads strings byte by byte.

This patch implements strlen() in assembly based on a read of entire
words, in the same spirit as what some other arches and glibc do.
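
For readers less familiar with the word-at-a-time idea, here is a minimal
C sketch of it. It is not the code from this patch: it assumes 32-bit
words, and the names sketch_strlen(), has_zero_byte(), LOMAGIC and
HIMAGIC are illustrative only. Like the assembly, it handles unaligned
leading bytes one at a time and then skips every aligned word that
contains no NUL.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 32-bit words. These names are illustrative, not from the patch. */
#define LOMAGIC 0x01010101u
#define HIMAGIC 0x80808080u

/* Nonzero if and only if at least one of the four bytes of x is 0x00. */
static uint32_t has_zero_byte(uint32_t x)
{
	return (x - LOMAGIC) & ~x & HIMAGIC;
}

size_t sketch_strlen(const char *s)
{
	const char *p = s;

	/* Unlikely path: go byte by byte until p is word aligned. */
	while ((uintptr_t)p % sizeof(uint32_t)) {
		if (*p == '\0')
			return (size_t)(p - s);
		p++;
	}

	/*
	 * Main loop: skip whole aligned words that contain no NUL.
	 * The word holding the terminator may be read past the end of
	 * the string; an aligned word never crosses a page boundary,
	 * which is what the technique relies on, but ISO C does not
	 * formally allow such a read.
	 */
	for (;;) {
		uint32_t x;

		memcpy(&x, p, sizeof(x));
		if (has_zero_byte(x))
			break;
		p += sizeof(x);
	}

	/* The NUL is somewhere in this word: locate it byte by byte. */
	while (*p != '\0')
		p++;

	return (size_t)(p - s);
}
```

Calling sketch_strlen() on a zero-padded buffer at every offset and
comparing against libc strlen() is enough to exercise both the
unaligned prologue and the word loop.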

On an 8xx the time spent in strlen() is reduced by 2/3 for long strings.

strlen() selftest on an 8xx provides the following values:

Before the patch (i.e. with the generic strlen() in lib/string.c):

len 256 : time = 0.803648
len 16  : time = 0.062989
len 4   : time = 0.026269

After the patch:

len 256 : time = 0.267791  ==>  66% improvement
len 16  : time = 0.037902  ==>  41% improvement
len 4   : time = 0.026124  ==>  no degradation

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
Not tested on PPC64.

Changes in v4:
 - Added alignment of the loop
 - Doing the andc only if the result is still not 0, as that happens only for bytes above 0x7f, which is pretty rare in a string

Changes in v3:
 - Made it common to PPC32 and PPC64

Changes in v2:
 - Moved handling of unaligned strings outside of the main path as it is very unlikely.
 - Removed the verification of the fourth byte in case none of the first three are NUL.
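
The v4 split of the NUL test into two stages can be sketched in C as
follows. This is only an illustration under the assumption of 32-bit
words; stage1() and stage2() are hypothetical names, standing for the
"subf + and." pair and the later "andc." respectively. For plain 7-bit
text, stage 1 is nonzero only when the word really contains a NUL, so
stage 2 is rarely reached.

```c
#include <stdint.h>

/* Assumption: 32-bit words. Names are illustrative, not from the patch. */
#define LOMAGIC 0x01010101u
#define HIMAGIC 0x80808080u

/*
 * Stage 1, the cheap filter: (x - lomagic) & himagic.
 * Zero for every word made only of bytes 0x01..0x7f.
 */
static uint32_t stage1(uint32_t x)
{
	return (x - LOMAGIC) & HIMAGIC;
}

/*
 * Stage 2, evaluated only when stage 1 is nonzero: also apply & ~x,
 * which discards the hits caused by bytes with the top bit set.
 */
static uint32_t stage2(uint32_t x)
{
	return stage1(x) & ~x;
}
```

For example, the word 0x61626364 is rejected by stage 1 alone, the word
0x61e96364 passes stage 1 but is rejected by stage 2, and the word
0x61620064 passes both because it contains a NUL byte.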


 arch/powerpc/include/asm/asm-compat.h |  4 +++
 arch/powerpc/include/asm/string.h     |  1 +
 arch/powerpc/lib/string.S             | 57 +++++++++++++++++++++++++++++++++++
 3 files changed, 62 insertions(+)

diff --git a/arch/powerpc/include/asm/asm-compat.h b/arch/powerpc/include/asm/asm-compat.h
index 7f2a7702596c..0e99fe7570c0 100644
--- a/arch/powerpc/include/asm/asm-compat.h
+++ b/arch/powerpc/include/asm/asm-compat.h
@@ -20,8 +20,10 @@
 
 /* operations for longs and pointers */
 #define PPC_LL		stringify_in_c(ld)
+#define PPC_LLU		stringify_in_c(ldu)
 #define PPC_STL		stringify_in_c(std)
 #define PPC_STLU	stringify_in_c(stdu)
+#define PPC_ROTLI	stringify_in_c(rotldi)
 #define PPC_LCMPI	stringify_in_c(cmpdi)
 #define PPC_LCMPLI	stringify_in_c(cmpldi)
 #define PPC_LCMP	stringify_in_c(cmpd)
@@ -53,8 +55,10 @@
 
 /* operations for longs and pointers */
 #define PPC_LL		stringify_in_c(lwz)
+#define PPC_LLU		stringify_in_c(lwzu)
 #define PPC_STL		stringify_in_c(stw)
 #define PPC_STLU	stringify_in_c(stwu)
+#define PPC_ROTLI	stringify_in_c(rotlwi)
 #define PPC_LCMPI	stringify_in_c(cmpwi)
 #define PPC_LCMPLI	stringify_in_c(cmplwi)
 #define PPC_LCMP	stringify_in_c(cmpw)
diff --git a/arch/powerpc/include/asm/string.h b/arch/powerpc/include/asm/string.h
index 9b8cedf618f4..8fdcb532de72 100644
--- a/arch/powerpc/include/asm/string.h
+++ b/arch/powerpc/include/asm/string.h
@@ -13,6 +13,7 @@
 #define __HAVE_ARCH_MEMCHR
 #define __HAVE_ARCH_MEMSET16
 #define __HAVE_ARCH_MEMCPY_FLUSHCACHE
+#define __HAVE_ARCH_STRLEN
 
 extern char * strcpy(char *,const char *);
 extern char * strncpy(char *,const char *, __kernel_size_t);
diff --git a/arch/powerpc/lib/string.S b/arch/powerpc/lib/string.S
index 4b41970e9ed8..238f61e2024f 100644
--- a/arch/powerpc/lib/string.S
+++ b/arch/powerpc/lib/string.S
@@ -67,3 +67,60 @@ _GLOBAL(memchr)
 2:	li	r3,0
 	blr
 EXPORT_SYMBOL(memchr)
+
+_GLOBAL(strlen)
+	andi.   r9, r3, (SZL - 1)
+	addi	r10, r3, -SZL
+	bne-	1f
+2:	lis	r6, 0x8080
+	ori	r6, r6, 0x8080		/* r6 = 0x80808080 (himagic) */
+#ifdef CONFIG_PPC64
+	rldimi	r6, r6, 32, 0		/* r6 = 0x8080808080808080 (himagic) */
+#endif
+	PPC_ROTLI  r7, r6, 1 		/* r7 = 0x01010101(01010101) (lomagic)*/
+	.balign IFETCH_ALIGN_BYTES
+3:	PPC_LLU	r9, SZL(r10)
+	/* ((x - lomagic) & ~x & himagic) == 0 means no byte in x is NUL */
+	subf	r8, r7, r9
+	and.	r8, r8, r6
+	beq+	3b
+	andc.	r8, r8, r9
+	beq+	3b
+#ifdef CONFIG_PPC64
+	rldicl.	r8, r9, 8, 56
+	beq	20f
+	rldicl.	r8, r9, 16, 56
+	beq	21f
+	rldicl.	r8, r9, 24, 56
+	beq	22f
+	rldicl.	r8, r9, 32, 56
+	beq	23f
+	addi	r10, r10, 4
+#endif
+	rlwinm.	r8, r9, 0, 0xff000000
+	beq	20f
+	rlwinm.	r8, r9, 0, 0x00ff0000
+	beq	21f
+	rlwinm.	r8, r9, 0, 0x0000ff00
+	beq	22f
+23:	subf	r3, r3, r10
+	addi	r3, r3, 3
+	blr
+22:	subf	r3, r3, r10
+	addi	r3, r3, 2
+	blr
+21:	subf	r3, r3, r10
+	addi	r3, r3, 1
+	blr
+19:	addi	r10, r10, (SZL - 1)
+20:	subf	r3, r3, r10
+	blr
+
+1:	lbz	r9, SZL(r10)
+	addi	r10, r10, 1
+	cmpwi	cr1, r9, 0
+	andi.	r9, r10, (SZL - 1)
+	beq	cr1, 19b
+	bne	1b
+	b	2b
+EXPORT_SYMBOL(strlen)
-- 
2.13.3


* [PATCH v4 4/4] selftests/powerpc: update strlen() test to test the new assembly function
  2018-06-08 10:20 [PATCH v4 1/4] selftests/powerpc: add test for 32 bits memcmp Christophe Leroy
  2018-06-08 10:20 ` [PATCH v4 2/4] selftests/powerpc: Add test for strlen() Christophe Leroy
  2018-06-08 10:20 ` [PATCH v4 3/4] powerpc/lib: implement strlen() in assembly Christophe Leroy
@ 2018-06-08 10:20 ` Christophe Leroy
  2018-06-08 10:26 ` [PATCH v4 1/4] selftests/powerpc: add test for 32 bits memcmp Christophe Leroy
  3 siblings, 0 replies; 7+ messages in thread
From: Christophe Leroy @ 2018-06-08 10:20 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, wei.guo.simon
  Cc: linux-kernel, linuxppc-dev

This patch modifies the test so that it exercises the new assembly
strlen() instead of the generic strlen().

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 v4: new

 tools/testing/selftests/powerpc/stringloops/Makefile |  3 +--
 .../selftests/powerpc/stringloops/asm/cache.h        |  1 +
 .../selftests/powerpc/stringloops/asm/ppc_asm.h      | 20 ++++++++++++++++++++
 tools/testing/selftests/powerpc/stringloops/string.S |  1 +
 4 files changed, 23 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/stringloops/asm/cache.h
 create mode 120000 tools/testing/selftests/powerpc/stringloops/string.S

diff --git a/tools/testing/selftests/powerpc/stringloops/Makefile b/tools/testing/selftests/powerpc/stringloops/Makefile
index df663ee9ddb3..0c088a6d0369 100644
--- a/tools/testing/selftests/powerpc/stringloops/Makefile
+++ b/tools/testing/selftests/powerpc/stringloops/Makefile
@@ -10,8 +10,7 @@ $(OUTPUT)/memcmp_64: CFLAGS += -m64
 $(OUTPUT)/memcmp_32: memcmp.c
 $(OUTPUT)/memcmp_32: CFLAGS += -m32
 
-$(OUTPUT)/strlen: strlen.c string.o
-$(OUTPUT)/string.o: string.c
+$(OUTPUT)/strlen: strlen.c string.S
 
 ASFLAGS = $(CFLAGS)
 
diff --git a/tools/testing/selftests/powerpc/stringloops/asm/cache.h b/tools/testing/selftests/powerpc/stringloops/asm/cache.h
new file mode 100644
index 000000000000..8a2840831122
--- /dev/null
+++ b/tools/testing/selftests/powerpc/stringloops/asm/cache.h
@@ -0,0 +1 @@
+#define	IFETCH_ALIGN_BYTES 4
diff --git a/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h b/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h
index 136242ec4b0e..9f8e34ffc131 100644
--- a/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h
+++ b/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h
@@ -1,4 +1,12 @@
 /* SPDX-License-Identifier: GPL-2.0 */
+#if !defined(CONFIG_PPC64) && !defined(CONFIG_PPC32)
+#ifdef __powerpc64__
+#define CONFIG_PPC64
+#else
+#define CONFIG_PPC32
+#endif
+#endif
+
 #include <ppc-asm.h>
 
 #ifndef r1
@@ -6,3 +14,15 @@
 #endif
 
 #define _GLOBAL(A) FUNC_START(test_ ## A)
+
+#ifdef __powerpc64__
+#define SZL		8
+#define PPC_LLU		ldu
+#define PPC_LCMPI	cmpldi
+#define PPC_ROTLI	rotldi
+#else
+#define SZL		4
+#define PPC_LLU		lwzu
+#define PPC_LCMPI	cmplwi
+#define PPC_ROTLI	rotlwi
+#endif
diff --git a/tools/testing/selftests/powerpc/stringloops/string.S b/tools/testing/selftests/powerpc/stringloops/string.S
new file mode 120000
index 000000000000..9f5babec7d21
--- /dev/null
+++ b/tools/testing/selftests/powerpc/stringloops/string.S
@@ -0,0 +1 @@
+../../../../../arch/powerpc/lib/string.S
\ No newline at end of file
-- 
2.13.3


* [PATCH v4 1/4] selftests/powerpc: add test for 32 bits memcmp
  2018-06-08 10:20 [PATCH v4 1/4] selftests/powerpc: add test for 32 bits memcmp Christophe Leroy
                   ` (2 preceding siblings ...)
  2018-06-08 10:20 ` [PATCH v4 4/4] selftests/powerpc: update strlen() test to test the new assembly function Christophe Leroy
@ 2018-06-08 10:26 ` Christophe Leroy
  3 siblings, 0 replies; 7+ messages in thread
From: Christophe Leroy @ 2018-06-08 10:26 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, wei.guo.simon
  Cc: linux-kernel, linuxppc-dev

This patch renames the memcmp test to memcmp_64 and adds
a memcmp_32 test for the 32-bit version of memcmp().

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 RESENDING without memcmp_32 BIN

 v4: new

 tools/testing/selftests/powerpc/stringloops/Makefile    |  14 +++++++++++---
 tools/testing/selftests/powerpc/stringloops/memcmp_32.S |   1 +
 2 files changed, 12 insertions(+), 3 deletions(-)
 create mode 120000 tools/testing/selftests/powerpc/stringloops/memcmp_32.S

diff --git a/tools/testing/selftests/powerpc/stringloops/Makefile b/tools/testing/selftests/powerpc/stringloops/Makefile
index 1125e489055e..1e7301d4bac9 100644
--- a/tools/testing/selftests/powerpc/stringloops/Makefile
+++ b/tools/testing/selftests/powerpc/stringloops/Makefile
@@ -1,10 +1,18 @@
 # SPDX-License-Identifier: GPL-2.0
 # The loops are all 64-bit code
-CFLAGS += -m64
 CFLAGS += -I$(CURDIR)
 
-TEST_GEN_PROGS := memcmp
-EXTRA_SOURCES := memcmp_64.S ../harness.c
+EXTRA_SOURCES := ../harness.c
+
+$(OUTPUT)/memcmp_64: memcmp.c
+$(OUTPUT)/memcmp_64: CFLAGS += -m64
+
+$(OUTPUT)/memcmp_32: memcmp.c
+$(OUTPUT)/memcmp_32: CFLAGS += -m32
+
+ASFLAGS = $(CFLAGS)
+
+TEST_GEN_PROGS := memcmp_32 memcmp_64
 
 include ../../lib.mk
 
diff --git a/tools/testing/selftests/powerpc/stringloops/memcmp_32.S b/tools/testing/selftests/powerpc/stringloops/memcmp_32.S
new file mode 120000
index 000000000000..056f2b3af789
--- /dev/null
+++ b/tools/testing/selftests/powerpc/stringloops/memcmp_32.S
@@ -0,0 +1 @@
+../../../../../arch/powerpc/lib/memcmp_32.S
\ No newline at end of file
-- 
2.13.3


* Re: [PATCH v4 3/4] powerpc/lib: implement strlen() in assembly
  2018-06-08 10:20 ` [PATCH v4 3/4] powerpc/lib: implement strlen() in assembly Christophe Leroy
@ 2018-06-08 11:45   ` Gabriel Paubert
  2018-06-08 12:05     ` Segher Boessenkool
  0 siblings, 1 reply; 7+ messages in thread
From: Gabriel Paubert @ 2018-06-08 11:45 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	wei.guo.simon, linuxppc-dev, linux-kernel

On Fri, Jun 08, 2018 at 10:20:41AM +0000, Christophe Leroy wrote:
> The generic implementation of strlen() reads strings byte per byte.
> 
> This patch implements strlen() in assembly based on a read of entire
> words, in the same spirit as what some other arches and glibc do.
> 
> On a 8xx the time spent in strlen is reduced by 2/3 for long strings.
> 
> strlen() selftest on an 8xx provides the following values:
> 
> Before the patch (ie with the generic strlen() in lib/string.c):
> 
> len 256 : time = 0.803648
> len 16  : time = 0.062989
> len 4   : time = 0.026269
> 
> After the patch:
> 
> len 256 : time = 0.267791  ==>  66% improvement
> len 16  : time = 0.037902  ==>  41% improvement
> len 4   : time = 0.026124  ==>  no degradation
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
> Not tested on PPC64.
> 
> Changes in v4:
>  - Added alignment of the loop
>  - Doing the andc only if the result is still not 0, as that happens only for bytes above 0x7f, which is pretty rare in a string
> 
> Changes in v3:
>  - Made it common to PPC32 and PPC64
> 
> Changes in v2:
>  - Moved handling of unaligned strings outside of the main path as it is very unlikely.
>  - Removed the verification of the fourth byte in case none of the three first ones are NUL.
> 
> 
>  arch/powerpc/include/asm/asm-compat.h |  4 +++
>  arch/powerpc/include/asm/string.h     |  1 +
>  arch/powerpc/lib/string.S             | 57 +++++++++++++++++++++++++++++++++++
>  3 files changed, 62 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/asm-compat.h b/arch/powerpc/include/asm/asm-compat.h
> index 7f2a7702596c..0e99fe7570c0 100644
> --- a/arch/powerpc/include/asm/asm-compat.h
> +++ b/arch/powerpc/include/asm/asm-compat.h
> @@ -20,8 +20,10 @@
>  
>  /* operations for longs and pointers */
>  #define PPC_LL		stringify_in_c(ld)
> +#define PPC_LLU		stringify_in_c(ldu)
>  #define PPC_STL		stringify_in_c(std)
>  #define PPC_STLU	stringify_in_c(stdu)
> +#define PPC_ROTLI	stringify_in_c(rotldi)
>  #define PPC_LCMPI	stringify_in_c(cmpdi)
>  #define PPC_LCMPLI	stringify_in_c(cmpldi)
>  #define PPC_LCMP	stringify_in_c(cmpd)
> @@ -53,8 +55,10 @@
>  
>  /* operations for longs and pointers */
>  #define PPC_LL		stringify_in_c(lwz)
> +#define PPC_LLU		stringify_in_c(lwzu)
>  #define PPC_STL		stringify_in_c(stw)
>  #define PPC_STLU	stringify_in_c(stwu)
> +#define PPC_ROTLI	stringify_in_c(rotlwi)
>  #define PPC_LCMPI	stringify_in_c(cmpwi)
>  #define PPC_LCMPLI	stringify_in_c(cmplwi)
>  #define PPC_LCMP	stringify_in_c(cmpw)
> diff --git a/arch/powerpc/include/asm/string.h b/arch/powerpc/include/asm/string.h
> index 9b8cedf618f4..8fdcb532de72 100644
> --- a/arch/powerpc/include/asm/string.h
> +++ b/arch/powerpc/include/asm/string.h
> @@ -13,6 +13,7 @@
>  #define __HAVE_ARCH_MEMCHR
>  #define __HAVE_ARCH_MEMSET16
>  #define __HAVE_ARCH_MEMCPY_FLUSHCACHE
> +#define __HAVE_ARCH_STRLEN
>  
>  extern char * strcpy(char *,const char *);
>  extern char * strncpy(char *,const char *, __kernel_size_t);
> diff --git a/arch/powerpc/lib/string.S b/arch/powerpc/lib/string.S
> index 4b41970e9ed8..238f61e2024f 100644
> --- a/arch/powerpc/lib/string.S
> +++ b/arch/powerpc/lib/string.S
> @@ -67,3 +67,60 @@ _GLOBAL(memchr)
>  2:	li	r3,0
>  	blr
>  EXPORT_SYMBOL(memchr)
> +
> +_GLOBAL(strlen)
> +	andi.   r9, r3, (SZL - 1)
> +	addi	r10, r3, -SZL
> +	bne-	1f
> +2:	lis	r6, 0x8080
> +	ori	r6, r6, 0x8080		/* r6 = 0x80808080 (himagic) */
> +#ifdef CONFIG_PPC64
> +	rldimi	r6, r6, 32, 0		/* r6 = 0x8080808080808080 (himagic) */
> +#endif
> +	PPC_ROTLI  r7, r6, 1 		/* r7 = 0x01010101(01010101) (lomagic)*/
> +	.balign IFETCH_ALIGN_BYTES
> +3:	PPC_LLU	r9, SZL(r10)
> +	/* ((x - lomagic) & ~x & himagic) == 0 means no byte in x is NUL */
> +	subf	r8, r7, r9
> +	and.	r8, r8, r6
> +	beq+	3b
> +	andc.	r8, r8, r9
> +	beq+	3b
> +#ifdef CONFIG_PPC64
> +	rldicl.	r8, r9, 8, 56
> +	beq	20f
> +	rldicl.	r8, r9, 16, 56
> +	beq	21f
> +	rldicl.	r8, r9, 24, 56
> +	beq	22f
> +	rldicl.	r8, r9, 32, 56
> +	beq	23f
> +	addi	r10, r10, 4
> +#endif
> +	rlwinm.	r8, r9, 0, 0xff000000
> +	beq	20f
> +	rlwinm.	r8, r9, 0, 0x00ff0000
> +	beq	21f
> +	rlwinm.	r8, r9, 0, 0x0000ff00
> +	beq	22f
> +23:	subf	r3, r3, r10

Actually these rlwinm. can likely be replaced by a single
cntlzw/cntlzd; for 32-bit, something like:

	cntlzw  r8,r9
	subf    r3,r3,r10	
	srwi	r8,r8,3
	add	r3,r3,r8
	blr

and similar for 64 bit but with cntlzd.

	Gabriel


> +	addi	r3, r3, 3
> +	blr
> +22:	subf	r3, r3, r10
> +	addi	r3, r3, 2
> +	blr
> +21:	subf	r3, r3, r10
> +	addi	r3, r3, 1
> +	blr
> +19:	addi	r10, r10, (SZL - 1)
> +20:	subf	r3, r3, r10
> +	blr
> +
> +1:	lbz	r9, SZL(r10)
> +	addi	r10, r10, 1
> +	cmpwi	cr1, r9, 0
> +	andi.	r9, r10, (SZL - 1)
> +	beq	cr1, 19b
> +	bne	1b
> +	b	2b
> +EXPORT_SYMBOL(strlen)
> -- 
> 2.13.3
> 


* Re: [PATCH v4 3/4] powerpc/lib: implement strlen() in assembly
  2018-06-08 11:45   ` Gabriel Paubert
@ 2018-06-08 12:05     ` Segher Boessenkool
  0 siblings, 0 replies; 7+ messages in thread
From: Segher Boessenkool @ 2018-06-08 12:05 UTC (permalink / raw)
  To: Gabriel Paubert
  Cc: Christophe Leroy, wei.guo.simon, linux-kernel, Paul Mackerras,
	linuxppc-dev

On Fri, Jun 08, 2018 at 01:45:13PM +0200, Gabriel Paubert wrote:
> On Fri, Jun 08, 2018 at 10:20:41AM +0000, Christophe Leroy wrote:
> > +	rlwinm.	r8, r9, 0, 0xff000000
> > +	beq	20f
> > +	rlwinm.	r8, r9, 0, 0x00ff0000
> > +	beq	21f
> > +	rlwinm.	r8, r9, 0, 0x0000ff00
> > +	beq	22f
> > +23:	subf	r3, r3, r10
> 
> Actually these rlwinm. can likely be replaced by a single
> cntlzw /cntlzd; for 32 bit something like:
> 
> 	cntlzw  r8,r9
> 	subf    r3,r3,r10	
> 	srwi	r8,r8,3
> 	add	r3,r3,r8
> 	blr

The code is finding the first zero byte in the word, not how many leading
zero bytes there are.


Segher
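
The distinction can be illustrated with a small C sketch. It is not
part of the thread: it assumes 32-bit words viewed in big-endian order
(byte 0 is the most significant byte) and uses the GCC/Clang builtin
__builtin_clz() as a stand-in for cntlzw. The function names are
hypothetical. Counting leading zero bits of the data word gives the
wrong byte index, whereas counting them on a mask that carries 0x80
exactly in the NUL bytes gives the right one. The mask below is one
exact formulation chosen for the sketch; the cheaper expression used in
the patch's loop can also flag a 0x01 byte that sits just before a NUL,
so it would not be a safe input for this count.

```c
#include <stdint.h>

/*
 * Byte index obtained by counting leading zero bits of the data word
 * itself. This is NOT the index of the first NUL byte.
 */
static unsigned int index_from_data(uint32_t x)
{
	return x ? (unsigned int)__builtin_clz(x) >> 3 : 4;
}

/* 0x80 in each byte of the result if and only if that byte of x is 0x00. */
static uint32_t zero_byte_mask(uint32_t x)
{
	return ~(((x & 0x7f7f7f7fu) + 0x7f7f7f7fu) | x | 0x7f7f7f7fu);
}

/* Index of the first NUL byte (0 = most significant), or 4 if none. */
static unsigned int index_from_mask(uint32_t x)
{
	uint32_t m = zero_byte_mask(x);

	return m ? (unsigned int)__builtin_clz(m) >> 3 : 4;
}
```

With x = 0x41420043 (the bytes 'A', 'B', NUL, 'C'), index_from_data()
yields 0 while index_from_mask() yields 2, the actual position of the
terminator.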

