* [PATCH v6 1/4] selftests/powerpc: add test for 32 bits memcmp
@ 2018-06-12 9:14 Christophe Leroy
2018-06-12 9:14 ` [PATCH v6 2/4] selftests/powerpc: Add test for strlen() Christophe Leroy
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Christophe Leroy @ 2018-06-12 9:14 UTC
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
wei.guo.simon, segher
Cc: linux-kernel, linuxppc-dev
This patch renames the memcmp test to memcmp_64 and adds
a memcmp_32 test for the 32-bit version of memcmp().
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
v6: no change
v5: no change
v4: new
tools/testing/selftests/powerpc/stringloops/Makefile | 14 +++++++++++---
tools/testing/selftests/powerpc/stringloops/memcmp_32.S | 1 +
2 files changed, 12 insertions(+), 3 deletions(-)
create mode 120000 tools/testing/selftests/powerpc/stringloops/memcmp_32.S
diff --git a/tools/testing/selftests/powerpc/stringloops/Makefile b/tools/testing/selftests/powerpc/stringloops/Makefile
index 1125e489055e..1e7301d4bac9 100644
--- a/tools/testing/selftests/powerpc/stringloops/Makefile
+++ b/tools/testing/selftests/powerpc/stringloops/Makefile
@@ -1,10 +1,18 @@
# SPDX-License-Identifier: GPL-2.0
# The loops are all 64-bit code
-CFLAGS += -m64
CFLAGS += -I$(CURDIR)
-TEST_GEN_PROGS := memcmp
-EXTRA_SOURCES := memcmp_64.S ../harness.c
+EXTRA_SOURCES := ../harness.c
+
+$(OUTPUT)/memcmp_64: memcmp.c
+$(OUTPUT)/memcmp_64: CFLAGS += -m64
+
+$(OUTPUT)/memcmp_32: memcmp.c
+$(OUTPUT)/memcmp_32: CFLAGS += -m32
+
+ASFLAGS = $(CFLAGS)
+
+TEST_GEN_PROGS := memcmp_32 memcmp_64
include ../../lib.mk
diff --git a/tools/testing/selftests/powerpc/stringloops/memcmp_32.S b/tools/testing/selftests/powerpc/stringloops/memcmp_32.S
new file mode 120000
index 000000000000..056f2b3af789
--- /dev/null
+++ b/tools/testing/selftests/powerpc/stringloops/memcmp_32.S
@@ -0,0 +1 @@
+../../../../../arch/powerpc/lib/memcmp_32.S
\ No newline at end of file
--
2.13.3
* [PATCH v6 2/4] selftests/powerpc: Add test for strlen()
2018-06-12 9:14 [PATCH v6 1/4] selftests/powerpc: add test for 32 bits memcmp Christophe Leroy
@ 2018-06-12 9:14 ` Christophe Leroy
2018-06-12 9:14 ` [PATCH v6 3/4] powerpc/lib: implement strlen() in assembly Christophe Leroy
2018-06-12 9:14 ` [PATCH v6 4/4] selftests/powerpc: update strlen() test to test the new assembly function Christophe Leroy
2 siblings, 0 replies; 8+ messages in thread
From: Christophe Leroy @ 2018-06-12 9:14 UTC
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
wei.guo.simon, segher
Cc: linux-kernel, linuxppc-dev
This patch adds a test for strlen().
string.c contains a copy of strlen() from lib/string.c.
The test first checks the correctness of strlen() by comparing
its result with libc strlen(). It covers all cases of alignment.
It then measures the duration of an aligned strlen() on a 4-byte string,
on a 16-byte string and on a 256-byte string.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
v6: refactored the benchmark test
v5: no change
v4: new
.../testing/selftests/powerpc/stringloops/Makefile | 5 +-
.../testing/selftests/powerpc/stringloops/string.c | 36 ++++++
.../testing/selftests/powerpc/stringloops/strlen.c | 127 +++++++++++++++++++++
3 files changed, 167 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/powerpc/stringloops/string.c
create mode 100644 tools/testing/selftests/powerpc/stringloops/strlen.c
diff --git a/tools/testing/selftests/powerpc/stringloops/Makefile b/tools/testing/selftests/powerpc/stringloops/Makefile
index 1e7301d4bac9..df663ee9ddb3 100644
--- a/tools/testing/selftests/powerpc/stringloops/Makefile
+++ b/tools/testing/selftests/powerpc/stringloops/Makefile
@@ -10,9 +10,12 @@ $(OUTPUT)/memcmp_64: CFLAGS += -m64
$(OUTPUT)/memcmp_32: memcmp.c
$(OUTPUT)/memcmp_32: CFLAGS += -m32
+$(OUTPUT)/strlen: strlen.c string.o
+$(OUTPUT)/string.o: string.c
+
ASFLAGS = $(CFLAGS)
-TEST_GEN_PROGS := memcmp_32 memcmp_64
+TEST_GEN_PROGS := memcmp_32 memcmp_64 strlen
include ../../lib.mk
diff --git a/tools/testing/selftests/powerpc/stringloops/string.c b/tools/testing/selftests/powerpc/stringloops/string.c
new file mode 100644
index 000000000000..d05200481017
--- /dev/null
+++ b/tools/testing/selftests/powerpc/stringloops/string.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * linux/lib/string.c
+ *
+ * Copyright (C) 1991, 1992 Linus Torvalds
+ */
+
+/*
+ * stupid library routines.. The optimized versions should generally be found
+ * as inline code in <asm-xx/string.h>
+ *
+ * These are buggy as well..
+ *
+ * * Fri Jun 25 1999, Ingo Oeser <ioe@informatik.tu-chemnitz.de>
+ * - Added strsep() which will replace strtok() soon (because strsep() is
+ * reentrant and should be faster). Use only strsep() in new code, please.
+ *
+ * * Sat Feb 09 2002, Jason Thomas <jason@topic.com.au>,
+ * Matthew Hawkins <matt@mh.dropbear.id.au>
+ * - Kissed strtok() goodbye
+ */
+
+#include <stddef.h>
+
+/**
+ * strlen - Find the length of a string
+ * @s: The string to be sized
+ */
+size_t test_strlen(const char *s)
+{
+ const char *sc;
+
+ for (sc = s; *sc != '\0'; ++sc)
+ /* nothing */;
+ return sc - s;
+}
diff --git a/tools/testing/selftests/powerpc/stringloops/strlen.c b/tools/testing/selftests/powerpc/stringloops/strlen.c
new file mode 100644
index 000000000000..9055ebc484d0
--- /dev/null
+++ b/tools/testing/selftests/powerpc/stringloops/strlen.c
@@ -0,0 +1,127 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <malloc.h>
+#include <stdlib.h>
+#include <string.h>
+#include <time.h>
+#include "utils.h"
+
+#define SIZE 256
+#define ITERATIONS 1000
+#define ITERATIONS_BENCH 100000
+
+int test_strlen(const void *s);
+
+/* test all offsets and lengths */
+static void test_one(char *s)
+{
+ unsigned long offset;
+
+ for (offset = 0; offset < SIZE; offset++) {
+ int x, y;
+ unsigned long i;
+
+ y = strlen(s + offset);
+ x = test_strlen(s + offset);
+
+ if (x != y) {
+ printf("strlen() returned %d, should have returned %d (%p offset %ld)\n", x, y, s, offset);
+
+ for (i = offset; i < SIZE; i++)
+ printf("%02x ", s[i]);
+ printf("\n");
+ }
+ }
+}
+
+static void bench_test(char *s)
+{
+ struct timespec ts_start, ts_end;
+ int i;
+
+ clock_gettime(CLOCK_MONOTONIC, &ts_start);
+
+ for (i = 0; i < ITERATIONS_BENCH; i++)
+ test_strlen(s);
+
+ clock_gettime(CLOCK_MONOTONIC, &ts_end);
+
+ printf("len %3.3d : time = %.6f\n", test_strlen(s), ts_end.tv_sec - ts_start.tv_sec + (ts_end.tv_nsec - ts_start.tv_nsec) / 1e9);
+}
+
+static int testcase(void)
+{
+ char *s;
+ unsigned long i;
+
+ s = memalign(128, SIZE);
+ if (!s) {
+ perror("memalign");
+ exit(1);
+ }
+
+ srandom(1);
+
+ memset(s, 0, SIZE);
+ for (i = 0; i < SIZE; i++) {
+ char c;
+
+ do {
+ c = random() & 0x7f;
+ } while (!c);
+ s[i] = c;
+ test_one(s);
+ }
+
+ for (i = 0; i < ITERATIONS; i++) {
+ unsigned long j;
+
+ for (j = 0; j < SIZE; j++) {
+ char c;
+
+ do {
+ c = random() & 0x7f;
+ } while (!c);
+ s[j] = c;
+ }
+ for (j = 0; j < sizeof(long); j++) {
+ s[SIZE - 1 - j] = 0;
+ test_one(s);
+ }
+ }
+
+ for (i = 0; i < SIZE; i++) {
+ char c;
+
+ do {
+ c = random() & 0x7f;
+ } while (!c);
+ s[i] = c;
+ }
+
+ bench_test(s);
+
+ s[16] = 0;
+ bench_test(s);
+
+ s[8] = 0;
+ bench_test(s);
+
+ s[4] = 0;
+ bench_test(s);
+
+ s[3] = 0;
+ bench_test(s);
+
+ s[2] = 0;
+ bench_test(s);
+
+ s[1] = 0;
+ bench_test(s);
+
+ return 0;
+}
+
+int main(void)
+{
+ return test_harness(testcase, "strlen");
+}
--
2.13.3
* [PATCH v6 3/4] powerpc/lib: implement strlen() in assembly
2018-06-12 9:14 [PATCH v6 1/4] selftests/powerpc: add test for 32 bits memcmp Christophe Leroy
2018-06-12 9:14 ` [PATCH v6 2/4] selftests/powerpc: Add test for strlen() Christophe Leroy
@ 2018-06-12 9:14 ` Christophe Leroy
2018-06-12 14:53 ` Segher Boessenkool
2018-06-12 9:14 ` [PATCH v6 4/4] selftests/powerpc: update strlen() test to test the new assembly function Christophe Leroy
2 siblings, 1 reply; 8+ messages in thread
From: Christophe Leroy @ 2018-06-12 9:14 UTC
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
wei.guo.simon, segher
Cc: linux-kernel, linuxppc-dev
The generic implementation of strlen() reads strings byte by byte.
This patch implements strlen() in assembly, reading entire words at a
time, in the same spirit as what some other arches and glibc do.
On an 8xx, the time spent in strlen() is reduced by 3/4 for long strings.
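The word-at-a-time idea can be sketched in portable C. This is only an illustration under stated assumptions, not the code from the patch: it uses fixed 32-bit words, sketch_strlen() and has_zero_byte() are hypothetical names, and the final byte is located with a plain byte loop instead of the branchless cntlz-based conclusion used in the assembly. LOMAGIC and HIMAGIC mirror the lomagic/himagic constants named in the assembly comments.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOMAGIC 0x01010101u
#define HIMAGIC 0x80808080u

/* Nonzero if and only if at least one byte of x is 0x00. */
static uint32_t has_zero_byte(uint32_t x)
{
	return (x - LOMAGIC) & ~x & HIMAGIC;
}

/* Hypothetical helper for illustration; not the kernel's strlen(). */
static size_t sketch_strlen(const char *s)
{
	const char *p = s;
	uint32_t w;

	/* Byte loop until p is word aligned, so a word read never
	 * straddles a page boundary. */
	while ((uintptr_t)p % sizeof(w)) {
		if (*p == '\0')
			return (size_t)(p - s);
		p++;
	}

	/* Word loop: skip every aligned word that holds no NUL byte.
	 * Like the assembly, this may read a few bytes past the NUL,
	 * but never outside the aligned word that contains it. */
	for (;;) {
		memcpy(&w, p, sizeof(w));
		if (has_zero_byte(w))
			break;
		p += sizeof(w);
	}

	/* Locate the NUL inside the last word, byte by byte. */
	while (*p != '\0')
		p++;
	return (size_t)(p - s);
}
```

The any-zero-byte test does not depend on byte order; endianness only matters when working out which byte of the word was zero, which is why the assembly has separate big-endian and little-endian conclusions.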
strlen() selftest on an 8xx provides the following values:
Before the patch (i.e. with the generic strlen() in lib/string.c):
len 256 : time = 1.195055
len 016 : time = 0.083745
len 008 : time = 0.046828
len 004 : time = 0.028390
After the patch:
len 256 : time = 0.272185 ==> 78% improvement
len 016 : time = 0.040632 ==> 51% improvement
len 008 : time = 0.033060 ==> 29% improvement
len 004 : time = 0.029149 ==> 2% degradation
On a 832x:
Before the patch:
len 256 : time = 0.236125
len 016 : time = 0.018136
len 008 : time = 0.011000
len 004 : time = 0.007229
After the patch:
len 256 : time = 0.094950 ==> 60% improvement
len 016 : time = 0.013357 ==> 26% improvement
len 008 : time = 0.010586 ==> 4% improvement
len 004 : time = 0.008784
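The percentages in these tables can be rechecked from the raw timings. The helper below is a minimal sketch; improvement_pct() is a hypothetical name that does not appear in the patch.

```c
/*
 * Percentage of time saved when a run goes from `before` seconds to
 * `after` seconds. A negative result means the new code is slower.
 */
static double improvement_pct(double before, double after)
{
	return 100.0 * (before - after) / before;
}
```

For example, improvement_pct(0.083745, 0.040632) gives the figure for the 16-byte string on the 8xx, and a row where the time went up yields a negative value, i.e. a degradation.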
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
Not tested on PPC64.
Changes in v6:
- Reworked to have a branchless conclusion
Changes in v5:
- Fixed for PPC64 LITTLE ENDIAN
Changes in v4:
- Added alignment of the loop
- Doing the andc only if the result is still not 0, as that happens only for bytes above 0x7f, which are pretty rare in a string
Changes in v3:
- Made it common to PPC32 and PPC64
Changes in v2:
- Moved handling of unaligned strings outside of the main path as it is very unlikely.
- Removed the verification of the fourth byte in case none of the three first ones are NUL.
arch/powerpc/include/asm/asm-compat.h | 6 +++
arch/powerpc/include/asm/string.h | 1 +
arch/powerpc/lib/string.S | 81 +++++++++++++++++++++++++++++++++++
3 files changed, 88 insertions(+)
diff --git a/arch/powerpc/include/asm/asm-compat.h b/arch/powerpc/include/asm/asm-compat.h
index 7f2a7702596c..fe2b459c8486 100644
--- a/arch/powerpc/include/asm/asm-compat.h
+++ b/arch/powerpc/include/asm/asm-compat.h
@@ -20,8 +20,11 @@
/* operations for longs and pointers */
#define PPC_LL stringify_in_c(ld)
+#define PPC_LLU stringify_in_c(ldu)
#define PPC_STL stringify_in_c(std)
#define PPC_STLU stringify_in_c(stdu)
+#define PPC_ROTLI stringify_in_c(rotldi)
+#define PPC_SRLI stringify_in_c(srdi)
#define PPC_LCMPI stringify_in_c(cmpdi)
#define PPC_LCMPLI stringify_in_c(cmpldi)
#define PPC_LCMP stringify_in_c(cmpd)
@@ -53,8 +56,11 @@
/* operations for longs and pointers */
#define PPC_LL stringify_in_c(lwz)
+#define PPC_LLU stringify_in_c(lwzu)
#define PPC_STL stringify_in_c(stw)
#define PPC_STLU stringify_in_c(stwu)
+#define PPC_ROTLI stringify_in_c(rotlwi)
+#define PPC_SRLI stringify_in_c(srwi)
#define PPC_LCMPI stringify_in_c(cmpwi)
#define PPC_LCMPLI stringify_in_c(cmplwi)
#define PPC_LCMP stringify_in_c(cmpw)
diff --git a/arch/powerpc/include/asm/string.h b/arch/powerpc/include/asm/string.h
index 9b8cedf618f4..8fdcb532de72 100644
--- a/arch/powerpc/include/asm/string.h
+++ b/arch/powerpc/include/asm/string.h
@@ -13,6 +13,7 @@
#define __HAVE_ARCH_MEMCHR
#define __HAVE_ARCH_MEMSET16
#define __HAVE_ARCH_MEMCPY_FLUSHCACHE
+#define __HAVE_ARCH_STRLEN
extern char * strcpy(char *,const char *);
extern char * strncpy(char *,const char *, __kernel_size_t);
diff --git a/arch/powerpc/lib/string.S b/arch/powerpc/lib/string.S
index 4b41970e9ed8..1d0593cba9d4 100644
--- a/arch/powerpc/lib/string.S
+++ b/arch/powerpc/lib/string.S
@@ -67,3 +67,84 @@ _GLOBAL(memchr)
2: li r3,0
blr
EXPORT_SYMBOL(memchr)
+
+/*
+ * Algorigthm:
+ *
+ * 1) Given a word 'x', we can test to see if it contains any 0 bytes
+ * by subtracting 0x01010101, and seeing if any of the high bits of each
+ * byte changed from 0 to 1. This works because the least significant
+ * 0 byte must have had no incoming carry (otherwise it's not the least
+ * significant), so it is 0x00 - 0x01 == 0xff. For all other
+ * byte values, either they have the high bit set initially, or when
+ * 1 is subtracted you get a value in the range 0x00-0x7f, none of which
+ * have their high bit set. The expression here is
+ * ((x - 0x01010101) & ~x & 0x80808080), which gives 0x00000000 when
+ * there were no 0x00 bytes in the word. You get 0x80 in bytes that
+ * match, but possibly false 0x80 matches in the next more significant
+ * byte to a true match due to carries. For little-endian this is
+ * of no consequence since the least significant match is the one
+ * we're interested in, but big-endian needs method 2 to find which
+ * byte matches.
+ * 2) Given a word 'x', we can test to see _which_ byte was zero by
+ * calculating ~(((x & ~0x80808080) - 0x80808080 - 1) | x | ~0x80808080).
+ * This produces 0x80 in each byte that was zero, and 0x00 in all
+ * the other bytes. The '| ~0x80808080' clears the low 7 bits in each
+ * byte, and the '| x' part ensures that bytes with the high bit set
+ * produce 0x00. The addition will carry into the high bit of each byte
+ * iff that byte had one of its low 7 bits set. We can then just see
+ * which was the most significant bit set and divide by 8 to find how
+ * many to add to the index.
+ * This is from the book 'The PowerPC Compiler Writer's Guide',
+ * by Steve Hoxey, Faraydon Karim, Bill Hay and Hank Warren.
+ */
+
+_GLOBAL(strlen)
+ andi. r9, r3, (SZL - 1)
+ lis r7, 0x0101
+ addi r10, r3, -SZL
+ addic r7, r7, 0x0101 /* r7 = 0x01010101 (lomagic) & clr CA */
+#ifdef CONFIG_PPC64
+ rldimi r7, r7, 32, 0 /* r7 = 0x0101010101010101 (lomagic) */
+#endif
+ bne- 1f
+2: PPC_ROTLI r6, r7, 31 /* r6 = 0x80808080(80808080) (himagic)*/
+ .balign IFETCH_ALIGN_BYTES
+3: PPC_LLU r9, SZL(r10)
+ /* ((x - lomagic) & ~x & himagic) == 0 means no byte in x is NUL */
+ subf r8, r7, r9
+ and. r8, r8, r6
+ beq+ 3b
+ andc. r8, r8, r9
+ beq+ 3b
+#ifdef CONFIG_CPU_BIG_ENDIAN
+ andc r8, r9, r6
+ orc r9, r9, r6
+ subfe r8, r6, r8
+ nor r8, r8, r9
+ PPC_CNTLZL r8, r8
+ subf r3, r3, r10
+ PPC_SRLI r8, r8, 3
+ add r3, r3, r8
+#else
+ addi r9, r8, -1
+ addi r10, r10, (SZL - 1)
+ andc r8, r9, r8
+ PPC_CNTLZL r8, r8
+ subf r3, r3, r10
+ PPC_SRLI r8, r8, 3
+ subf r3, r8, r3
+#endif
+ blr
+
+1: lbz r9, SZL(r10)
+ addi r10, r10, 1
+ cmpwi cr1, r9, 0
+ andi. r9, r10, (SZL - 1)
+ beq cr1, 4f
+ bne 1b
+ b 2b
+4: addi r10, r10, (SZL - 1)
+ subf r3, r3, r10
+ blr
+EXPORT_SYMBOL(strlen)
--
2.13.3
* [PATCH v6 4/4] selftests/powerpc: update strlen() test to test the new assembly function
2018-06-12 9:14 [PATCH v6 1/4] selftests/powerpc: add test for 32 bits memcmp Christophe Leroy
2018-06-12 9:14 ` [PATCH v6 2/4] selftests/powerpc: Add test for strlen() Christophe Leroy
2018-06-12 9:14 ` [PATCH v6 3/4] powerpc/lib: implement strlen() in assembly Christophe Leroy
@ 2018-06-12 9:14 ` Christophe Leroy
2 siblings, 0 replies; 8+ messages in thread
From: Christophe Leroy @ 2018-06-12 9:14 UTC
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
wei.guo.simon, segher
Cc: linux-kernel, linuxppc-dev
This patch modifies the test so that it exercises the new assembly
strlen() instead of the generic strlen().
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
v6: added additional necessary defines in ppc_asm.h
v5: no change
v4: new
.../testing/selftests/powerpc/stringloops/Makefile | 3 +--
.../selftests/powerpc/stringloops/asm/cache.h | 1 +
.../selftests/powerpc/stringloops/asm/ppc_asm.h | 30 ++++++++++++++++++++++
.../testing/selftests/powerpc/stringloops/string.S | 1 +
4 files changed, 33 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/powerpc/stringloops/asm/cache.h
create mode 120000 tools/testing/selftests/powerpc/stringloops/string.S
diff --git a/tools/testing/selftests/powerpc/stringloops/Makefile b/tools/testing/selftests/powerpc/stringloops/Makefile
index df663ee9ddb3..0c088a6d0369 100644
--- a/tools/testing/selftests/powerpc/stringloops/Makefile
+++ b/tools/testing/selftests/powerpc/stringloops/Makefile
@@ -10,8 +10,7 @@ $(OUTPUT)/memcmp_64: CFLAGS += -m64
$(OUTPUT)/memcmp_32: memcmp.c
$(OUTPUT)/memcmp_32: CFLAGS += -m32
-$(OUTPUT)/strlen: strlen.c string.o
-$(OUTPUT)/string.o: string.c
+$(OUTPUT)/strlen: strlen.c string.S
ASFLAGS = $(CFLAGS)
diff --git a/tools/testing/selftests/powerpc/stringloops/asm/cache.h b/tools/testing/selftests/powerpc/stringloops/asm/cache.h
new file mode 100644
index 000000000000..8a2840831122
--- /dev/null
+++ b/tools/testing/selftests/powerpc/stringloops/asm/cache.h
@@ -0,0 +1 @@
+#define IFETCH_ALIGN_BYTES 4
diff --git a/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h b/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h
index 136242ec4b0e..5226bd8bc39f 100644
--- a/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h
+++ b/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h
@@ -1,4 +1,18 @@
/* SPDX-License-Identifier: GPL-2.0 */
+#if !defined(CONFIG_PPC64) && !defined(CONFIG_PPC32)
+#ifdef __powerpc64__
+#define CONFIG_PPC64
+#else
+#define CONFIG_PPC32
+#endif
+#endif
+
+#ifdef __LITTLE_ENDIAN__
+#define CONFIG_CPU_LITTLE_ENDIAN
+#else
+#define CONFIG_CPU_BIG_ENDIAN
+#endif
+
#include <ppc-asm.h>
#ifndef r1
@@ -6,3 +20,19 @@
#endif
#define _GLOBAL(A) FUNC_START(test_ ## A)
+
+#ifdef __powerpc64__
+#define SZL 8
+#define PPC_LLU ldu
+#define PPC_LCMPI cmpldi
+#define PPC_ROTLI rotldi
+#define PPC_CNTLZL cntlzd
+#define PPC_SRLI srdi
+#else
+#define SZL 4
+#define PPC_LLU lwzu
+#define PPC_LCMPI cmplwi
+#define PPC_ROTLI rotlwi
+#define PPC_CNTLZL cntlzw
+#define PPC_SRLI srwi
+#endif
diff --git a/tools/testing/selftests/powerpc/stringloops/string.S b/tools/testing/selftests/powerpc/stringloops/string.S
new file mode 120000
index 000000000000..9f5babec7d21
--- /dev/null
+++ b/tools/testing/selftests/powerpc/stringloops/string.S
@@ -0,0 +1 @@
+../../../../../arch/powerpc/lib/string.S
\ No newline at end of file
--
2.13.3
* Re: [PATCH v6 3/4] powerpc/lib: implement strlen() in assembly
2018-06-12 9:14 ` [PATCH v6 3/4] powerpc/lib: implement strlen() in assembly Christophe Leroy
@ 2018-06-12 14:53 ` Segher Boessenkool
2018-06-12 17:01 ` Christophe LEROY
0 siblings, 1 reply; 8+ messages in thread
From: Segher Boessenkool @ 2018-06-12 14:53 UTC
To: Christophe Leroy
Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
wei.guo.simon, linux-kernel, linuxppc-dev
On Tue, Jun 12, 2018 at 09:14:53AM +0000, Christophe Leroy wrote:
> ---
> Not tested on PPC64.
It won't be acceptable until that happens. It also is likely quite bad
performance on all 64-bit CPUs from the last fifteen years or so. Or you
did nothing to prove otherwise, at least.
> + * Algorigthm:
Typo.
Segher
* Re: [PATCH v6 3/4] powerpc/lib: implement strlen() in assembly
2018-06-12 14:53 ` Segher Boessenkool
@ 2018-06-12 17:01 ` Christophe LEROY
2018-06-13 3:58 ` Michael Ellerman
2018-06-14 21:51 ` Segher Boessenkool
0 siblings, 2 replies; 8+ messages in thread
From: Christophe LEROY @ 2018-06-12 17:01 UTC
To: Segher Boessenkool
Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
wei.guo.simon, linux-kernel, linuxppc-dev
On 12/06/2018 at 16:53, Segher Boessenkool wrote:
> On Tue, Jun 12, 2018 at 09:14:53AM +0000, Christophe Leroy wrote:
>> ---
>> Not tested on PPC64.
>
> It won't be acceptable until that happens. It also is likely quite bad
> performance on all 64-bit CPUs from the last fifteen years or so. Or you
> did nothing to prove otherwise, at least.
Will it be as bad as the generic implementation which does it byte per
byte ?
I don't have any 64 bits target, can someone test it using the test app
I have added in selftests ?
Or should I just leave it as is for 64 bits and just do the
implementation for 32 bits until someone wants to try and do it for PPC64 ?
Christophe
>
>> + * Algorigthm:
>
> Typo.
>
>
> Segher
>
* Re: [PATCH v6 3/4] powerpc/lib: implement strlen() in assembly
2018-06-12 17:01 ` Christophe LEROY
@ 2018-06-13 3:58 ` Michael Ellerman
2018-06-14 21:51 ` Segher Boessenkool
1 sibling, 0 replies; 8+ messages in thread
From: Michael Ellerman @ 2018-06-13 3:58 UTC
To: Christophe LEROY, Segher Boessenkool
Cc: Benjamin Herrenschmidt, Paul Mackerras, wei.guo.simon,
linux-kernel, linuxppc-dev
Christophe LEROY <christophe.leroy@c-s.fr> writes:
> On 12/06/2018 at 16:53, Segher Boessenkool wrote:
>> On Tue, Jun 12, 2018 at 09:14:53AM +0000, Christophe Leroy wrote:
>>> ---
>>> Not tested on PPC64.
>>
>> It won't be acceptable until that happens. It also is likely quite bad
>> performance on all 64-bit CPUs from the last fifteen years or so. Or you
>> did nothing to prove otherwise, at least.
>
> Will it be as bad as the generic implementation which does it byte per
> byte ?
>
> I don't have any 64 bits target, can someone test it using the test app
> I have added in selftests ?
I /can/ but I won't have time this week.
> Or should I just leave it as is for 64 bits and just do the
> implementation for 32 bits until someone wants to try and do it for PPC64 ?
That's probably best yeah.
cheers
* Re: [PATCH v6 3/4] powerpc/lib: implement strlen() in assembly
2018-06-12 17:01 ` Christophe LEROY
2018-06-13 3:58 ` Michael Ellerman
@ 2018-06-14 21:51 ` Segher Boessenkool
1 sibling, 0 replies; 8+ messages in thread
From: Segher Boessenkool @ 2018-06-14 21:51 UTC
To: Christophe LEROY
Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
wei.guo.simon, linux-kernel, linuxppc-dev
On Tue, Jun 12, 2018 at 07:01:59PM +0200, Christophe LEROY wrote:
>
>
> On 12/06/2018 at 16:53, Segher Boessenkool wrote:
> >On Tue, Jun 12, 2018 at 09:14:53AM +0000, Christophe Leroy wrote:
> >>---
> >>Not tested on PPC64.
> >
> >It won't be acceptable until that happens. It also is likely quite bad
> >performance on all 64-bit CPUs from the last fifteen years or so. Or you
> >did nothing to prove otherwise, at least.
>
> Will it be as bad as the generic implementation which does it byte per
> byte ?
Probably not. But how is it for short inputs, etc.?
The main point is that it needs actual testing _for correctness_.
Btw, GCC 7 and later can expand many memcmp as builtins on PowerPC (just
like memset and memcpy etc.), creating better code, without function call.
Segher