* [PATCH v4 0/3] riscv: optimized mem* functions
@ 2021-09-19 19:21 Matteo Croce
2021-09-19 19:21 ` [PATCH v4 1/3] riscv: optimized memcpy Matteo Croce
` (4 more replies)
0 siblings, 5 replies; 10+ messages in thread
From: Matteo Croce @ 2021-09-19 19:21 UTC (permalink / raw)
To: linux-riscv
Cc: linux-kernel, linux-arch, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Atish Patra, Emil Renner Berthing, Akira Tsukamoto,
Drew Fustini, Bin Meng, David Laight, Guo Ren, Christoph Hellwig
From: Matteo Croce <mcroce@microsoft.com>
Replace the assembly mem{cpy,move,set} with C equivalents.
Try to access RAM with the largest bit width possible, but without
doing unaligned accesses.
A further improvement could be to use multiple reads and writes per
iteration, as the assembly version was trying to do.
Tested on a BeagleV Starlight with a SiFive U74 core, where the
improvement is noticeable.
v3 -> v4:
- incorporate changes from proposed generic version:
https://lore.kernel.org/lkml/20210617152754.17960-1-mcroce@linux.microsoft.com/
v2 -> v3:
- alias mem* to __mem* and not vice versa
- use __alias instead of a tail call
v1 -> v2:
- reduce the threshold from 64 to 16 bytes
- fix KASAN build
- optimize memset
Matteo Croce (3):
riscv: optimized memcpy
riscv: optimized memmove
riscv: optimized memset
arch/riscv/include/asm/string.h | 18 ++--
arch/riscv/kernel/Makefile | 1 -
arch/riscv/kernel/riscv_ksyms.c | 17 ----
arch/riscv/lib/Makefile | 4 +-
arch/riscv/lib/memcpy.S | 108 ----------------------
arch/riscv/lib/memmove.S | 64 -------------
arch/riscv/lib/memset.S | 113 -----------------------
arch/riscv/lib/string.c | 154 ++++++++++++++++++++++++++++++++
8 files changed, 164 insertions(+), 315 deletions(-)
delete mode 100644 arch/riscv/kernel/riscv_ksyms.c
delete mode 100644 arch/riscv/lib/memcpy.S
delete mode 100644 arch/riscv/lib/memmove.S
delete mode 100644 arch/riscv/lib/memset.S
create mode 100644 arch/riscv/lib/string.c
--
2.31.1
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v4 1/3] riscv: optimized memcpy
2021-09-19 19:21 [PATCH v4 0/3] riscv: optimized mem* functions Matteo Croce
@ 2021-09-19 19:21 ` Matteo Croce
2021-09-19 19:21 ` [PATCH v4 2/3] riscv: optimized memmove Matteo Croce
` (3 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Matteo Croce @ 2021-09-19 19:21 UTC (permalink / raw)
To: linux-riscv
Cc: linux-kernel, linux-arch, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Atish Patra, Emil Renner Berthing, Akira Tsukamoto,
Drew Fustini, Bin Meng, David Laight, Guo Ren, Christoph Hellwig
From: Matteo Croce <mcroce@microsoft.com>
Write a C version of memcpy() which uses the biggest data size allowed,
without generating unaligned accesses.
The procedure is made of three steps:
First copy data one byte at a time until the destination buffer is aligned
to a long boundary.
Then copy the data one long at a time, shifting the current and the next
long to compose an aligned long at every cycle.
Finally, copy the remainder one byte at a time.
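The three steps above can be sketched in user space as follows. This is only an illustration under stated assumptions, not the code from the patch: sketch_memcpy and word_t are hypothetical names, the shifts assume a little-endian machine (as RISC-V is), and the may_alias attribute is a GCC/Clang extension used here in place of the kernel's -fno-strict-aliasing. Like the patch, the misaligned path reads whole aligned words that may extend a few bytes outside the source range, which strict ISO C does not permit.

```c
#include <stddef.h>
#include <stdint.h>

/* Word type that is allowed to alias byte buffers (GCC/Clang extension). */
typedef unsigned long __attribute__((may_alias)) word_t;

/* Hypothetical user-space sketch; little-endian only. */
void *sketch_memcpy(void *dest, const void *src, size_t count)
{
	unsigned char *d = dest;
	const unsigned char *s = src;
	const size_t word = sizeof(word_t);

	if (count >= 2 * word) {
		/* Step 1: byte copy until the destination is word aligned. */
		while ((uintptr_t)d & (word - 1)) {
			*d++ = *s++;
			count--;
		}

		size_t distance = (uintptr_t)s & (word - 1);

		if (distance) {
			/*
			 * Step 2a: the source is misaligned by 'distance' bytes.
			 * Read aligned source words and merge two neighbours
			 * into each aligned destination word.
			 */
			const word_t *sw = (const word_t *)(s - distance);
			word_t last, next = sw[0];

			for (; count >= word; count -= word) {
				last = next;
				next = *++sw;
				*(word_t *)d = last >> (distance * 8) |
					       next << ((word - distance) * 8);
				d += word;
				s += word;
			}
		} else {
			/* Step 2b: both pointers are aligned, plain word copy. */
			for (; count >= word; count -= word) {
				*(word_t *)d = *(const word_t *)s;
				d += word;
				s += word;
			}
		}
	}

	/* Step 3: copy the remainder one byte at a time. */
	while (count--)
		*d++ = *s++;

	return dest;
}
```

Only the destination is ever aligned by byte copies. The source offset is absorbed by the shift pair, so every load and store in the main loop is a naturally aligned word access.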
On a BeagleV, the TCP RX throughput increased by 45%:
before:
$ iperf3 -c beaglev
Connecting to host beaglev, port 5201
[ 5] local 192.168.85.6 port 44840 connected to 192.168.85.48 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 76.4 MBytes 641 Mbits/sec 27 624 KBytes
[ 5] 1.00-2.00 sec 72.5 MBytes 608 Mbits/sec 0 708 KBytes
[ 5] 2.00-3.00 sec 73.8 MBytes 619 Mbits/sec 10 451 KBytes
[ 5] 3.00-4.00 sec 72.5 MBytes 608 Mbits/sec 0 564 KBytes
[ 5] 4.00-5.00 sec 73.8 MBytes 619 Mbits/sec 0 658 KBytes
[ 5] 5.00-6.00 sec 73.8 MBytes 619 Mbits/sec 14 522 KBytes
[ 5] 6.00-7.00 sec 73.8 MBytes 619 Mbits/sec 0 621 KBytes
[ 5] 7.00-8.00 sec 72.5 MBytes 608 Mbits/sec 0 706 KBytes
[ 5] 8.00-9.00 sec 73.8 MBytes 619 Mbits/sec 20 580 KBytes
[ 5] 9.00-10.00 sec 73.8 MBytes 619 Mbits/sec 0 672 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 736 MBytes 618 Mbits/sec 71 sender
[ 5] 0.00-10.01 sec 733 MBytes 615 Mbits/sec receiver
after:
$ iperf3 -c beaglev
Connecting to host beaglev, port 5201
[ 5] local 192.168.85.6 port 44864 connected to 192.168.85.48 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 109 MBytes 912 Mbits/sec 48 559 KBytes
[ 5] 1.00-2.00 sec 108 MBytes 902 Mbits/sec 0 690 KBytes
[ 5] 2.00-3.00 sec 106 MBytes 891 Mbits/sec 36 396 KBytes
[ 5] 3.00-4.00 sec 108 MBytes 902 Mbits/sec 0 567 KBytes
[ 5] 4.00-5.00 sec 106 MBytes 891 Mbits/sec 0 699 KBytes
[ 5] 5.00-6.00 sec 106 MBytes 891 Mbits/sec 32 414 KBytes
[ 5] 6.00-7.00 sec 106 MBytes 891 Mbits/sec 0 583 KBytes
[ 5] 7.00-8.00 sec 106 MBytes 891 Mbits/sec 0 708 KBytes
[ 5] 8.00-9.00 sec 106 MBytes 891 Mbits/sec 28 433 KBytes
[ 5] 9.00-10.00 sec 108 MBytes 902 Mbits/sec 0 591 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.04 GBytes 897 Mbits/sec 144 sender
[ 5] 0.00-10.01 sec 1.04 GBytes 894 Mbits/sec receiver
The reduced CPU time spent in memcpy() is also observable with perf top.
This is the `perf top -Ue task-clock` output when doing the test:
before:
Overhead Shared O Symbol
42.22% [kernel] [k] memcpy
35.00% [kernel] [k] __asm_copy_to_user
3.50% [kernel] [k] sifive_l2_flush64_range
2.30% [kernel] [k] stmmac_napi_poll_rx
1.11% [kernel] [k] memset
after:
Overhead Shared O Symbol
45.69% [kernel] [k] __asm_copy_to_user
29.06% [kernel] [k] memcpy
4.09% [kernel] [k] sifive_l2_flush64_range
2.77% [kernel] [k] stmmac_napi_poll_rx
1.24% [kernel] [k] memset
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Reported-by: kernel test robot <lkp@intel.com>
---
arch/riscv/include/asm/string.h | 8 ++-
arch/riscv/kernel/riscv_ksyms.c | 2 -
arch/riscv/lib/Makefile | 2 +-
arch/riscv/lib/memcpy.S | 108 --------------------------------
arch/riscv/lib/string.c | 90 ++++++++++++++++++++++++++
5 files changed, 97 insertions(+), 113 deletions(-)
delete mode 100644 arch/riscv/lib/memcpy.S
create mode 100644 arch/riscv/lib/string.c
diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
index 909049366555..6b5d6fc3eab4 100644
--- a/arch/riscv/include/asm/string.h
+++ b/arch/riscv/include/asm/string.h
@@ -12,9 +12,13 @@
#define __HAVE_ARCH_MEMSET
extern asmlinkage void *memset(void *, int, size_t);
extern asmlinkage void *__memset(void *, int, size_t);
+
+#ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE
#define __HAVE_ARCH_MEMCPY
-extern asmlinkage void *memcpy(void *, const void *, size_t);
-extern asmlinkage void *__memcpy(void *, const void *, size_t);
+extern void *memcpy(void *dest, const void *src, size_t count);
+extern void *__memcpy(void *dest, const void *src, size_t count);
+#endif
+
#define __HAVE_ARCH_MEMMOVE
extern asmlinkage void *memmove(void *, const void *, size_t);
extern asmlinkage void *__memmove(void *, const void *, size_t);
diff --git a/arch/riscv/kernel/riscv_ksyms.c b/arch/riscv/kernel/riscv_ksyms.c
index 5ab1c7e1a6ed..3f6d512a5b97 100644
--- a/arch/riscv/kernel/riscv_ksyms.c
+++ b/arch/riscv/kernel/riscv_ksyms.c
@@ -10,8 +10,6 @@
* Assembly functions that may be used (directly or indirectly) by modules
*/
EXPORT_SYMBOL(memset);
-EXPORT_SYMBOL(memcpy);
EXPORT_SYMBOL(memmove);
EXPORT_SYMBOL(__memset);
-EXPORT_SYMBOL(__memcpy);
EXPORT_SYMBOL(__memmove);
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 25d5c9664e57..2ffe85d4baee 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -1,9 +1,9 @@
# SPDX-License-Identifier: GPL-2.0-only
lib-y += delay.o
-lib-y += memcpy.o
lib-y += memset.o
lib-y += memmove.o
lib-$(CONFIG_MMU) += uaccess.o
lib-$(CONFIG_64BIT) += tishift.o
+lib-$(CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE) += string.o
obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
diff --git a/arch/riscv/lib/memcpy.S b/arch/riscv/lib/memcpy.S
deleted file mode 100644
index 51ab716253fa..000000000000
--- a/arch/riscv/lib/memcpy.S
+++ /dev/null
@@ -1,108 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (C) 2013 Regents of the University of California
- */
-
-#include <linux/linkage.h>
-#include <asm/asm.h>
-
-/* void *memcpy(void *, const void *, size_t) */
-ENTRY(__memcpy)
-WEAK(memcpy)
- move t6, a0 /* Preserve return value */
-
- /* Defer to byte-oriented copy for small sizes */
- sltiu a3, a2, 128
- bnez a3, 4f
- /* Use word-oriented copy only if low-order bits match */
- andi a3, t6, SZREG-1
- andi a4, a1, SZREG-1
- bne a3, a4, 4f
-
- beqz a3, 2f /* Skip if already aligned */
- /*
- * Round to nearest double word-aligned address
- * greater than or equal to start address
- */
- andi a3, a1, ~(SZREG-1)
- addi a3, a3, SZREG
- /* Handle initial misalignment */
- sub a4, a3, a1
-1:
- lb a5, 0(a1)
- addi a1, a1, 1
- sb a5, 0(t6)
- addi t6, t6, 1
- bltu a1, a3, 1b
- sub a2, a2, a4 /* Update count */
-
-2:
- andi a4, a2, ~((16*SZREG)-1)
- beqz a4, 4f
- add a3, a1, a4
-3:
- REG_L a4, 0(a1)
- REG_L a5, SZREG(a1)
- REG_L a6, 2*SZREG(a1)
- REG_L a7, 3*SZREG(a1)
- REG_L t0, 4*SZREG(a1)
- REG_L t1, 5*SZREG(a1)
- REG_L t2, 6*SZREG(a1)
- REG_L t3, 7*SZREG(a1)
- REG_L t4, 8*SZREG(a1)
- REG_L t5, 9*SZREG(a1)
- REG_S a4, 0(t6)
- REG_S a5, SZREG(t6)
- REG_S a6, 2*SZREG(t6)
- REG_S a7, 3*SZREG(t6)
- REG_S t0, 4*SZREG(t6)
- REG_S t1, 5*SZREG(t6)
- REG_S t2, 6*SZREG(t6)
- REG_S t3, 7*SZREG(t6)
- REG_S t4, 8*SZREG(t6)
- REG_S t5, 9*SZREG(t6)
- REG_L a4, 10*SZREG(a1)
- REG_L a5, 11*SZREG(a1)
- REG_L a6, 12*SZREG(a1)
- REG_L a7, 13*SZREG(a1)
- REG_L t0, 14*SZREG(a1)
- REG_L t1, 15*SZREG(a1)
- addi a1, a1, 16*SZREG
- REG_S a4, 10*SZREG(t6)
- REG_S a5, 11*SZREG(t6)
- REG_S a6, 12*SZREG(t6)
- REG_S a7, 13*SZREG(t6)
- REG_S t0, 14*SZREG(t6)
- REG_S t1, 15*SZREG(t6)
- addi t6, t6, 16*SZREG
- bltu a1, a3, 3b
- andi a2, a2, (16*SZREG)-1 /* Update count */
-
-4:
- /* Handle trailing misalignment */
- beqz a2, 6f
- add a3, a1, a2
-
- /* Use word-oriented copy if co-aligned to word boundary */
- or a5, a1, t6
- or a5, a5, a3
- andi a5, a5, 3
- bnez a5, 5f
-7:
- lw a4, 0(a1)
- addi a1, a1, 4
- sw a4, 0(t6)
- addi t6, t6, 4
- bltu a1, a3, 7b
-
- ret
-
-5:
- lb a4, 0(a1)
- addi a1, a1, 1
- sb a4, 0(t6)
- addi t6, t6, 1
- bltu a1, a3, 5b
-6:
- ret
-END(__memcpy)
diff --git a/arch/riscv/lib/string.c b/arch/riscv/lib/string.c
new file mode 100644
index 000000000000..bfc912ee23f8
--- /dev/null
+++ b/arch/riscv/lib/string.c
@@ -0,0 +1,90 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * String functions optimized for hardware which doesn't
+ * handle unaligned memory accesses efficiently.
+ *
+ * Copyright (C) 2021 Matteo Croce
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+
+/* Minimum size for a word copy to be convenient */
+#define BYTES_LONG sizeof(long)
+#define WORD_MASK (BYTES_LONG - 1)
+#define MIN_THRESHOLD (BYTES_LONG * 2)
+
+/* convenience union to avoid cast between different pointer types */
+union types {
+ u8 *as_u8;
+ unsigned long *as_ulong;
+ uintptr_t as_uptr;
+};
+
+union const_types {
+ const u8 *as_u8;
+ const unsigned long *as_ulong;
+ uintptr_t as_uptr;
+};
+
+void *__memcpy(void *dest, const void *src, size_t count)
+{
+ union const_types s = { .as_u8 = src };
+ union types d = { .as_u8 = dest };
+ int distance = 0;
+
+ if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) {
+ if (count < MIN_THRESHOLD)
+ goto copy_remainder;
+
+ /* Copy a byte at time until destination is aligned. */
+ for (; d.as_uptr & WORD_MASK; count--)
+ *d.as_u8++ = *s.as_u8++;
+
+ distance = s.as_uptr & WORD_MASK;
+ }
+
+ if (distance) {
+ unsigned long last, next;
+
+ /*
+ * s is distance bytes ahead of d, and d just reached
+ * the alignment boundary. Move s backward to word align it
+ * and shift data to compensate for distance, in order to do
+ * word-by-word copy.
+ */
+ s.as_u8 -= distance;
+
+ next = s.as_ulong[0];
+ for (; count >= BYTES_LONG; count -= BYTES_LONG) {
+ last = next;
+ next = s.as_ulong[1];
+
+ d.as_ulong[0] = last >> (distance * 8) |
+ next << ((BYTES_LONG - distance) * 8);
+
+ d.as_ulong++;
+ s.as_ulong++;
+ }
+
+ /* Restore s with the original offset. */
+ s.as_u8 += distance;
+ } else {
+ /*
+ * If the source and dest lower bits are the same, do a simple
+ * 32/64 bit wide copy.
+ */
+ for (; count >= BYTES_LONG; count -= BYTES_LONG)
+ *d.as_ulong++ = *s.as_ulong++;
+ }
+
+copy_remainder:
+ while (count--)
+ *d.as_u8++ = *s.as_u8++;
+
+ return dest;
+}
+EXPORT_SYMBOL(__memcpy);
+
+void *memcpy(void *dest, const void *src, size_t count) __weak __alias(__memcpy);
+EXPORT_SYMBOL(memcpy);
--
2.31.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v4 2/3] riscv: optimized memmove
2021-09-19 19:21 [PATCH v4 0/3] riscv: optimized mem* functions Matteo Croce
2021-09-19 19:21 ` [PATCH v4 1/3] riscv: optimized memcpy Matteo Croce
@ 2021-09-19 19:21 ` Matteo Croce
2021-09-19 22:05 ` kernel test robot
2021-09-19 19:21 ` [PATCH v4 3/3] riscv: optimized memset Matteo Croce
` (2 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Matteo Croce @ 2021-09-19 19:21 UTC (permalink / raw)
To: linux-riscv
Cc: linux-kernel, linux-arch, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Atish Patra, Emil Renner Berthing, Akira Tsukamoto,
Drew Fustini, Bin Meng, David Laight, Guo Ren, Christoph Hellwig
From: Matteo Croce <mcroce@microsoft.com>
When the destination buffer is before the source one, or when the
buffers don't overlap, it's safe to use memcpy() instead, which is
optimized to use the biggest data size possible.
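The direction check can be sketched in user space as below. It is a minimal illustration, not the patch itself: sketch_memmove and forward_copy are hypothetical names, and forward_copy stands in for a memcpy() that is known to copy in ascending address order. That property is an assumption the shortcut depends on; a libc memcpy() gives no such guarantee for overlapping buffers.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a memcpy() that always copies from low to high addresses. */
static void *forward_copy(void *dest, const void *src, size_t count)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	while (count--)
		*d++ = *s++;
	return dest;
}

/* Hypothetical user-space sketch of the overlap check. */
void *sketch_memmove(void *dest, const void *src, size_t count)
{
	uintptr_t d = (uintptr_t)dest;
	uintptr_t s = (uintptr_t)src;

	/*
	 * A forward copy never overwrites unread source bytes when the
	 * destination starts below the source, or starts at or after
	 * the end of the source.
	 */
	if (d < s || s + count <= d)
		return forward_copy(dest, src, count);

	/* dest lies inside [src, src + count): copy backward. */
	if (d > s) {
		const unsigned char *sp = (const unsigned char *)src + count;
		unsigned char *dp = (unsigned char *)dest + count;

		while (count--)
			*--dp = *--sp;
	}
	return dest;
}
```

The backward path stays byte-wise, as in the patch, so only the forward case benefits from word-sized accesses.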
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Reported-by: kernel test robot <lkp@intel.com>
---
arch/riscv/include/asm/string.h | 6 ++--
arch/riscv/kernel/riscv_ksyms.c | 2 --
arch/riscv/lib/Makefile | 1 -
arch/riscv/lib/memmove.S | 64 ---------------------------------
arch/riscv/lib/string.c | 23 ++++++++++++
5 files changed, 26 insertions(+), 70 deletions(-)
delete mode 100644 arch/riscv/lib/memmove.S
diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
index 6b5d6fc3eab4..25d9b9078569 100644
--- a/arch/riscv/include/asm/string.h
+++ b/arch/riscv/include/asm/string.h
@@ -17,11 +17,11 @@ extern asmlinkage void *__memset(void *, int, size_t);
#define __HAVE_ARCH_MEMCPY
extern void *memcpy(void *dest, const void *src, size_t count);
extern void *__memcpy(void *dest, const void *src, size_t count);
+#define __HAVE_ARCH_MEMMOVE
+extern void *memmove(void *dest, const void *src, size_t count);
+extern void *__memmove(void *dest, const void *src, size_t count);
#endif
-#define __HAVE_ARCH_MEMMOVE
-extern asmlinkage void *memmove(void *, const void *, size_t);
-extern asmlinkage void *__memmove(void *, const void *, size_t);
/* For those files which don't want to check by kasan. */
#if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
#define memcpy(dst, src, len) __memcpy(dst, src, len)
diff --git a/arch/riscv/kernel/riscv_ksyms.c b/arch/riscv/kernel/riscv_ksyms.c
index 3f6d512a5b97..361565c4db7e 100644
--- a/arch/riscv/kernel/riscv_ksyms.c
+++ b/arch/riscv/kernel/riscv_ksyms.c
@@ -10,6 +10,4 @@
* Assembly functions that may be used (directly or indirectly) by modules
*/
EXPORT_SYMBOL(memset);
-EXPORT_SYMBOL(memmove);
EXPORT_SYMBOL(__memset);
-EXPORT_SYMBOL(__memmove);
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 2ffe85d4baee..484f5ff7b508 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -1,7 +1,6 @@
# SPDX-License-Identifier: GPL-2.0-only
lib-y += delay.o
lib-y += memset.o
-lib-y += memmove.o
lib-$(CONFIG_MMU) += uaccess.o
lib-$(CONFIG_64BIT) += tishift.o
lib-$(CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE) += string.o
diff --git a/arch/riscv/lib/memmove.S b/arch/riscv/lib/memmove.S
deleted file mode 100644
index 07d1d2152ba5..000000000000
--- a/arch/riscv/lib/memmove.S
+++ /dev/null
@@ -1,64 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-
-#include <linux/linkage.h>
-#include <asm/asm.h>
-
-ENTRY(__memmove)
-WEAK(memmove)
- move t0, a0
- move t1, a1
-
- beq a0, a1, exit_memcpy
- beqz a2, exit_memcpy
- srli t2, a2, 0x2
-
- slt t3, a0, a1
- beqz t3, do_reverse
-
- andi a2, a2, 0x3
- li t4, 1
- beqz t2, byte_copy
-
-word_copy:
- lw t3, 0(a1)
- addi t2, t2, -1
- addi a1, a1, 4
- sw t3, 0(a0)
- addi a0, a0, 4
- bnez t2, word_copy
- beqz a2, exit_memcpy
- j byte_copy
-
-do_reverse:
- add a0, a0, a2
- add a1, a1, a2
- andi a2, a2, 0x3
- li t4, -1
- beqz t2, reverse_byte_copy
-
-reverse_word_copy:
- addi a1, a1, -4
- addi t2, t2, -1
- lw t3, 0(a1)
- addi a0, a0, -4
- sw t3, 0(a0)
- bnez t2, reverse_word_copy
- beqz a2, exit_memcpy
-
-reverse_byte_copy:
- addi a0, a0, -1
- addi a1, a1, -1
-
-byte_copy:
- lb t3, 0(a1)
- addi a2, a2, -1
- sb t3, 0(a0)
- add a1, a1, t4
- add a0, a0, t4
- bnez a2, byte_copy
-
-exit_memcpy:
- move a0, t0
- move a1, t1
- ret
-END(__memmove)
diff --git a/arch/riscv/lib/string.c b/arch/riscv/lib/string.c
index bfc912ee23f8..da033af8fe2f 100644
--- a/arch/riscv/lib/string.c
+++ b/arch/riscv/lib/string.c
@@ -88,3 +88,26 @@ EXPORT_SYMBOL(__memcpy);
void *memcpy(void *dest, const void *src, size_t count) __weak __alias(__memcpy);
EXPORT_SYMBOL(memcpy);
+
+/*
+ * Simply check if the buffers overlap and call memcpy() when it is safe,
+ * otherwise do a simple backward copy, one byte at a time.
+ */
+void *__memmove(void *dest, const void *src, size_t count)
+{
+ if (dest < src || src + count <= dest)
+ return memcpy(dest, src, count);
+
+ if (dest > src) {
+ const char *s = src + count;
+ char *tmp = dest + count;
+
+ while (count--)
+ *--tmp = *--s;
+ }
+ return dest;
+}
+EXPORT_SYMBOL(__memmove);
+
+void *memmove(void *dest, const void *src, size_t count) __weak __alias(__memmove);
+EXPORT_SYMBOL(memmove);
--
2.31.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v4 3/3] riscv: optimized memset
2021-09-19 19:21 [PATCH v4 0/3] riscv: optimized mem* functions Matteo Croce
2021-09-19 19:21 ` [PATCH v4 1/3] riscv: optimized memcpy Matteo Croce
2021-09-19 19:21 ` [PATCH v4 2/3] riscv: optimized memmove Matteo Croce
@ 2021-09-19 19:21 ` Matteo Croce
2021-09-19 22:00 ` [PATCH v4 0/3] riscv: optimized mem* functions Matteo Croce
2021-10-08 1:26 ` Palmer Dabbelt
4 siblings, 0 replies; 10+ messages in thread
From: Matteo Croce @ 2021-09-19 19:21 UTC (permalink / raw)
To: linux-riscv
Cc: linux-kernel, linux-arch, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Atish Patra, Emil Renner Berthing, Akira Tsukamoto,
Drew Fustini, Bin Meng, David Laight, Guo Ren, Christoph Hellwig
From: Matteo Croce <mcroce@microsoft.com>
The generic memset is defined as a byte at a time write. This is always
safe, but it's slower than a 4 byte or even 8 byte write.
Write a generic memset which fills the data one byte at a time until the
destination is aligned, then fills using the largest size allowed,
and finally fills the remaining data one byte at a time.
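The fill strategy described above can be sketched in user space like this. It is an illustration under assumptions rather than the patch code: sketch_memset and word_t are hypothetical names, and may_alias is a GCC/Clang extension. The sketch truncates the fill value to a byte before broadcasting it, which keeps negative values such as -1 from producing a wrong pattern.

```c
#include <stddef.h>
#include <stdint.h>

/* Word type that is allowed to alias byte buffers (GCC/Clang extension). */
typedef unsigned long __attribute__((may_alias)) word_t;

/* Hypothetical user-space sketch of an aligned, word-wide fill. */
void *sketch_memset(void *s, int c, size_t count)
{
	unsigned char *d = s;
	const size_t word = sizeof(word_t);

	if (count >= 2 * word) {
		/* Broadcast the low byte of 'c' into every byte of a word. */
		word_t pattern = (unsigned char)c;

		pattern |= pattern << 8;
		pattern |= pattern << 16;
		/* Split shift: well defined when long is 32 bits wide. */
		pattern |= (pattern << 16) << 16;

		/* Fill byte by byte until the destination is word aligned. */
		while ((uintptr_t)d & (word - 1)) {
			*d++ = (unsigned char)c;
			count--;
		}

		/* Fill one aligned word at a time. */
		for (; count >= word; count -= word) {
			*(word_t *)d = pattern;
			d += word;
		}
	}

	/* Fill the remainder one byte at a time. */
	while (count--)
		*d++ = (unsigned char)c;

	return s;
}
```

The threshold of two words guarantees that at least one full word remains after the alignment loop, which consumes at most one word minus one byte.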
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
---
arch/riscv/include/asm/string.h | 10 +--
arch/riscv/kernel/Makefile | 1 -
arch/riscv/kernel/riscv_ksyms.c | 13 ----
arch/riscv/lib/Makefile | 1 -
arch/riscv/lib/memset.S | 113 --------------------------------
arch/riscv/lib/string.c | 41 ++++++++++++
6 files changed, 44 insertions(+), 135 deletions(-)
delete mode 100644 arch/riscv/kernel/riscv_ksyms.c
delete mode 100644 arch/riscv/lib/memset.S
diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
index 25d9b9078569..90500635035a 100644
--- a/arch/riscv/include/asm/string.h
+++ b/arch/riscv/include/asm/string.h
@@ -6,14 +6,10 @@
#ifndef _ASM_RISCV_STRING_H
#define _ASM_RISCV_STRING_H
-#include <linux/types.h>
-#include <linux/linkage.h>
-
-#define __HAVE_ARCH_MEMSET
-extern asmlinkage void *memset(void *, int, size_t);
-extern asmlinkage void *__memset(void *, int, size_t);
-
#ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE
+#define __HAVE_ARCH_MEMSET
+extern void *memset(void *s, int c, size_t count);
+extern void *__memset(void *s, int c, size_t count);
#define __HAVE_ARCH_MEMCPY
extern void *memcpy(void *dest, const void *src, size_t count);
extern void *__memcpy(void *dest, const void *src, size_t count);
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 3397ddac1a30..fecf03822435 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -31,7 +31,6 @@ obj-y += syscall_table.o
obj-y += sys_riscv.o
obj-y += time.o
obj-y += traps.o
-obj-y += riscv_ksyms.o
obj-y += stacktrace.o
obj-y += cacheinfo.o
obj-y += patch.o
diff --git a/arch/riscv/kernel/riscv_ksyms.c b/arch/riscv/kernel/riscv_ksyms.c
deleted file mode 100644
index 361565c4db7e..000000000000
--- a/arch/riscv/kernel/riscv_ksyms.c
+++ /dev/null
@@ -1,13 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Copyright (C) 2017 Zihao Yu
- */
-
-#include <linux/export.h>
-#include <linux/uaccess.h>
-
-/*
- * Assembly functions that may be used (directly or indirectly) by modules
- */
-EXPORT_SYMBOL(memset);
-EXPORT_SYMBOL(__memset);
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 484f5ff7b508..e33263cc622a 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -1,6 +1,5 @@
# SPDX-License-Identifier: GPL-2.0-only
lib-y += delay.o
-lib-y += memset.o
lib-$(CONFIG_MMU) += uaccess.o
lib-$(CONFIG_64BIT) += tishift.o
lib-$(CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE) += string.o
diff --git a/arch/riscv/lib/memset.S b/arch/riscv/lib/memset.S
deleted file mode 100644
index 34c5360c6705..000000000000
--- a/arch/riscv/lib/memset.S
+++ /dev/null
@@ -1,113 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (C) 2013 Regents of the University of California
- */
-
-
-#include <linux/linkage.h>
-#include <asm/asm.h>
-
-/* void *memset(void *, int, size_t) */
-ENTRY(__memset)
-WEAK(memset)
- move t0, a0 /* Preserve return value */
-
- /* Defer to byte-oriented fill for small sizes */
- sltiu a3, a2, 16
- bnez a3, 4f
-
- /*
- * Round to nearest XLEN-aligned address
- * greater than or equal to start address
- */
- addi a3, t0, SZREG-1
- andi a3, a3, ~(SZREG-1)
- beq a3, t0, 2f /* Skip if already aligned */
- /* Handle initial misalignment */
- sub a4, a3, t0
-1:
- sb a1, 0(t0)
- addi t0, t0, 1
- bltu t0, a3, 1b
- sub a2, a2, a4 /* Update count */
-
-2: /* Duff's device with 32 XLEN stores per iteration */
- /* Broadcast value into all bytes */
- andi a1, a1, 0xff
- slli a3, a1, 8
- or a1, a3, a1
- slli a3, a1, 16
- or a1, a3, a1
-#ifdef CONFIG_64BIT
- slli a3, a1, 32
- or a1, a3, a1
-#endif
-
- /* Calculate end address */
- andi a4, a2, ~(SZREG-1)
- add a3, t0, a4
-
- andi a4, a4, 31*SZREG /* Calculate remainder */
- beqz a4, 3f /* Shortcut if no remainder */
- neg a4, a4
- addi a4, a4, 32*SZREG /* Calculate initial offset */
-
- /* Adjust start address with offset */
- sub t0, t0, a4
-
- /* Jump into loop body */
- /* Assumes 32-bit instruction lengths */
- la a5, 3f
-#ifdef CONFIG_64BIT
- srli a4, a4, 1
-#endif
- add a5, a5, a4
- jr a5
-3:
- REG_S a1, 0(t0)
- REG_S a1, SZREG(t0)
- REG_S a1, 2*SZREG(t0)
- REG_S a1, 3*SZREG(t0)
- REG_S a1, 4*SZREG(t0)
- REG_S a1, 5*SZREG(t0)
- REG_S a1, 6*SZREG(t0)
- REG_S a1, 7*SZREG(t0)
- REG_S a1, 8*SZREG(t0)
- REG_S a1, 9*SZREG(t0)
- REG_S a1, 10*SZREG(t0)
- REG_S a1, 11*SZREG(t0)
- REG_S a1, 12*SZREG(t0)
- REG_S a1, 13*SZREG(t0)
- REG_S a1, 14*SZREG(t0)
- REG_S a1, 15*SZREG(t0)
- REG_S a1, 16*SZREG(t0)
- REG_S a1, 17*SZREG(t0)
- REG_S a1, 18*SZREG(t0)
- REG_S a1, 19*SZREG(t0)
- REG_S a1, 20*SZREG(t0)
- REG_S a1, 21*SZREG(t0)
- REG_S a1, 22*SZREG(t0)
- REG_S a1, 23*SZREG(t0)
- REG_S a1, 24*SZREG(t0)
- REG_S a1, 25*SZREG(t0)
- REG_S a1, 26*SZREG(t0)
- REG_S a1, 27*SZREG(t0)
- REG_S a1, 28*SZREG(t0)
- REG_S a1, 29*SZREG(t0)
- REG_S a1, 30*SZREG(t0)
- REG_S a1, 31*SZREG(t0)
- addi t0, t0, 32*SZREG
- bltu t0, a3, 3b
- andi a2, a2, SZREG-1 /* Update count */
-
-4:
- /* Handle trailing misalignment */
- beqz a2, 6f
- add a3, t0, a2
-5:
- sb a1, 0(t0)
- addi t0, t0, 1
- bltu t0, a3, 5b
-6:
- ret
-END(__memset)
diff --git a/arch/riscv/lib/string.c b/arch/riscv/lib/string.c
index da033af8fe2f..12daa71698fb 100644
--- a/arch/riscv/lib/string.c
+++ b/arch/riscv/lib/string.c
@@ -111,3 +111,44 @@ EXPORT_SYMBOL(__memmove);
void *memmove(void *dest, const void *src, size_t count) __weak __alias(__memmove);
EXPORT_SYMBOL(memmove);
+
+void *__memset(void *s, int c, size_t count)
+{
+ union types dest = { .as_u8 = s };
+
+ if (count >= MIN_THRESHOLD) {
+ unsigned long cu = (unsigned char)c;
+
+ /* Compose an ulong with 'c' repeated 4/8 times */
+#ifdef CONFIG_ARCH_HAS_FAST_MULTIPLIER
+ cu *= 0x0101010101010101UL;
+#else
+ cu |= cu << 8;
+ cu |= cu << 16;
+ /* Suppress warning on 32 bit machines */
+ cu |= (cu << 16) << 16;
+#endif
+ if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) {
+ /*
+ * Fill the buffer one byte at time until
+ * the destination is word aligned.
+ */
+ for (; count && dest.as_uptr & WORD_MASK; count--)
+ *dest.as_u8++ = c;
+ }
+
+ /* Copy using the largest size allowed */
+ for (; count >= BYTES_LONG; count -= BYTES_LONG)
+ *dest.as_ulong++ = cu;
+ }
+
+ /* copy the remainder */
+ while (count--)
+ *dest.as_u8++ = c;
+
+ return s;
+}
+EXPORT_SYMBOL(__memset);
+
+void *memset(void *s, int c, size_t count) __weak __alias(__memset);
+EXPORT_SYMBOL(memset);
--
2.31.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v4 0/3] riscv: optimized mem* functions
2021-09-19 19:21 [PATCH v4 0/3] riscv: optimized mem* functions Matteo Croce
` (2 preceding siblings ...)
2021-09-19 19:21 ` [PATCH v4 3/3] riscv: optimized memset Matteo Croce
@ 2021-09-19 22:00 ` Matteo Croce
2021-10-08 1:26 ` Palmer Dabbelt
4 siblings, 0 replies; 10+ messages in thread
From: Matteo Croce @ 2021-09-19 22:00 UTC (permalink / raw)
To: linux-riscv
Cc: Linux Kernel Mailing List, linux-arch, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Atish Patra, Emil Renner Berthing,
Akira Tsukamoto, Drew Fustini, Bin Meng, David Laight, Guo Ren,
Christoph Hellwig
On Sun, Sep 19, 2021 at 9:21 PM Matteo Croce <mcroce@linux.microsoft.com> wrote:
>
> From: Matteo Croce <mcroce@microsoft.com>
>
> Replace the assembly mem{cpy,move,set} with C equivalent.
>
> Try to access RAM with the largest bit width possible, but without
> doing unaligned accesses.
>
> A further improvement could be to use multiple read and writes as the
> assembly version was trying to do.
>
> Tested on a BeagleV Starlight with a SiFive U74 core, where the
> improvement is noticeable.
>
> v3 -> v4:
> - incorporate changes from proposed generic version:
> https://lore.kernel.org/lkml/20210617152754.17960-1-mcroce@linux.microsoft.com/
>
Sorry, the correct link is:
https://lore.kernel.org/lkml/20210702123153.14093-1-mcroce@linux.microsoft.com/
--
per aspera ad upstream
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 2/3] riscv: optimized memmove
2021-09-19 19:21 ` [PATCH v4 2/3] riscv: optimized memmove Matteo Croce
@ 2021-09-19 22:05 ` kernel test robot
2021-09-27 10:48 ` Matteo Croce
0 siblings, 1 reply; 10+ messages in thread
From: kernel test robot @ 2021-09-19 22:05 UTC (permalink / raw)
To: Matteo Croce, linux-riscv
Cc: kbuild-all, linux-kernel, linux-arch, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Atish Patra, Emil Renner Berthing,
Akira Tsukamoto, Drew Fustini
[-- Attachment #1: Type: text/plain, Size: 2434 bytes --]
Hi Matteo,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on linux/master]
[also build test ERROR on linus/master v5.15-rc1 next-20210917]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Matteo-Croce/riscv-optimized-mem-functions/20210920-032303
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git bdb575f872175ed0ecf2638369da1cb7a6e86a14
config: riscv-randconfig-r004-20210919 (attached as .config)
compiler: riscv64-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/9a948fd7d78a58890608e9dd0f77e5ff84f36e3e
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Matteo-Croce/riscv-optimized-mem-functions/20210920-032303
git checkout 9a948fd7d78a58890608e9dd0f77e5ff84f36e3e
# save the attached .config to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=riscv SHELL=/bin/bash
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
arch/riscv/lib/string.c: In function '__memmove':
>> arch/riscv/lib/string.c:89:7: error: inlining failed in call to 'always_inline' 'memcpy': function body can be overwritten at link time
89 | void *memcpy(void *dest, const void *src, size_t count) __weak __alias(__memcpy);
| ^~~~~~
arch/riscv/lib/string.c:99:24: note: called from here
99 | return memcpy(dest, src, count);
| ^~~~~~~~~~~~~~~~~~~~~~~~
vim +89 arch/riscv/lib/string.c
86c5866e9b7fdd Matteo Croce 2021-09-19 88
86c5866e9b7fdd Matteo Croce 2021-09-19 @89 void *memcpy(void *dest, const void *src, size_t count) __weak __alias(__memcpy);
86c5866e9b7fdd Matteo Croce 2021-09-19 90 EXPORT_SYMBOL(memcpy);
9a948fd7d78a58 Matteo Croce 2021-09-19 91
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 32582 bytes --]
[-- Attachment #3: Type: text/plain, Size: 161 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 2/3] riscv: optimized memmove
2021-09-19 22:05 ` kernel test robot
@ 2021-09-27 10:48 ` Matteo Croce
2021-09-29 17:04 ` Emil Renner Berthing
0 siblings, 1 reply; 10+ messages in thread
From: Matteo Croce @ 2021-09-27 10:48 UTC (permalink / raw)
To: kernel test robot
Cc: linux-riscv, kbuild-all, Linux Kernel Mailing List, linux-arch,
Paul Walmsley, Palmer Dabbelt, Albert Ou, Atish Patra,
Emil Renner Berthing, Akira Tsukamoto, Drew Fustini
On Mon, Sep 20, 2021 at 12:06 AM kernel test robot <lkp@intel.com> wrote:
>
> Hi Matteo,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on linux/master]
> [also build test ERROR on linus/master v5.15-rc1 next-20210917]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url: https://github.com/0day-ci/linux/commits/Matteo-Croce/riscv-optimized-mem-functions/20210920-032303
> base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git bdb575f872175ed0ecf2638369da1cb7a6e86a14
> config: riscv-randconfig-r004-20210919 (attached as .config)
> compiler: riscv64-linux-gcc (GCC) 11.2.0
> reproduce (this is a W=1 build):
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # https://github.com/0day-ci/linux/commit/9a948fd7d78a58890608e9dd0f77e5ff84f36e3e
> git remote add linux-review https://github.com/0day-ci/linux
> git fetch --no-tags linux-review Matteo-Croce/riscv-optimized-mem-functions/20210920-032303
> git checkout 9a948fd7d78a58890608e9dd0f77e5ff84f36e3e
> # save the attached .config to linux build tree
> mkdir build_dir
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=riscv SHELL=/bin/bash
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
>
> All errors (new ones prefixed by >>):
>
> arch/riscv/lib/string.c: In function '__memmove':
> >> arch/riscv/lib/string.c:89:7: error: inlining failed in call to 'always_inline' 'memcpy': function body can be overwritten at link time
> 89 | void *memcpy(void *dest, const void *src, size_t count) __weak __alias(__memcpy);
> | ^~~~~~
> arch/riscv/lib/string.c:99:24: note: called from here
> 99 | return memcpy(dest, src, count);
> | ^~~~~~~~~~~~~~~~~~~~~~~~
>
>
> vim +89 arch/riscv/lib/string.c
>
> 86c5866e9b7fdd Matteo Croce 2021-09-19 88
> 86c5866e9b7fdd Matteo Croce 2021-09-19 @89 void *memcpy(void *dest, const void *src, size_t count) __weak __alias(__memcpy);
> 86c5866e9b7fdd Matteo Croce 2021-09-19 90 EXPORT_SYMBOL(memcpy);
> 9a948fd7d78a58 Matteo Croce 2021-09-19 91
>
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
How can we fix this? Maybe calling __memcpy() instead?
--
per aspera ad upstream
* Re: [PATCH v4 2/3] riscv: optimized memmove
2021-09-27 10:48 ` Matteo Croce
@ 2021-09-29 17:04 ` Emil Renner Berthing
0 siblings, 0 replies; 10+ messages in thread
From: Emil Renner Berthing @ 2021-09-29 17:04 UTC (permalink / raw)
To: Matteo Croce
Cc: kernel test robot, linux-riscv, kbuild-all,
Linux Kernel Mailing List, linux-arch, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Atish Patra, Akira Tsukamoto,
Drew Fustini
On Mon, 27 Sept 2021 at 12:49, Matteo Croce <mcroce@linux.microsoft.com> wrote:
> How can we fix this? Maybe calling __memcpy() instead?
Yes, that fixes building with CONFIG_FORTIFY_SOURCE=y for me. KASAN
already wraps memmove itself, so it should be fine to call __memcpy
directly.
/Emil
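The fix discussed above can be sketched outside the kernel. This is a minimal userspace illustration, not the v5 patch: impl_memcpy, pub_memcpy and impl_memmove are hypothetical names standing in for __memcpy, memcpy and __memmove, and the byte-wise copy loops are placeholders for the word-wise copies in the series. It assumes GCC or Clang on an ELF target, since it relies on the weak and alias attributes. The point is only that the memmove implementation calls the implementation symbol directly rather than the public weak alias, so the call does not depend on whatever inline wrapper a header may attach to the public name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for __memcpy(): a plain forward byte copy.
 * The forward direction matters, impl_memmove() below relies on it.
 */
void *impl_memcpy(void *dest, const void *src, size_t count)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	while (count--)
		*d++ = *s++;
	return dest;
}

/*
 * Public name, declared as a weak alias of the implementation.
 * Because it is weak, another object file may replace it at link time.
 */
void *pub_memcpy(void *dest, const void *src, size_t count)
	__attribute__((weak, alias("impl_memcpy")));

/* Hypothetical stand-in for __memmove(). */
void *impl_memmove(void *dest, const void *src, size_t count)
{
	uintptr_t d = (uintptr_t)dest;
	uintptr_t s = (uintptr_t)src;

	/*
	 * A forward copy is safe when dest lies below src or the two
	 * ranges do not overlap. Call the implementation symbol here,
	 * not the overridable public alias.
	 */
	if (d < s || s + count <= d)
		return impl_memcpy(dest, src, count);

	/* Overlap with dest above src: copy backwards, last byte first. */
	unsigned char *dp = (unsigned char *)dest + count;
	const unsigned char *sp = (const unsigned char *)src + count;

	while (count--)
		*--dp = *--sp;
	return dest;
}
```

Callers outside this file would still use the public name; only the internal call from the memmove path bypasses it.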
* Re: [PATCH v4 0/3] riscv: optimized mem* functions
2021-09-19 19:21 [PATCH v4 0/3] riscv: optimized mem* functions Matteo Croce
` (3 preceding siblings ...)
2021-09-19 22:00 ` [PATCH v4 0/3] riscv: optimized mem* functions Matteo Croce
@ 2021-10-08 1:26 ` Palmer Dabbelt
2021-10-08 1:39 ` Matteo Croce
4 siblings, 1 reply; 10+ messages in thread
From: Palmer Dabbelt @ 2021-10-08 1:26 UTC (permalink / raw)
To: mcroce
Cc: linux-riscv, linux-kernel, linux-arch, Paul Walmsley, aou,
Atish Patra, kernel, akira.tsukamoto, drew, bmeng.cn,
David.Laight, guoren, Christoph Hellwig
On Sun, 19 Sep 2021 12:21:01 PDT (-0700), mcroce@linux.microsoft.com wrote:
> From: Matteo Croce <mcroce@microsoft.com>
>
> Replace the assembly mem{cpy,move,set} with C equivalent.
>
> Try to access RAM with the largest bit width possible, but without
> doing unaligned accesses.
>
> A further improvement could be to use multiple read and writes as the
> assembly version was trying to do.
>
> Tested on a BeagleV Starlight with a SiFive U74 core, where the
> improvement is noticeable.
>
> v3 -> v4:
> - incorporate changes from proposed generic version:
> https://lore.kernel.org/lkml/20210617152754.17960-1-mcroce@linux.microsoft.com/
>
> v2 -> v3:
> - alias mem* to __mem* and not viceversa
> - use __alias instead of a tail call
>
> v1 -> v2:
> - reduce the threshold from 64 to 16 bytes
> - fix KASAN build
> - optimize memset
>
> Matteo Croce (3):
> riscv: optimized memcpy
> riscv: optimized memmove
> riscv: optimized memset
>
> arch/riscv/include/asm/string.h | 18 ++--
> arch/riscv/kernel/Makefile | 1 -
> arch/riscv/kernel/riscv_ksyms.c | 17 ----
> arch/riscv/lib/Makefile | 4 +-
> arch/riscv/lib/memcpy.S | 108 ----------------------
> arch/riscv/lib/memmove.S | 64 -------------
> arch/riscv/lib/memset.S | 113 -----------------------
> arch/riscv/lib/string.c | 154 ++++++++++++++++++++++++++++++++
> 8 files changed, 164 insertions(+), 315 deletions(-)
> delete mode 100644 arch/riscv/kernel/riscv_ksyms.c
> delete mode 100644 arch/riscv/lib/memcpy.S
> delete mode 100644 arch/riscv/lib/memmove.S
> delete mode 100644 arch/riscv/lib/memset.S
> create mode 100644 arch/riscv/lib/string.c
Thanks. These generally look good, but they're failing to build for me.
I'm getting errors along the lines of
arch/riscv/lib/string.c:89:7: error: inlining failed in call to ‘always_inline’ ‘memcpy’: function body can be overwritten at link time
89 | void *memcpy(void *dest, const void *src, size_t count) __weak __alias(__memcpy);
| ^~~~~~
arch/riscv/lib/string.c:99:10: note: called from here
99 | return memcpy(dest, src, count);
| ^~~~~~~~~~~~~~~~~~~~~~~~
I'm still a bit behind on email so I'm going to keep going through
patches, but if there's no v5 by the time I get back here then I'll take
a look.
* Re: [PATCH v4 0/3] riscv: optimized mem* functions
2021-10-08 1:26 ` Palmer Dabbelt
@ 2021-10-08 1:39 ` Matteo Croce
0 siblings, 0 replies; 10+ messages in thread
From: Matteo Croce @ 2021-10-08 1:39 UTC (permalink / raw)
To: Palmer Dabbelt
Cc: linux-riscv, Linux Kernel Mailing List, linux-arch,
Paul Walmsley, Albert Ou, Atish Patra, Emil Renner Berthing,
Akira Tsukamoto, Drew Fustini, Bin Meng, David Laight, Guo Ren,
Christoph Hellwig
On Fri, Oct 8, 2021 at 3:26 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
> Thanks. These generally look good, but they're failing to build for me.
> I'm getting errors along the lines of
>
> arch/riscv/lib/string.c:89:7: error: inlining failed in call to ‘always_inline’ ‘memcpy’: function body can be overwritten at link time
> 89 | void *memcpy(void *dest, const void *src, size_t count) __weak __alias(__memcpy);
> | ^~~~~~
> arch/riscv/lib/string.c:99:10: note: called from here
> 99 | return memcpy(dest, src, count);
> | ^~~~~~~~~~~~~~~~~~~~~~~~
>
> I'm still a bit behind on email so I'm going to keep going through
> patches, but if there's no v5 by the time I get back here then I'll take
> a look.
I've sent a v5 here:
https://lore.kernel.org/linux-riscv/20210929172234.31620-1-mcroce@linux.microsoft.com/
Regards,
--
per aspera ad upstream