linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] microblaze: Fix issues with freestanding
@ 2022-02-25 13:55 Michal Simek
  2022-02-25 13:55 ` [PATCH v2 1/3] microblaze: Use simple memset implementation from lib/string.c Michal Simek
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Michal Simek @ 2022-02-25 13:55 UTC (permalink / raw)
  To: linux-kernel, monstr, michal.simek, git; +Cc: Mahesh Bodapati, Randy Dunlap

Hi,

with GCC 10 there is issue with simple memset implementation which is
called recursively. There are couple of discussions about it and the first
two patches are trying to workaround this.
The third patch only removes simple implementations from arch code and use
generic one which is the same.

Thanks,
Michal

I sent only 1 patch in v1 that's why sending v2 with all 3.


Changes in v2:
- missing patch in v1
- missing patch in v1

Michal Simek (3):
  microblaze: Use simple memset implementation from lib/string.c
  microblaze: Do loop unrolling for optimized memset implementation
  microblaze: Use simple memmove/memcpy implementation from lib/string.c

 arch/microblaze/include/asm/string.h |  2 ++
 arch/microblaze/lib/memcpy.c         | 18 ++-------------
 arch/microblaze/lib/memmove.c        | 29 ++----------------------
 arch/microblaze/lib/memset.c         | 33 ++++++++++++----------------
 4 files changed, 20 insertions(+), 62 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2 1/3] microblaze: Use simple memset implementation from lib/string.c
  2022-02-25 13:55 [PATCH v2 0/3] microblaze: Fix issues with freestanding Michal Simek
@ 2022-02-25 13:55 ` Michal Simek
  2022-02-25 13:55 ` [PATCH v2 2/3] microblaze: Do loop unrolling for optimized memset implementation Michal Simek
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Michal Simek @ 2022-02-25 13:55 UTC (permalink / raw)
  To: linux-kernel, monstr, michal.simek, git; +Cc: Mahesh Bodapati, Randy Dunlap

On microblaze systems which are not using OPT_LIB_FUNCTION only simple
memset is used. This function is already implemented in lib/string.c that's
why it should be used instead.
This change is done in respect of issue fixed by commit 33d0f96ffd73
("lib/string.c: Use freestanding environment") where gcc-10.x moved
-ftree-loop-distribute-patterns optimization is to O2 optimization level.
This optimization causes GCC to convert the while loop in memset.c into a
call to memset. So This optimization is transforming a loop in a
memset/memcpy into a call to the function itself. This makes the memset
implementation as recursive.

Based on fix above -ffreestanding was used and it needs to be used on
Microblaze too but the patch is not adding this flag it removes simple
implementation to cause that generic implementation is used where this flag
is already setup.

Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Mahesh Bodapati <mbodapat@xilinx.com>
---

Changes in v2:
- missing patch in v1

 arch/microblaze/include/asm/string.h |  2 ++
 arch/microblaze/lib/memset.c         | 20 ++------------------
 2 files changed, 4 insertions(+), 18 deletions(-)

diff --git a/arch/microblaze/include/asm/string.h b/arch/microblaze/include/asm/string.h
index 34071a848b6a..dbdb9eb4a733 100644
--- a/arch/microblaze/include/asm/string.h
+++ b/arch/microblaze/include/asm/string.h
@@ -8,7 +8,9 @@
 
 #ifdef __KERNEL__
 
+#ifdef CONFIG_OPT_LIB_FUNCTION
 #define __HAVE_ARCH_MEMSET
+#endif
 #define __HAVE_ARCH_MEMCPY
 #define __HAVE_ARCH_MEMMOVE
 
diff --git a/arch/microblaze/lib/memset.c b/arch/microblaze/lib/memset.c
index eb6c8988af02..615a2f8f53cb 100644
--- a/arch/microblaze/lib/memset.c
+++ b/arch/microblaze/lib/memset.c
@@ -30,22 +30,7 @@
 #include <linux/compiler.h>
 #include <linux/string.h>
 
-#ifdef __HAVE_ARCH_MEMSET
-#ifndef CONFIG_OPT_LIB_FUNCTION
-void *memset(void *v_src, int c, __kernel_size_t n)
-{
-	char *src = v_src;
-
-	/* Truncate c to 8 bits */
-	c = (c & 0xFF);
-
-	/* Simple, byte oriented memset or the rest of count. */
-	while (n--)
-		*src++ = c;
-
-	return v_src;
-}
-#else /* CONFIG_OPT_LIB_FUNCTION */
+#ifdef CONFIG_OPT_LIB_FUNCTION
 void *memset(void *v_src, int c, __kernel_size_t n)
 {
 	char *src = v_src;
@@ -94,6 +79,5 @@ void *memset(void *v_src, int c, __kernel_size_t n)
 
 	return v_src;
 }
-#endif /* CONFIG_OPT_LIB_FUNCTION */
 EXPORT_SYMBOL(memset);
-#endif /* __HAVE_ARCH_MEMSET */
+#endif /* CONFIG_OPT_LIB_FUNCTION */
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2 2/3] microblaze: Do loop unrolling for optimized memset implementation
  2022-02-25 13:55 [PATCH v2 0/3] microblaze: Fix issues with freestanding Michal Simek
  2022-02-25 13:55 ` [PATCH v2 1/3] microblaze: Use simple memset implementation from lib/string.c Michal Simek
@ 2022-02-25 13:55 ` Michal Simek
  2022-02-25 21:50   ` David Laight
  2022-02-25 13:55 ` [PATCH v2 3/3] microblaze: Use simple memmove/memcpy implementation from lib/string.c Michal Simek
  2022-04-21  8:56 ` [PATCH v2 0/3] microblaze: Fix issues with freestanding Michal Simek
  3 siblings, 1 reply; 7+ messages in thread
From: Michal Simek @ 2022-02-25 13:55 UTC (permalink / raw)
  To: linux-kernel, monstr, michal.simek, git; +Cc: Mahesh Bodapati, Randy Dunlap

Align implementation with memcpy and memmove where also remaining bytes are
copied via final switch case instead of using simple implementations which
loop. But this alignment has much stronger reason and definitely aligning
implementation is not the key point here. It is just good to have in mind
that the same technique is used already there.

In GCC 10, now -ftree-loop-distribute-patterns optimization is on at O2.
This optimization causes GCC to convert the while loop in memset.c into a
call to memset.
So this optimization is transforming a loop in a memset/memcpy into a call
to the function itself. This makes the memset implementation as recursive.
"-freestanding" option will disable the built-in library function but it
has been added in generic library implementation.

In default microblaze kernel defconfig we have CONFIG_OPT_LIB_FUNCTION
enabled so it will always pick optimized version of memset which is target
specific so we are replacing the while() loop with switch case to avoid
recursive memset call.

Issue with freestanding was already discussed in connection to commit
33d0f96ffd73 ("lib/string.c: Use freestanding environment") and also this
is topic in glibc and gcc.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888
http://patchwork.ozlabs.org/project/glibc/patch/20191121021040.14554-1-sandra@codesourcery.com/

Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Mahesh Bodapati <mbodapat@xilinx.com>
---

Changes in v2:
- missing patch in v1

 arch/microblaze/lib/memset.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/microblaze/lib/memset.c b/arch/microblaze/lib/memset.c
index 615a2f8f53cb..7c2352d56bb0 100644
--- a/arch/microblaze/lib/memset.c
+++ b/arch/microblaze/lib/memset.c
@@ -74,8 +74,19 @@ void *memset(void *v_src, int c, __kernel_size_t n)
 	}
 
 	/* Simple, byte oriented memset or the rest of count. */
-	while (n--)
+	switch (n) {
+	case 3:
 		*src++ = c;
+		fallthrough;
+	case 2:
+		*src++ = c;
+		fallthrough;
+	case 1:
+		*src++ = c;
+		break;
+	default:
+		break;
+	}
 
 	return v_src;
 }
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2 3/3] microblaze: Use simple memmove/memcpy implementation from lib/string.c
  2022-02-25 13:55 [PATCH v2 0/3] microblaze: Fix issues with freestanding Michal Simek
  2022-02-25 13:55 ` [PATCH v2 1/3] microblaze: Use simple memset implementation from lib/string.c Michal Simek
  2022-02-25 13:55 ` [PATCH v2 2/3] microblaze: Do loop unrolling for optimized memset implementation Michal Simek
@ 2022-02-25 13:55 ` Michal Simek
  2022-04-21  8:56 ` [PATCH v2 0/3] microblaze: Fix issues with freestanding Michal Simek
  3 siblings, 0 replies; 7+ messages in thread
From: Michal Simek @ 2022-02-25 13:55 UTC (permalink / raw)
  To: linux-kernel, monstr, michal.simek, git; +Cc: Mahesh Bodapati, Randy Dunlap

This is based on previous commit ("microblaze: Use simple memset
implementation from lib/string.c") where generic memset implementation is
used when OPT_LIB_FUNCTION is not defined. The same change can be done for
memset/memcpy implementation where doesn't make sense to have generic
implementation in architecture code.

Signed-off-by: Michal Simek <michal.simek@xilinx.com>
---

(no changes since v1)

 arch/microblaze/include/asm/string.h |  2 +-
 arch/microblaze/lib/memcpy.c         | 18 ++---------------
 arch/microblaze/lib/memmove.c        | 29 ++--------------------------
 3 files changed, 5 insertions(+), 44 deletions(-)

diff --git a/arch/microblaze/include/asm/string.h b/arch/microblaze/include/asm/string.h
index dbdb9eb4a733..8798ad2c132a 100644
--- a/arch/microblaze/include/asm/string.h
+++ b/arch/microblaze/include/asm/string.h
@@ -10,13 +10,13 @@
 
 #ifdef CONFIG_OPT_LIB_FUNCTION
 #define __HAVE_ARCH_MEMSET
-#endif
 #define __HAVE_ARCH_MEMCPY
 #define __HAVE_ARCH_MEMMOVE
 
 extern void *memset(void *, int, __kernel_size_t);
 extern void *memcpy(void *, const void *, __kernel_size_t);
 extern void *memmove(void *, const void *, __kernel_size_t);
+#endif
 
 #endif /* __KERNEL__ */
 
diff --git a/arch/microblaze/lib/memcpy.c b/arch/microblaze/lib/memcpy.c
index 63041fdf916d..9966dce55619 100644
--- a/arch/microblaze/lib/memcpy.c
+++ b/arch/microblaze/lib/memcpy.c
@@ -31,20 +31,7 @@
 
 #include <linux/string.h>
 
-#ifdef __HAVE_ARCH_MEMCPY
-#ifndef CONFIG_OPT_LIB_FUNCTION
-void *memcpy(void *v_dst, const void *v_src, __kernel_size_t c)
-{
-	const char *src = v_src;
-	char *dst = v_dst;
-
-	/* Simple, byte oriented memcpy. */
-	while (c--)
-		*dst++ = *src++;
-
-	return v_dst;
-}
-#else /* CONFIG_OPT_LIB_FUNCTION */
+#ifdef CONFIG_OPT_LIB_FUNCTION
 void *memcpy(void *v_dst, const void *v_src, __kernel_size_t c)
 {
 	const char *src = v_src;
@@ -188,6 +175,5 @@ void *memcpy(void *v_dst, const void *v_src, __kernel_size_t c)
 
 	return v_dst;
 }
-#endif /* CONFIG_OPT_LIB_FUNCTION */
 EXPORT_SYMBOL(memcpy);
-#endif /* __HAVE_ARCH_MEMCPY */
+#endif /* CONFIG_OPT_LIB_FUNCTION */
diff --git a/arch/microblaze/lib/memmove.c b/arch/microblaze/lib/memmove.c
index 9862f6b1e59d..2e49d0ef1e07 100644
--- a/arch/microblaze/lib/memmove.c
+++ b/arch/microblaze/lib/memmove.c
@@ -30,31 +30,7 @@
 #include <linux/compiler.h>
 #include <linux/string.h>
 
-#ifdef __HAVE_ARCH_MEMMOVE
-#ifndef CONFIG_OPT_LIB_FUNCTION
-void *memmove(void *v_dst, const void *v_src, __kernel_size_t c)
-{
-	const char *src = v_src;
-	char *dst = v_dst;
-
-	if (!c)
-		return v_dst;
-
-	/* Use memcpy when source is higher than dest */
-	if (v_dst <= v_src)
-		return memcpy(v_dst, v_src, c);
-
-	/* copy backwards, from end to beginning */
-	src += c;
-	dst += c;
-
-	/* Simple, byte oriented memmove. */
-	while (c--)
-		*--dst = *--src;
-
-	return v_dst;
-}
-#else /* CONFIG_OPT_LIB_FUNCTION */
+#ifdef CONFIG_OPT_LIB_FUNCTION
 void *memmove(void *v_dst, const void *v_src, __kernel_size_t c)
 {
 	const char *src = v_src;
@@ -215,6 +191,5 @@ void *memmove(void *v_dst, const void *v_src, __kernel_size_t c)
 	}
 	return v_dst;
 }
-#endif /* CONFIG_OPT_LIB_FUNCTION */
 EXPORT_SYMBOL(memmove);
-#endif /* __HAVE_ARCH_MEMMOVE */
+#endif /* CONFIG_OPT_LIB_FUNCTION */
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* RE: [PATCH v2 2/3] microblaze: Do loop unrolling for optimized memset implementation
  2022-02-25 13:55 ` [PATCH v2 2/3] microblaze: Do loop unrolling for optimized memset implementation Michal Simek
@ 2022-02-25 21:50   ` David Laight
  2022-02-28  6:38     ` Michal Simek
  0 siblings, 1 reply; 7+ messages in thread
From: David Laight @ 2022-02-25 21:50 UTC (permalink / raw)
  To: 'Michal Simek', linux-kernel, monstr, git
  Cc: Mahesh Bodapati, Randy Dunlap

From: Michal Simek
> Sent: 25 February 2022 13:56
> 
> Align implementation with memcpy and memmove where also remaining bytes are
> copied via final switch case instead of using simple implementations which
> loop. But this alignment has much stronger reason and definitely aligning
> implementation is not the key point here. It is just good to have in mind
> that the same technique is used already there.
> 
> In GCC 10, now -ftree-loop-distribute-patterns optimization is on at O2.
> This optimization causes GCC to convert the while loop in memset.c into a
> call to memset.

Gah...
That is nearly as brain dead as another compiler that would convert
any byte copy loop (on x86) into 'rep movsb'.

If I want to call memcpy() I'll call memcpy.
If I'm copying a few bytes I might write the loop to avoid
the cost of the call and all the conditional tests for
buffer length and alignment.

Don't the compiler writers have better things to do?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 2/3] microblaze: Do loop unrolling for optimized memset implementation
  2022-02-25 21:50   ` David Laight
@ 2022-02-28  6:38     ` Michal Simek
  0 siblings, 0 replies; 7+ messages in thread
From: Michal Simek @ 2022-02-28  6:38 UTC (permalink / raw)
  To: David Laight, 'Michal Simek', linux-kernel, monstr, git
  Cc: Mahesh Bodapati, Randy Dunlap



On 2/25/22 22:50, David Laight wrote:
> From: Michal Simek
>> Sent: 25 February 2022 13:56
>>
>> Align implementation with memcpy and memmove where also remaining bytes are
>> copied via final switch case instead of using simple implementations which
>> loop. But this alignment has much stronger reason and definitely aligning
>> implementation is not the key point here. It is just good to have in mind
>> that the same technique is used already there.
>>
>> In GCC 10, now -ftree-loop-distribute-patterns optimization is on at O2.
>> This optimization causes GCC to convert the while loop in memset.c into a
>> call to memset.
> 
> Gah...
> That is nearly as brain dead as another compiler that would convert
> any byte copy loop (on x86) into 'rep movsb'.
> 
> If I want to call memcpy() I'll call memcpy.
> If I'm copying a few bytes I might write the loop to avoid
> the cost of the call and all the conditional tests for
> buffer length and alignment.
> 
> Don't the compiler writers have better things to do?

Not sure what you want me to say about it. It is current gcc behavior and I 
can't see the way back. I don't think doing loop unrolling here is a big deal 
for me because the same technique is used for years in memcpy and memmove.

Thanks,
Michal


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 0/3] microblaze: Fix issues with freestanding
  2022-02-25 13:55 [PATCH v2 0/3] microblaze: Fix issues with freestanding Michal Simek
                   ` (2 preceding siblings ...)
  2022-02-25 13:55 ` [PATCH v2 3/3] microblaze: Use simple memmove/memcpy implementation from lib/string.c Michal Simek
@ 2022-04-21  8:56 ` Michal Simek
  3 siblings, 0 replies; 7+ messages in thread
From: Michal Simek @ 2022-04-21  8:56 UTC (permalink / raw)
  To: LKML, Michal Simek, git; +Cc: Mahesh Bodapati, Randy Dunlap

pá 25. 2. 2022 v 14:55 odesílatel Michal Simek <michal.simek@xilinx.com> napsal:
>
> Hi,
>
> with GCC 10 there is issue with simple memset implementation which is
> called recursively. There are couple of discussions about it and the first
> two patches are trying to workaround this.
> The third patch only removes simple implementations from arch code and use
> generic one which is the same.
>
> Thanks,
> Michal
>
> I sent only 1 patch in v1 that's why sending v2 with all 3.
>
>
> Changes in v2:
> - missing patch in v1
> - missing patch in v1
>
> Michal Simek (3):
>   microblaze: Use simple memset implementation from lib/string.c
>   microblaze: Do loop unrolling for optimized memset implementation
>   microblaze: Use simple memmove/memcpy implementation from lib/string.c
>
>  arch/microblaze/include/asm/string.h |  2 ++
>  arch/microblaze/lib/memcpy.c         | 18 ++-------------
>  arch/microblaze/lib/memmove.c        | 29 ++----------------------
>  arch/microblaze/lib/memset.c         | 33 ++++++++++++----------------
>  4 files changed, 20 insertions(+), 62 deletions(-)
>
> --
> 2.35.1
>

Applied.
M

-- 
Michal Simek, Ing. (M.Eng), OpenPGP -> KeyID: FE3D1F91
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel - Xilinx Microblaze
Maintainer of Linux kernel - Xilinx Zynq ARM and ZynqMP ARM64 SoCs
U-Boot custodian - Xilinx Microblaze/Zynq/ZynqMP/Versal SoCs

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-04-21  8:56 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-25 13:55 [PATCH v2 0/3] microblaze: Fix issues with freestanding Michal Simek
2022-02-25 13:55 ` [PATCH v2 1/3] microblaze: Use simple memset implementation from lib/string.c Michal Simek
2022-02-25 13:55 ` [PATCH v2 2/3] microblaze: Do loop unrolling for optimized memset implementation Michal Simek
2022-02-25 21:50   ` David Laight
2022-02-28  6:38     ` Michal Simek
2022-02-25 13:55 ` [PATCH v2 3/3] microblaze: Use simple memmove/memcpy implementation from lib/string.c Michal Simek
2022-04-21  8:56 ` [PATCH v2 0/3] microblaze: Fix issues with freestanding Michal Simek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).