* [PATCH] ARM: make memzero optimization smarter
@ 2018-01-16 16:28 Arnd Bergmann
2018-01-16 17:10 ` Nicolas Pitre
0 siblings, 1 reply; 8+ messages in thread
From: Arnd Bergmann @ 2018-01-16 16:28 UTC (permalink / raw)
To: Russell King
Cc: Ard Biesheuvel, Zhichang Yuan, Nicolas Pitre, Arnd Bergmann,
linux-arm-kernel, linux-kernel
While testing with a gcc-8.0.0 snapshot, I ran into a harmless
build warning:
In file included from include/linux/string.h:18:0,
...
from drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:10:
drivers/net/ethernet/hisilicon/hns/hns_ethtool.c: In function '__lb_other_process':
arch/arm/include/asm/string.h:50:5: error: 'memset' specified size 4294967295 exceeds maximum object size 2147483647 [-Werror=stringop-overflow=]
memset((__p),(v),(__n)); \
^~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:394:3: note: in expansion of macro 'memset'
memset(&skb->data[frame_size / 2], 0xAA, frame_size / 2 - 1);
^~~~~~
I think the warning is unintentional here, and gcc should not actually
warn, so I reported this in the gcc bugzilla as pr82103. From the
discussion there, it seems unlikely that this gets addressed in
the compiler.
The problem here is that testing the 'frame_size' variable for non-zero
in the first memset() macro invocation leads to a code path in which
gcc thinks it may be zero, and that code path would lead to an overly
large length for the following memset that is now "(u32)-1". We know
this won't happen as the skb len is already guaranteed to be nonzero
when we get here (it has just been allocated with a nonzero size).
However, we can avoid this class of bogus warnings for the memset() macro
by only doing the micro-optimization for zero-length arguments when the
length is a compile-time constant. This should also reduce code size by
a few bytes, and avoid an extra branch for the cases that a variable-length
argument is always nonzero, which is probably the common case anyway.
I have made sure that the __memzero implementation can safely handle
a zero length argument.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82103
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
Originally sent in September 2017, but then forgot about it
as nobody replied.
I slightly updated the change text now to reflect that the gcc
developers treat it as an invalid bug.
---
arch/arm/include/asm/string.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/include/asm/string.h b/arch/arm/include/asm/string.h
index f54a3136aac6..a8b90e33cc87 100644
--- a/arch/arm/include/asm/string.h
+++ b/arch/arm/include/asm/string.h
@@ -44,7 +44,7 @@ extern void __memzero(void *ptr, __kernel_size_t n);
#define memset(p,v,n) \
({ \
void *__p = (p); size_t __n = n; \
- if ((__n) != 0) { \
+ if (!__builtin_constant_p(__n) || (__n) != 0) { \
if (__builtin_constant_p((v)) && (v) == 0) \
__memzero((__p),(__n)); \
else \
--
2.9.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] ARM: make memzero optimization smarter
2018-01-16 16:28 [PATCH] ARM: make memzero optimization smarter Arnd Bergmann
@ 2018-01-16 17:10 ` Nicolas Pitre
2018-01-16 22:30 ` Arnd Bergmann
0 siblings, 1 reply; 8+ messages in thread
From: Nicolas Pitre @ 2018-01-16 17:10 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Russell King, Ard Biesheuvel, Zhichang Yuan, linux-arm-kernel,
linux-kernel
On Tue, 16 Jan 2018, Arnd Bergmann wrote:
> While testing with a gcc-8.0.0 snapshot, I ran into a harmless
> build warning:
>
> In file included from include/linux/string.h:18:0,
> ...
> from drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:10:
> drivers/net/ethernet/hisilicon/hns/hns_ethtool.c: In function '__lb_other_process':
> arch/arm/include/asm/string.h:50:5: error: 'memset' specified size 4294967295 exceeds maximum object size 2147483647 [-Werror=stringop-overflow=]
> memset((__p),(v),(__n)); \
> ^~~~~~~~~~~~~~~~~~~~~~~
> drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:394:3: note: in expansion of macro 'memset'
> memset(&skb->data[frame_size / 2], 0xAA, frame_size / 2 - 1);
> ^~~~~~
>
> I think the warning is unintentional here, and gcc should not actually
> warn, so I reported this in the gcc bugzilla as pr82103. From the
> discussion there, it seems unlikely that this gets addressed in
> the compiler.
>
> The problem here is that testing the 'frame_size' variable for non-zero
> in the first memset() macro invocation leads to a code path in which
> gcc thinks it may be zero, and that code path would lead to an overly
> large length for the following memset that is now "(u32)-1". We know
> this won't happen as the skb len is already guaranteed to be nonzero
> when we get here (it has just been allocated with a nonzero size).
>
> However, we can avoid this class of bogus warnings for the memset() macro
> by only doing the micro-optimization for zero-length arguments when the
> length is a compile-time constant. This should also reduce code size by
> a few bytes, and avoid an extra branch for the cases that a variable-length
> argument is always nonzero, which is probably the common case anyway.
>
> I have made sure that the __memzero implementation can safely handle
> a zero length argument.
Why not simply drop the test on (__n) != 0 then? I fail to see what the
advantage is in that case.
> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82103
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> Originally sent in September 2017, but then forgot about it
> as nobody replied.
>
> I slightly updated the change text now to reflect that the gcc
> developers treat it as an invalid bug.
> ---
> arch/arm/include/asm/string.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm/include/asm/string.h b/arch/arm/include/asm/string.h
> index f54a3136aac6..a8b90e33cc87 100644
> --- a/arch/arm/include/asm/string.h
> +++ b/arch/arm/include/asm/string.h
> @@ -44,7 +44,7 @@ extern void __memzero(void *ptr, __kernel_size_t n);
> #define memset(p,v,n) \
> ({ \
> void *__p = (p); size_t __n = n; \
> - if ((__n) != 0) { \
> + if (!__builtin_constant_p(__n) || (__n) != 0) { \
> if (__builtin_constant_p((v)) && (v) == 0) \
> __memzero((__p),(__n)); \
> else \
> --
> 2.9.0
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] ARM: make memzero optimization smarter
2018-01-16 17:10 ` Nicolas Pitre
@ 2018-01-16 22:30 ` Arnd Bergmann
2018-01-17 4:07 ` Nicolas Pitre
0 siblings, 1 reply; 8+ messages in thread
From: Arnd Bergmann @ 2018-01-16 22:30 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Russell King, Ard Biesheuvel, Zhichang Yuan, Linux ARM,
Linux Kernel Mailing List
On Tue, Jan 16, 2018 at 6:10 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Tue, 16 Jan 2018, Arnd Bergmann wrote:
>
>> While testing with a gcc-8.0.0 snapshot, I ran into a harmless
>> build warning:
>>
>> In file included from include/linux/string.h:18:0,
>> ...
>> from drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:10:
>> drivers/net/ethernet/hisilicon/hns/hns_ethtool.c: In function '__lb_other_process':
>> arch/arm/include/asm/string.h:50:5: error: 'memset' specified size 4294967295 exceeds maximum object size 2147483647 [-Werror=stringop-overflow=]
>> memset((__p),(v),(__n)); \
>> ^~~~~~~~~~~~~~~~~~~~~~~
>> drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:394:3: note: in expansion of macro 'memset'
>> memset(&skb->data[frame_size / 2], 0xAA, frame_size / 2 - 1);
>> ^~~~~~
>>
>> I think the warning is unintentional here, and gcc should not actually
>> warn, so I reported this in the gcc bugzilla as pr82103. From the
>> discussion there, it seems unlikely that this gets addressed in
>> the compiler.
>>
>> The problem here is that testing the 'frame_size' variable for non-zero
>> in the first memset() macro invocation leads to a code path in which
>> gcc thinks it may be zero, and that code path would lead to an overly
>> large length for the following memset that is now "(u32)-1". We know
>> this won't happen as the skb len is already guaranteed to be nonzero
>> when we get here (it has just been allocated with a nonzero size).
>>
>> However, we can avoid this class of bogus warnings for the memset() macro
>> by only doing the micro-optimization for zero-length arguments when the
>> length is a compile-time constant. This should also reduce code size by
>> a few bytes, and avoid an extra branch for the cases that a variable-length
>> argument is always nonzero, which is probably the common case anyway.
>>
>> I have made sure that the __memzero implementation can safely handle
>> a zero length argument.
>
> Why not simply drop the test on (__n) != 0 then? I fail to see what the
> advantage is in that case.
Good point. We might actually get even better results by dropping the
__memzero path entirely, since gcc has can optimize trivial memset()
operations and inline them.
If I read arch/arm/lib/memzero.S correctly, it saves exactly two 'orr'
instructions compared to the memset.S implementation, but calling
memset() rather than __memzero() from C code ends up saving a
function call at least some of the time.
Building a defconfig kernel with gcc-7.2.1, I see 1919 calls to __memzero()
and 636 calls to memset() in vmlinux. If I remove the macro entirely,
I get 1775 calls to memset() instead, so 780 memzero instances got
inlined, and kernel shrinks by 5488 bytes (0.03%), not counting the
__memzero implementation that we could possibly also drop.
FWIW, the zero-length check saves five references to __memzero()
and one reference to memset(), or 16 bytes in kernel size, I have not
checked what those are.
Arnd
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] ARM: make memzero optimization smarter
2018-01-16 22:30 ` Arnd Bergmann
@ 2018-01-17 4:07 ` Nicolas Pitre
2018-01-17 10:58 ` Russell King - ARM Linux
0 siblings, 1 reply; 8+ messages in thread
From: Nicolas Pitre @ 2018-01-17 4:07 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Russell King, Ard Biesheuvel, Zhichang Yuan, Linux ARM,
Linux Kernel Mailing List
On Tue, 16 Jan 2018, Arnd Bergmann wrote:
> On Tue, Jan 16, 2018 at 6:10 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > On Tue, 16 Jan 2018, Arnd Bergmann wrote:
> >
> >> However, we can avoid this class of bogus warnings for the memset() macro
> >> by only doing the micro-optimization for zero-length arguments when the
> >> length is a compile-time constant. This should also reduce code size by
> >> a few bytes, and avoid an extra branch for the cases that a variable-length
> >> argument is always nonzero, which is probably the common case anyway.
> >>
> >> I have made sure that the __memzero implementation can safely handle
> >> a zero length argument.
> >
> > Why not simply drop the test on (__n) != 0 then? I fail to see what the
> > advantage is in that case.
>
> Good point. We might actually get even better results by dropping the
> __memzero path entirely, since gcc has can optimize trivial memset()
> operations and inline them.
>
> If I read arch/arm/lib/memzero.S correctly, it saves exactly two 'orr'
> instructions compared to the memset.S implementation, but calling
> memset() rather than __memzero() from C code ends up saving a
> function call at least some of the time.
>
> Building a defconfig kernel with gcc-7.2.1, I see 1919 calls to __memzero()
> and 636 calls to memset() in vmlinux. If I remove the macro entirely,
> I get 1775 calls to memset() instead, so 780 memzero instances got
> inlined, and kernel shrinks by 5488 bytes (0.03%), not counting the
> __memzero implementation that we could possibly also drop.
I get 3668 fewer bytes just by removing the test against 0 in the macro.
And an additional 5092 fewer bytes by removing the call-to-__memzero
optimization.
That's using gcc v6.3.1.
> FWIW, the zero-length check saves five references to __memzero()
> and one reference to memset(), or 16 bytes in kernel size, I have not
> checked what those are.
They apparently are:
security/keys/key.c:1117:2:
memset(&ktype->lock_class, 0, sizeof(ktype->lock_class));
crypto/drbg.c:615:3:
memset(drbg->V, 1, drbg_statelen(drbg));
crypto/drbg.c:1120:3:
memset(drbg->V, 0, drbg_statelen(drbg));
crypto/drbg.c:1121:3:
memset(drbg->C, 0, drbg_statelen(drbg));
drivers/crypto/bcm/cipher.c:1963:2:
memset(ctx->bcm_spu_req_hdr, 0, alloc_len);
drivers/media/platform/vivid/vivid-vbi-cap.c:106:2:
memset(vbuf, 0x10, vb2_plane_size(&buf->vb.vb2_buf, 0));
drivers/media/platform/vivid/vivid-vbi-cap.c:127:2:
memset(vbuf, 0, vb2_plane_size(&buf->vb.vb2_buf, 0));
drivers/gpu/drm/nouveau/nvkm/subdev/bios/conn.c:50:2:
memset(info, 0x00, sizeof(*info));
Nicolas
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] ARM: make memzero optimization smarter
2018-01-17 4:07 ` Nicolas Pitre
@ 2018-01-17 10:58 ` Russell King - ARM Linux
2018-01-17 14:03 ` Nicolas Pitre
0 siblings, 1 reply; 8+ messages in thread
From: Russell King - ARM Linux @ 2018-01-17 10:58 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Arnd Bergmann, Ard Biesheuvel, Zhichang Yuan, Linux ARM,
Linux Kernel Mailing List
On Tue, Jan 16, 2018 at 11:07:34PM -0500, Nicolas Pitre wrote:
> On Tue, 16 Jan 2018, Arnd Bergmann wrote:
>
> > On Tue, Jan 16, 2018 at 6:10 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > > On Tue, 16 Jan 2018, Arnd Bergmann wrote:
> > >
> > >> However, we can avoid this class of bogus warnings for the memset() macro
> > >> by only doing the micro-optimization for zero-length arguments when the
> > >> length is a compile-time constant. This should also reduce code size by
> > >> a few bytes, and avoid an extra branch for the cases that a variable-length
> > >> argument is always nonzero, which is probably the common case anyway.
> > >>
> > >> I have made sure that the __memzero implementation can safely handle
> > >> a zero length argument.
> > >
> > > Why not simply drop the test on (__n) != 0 then? I fail to see what the
> > > advantage is in that case.
> >
> > Good point. We might actually get even better results by dropping the
> > __memzero path entirely, since gcc has can optimize trivial memset()
> > operations and inline them.
> >
> > If I read arch/arm/lib/memzero.S correctly, it saves exactly two 'orr'
> > instructions compared to the memset.S implementation, but calling
> > memset() rather than __memzero() from C code ends up saving a
> > function call at least some of the time.
> >
> > Building a defconfig kernel with gcc-7.2.1, I see 1919 calls to __memzero()
> > and 636 calls to memset() in vmlinux. If I remove the macro entirely,
> > I get 1775 calls to memset() instead, so 780 memzero instances got
> > inlined, and kernel shrinks by 5488 bytes (0.03%), not counting the
> > __memzero implementation that we could possibly also drop.
>
> I get 3668 fewer bytes just by removing the test against 0 in the macro.
>
> And an additional 5092 fewer bytes by removing the call-to-__memzero
> optimization.
However, __memzero is not safe against being called with a zero length
so it's not something we can simply remove.
--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] ARM: make memzero optimization smarter
2018-01-17 10:58 ` Russell King - ARM Linux
@ 2018-01-17 14:03 ` Nicolas Pitre
2018-01-17 21:14 ` Nicolas Pitre
0 siblings, 1 reply; 8+ messages in thread
From: Nicolas Pitre @ 2018-01-17 14:03 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Arnd Bergmann, Ard Biesheuvel, Zhichang Yuan, Linux ARM,
Linux Kernel Mailing List
On Wed, 17 Jan 2018, Russell King - ARM Linux wrote:
> On Tue, Jan 16, 2018 at 11:07:34PM -0500, Nicolas Pitre wrote:
> > On Tue, 16 Jan 2018, Arnd Bergmann wrote:
> >
> > > On Tue, Jan 16, 2018 at 6:10 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > > > On Tue, 16 Jan 2018, Arnd Bergmann wrote:
> > > >
> > > >> However, we can avoid this class of bogus warnings for the memset() macro
> > > >> by only doing the micro-optimization for zero-length arguments when the
> > > >> length is a compile-time constant. This should also reduce code size by
> > > >> a few bytes, and avoid an extra branch for the cases that a variable-length
> > > >> argument is always nonzero, which is probably the common case anyway.
> > > >>
> > > >> I have made sure that the __memzero implementation can safely handle
> > > >> a zero length argument.
> > > >
> > > > Why not simply drop the test on (__n) != 0 then? I fail to see what the
> > > > advantage is in that case.
> > >
> > > Good point. We might actually get even better results by dropping the
> > > __memzero path entirely, since gcc has can optimize trivial memset()
> > > operations and inline them.
> > >
> > > If I read arch/arm/lib/memzero.S correctly, it saves exactly two 'orr'
> > > instructions compared to the memset.S implementation, but calling
> > > memset() rather than __memzero() from C code ends up saving a
> > > function call at least some of the time.
> > >
> > > Building a defconfig kernel with gcc-7.2.1, I see 1919 calls to __memzero()
> > > and 636 calls to memset() in vmlinux. If I remove the macro entirely,
> > > I get 1775 calls to memset() instead, so 780 memzero instances got
> > > inlined, and kernel shrinks by 5488 bytes (0.03%), not counting the
> > > __memzero implementation that we could possibly also drop.
> >
> > I get 3668 fewer bytes just by removing the test against 0 in the macro.
> >
> > And an additional 5092 fewer bytes by removing the call-to-__memzero
> > optimization.
>
> However, __memzero is not safe against being called with a zero length
> so it's not something we can simply remove.
The idea is about the possibility of removing __memzero altogether.
It is not clear that the tiny performance gain from a dedicated memzero
implementation is worth the current overhead around it.
Nicolas
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] ARM: make memzero optimization smarter
2018-01-17 14:03 ` Nicolas Pitre
@ 2018-01-17 21:14 ` Nicolas Pitre
0 siblings, 0 replies; 8+ messages in thread
From: Nicolas Pitre @ 2018-01-17 21:14 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Arnd Bergmann, Ard Biesheuvel, Zhichang Yuan, Linux ARM,
Linux Kernel Mailing List
On Wed, 17 Jan 2018, Nicolas Pitre wrote:
> On Wed, 17 Jan 2018, Russell King - ARM Linux wrote:
>
> > However, __memzero is not safe against being called with a zero length
> > so it's not something we can simply remove.
>
> The idea is about the possibility of removing __memzero altogether.
> It is not clear that the tiny performance gain from a dedicated memzero
> implementation is worth the current overhead around it.
This being said, I fail to see how __memzero is not safe against a zero
length. Are you sure it isn't?
Nicolas
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] ARM: make memzero optimization smarter
@ 2017-09-05 15:05 Arnd Bergmann
0 siblings, 0 replies; 8+ messages in thread
From: Arnd Bergmann @ 2017-09-05 15:05 UTC (permalink / raw)
To: Russell King; +Cc: Arnd Bergmann, linux-arm-kernel, linux-kernel
While testing with a gcc-8.0.0 snapshot, I ran into a harmless
build warning:
In file included from include/linux/string.h:18:0,
...
from drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:10:
drivers/net/ethernet/hisilicon/hns/hns_ethtool.c: In function '__lb_other_process':
arch/arm/include/asm/string.h:50:5: error: 'memset' specified size 4294967295 exceeds maximum object size 2147483647 [-Werror=stringop-overflow=]
memset((__p),(v),(__n)); \
^~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:394:3: note: in expansion of macro 'memset'
memset(&skb->data[frame_size / 2], 0xAA, frame_size / 2 - 1);
^~~~~~
I think the warning is unintentional here, and gcc should not actually
warn, so I reported this in the gcc bugzilla as pr82103.
The problem here is that testing the 'frame_size' variable for non-zero
in the first memset() macro invocation leads to a code path in which
gcc thinks it may be zero, and that code path would lead to an overly
large length for the following memset that is now "(u32)-1". We know
this won't happen as the skb len is already guaranteed to be nonzero
when we get here (it has just been allocated with a nonzero size).
However, we can avoid this class of bogus warnings for the memset() macro
by only doing the micro-optimization for zero-length arguments when the
length is a compile-time constant. This should also reduce code size by
a few bytes, and avoid an extra branch for the cases that a variable-length
argument is always nonzero, which is probably the common case anyway.
I have made sure that the __memzero implementation can safely handle
a zero length argument.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82103
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
arch/arm/include/asm/string.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/include/asm/string.h b/arch/arm/include/asm/string.h
index fe1c6af3a1b1..d4f464b46eae 100644
--- a/arch/arm/include/asm/string.h
+++ b/arch/arm/include/asm/string.h
@@ -43,7 +43,7 @@ extern void __memzero(void *ptr, __kernel_size_t n);
#define memset(p,v,n) \
({ \
void *__p = (p); size_t __n = n; \
- if ((__n) != 0) { \
+ if (!__builtin_constant_p(__n) || (__n) != 0) { \
if (__builtin_constant_p((v)) && (v) == 0) \
__memzero((__p),(__n)); \
else \
--
2.9.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2018-01-17 21:14 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-16 16:28 [PATCH] ARM: make memzero optimization smarter Arnd Bergmann
2018-01-16 17:10 ` Nicolas Pitre
2018-01-16 22:30 ` Arnd Bergmann
2018-01-17 4:07 ` Nicolas Pitre
2018-01-17 10:58 ` Russell King - ARM Linux
2018-01-17 14:03 ` Nicolas Pitre
2018-01-17 21:14 ` Nicolas Pitre
-- strict thread matches above, loose matches on Subject: below --
2017-09-05 15:05 Arnd Bergmann
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).