From: Nicolas Pitre <nicolas.pitre@linaro.org>
To: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Zhichang Yuan <yuanzhichang@hisilicon.com>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] ARM: make memzero optimization smarter
Date: Tue, 16 Jan 2018 23:07:34 -0500 (EST)
Message-ID: <nycvar.YSQ.7.76.1801162150190.13881@knanqh.ubzr>
In-Reply-To: <CAK8P3a2bfErHxu2i-3JJpUdG64RSoYgmpip2xLAutiSSM9UEkw@mail.gmail.com>

On Tue, 16 Jan 2018, Arnd Bergmann wrote:

> On Tue, Jan 16, 2018 at 6:10 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > On Tue, 16 Jan 2018, Arnd Bergmann wrote:
> >
> >> However, we can avoid this class of bogus warnings for the memset() macro
> >> by only doing the micro-optimization for zero-length arguments when the
> >> length is a compile-time constant. This should also reduce code size by
> >> a few bytes, and avoid an extra branch for the cases that a variable-length
> >> argument is always nonzero, which is probably the common case anyway.
> >>
> >> I have made sure that the __memzero implementation can safely handle
> >> a zero length argument.
> >
> > Why not simply drop the test on (__n) != 0 then? I fail to see what the
> > advantage is in that case.
>
> Good point. We might actually get even better results by dropping the
> __memzero path entirely, since gcc has can optimize trivial memset()
> operations and inline them.
>
> If I read arch/arm/lib/memzero.S correctly, it saves exactly two 'orr'
> instructions compared to the memset.S implementation, but calling
> memset() rather than __memzero() from C code ends up saving a
> function call at least some of the time.
>
> Building a defconfig kernel with gcc-7.2.1, I see 1919 calls to __memzero()
> and 636 calls to memset() in vmlinux. If I remove the macro entirely,
> I get 1775 calls to memset() instead, so 780 memzero instances got
> inlined, and kernel shrinks by 5488 bytes (0.03%), not counting the
> __memzero implementation that we could possibly also drop.

I get 3668 fewer bytes just by removing the test against 0 in the macro.
And an additional 5092 fewer bytes by removing the call-to-__memzero
optimization. That's using gcc v6.3.1.

> FWIW, the zero-length check saves five references to __memzero()
> and one reference to memset(), or 16 bytes in kernel size, I have not
> checked what those are.

They apparently are:

security/keys/key.c:1117:2:
	memset(&ktype->lock_class, 0, sizeof(ktype->lock_class));
crypto/drbg.c:615:3:
	memset(drbg->V, 1, drbg_statelen(drbg));
crypto/drbg.c:1120:3:
	memset(drbg->V, 0, drbg_statelen(drbg));
crypto/drbg.c:1121:3:
	memset(drbg->C, 0, drbg_statelen(drbg));
drivers/crypto/bcm/cipher.c:1963:2:
	memset(ctx->bcm_spu_req_hdr, 0, alloc_len);
drivers/media/platform/vivid/vivid-vbi-cap.c:106:2:
	memset(vbuf, 0x10, vb2_plane_size(&buf->vb.vb2_buf, 0));
drivers/media/platform/vivid/vivid-vbi-cap.c:127:2:
	memset(vbuf, 0, vb2_plane_size(&buf->vb.vb2_buf, 0));
drivers/gpu/drm/nouveau/nvkm/subdev/bios/conn.c:50:2:
	memset(info, 0x00, sizeof(*info));


Nicolas
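
For readers following the thread, the macro variants being compared can be sketched in plain C. This is only an illustration under stated assumptions, not the kernel source: my_memzero(), memset_checked(), memset_unchecked() and memset_demo() are hypothetical names, my_memzero() stands in for the ARM assembly __memzero(), and the macros rely on GNU statement expressions and __builtin_constant_p(), so they need gcc or clang.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the ARM assembly __memzero(); counts its calls. */
static unsigned long memzero_calls;

static void my_memzero(void *p, size_t n)
{
	memzero_calls++;
	memset(p, 0, n);	/* well defined for n == 0 */
}

/*
 * Variant A (assumed shape of the macro under discussion): a constant zero
 * fill value is routed to the memzero helper, and a run-time test skips
 * zero-length calls.
 */
#define memset_checked(p, v, n)						\
	({								\
		void *p_ = (p);						\
		size_t n_ = (n);					\
		if (n_ != 0) {						\
			if (__builtin_constant_p(v) && (v) == 0)	\
				my_memzero(p_, n_);			\
			else						\
				memset(p_, (v), n_);			\
		}							\
		p_;							\
	})

/* Variant B: same dispatch with the test against 0 removed. */
#define memset_unchecked(p, v, n)					\
	({								\
		void *p_ = (p);						\
		size_t n_ = (n);					\
		if (__builtin_constant_p(v) && (v) == 0)		\
			my_memzero(p_, n_);				\
		else							\
			memset(p_, (v), n_);				\
		p_;							\
	})

/* Exercises both variants; returns the number of my_memzero() calls made. */
unsigned long memset_demo(void)
{
	char buf[8];

	memzero_calls = 0;
	memset(buf, 0x55, sizeof(buf));

	(void)memset_checked(buf, 0, sizeof(buf));	/* helper called */
	assert(buf[0] == 0 && buf[7] == 0);
	(void)memset_checked(buf, 0, (size_t)0);	/* skipped by the length test */
	(void)memset_unchecked(buf, 0, (size_t)0);	/* helper called with n == 0 */
	(void)memset_checked(buf, 1, 4);		/* nonzero fill: plain memset() */
	assert(buf[3] == 1 && buf[4] == 0);

	return memzero_calls;
}
```

Variant B is only safe if the helper tolerates a zero length, which is the property Arnd says he verified for __memzero. The third option discussed above, removing the macro entirely, leaves an ordinary memset() call that the compiler is free to inline when the size is a small constant.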