From mboxrd@z Thu Jan 1 00:00:00 1970
From: Heiko Stuebner
Date: Mon, 27 Mar 2017 17:17:08 +0200
Subject: [U-Boot] [PATCH 2/3] string: Provide a slimmed-down memset()
In-Reply-To: <2c3809e6-20f1-bbd3-9775-7ef5015d6193@suse.de>
References: <20170326233817.8834-1-sjg@chromium.org> <20170326233817.8834-3-sjg@chromium.org> <2c3809e6-20f1-bbd3-9775-7ef5015d6193@suse.de>
Message-ID: <2820179.4CMHj66v3A@phil>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
To: u-boot@lists.denx.de

On Monday, 27 March 2017, 09:14:47 CEST, Alexander Graf wrote:
>
> On 27/03/2017 01:38, Simon Glass wrote:
> > Most of the time the optimised memset() is what we want. For extreme
> > situations such as TPL it may be too large. For example on the 'rock'
> > board, using a simple loop saves a useful 48 bytes. With gcc 4.9 and
> > the rodata bug, this patch is enough to reduce the TPL image below the
> > limit.
> >
> > Signed-off-by: Simon Glass
> > ---
> >
> >  lib/Kconfig  | 9 +++++++++
> >  lib/string.c | 6 ++++--
> >  2 files changed, 13 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/Kconfig b/lib/Kconfig
> > index 65c01573e1..5bf512d8c0 100644
> > --- a/lib/Kconfig
> > +++ b/lib/Kconfig
> > @@ -52,6 +52,15 @@ config LIB_RAND
> >      help
> >        This library provides pseudo-random number generator functions.
> >
> > +config FAST_MEMSET
> > +    bool "Use an optimised memset()"
> > +    default y
> > +    help
> > +      The faster memset() is the arch-specific one (if available) enabled
> > +      by CONFIG_USE_ARCH_MEMSET. If that is not enabled, we can still get
> > +      better performance by write a word at a time. Disable this option
> > +      to reduce code size slightly at the cost of some speed.
>
> The comment sounds slightly confused - it took me a few times of reading
> it until I grasped what it was trying to tell me :).
>
> > +
> >  source lib/dhry/Kconfig
> >
> >  source lib/rsa/Kconfig
> > diff --git a/lib/string.c b/lib/string.c
> > index 67d5f6a421..159493ed17 100644
> > --- a/lib/string.c
> > +++ b/lib/string.c
> > @@ -437,8 +437,10 @@ char *strswab(const char *s)
> >  void * memset(void * s,int c,size_t count)
> >  {
> >      unsigned long *sl = (unsigned long *) s;
> > -    unsigned long cl = 0;
> >      char *s8;
> > +
> > +#ifdef CONFIG_FAST_MEMSET
> > +    unsigned long cl = 0;
> >      int i;
> >
> >      /* do it one word at a time (32 bits or 64 bits) while possible */
> > @@ -452,7 +454,7 @@ void * memset(void * s,int c,size_t count)
> >              count -= sizeof(*sl);
> >          }
> >      }
> > -    /* fill 8 bits at a time */
> > +#endif /* fill 8 bits at a time */
>
> So while this is all neat, a few ideas:
>
> 1) Would having memset in a header improve things even more? After all,
> each external function call clobbers registers that you need to
> save/restore...

I'd guess it really depends on the size constraints. The regular libgeneric
memset compiles on my rk3188 TPL to a total of 64 bytes with both gcc-4.9 and
gcc-6.3, while Simon's slimmed-down memset (CONFIG_FAST_MEMSET disabled) comes
down to 14 bytes.

On the rk3188 the only memset user is board_init_f, so memset is called only
once there, without needing to save registers. And I'd guess that if an
implementation really is so size-constrained that it has to worry about
50 bytes, this one caller will probably always be the only one?

> 2) How much would GOLD save you? Have you tried? U-Boot is small enough
> of a code base that global optimizations should be able to give
> significant size savings.

I think the issue this is trying to solve is allowing more toolchains to be
used, so that rebuilds after a change work on a lot of boards at the same time
with random toolchains. gcc-6.3 already produces way smaller results (well
within the size constraints the rk3188 has) than, for example, the gcc-4.9
used by buildman as its baseline toolchain.
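
For anyone wanting to compare the two code paths outside of a U-Boot tree,
here is a minimal standalone sketch of the patched function. It is not the
code from lib/string.c: the names sketch_memset, sketch_demo and the
SKETCH_FAST_MEMSET switch are made up for illustration, with the switch
standing in for CONFIG_FAST_MEMSET, and the byte counts quoted above will
of course depend on target and toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical switch standing in for CONFIG_FAST_MEMSET from the patch. */
#ifndef SKETCH_FAST_MEMSET
#define SKETCH_FAST_MEMSET 1
#endif

/* Named sketch_memset (not memset) to avoid clashing with the C library. */
void *sketch_memset(void *s, int c, size_t count)
{
	unsigned long *sl = (unsigned long *)s;
	unsigned char *s8;

#if SKETCH_FAST_MEMSET
	/* Word path: only taken when s is aligned to the word size. */
	if (((uintptr_t)s & (sizeof(*sl) - 1)) == 0) {
		unsigned long cl = 0;
		size_t i;

		/* Replicate the fill byte into every byte of the word. */
		for (i = 0; i < sizeof(*sl); i++) {
			cl <<= 8;
			cl |= (unsigned long)(c & 0xff);
		}
		while (count >= sizeof(*sl)) {
			*sl++ = cl;
			count -= sizeof(*sl);
		}
	}
#endif
	/* Byte path: the tail, or everything when the word path is off. */
	s8 = (unsigned char *)sl;
	while (count--)
		*s8++ = (unsigned char)c;

	return s;
}

/* Usage: clear a word-aligned buffer, roughly what a single early caller does. */
int sketch_demo(void)
{
	unsigned long buf[4];

	sketch_memset(buf, 0, sizeof(buf));
	assert(buf[0] == 0 && buf[3] == 0);
	return 0;
}
```

Building this once as is and once with -DSKETCH_FAST_MEMSET=0, both at -Os,
and running size(1) on the two objects should show the kind of difference
being discussed, though I would not expect the exact numbers to match the
rk3188 TPL figures.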