* [PATCH] lib/crc32: slice by 4 is more efficient than the default slice by 8 on Powerpc 8xx.
@ 2013-11-18 7:04 Christophe Leroy
2013-11-19 14:11 ` Joakim Tjernlund
0 siblings, 1 reply; 5+ messages in thread
From: Christophe Leroy @ 2013-11-18 7:04 UTC (permalink / raw)
To: Vitaly Bordug, Marcelo Tosatti, Joakim Tjernlund, Bob Pearson
Cc: linuxppc-dev, linux-kernel
On PPC_8xx, CRC32_SLICEBY4 is more efficient than CRC32_SLICEBY8 (almost
twice as fast in the crc32 self test), as shown below:
With CRC32_SLICEBY8:
[ 1.109204] crc32: CRC_LE_BITS = 64, CRC_BE BITS = 64
[ 1.114401] crc32: self tests passed, processed 225944 bytes in 15118910 nsec
[ 1.130655] crc32c: CRC_LE_BITS = 64
[ 1.134235] crc32c: self tests passed, processed 225944 bytes in 4479879 nsec
With CRC32_SLICEBY4:
[ 1.097129] crc32: CRC_LE_BITS = 32, CRC_BE BITS = 32
[ 1.101878] crc32: self tests passed, processed 225944 bytes in 8616242 nsec
[ 1.116298] crc32c: CRC_LE_BITS = 32
[ 1.119607] crc32c: self tests passed, processed 225944 bytes in 3289576 nsec
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Index: a/lib/Kconfig
===================================================================
--- a/lib/Kconfig	(revision 5325)
+++ b/lib/Kconfig	(working copy)
@@ -102,6 +102,7 @@
 choice
 	prompt "CRC32 implementation"
 	depends on CRC32
+	default CRC32_SLICEBY4 if PPC_8xx
 	default CRC32_SLICEBY8
 	help
 	  This option allows a kernel builder to override the default choice
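
To put numbers on the self-test lines in the patch description, the following Python sketch derives throughput and the slice-by-8 to slice-by-4 time ratio. Only the byte count and the nanosecond values come from the quoted log; the function and variable names are illustrative and not part of the patch.

```python
# Figures copied from the self-test log lines in the patch description.
BYTES = 225944

timings_nsec = {
    "crc32":  {"SLICEBY8": 15118910, "SLICEBY4": 8616242},
    "crc32c": {"SLICEBY8": 4479879,  "SLICEBY4": 3289576},
}


def throughput_mb_s(nbytes, nsec):
    """Bytes per nanosecond scaled to MB/s (1 MB = 10**6 bytes)."""
    return nbytes / nsec * 1e3


def speedup(test):
    """Time taken with slice-by-8 divided by time taken with slice-by-4."""
    t = timings_nsec[test]
    return t["SLICEBY8"] / t["SLICEBY4"]


for test, t in timings_nsec.items():
    print(f"{test}: "
          f"slice-by-8 {throughput_mb_s(BYTES, t['SLICEBY8']):.1f} MB/s, "
          f"slice-by-4 {throughput_mb_s(BYTES, t['SLICEBY4']):.1f} MB/s, "
          f"ratio {speedup(test):.2f}")
```

On these figures the ratio is about 1.75 for crc32 and about 1.36 for crc32c, so "almost twice" describes the crc32 test rather than both.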
* Re: [PATCH] lib/crc32: slice by 4 is more efficient than the default slice by 8 on Powerpc 8xx.
2013-11-18 7:04 [PATCH] lib/crc32: slice by 4 is more efficient than the default slice by 8 on Powerpc 8xx Christophe Leroy
@ 2013-11-19 14:11 ` Joakim Tjernlund
2013-11-19 18:29 ` Scott Wood
0 siblings, 1 reply; 5+ messages in thread
From: Joakim Tjernlund @ 2013-11-19 14:11 UTC (permalink / raw)
To: Christophe Leroy; +Cc: Marcelo Tosatti, Bob Pearson, linuxppc-dev, linux-kernel
I found the same on MPC8321 a long time ago (when the 64-bit change went in);
the 32-bit version was much faster. I guess the "smaller" CPUs cannot handle
the cache thrashing these big tables impose. I didn't look into the details
though.
So I think this is a good change for 8xx.
Acked-by: Joakim Tjernlund <joakim.tjernlund@transmode.se>
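
The cache remark can be made concrete with table sizes. Slicing-by-N uses N lookup tables of 256 32-bit entries each, so one set of slice-by-8 tables takes twice the memory of a slice-by-4 set. The Python sketch below is a generic slicing-by-N CRC-32 written for illustration, not the kernel's C implementation; the names make_tables, crc32_sliced and table_bytes are hypothetical, and the result is checked against zlib only to show that the sketch computes a standard CRC-32.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial, the one zlib uses


def make_tables(n):
    """Build n slicing tables of 256 32-bit entries each."""
    t0 = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
        t0.append(crc)
    tables = [t0]
    for k in range(1, n):
        prev = tables[k - 1]
        # Table k advances each entry of table k-1 by one more zero byte.
        tables.append([(prev[i] >> 8) ^ t0[prev[i] & 0xFF] for i in range(256)])
    return tables


def table_bytes(n):
    """Memory for one set of tables: n tables * 256 entries * 4 bytes."""
    return n * 256 * 4


def crc32_sliced(data, tables):
    """CRC-32 consuming len(tables) bytes per step (4 or 8 here)."""
    n = len(tables)
    crc = 0xFFFFFFFF
    pos = 0
    while len(data) - pos >= n:
        chunk = bytearray(data[pos:pos + n])
        # Fold the running CRC into the first four bytes (little-endian).
        for j in range(4):
            chunk[j] ^= (crc >> (8 * j)) & 0xFF
        # One lookup per byte; the earliest byte uses the highest table.
        crc = 0
        for j in range(n):
            crc ^= tables[n - 1 - j][chunk[j]]
        pos += n
    # Remaining tail bytes go through the plain byte-at-a-time loop.
    for b in data[pos:]:
        crc = (crc >> 8) ^ tables[0][(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF


msg = b"123456789"
print(hex(crc32_sliced(msg, make_tables(4))), table_bytes(4))
print(hex(crc32_sliced(msg, make_tables(8))), table_bytes(8))
print(hex(zlib.crc32(msg)))
```

Each step of slice-by-8 touches eight different tables spread over 8 KiB, against four tables over 4 KiB for slice-by-4. If the core's data cache is of the same order as those sizes (an assumption; the thread gives no cache figures), the larger working set would evict more of the data being checksummed, which would fit the timings reported in the patch.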
* Re: [PATCH] lib/crc32: slice by 4 is more efficient than the default slice by 8 on Powerpc 8xx.
2013-11-19 14:11 ` Joakim Tjernlund
@ 2013-11-19 18:29 ` Scott Wood
2013-11-19 23:39 ` Joakim Tjernlund
0 siblings, 1 reply; 5+ messages in thread
From: Scott Wood @ 2013-11-19 18:29 UTC (permalink / raw)
To: Joakim Tjernlund
Cc: Christophe Leroy, Marcelo Tosatti, Bob Pearson, linuxppc-dev,
linux-kernel
I don't think we should go littering the Kconfig with defaults for
various bits of hardware -- especially since you've already pointed out
non-8xx hardware that would also want this. Put it in defconfig
instead, unless you can identify very broad classes of machines for
which SLICEBY4 is faster.
-Scott
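
The defconfig alternative suggested above would amount to a one-line entry in a board configuration rather than a new default in lib/Kconfig. A minimal sketch, assuming an 8xx board defconfig under arch/powerpc/configs/ (the thread does not name a specific file):

```
# in the board's defconfig (file name left open here)
CONFIG_CRC32_SLICEBY4=y
```

Because CRC32_SLICEBY4 is a member of the "CRC32 implementation" choice, setting it selects slice-by-4 instead of CRC32_SLICEBY8 for that configuration only and leaves the Kconfig default unchanged for everyone else.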
* Re: [PATCH] lib/crc32: slice by 4 is more efficient than the default slice by 8 on Powerpc 8xx.
2013-11-19 18:29 ` Scott Wood
@ 2013-11-19 23:39 ` Joakim Tjernlund
2013-11-19 23:43 ` Scott Wood
0 siblings, 1 reply; 5+ messages in thread
From: Joakim Tjernlund @ 2013-11-19 23:39 UTC (permalink / raw)
To: Scott Wood
Cc: Christophe Leroy, Marcelo Tosatti, Bob Pearson, linuxppc-dev,
linux-kernel
Scott Wood <scottwood@freescale.com> wrote on 2013/11/19 19:29:26:
>
> I don't think we should go littering the Kconfig with defaults for
> various bits of hardware -- especially since you've already pointed out
> non-8xx hardware that would also want this. Put it in defconfig
> instead, unless you can identify very broad classes of machines for
> which SLICEBY4 is faster.
Hmm, when the 64-bit version went in there was not much proof that it was
faster for a wide range of CPUs, just 2 or 3 if I recall correctly. I suspect
there are quite a few CPUs where 32 bits is equal or faster.
Jocke
* Re: [PATCH] lib/crc32: slice by 4 is more efficient than the default slice by 8 on Powerpc 8xx.
2013-11-19 23:39 ` Joakim Tjernlund
@ 2013-11-19 23:43 ` Scott Wood
0 siblings, 0 replies; 5+ messages in thread
From: Scott Wood @ 2013-11-19 23:43 UTC (permalink / raw)
To: Joakim Tjernlund
Cc: Christophe Leroy, Marcelo Tosatti, Bob Pearson, linuxppc-dev,
linux-kernel
On Wed, 2013-11-20 at 00:39 +0100, Joakim Tjernlund wrote:
> Scott Wood <scottwood@freescale.com> wrote on 2013/11/19 19:29:26:
> >
> > I don't think we should go littering the Kconfig with defaults for
> > various bits of hardware -- especially since you've already pointed out
> > non-8xx hardware that would also want this. Put it in defconfig
> > instead, unless you can identify very broad classes of machines for
> > which SLICEBY4 is faster.
>
> Hmm, when the 64-bit version went in there was not much proof that it was
> faster for a wide range of CPUs, just 2 or 3 if I recall correctly. I suspect
> there are quite a few CPUs where 32 bits is equal or faster.
That may be the case, but I don't think we want a big list of them in
lib/Kconfig. Whether the default should change (for all targets that
don't override it in defconfig, or at least for some broader category
such as "all 32-bit chips") is a different discussion.
-Scott