linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH] lib/crc32: slice by 4 is more efficient than the default slice by 8 on Powerpc 8xx.
@ 2013-11-18  7:04 Christophe Leroy
  2013-11-19 14:11 ` Joakim Tjernlund
  0 siblings, 1 reply; 5+ messages in thread
From: Christophe Leroy @ 2013-11-18  7:04 UTC (permalink / raw)
  To: Vitaly Bordug, Marcelo Tosatti, Joakim Tjernlund, Bob Pearson
  Cc: linuxppc-dev, linux-kernel


On PPC_8xx, CRC32_SLICEBY4 is faster than CRC32_SLICEBY8 (about 1.75x for
crc32 and about 1.36x for crc32c in the self tests), as shown below:

With CRC32_SLICEBY8:
[    1.109204] crc32: CRC_LE_BITS = 64, CRC_BE BITS = 64
[    1.114401] crc32: self tests passed, processed 225944 bytes in 15118910 nsec
[    1.130655] crc32c: CRC_LE_BITS = 64
[    1.134235] crc32c: self tests passed, processed 225944 bytes in 4479879 nsec

With CRC32_SLICEBY4:
[    1.097129] crc32: CRC_LE_BITS = 32, CRC_BE BITS = 32
[    1.101878] crc32: self tests passed, processed 225944 bytes in 8616242 nsec
[    1.116298] crc32c: CRC_LE_BITS = 32
[    1.119607] crc32c: self tests passed, processed 225944 bytes in 3289576 nsec

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>

Index: a/lib/Kconfig
===================================================================
--- a/lib/Kconfig	(revision 5325)
+++ b/lib/Kconfig	(working copy)
@@ -102,6 +102,7 @@
 choice
 	prompt "CRC32 implementation"
 	depends on CRC32
+	default CRC32_SLICEBY4 if PPC_8xx
 	default CRC32_SLICEBY8
 	help
 	  This option allows a kernel builder to override the default choice
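
To put the self-test timings above in perspective, here is a small sketch that turns them into throughput and speedup figures. The byte and nanosecond counts are copied from the log lines in this message; the helper names are made up for illustration.

```python
# Figures copied from the self-test log lines in the patch description.
BYTES = 225944
NSEC = {
    ("crc32", 8): 15118910,   # CRC32_SLICEBY8
    ("crc32", 4): 8616242,    # CRC32_SLICEBY4
    ("crc32c", 8): 4479879,
    ("crc32c", 4): 3289576,
}


def throughput_mb_s(nsec):
    """Bytes per nanosecond scaled to MB/s (1 MB = 10**6 bytes)."""
    return BYTES / nsec * 1000.0


def speedup(algo):
    """How many times faster slice-by-4 ran than slice-by-8."""
    return NSEC[(algo, 8)] / NSEC[(algo, 4)]


if __name__ == "__main__":
    for algo in ("crc32", "crc32c"):
        by8 = throughput_mb_s(NSEC[(algo, 8)])
        by4 = throughput_mb_s(NSEC[(algo, 4)])
        print(f"{algo}: by8 {by8:.1f} MB/s, by4 {by4:.1f} MB/s, "
              f"speedup {speedup(algo):.2f}x")
```

The gain is clearly larger for crc32 than for crc32c in this run, so "almost twice" applies mainly to the former.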


* Re: [PATCH] lib/crc32: slice by 4 is more efficient than the default slice by 8 on Powerpc 8xx.
  2013-11-18  7:04 [PATCH] lib/crc32: slice by 4 is more efficient than the default slice by 8 on Powerpc 8xx Christophe Leroy
@ 2013-11-19 14:11 ` Joakim Tjernlund
  2013-11-19 18:29   ` Scott Wood
  0 siblings, 1 reply; 5+ messages in thread
From: Joakim Tjernlund @ 2013-11-19 14:11 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: Marcelo Tosatti, Bob Pearson, linuxppc-dev, linux-kernel

I found the same on an MPC8321 a long time ago (when the 64-bit change went in):
the 32-bit version was much faster. I guess the "smaller" CPUs cannot handle
the cache thrashing these big tables impose; I didn't look into the details,
though.
So I think this is a good change for 8xx.

Acked-by: Joakim Tjernlund <joakim.tjernlund@transmode.se>
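
For readers unfamiliar with the table sizes behind the cache argument above, here is a minimal sketch of table-driven "slicing" CRC-32. It is not the kernel's C implementation; the function names are invented for illustration, and it only assumes the standard reflected CRC-32 polynomial so the result can be checked against Python's zlib. The point is the footprint: each slice adds one 256-entry table of 32-bit words, so slice-by-4 needs 4 KiB of tables and slice-by-8 needs 8 KiB, which matters on cores with only a few KiB of data cache.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial, as used by zlib


def make_tables(rows):
    """Build `rows` lookup tables of 256 32-bit entries each."""
    t0 = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ (POLY if c & 1 else 0)
        t0.append(c)
    tables = [t0]
    for _ in range(1, rows):
        prev = tables[-1]
        # Table n is table n-1 advanced by one more zero byte.
        tables.append([(v >> 8) ^ t0[v & 0xFF] for v in prev])
    return tables


def table_bytes(rows):
    """Memory footprint of the tables, assuming 4-byte entries."""
    return rows * 256 * 4


def crc32_sliced(data, rows):
    """CRC-32 of `data`, consuming `rows` bytes per step (rows = 4 or 8)."""
    tables = make_tables(rows)
    t0 = tables[0]
    crc = 0xFFFFFFFF
    n = len(data) - len(data) % rows
    for i in range(0, n, rows):
        chunk = bytearray(data[i:i + rows])
        # Fold the running CRC into the first four bytes of the chunk.
        for k in range(4):
            chunk[k] ^= (crc >> (8 * k)) & 0xFF
        crc = 0
        # The first byte has the most bytes after it, so it uses the last table.
        for k in range(rows):
            crc ^= tables[rows - 1 - k][chunk[k]]
    for b in data[n:]:  # leftover bytes, one at a time
        crc = (crc >> 8) ^ t0[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    msg = b"slice by 4 versus slice by 8 on a small cache"
    for rows in (4, 8):
        assert crc32_sliced(msg, rows) == zlib.crc32(msg)
        print(rows, "tables:", table_bytes(rows), "bytes")
```

Both variants compute the same checksum; slice-by-8 does fewer loop iterations per byte but touches twice as much table data, which is the trade-off being discussed here.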



* Re: [PATCH] lib/crc32: slice by 4 is more efficient than the default slice by 8 on Powerpc 8xx.
  2013-11-19 14:11 ` Joakim Tjernlund
@ 2013-11-19 18:29   ` Scott Wood
  2013-11-19 23:39     ` Joakim Tjernlund
  0 siblings, 1 reply; 5+ messages in thread
From: Scott Wood @ 2013-11-19 18:29 UTC (permalink / raw)
  To: Joakim Tjernlund
  Cc: Christophe Leroy, Marcelo Tosatti, Bob Pearson, linuxppc-dev,
	linux-kernel

I don't think we should go littering the Kconfig with defaults for
various bits of hardware -- especially since you've already pointed out
non-8xx hardware that would also want this.  Put it in defconfig
instead, unless you can identify very broad classes of machines for
which SLICEBY4 is faster.

-Scott
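
As an illustration of the defconfig route suggested above, the change would be a one-line fragment along these lines. The file name is only an example of an 8xx board defconfig and is an assumption, not something named in this thread; the symbol comes from the "CRC32 implementation" choice in lib/Kconfig.

```
# arch/powerpc/configs/mpc885_ads_defconfig  (example file name, an assumption)
CONFIG_CRC32_SLICEBY4=y
```

Because the options belong to a Kconfig choice, selecting CRC32_SLICEBY4 this way deselects CRC32_SLICEBY8 for that board without touching the global default.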



* Re: [PATCH] lib/crc32: slice by 4 is more efficient than the default slice by 8 on Powerpc 8xx.
  2013-11-19 18:29   ` Scott Wood
@ 2013-11-19 23:39     ` Joakim Tjernlund
  2013-11-19 23:43       ` Scott Wood
  0 siblings, 1 reply; 5+ messages in thread
From: Joakim Tjernlund @ 2013-11-19 23:39 UTC (permalink / raw)
  To: Scott Wood
  Cc: Christophe Leroy, Marcelo Tosatti, Bob Pearson, linuxppc-dev,
	linux-kernel

Scott Wood <scottwood@freescale.com> wrote on 2013/11/19 19:29:26:
>
> I don't think we should go littering the Kconfig with defaults for
> various bits of hardware -- especially since you've already pointed out
> non-8xx hardware that would also want this.  Put it in defconfig
> instead, unless you can identify very broad classes of machines for
> which SLICEBY4 is faster.

Hmm, when the 64-bit version went in there was not much proof that it was
faster for a wide range of CPUs, just 2 or 3 if I recall correctly. I suspect
there are quite a few CPUs where 32 bits is equal or faster.

  Jocke



* Re: [PATCH] lib/crc32: slice by 4 is more efficient than the default slice by 8 on Powerpc 8xx.
  2013-11-19 23:39     ` Joakim Tjernlund
@ 2013-11-19 23:43       ` Scott Wood
  0 siblings, 0 replies; 5+ messages in thread
From: Scott Wood @ 2013-11-19 23:43 UTC (permalink / raw)
  To: Joakim Tjernlund
  Cc: Christophe Leroy, Marcelo Tosatti, Bob Pearson, linuxppc-dev,
	linux-kernel

On Wed, 2013-11-20 at 00:39 +0100, Joakim Tjernlund wrote:
> Scott Wood <scottwood@freescale.com> wrote on 2013/11/19 19:29:26:
> > 
> > I don't think we should go littering the Kconfig with defaults for
> > various bits of hardware -- especially since you've already pointed out
> > non-8xx hardware that would also want this.  Put it in defconfig
> > instead, unless you can identify very broad classes of machines for
> > which SLICEBY4 is faster.
> 
> Hmm, when the 64-bit version went in there was not much proof that it was
> faster for a wide range of CPUs, just 2 or 3 if I recall correctly. I suspect
> there are quite a few CPUs where 32 bits is equal or faster.

That may be the case, but I don't think we want a big list of them in
lib/Kconfig.  Whether the default should change (for all targets that
don't override it in defconfig, or at least for some broader category
such as "all 32-bit chips") is a different discussion.

-Scott


end of thread, other threads:[~2013-11-19 23:43 UTC | newest]

