* [PATCH] lib/crc32: slice by 4 is more efficient than the default slice by 8 on Powerpc 8xx.
@ 2013-11-18 7:04 Christophe Leroy
2013-11-19 14:11 ` Joakim Tjernlund
0 siblings, 1 reply; 5+ messages in thread
From: Christophe Leroy @ 2013-11-18 7:04 UTC (permalink / raw)
To: Vitaly Bordug, Marcelo Tosatti, Joakim Tjernlund, Bob Pearson
Cc: linux-kernel, linuxppc-dev
On PPC_8xx, CRC32_SLICEBY4 is more efficient than CRC32_SLICEBY8 (almost twice
as fast for crc32), as shown below:
With CRC32_SLICEBY8:
[ 1.109204] crc32: CRC_LE_BITS = 64, CRC_BE BITS = 64
[ 1.114401] crc32: self tests passed, processed 225944 bytes in 15118910 nsec
[ 1.130655] crc32c: CRC_LE_BITS = 64
[ 1.134235] crc32c: self tests passed, processed 225944 bytes in 4479879 nsec
With CRC32_SLICEBY4:
[ 1.097129] crc32: CRC_LE_BITS = 32, CRC_BE BITS = 32
[ 1.101878] crc32: self tests passed, processed 225944 bytes in 8616242 nsec
[ 1.116298] crc32c: CRC_LE_BITS = 32
[ 1.119607] crc32c: self tests passed, processed 225944 bytes in 3289576 nsec
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Index: a/lib/Kconfig
===================================================================
--- a/lib/Kconfig (revision 5325)
+++ b/lib/Kconfig (working copy)
@@ -102,6 +102,7 @@
 choice
 	prompt "CRC32 implementation"
 	depends on CRC32
+	default CRC32_SLICEBY4 if PPC_8xx
 	default CRC32_SLICEBY8
 	help
 	  This option allows a kernel builder to override the default choice
* Re: [PATCH] lib/crc32: slice by 4 is more efficient than the default slice by 8 on Powerpc 8xx.
2013-11-18 7:04 [PATCH] lib/crc32: slice by 4 is more efficient than the default slice by 8 on Powerpc 8xx Christophe Leroy
@ 2013-11-19 14:11 ` Joakim Tjernlund
2013-11-19 18:29 ` Scott Wood
0 siblings, 1 reply; 5+ messages in thread
From: Joakim Tjernlund @ 2013-11-19 14:11 UTC (permalink / raw)
To: Christophe Leroy
Cc: linux-kernel, linuxppc-dev, Marcelo Tosatti, Bob Pearson, Vitaly Bordug
I found the same on MPC8321 a long time ago (when the 64-bit change went in):
the 32-bit version was much faster. I guess the "smaller"
CPUs cannot handle the cache thrashing these big tables impose; I didn't
look into the details, though.
So I think this is a good change for 8xx.
Acked-by: Joakim Tjernlund <joakim.tjernlund@transmode.se>
* Re: [PATCH] lib/crc32: slice by 4 is more efficient than the default slice by 8 on Powerpc 8xx.
2013-11-19 14:11 ` Joakim Tjernlund
@ 2013-11-19 18:29 ` Scott Wood
2013-11-19 23:39 ` Joakim Tjernlund
0 siblings, 1 reply; 5+ messages in thread
From: Scott Wood @ 2013-11-19 18:29 UTC (permalink / raw)
To: Joakim Tjernlund
Cc: Christophe Leroy, Marcelo Tosatti, Bob Pearson, linuxppc-dev,
linux-kernel
I don't think we should go littering the Kconfig with defaults for
various bits of hardware -- especially since you've already pointed out
non-8xx hardware that would also want this. Put it in defconfig
instead, unless you can identify very broad classes of machines for
which SLICEBY4 is faster.
-Scott
On Tue, 2013-11-19 at 15:11 +0100, Joakim Tjernlund wrote:
> I found the same on MPC8321 a long time ago (when the 64-bit change went in):
> the 32-bit version was much faster. I guess the "smaller"
> CPUs cannot handle the cache thrashing these big tables impose; I didn't
> look into the details, though.
> So I think this is a good change for 8xx.
>
> Acked-by: Joakim Tjernlund <joakim.tjernlund@transmode.se>
* Re: [PATCH] lib/crc32: slice by 4 is more efficient than the default slice by 8 on Powerpc 8xx.
2013-11-19 18:29 ` Scott Wood
@ 2013-11-19 23:39 ` Joakim Tjernlund
2013-11-19 23:43 ` Scott Wood
0 siblings, 1 reply; 5+ messages in thread
From: Joakim Tjernlund @ 2013-11-19 23:39 UTC (permalink / raw)
To: Scott Wood
Cc: Christophe Leroy, linux-kernel, linuxppc-dev, Marcelo Tosatti,
Bob Pearson
Scott Wood <scottwood@freescale.com> wrote on 2013/11/19 19:29:26:
>
> I don't think we should go littering the Kconfig with defaults for
> various bits of hardware -- especially since you've already pointed out
> non-8xx hardware that would also want this. Put it in defconfig
> instead, unless you can identify very broad classes of machines for
> which SLICEBY4 is faster.
Hmm, when the 64-bit version went in there was not much proof that it was faster
for a wide range of CPUs, just 2 or 3 if I recall correctly. I suspect there
are quite a few CPUs where the 32-bit version is equally fast or faster.
Jocke
* Re: [PATCH] lib/crc32: slice by 4 is more efficient than the default slice by 8 on Powerpc 8xx.
2013-11-19 23:39 ` Joakim Tjernlund
@ 2013-11-19 23:43 ` Scott Wood
0 siblings, 0 replies; 5+ messages in thread
From: Scott Wood @ 2013-11-19 23:43 UTC (permalink / raw)
To: Joakim Tjernlund
Cc: Christophe Leroy, linux-kernel, linuxppc-dev, Marcelo Tosatti,
Bob Pearson
On Wed, 2013-11-20 at 00:39 +0100, Joakim Tjernlund wrote:
> Scott Wood <scottwood@freescale.com> wrote on 2013/11/19 19:29:26:
> >
> > I don't think we should go littering the Kconfig with defaults for
> > various bits of hardware -- especially since you've already pointed out
> > non-8xx hardware that would also want this. Put it in defconfig
> > instead, unless you can identify very broad classes of machines for
> > which SLICEBY4 is faster.
>
> Hmm, when the 64-bit version went in there was not much proof that it was faster
> for a wide range of CPUs, just 2 or 3 if I recall correctly. I suspect there
> are quite a few CPUs where the 32-bit version is equally fast or faster.
That may be the case, but I don't think we want a big list of them in
lib/Kconfig. Whether the default should change (for all targets that
don't override it in defconfig, or at least for some broader category
such as "all 32-bit chips") is a different discussion.
-Scott