From: Joakim Tjernlund <joakim.tjernlund@transmode.se>
To: unlisted-recipients:; (no To-header on input)
Cc: "Bob Pearson" <rpearson@systemfabricworks.com>,
"'Andrew Morton'" <akpm@linux-foundation.org>,
"'frank zago'" <fzago@systemfabricworks.com>,
linux-kernel@vger.kernel.org
Subject: RE: [PATCH] add slice by 8 algorithm to crc32.c
Date: Fri, 5 Aug 2011 15:34:24 +0200
Message-ID: <OF747F0842.77172E9E-ONC12578E3.004987D0-C12578E3.004A8FCA@transmode.se>
In-Reply-To: <OF14136E0E.3F2388EF-ONC12578E3.00301969-C12578E3.00338524@LocalDomain>
Joakim Tjernlund/Transmode wrote on 2011/08/05 11:22:44:
>
> "Bob Pearson" <rpearson@systemfabricworks.com> wrote on 2011/08/04 20:53:20:
> >
> > Sure... See below.
> >
> > > -----Original Message-----
> > > From: Joakim Tjernlund [mailto:joakim.tjernlund@transmode.se]
> > > Sent: Thursday, August 04, 2011 6:54 AM
> > > To: Bob Pearson
> > > Cc: 'Andrew Morton'; 'frank zago'; linux-kernel@vger.kernel.org
> > > Subject: RE: [PATCH] add slice by 8 algorithm to crc32.c
> > >
> > > "Bob Pearson" <rpearson@systemfabricworks.com> wrote on 2011/08/02 23:14:39:
> > > >
> > > > Hi Joakim,
> > > >
> > > > Sorry to take so long to respond.
> > >
> > > No problem but please insert your answers in the correct context (like I did).
> > > This makes it much easier to read and comment on.
> > >
> > > >
> > > > Here are some performance data collected from the original and modified
> > > > crc32 algorithms.
> > > > The following is a simple test loop that computes the time to compute 1000
> > > > crc's over 4096 bytes of data aligned on an 8 byte boundary after warming
> > > > the cache. You could make other measurements but this is sort of a best
> > > > case.
> > > >
> > > > These measurements were made on a dual socket Nehalem 2.267 GHz system.
> > >
> > > Measurements on your SPARC would be good too.
> >
> > Will do. But it is decrepit and quite slow. My main motivation is to run a
> > 10G protocol so I am mostly motivated to get x86_64 going as fast as
> > possible.
>
> 64 bits may be faster on x86_64 but not on ppc32. Your latest patch gives:
> crc32: CRC_LE_BITS = 64, CRC_BE BITS = 64
> crc32: self tests passed, processed 225944 bytes in 3987640 nsec
> crc32: CRC_LE_BITS = 32, CRC_BE BITS = 32
> crc32: self tests passed, processed 225944 bytes in 2003630 nsec
> Almost a factor 2 slower.
> So in any case I don't think 64 bits should be default for all archs.
> Probably only for 64 bit archs.
I checked the asm on ppc for the 32-bit crc32 and compared your version with mine. PPC
suffers with your version: the startup cost is much higher. I did notice one win with
your version though. The inner loop is 3 insns shorter if one uses separate arrays.
However, loading 4 separate arrays costs 16 insns on PPC, so I did the best thing for
ppc:
diff --git a/lib/crc32.c b/lib/crc32.c
index 4855995..e3e391f 100644
--- a/lib/crc32.c
+++ b/lib/crc32.c
@@ -51,20 +51,21 @@ static inline u32
crc32_body(u32 crc, unsigned char const *buf, size_t len, const u32 (*tab)[256])
{
# ifdef __LITTLE_ENDIAN
-# define DO_CRC(x) crc = tab[0][(crc ^ (x)) & 255] ^ (crc >> 8)
-# define DO_CRC4 crc = tab[3][(crc) & 255] ^ \
- tab[2][(crc >> 8) & 255] ^ \
- tab[1][(crc >> 16) & 255] ^ \
- tab[0][(crc >> 24) & 255]
+# define DO_CRC(x) crc = t0[(crc ^ (x)) & 255] ^ (crc >> 8)
+# define DO_CRC4 crc = t3[(crc) & 255] ^ \
+ t2[(crc >> 8) & 255] ^ \
+ t1[(crc >> 16) & 255] ^ \
+ t0[(crc >> 24) & 255]
# else
-# define DO_CRC(x) crc = tab[0][((crc >> 24) ^ (x)) & 255] ^ (crc << 8)
-# define DO_CRC4 crc = tab[0][(crc) & 255] ^ \
- tab[1][(crc >> 8) & 255] ^ \
- tab[2][(crc >> 16) & 255] ^ \
- tab[3][(crc >> 24) & 255]
+# define DO_CRC(x) crc = t0[((crc >> 24) ^ (x)) & 255] ^ (crc << 8)
+# define DO_CRC4 crc = t0[(crc) & 255] ^ \
+ t1[(crc >> 8) & 255] ^ \
+ t2[(crc >> 16) & 255] ^ \
+ t3[(crc >> 24) & 255]
# endif
const u32 *b;
size_t rem_len;
+ const u32 *t0=tab[0], *t1=t0 + 256, *t2=t1 + 256, *t3=t2 + 256;
/* Align it */
if (unlikely((long)buf & 3 && len)) {
This shortens the inner loop by 3 insns while adding only 5 insns of startup cost.
I hope this brings my crc32 (32 bits) in line with yours, even on x86_64.
Please test.
Jocke
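
For anyone who wants to try the idea from the patch above outside the kernel, here
is a minimal user-space sketch of a little-endian slice-by-4 loop with the four
table pointers hoisted into locals. It is not the lib/crc32.c code. The names
crc32_init_tables(), crc32_le_slice4(), crc32_le_bitwise() and crc32_selfcheck() are
made up for this example, the tables are built at run time instead of at build
time, and the input word is assembled byte by byte so the sketch does not depend on
host endianness or alignment. It also takes t1..t3 from tab[1]..tab[3] instead of
t0 + 256 to stay inside the array bounds in strict C, so the generated code may
differ from what the patch gives on ppc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY_LE 0xEDB88320u   /* reflected CRC-32 polynomial */

static uint32_t tab[4][256];

/* Build the four slice-by-4 tables (hypothetical helper, run once). */
static void crc32_init_tables(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t crc = i;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1) ? CRC32_POLY_LE : 0);
        tab[0][i] = crc;
    }
    /* tab[k][i] is tab[k-1][i] advanced by one more zero byte. */
    for (int k = 1; k < 4; k++)
        for (uint32_t i = 0; i < 256; i++)
            tab[k][i] = (tab[k - 1][i] >> 8) ^ tab[0][tab[k - 1][i] & 255];
}

/* Slice-by-4 with the table pointers kept in locals, as in the patch. */
static uint32_t crc32_le_slice4(uint32_t crc, const unsigned char *buf, size_t len)
{
    const uint32_t *t0 = tab[0], *t1 = tab[1], *t2 = tab[2], *t3 = tab[3];

    for (; len >= 4; buf += 4, len -= 4) {
        /* Assemble a little-endian word without an aligned load. */
        crc ^= (uint32_t)buf[0] | (uint32_t)buf[1] << 8 |
               (uint32_t)buf[2] << 16 | (uint32_t)buf[3] << 24;
        crc = t3[crc & 255] ^ t2[(crc >> 8) & 255] ^
              t1[(crc >> 16) & 255] ^ t0[(crc >> 24) & 255];
    }
    /* Remaining 0..3 bytes, one at a time. */
    for (; len; buf++, len--)
        crc = t0[(crc ^ *buf) & 255] ^ (crc >> 8);
    return crc;
}

/* Bit-at-a-time reference used only to cross-check the table version. */
static uint32_t crc32_le_bitwise(uint32_t crc, const unsigned char *buf, size_t len)
{
    while (len--) {
        crc ^= *buf++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1) ? CRC32_POLY_LE : 0);
    }
    return crc;
}

/* Compare both versions over every prefix length of a small buffer. */
static void crc32_selfcheck(void)
{
    unsigned char data[67];

    for (size_t i = 0; i < sizeof(data); i++)
        data[i] = (unsigned char)(i * 37 + 11);
    for (size_t n = 0; n <= sizeof(data); n++)
        assert(crc32_le_slice4(~0u, data, n) == crc32_le_bitwise(~0u, data, n));
}
```

Call crc32_init_tables() once before anything else. Like crc32_le() in the kernel,
crc32_le_slice4() does no pre- or post-inversion itself, so the usual CRC-32 of a
buffer is ~crc32_le_slice4(~0u, buf, len). Timing numbers from this sketch will not
match the kernel self test, since the alignment handling is different.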