git.vger.kernel.org archive mirror
* cleaner/better zlib sources?
@ 2007-03-16  1:04 Linus Torvalds
  2007-03-16  1:10 ` Shawn O. Pearce
                   ` (2 more replies)
  0 siblings, 3 replies; 79+ messages in thread
From: Linus Torvalds @ 2007-03-16  1:04 UTC (permalink / raw)
  To: Git Mailing List


I looked at git profiles yesterday, and some of them are pretty scary. We 
spend about 50% of the time under some loads in just zlib uncompression, 
and when I actually looked closer at the zlib sources I can kind of 
understand why. That thing is horrid.

The sad part is that it looks like it should be quite possible to make 
zlib simply just perform better. The profiles seem to say that a lot of 
the cost is literally in the "inflate()" state machine code (and by that I 
mean *not* the code itself, but literally in the indirect jump generated 
by the case-statement).

Now, on any high-performance CPU, doing state-machines by having

	for (;;)
		switch (data->state) {
			...
			data->state = NEW_STATE;
			continue;
		}

(which is what zlib seems to be doing) is just about the worst possible 
way to code things.
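As a sketch of why the dispatch itself matters (toy states here, not zlib's actual ones): the same trivial machine written both ways. The switch version pays an indirect jump per transition; the goto version replaces them with cheap direct branches.

```c
#include <assert.h>

/* Hypothetical 3-state machine (not zlib's): sum the digits of a
 * string.  Loop-around-switch form: every transition funnels back
 * through the indirect jump the compiler emits for the switch. */
enum toy_state { READ, ACCUM, DONE };

static int run_switch(const char *p)
{
    enum toy_state s = READ;
    int sum = 0, c = 0;

    for (;;)
        switch (s) {
        case READ:
            c = *p++;
            s = c ? ACCUM : DONE;
            continue;
        case ACCUM:
            sum += c - '0';
            s = READ;
            continue;
        case DONE:
            return sum;
        }
}

/* Same machine with direct jumps between states: no indirect branch
 * at all, just conditional/unconditional branches the CPU predicts
 * cheaply. */
static int run_goto(const char *p)
{
    int sum = 0, c;

read_state:
    c = *p++;
    if (!c)
        goto done_state;
    sum += c - '0';
    goto read_state;

done_state:
    return sum;
}
```

Both compute the same result; only the branch pattern the CPU sees differs.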

Now, it's possible that I'm just wrong, but the instruction-level profile 
really did pinpoint the "look up state branch pointer and jump to it" as 
some of the hottest part of that function. Which is just *evil*. You can 
most likely use direct jumps within the loop (zero cost at all on most OoO 
CPU's) most of the time, and the entry condition is likely quite 
predictable too, so a lot of that overhead seems to be just sad and 
unnecessary.

Now, I'm just wondering if anybody knows if there are better zlib 
implementations out there? This really looks like it could be a noticeable 
performance issue, but I'm lazy and would be much happier to hear that 
somebody has already played with optimizing zlib. Especially since I'm not 
100% sure it's really going to be noticeable..

		Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: cleaner/better zlib sources?
  2007-03-16  1:04 cleaner/better zlib sources? Linus Torvalds
@ 2007-03-16  1:10 ` Shawn O. Pearce
  2007-03-16  1:11 ` Jeff Garzik
  2007-03-16  1:33 ` Davide Libenzi
  2 siblings, 0 replies; 79+ messages in thread
From: Shawn O. Pearce @ 2007-03-16  1:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> wrote:
> I looked at git profiles yesterday, and some of them are pretty scary. We 
> spend about 50% of the time under some loads in just zlib uncompression, 
> and when I actually looked closer at the zlib sources I can kind of 
> understand why. That thing is horrid.

Yes.  This is actually one of the motivations behind pack v4.
We don't store the "important bits" of commits and trees in zlib
compressed format at all, allowing us to completely bypass the
inflate() penalty you describe.

We're already much faster on the linux-2.6 kernel tree, and that's
*with* converting the pack raw data into text, then reparsing
that text into a struct commit* or a struct name_entry using the
current code.  We're also planning on reworking those parsers to
parse the raw pack data, allowing us to save some very unnecessary
raw->string->raw conversion time.

But Nico and I are still looking to use zlib for commit messages and
blob content, so any improvements to inflate (or its replacement)
would still be most helpful.

-- 
Shawn.


* Re: cleaner/better zlib sources?
  2007-03-16  1:04 cleaner/better zlib sources? Linus Torvalds
  2007-03-16  1:10 ` Shawn O. Pearce
@ 2007-03-16  1:11 ` Jeff Garzik
  2007-03-16  1:14   ` Matt Mackall
  2007-03-16  1:46   ` Linus Torvalds
  2007-03-16  1:33 ` Davide Libenzi
  2 siblings, 2 replies; 79+ messages in thread
From: Jeff Garzik @ 2007-03-16  1:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List, mpm, bcrl

Linus Torvalds wrote:
> Now, it's possible that I'm just wrong, but the instruction-level profile 
> really did pinpoint the "look up state branch pointer and jump to it" as 
> some of the hottest part of that function. Which is just *evil*. You can 

ISTR there are a bunch of state transitions per byte, which would explain 
why it shows up on profiles.


> Now, I'm just wondering if anybody knows if there are better zlib 
> implementations out there? This really looks like it could be a noticeable 
> performance issue, but I'm lazy and would be much happier to hear that 
> somebody has already played with optimizing zlib. Especially since I'm not 
> 100% sure it's really going to be noticeable..

I could have sworn that either Matt Mackall or Ben LaHaise had cleaned 
up the existing zlib so much that it was practically a new 
implementation.  I'm not aware of any open source implementations 
independent of zlib (except maybe that C++ behemoth, 7zip).

	Jeff


* Re: cleaner/better zlib sources?
  2007-03-16  1:11 ` Jeff Garzik
@ 2007-03-16  1:14   ` Matt Mackall
  2007-03-16  1:46   ` Linus Torvalds
  1 sibling, 0 replies; 79+ messages in thread
From: Matt Mackall @ 2007-03-16  1:14 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Linus Torvalds, Git Mailing List, bcrl

On Thu, Mar 15, 2007 at 09:11:49PM -0400, Jeff Garzik wrote:
> Linus Torvalds wrote:
> >Now, it's possible that I'm just wrong, but the instruction-level profile 
> >really did pinpoint the "look up state branch pointer and jump to it" as 
> >some of the hottest part of that function. Which is just *evil*. You can 
> 
> ISTR there are a bunch of state transitions per byte, which would explain 
> why it shows up on profiles.

Yep, not surprising.

> >Now, I'm just wondering if anybody knows if there are better zlib 
> >implementations out there? This really looks like it could be a noticeable 
> >performance issue, but I'm lazy and would be much happier to hear that 
> >somebody has already played with optimizing zlib. Especially since I'm not 
> >100% sure it's really going to be noticeable..
> 
> I could have sworn that either Matt Mackall or Ben LaHaise had cleaned 
> up the existing zlib so much that it was practically a new 
> implementation.  I'm not aware of any open source implementations 
> independent of zlib (except maybe that C++ behemoth, 7zip).

I cleaned up the version in lib/ that's used for boot on most systems.
It's quite a bit simpler and cleaner than the code in lib/zlib (and
elsewhere!). But making it faster is another matter entirely - I don't
know off-hand how the two compare.

-- 
Mathematics is the supreme nostalgia of our time.


* Re: cleaner/better zlib sources?
  2007-03-16  1:04 cleaner/better zlib sources? Linus Torvalds
  2007-03-16  1:10 ` Shawn O. Pearce
  2007-03-16  1:11 ` Jeff Garzik
@ 2007-03-16  1:33 ` Davide Libenzi
  2007-03-16  2:06   ` Davide Libenzi
  2 siblings, 1 reply; 79+ messages in thread
From: Davide Libenzi @ 2007-03-16  1:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

On Thu, 15 Mar 2007, Linus Torvalds wrote:

> 
> I looked at git profiles yesterday, and some of them are pretty scary. We 
> spend about 50% of the time under some loads in just zlib uncompression, 
> and when I actually looked closer at the zlib sources I can kind of 
> understand why. That thing is horrid.
> 
> The sad part is that it looks like it should be quite possible to make 
> zlib simply just perform better. The profiles seem to say that a lot of 
> the cost is literally in the "inflate()" state machine code (and by that I 
> mean *not* the code itself, but literally in the indirect jump generated 
> by the case-statement).
> 
> Now, on any high-performance CPU, doing state-machines by having
> 
> 	for (;;)
> 		switch (data->state) {
> 			...
> 			data->state = NEW_STATE;
> 			continue;
> 		}
> 
> (which is what zlib seems to be doing) is just about the worst possible 
> way to code things.

A quick hack would be to just define:

#define SWITCH_LBL(n) \
	case n: \
	lbl_##n:

#define STATE_CHANGE(s) \
	state->mode = s; \
	goto lbl_##s;

Then replace all the "state->mode = STATE; break;" with STATE_CHANGE(STATE);
I'm giving it a try as we speak ...
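Applied to a toy machine (names hypothetical, not zlib's), the expansion looks like this; the case label keeps switch entry working for re-entry from outside, while internal transitions become direct gotos:

```c
#include <assert.h>

#define SWITCH_LBL(n) \
	case n: \
	lbl_##n:

#define STATE_CHANGE(s) \
	state->mode = s; \
	goto lbl_##s;

enum toy_mode { START, WORK, STOP };

struct toy {
	enum toy_mode mode;
	int count;
};

/* Count n down to zero.  Every "state->mode = X; break;" becomes
 * STATE_CHANGE(X): state->mode stays coherent (so a caller could
 * still re-enter through the switch), but the transition is a
 * direct jump instead of a trip back through the dispatcher. */
static int toy_run(struct toy *state, int n)
{
	for (;;)
		switch (state->mode) {
		SWITCH_LBL(START)
			state->count = n;
			STATE_CHANGE(WORK);
		SWITCH_LBL(WORK)
			if (state->count > 0) {
				state->count--;
				STATE_CHANGE(WORK);
			}
			STATE_CHANGE(STOP);
		SWITCH_LBL(STOP)
			return state->count;
		}
}
```

Jumping to a label inside the switch body is legal C; the switch head is only consulted on the very first entry.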




- Davide


* Re: cleaner/better zlib sources?
  2007-03-16  1:11 ` Jeff Garzik
  2007-03-16  1:14   ` Matt Mackall
@ 2007-03-16  1:46   ` Linus Torvalds
  2007-03-16  1:54     ` Linus Torvalds
  1 sibling, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2007-03-16  1:46 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Git Mailing List, mpm, bcrl



On Thu, 15 Mar 2007, Jeff Garzik wrote:
> 
> ISTR there are a bunch of state transitions per byte, which would explain
> why it shows up on profiles.

Yeah. I only looked at the kernel sources (which include a cleaned-up 
older version of zlib), but they seemed to match the disassembly that I 
saw when doing the instruction-level profiling.

The code *sometimes* falls through from one state to another, ie there's a 
case where zlib does:

	...
        case DICTID:
            NEEDBITS(32);
            strm->adler = state->check = REVERSE(hold);
            INITBITS();
            state->mode = DICT;
        case DICT:
	...

which will obviously generate fine code. There's a few other examples of 
that, *BUT* most of the stuff is just horrid, like

        case HEAD:
            if (state->wrap == 0) {
                state->mode = TYPEDO;
                break;
            }

where the "break" simply breaks out of the case-statement, and thus 
just loops back through the

    for (;;)
        switch (state->mode) {

dispatch again, which is nasty nasty nasty.

A trivial thing to do is to just replace such

	state->mode = xxx;
	break;

with

	state->mode = xxx;
	goto xxx_mode;

and all of that complicated run-time code *just*goes*away* and is replaced 
by a relative no-op (ie an unconditional direct jump).

Some of them are slightly more complex, like

	state->mode = hold & 0x200 ? DICTID : TYPE;
	INITBITS();
	break;

which would need to be rewritten as

	old_hold = hold;
	INITBITS();
	if (old_hold & 0x200) {
		state->mode = DICTID;
		goto dictid_mode;
	}
	state->mode = TYPE;
	goto type_mode;

but while that looks more complicated on a source code level it's a *lot* 
easier for a CPU to actually execute.

Same obvious performance problems go for

	case STORED:
	   ...
        case COPY:
            copy = state->length;
            if (copy) {
		...
		state->length -= copy;
		break;
	    }
	    Tracev((stderr, "inflate:       stored end\n"));
	    state->mode = TYPE;
	    break;

notice how when you get to the COPY state it will actually have nicely 
fallen through from STORED (one of the places where it does that), but 
then it will go through the expensive indirect jump not just once, but at 
least *TWICE* just to get to the TYPE thing, because you'll first have to 
re-enter COPY with a count of zero to get to the place where it sets TYPE, 
and then it does the indirect jump immediately again!

In other words, the code is *incredibly* badly written from a performance 
angle.

Yeah, a perfect compiler could do this all for us even with unchanged zlib 
source code, but (a) gcc isn't perfect and (b) I don't really think 
anything else is either, although if things like this happen as part of 
gzip in SpecInt, I wouldn't be surprised by compilers doing things like 
this just to look good ;)

Especially with something like git, where we know that most of the time we 
have the full input buffer (so inflate() generally won't be called in the 
"middle" of an inflate event), we probably have a very well-defined start 
state too, so we'd actually predict perfectly on the *first* indirect jump 
if we just never did any more of them.

> I could have sworn that either Matt Mackall or Ben LaHaise had cleaned up the
> existing zlib so much that it was practically a new implementation.  I'm not
> aware of any open source implementations independent of zlib (except maybe
> that C++ behemoth, 7zip).

I looked at the latest release (1.2.3), so either Matt's/Ben's cleanup 
hasn't been merged, or this wasn't part of the thing they looked at.

			Linus


* Re: cleaner/better zlib sources?
  2007-03-16  1:46   ` Linus Torvalds
@ 2007-03-16  1:54     ` Linus Torvalds
  2007-03-16  2:43       ` Davide Libenzi
  0 siblings, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2007-03-16  1:54 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Git Mailing List, mpm, bcrl, Davide Libenzi



On Thu, 15 Mar 2007, Linus Torvalds wrote:
> 
> Same obvious performance problems go for
>
>         case COPY:

As an example, I *think* this patch to zlib-1.2.3 not only generates 
better code, but is (a) shorter and (b) more logical anyway.

Together with Davide's suggestion on using C macro expansion to make most 
of the mode switches simple branches, it might get rid of most of the 
indirect branches (to get rid of them all, you'd have to also find the 
places where we *don't* set a new state, because it stays the same like 
this one, and the ones where we have conditionals on what the mode is 
going to be..)

Of course, the zlib sources are pretty horrid for other reasons (K&R 
source code meant to be compiled on 16-bit architectures too). But that's 
a separate issue, and at least shouldn't affect the resulting code 
quality..

		Linus

---
 inflate.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/inflate.c b/inflate.c
index 792fdee..fb26f39 100644
--- a/inflate.c
+++ b/inflate.c
@@ -819,7 +819,7 @@ int flush;
             state->mode = COPY;
         case COPY:
             copy = state->length;
-            if (copy) {
+            while (copy) {
                 if (copy > have) copy = have;
                 if (copy > left) copy = left;
                 if (copy == 0) goto inf_leave;
@@ -829,7 +829,6 @@ int flush;
                 left -= copy;
                 put += copy;
                 state->length -= copy;
-                break;
             }
             Tracev((stderr, "inflate:       stored end\n"));
             state->mode = TYPE;


* Re: cleaner/better zlib sources?
  2007-03-16  1:33 ` Davide Libenzi
@ 2007-03-16  2:06   ` Davide Libenzi
  0 siblings, 0 replies; 79+ messages in thread
From: Davide Libenzi @ 2007-03-16  2:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

On Thu, 15 Mar 2007, Davide Libenzi wrote:

> On Thu, 15 Mar 2007, Linus Torvalds wrote:
> 
> > 
> > I looked at git profiles yesterday, and some of them are pretty scary. We 
> > spend about 50% of the time under some loads in just zlib uncompression, 
> > and when I actually looked closer at the zlib sources I can kind of 
> > understand why. That thing is horrid.
> > 
> > The sad part is that it looks like it should be quite possible to make 
> > zlib simply just perform better. The profiles seem to say that a lot of 
> > the cost is literally in the "inflate()" state machine code (and by that I 
> > mean *not* the code itself, but literally in the indirect jump generated 
> > by the case-statement).
> > 
> > Now, on any high-performance CPU, doing state-machines by having
> > 
> > 	for (;;)
> > 		switch (data->state) {
> > 			...
> > 			data->state = NEW_STATE;
> > 			continue;
> > 		}
> > 
> > (which is what zlib seems to be doing) is just about the worst possible 
> > way to code things.
> 
> A quick hack would be to just define:
> 
> #define SWITCH_LBL(n) \
> 	case n: \
> 	lbl_##n:
> 
> #define STATE_CHANGE(s) \
> 	state->mode = s; \
> 	goto lbl_##s;
> 
> Then replace all the "state->mode = STATE; break;" with STATE_CHANGE(STATE);
> I'm giving it a try as we speak ...

I get about 5-6% boost with it AFAICS ...



- Davide


* Re: cleaner/better zlib sources?
  2007-03-16  1:54     ` Linus Torvalds
@ 2007-03-16  2:43       ` Davide Libenzi
  2007-03-16  2:56         ` Linus Torvalds
  0 siblings, 1 reply; 79+ messages in thread
From: Davide Libenzi @ 2007-03-16  2:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff Garzik, Git Mailing List, mpm, bcrl

On Thu, 15 Mar 2007, Linus Torvalds wrote:

> 
> 
> On Thu, 15 Mar 2007, Linus Torvalds wrote:
> > 
> > Same obvious performance problems go for
> >
> >         case COPY:
> 
> As an example, I *think* this patch to zlib-1.2.3 not only generates 
> better code, but is (a) shorter and (b) more logical anyway.
> 
> Together with Davide's suggestion on using C macro expansion to make most 
> of the mode switches simple branches, it might get rid of most of the 
> indirect branches (to get rid of them all, you'd have to also find the 
> places where we *don't* set a new state, because it stays the same like 
> this one, and the ones where we have conditionals on what the mode is 
> going to be..

That's the diff against 1.2.3, but it does not seem to make a substantial 
difference on my Opteron ...




- Davide



Index: zlib-1.2.3.quilt/inflate.c
===================================================================
--- zlib-1.2.3.quilt.orig/inflate.c	2007-03-15 18:17:19.000000000 -0700
+++ zlib-1.2.3.quilt/inflate.c	2007-03-15 18:31:14.000000000 -0700
@@ -551,6 +551,15 @@
    will return Z_BUF_ERROR if it has not reached the end of the stream.
  */
 
+#define CASE_DECL(n) \
+	case n: \
+	lbl_##n:
+
+#define STATE_CHANGE(s) do { \
+	state->mode = s; \
+	goto lbl_##s; \
+} while (0)
+
 int ZEXPORT inflate(strm, flush)
 z_streamp strm;
 int flush;
@@ -586,10 +595,9 @@
     ret = Z_OK;
     for (;;)
         switch (state->mode) {
-        case HEAD:
+        CASE_DECL(HEAD)
             if (state->wrap == 0) {
-                state->mode = TYPEDO;
-                break;
+		STATE_CHANGE(TYPEDO);
             }
             NEEDBITS(16);
 #ifdef GUNZIP
@@ -597,8 +605,7 @@
                 state->check = crc32(0L, Z_NULL, 0);
                 CRC2(state->check, hold);
                 INITBITS();
-                state->mode = FLAGS;
-                break;
+		STATE_CHANGE(FLAGS);
             }
             state->flags = 0;           /* expect zlib header */
             if (state->head != Z_NULL)
@@ -609,20 +616,17 @@
 #endif
                 ((BITS(8) << 8) + (hold >> 8)) % 31) {
                 strm->msg = (char *)"incorrect header check";
-                state->mode = BAD;
-                break;
+	        STATE_CHANGE(BAD);
             }
             if (BITS(4) != Z_DEFLATED) {
                 strm->msg = (char *)"unknown compression method";
-                state->mode = BAD;
-                break;
+	        STATE_CHANGE(BAD);
             }
             DROPBITS(4);
             len = BITS(4) + 8;
             if (len > state->wbits) {
                 strm->msg = (char *)"invalid window size";
-                state->mode = BAD;
-                break;
+		STATE_CHANGE(BAD);
             }
             state->dmax = 1U << len;
             Tracev((stderr, "inflate:   zlib header ok\n"));
@@ -631,32 +635,30 @@
             INITBITS();
             break;
 #ifdef GUNZIP
-        case FLAGS:
+        CASE_DECL(FLAGS)
             NEEDBITS(16);
             state->flags = (int)(hold);
             if ((state->flags & 0xff) != Z_DEFLATED) {
                 strm->msg = (char *)"unknown compression method";
-                state->mode = BAD;
-                break;
+		STATE_CHANGE(BAD);
             }
             if (state->flags & 0xe000) {
                 strm->msg = (char *)"unknown header flags set";
-                state->mode = BAD;
-                break;
+		STATE_CHANGE(BAD);
             }
             if (state->head != Z_NULL)
                 state->head->text = (int)((hold >> 8) & 1);
             if (state->flags & 0x0200) CRC2(state->check, hold);
             INITBITS();
             state->mode = TIME;
-        case TIME:
+        CASE_DECL(TIME)
             NEEDBITS(32);
             if (state->head != Z_NULL)
                 state->head->time = hold;
             if (state->flags & 0x0200) CRC4(state->check, hold);
             INITBITS();
             state->mode = OS;
-        case OS:
+        CASE_DECL(OS)
             NEEDBITS(16);
             if (state->head != Z_NULL) {
                 state->head->xflags = (int)(hold & 0xff);
@@ -665,7 +667,7 @@
             if (state->flags & 0x0200) CRC2(state->check, hold);
             INITBITS();
             state->mode = EXLEN;
-        case EXLEN:
+        CASE_DECL(EXLEN)
             if (state->flags & 0x0400) {
                 NEEDBITS(16);
                 state->length = (unsigned)(hold);
@@ -677,7 +679,7 @@
             else if (state->head != Z_NULL)
                 state->head->extra = Z_NULL;
             state->mode = EXTRA;
-        case EXTRA:
+        CASE_DECL(EXTRA)
             if (state->flags & 0x0400) {
                 copy = state->length;
                 if (copy > have) copy = have;
@@ -699,7 +701,7 @@
             }
             state->length = 0;
             state->mode = NAME;
-        case NAME:
+        CASE_DECL(NAME)
             if (state->flags & 0x0800) {
                 if (have == 0) goto inf_leave;
                 copy = 0;
@@ -720,7 +722,7 @@
                 state->head->name = Z_NULL;
             state->length = 0;
             state->mode = COMMENT;
-        case COMMENT:
+        CASE_DECL(COMMENT)
             if (state->flags & 0x1000) {
                 if (have == 0) goto inf_leave;
                 copy = 0;
@@ -740,13 +742,12 @@
             else if (state->head != Z_NULL)
                 state->head->comment = Z_NULL;
             state->mode = HCRC;
-        case HCRC:
+        CASE_DECL(HCRC)
             if (state->flags & 0x0200) {
                 NEEDBITS(16);
                 if (hold != (state->check & 0xffff)) {
                     strm->msg = (char *)"header crc mismatch";
-                    state->mode = BAD;
-                    break;
+		    STATE_CHANGE(BAD);
                 }
                 INITBITS();
             }
@@ -755,28 +756,26 @@
                 state->head->done = 1;
             }
             strm->adler = state->check = crc32(0L, Z_NULL, 0);
-            state->mode = TYPE;
-            break;
+	    STATE_CHANGE(TYPE);
 #endif
-        case DICTID:
+        CASE_DECL(DICTID)
             NEEDBITS(32);
             strm->adler = state->check = REVERSE(hold);
             INITBITS();
             state->mode = DICT;
-        case DICT:
+        CASE_DECL(DICT)
             if (state->havedict == 0) {
                 RESTORE();
                 return Z_NEED_DICT;
             }
             strm->adler = state->check = adler32(0L, Z_NULL, 0);
             state->mode = TYPE;
-        case TYPE:
+        CASE_DECL(TYPE)
             if (flush == Z_BLOCK) goto inf_leave;
-        case TYPEDO:
+        CASE_DECL(TYPEDO)
             if (state->last) {
                 BYTEBITS();
-                state->mode = CHECK;
-                break;
+		STATE_CHANGE(CHECK);
             }
             NEEDBITS(3);
             state->last = BITS(1);
@@ -785,39 +784,38 @@
             case 0:                             /* stored block */
                 Tracev((stderr, "inflate:     stored block%s\n",
                         state->last ? " (last)" : ""));
-                state->mode = STORED;
-                break;
+		DROPBITS(2);
+		STATE_CHANGE(STORED);
             case 1:                             /* fixed block */
                 fixedtables(state);
                 Tracev((stderr, "inflate:     fixed codes block%s\n",
                         state->last ? " (last)" : ""));
-                state->mode = LEN;              /* decode codes */
-                break;
+		DROPBITS(2);
+		STATE_CHANGE(LEN);
             case 2:                             /* dynamic block */
                 Tracev((stderr, "inflate:     dynamic codes block%s\n",
                         state->last ? " (last)" : ""));
-                state->mode = TABLE;
-                break;
+		DROPBITS(2);
+		STATE_CHANGE(TABLE);
             case 3:
+		DROPBITS(2);
                 strm->msg = (char *)"invalid block type";
-                state->mode = BAD;
+		STATE_CHANGE(BAD);
             }
-            DROPBITS(2);
             break;
-        case STORED:
+        CASE_DECL(STORED)
             BYTEBITS();                         /* go to byte boundary */
             NEEDBITS(32);
             if ((hold & 0xffff) != ((hold >> 16) ^ 0xffff)) {
                 strm->msg = (char *)"invalid stored block lengths";
-                state->mode = BAD;
-                break;
+		STATE_CHANGE(BAD);
             }
             state->length = (unsigned)hold & 0xffff;
             Tracev((stderr, "inflate:       stored length %u\n",
                     state->length));
             INITBITS();
             state->mode = COPY;
-        case COPY:
+        CASE_DECL(COPY)
             copy = state->length;
             if (copy) {
                 if (copy > have) copy = have;
@@ -832,9 +830,8 @@
                 break;
             }
             Tracev((stderr, "inflate:       stored end\n"));
-            state->mode = TYPE;
-            break;
-        case TABLE:
+	    STATE_CHANGE(TYPE);
+        CASE_DECL(TABLE)
             NEEDBITS(14);
             state->nlen = BITS(5) + 257;
             DROPBITS(5);
@@ -845,14 +842,13 @@
 #ifndef PKZIP_BUG_WORKAROUND
             if (state->nlen > 286 || state->ndist > 30) {
                 strm->msg = (char *)"too many length or distance symbols";
-                state->mode = BAD;
-                break;
+		STATE_CHANGE(BAD);
             }
 #endif
             Tracev((stderr, "inflate:       table sizes ok\n"));
             state->have = 0;
             state->mode = LENLENS;
-        case LENLENS:
+        CASE_DECL(LENLENS)
             while (state->have < state->ncode) {
                 NEEDBITS(3);
                 state->lens[order[state->have++]] = (unsigned short)BITS(3);
@@ -867,13 +863,12 @@
                                 &(state->lenbits), state->work);
             if (ret) {
                 strm->msg = (char *)"invalid code lengths set";
-                state->mode = BAD;
-                break;
+		STATE_CHANGE(BAD);
             }
             Tracev((stderr, "inflate:       code lengths ok\n"));
             state->have = 0;
             state->mode = CODELENS;
-        case CODELENS:
+        CASE_DECL(CODELENS)
             while (state->have < state->nlen + state->ndist) {
                 for (;;) {
                     this = state->lencode[BITS(state->lenbits)];
@@ -891,8 +886,7 @@
                         DROPBITS(this.bits);
                         if (state->have == 0) {
                             strm->msg = (char *)"invalid bit length repeat";
-                            state->mode = BAD;
-                            break;
+			    STATE_CHANGE(BAD);
                         }
                         len = state->lens[state->have - 1];
                         copy = 3 + BITS(2);
@@ -914,17 +908,13 @@
                     }
                     if (state->have + copy > state->nlen + state->ndist) {
                         strm->msg = (char *)"invalid bit length repeat";
-                        state->mode = BAD;
-                        break;
+			STATE_CHANGE(BAD);
                     }
                     while (copy--)
                         state->lens[state->have++] = (unsigned short)len;
                 }
             }
 
-            /* handle error breaks in while */
-            if (state->mode == BAD) break;
-
             /* build code tables */
             state->next = state->codes;
             state->lencode = (code const FAR *)(state->next);
@@ -933,8 +923,7 @@
                                 &(state->lenbits), state->work);
             if (ret) {
                 strm->msg = (char *)"invalid literal/lengths set";
-                state->mode = BAD;
-                break;
+		STATE_CHANGE(BAD);
             }
             state->distcode = (code const FAR *)(state->next);
             state->distbits = 6;
@@ -942,12 +931,11 @@
                             &(state->next), &(state->distbits), state->work);
             if (ret) {
                 strm->msg = (char *)"invalid distances set";
-                state->mode = BAD;
-                break;
+		STATE_CHANGE(BAD);
             }
             Tracev((stderr, "inflate:       codes ok\n"));
             state->mode = LEN;
-        case LEN:
+        CASE_DECL(LEN)
             if (have >= 6 && left >= 258) {
                 RESTORE();
                 inflate_fast(strm, out);
@@ -975,22 +963,19 @@
                 Tracevv((stderr, this.val >= 0x20 && this.val < 0x7f ?
                         "inflate:         literal '%c'\n" :
                         "inflate:         literal 0x%02x\n", this.val));
-                state->mode = LIT;
-                break;
+		STATE_CHANGE(LIT);
             }
             if (this.op & 32) {
                 Tracevv((stderr, "inflate:         end of block\n"));
-                state->mode = TYPE;
-                break;
+		STATE_CHANGE(TYPE);
             }
             if (this.op & 64) {
                 strm->msg = (char *)"invalid literal/length code";
-                state->mode = BAD;
-                break;
+		STATE_CHANGE(BAD);
             }
             state->extra = (unsigned)(this.op) & 15;
             state->mode = LENEXT;
-        case LENEXT:
+        CASE_DECL(LENEXT)
             if (state->extra) {
                 NEEDBITS(state->extra);
                 state->length += BITS(state->extra);
@@ -998,7 +983,7 @@
             }
             Tracevv((stderr, "inflate:         length %u\n", state->length));
             state->mode = DIST;
-        case DIST:
+        CASE_DECL(DIST)
             for (;;) {
                 this = state->distcode[BITS(state->distbits)];
                 if ((unsigned)(this.bits) <= bits) break;
@@ -1017,13 +1002,12 @@
             DROPBITS(this.bits);
             if (this.op & 64) {
                 strm->msg = (char *)"invalid distance code";
-                state->mode = BAD;
-                break;
+		STATE_CHANGE(BAD);
             }
             state->offset = (unsigned)this.val;
             state->extra = (unsigned)(this.op) & 15;
             state->mode = DISTEXT;
-        case DISTEXT:
+        CASE_DECL(DISTEXT)
             if (state->extra) {
                 NEEDBITS(state->extra);
                 state->offset += BITS(state->extra);
@@ -1032,18 +1016,16 @@
 #ifdef INFLATE_STRICT
             if (state->offset > state->dmax) {
                 strm->msg = (char *)"invalid distance too far back";
-                state->mode = BAD;
-                break;
+		STATE_CHANGE(BAD);
             }
 #endif
             if (state->offset > state->whave + out - left) {
                 strm->msg = (char *)"invalid distance too far back";
-                state->mode = BAD;
-                break;
+		STATE_CHANGE(BAD);
             }
             Tracevv((stderr, "inflate:         distance %u\n", state->offset));
             state->mode = MATCH;
-        case MATCH:
+        CASE_DECL(MATCH)
             if (left == 0) goto inf_leave;
             copy = out - left;
             if (state->offset > copy) {         /* copy from window */
@@ -1066,15 +1048,15 @@
             do {
                 *put++ = *from++;
             } while (--copy);
-            if (state->length == 0) state->mode = LEN;
+            if (state->length == 0)
+		STATE_CHANGE(LEN);
             break;
-        case LIT:
+        CASE_DECL(LIT)
             if (left == 0) goto inf_leave;
             *put++ = (unsigned char)(state->length);
             left--;
-            state->mode = LEN;
-            break;
-        case CHECK:
+	    STATE_CHANGE(LEN);
+        CASE_DECL(CHECK)
             if (state->wrap) {
                 NEEDBITS(32);
                 out -= left;
@@ -1090,36 +1072,34 @@
 #endif
                      REVERSE(hold)) != state->check) {
                     strm->msg = (char *)"incorrect data check";
-                    state->mode = BAD;
-                    break;
+		    STATE_CHANGE(BAD);
                 }
                 INITBITS();
                 Tracev((stderr, "inflate:   check matches trailer\n"));
             }
 #ifdef GUNZIP
             state->mode = LENGTH;
-        case LENGTH:
+        CASE_DECL(LENGTH)
             if (state->wrap && state->flags) {
                 NEEDBITS(32);
                 if (hold != (state->total & 0xffffffffUL)) {
                     strm->msg = (char *)"incorrect length check";
-                    state->mode = BAD;
-                    break;
+		    STATE_CHANGE(BAD);
                 }
                 INITBITS();
                 Tracev((stderr, "inflate:   length matches trailer\n"));
             }
 #endif
             state->mode = DONE;
-        case DONE:
+        CASE_DECL(DONE)
             ret = Z_STREAM_END;
             goto inf_leave;
-        case BAD:
+        CASE_DECL(BAD)
             ret = Z_DATA_ERROR;
             goto inf_leave;
-        case MEM:
+        CASE_DECL(MEM)
             return Z_MEM_ERROR;
-        case SYNC:
+        CASE_DECL(SYNC)
         default:
             return Z_STREAM_ERROR;
         }

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: cleaner/better zlib sources?
  2007-03-16  2:43       ` Davide Libenzi
@ 2007-03-16  2:56         ` Linus Torvalds
  2007-03-16  3:16           ` Davide Libenzi
  0 siblings, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2007-03-16  2:56 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Jeff Garzik, Git Mailing List, mpm, bcrl



On Thu, 15 Mar 2007, Davide Libenzi wrote:
> 
> That's the diff against 1.2.3, but it does not seem to make a substantial 
> difference on my Opteron ...

But the "goto" stuff you did is 5-6%? 

Is that 5-6% of total git costs, or just of inflate() itself?

		Linus


* Re: cleaner/better zlib sources?
  2007-03-16  2:56         ` Linus Torvalds
@ 2007-03-16  3:16           ` Davide Libenzi
  2007-03-16 16:21             ` Linus Torvalds
  0 siblings, 1 reply; 79+ messages in thread
From: Davide Libenzi @ 2007-03-16  3:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff Garzik, Git Mailing List, mpm, bcrl

On Thu, 15 Mar 2007, Linus Torvalds wrote:

> 
> 
> On Thu, 15 Mar 2007, Davide Libenzi wrote:
> > 
> > That's the diff against 1.2.3, but it does not seem to make a substantial 
> > difference on my Opteron ...
> 
> But the "goto" stuff you did is 5-6%? 
> 
> Is that 5-6% of total git costs, or just of inflate() itself?

Didn't do proper cache warmup and test time was fairly short. Now I'm not 
able to notice substantial differences.
Hacked up test case below ...




- Davide



/* example.c -- usage example of the zlib compression library
 * Copyright (C) 1995-2004 Jean-loup Gailly.
 * For conditions of distribution and use, see copyright notice in zlib.h
 */

/* @(#) $Id$ */

#include <sys/time.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
#include "zlib.h"



#define CHECK_ERR(err, msg) do { \
	if (err != Z_OK) { \
		fprintf(stderr, "%s error: %d\n", msg, err); \
		exit(1); \
	} \
} while (0)




unsigned long long getustime(void) {
	struct timeval tm;

	gettimeofday(&tm, NULL);
	return tm.tv_sec * 1000000ULL + tm.tv_usec;
}


/* ===========================================================================
 * Test deflate() with large buffers and dynamic change of compression level
 */
void do_defl(Byte *compr, uLong *comprLen,
	     Byte *uncompr, uLong uncomprLen) {
	z_stream c_stream; /* compression stream */
	int err;

	c_stream.zalloc = (alloc_func)0;
	c_stream.zfree = (free_func)0;
	c_stream.opaque = (voidpf)0;

	err = deflateInit(&c_stream, Z_BEST_SPEED);
	CHECK_ERR(err, "deflateInit");

	c_stream.next_out = compr;
	c_stream.avail_out = (uInt) *comprLen;

	/* uncompr holds the random data filled in by main(), so it will
	 * compress poorly:
	 */
	c_stream.next_in = uncompr;
	c_stream.avail_in = (uInt) uncomprLen;
	err = deflate(&c_stream, Z_FINISH);
	if (err != Z_STREAM_END) {
		fprintf(stderr, "whoops, got %d instead of Z_STREAM_END\n", err);
		exit(1);
	}

	err = deflateEnd(&c_stream);
	CHECK_ERR(err, "deflateEnd");

	*comprLen = c_stream.next_out - compr;
}

/* ===========================================================================
 * Test inflate() with large buffers
 */
void do_infl(Byte *compr, uLong comprLen,
	     Byte *uncompr, uLong *uncomprLen) {
	int err;
	z_stream d_stream; /* decompression stream */

	d_stream.zalloc = (alloc_func)0;
	d_stream.zfree = (free_func)0;
	d_stream.opaque = (voidpf)0;

	d_stream.next_in  = compr;
	d_stream.avail_in = (uInt)comprLen;

	err = inflateInit(&d_stream);
	CHECK_ERR(err, "inflateInit");

	d_stream.next_out = uncompr;            /* decompress into uncompr */
	d_stream.avail_out = (uInt) *uncomprLen;
	err = inflate(&d_stream, Z_FULL_FLUSH);
	if (err != Z_STREAM_END) {
		fprintf(stderr, "inflate should report Z_STREAM_END\n");
		exit(1);
	}

	err = inflateEnd(&d_stream);
	CHECK_ERR(err, "inflateEnd");

	*uncomprLen = d_stream.next_out - uncompr;
}


int main(int ac, char **av) {
	uLong i, n, clen, ulen, size = 8 * 1024 * 1024, range = 256;
	Byte *ubuf, *cbuf, *tbuf;
	unsigned long long ts, te;

	srand(1);
	ulen = size;
	clen = 2 * ulen;
	ubuf = malloc(ulen);
	tbuf = malloc(ulen);
	cbuf = malloc(clen);
	for (i = 0; i < ulen; i++)
		ubuf[i] = (Byte) (rand() % range);

	/* Warming up ... */
	do_defl(cbuf, &clen, ubuf, ulen);
	do_infl(cbuf, clen, tbuf, &ulen);
	if (ulen != size) {
		fprintf(stderr, "size mismatch %lu instead of %lu\n",
			(unsigned long) ulen, (unsigned long) size);
		return 1;
	}
	if (memcmp(tbuf, ubuf, size)) {
		fprintf(stderr, "whoops! we did not get back the same data\n");
		return 2;
	}
	/* Test ... */
	ts = getustime();
	n = 0;
	do {
		for (i = 0; i < 16; i++) {
			ulen = size;
			do_infl(cbuf, clen, ubuf, &ulen);
		}
		n += i;
		te = getustime();
	} while (te - ts < 2 * 1000000);

	fprintf(stdout, "us time / cycle = %llu\n", (te - ts) / n);

	return 0;
}


* Re: cleaner/better zlib sources?
  2007-03-16  3:16           ` Davide Libenzi
@ 2007-03-16 16:21             ` Linus Torvalds
  2007-03-16 16:24               ` Davide Libenzi
                                 ` (2 more replies)
  0 siblings, 3 replies; 79+ messages in thread
From: Linus Torvalds @ 2007-03-16 16:21 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Jeff Garzik, Git Mailing List, mpm, bcrl



On Thu, 15 Mar 2007, Davide Libenzi wrote:
>
> Hacked up test case below ...

This one seems to do benchmarking with 8MB buffers if I read it right 
(didn't try).

The normal size for the performance-critical git objects is in the couple 
of *hundred* bytes. Not kilobytes, and not megabytes.

The most performance-critical objects for uncompression are commits and 
trees. At least for the kernel, the average size of a tree object is 678
bytes. And that's ignoring the fact that most of them are then deltified, 
so about 80% of them are likely just a ~60-byte delta.

		Linus


* Re: cleaner/better zlib sources?
  2007-03-16 16:21             ` Linus Torvalds
@ 2007-03-16 16:24               ` Davide Libenzi
  2007-03-16 16:35                 ` Linus Torvalds
  2007-03-16 16:35               ` cleaner/better zlib sources? Jeff Garzik
  2007-03-16 17:06               ` Nicolas Pitre
  2 siblings, 1 reply; 79+ messages in thread
From: Davide Libenzi @ 2007-03-16 16:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff Garzik, Git Mailing List, mpm, bcrl

On Fri, 16 Mar 2007, Linus Torvalds wrote:

> On Thu, 15 Mar 2007, Davide Libenzi wrote:
> >
> > Hacked up test case below ...
> 
> This one seems to do benchmarking with 8MB buffers if I read it right 
> (didn't try).

Yes, I just wanted to have the biggest time spent in inflate(). That's why I 
use a big buffer.


> The normal size for the performance-critical git objects is in the couple 
> of *hundred* bytes. Not kilobytes, and not megabytes.
> 
> The most performance-critical objects for uncompression are commits and 
> trees. At least for the kernel, the average size of a tree object is 678
> bytes. And that's ignoring the fact that most of them are then deltified, 
> so about 80% of them are likely just a ~60-byte delta.

Definitely. The nature of the data matters.
Did you try to make a zlib with my patch and oprofile git on real data 
with that?



- Davide


* Re: cleaner/better zlib sources?
  2007-03-16 16:21             ` Linus Torvalds
  2007-03-16 16:24               ` Davide Libenzi
@ 2007-03-16 16:35               ` Jeff Garzik
  2007-03-16 16:42                 ` Matt Mackall
                                   ` (3 more replies)
  2007-03-16 17:06               ` Nicolas Pitre
  2 siblings, 4 replies; 79+ messages in thread
From: Jeff Garzik @ 2007-03-16 16:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Davide Libenzi, Git Mailing List, mpm, bcrl

Linus Torvalds wrote:
> The normal size for the performance-critical git objects is in the couple 
> of *hundred* bytes. Not kilobytes, and not megabytes.
> 
> The most performance-critical objects for uncompression are commits and 
> trees. At least for the kernel, the average size of a tree object is 678
> bytes. And that's ignoring the fact that most of them are then deltified, 
> so about 80% of them are likely just a ~60-byte delta.


Ahhh.  At least for me, that explains a lot.  Rather than spending all 
its time in inflate_fast(), git is dealing with lots of zlib 
startup/shutdown overhead.

Although it sounds like zlib could indeed be optimized to reduce its 
startup and shutdown overhead, I wonder if switching compression 
algorithms to a pure Huffman or even RLE compression (with associated 
lower startup/shutdown costs) would perform better in the face of all 
those small objects.

And another random thought, though it may be useless in this thread:  I 
bet using a pre-built (compiled into git) static zlib dictionary for git 
commit and tree objects might improve things a bit.

	Jeff


* Re: cleaner/better zlib sources?
  2007-03-16 16:24               ` Davide Libenzi
@ 2007-03-16 16:35                 ` Linus Torvalds
  2007-03-16 19:21                   ` Davide Libenzi
  0 siblings, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2007-03-16 16:35 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Jeff Garzik, Git Mailing List, mpm, bcrl



On Fri, 16 Mar 2007, Davide Libenzi wrote:
>
> > This one seems to do benchmarking with 8MB buffers if I read it right 
> > (didn't try).
> 
> Yes, I just wanted to have the biggest time spent in inflate(). That's why I 
> use a big buffer.

Right. But if the biggest time is spent in setup, the big-buffer thing 
ends up being exactly the wrong thing to test ;)

> Definitely. The nature of the data matters.
> Did you try to make a zlib with my patch and oprofile git on real data 
> with that?

I haven't actually set it up so that I can build against my own zlib yet. 
Exactly because I was hoping that somebody would already have a solution 
;)

Will try to do later today, although since it's the kids school 
conferences today I'll effectively be in and out all day long.

		Linus


* Re: cleaner/better zlib sources?
  2007-03-16 16:35               ` cleaner/better zlib sources? Jeff Garzik
@ 2007-03-16 16:42                 ` Matt Mackall
  2007-03-16 16:51                 ` Linus Torvalds
                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 79+ messages in thread
From: Matt Mackall @ 2007-03-16 16:42 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Linus Torvalds, Davide Libenzi, Git Mailing List, bcrl

On Fri, Mar 16, 2007 at 12:35:39PM -0400, Jeff Garzik wrote:
> Linus Torvalds wrote:
> >The normal size for the performance-critical git objects is in the couple 
> >of *hundred* bytes. Not kilobytes, and not megabytes.
> >
> >The most performance-critical objects for uncompression are commits and 
> >trees. At least for the kernel, the average size of a tree object is 678
> >bytes. And that's ignoring the fact that most of them are then deltified, 
> >so about 80% of them are likely just a ~60-byte delta.
> 
> 
> Ahhh.  At least for me, that explains a lot.  Rather than spending all 
> its time in inflate_fast(), git is dealing with lots of zlib 
> startup/shutdown overhead.
> 
> Although it sounds like zlib could indeed be optimized to reduce its 
> startup and shutdown overhead, I wonder if switching compression 
> algorithms to a pure Huffman or even RLE compression (with associated 
> lower startup/shutdown costs) would perform better in the face of all 
> those small objects.

Mercurial simply stores uncompressed objects below a threshold of 44
bytes, based on benchmarks I did in April 2005. I'd probably up that
number if I redid my measurements today. There's just not a whole lot
zlib can do at these small sizes. Given that a SHA hash is an
uncompressible 20 bytes already, you're well into the domain of
diminishing returns.

> And another random thought, though it may be useless in this thread:  I 
> bet using a pre-built (compiled into git) static zlib dictionary for git 
> commit and tree objects might improve things a bit.

Ideally, you'd compress all deltas in a chain with the same context.
You've got to decompress the delta base to do the delta
calculation, so this should allow you to recover the context up to
that point. Zlib isn't really set up for this sort of thing though.

-- 
Mathematics is the supreme nostalgia of our time.


* Re: cleaner/better zlib sources?
  2007-03-16 16:35               ` cleaner/better zlib sources? Jeff Garzik
  2007-03-16 16:42                 ` Matt Mackall
@ 2007-03-16 16:51                 ` Linus Torvalds
  2007-03-16 17:12                 ` Nicolas Pitre
  2007-03-16 23:22                 ` Shawn O. Pearce
  3 siblings, 0 replies; 79+ messages in thread
From: Linus Torvalds @ 2007-03-16 16:51 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Davide Libenzi, Git Mailing List, mpm, bcrl



On Fri, 16 Mar 2007, Jeff Garzik wrote:
>
> Although it sounds like zlib could indeed be optimized to reduce its startup
> and shutdown overhead, I wonder if switching compression algorithms to a pure
> Huffman or even RLE compression (with associated lower startup/shutdown costs)
> would perform better in the face of all those small objects.

Well, the thing is, I personally much prefer to have just a single 
compression algorithm and object layout. Most of the performance-critical 
objects from a decompression standpoint during commit traversal are all 
small (especially if you do pathname limiting), but when you do something 
like a "git add ." most objects are actually random blob objects and you 
need to have a compression algorithm that works in the general case too.

Of course, pack-v4 may (likely will) end up using different strategies for 
different objects (deltas in particular), but the "one single object 
compression type" was a big deal for initial implementation.

It may not be fundamental to git operation (so we can fairly easily 
change it and make it more complex without any higher-level stuff even 
noticing), but it was definitely fundamental to "get something stable and 
working" up and running quickly..

> And another random thought, though it may be useless in this thread:  I bet
> using a pre-built (compiled into git) static zlib dictionary for git commit
> and tree objects might improve things a bit.

That's kind of pack-v4 area. It will happen, but I'd actually like to see 
if we can just avoid stupid performance problems with zlib, independently 
of trying to make more tuned formats.

		Linus


* Re: cleaner/better zlib sources?
  2007-03-16 16:21             ` Linus Torvalds
  2007-03-16 16:24               ` Davide Libenzi
  2007-03-16 16:35               ` cleaner/better zlib sources? Jeff Garzik
@ 2007-03-16 17:06               ` Nicolas Pitre
  2007-03-16 17:51                 ` Linus Torvalds
  2 siblings, 1 reply; 79+ messages in thread
From: Nicolas Pitre @ 2007-03-16 17:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Davide Libenzi, Jeff Garzik, Git Mailing List, mpm, bcrl

On Fri, 16 Mar 2007, Linus Torvalds wrote:

> The most performance-critical objects for uncompression are commits and 
> trees. At least for the kernel, the average size of a tree object is 678
> bytes. And that's ignoring the fact that most of them are then deltified, 
> so about 80% of them are likely just a ~60-byte delta.

This is why in pack v4 there will be an alternate tree object 
representation which is not deflated at all.

In short we intend to have 3 tables where common things are factored 
out:

 1) the path component string table (deflated)

 2) author/committer string table (deflated)

 3) sorted SHA1 table (obviously not deflated)

The sorted SHA1 table will be part of the pack instead of being in the 
pack index.  The idea is that most SHA1's are already duplicated in the 
pack anyway, within commit and tree objects.  With a single table, 
commit and tree objects can index into that SHA1 table rather than 
providing the SHA1 value inline for the objects they refer to.

This means that a tree object record would be only 6 bytes according to 
the current design: 2 bytes to index into the path component string 
table (which also include the mode information), and 4 bytes to index 
into the sorted SHA1 table.  And similarly for commit objects.

This means that the pack index will only have a table of offsets 
corresponding to the table of sorted SHA1's.

So... walking revisions will become only a matter of picking the first 
commit object, using the tree index value (which is not deflated), but 
instead of using it in the SHA1 table it could be used in the offset 
table to find the location of the corresponding tree object directly.  
Same goes for tree entries, or for locating the parent's commit object.

No deflating, no binary searching, no SHA1 comparisons.  Plain straight 
pointer dereference.

Then, if you want to filter tree walking on path spec, you only need to 
locate the path component in the path table once and use the 
corresponding index to filter tree entries instead of repeated strcmp().  
Same thing if you want to filter commits based on author/committer.  
One side effect of this is that you can tell straight away that a path 
doesn't exist in the whole pack if one of its components cannot be found 
in the table (that works only if no legacy tree representations are 
present of course).  That should make history walking blazingly fast.

The only thing that gets deflated is the commit message which needs to 
be inflated only when displaying it.

And so far that makes for quite smaller packs too!


Nicolas


* Re: cleaner/better zlib sources?
  2007-03-16 16:35               ` cleaner/better zlib sources? Jeff Garzik
  2007-03-16 16:42                 ` Matt Mackall
  2007-03-16 16:51                 ` Linus Torvalds
@ 2007-03-16 17:12                 ` Nicolas Pitre
  2007-03-16 23:22                 ` Shawn O. Pearce
  3 siblings, 0 replies; 79+ messages in thread
From: Nicolas Pitre @ 2007-03-16 17:12 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Linus Torvalds, Davide Libenzi, Git Mailing List, mpm, bcrl

On Fri, 16 Mar 2007, Jeff Garzik wrote:

> Although it sounds like zlib could indeed be optimized to reduce its startup
> and shutdown overhead, I wonder if switching compression algorithms to a pure
> Huffman or even RLE compression (with associated lower startup/shutdown costs)
> would perform better in the face of all those small objects.
> 
> And another random thought, though it may be useless in this thread:  I bet
> using a pre-built (compiled into git) static zlib dictionary for git commit
> and tree objects might improve things a bit.

See my last post.  We'll do even better with special object 
encoding altogether.  Those representations are so dense that 
compression provides no gain at all, making the point moot.


Nicolas


* Re: cleaner/better zlib sources?
  2007-03-16 17:06               ` Nicolas Pitre
@ 2007-03-16 17:51                 ` Linus Torvalds
  2007-03-16 18:09                   ` Nicolas Pitre
  0 siblings, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2007-03-16 17:51 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Davide Libenzi, Jeff Garzik, Git Mailing List, mpm, bcrl



On Fri, 16 Mar 2007, Nicolas Pitre wrote:

> On Fri, 16 Mar 2007, Linus Torvalds wrote:
> 
> > The most performance-critical objects for uncompression are commits and 
> > trees. At least for the kernel, the average size of a tree object is 678
> > bytes. And that's ignoring the fact that most of them are then deltified, 
> > so about 80% of them are likely just a ~60-byte delta.
> 
> This is why in pack v4 there will be an alternate tree object 
> representation which is not deflated at all.

Well, the thing is, for things that really don't compress, zlib shouldn't 
add much of an overhead on uncompression. It *should* just end up being a 
single "memcpy()" after you've done:
 - check the header for size and mode ("plain data")
 - check the adler checksum (which is *really* nice - we've found real 
   corruption this way!).

The adler32 checksumming may sound unnecessary when you already have the 
SHA1 checksum, but the thing is, we normally don't actually *check* the 
SHA1 except when doing a full fsck. So I actually like the fact that 
object unpacking always checks at least the adler32 checksum at each 
stage, which you get "for free" when you use zlib.

So not using compression at all actually not only gets rid of the 
compression, it gets rid of a good safety valve - something that may not 
be immediately obvious when you don't think about what all zlib entails. 

People think of zlib as just compressing, but I think the checksumming is 
almost as important, which is why it isn't an obviously good thing to not 
compress small objects just because you don't win on size!

Remember: stability and safety of the data is *the* #1 objective here. The 
git SHA1 checksums guarantees that we can find any corruption, but in 
every-day git usage, the adler32 checksum is the one that generally would 
*notice* the corruption and cause us to say "uhhuh, need to fsck".

Everything else is totally secondary to the goal of "your data is secure". 
Yes, performance is a primary goal too, but it's always "performance with 
correctness guarantees"!

But I just traced through a simple 60-byte incompressible zlib thing. It's 
painful. This should be *the* simplest case, and it should really just be 
the memcpy and the adler32 check. But:

	[torvalds@woody ~]$ grep '<inflate' trace | wc -l
	460
	[torvalds@woody ~]$ grep '<adler32' trace | wc -l
	403
	[torvalds@woody ~]$ grep '<memcpy' trace | wc -l
	59

ie we spend *more* instructions on just the stupid setup in "inflate()" 
than we spend on the adler32 (or, obviously, on the actual 60-byte memcpy 
of the actual incompressible data)

I dunno. I don't mind the adler32 that much. The rest seems to be 
pretty annoying, though.

		Linus


* Re: cleaner/better zlib sources?
  2007-03-16 17:51                 ` Linus Torvalds
@ 2007-03-16 18:09                   ` Nicolas Pitre
  0 siblings, 0 replies; 79+ messages in thread
From: Nicolas Pitre @ 2007-03-16 18:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Davide Libenzi, Jeff Garzik, Git Mailing List, mpm, bcrl

On Fri, 16 Mar 2007, Linus Torvalds wrote:

> On Fri, 16 Mar 2007, Nicolas Pitre wrote:
> 
> > This is why in pack v4 there will be an alternate tree object 
> > representation which is not deflated at all.
> 
> Well, the thing is, for things that really don't compress, zlib shouldn't 
> add much of an overhead on uncompression. It *should* just end up being a 
> single "memcpy()" after you've done:
>  - check the header for size and mode ("plain data")
>  - check the adler checksum (which is *really* nice - we've found real 
>    corruption this way!).

But the thing is that with tree objects whose records are 6 fairly 
random bytes, we already know that compression will never pay off 
size-wise, even if the header overhead were zero.  
In that case it is preferable to do without compression entirely.

> The adler32 checksumming may sound unnecessary when you already have the 
> SHA1 checksum, but the thing is, we normally don't actually *check* the 
> SHA1 except when doing a full fsck. So I actually like the fact that 
> object unpacking always checks at least the adler32 checksum at each 
> stage, which you get "for free" when you use zlib.

We still can perform adler32 on undeflated objects directly though.  But 
they need not be stored in the pack.  I'd store the adler32 checksum for 
each object in the pack index as it can be recomputed by index-pack 
(which will do the full SHA1 validation anyway).


Nicolas


* Re: cleaner/better zlib sources?
  2007-03-16 16:35                 ` Linus Torvalds
@ 2007-03-16 19:21                   ` Davide Libenzi
  2007-03-17  0:01                     ` Linus Torvalds
  0 siblings, 1 reply; 79+ messages in thread
From: Davide Libenzi @ 2007-03-16 19:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff Garzik, Git Mailing List, mpm, bcrl

On Fri, 16 Mar 2007, Linus Torvalds wrote:

> 
> 
> On Fri, 16 Mar 2007, Davide Libenzi wrote:
> >
> > > This one seems to do benchmarking with 8MB buffers if I read it right 
> > > (didn't try).
> > 
> > Yes, I just wanted to have the biggest time spent in inflate(). That's why I 
> > use a big buffer.
> 
> Right. But if the biggest time is spent in setup, the big-buffer thing 
> ends up being exactly the wrong thing to test ;)

I modified ztest.c to be able to bench on various data files (you list 
them on the command line), but results are pretty much the same.
I cannot measure any sensible difference between the two.
Attached there's ztest.c and the diff, in case you want to try on your 
own.



> > Definitely. The nature of the data matters.
> > Did you try to make a zlib with my patch and oprofile git on real data 
> > with that?
> 
> I haven't actually set it up so that I can build against my own zlib yet. 
> Exactly because I was hoping that somebody would already have a solution 
> ;)

An LD_PRELOAD pointing to your own build should do it.




- Davide




#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/mman.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <fcntl.h>
#include <time.h>
#include "zlib.h"


#define MIN_TESTIME (2 * 1000000)
#define INNER_CYCLES 32



#define CHECK_ERR(err, msg) do { \
	if (err != Z_OK) { \
		fprintf(stderr, "%s error: %d\n", msg, err); \
		exit(1); \
	} \
} while (0)



static unsigned long long mintt = MIN_TESTIME;
static uLong incycles = INNER_CYCLES;



static unsigned long long getustime(void) {
	struct timeval tm;

	gettimeofday(&tm, NULL);
	return tm.tv_sec * 1000000ULL + tm.tv_usec;
}

static void do_defl(Byte *cdata, uLong *clen,
		    Byte *udata, uLong uclen) {
	z_stream c_stream; /* compression stream */
	int err;

	c_stream.zalloc = (alloc_func) NULL;
	c_stream.zfree = (free_func) NULL;
	c_stream.opaque = (voidpf) NULL;

	err = deflateInit(&c_stream, Z_BEST_SPEED);
	CHECK_ERR(err, "deflateInit");

	c_stream.next_out = cdata;
	c_stream.avail_out = (uInt) *clen;

	/* udata holds the mmap'ed file contents passed in by the caller:
	 */
	c_stream.next_in = udata;
	c_stream.avail_in = (uInt) uclen;
	err = deflate(&c_stream, Z_FINISH);
	if (err != Z_STREAM_END) {
		fprintf(stderr, "whoops, got %d instead of Z_STREAM_END\n", err);
		exit(1);
	}

	err = deflateEnd(&c_stream);
	CHECK_ERR(err, "deflateEnd");

	*clen = c_stream.next_out - cdata;
}

static void do_infl(Byte *cdata, uLong clen,
		    Byte *udata, uLong *uclen) {
	int err;
	z_stream d_stream; /* decompression stream */

	d_stream.zalloc = (alloc_func) NULL;
	d_stream.zfree = (free_func) NULL;
	d_stream.opaque = (voidpf) NULL;

	d_stream.next_in  = cdata;
	d_stream.avail_in = (uInt) clen;

	err = inflateInit(&d_stream);
	CHECK_ERR(err, "inflateInit");

	d_stream.next_out = udata;            /* decompress into udata */
	d_stream.avail_out = (uInt) *uclen;
	err = inflate(&d_stream, Z_FULL_FLUSH);
	if (err != Z_STREAM_END) {
		fprintf(stderr, "inflate should report Z_STREAM_END\n");
		exit(1);
	}

	err = inflateEnd(&d_stream);
	CHECK_ERR(err, "inflateEnd");

	*uclen = d_stream.next_out - udata;
}

static int do_filebench(char const *fpath) {
	int fd, err = -1;
	uLong i, n, clen, ulen, size;
	Byte *ubuf, *cbuf, *tbuf;
	unsigned long long ts, te;
	void *addr;
	struct stat stb;

	if ((fd = open(fpath, O_RDONLY)) == -1) {
		perror(fpath);
		return -1;
	}
	if (fstat(fd, &stb)) {
		perror(fpath);
		close(fd);
		return -1;
	}
	size = stb.st_size;
	addr = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);
	if (addr == MAP_FAILED) {
		perror("mmap");
		return -1;
	}
	ulen = size;
	clen = size + 4096;
	ubuf = addr;
	if ((tbuf = malloc(ulen + clen)) == NULL) {
		perror("malloc");
		goto err_exit;
	}
	cbuf = tbuf + ulen;

	/* Warming up ... */
	do_defl(cbuf, &clen, ubuf, ulen);
	do_infl(cbuf, clen, tbuf, &ulen);
	if (ulen != size) {
		fprintf(stderr, "size mismatch %lu instead of %lu\n",
			(unsigned long) ulen, (unsigned long) size);
		goto err_exit;
	}
	if (memcmp(tbuf, ubuf, size)) {
		fprintf(stderr, "whoops! we did not get back the same data\n");
		goto err_exit;
	}

	/* Test ... */
	fprintf(stdout, "testing: %s\n", fpath);
	ts = getustime();
	n = 0;
	do {
		for (i = 0; i < incycles; i++) {
			ulen = size;
			do_infl(cbuf, clen, tbuf, &ulen);
		}
		n += i;
		te = getustime();
	} while (te - ts < mintt);

	fprintf(stdout, "\tus time / cycle = %llu\n", (te - ts) / n);
	err = 0;

err_exit:
	free(tbuf);
	munmap(addr, size);

	return err;
}

int main(int ac, char **av) {
	int i;

	for (i = 1; i < ac; i++)
		do_filebench(av[i]);

	return 0;
}



Index: zlib-1.2.3.quilt/inflate.c
===================================================================
--- zlib-1.2.3.quilt.orig/inflate.c	2007-03-15 18:17:19.000000000 -0700
+++ zlib-1.2.3.quilt/inflate.c	2007-03-15 18:31:14.000000000 -0700
@@ -551,6 +551,15 @@
    will return Z_BUF_ERROR if it has not reached the end of the stream.
  */
 
+#define CASE_DECL(n) \
+	case n: \
+	lbl_##n:
+
+#define STATE_CHANGE(s) do { \
+	state->mode = s; \
+	goto lbl_##s; \
+} while (0)
+
 int ZEXPORT inflate(strm, flush)
 z_streamp strm;
 int flush;
@@ -586,10 +595,9 @@
     ret = Z_OK;
     for (;;)
         switch (state->mode) {
-        case HEAD:
+        CASE_DECL(HEAD)
             if (state->wrap == 0) {
-                state->mode = TYPEDO;
-                break;
+		STATE_CHANGE(TYPEDO);
             }
             NEEDBITS(16);
 #ifdef GUNZIP
@@ -597,8 +605,7 @@
                 state->check = crc32(0L, Z_NULL, 0);
                 CRC2(state->check, hold);
                 INITBITS();
-                state->mode = FLAGS;
-                break;
+		STATE_CHANGE(FLAGS);
             }
             state->flags = 0;           /* expect zlib header */
             if (state->head != Z_NULL)
@@ -609,20 +616,17 @@
 #endif
                 ((BITS(8) << 8) + (hold >> 8)) % 31) {
                 strm->msg = (char *)"incorrect header check";
-                state->mode = BAD;
-                break;
+	        STATE_CHANGE(BAD);
             }
             if (BITS(4) != Z_DEFLATED) {
                 strm->msg = (char *)"unknown compression method";
-                state->mode = BAD;
-                break;
+	        STATE_CHANGE(BAD);
             }
             DROPBITS(4);
             len = BITS(4) + 8;
             if (len > state->wbits) {
                 strm->msg = (char *)"invalid window size";
-                state->mode = BAD;
-                break;
+		STATE_CHANGE(BAD);
             }
             state->dmax = 1U << len;
             Tracev((stderr, "inflate:   zlib header ok\n"));
@@ -631,32 +635,30 @@
             INITBITS();
             break;
 #ifdef GUNZIP
-        case FLAGS:
+        CASE_DECL(FLAGS)
             NEEDBITS(16);
             state->flags = (int)(hold);
             if ((state->flags & 0xff) != Z_DEFLATED) {
                 strm->msg = (char *)"unknown compression method";
-                state->mode = BAD;
-                break;
+		STATE_CHANGE(BAD);
             }
             if (state->flags & 0xe000) {
                 strm->msg = (char *)"unknown header flags set";
-                state->mode = BAD;
-                break;
+		STATE_CHANGE(BAD);
             }
             if (state->head != Z_NULL)
                 state->head->text = (int)((hold >> 8) & 1);
             if (state->flags & 0x0200) CRC2(state->check, hold);
             INITBITS();
             state->mode = TIME;
-        case TIME:
+        CASE_DECL(TIME)
             NEEDBITS(32);
             if (state->head != Z_NULL)
                 state->head->time = hold;
             if (state->flags & 0x0200) CRC4(state->check, hold);
             INITBITS();
             state->mode = OS;
-        case OS:
+        CASE_DECL(OS)
             NEEDBITS(16);
             if (state->head != Z_NULL) {
                 state->head->xflags = (int)(hold & 0xff);
@@ -665,7 +667,7 @@
             if (state->flags & 0x0200) CRC2(state->check, hold);
             INITBITS();
             state->mode = EXLEN;
-        case EXLEN:
+        CASE_DECL(EXLEN)
             if (state->flags & 0x0400) {
                 NEEDBITS(16);
                 state->length = (unsigned)(hold);
@@ -677,7 +679,7 @@
             else if (state->head != Z_NULL)
                 state->head->extra = Z_NULL;
             state->mode = EXTRA;
-        case EXTRA:
+        CASE_DECL(EXTRA)
             if (state->flags & 0x0400) {
                 copy = state->length;
                 if (copy > have) copy = have;
@@ -699,7 +701,7 @@
             }
             state->length = 0;
             state->mode = NAME;
-        case NAME:
+        CASE_DECL(NAME)
             if (state->flags & 0x0800) {
                 if (have == 0) goto inf_leave;
                 copy = 0;
@@ -720,7 +722,7 @@
                 state->head->name = Z_NULL;
             state->length = 0;
             state->mode = COMMENT;
-        case COMMENT:
+        CASE_DECL(COMMENT)
             if (state->flags & 0x1000) {
                 if (have == 0) goto inf_leave;
                 copy = 0;
@@ -740,13 +742,12 @@
             else if (state->head != Z_NULL)
                 state->head->comment = Z_NULL;
             state->mode = HCRC;
-        case HCRC:
+        CASE_DECL(HCRC)
             if (state->flags & 0x0200) {
                 NEEDBITS(16);
                 if (hold != (state->check & 0xffff)) {
                     strm->msg = (char *)"header crc mismatch";
-                    state->mode = BAD;
-                    break;
+		    STATE_CHANGE(BAD);
                 }
                 INITBITS();
             }
@@ -755,28 +756,26 @@
                 state->head->done = 1;
             }
             strm->adler = state->check = crc32(0L, Z_NULL, 0);
-            state->mode = TYPE;
-            break;
+	    STATE_CHANGE(TYPE);
 #endif
-        case DICTID:
+        CASE_DECL(DICTID)
             NEEDBITS(32);
             strm->adler = state->check = REVERSE(hold);
             INITBITS();
             state->mode = DICT;
-        case DICT:
+        CASE_DECL(DICT)
             if (state->havedict == 0) {
                 RESTORE();
                 return Z_NEED_DICT;
             }
             strm->adler = state->check = adler32(0L, Z_NULL, 0);
             state->mode = TYPE;
-        case TYPE:
+        CASE_DECL(TYPE)
             if (flush == Z_BLOCK) goto inf_leave;
-        case TYPEDO:
+        CASE_DECL(TYPEDO)
             if (state->last) {
                 BYTEBITS();
-                state->mode = CHECK;
-                break;
+		STATE_CHANGE(CHECK);
             }
             NEEDBITS(3);
             state->last = BITS(1);
@@ -785,39 +784,38 @@
             case 0:                             /* stored block */
                 Tracev((stderr, "inflate:     stored block%s\n",
                         state->last ? " (last)" : ""));
-                state->mode = STORED;
-                break;
+		DROPBITS(2);
+		STATE_CHANGE(STORED);
             case 1:                             /* fixed block */
                 fixedtables(state);
                 Tracev((stderr, "inflate:     fixed codes block%s\n",
                         state->last ? " (last)" : ""));
-                state->mode = LEN;              /* decode codes */
-                break;
+		DROPBITS(2);
+		STATE_CHANGE(LEN);
             case 2:                             /* dynamic block */
                 Tracev((stderr, "inflate:     dynamic codes block%s\n",
                         state->last ? " (last)" : ""));
-                state->mode = TABLE;
-                break;
+		DROPBITS(2);
+		STATE_CHANGE(TABLE);
             case 3:
+		DROPBITS(2);
                 strm->msg = (char *)"invalid block type";
-                state->mode = BAD;
+		STATE_CHANGE(BAD);
             }
-            DROPBITS(2);
             break;
-        case STORED:
+        CASE_DECL(STORED)
             BYTEBITS();                         /* go to byte boundary */
             NEEDBITS(32);
             if ((hold & 0xffff) != ((hold >> 16) ^ 0xffff)) {
                 strm->msg = (char *)"invalid stored block lengths";
-                state->mode = BAD;
-                break;
+		STATE_CHANGE(BAD);
             }
             state->length = (unsigned)hold & 0xffff;
             Tracev((stderr, "inflate:       stored length %u\n",
                     state->length));
             INITBITS();
             state->mode = COPY;
-        case COPY:
+        CASE_DECL(COPY)
             copy = state->length;
             if (copy) {
                 if (copy > have) copy = have;
@@ -832,9 +830,8 @@
                 break;
             }
             Tracev((stderr, "inflate:       stored end\n"));
-            state->mode = TYPE;
-            break;
-        case TABLE:
+	    STATE_CHANGE(TYPE);
+        CASE_DECL(TABLE)
             NEEDBITS(14);
             state->nlen = BITS(5) + 257;
             DROPBITS(5);
@@ -845,14 +842,13 @@
 #ifndef PKZIP_BUG_WORKAROUND
             if (state->nlen > 286 || state->ndist > 30) {
                 strm->msg = (char *)"too many length or distance symbols";
-                state->mode = BAD;
-                break;
+		STATE_CHANGE(BAD);
             }
 #endif
             Tracev((stderr, "inflate:       table sizes ok\n"));
             state->have = 0;
             state->mode = LENLENS;
-        case LENLENS:
+        CASE_DECL(LENLENS)
             while (state->have < state->ncode) {
                 NEEDBITS(3);
                 state->lens[order[state->have++]] = (unsigned short)BITS(3);
@@ -867,13 +863,12 @@
                                 &(state->lenbits), state->work);
             if (ret) {
                 strm->msg = (char *)"invalid code lengths set";
-                state->mode = BAD;
-                break;
+		STATE_CHANGE(BAD);
             }
             Tracev((stderr, "inflate:       code lengths ok\n"));
             state->have = 0;
             state->mode = CODELENS;
-        case CODELENS:
+        CASE_DECL(CODELENS)
             while (state->have < state->nlen + state->ndist) {
                 for (;;) {
                     this = state->lencode[BITS(state->lenbits)];
@@ -891,8 +886,7 @@
                         DROPBITS(this.bits);
                         if (state->have == 0) {
                             strm->msg = (char *)"invalid bit length repeat";
-                            state->mode = BAD;
-                            break;
+			    STATE_CHANGE(BAD);
                         }
                         len = state->lens[state->have - 1];
                         copy = 3 + BITS(2);
@@ -914,17 +908,13 @@
                     }
                     if (state->have + copy > state->nlen + state->ndist) {
                         strm->msg = (char *)"invalid bit length repeat";
-                        state->mode = BAD;
-                        break;
+			STATE_CHANGE(BAD);
                     }
                     while (copy--)
                         state->lens[state->have++] = (unsigned short)len;
                 }
             }
 
-            /* handle error breaks in while */
-            if (state->mode == BAD) break;
-
             /* build code tables */
             state->next = state->codes;
             state->lencode = (code const FAR *)(state->next);
@@ -933,8 +923,7 @@
                                 &(state->lenbits), state->work);
             if (ret) {
                 strm->msg = (char *)"invalid literal/lengths set";
-                state->mode = BAD;
-                break;
+		STATE_CHANGE(BAD);
             }
             state->distcode = (code const FAR *)(state->next);
             state->distbits = 6;
@@ -942,12 +931,11 @@
                             &(state->next), &(state->distbits), state->work);
             if (ret) {
                 strm->msg = (char *)"invalid distances set";
-                state->mode = BAD;
-                break;
+		STATE_CHANGE(BAD);
             }
             Tracev((stderr, "inflate:       codes ok\n"));
             state->mode = LEN;
-        case LEN:
+        CASE_DECL(LEN)
             if (have >= 6 && left >= 258) {
                 RESTORE();
                 inflate_fast(strm, out);
@@ -975,22 +963,19 @@
                 Tracevv((stderr, this.val >= 0x20 && this.val < 0x7f ?
                         "inflate:         literal '%c'\n" :
                         "inflate:         literal 0x%02x\n", this.val));
-                state->mode = LIT;
-                break;
+		STATE_CHANGE(LIT);
             }
             if (this.op & 32) {
                 Tracevv((stderr, "inflate:         end of block\n"));
-                state->mode = TYPE;
-                break;
+		STATE_CHANGE(TYPE);
             }
             if (this.op & 64) {
                 strm->msg = (char *)"invalid literal/length code";
-                state->mode = BAD;
-                break;
+		STATE_CHANGE(BAD);
             }
             state->extra = (unsigned)(this.op) & 15;
             state->mode = LENEXT;
-        case LENEXT:
+        CASE_DECL(LENEXT)
             if (state->extra) {
                 NEEDBITS(state->extra);
                 state->length += BITS(state->extra);
@@ -998,7 +983,7 @@
             }
             Tracevv((stderr, "inflate:         length %u\n", state->length));
             state->mode = DIST;
-        case DIST:
+        CASE_DECL(DIST)
             for (;;) {
                 this = state->distcode[BITS(state->distbits)];
                 if ((unsigned)(this.bits) <= bits) break;
@@ -1017,13 +1002,12 @@
             DROPBITS(this.bits);
             if (this.op & 64) {
                 strm->msg = (char *)"invalid distance code";
-                state->mode = BAD;
-                break;
+		STATE_CHANGE(BAD);
             }
             state->offset = (unsigned)this.val;
             state->extra = (unsigned)(this.op) & 15;
             state->mode = DISTEXT;
-        case DISTEXT:
+        CASE_DECL(DISTEXT)
             if (state->extra) {
                 NEEDBITS(state->extra);
                 state->offset += BITS(state->extra);
@@ -1032,18 +1016,16 @@
 #ifdef INFLATE_STRICT
             if (state->offset > state->dmax) {
                 strm->msg = (char *)"invalid distance too far back";
-                state->mode = BAD;
-                break;
+		STATE_CHANGE(BAD);
             }
 #endif
             if (state->offset > state->whave + out - left) {
                 strm->msg = (char *)"invalid distance too far back";
-                state->mode = BAD;
-                break;
+		STATE_CHANGE(BAD);
             }
             Tracevv((stderr, "inflate:         distance %u\n", state->offset));
             state->mode = MATCH;
-        case MATCH:
+        CASE_DECL(MATCH)
             if (left == 0) goto inf_leave;
             copy = out - left;
             if (state->offset > copy) {         /* copy from window */
@@ -1066,15 +1048,15 @@
             do {
                 *put++ = *from++;
             } while (--copy);
-            if (state->length == 0) state->mode = LEN;
+            if (state->length == 0)
+		STATE_CHANGE(LEN);
             break;
-        case LIT:
+        CASE_DECL(LIT)
             if (left == 0) goto inf_leave;
             *put++ = (unsigned char)(state->length);
             left--;
-            state->mode = LEN;
-            break;
-        case CHECK:
+	    STATE_CHANGE(LEN);
+        CASE_DECL(CHECK)
             if (state->wrap) {
                 NEEDBITS(32);
                 out -= left;
@@ -1090,36 +1072,34 @@
 #endif
                      REVERSE(hold)) != state->check) {
                     strm->msg = (char *)"incorrect data check";
-                    state->mode = BAD;
-                    break;
+		    STATE_CHANGE(BAD);
                 }
                 INITBITS();
                 Tracev((stderr, "inflate:   check matches trailer\n"));
             }
 #ifdef GUNZIP
             state->mode = LENGTH;
-        case LENGTH:
+        CASE_DECL(LENGTH)
             if (state->wrap && state->flags) {
                 NEEDBITS(32);
                 if (hold != (state->total & 0xffffffffUL)) {
                     strm->msg = (char *)"incorrect length check";
-                    state->mode = BAD;
-                    break;
+		    STATE_CHANGE(BAD);
                 }
                 INITBITS();
                 Tracev((stderr, "inflate:   length matches trailer\n"));
             }
 #endif
             state->mode = DONE;
-        case DONE:
+        CASE_DECL(DONE)
             ret = Z_STREAM_END;
             goto inf_leave;
-        case BAD:
+        CASE_DECL(BAD)
             ret = Z_DATA_ERROR;
             goto inf_leave;
-        case MEM:
+        CASE_DECL(MEM)
             return Z_MEM_ERROR;
-        case SYNC:
+        CASE_DECL(SYNC)
         default:
             return Z_STREAM_ERROR;
         }


* Re: cleaner/better zlib sources?
  2007-03-16 16:35               ` cleaner/better zlib sources? Jeff Garzik
                                   ` (2 preceding siblings ...)
  2007-03-16 17:12                 ` Nicolas Pitre
@ 2007-03-16 23:22                 ` Shawn O. Pearce
  3 siblings, 0 replies; 79+ messages in thread
From: Shawn O. Pearce @ 2007-03-16 23:22 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Linus Torvalds, Davide Libenzi, Git Mailing List, mpm, bcrl

Jeff Garzik <jeff@garzik.org> wrote:
> Although it sounds like zlib could indeed be optimized to reduce its 
> startup and shutdown overhead, I wonder if switching compression 
> algorithms to a pure Huffman or even RLE compression (with associated 
> lower startup/shutdown costs) would perform better in the face of all 
> those small objects.

As Nico already stated, for pack v4 we are probably heading in a
direction where these really small (except for blobs anyway) objects
aren't compressed at all by zlib.  They are smaller in disk space,
and are faster to reconstruct to their raw format.
 
> And another random thought, though it may be useless in this thread:  I 
> bet using a pre-built (compiled into git) static zlib dictionary for git 
> commit and tree objects might improve things a bit.

I've actually tried this with the Mozilla project.  The improvement
was under 2% on disk space usage and no runtime performance gains.
Not worth the pain involved.  We are seeing much higher disk
space improvements and much better performance gains in the pack
v4 prototype.

Oh, and that was *with* a dictionary that was customized to Mozilla.
Not a static one.  A lot of keywords in the dictionary were Mozilla
project specific, and would actually *hurt* compression for the
Linux kernel, Git, X.org, etc...

-- 
Shawn.


* Re: cleaner/better zlib sources?
  2007-03-16 19:21                   ` Davide Libenzi
@ 2007-03-17  0:01                     ` Linus Torvalds
  2007-03-17  1:11                       ` Linus Torvalds
  0 siblings, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2007-03-17  0:01 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Jeff Garzik, Git Mailing List, mpm, bcrl



On Fri, 16 Mar 2007, Davide Libenzi wrote:
>
> I cannot measure any sensible difference between the two.

I'm using your previous patch (is it the same?) along with the additional 
patch appended.

And yes, it's not hugely faster, but I seem to see *some* difference: this 
is the real-time of ten runs of 

	time git log drivers/usb/ > /dev/null

Before:

	0m2.673s
	0m2.476s
	0m2.603s
	0m2.576s
	0m2.625s
	0m2.628s
	0m2.493s
	0m2.696s
	0m2.525s
	0m2.575s

After:

	0m2.639s
	0m2.519s
	0m2.454s
	0m2.604s
	0m2.499s
	0m2.497s
	0m2.506s
	0m2.394s
	0m2.409s
	0m2.562s

ie after I actually get under 2.4s once, and under 2.5s most of the time, 
while before it was under 2.5s just twice, and mostly in the 2.6s..

(I did end up adding the "-g", but I trust that doesn't make things 
*faster*. Generally gcc is good at not actually changing code generation 
based on -g)

But yeah, not very impressive changes. We're talking *maybe* 0.1s out of 
2.5, so potentially about 4% of total time but more likely about 2-3%, and 
it's clearly mostly in the noise. And inflate() is still at 16%, and 
inflate_fast obviously got no faster.

The nice part is that the instruction-level profile for inflate() got more 
interesting. Instead of clearly peaking at the silly indirect jump, the 
peak now seems to be a specific path through the thing. I've not decoded 
it fully yet, but it seems to be mostly the LEN/LIT cases:

 file inflate.c, line 942.
 file inflate.c, line 942.
 file inflate.c, line 949.
 file inflate.c, line 949.
 file inflate.c, line 949.
 file inflate.c, line 949.
 file inflate.c, line 949.
 file inflate.c, line 950.
 file inflate.c, line 950.
 file inflate.c, line 951.
 file inflate.c, line 951.
 file inflate.c, line 951.
 file inflate.c, line 951.
 file inflate.c, line 951.
 file inflate.c, line 949.
 file inflate.c, line 949.
 file inflate.c, line 950.
 file inflate.c, line 950.
 file inflate.c, line 953.
 file inflate.c, line 953.
 file inflate.c, line 953.
 file inflate.c, line 969.
 file inflate.c, line 1058.
 file inflate.c, line 1059.
 file inflate.c, line 1061.
 file inflate.c, line 884.
 file inflate.c, line 963.
 file inflate.c, line 964.

(those are the line numbers *after* applying my patch for where the 
hotpoints are: the same line-number showing up multiple times is just 
because several hot instructions came from there and got spread out)

			Linus

---
diff --git a/Makefile b/Makefile
index 2fd6e45..d8e9ff4 100644
--- a/Makefile
+++ b/Makefile
@@ -18,7 +18,7 @@
 
 CC=cc
 
-CFLAGS=-O
+CFLAGS=-O -g
 #CFLAGS=-O -DMAX_WBITS=14 -DMAX_MEM_LEVEL=7
 #CFLAGS=-g -DDEBUG
 #CFLAGS=-O3 -Wall -Wwrite-strings -Wpointer-arith -Wconversion \
diff --git a/inflate.c b/inflate.c
index 190c642..3d41d6f 100644
--- a/inflate.c
+++ b/inflate.c
@@ -568,7 +568,7 @@ int flush;
     unsigned char FAR *next;    /* next input */
     unsigned char FAR *put;     /* next output */
     unsigned have, left;        /* available input and output */
-    unsigned long hold;         /* bit buffer */
+    unsigned long hold, old_hold;/* bit buffer */
     unsigned bits;              /* bits in bit buffer */
     unsigned in, out;           /* save starting available input and output */
     unsigned copy;              /* number of stored or match bytes to copy */
@@ -631,8 +631,11 @@ int flush;
             state->dmax = 1U << len;
             Tracev((stderr, "inflate:   zlib header ok\n"));
             strm->adler = state->check = adler32(0L, Z_NULL, 0);
-            state->mode = hold & 0x200 ? DICTID : TYPE;
+            old_hold = hold;
             INITBITS();
+            if (old_hold & 0x200)
+            	STATE_CHANGE(DICTID);
+            STATE_CHANGE(TYPE);
             break;
 #ifdef GUNZIP
         CASE_DECL(FLAGS)
@@ -817,7 +820,7 @@ int flush;
             state->mode = COPY;
         CASE_DECL(COPY)
             copy = state->length;
-            if (copy) {
+            while (copy) {
                 if (copy > have) copy = have;
                 if (copy > left) copy = left;
                 if (copy == 0) goto inf_leave;
@@ -826,8 +829,8 @@ int flush;
                 next += copy;
                 left -= copy;
                 put += copy;
-                state->length -= copy;
-                break;
+                copy = state->length - copy;
+                state->length = copy;
             }
             Tracev((stderr, "inflate:       stored end\n"));
 	    STATE_CHANGE(TYPE);


* Re: cleaner/better zlib sources?
  2007-03-17  0:01                     ` Linus Torvalds
@ 2007-03-17  1:11                       ` Linus Torvalds
  2007-03-17  3:28                         ` Nicolas Pitre
  0 siblings, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2007-03-17  1:11 UTC (permalink / raw)
  To: Git Mailing List




On Fri, 16 Mar 2007, Linus Torvalds wrote:
> 
> And yes, it's not hugely faster, but I seem to see *some* difference: this 
> is the real-time of ten runs of 
> 
> 	time git log drivers/usb/ > /dev/null

Damn. I think I know why it's happening, and I'm an idiot. I think it's 
actually an issue I wondered about a *loong* time ago, and then forgot all 
about. And later on Nico made it almost impossible to fix with his "pack 
offset" changes.

The thing that made me realize was one of the callchains into inflate() 
that I looked at:

   (gdb) where
   #0  inflate (strm=0x7fff10d83810, flush=4) at inflate.c:566
   #1  0x000000000044c165 in unpack_compressed_entry (p=0x6d52e0, w_curs=0x7fff10d838e0, curpos=94941911,
       size=<value optimized out>) at sha1_file.c:1348
   #2  0x000000000044c2b6 in unpack_entry (p=0x6d52e0, obj_offset=94941909, type=0x7fff10d85d8c, sizep=0x7fff10d83928)
       at sha1_file.c:1408
   #3  0x000000000044c32e in unpack_entry (p=0x6d52e0, obj_offset=94942707, type=0x7fff10d85d8c, sizep=0x7fff10d83988)
       at sha1_file.c:1373
   #4  0x000000000044c32e in unpack_entry (p=0x6d52e0, obj_offset=94943021, type=0x7fff10d85d8c, sizep=0x7fff10d839e8)
       at sha1_file.c:1373
   #5  0x000000000044c32e in unpack_entry (p=0x6d52e0, obj_offset=94943382, type=0x7fff10d85d8c, sizep=0x7fff10d83a48)
       at sha1_file.c:1373
   #6  0x000000000044c32e in unpack_entry (p=0x6d52e0, obj_offset=94943531, type=0x7fff10d85d8c, sizep=0x7fff10d83aa8)
       at sha1_file.c:1373
   #7  0x000000000044c32e in unpack_entry (p=0x6d52e0, obj_offset=94943622, type=0x7fff10d85d8c, sizep=0x7fff10d83b08)
       at sha1_file.c:1373
   #8  0x000000000044c32e in unpack_entry (p=0x6d52e0, obj_offset=94945357, type=0x7fff10d85d8c, sizep=0x7fff10d83b68)
       at sha1_file.c:1373
   #9  0x000000000044c32e in unpack_entry (p=0x6d52e0, obj_offset=94945447, type=0x7fff10d85d8c, sizep=0x7fff10d83bc8)
       at sha1_file.c:1373
   #10 0x000000000044c32e in unpack_entry (p=0x6d52e0, obj_offset=94945571, type=0x7fff10d85d8c, sizep=0x7fff10d85d80)
       at sha1_file.c:1373
   #11 0x000000000044c3f8 in read_packed_sha1 (sha1=<value optimized out>, type=0x7fff10d85d8c, size=0x7fff10d85d80)
       at sha1_file.c:1567
   #12 0x000000000044c741 in read_sha1_file (sha1=0x7fff10d85d60 "�Ab\217��236���031", type=0x7fff10d85d8c,
       size=0x7fff10d85d80) at sha1_file.c:1636
   ....

and notice the deep recursion in sha1_file.

The way we unpack delta chains is that we do

 - find a delta
 - we apply it to "recursively unpack the thing it's a delta to"

which sounds totally obvious and straightforward, right?

EXCEPT it's actually O(n**2) in the delta depth, because we never save the 
intermediate results, so when we have a delta depth of 10 (our default), 
and we decode a lot of these things, we basically will look up the base 
object 10 times, apply the first delta 9 times, apply the second delta 8 
times, etc etc.. 

I didn't worry about it, because it never actually hit as much of a
performance problem (and when you do a *single* tree operation you'd
never see it anyway: you apply the deltas you need, and nothing else),
but what it means is that we actually call inflate on the chain entries
55 times instead of just doing it 10 times. 

It's also somewhat limited by the delta depth that we enforce anyway (I 
say "somewhat", because we only limit the maximum depth, not the number of 
times an object can be used as a base, and if you use an object as a base 
a thousand times, it will literally be unpacked a thousand times too!)

I also didn't worry about it, because I felt that if it became a problem, 
it would be easy to just add a cache of base objects (we probably do *not* 
want to keep the whole unpacked object info in memory all the time just 
because of memory pressure issues, so "cache of base objects" is better). 
However, the "pack file + offset" thing makes it harder to do, since we 
now don't even have the SHA1 of the base object before we unpack it.

But I guess we could just index this by a <packfile, offset> tuple.

Anyway, I bet that this is a much bigger issue than the pack format 
itself (and is largely independent).

		Linus


* Re: cleaner/better zlib sources?
  2007-03-17  1:11                       ` Linus Torvalds
@ 2007-03-17  3:28                         ` Nicolas Pitre
  2007-03-17  5:19                           ` Shawn O. Pearce
  2007-03-17 17:55                           ` Linus Torvalds
  0 siblings, 2 replies; 79+ messages in thread
From: Nicolas Pitre @ 2007-03-17  3:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

On Fri, 16 Mar 2007, Linus Torvalds wrote:

> The way we unpack delta chains is that we do
> 
>  - find a delta
>  - we apply it to "recursively unpack the thing it's a delta to"
> 
> which sounds totally obvious and straightforward, right?
> 
> EXCEPT it's actually O(n**2) in the delta depth, because we never save the 
> intermediate results, so when we have a delta depth of 10 (our default), 
> and we decode a lot of these things, we basically will look up the base 
> object 10 times, apply the first delta 9 times, apply the second delta 8 
> times, etc etc.. 

In the worst case, yes.  And if you're walking history then the 
probability of hitting the worst case eventually is rather high.

> I also didn't worry about it, because I felt that if it became a problem, 
> it would be easy to just add a cache of base objects (we probably do *not* 
> want to keep the whole unpacked object info in memory all the time just 
> because of memory pressure issues, so "cache of base objects" is better). 
> However, the "pack file + offset" thing makes it harder to do, since we 
> now don't even have the SHA1 of the base object before we unpack it.
> 
> But I guess we could just index this by a <packfile, offset> tuple.

Right.  Should be really trivial to hook into unpack_delta_entry() 
actually replacing the call to unpack_entry() with a wrapper function 
that returns cached data, or populates the cache with unpack_entry() 
when no match is found.

Then it would only be a matter of coming up with a clever cache 
eviction algorithm.

> Anyway, I bet that this is a much bigger issue than the pack format 
> itself (and is largely independent).

Well, I think the pack format issue is significant too.  But because 
those are independent issues the gain in performance will be additive.


Nicolas


* Re: cleaner/better zlib sources?
  2007-03-17  3:28                         ` Nicolas Pitre
@ 2007-03-17  5:19                           ` Shawn O. Pearce
  2007-03-17 17:55                           ` Linus Torvalds
  1 sibling, 0 replies; 79+ messages in thread
From: Shawn O. Pearce @ 2007-03-17  5:19 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, Git Mailing List

Nicolas Pitre <nico@cam.org> wrote:
> On Fri, 16 Mar 2007, Linus Torvalds wrote:
> > I also didn't worry about it, because I felt that if it became a problem, 
> > it would be easy to just add a cache of base objects (we probably do *not* 
> > want to keep the whole unpacked object info in memory all the time just 
> > because of memory pressure issues, so "cache of base objects" is better). 
> > However, the "pack file + offset" thing makes it harder to do, since we 
> > now don't even have the SHA1 of the base object before we unpack it.
> > 
> > But I guess we could just index this by a <packfile, offset> tuple.
...
> Then it would only be a matter of coming up with a clever cache 
> eviction algorithm.

Yes.  Linus above seems to imply (at least to me) that we wouldn't
want to cache the original object requested by read_sha1_file(), as
its not the delta base.  But given our packing rules, we should be
(in general anyway) first asking for the most recent revision of
a file, which is stored whole, then for an older revision, which
will be a delta of the more recent revision we just saw.

Hence we probably would want to cache an object.  Well, at least
anything that had been packed as a delta.  Caching a deflated
OBJ_BLOB may not be worth it.
 
> > Anyway, I bet that this is a much bigger issue than the pack format 
> > itself (and is largely independent).
> 
> Well, I think the pack format issue is significant too.  But because 
> those are independent issues the gain in performance will be additive.

I'm torn there.

There's two places that we do lots of unpacks of objects where we
run into this difficult case of unpacking the same base object many
times: git-blame and a rev-list with a path limiter.

Now the git-blame case is obvious: we are constantly unpacking
various revisions of the same file, and these are probably delta'd
against each other, so the unpacking gets really brutal after a
while.  A blob cache here would probably *really* help out git-blame.

What's slightly less obvious about git-blame is we are probably also
traversing the different versions of the same trees over and over, as
we resolve the path to the correct blob in each commit we traverse.
So again here we are hitting lots of the same trees multiple times.

That last part about git-blame also obviously applies to the rev-list
with a path limiter.

But most other operations don't seem like they would benefit from a
base object cache; actually they might slow down from having such
a cache present!

Commits tend not to delta well; if they delta it is a very rare
occurrence.  So we aren't getting huge unpacking benefits there
by caching them.  Scratch any benefit of the cache for any sort of
rev-list operation that doesn't require tree access.

As for the other common operations (diff, read-tree, checkout-index,
merge-recursive): I don't think these will benefit from a cache
either.  Their data access patterns are pretty spread out over
the tree.  With the exception of rename detection we hit everything
only once.  After touching a path, we tend to not go back to it.
So unless we are really lucky and one blob acts as a base object
for many others at different paths (possible, but I suspect not
very likely) it's not worth caching the base.

If we do hit something twice, it's probably because we are doing two
distinct passes over the data.  In this case the passes are probably
because we either don't want to hold all of the data in memory (too
big of a set for some projects) or because we tried one algorithm,
failed, and are now trying a different one (internal read-tree
in merge-recursive).

Caching in merge-recursive may help, but just making the dirty
cache (index) that resulted from the internal read-tree available
for the remainder of the merge-recursive process might be faster;
especially if we only have one base and don't need to recursively
merge multiple bases.


So where does that leave us?  The only places I see a base object
cache really helping is in git-blame for blob access, repeated
tree access (git-blame and path limiting), and maybe we could do
better with the common cases in merge-recursive by being smarter
with the cache.

But with pack v4 I don't think I need a tree object cache.
With a 6 byte fixed record format, a strict ordering requirement,
a finite delta depth within a packfile, a stricter tree-specific
delta encoder, and a minor API change to tree-walk.h, I think we
can unpack the delta at the same time that we are walking the tree.
No upfront unpack required.  Hence no reason to cache.


So yea, a base object cache may help us today.  It will most
definitely help in git-blame.  But I doubt it will help with trees
in pack v4, and I think it will just hurt in most cases.  So maybe
it should be local to git-blame only.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: cleaner/better zlib sources?
  2007-03-17  3:28                         ` Nicolas Pitre
  2007-03-17  5:19                           ` Shawn O. Pearce
@ 2007-03-17 17:55                           ` Linus Torvalds
  2007-03-17 19:40                             ` Linus Torvalds
  1 sibling, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2007-03-17 17:55 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Git Mailing List



On Fri, 16 Mar 2007, Nicolas Pitre wrote:
> 
> In the worst case, yes.  And if you're walking history then the 
> probability of hitting the worst case eventually is rather high.

Actually, it's even better than that.

If we're walking a certain pathspec (which is really the only thing that 
is expensive), we're pretty much *guaranteed* that we'll hit exactly this 
case. Doing some instrumentation on the test-case I've been using (which 
is just "git log drivers/usb/ > /dev/null") shows:

	[torvalds@woody linux]$ grep Needs delta-base-trace | wc -l
	469334
	[torvalds@woody linux]$ grep Needs delta-base-trace | sort -u | wc -l
	21933

where that delta-base-trace is just a trace of which delta bases were 
needed. Look how we currently generate almost half a million of them, but 
only 22000 are actually unique objects - we just generate many of them 
over and over again. In fact, the top delta bases with counts look like:

    558 Needs 102398354
    556 Needs 161353360
    554 Needs 161354852
    552 Needs 161354916
    550 Needs 161354980
    526 Needs 161355044
    524 Needs 161355108
    522 Needs 161355174
    520 Needs 161355238
    508 Needs 161445724
    446 Needs 119712387
    425 Needs 133406737
    420 Needs 161513997
    387 Needs 120784913
    331 Needs 127094253
    321 Needs 95694853
    319 Needs 125888524
    303 Needs 155109487
    301 Needs 155627964
    299 Needs 155628028
    .....

ie the top twenty objects were all generated hundreds of times each.
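For reference, a pipeline along these lines (hypothetical; assuming the one-per-line "Needs <offset>" trace format shown above) would produce such a count list:

```shell
# Count how often each delta base was requested, most-used first
# (assumes trace lines of the form "Needs <offset>"):
grep Needs delta-base-trace | sort | uniq -c | sort -rn | head -20
```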

More importantly, the trace also shows that it actually has very good 
locality too - exactly as you'd expect, since when we traverse the trees, 
we'd generally see a particular delta base used as a base when that thing 
is slowly changing, so of the half-million "needs" entries in my trace, if 
I pick the top delta_base (102398354), and use "cat -n" to give them all 
line numbers (from 1 to half a million), and grep for that particular 
delta:

	grep Needs delta-base-trace | cat -n | grep 102398354 | less -S

they are *all* at lines 61624..89352, and the bulk of them are very 
close together (mostly around the 88k line mark).

In other words, it's not "spread out" over time. It's very clustered, 
which I'd expect anyway, which means that even a simple cache of just a 
few hundred entries (statically sized) will be very effective.

So the cache doesn't need to be "complete". It will get good hit-rates 
even from being very simple. I think I have a very simple and cunning 
plan, I'll try it out asap.

		Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: cleaner/better zlib sources?
  2007-03-17 17:55                           ` Linus Torvalds
@ 2007-03-17 19:40                             ` Linus Torvalds
  2007-03-17 19:42                               ` [PATCH 1/2] Make trivial wrapper functions around delta base generation and freeing Linus Torvalds
  2007-03-17 19:44                               ` [PATCH 2/2] Implement a simple delta_base cache Linus Torvalds
  0 siblings, 2 replies; 79+ messages in thread
From: Linus Torvalds @ 2007-03-17 19:40 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Git Mailing List



On Sat, 17 Mar 2007, Linus Torvalds wrote:
> 
> So the cache doesn't need to be "complete". It will get good hit-rates 
> even from being very simple. I think I have a very simple and cunning 
> plan, I'll try it out asap.

Ok, got distracted by guests coming to look at the new puppy, so it took 
longer than it should have, but the following is a simple two-patch series 
that improves path-following by a factor of almost 2.5 for me.

The cache is *really* simple. It's just a 256-entry hashed cache of the 
last few base entries, and it brings down my test-case of

	git log drivers/usb/ > /dev/null

from 2.5s to just over 1s. I have *not* tuned or tweaked this at all, and 
maybe there are better ways to do this, but this was simple as hell and 
obviously quite effective.

It also speeds up "git blame", for all the same reasons. Before (best 
times out of a run of five):

	[torvalds@woody linux]$ time git blame drivers/char/Makefile > /dev/null
	real    0m1.585s
	user    0m1.576s
	sys     0m0.004s

after:

	[torvalds@woody linux]$ time ~/git/git blame drivers/char/Makefile > /dev/null
	real    0m0.763s
	user    0m0.644s
	sys     0m0.120s

so it's a factor of two there too (just a random file, I'm not at all 
going to guarantee that this is really consistent - it should get more 
testing etc).

The first patch just does some obvious re-factoring and setting up (no 
real code changes). The second patch just uses the new functions to 
actually add a cache.

		Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* [PATCH 1/2] Make trivial wrapper functions around delta base generation and freeing
  2007-03-17 19:40                             ` Linus Torvalds
@ 2007-03-17 19:42                               ` Linus Torvalds
  2007-03-17 19:44                               ` [PATCH 2/2] Implement a simple delta_base cache Linus Torvalds
  1 sibling, 0 replies; 79+ messages in thread
From: Linus Torvalds @ 2007-03-17 19:42 UTC (permalink / raw)
  To: Junio C Hamano, Nicolas Pitre; +Cc: Git Mailing List


This doesn't change any code, it just creates a point for where we'd
actually do the caching of delta bases that have been generated.
    
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

Done this way to make all the changes as obvious as possible.


diff --git a/sha1_file.c b/sha1_file.c
index 110d696..f11ca3f 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1352,6 +1352,18 @@ static void *unpack_compressed_entry(struct packed_git *p,
 	return buffer;
 }
 
+static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset,
+	unsigned long *base_size, enum object_type *type)
+{
+	return unpack_entry(p, base_offset, type, base_size);
+}
+
+static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
+	void *base, unsigned long base_size, enum object_type type)
+{
+	free(base);
+}
+
 static void *unpack_delta_entry(struct packed_git *p,
 				struct pack_window **w_curs,
 				off_t curpos,
@@ -1365,7 +1377,7 @@ static void *unpack_delta_entry(struct packed_git *p,
 	off_t base_offset;
 
 	base_offset = get_delta_base(p, w_curs, &curpos, *type, obj_offset);
-	base = unpack_entry(p, base_offset, type, &base_size);
+	base = cache_or_unpack_entry(p, base_offset, &base_size, type);
 	if (!base)
 		die("failed to read delta base object"
 		    " at %"PRIuMAX" from %s",
@@ -1378,7 +1390,7 @@ static void *unpack_delta_entry(struct packed_git *p,
 	if (!result)
 		die("failed to apply delta");
 	free(delta_data);
-	free(base);
+	add_delta_base_cache(p, base_offset, base, base_size, *type);
 	return result;
 }
 

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 2/2] Implement a simple delta_base cache
  2007-03-17 19:40                             ` Linus Torvalds
  2007-03-17 19:42                               ` [PATCH 1/2] Make trivial wrapper functions around delta base generation and freeing Linus Torvalds
@ 2007-03-17 19:44                               ` Linus Torvalds
  2007-03-17 21:45                                 ` Linus Torvalds
  2007-03-17 22:44                                 ` Linus Torvalds
  1 sibling, 2 replies; 79+ messages in thread
From: Linus Torvalds @ 2007-03-17 19:44 UTC (permalink / raw)
  To: Junio C Hamano, Nicolas Pitre; +Cc: Git Mailing List


This trivial 256-entry delta_base cache improves performance for some 
loads by a factor of 2.5 or so.

Instead of always re-generating the delta bases (possibly over and over 
and over again), just cache the last few ones. They often can get re-used.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

This should have some other people doing performance testing too, since 
it's fairly core. But *dang*, it's really simple.

diff --git a/sha1_file.c b/sha1_file.c
index f11ca3f..a7e3a2a 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1352,16 +1352,57 @@ static void *unpack_compressed_entry(struct packed_git *p,
 	return buffer;
 }
 
+#define MAX_DELTA_CACHE (256)
+
+static struct delta_base_cache_entry {
+	struct packed_git *p;
+	off_t base_offset;
+	unsigned long size;
+	void *data;
+	enum object_type type;
+} delta_base_cache[MAX_DELTA_CACHE];
+
+static unsigned long pack_entry_hash(struct packed_git *p, off_t base_offset)
+{
+	unsigned long hash;
+
+	hash = (unsigned long)p + (unsigned long)base_offset;
+	hash += (hash >> 8) + (hash >> 16);
+	return hash & 0xff;
+}
+
 static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset,
 	unsigned long *base_size, enum object_type *type)
 {
+	void *ret;
+	unsigned long hash = pack_entry_hash(p, base_offset);
+	struct delta_base_cache_entry *ent = delta_base_cache + hash;
+
+	ret = ent->data;
+	if (ret && ent->p == p && ent->base_offset == base_offset)
+		goto found_cache_entry;
 	return unpack_entry(p, base_offset, type, base_size);
+
+found_cache_entry:
+	ent->data = NULL;
+	*type = ent->type;
+	*base_size = ent->size;
+	return ret;
 }
 
 static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
 	void *base, unsigned long base_size, enum object_type type)
 {
-	free(base);
+	unsigned long hash = pack_entry_hash(p, base_offset);
+	struct delta_base_cache_entry *ent = delta_base_cache + hash;
+
+	if (ent->data)
+		free(ent->data);
+	ent->p = p;
+	ent->base_offset = base_offset;
+	ent->type = type;
+	ent->data = base;
+	ent->size = base_size;
 }
 
 static void *unpack_delta_entry(struct packed_git *p,

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-17 19:44                               ` [PATCH 2/2] Implement a simple delta_base cache Linus Torvalds
@ 2007-03-17 21:45                                 ` Linus Torvalds
  2007-03-17 22:37                                   ` Junio C Hamano
                                                     ` (2 more replies)
  2007-03-17 22:44                                 ` Linus Torvalds
  1 sibling, 3 replies; 79+ messages in thread
From: Linus Torvalds @ 2007-03-17 21:45 UTC (permalink / raw)
  To: Junio C Hamano, Nicolas Pitre; +Cc: Git Mailing List



On Sat, 17 Mar 2007, Linus Torvalds wrote:
> 
> Instead of always re-generating the delta bases (possibly over and over 
> and over again), just cache the last few ones. They often can get re-used.

Not just in actual timings: the difference also shows up in the 
traces I did. Remember, before we had:

	[torvalds@woody linux]$ grep Needs delta-base-trace | wc -l
	469334
	[torvalds@woody linux]$ grep Needs delta-base-trace |sort -u | wc -l
	21933

and now with the simple cache, I get:

	[torvalds@woody linux]$ grep Needs delta-base-trace-new | wc -l
	28688
	[torvalds@woody linux]$ grep Needs delta-base-trace-new | sort -u | wc -l
	21933

ie, we still re-generate some of the objects multiple times, but now, 
rather than generating them (on average) 20+ times each, we now generate 
them an average of just 1.3 times each. Which explains why the wall-time 
goes down by over a factor of two.
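(The averages quoted can be recomputed from the trace counts above; a quick check:)

```shell
# Recompute the per-object averages: total lookups / unique objects,
# before and after the cache was added.
awk 'BEGIN { printf "before: %.1f  after: %.1f\n", 469334/21933, 28688/21933 }'
# prints "before: 21.4  after: 1.3"
```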

Changing the (statically sized) cache from 256 entries to 1024 (and 
updating the hash function appropriately of course) gets the number down 
to 23953 delta-base lookups (the number of unique ones obviously stays the 
same), for an average of just 1.1 generations per unique object, and 
also means that you occasionally get sub-second times for my test-case of 
logging drivers/usb/.

It all also means that libz isn't really even the top entry in the 
profiles any more, although it's still pretty high. But the profile now 
says:

	samples  %        app name                 symbol name
	41527    15.6550  git                      strlen
	30215    11.3905  git                      inflate
	27504    10.3685  git                      inflate_table
	20321     7.6607  git                      find_pack_entry_one
	16892     6.3680  git                      interesting
	16259     6.1294  vmlinux                  __copy_user_nocache
	16010     6.0355  git                      inflate_fast
	9240      3.4833  git                      get_mode
	8863      3.3412  git                      tree_entry_extract
	7145      2.6935  git                      strncmp
	7131      2.6883  git                      memcpy
	6863      2.5872  git                      diff_tree
	6113      2.3045  git                      adler32
	4515      1.7021  git                      _int_malloc
	3022      1.1392  git                      update_tree_entry
	...

(Adding up all of libz is still ~31%, but it's lower as a percentage *and* 
it's obviously a smaller percentage of a much lower absolute time, so the 
zlib overhead went down much more than any other git overheads did)

In general, this all seems very cool. The patches are simple enough that I 
think this is very safe to merge indeed: the only question I have is that 
somebody should verify that the "struct packed_git *p" is stable over the 
whole lifetime of a process - so that we can use it as a hash key without 
having to invalidate hashes if we unmap a pack (I *think* we just unmap 
the virtual mapping, and "struct packed_git *" stays valid, but Junio 
should ack that for me).

Here's the trivial patch to extend the caching to 1k entries if somebody 
cares. I don't know if the small added performance is worth it.

		Linus
---
diff --git a/sha1_file.c b/sha1_file.c
index a7e3a2a..372af60 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1352,7 +1352,7 @@ static void *unpack_compressed_entry(struct packed_git *p,
 	return buffer;
 }
 
-#define MAX_DELTA_CACHE (256)
+#define MAX_DELTA_CACHE (1024)
 
 static struct delta_base_cache_entry {
 	struct packed_git *p;
@@ -1367,8 +1367,8 @@ static unsigned long pack_entry_hash(struct packed_git *p, off_t base_offset)
 	unsigned long hash;
 
 	hash = (unsigned long)p + (unsigned long)base_offset;
-	hash += (hash >> 8) + (hash >> 16);
-	return hash & 0xff;
+	hash += (hash >> 10) + (hash >> 20);
+	return hash & (MAX_DELTA_CACHE-1);
 }
 
 static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset,

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-17 21:45                                 ` Linus Torvalds
@ 2007-03-17 22:37                                   ` Junio C Hamano
  2007-03-17 23:09                                     ` Linus Torvalds
  2007-03-18  1:13                                     ` Nicolas Pitre
  2007-03-17 23:12                                   ` Junio C Hamano
  2007-03-18  1:14                                   ` Morten Welinder
  2 siblings, 2 replies; 79+ messages in thread
From: Junio C Hamano @ 2007-03-17 22:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Nicolas Pitre, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> ie, we still re-generate some of the objects multiple times, but now, 
> rather than generating them (on average) 20+ times each, we now generate 
> them an average of just 1.3 times each. Which explains why the wall-time 
> goes down by over a factor of two.

This is beautiful.  You only cache what we were about to discard
anyway, and when giving a cached one out, you invalidate the
cached entry, so there is no way the patch can introduce leaks
nor double-frees and it is absolutely safe (as long as we can
pin the packed_git structure, which I think is the case --- even
when we re-read the packs, I do not think we discard old ones).

I've thought about possible ways to improve on it, but came up
almost empty.

When unpacking a depth-3 deltified object A, the code finds the
target object A (which is a delta), ask for its base B and put B
in the cache after using it to reconstitute A.  While doing so,
the first-generation base B is also a delta so its base C (which
is a non-delta) is found and placed in the cache.  When A is
returned, the cache has B and C.  If you ask for B at this
point, we read the delta, pick up its base C from the cache,
apply, and return while putting C back in the cache.  If you ask
for A after that, we do not read from the cache, although it is
available.

Which feels a bit wasteful at first sight, and we *could* make
read_packed_sha1() also steal from the cache, but after thinking
about it a bit, I am not sure if it is worth it.  The contract
between read_packed_sha1() and read_sha1_file() and its callers
is that the returned data belongs to the caller and it is a
responsibility for the caller to free the buffer, and also the
caller is free to modify it, so stealing from the cache from
that codepath means an extra allocation and memcpy.  If the
object stolen from the cache is of sufficient depth, it might be
worth it, but to decide it we somehow need to compute and store
which delta depth the cached one is at.

In any case, your code makes deeply deltified packfiles a lot
more practical.  As long as the working set of delta chains fits
in the cache, after unpacking the longest delta, the objects on
the chain can be had with one lookup and one delta application.

Very good job.

> In general, this all seems very cool. The patches are simple enough that I 
> think this is very safe to merge indeed: the only question I have is that 
> somebody should verify that the "struct packed_git *p" is stable over the 
> whole lifetime of a process - so that we can use it as a hash key without 
> having to invalidate hashes if we unmap a pack (I *think* we just unmap 
> the virtual mapping, and "struct packed_git *" stays valid, but Junio 
> should ack that for me).

Ack ;-)

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-17 19:44                               ` [PATCH 2/2] Implement a simple delta_base cache Linus Torvalds
  2007-03-17 21:45                                 ` Linus Torvalds
@ 2007-03-17 22:44                                 ` Linus Torvalds
  1 sibling, 0 replies; 79+ messages in thread
From: Linus Torvalds @ 2007-03-17 22:44 UTC (permalink / raw)
  To: Junio C Hamano, Nicolas Pitre; +Cc: Git Mailing List



On Sat, 17 Mar 2007, Linus Torvalds wrote:
> 
> This trivial 256-entry delta_base cache improves performance for some 
> loads by a factor of 2.5 or so.

Btw, final comment on this issue:

I was initially a bit worried about optimizing for just the "git log" with 
pathspec or "git blame" kind of behaviour, and possibly pessimizing some 
other load.

But the way the caching works, this is likely to be faster (or at least 
not slower) even for something that doesn't ever need the cache (which in 
turn is likely to be because it's a smaller footprint query and only works 
on one version).

Because the way the cache works, it doesn't really do any extra work: it 
basically just delays the "free()" on the buffer we allocated. So for 
really small footprints it just avoids the overhead of free() (let the OS 
reap the pages for it at exit), and for bigger footprints (that end up 
replacing the cache entries) it will just do the same work a bit later.

Because it's a simple direct-mapped cache, the only cost is the (trivial) 
hash of a few instructions, and possibly the slightly bigger D$ footprint. 
I would strongly suspect that even on loads where it doesn't help by 
reusing the cached objects, the delayed free'ing on its own is as likely 
to help as it is to hurt.

So there really shouldn't be any downsides.

Testing on some other loads (for example, drivers/scsi/ has more activity 
than drivers/usb/), the 2x performance win seems to happen for other 
things too. For drivers/scsi, the log generating went down from 3.582s 
(best) to 1.448s.

"git blame Makefile" went from 1.802s to 1.243s (both best-case numbers 
again: a smaller win, but still a win), but there the issue seems to be 
that with a file like that, we actually spend most of our time comparing 
different versions.

For the "git blame Makefile" case *all* of zlib combined is just 18%, 
while the ostensibly trivial "cmp_suspect()" is 23% and another 11% is 
from "assign_blame()" - so for top-level entries the costs would seem to 
tend to be in the blame algorithm itself, rather than in the actual object 
handling.

(I'm sure that could be improved too, but the take-home message from this 
is that zlib wasn't really the problem, and our stupid re-generation of 
the same delta base was.)

			Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-17 22:37                                   ` Junio C Hamano
@ 2007-03-17 23:09                                     ` Linus Torvalds
  2007-03-17 23:54                                       ` Linus Torvalds
  2007-03-18  1:13                                     ` Nicolas Pitre
  1 sibling, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2007-03-17 23:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Git Mailing List



On Sat, 17 Mar 2007, Junio C Hamano wrote:
> 
> When unpacking a depth-3 deltified object A, the code finds the
> target object A (which is a delta), ask for its base B and put B
> in the cache after using it to reconstitute A.  While doing so,
> the first-generation base B is also a delta so its base C (which
> is a non-delta) is found and placed in the cache.  When A is
> returned, the cache has B and C.  If you ask for B at this
> point, we read the delta, pick up its base C from the cache,
> apply, and return while putting C back in the cache.  If you ask
> for A after that, we do not read from the cache, although it is
> available.

Yes.

I debated that a bit with myself, but decided that:

 (a) it probably doesn't really matter a lot (but I don't have the 
     numbers)

 (b) trying to *also* fill non-delta-base queries from the delta-base 
     cache actually complicates things a lot. Surprisingly much so (the 
     current logic of removing the entry from the cache only to re-insert 
     it after being used made the memory management totally trivial, as 
     you noticed)

 (c) and regardless, we could decide to do a more extensive caching layer 
     later if we really wanted to, and at that point it probably makes 
     more sense to integrate it with the delta-base cache.

     Most git objects are use-once, which is why we really *just* save the 
     flag bits and the SHA1 hash name itself in "struct object", but doing 
     a generic caching layer for object content would likely obviate the 
     need for the current logic to do "save_commit_buffer".

That (c) in particular was what made me think that it's better to keep it 
simple and obvious for now, since even the simple thing largely fixes the 
performance issue.  Almost three seconds I felt bad about, while just over 
a second for something as complex as "git log drivers/usb/" I just cannot 
make myself worry about.

> In any way, your code makes a deeply delitified packfiles a lot
> more practical.  As long as the working set of delta chains fits
> in the cache, after unpacking the longuest delta, the objects on
> the chain can be had by one lookup and one delta application.

Yeah. I think it would be good to probably (separately and as "further 
tweaks"):

 - have somebody actually look at hit-rates for different repositories and 
   hash sizes.

 - possibly allow people to set the hash size as a config option, if it 
   turns out that certain repository layouts or usage scenarios end up 
   preferring bigger caches.

   For example, it may be that for historical archives you might want to 
   have deeper delta queues to make the repository smaller, and if they 
   are big anyway maybe they would prefer to have a larger-than-normal 
   cache as a result. On the other hand, if you are memory-constrained, 
   maybe you'd prefer to re-generate the objects and waste a bit of CPU 
   rather than cache the results.

But neither of the above is really an argument against the patch, just a 
"there's certainly room for more work here if anybody cares".

> Very good job.

I'm pretty happy with the results myself. Partly because the patches just 
ended up looking so *nice*.

		Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-17 21:45                                 ` Linus Torvalds
  2007-03-17 22:37                                   ` Junio C Hamano
@ 2007-03-17 23:12                                   ` Junio C Hamano
  2007-03-17 23:24                                     ` Linus Torvalds
  2007-03-18  1:14                                   ` Morten Welinder
  2 siblings, 1 reply; 79+ messages in thread
From: Junio C Hamano @ 2007-03-17 23:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Nicolas Pitre, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Here's the trivial patch to extend the caching to 1k entries if somebody 
> cares. I don't know if the small added performance is worth it.

This would largely depend on the project, but if each cached
blob is 20kB, a 1024-entry cache could grow to 20MB.  We
may need to introduce early eviction of cached objects with
total cache size limit, configurable per repository.
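A minimal sketch of what such a byte budget could look like on top of the patch's add_delta_base_cache() (hypothetical: the limit value, variable names, and trimmed-down entry struct below are made up for illustration; a full version would also evict other slots until back under budget):

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_DELTA_CACHE (1024)

/* Trimmed-down entry: the real one also carries p, base_offset, type. */
static struct delta_base_cache_entry {
	void *data;
	unsigned long size;
} delta_base_cache[MAX_DELTA_CACHE];

/* Hypothetical knobs: total bytes currently cached, per-repo limit. */
static unsigned long delta_base_cached;
static unsigned long delta_base_cache_limit = 16 * 1024 * 1024;

static void add_delta_base_cache_capped(unsigned long hash,
	void *base, unsigned long base_size)
{
	struct delta_base_cache_entry *ent = delta_base_cache + hash;

	/* An object bigger than the whole budget is never worth keeping. */
	if (base_size > delta_base_cache_limit) {
		free(base);
		return;
	}
	if (ent->data) {
		delta_base_cached -= ent->size;
		free(ent->data);
	}
	ent->data = base;
	ent->size = base_size;
	delta_base_cached += base_size;
}
```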

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-17 23:12                                   ` Junio C Hamano
@ 2007-03-17 23:24                                     ` Linus Torvalds
  2007-03-17 23:52                                       ` Jon Smirl
  0 siblings, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2007-03-17 23:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Git Mailing List



On Sat, 17 Mar 2007, Junio C Hamano wrote:
> 
> This largely would depend on the project, but if a blob that is
> cached is 20kB each, a 1024-entry cache would grow to 20MB.  We
> may need to introduce early eviction of cached objects with
> total cache size limit, configurable per repository.

One thing that I considered was to limit the delta-base cache to just tree 
entries. Those tend to be the really performance-sensitive ones - by the 
time you actually unpack blob entries, you're going to do something with 
that *single* entry anyway (like compare it to another blob), and the cost 
of unpacking the entry is likely to not be really all that noticeable.

That said, it was just simpler to do it unconditionally, and it obviously 
*works* fine regardless of the object type, so limiting it to trees is a 
bit sad. And since the intensive tree operations tend to be in a separate 
phase (ie the commit simplification phase) from the blob operations 
(say, doing "git log -p <pathspec>"), I suspect that the cache locality 
would still remain good.

So I didn't do anything along the lines of "only cache for case Xyzzy".

But yes, especially if a project has big blobs, it might make sense to 
limit by full size of the cached entries some way.

			Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-17 23:24                                     ` Linus Torvalds
@ 2007-03-17 23:52                                       ` Jon Smirl
  0 siblings, 0 replies; 79+ messages in thread
From: Jon Smirl @ 2007-03-17 23:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Nicolas Pitre, Git Mailing List

If you still have a Mozilla pack file around it would be a good test
case. It has delta chains thousands of entries long. If I remember
correctly one had over 4,000 deltas.

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-17 23:09                                     ` Linus Torvalds
@ 2007-03-17 23:54                                       ` Linus Torvalds
  0 siblings, 0 replies; 79+ messages in thread
From: Linus Torvalds @ 2007-03-17 23:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Git Mailing List



On Sat, 17 Mar 2007, Linus Torvalds wrote:
> 
>  (a) it probably doesn't really matter a lot (but I don't have the 
>      numbers)

Well, to some degree I obviously *do* have the numbers.

I have the numbers that we used to re-generate the object data over five 
*hundred* times per object for some cases, and that I got the average 
such delta-base usage down from 20x to 1.1-1.3x depending on cache size.

In contrast, the "use delta-base also for non-delta queries" fairly 
obviously cannot touch those kinds of numbers. We might avoid a *few* 
object generation cases, but we're not looking at factors of 20 for any 
kind of sane cases.

So I do think that a higher-level caching approach can work too, but it's 
going to be more effective in other areas:

 - get rid of some ugly hacks (like the "save_commit_buffer" thing I 
   mentioned)
 - possibly help some insane loads (eg cases where we really *do* end up 
   seeing the same object over and over again, perhaps simply because some 
   idiotic automated commit system ends up switching between a few states 
   back-and-forth).

I really think the "insane loads" thing is unlikely, but I could construct 
some crazy usage scenario where a cache of objects in general (and not 
just delta bases) would work. I don't think it's a very realistic case, 
but who knows - people sometimes do really stupid things.

		Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-17 22:37                                   ` Junio C Hamano
  2007-03-17 23:09                                     ` Linus Torvalds
@ 2007-03-18  1:13                                     ` Nicolas Pitre
  2007-03-18  7:47                                       ` Junio C Hamano
  1 sibling, 1 reply; 79+ messages in thread
From: Nicolas Pitre @ 2007-03-18  1:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

On Sat, 17 Mar 2007, Junio C Hamano wrote:

> When unpacking a depth-3 deltified object A, the code finds the
> target object A (which is a delta), ask for its base B and put B
> in the cache after using it to reconstitute A.  While doing so,
> the first-generation base B is also a delta so its base C (which
> is a non-delta) is found and placed in the cache.  When A is
> returned, the cache has B and C.  If you ask for B at this
> point, we read the delta, pick up its base C from the cache,
> apply, and return while putting C back in the cache.  If you ask
> for A after that, we do not read from the cache, although it is
> available.
> 
> Which feels a bit wasteful at first sight, and we *could* make
> read_packed_sha1() also steal from the cache, but after thinking
> about it a bit, I am not sure if it is worth it.  The contract
> between read_packed_sha1() and read_sha1_file() and its callers
> is that the returned data belongs to the caller and it is a
> responsibility for the caller to free the buffer, and also the
> caller is free to modify it, so stealing from the cache from
> that codepath means an extra allocation and memcpy.

So?

A malloc() + memcpy() will always be faster than mmap() + malloc() + 
inflate().  If the data is already there it is certainly better to copy 
it straight away.

With the patch below I can do 'git log drivers/scsi/ > /dev/null' about 
7% faster.  I bet it might be even more on those platforms with bad 
mmap() support.

Signed-off-by: Nicolas Pitre <nico@cam.org>
---
diff --git a/sha1_file.c b/sha1_file.c
index a7e3a2a..ee64865 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1372,7 +1372,7 @@ static unsigned long pack_entry_hash(struct packed_git *p, off_t base_offset)
 }
 
 static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset,
-	unsigned long *base_size, enum object_type *type)
+	unsigned long *base_size, enum object_type *type, int keep_cache)
 {
 	void *ret;
 	unsigned long hash = pack_entry_hash(p, base_offset);
@@ -1384,7 +1384,13 @@ static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset,
 	return unpack_entry(p, base_offset, type, base_size);
 
 found_cache_entry:
-	ent->data = NULL;
+	if (!keep_cache)
+		ent->data = NULL;
+	else {
+		ret = xmalloc(ent->size + 1);
+		memcpy(ret, ent->data, ent->size);
+		((char *)ret)[ent->size] = 0;
+	}
 	*type = ent->type;
 	*base_size = ent->size;
 	return ret;
@@ -1418,7 +1424,7 @@ static void *unpack_delta_entry(struct packed_git *p,
 	off_t base_offset;
 
 	base_offset = get_delta_base(p, w_curs, &curpos, *type, obj_offset);
-	base = cache_or_unpack_entry(p, base_offset, &base_size, type);
+	base = cache_or_unpack_entry(p, base_offset, &base_size, type, 0);
 	if (!base)
 		die("failed to read delta base object"
 		    " at %"PRIuMAX" from %s",
@@ -1615,7 +1621,7 @@ static void *read_packed_sha1(const unsigned char *sha1,
 	if (!find_pack_entry(sha1, &e, NULL))
 		return NULL;
 	else
-		return unpack_entry(e.p, e.offset, type, size);
+		return cache_or_unpack_entry(e.p, e.offset, size, type, 1);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-17 21:45                                 ` Linus Torvalds
  2007-03-17 22:37                                   ` Junio C Hamano
  2007-03-17 23:12                                   ` Junio C Hamano
@ 2007-03-18  1:14                                   ` Morten Welinder
  2007-03-18  1:29                                     ` Linus Torvalds
  2007-03-18  6:28                                     ` Avi Kivity
  2 siblings, 2 replies; 79+ messages in thread
From: Morten Welinder @ 2007-03-18  1:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Nicolas Pitre, Git Mailing List

>         samples  %        app name                 symbol name
>         41527    15.6550  git                      strlen

Almost 16% in strlen?  Ugh!

That's a lot of strings, or perhaps very long strings.  Or a profiling bug.

M.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-18  1:14                                   ` Morten Welinder
@ 2007-03-18  1:29                                     ` Linus Torvalds
  2007-03-18  1:38                                       ` Nicolas Pitre
  2007-03-18  1:44                                       ` [PATCH 2/2] Implement a simple delta_base cache Linus Torvalds
  2007-03-18  6:28                                     ` Avi Kivity
  1 sibling, 2 replies; 79+ messages in thread
From: Linus Torvalds @ 2007-03-18  1:29 UTC (permalink / raw)
  To: Morten Welinder; +Cc: Junio C Hamano, Nicolas Pitre, Git Mailing List



On Sat, 17 Mar 2007, Morten Welinder wrote:
>
> >         samples  %        app name                 symbol name
> >         41527    15.6550  git                      strlen
> 
> Almost 16% in strlen?  Ugh!
> 
> That's a lot of strings, or perhaps very long strings.  Or a profiling bug.

It's likely real, and the problem is likely lots of small strings.

Each git tree entry is:

	"<octal mode> name\0" <20-byte sha1>

so you do have a *lot* of strlen() calls when doing any tree parsing. And 
for some inexplicable reason, glibc thinks strings are long on average, so 
it has a fancy algorithm to do 8 bytes at a time and tries to do things 
aligned etc.

The size of strlen() on x86-64 with glibc is 232 bytes. I'm not kidding.

			Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-18  1:29                                     ` Linus Torvalds
@ 2007-03-18  1:38                                       ` Nicolas Pitre
  2007-03-18  1:55                                         ` Linus Torvalds
  2007-03-18  1:44                                       ` [PATCH 2/2] Implement a simple delta_base cache Linus Torvalds
  1 sibling, 1 reply; 79+ messages in thread
From: Nicolas Pitre @ 2007-03-18  1:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Morten Welinder, Junio C Hamano, Git Mailing List

On Sat, 17 Mar 2007, Linus Torvalds wrote:

> 
> 
> On Sat, 17 Mar 2007, Morten Welinder wrote:
> >
> > >         samples  %        app name                 symbol name
> > >         41527    15.6550  git                      strlen
> > 
> > Almost 16% in strlen?  Ugh!
> > 
> > That's a lot of strings, or perhaps very long strings.  Or a profiling bug.
> 
> It's likely real, and the problem is likely lots of small strings.
> 
> Each git tree entry is:
> 
> 	"<octal mode> name\0" <20-byte sha1>
> 
> so you do have a *lot* of strlen() calls when doing any tree parsing.

This is definitely an area where pack v4 will bring that cost down to 
zero.


Nicolas

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-18  1:29                                     ` Linus Torvalds
  2007-03-18  1:38                                       ` Nicolas Pitre
@ 2007-03-18  1:44                                       ` Linus Torvalds
  1 sibling, 0 replies; 79+ messages in thread
From: Linus Torvalds @ 2007-03-18  1:44 UTC (permalink / raw)
  To: Morten Welinder; +Cc: Junio C Hamano, Nicolas Pitre, Git Mailing List



On Sat, 17 Mar 2007, Linus Torvalds wrote:
> > 
> > That's a lot of strings, or perhaps very long strings.  Or a profiling bug.

Btw, the reason I'm pretty sure that it's not a profiling bug is that
 (a) the rest of the profile looks fine
 (b) it actually matches the rest of the profile.

In particular, while you reacted to

	samples  %        app name                 symbol name
	41527    15.6550  git                      strlen

you didn't bat an eye on

	9240      3.4833  git                      get_mode
	8863      3.3412  git                      tree_entry_extract

ie over 3% of time spent in tree entry extract and get_mode. But take 
another look at that tree_entry_extract() function in particular and look 
what it does, and ask yourself: if *that* function takes up 3% of time, 
what does it tell you about strlen()?

(Side note: we could probably improve "strlen()" in particular. We 
sometimes call it twice: look at "entry_extract()", which calls strlen() 
on the tree entry extract, but then *also* calls strlen on the resulting 
path.

I suspect the

	a->pathlen = strlen(a->path);

could be written as

	a->pathlen = (char *)a->sha1 - (char *)a->path - 1;

but somebody should check that I didn't off-by-one or something. Also, it 
might be better to make that part of "tree_entry_extract()" itself, because 
other callers do the same thing (see "find_tree_entry()": doing a 
"strlen()" on the path return of tree_entry_extract() seems to be a common 
pattern).

HOWEVER!

Once we get to *that* level of optimizations, we're doing pretty damn 
well. I'm sure we could probably cut down that strlen() from 16% to 8% by 
being smart about it, but still - this is a "good kind of problem" to 
have, if these things are your lowest-hanging fruit!

Maybe it all boils down to the same thing: I just can't seem to be really 
upset about "git log drivers/usb/ > /dev/null" taking all of a second. It 
just doesn't strike me as a performance problem ;)

			Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-18  1:38                                       ` Nicolas Pitre
@ 2007-03-18  1:55                                         ` Linus Torvalds
  2007-03-18  2:03                                           ` Nicolas Pitre
  0 siblings, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2007-03-18  1:55 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Morten Welinder, Junio C Hamano, Git Mailing List



On Sat, 17 Mar 2007, Nicolas Pitre wrote:
> 
> This is definitely an area where pack v4 will bring that cost down to 
> zero.

Heh. I'll believe that when I see it. The thing is, unless you re-generate 
the tree object data structures, you'll have to have totally different 
tree walkers for different tree types, and it will all be quite ugly and 
complex. And "ugly and complex" seldom translates into "zero cost".

		Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-18  1:55                                         ` Linus Torvalds
@ 2007-03-18  2:03                                           ` Nicolas Pitre
  2007-03-18  2:20                                             ` Linus Torvalds
  0 siblings, 1 reply; 79+ messages in thread
From: Nicolas Pitre @ 2007-03-18  2:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Morten Welinder, Junio C Hamano, Git Mailing List

On Sat, 17 Mar 2007, Linus Torvalds wrote:

> 
> 
> On Sat, 17 Mar 2007, Nicolas Pitre wrote:
> > 
> > This is definitely an area where pack v4 will bring that cost down to 
> > zero.
> 
> Heh. I believe that when I see it. The thing is, unless you re-generate 
> the tree object data structures, you'll have to have totally different 
> tree walkers for different tree types, and it will all be quite ugly and 
> complex. And "ugly and complex" seldom translates into "zero cost".

Well... in my opinion it is the _current_ tree walker that is quite ugly 
and complex.  It is always messier to parse strings than fixed width 
binary fields.


Nicolas

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-18  2:03                                           ` Nicolas Pitre
@ 2007-03-18  2:20                                             ` Linus Torvalds
  2007-03-18  3:00                                               ` Nicolas Pitre
  2007-03-18  3:06                                               ` [PATCH 3/2] Avoid unnecessary strlen() calls Linus Torvalds
  0 siblings, 2 replies; 79+ messages in thread
From: Linus Torvalds @ 2007-03-18  2:20 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Morten Welinder, Junio C Hamano, Git Mailing List



On Sat, 17 Mar 2007, Nicolas Pitre wrote:
> 
> Well... in my opinion it is the _current_ tree walker that is quite ugly 
> and complex.  It is always messier to parse strings than fixed width 
> binary fields.

Sure. On the other hand, text is what made things easy to do initially, 
and you're missing one *BIG* clue: you cannot remove the support without 
losing compatibility with all traditional object formats.

So you have no choice. You need to support the text representation. As a 
result, *your* code will now be way more ugly and messy.

The thing is, parsing some little text may sound expensive, but if the 
expense is in finding the end of the string, we're doing really well.

In other words: the data structures are both simple and straightforward, 
and the only reason strlen() shows up at all is:

 - we pass strings around as just C strings, even when we know their 
   lengths. Prime example: look at tree-diff.c. And when you look at it, 
   realize that *for*every*single*strlen* in that file except for the very 
   last one (which is only used once per process for setup) we actually 
   know the string length from before, but we (well, *I*) decided that it 
   wasn't worth passing down as a parameter all the time.

 - the simple parsing of the tree itself (which really isn't that 
   expensive - the real expense is bringing the data into the CPU cache, 
   but that's something we'd need to do *anyway*).

So I seriously suspect that you could get the strlen() overhead down from 
that 16% pretty easily, but you'd have to pass the length of the "base" 
string along all the time (and in the tree_entry cases you'd replace the 
"strlen()" calls with a call to something like

	static inline int tree_entry_len(const char *name, const unsigned char *sha1)
	{
		return (char *)sha1 - (char *)name - 1;
	}

which will do it for you).

But what you're ignoring here is that "16%" may sound like a huge deal, 
but it's 16% of something that takes 1 second, and that other SCM's cannot 
do AT ALL.

		Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-18  2:20                                             ` Linus Torvalds
@ 2007-03-18  3:00                                               ` Nicolas Pitre
  2007-03-18  3:31                                                 ` Linus Torvalds
  2007-03-18  3:06                                               ` [PATCH 3/2] Avoid unnecessary strlen() calls Linus Torvalds
  1 sibling, 1 reply; 79+ messages in thread
From: Nicolas Pitre @ 2007-03-18  3:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Morten Welinder, Junio C Hamano, Git Mailing List

On Sat, 17 Mar 2007, Linus Torvalds wrote:

> 
> 
> On Sat, 17 Mar 2007, Nicolas Pitre wrote:
> > 
> > Well... in my opinion it is the _current_ tree walker that is quite ugly 
> > and complex.  It is always messier to parse strings than fixed width 
> > binary fields.
> 
> Sure. On the other hand, text is what made things easy to do initially,

Oh indeed.  No argument there.

> and you're missing one *BIG* clue: you cannot remove the support without 
> losing compatibility with all traditional object formats.
> 
> So you have no choice. You need to support the text representation. As a 
> result, *your* code will now be way more ugly and messy.

Depends. We currently have separate parsers for trees, commits, tags, 
etc.  That should be easy enough to add another (separate) parser for 
new tree objects while still having a common higher level accessor 
interface like tree_entry().

But right now we only regenerate the text representation whenever the 
binary representation is encountered just to make things easy to do, and 
yet we still have a performance gain already in _addition_ to a net 
saving in disk footprint.

> The thing is, parsing some little text may sound expensive, but if the 
> expense is in finding the end of the string, we're doing really well.

Of course the current tree parser will remain, probably forever.  And it 
is always a good thing to optimize it further when ever possible.

> But what you're ignoring here is that "16%" may sound like a huge deal, 
> but it's 16% of something that takes 1 second, and that other SCM's cannot 
> do AT ALL.

Sure.  But at this point the reference to compare GIT performance 
against might be GIT itself.  And while 1 second is really nice in this 
case, there are some repos where it could be (and has already been 
reported to be) much more.

I still have a feeling that we can do even better than we do now.  Much 
much better than 16% actually.  But that requires a new data format that 
is designed for speed.

We'll see.


Nicolas

^ permalink raw reply	[flat|nested] 79+ messages in thread

* [PATCH 3/2] Avoid unnecessary strlen() calls
  2007-03-18  2:20                                             ` Linus Torvalds
  2007-03-18  3:00                                               ` Nicolas Pitre
@ 2007-03-18  3:06                                               ` Linus Torvalds
  2007-03-18  9:45                                                 ` Junio C Hamano
  1 sibling, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2007-03-18  3:06 UTC (permalink / raw)
  To: Nicolas Pitre, Junio C Hamano; +Cc: Morten Welinder, Git Mailing List


This is a micro-optimization that grew out of the mailing list discussion 
about "strlen()" showing up in profiles. 

We used to pass regular C strings around to the low-level tree walking 
routines, and while this worked fine, it meant that we needed to call 
strlen() on strings that the caller always actually knew the size of 
anyway.

So pass the length of the string down with the string, and avoid 
unnecessary calls to strlen(). Also, when extracting a pathname from a 
tree entry, use "tree_entry_len()" instead of strlen(), since the length 
of the pathname is directly calculable from the decoded tree entry itself 
without having to actually do another strlen().

This shaves off another ~5-10% from some loads that are very tree 
intensive (notably doing commit filtering by a pathspec).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

On Sat, 17 Mar 2007, Linus Torvalds wrote:
>
>  - we pass strings around as just C strings, even when we know their 
>    lengths. Prime example: look at tree-diff.c. And when you look at it, 
>    realize that *for*every*single*strlen* in that file except for the very 
>    last one (which is only used once per process for setup) we actually 
>    know the string length from before, but we (well, *I*) decided that it 
>    wasn't worth passing down as a parameter all the time.

So here's the patch.

It definitely cuts down on CPU usage, and I actually left one extra 
"strlen()" around, simply because I didn't want to mess with the exported 
interface of "diff_tree()".

But that other strlen() is also one that is done *once* for the whole 
tree, so from a performance standpoint it doesn't matter (we *could* have 
passed in that length too, but that would have involved more changes that 
simply aren't really useful).

Does it help? Yes it does. It takes another 5-10% off my test-case. 
"strlen()" still exists, but it's basically half of what it used to be 
because we now basically only call it when literally parsing the tree data 
itself (ie now it's ~8% of the total, and no longer the hottest entry).

Is it worth it? If it was just a random micro-optimization I might not 
care, but I guess it's not that ugly to pass an extra "baselen" around all 
the time. And that "tree_entry_len()" helper function is actually quite 
nice. So yeah, I'd suggest applying this one just because it's actually a 
perfectly fine patch and it does speed things up.

So it *is* very much a micro-optimization, but one that doesn't really 
make the code any uglier, so why not..

I still think that if we do these kinds of optimizations and they matter, 
that shows just how *well* we're actually doing here!

Anyway, Junio, it passes all the tests, as well as passing my "looks 
obviously correct" filter, so..

		Linus

---
diff --git a/tree-diff.c b/tree-diff.c
index c827582..f89b9d3 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -5,9 +5,8 @@
 #include "diff.h"
 #include "tree.h"
 
-static char *malloc_base(const char *base, const char *path, int pathlen)
+static char *malloc_base(const char *base, int baselen, const char *path, int pathlen)
 {
-	int baselen = strlen(base);
 	char *newbase = xmalloc(baselen + pathlen + 2);
 	memcpy(newbase, base, baselen);
 	memcpy(newbase + baselen, path, pathlen);
@@ -16,9 +15,9 @@ static char *malloc_base(const char *base, const char *path, int pathlen)
 }
 
 static void show_entry(struct diff_options *opt, const char *prefix, struct tree_desc *desc,
-		       const char *base);
+		       const char *base, int baselen);
 
-static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const char *base, struct diff_options *opt)
+static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const char *base, int baselen, struct diff_options *opt)
 {
 	unsigned mode1, mode2;
 	const char *path1, *path2;
@@ -28,15 +27,15 @@ static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const
 	sha1 = tree_entry_extract(t1, &path1, &mode1);
 	sha2 = tree_entry_extract(t2, &path2, &mode2);
 
-	pathlen1 = strlen(path1);
-	pathlen2 = strlen(path2);
+	pathlen1 = tree_entry_len(path1, sha1);
+	pathlen2 = tree_entry_len(path2, sha2);
 	cmp = base_name_compare(path1, pathlen1, mode1, path2, pathlen2, mode2);
 	if (cmp < 0) {
-		show_entry(opt, "-", t1, base);
+		show_entry(opt, "-", t1, base, baselen);
 		return -1;
 	}
 	if (cmp > 0) {
-		show_entry(opt, "+", t2, base);
+		show_entry(opt, "+", t2, base, baselen);
 		return 1;
 	}
 	if (!opt->find_copies_harder && !hashcmp(sha1, sha2) && mode1 == mode2)
@@ -47,14 +46,14 @@ static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const
 	 * file, we need to consider it a remove and an add.
 	 */
 	if (S_ISDIR(mode1) != S_ISDIR(mode2)) {
-		show_entry(opt, "-", t1, base);
-		show_entry(opt, "+", t2, base);
+		show_entry(opt, "-", t1, base, baselen);
+		show_entry(opt, "+", t2, base, baselen);
 		return 0;
 	}
 
 	if (opt->recursive && S_ISDIR(mode1)) {
 		int retval;
-		char *newbase = malloc_base(base, path1, pathlen1);
+		char *newbase = malloc_base(base, baselen, path1, pathlen1);
 		if (opt->tree_in_recursive)
 			opt->change(opt, mode1, mode2,
 				    sha1, sha2, base, path1);
@@ -67,20 +66,20 @@ static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const
 	return 0;
 }
 
-static int interesting(struct tree_desc *desc, const char *base, struct diff_options *opt)
+static int interesting(struct tree_desc *desc, const char *base, int baselen, struct diff_options *opt)
 {
 	const char *path;
+	const unsigned char *sha1;
 	unsigned mode;
 	int i;
-	int baselen, pathlen;
+	int pathlen;
 
 	if (!opt->nr_paths)
 		return 1;
 
-	(void)tree_entry_extract(desc, &path, &mode);
+	sha1 = tree_entry_extract(desc, &path, &mode);
 
-	pathlen = strlen(path);
-	baselen = strlen(base);
+	pathlen = tree_entry_len(path, sha1);
 
 	for (i=0; i < opt->nr_paths; i++) {
 		const char *match = opt->paths[i];
@@ -121,18 +120,18 @@ static int interesting(struct tree_desc *desc, const char *base, struct diff_opt
 }
 
 /* A whole sub-tree went away or appeared */
-static void show_tree(struct diff_options *opt, const char *prefix, struct tree_desc *desc, const char *base)
+static void show_tree(struct diff_options *opt, const char *prefix, struct tree_desc *desc, const char *base, int baselen)
 {
 	while (desc->size) {
-		if (interesting(desc, base, opt))
-			show_entry(opt, prefix, desc, base);
+		if (interesting(desc, base, baselen, opt))
+			show_entry(opt, prefix, desc, base, baselen);
 		update_tree_entry(desc);
 	}
 }
 
 /* A file entry went away or appeared */
 static void show_entry(struct diff_options *opt, const char *prefix, struct tree_desc *desc,
-		       const char *base)
+		       const char *base, int baselen)
 {
 	unsigned mode;
 	const char *path;
@@ -140,7 +139,8 @@ static void show_entry(struct diff_options *opt, const char *prefix, struct tree
 
 	if (opt->recursive && S_ISDIR(mode)) {
 		enum object_type type;
-		char *newbase = malloc_base(base, path, strlen(path));
+		int pathlen = tree_entry_len(path, sha1);
+		char *newbase = malloc_base(base, baselen, path, pathlen);
 		struct tree_desc inner;
 		void *tree;
 
@@ -149,7 +149,7 @@ static void show_entry(struct diff_options *opt, const char *prefix, struct tree
 			die("corrupt tree sha %s", sha1_to_hex(sha1));
 
 		inner.buf = tree;
-		show_tree(opt, prefix, &inner, newbase);
+		show_tree(opt, prefix, &inner, newbase, baselen + 1 + pathlen);
 
 		free(tree);
 		free(newbase);
@@ -160,26 +160,28 @@ static void show_entry(struct diff_options *opt, const char *prefix, struct tree
 
 int diff_tree(struct tree_desc *t1, struct tree_desc *t2, const char *base, struct diff_options *opt)
 {
+	int baselen = strlen(base);
+
 	while (t1->size | t2->size) {
-		if (opt->nr_paths && t1->size && !interesting(t1, base, opt)) {
+		if (opt->nr_paths && t1->size && !interesting(t1, base, baselen, opt)) {
 			update_tree_entry(t1);
 			continue;
 		}
-		if (opt->nr_paths && t2->size && !interesting(t2, base, opt)) {
+		if (opt->nr_paths && t2->size && !interesting(t2, base, baselen, opt)) {
 			update_tree_entry(t2);
 			continue;
 		}
 		if (!t1->size) {
-			show_entry(opt, "+", t2, base);
+			show_entry(opt, "+", t2, base, baselen);
 			update_tree_entry(t2);
 			continue;
 		}
 		if (!t2->size) {
-			show_entry(opt, "-", t1, base);
+			show_entry(opt, "-", t1, base, baselen);
 			update_tree_entry(t1);
 			continue;
 		}
-		switch (compare_tree_entry(t1, t2, base, opt)) {
+		switch (compare_tree_entry(t1, t2, base, baselen, opt)) {
 		case -1:
 			update_tree_entry(t1);
 			continue;
diff --git a/tree-walk.c b/tree-walk.c
index 70f8999..a4a4e2a 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -32,7 +32,7 @@ static void entry_clear(struct name_entry *a)
 static void entry_extract(struct tree_desc *t, struct name_entry *a)
 {
 	a->sha1 = tree_entry_extract(t, &a->path, &a->mode);
-	a->pathlen = strlen(a->path);
+	a->pathlen = tree_entry_len(a->path, a->sha1);
 }
 
 void update_tree_entry(struct tree_desc *desc)
@@ -169,7 +169,7 @@ static int find_tree_entry(struct tree_desc *t, const char *name, unsigned char
 
 		sha1 = tree_entry_extract(t, &entry, mode);
 		update_tree_entry(t);
-		entrylen = strlen(entry);
+		entrylen = tree_entry_len(entry, sha1);
 		if (entrylen > namelen)
 			continue;
 		cmp = memcmp(name, entry, entrylen);
diff --git a/tree-walk.h b/tree-walk.h
index e57befa..a0d7afd 100644
--- a/tree-walk.h
+++ b/tree-walk.h
@@ -13,6 +13,11 @@ struct name_entry {
 	int pathlen;
 };
 
+static inline int tree_entry_len(const char *name, const unsigned char *sha1)
+{
+	return (char *)sha1 - (char *)name - 1;
+}
+
 void update_tree_entry(struct tree_desc *);
 const unsigned char *tree_entry_extract(struct tree_desc *, const char **, unsigned int *);
 

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-18  3:00                                               ` Nicolas Pitre
@ 2007-03-18  3:31                                                 ` Linus Torvalds
  2007-03-18  5:30                                                   ` Julian Phillips
  2007-03-18 10:53                                                   ` Robin Rosenberg
  0 siblings, 2 replies; 79+ messages in thread
From: Linus Torvalds @ 2007-03-18  3:31 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Morten Welinder, Junio C Hamano, Git Mailing List



On Sat, 17 Mar 2007, Nicolas Pitre wrote:
> 
> Sure.  But at this point the reference to compare GIT performance 
> against might be GIT itself.  And while 1 second is really nice in this 
> case, there are some repos where it could be (and has already been 
> reported to be) much more.

I'd still like to see the KDE repo, that thing went quiet after it was 
supposed to hit sneaker-net..

If it was 30 seconds before to do a "git log" for some individual file, 
after the recent optimizations it should hopefully be down to 10. And I 
agree that I might be more motivated to try to get it down further if I 
could just find a repository where it's that much. 

Right now I can do a "git log" on any file in the kernel archive in 
under a second (well, when I say "any file", I started with a script, but 
with 22 thousand files I didn't bother to run it for all that long, so I 
ended up testing a few random files in addition to the first few hundred 
files of "git ls-files", and they are all well under a second).

And that's without the "git diff --quiet" thing that is still in "next", 
and that cut down some of the overhead for other reasons (although I 
suspect the effect of that will be less when combined with my patches 
since the stuff it cut down I probably cut down even more).

I really suspect you'll have a hard time beating "normal" git with the 
patches I sent out. I'm sure it's quite possible - don't get me wrong - I 
just suspect it won't be spectacular, and it will be a lot of work.

		Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-18  3:31                                                 ` Linus Torvalds
@ 2007-03-18  5:30                                                   ` Julian Phillips
  2007-03-18 17:23                                                     ` Linus Torvalds
  2007-03-18 10:53                                                   ` Robin Rosenberg
  1 sibling, 1 reply; 79+ messages in thread
From: Julian Phillips @ 2007-03-18  5:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nicolas Pitre, Morten Welinder, Junio C Hamano, Git Mailing List

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1448 bytes --]

On Sat, 17 Mar 2007, Linus Torvalds wrote:

> On Sat, 17 Mar 2007, Nicolas Pitre wrote:
>>
>> Sure.  But at this point the reference to compare GIT performance
>> against might be GIT itself.  And while 1 second is really nice in this
>> case, there are some repos where it could be (and has already been
>> reported to be) much more.
>
> I'd still like to see the KDE repo, that thing went quiet after it was
> supposed to hit sneaker-net..
>
> If it was 30 seconds before to do a "git log" for some individual file,
> after the recent optimizations it should hopefully be down to 10. And I
> agree that I might be more motivated to try to get it down further if I
> could just find a repository where it's that much.

In my test repository (which emulates a real repository in approximate 
size: number of commits, branches and tags) "git log f12000" 
takes about 15m (using 1.5.0.4).  After applying patches 1/2 and 2/2 on 
top of master I get ~3m50s.  With 3/2 as well it goes down a bit more to 
~3m20s.

I've attached the script that generated the repository in case you feel 
the urge to try some more time-shaving exercises ... ;)

(This is a rather unrealistic repository consisting of a long series of 
commits of new binary files, but I don't have access to the repository 
that is being approximated until I get back to work on Monday ...)

-- 
Julian

  ---
That must be wonderful: I don't understand it at all.
 		-- Moliere

[-- Attachment #2: mk_large_repo --]
[-- Type: TEXT/PLAIN, Size: 753 bytes --]

#!/bin/bash

# no. of commits, branches and tags to make
commits=25000;
branches=900;
tags=8000;

# create a new file of this size (kb) for each commit
commit_size=102;
bs=1024;

large=$1;

((bg = $commits / $branches));
((tg = $commits / $tags));

echo "creating $large";
mkdir $large;
cd $large;

git init-db;

i=0
while [ $i -lt $commits ]; do
  dd if=/dev/urandom of=f$i bs=${bs} count=${commit_size} > /dev/null 2>&1

  git add f$i;
  git commit -m "add t$i";

  ((ig = $i % $tg));
  if [ $ig -eq 0 ]; then
    git tag t$i;
    echo -n "t";
  fi

  ((ig = $i % $bg));
  if [ $ig -eq 0 ]; then
    git branch b$i;
    echo -n "b";
  fi

  echo -n "$i ";
  ((i = $i + 1))
done

echo;
echo "complete.";


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-18  1:14                                   ` Morten Welinder
  2007-03-18  1:29                                     ` Linus Torvalds
@ 2007-03-18  6:28                                     ` Avi Kivity
  1 sibling, 0 replies; 79+ messages in thread
From: Avi Kivity @ 2007-03-18  6:28 UTC (permalink / raw)
  To: Morten Welinder
  Cc: Linus Torvalds, Junio C Hamano, Nicolas Pitre, Git Mailing List

Morten Welinder wrote:
>>         samples  %        app name                 symbol name
>>         41527    15.6550  git                      strlen
>
> Almost 16% in strlen?  Ugh!
>
> That's a lot of strings, or perhaps very long strings.  Or a profiling 
> bug.
>

Or maybe strlen() is the first function to touch the page/cacheline.


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-18  1:13                                     ` Nicolas Pitre
@ 2007-03-18  7:47                                       ` Junio C Hamano
  0 siblings, 0 replies; 79+ messages in thread
From: Junio C Hamano @ 2007-03-18  7:47 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, Git Mailing List

Nicolas Pitre <nico@cam.org> writes:

> A malloc() + memcpy() will always be faster than mmap() + malloc() + 
> inflate().  If the data is already there it is certainly better to copy 
> it straight away.

I do not know if there is mmap() cost involved, but you are
correct to point out that my aversion to malloc() cost was
unfounded.  We need to allocate anyway, and memcpy() should of
course be cheaper than inflate().

> With the patch below I can do 'git log drivers/scsi/ > /dev/null' about 
> 7% faster.  I bet it might be even more on those platforms with bad 
> mmap() support.

Wonderful.  I was going to nitpick but you even took care of the
convention of returning a buffer with one extra byte that
terminates the contents with NUL.  Perfect.
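[Editorial sketch: a minimal version of the idea under discussion — cache the inflated base object and hand out a malloc'd copy (with the extra terminating NUL byte mentioned above) instead of re-inflating. Hypothetical names and a toy eviction policy, not git's actual code.]

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 4	/* toy size; a real cache would be larger */

struct cache_entry {
	unsigned long offset;	/* key: object offset in the pack */
	void *data;		/* already-inflated object data */
	size_t size;
};

static struct cache_entry cache[CACHE_SLOTS];

/* On a hit, return a malloc'd copy with a NUL one byte past size,
 * so callers can treat it like any freshly unpacked buffer. */
void *cache_get(unsigned long offset, size_t *size)
{
	int i;

	for (i = 0; i < CACHE_SLOTS; i++) {
		if (cache[i].data && cache[i].offset == offset) {
			char *copy = malloc(cache[i].size + 1);
			memcpy(copy, cache[i].data, cache[i].size);
			copy[cache[i].size] = '\0';
			*size = cache[i].size;
			return copy;
		}
	}
	return NULL;	/* miss: caller must mmap + inflate as before */
}

void cache_put(unsigned long offset, const void *data, size_t size)
{
	struct cache_entry *e = &cache[offset % CACHE_SLOTS];

	free(e->data);	/* trivial eviction: just overwrite the slot */
	e->data = malloc(size);
	memcpy(e->data, data, size);
	e->offset = offset;
	e->size = size;
}
```

On a hit this costs one malloc() and one memcpy(), and the inflate() is skipped entirely, which is where the reported ~7% comes from.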

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 3/2] Avoid unnecessary strlen() calls
  2007-03-18  3:06                                               ` [PATCH 3/2] Avoid unnecessary strlen() calls Linus Torvalds
@ 2007-03-18  9:45                                                 ` Junio C Hamano
  2007-03-18 15:54                                                   ` Linus Torvalds
  0 siblings, 1 reply; 79+ messages in thread
From: Junio C Hamano @ 2007-03-18  9:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Nicolas Pitre, Morten Welinder, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> This shaves off another ~5-10% from some loads that are very tree 
> intensive (notably doing commit filtering by a pathspec).
>
> Signed-off-by: Linus Torvalds  <torvalds@linux-foundation.org>

With your 256-entry cache, Nico's reusing objects out of delta
base cache, and this strlen() patch

	git blame -C block/ll_rw_blk.c

gets these numbers:

(v1.5.0)
14.71user 0.26system 0:15.07elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+93622minor)pagefaults 0swaps

(master + three patches)
8.94user 0.14system 0:09.10elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+40075minor)pagefaults 0swaps

Just for fun, these are the same for the kernel history with tglx-history 
repository's history grafted behind it, i.e. with this grafts file:

$ cat .git/info/grafts
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 e7e173af42dbf37b1d946f9ee00219cb3b2bea6a

(v1.5.0)
73.80user 2.57system 1:16.40elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+773077minor)pagefaults 0swaps

(master + three patches)
65.14user 0.40system 1:05.55elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+125052minor)pagefaults 0swaps

In either case, it is showing drastic reduction of minor faults.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-18  3:31                                                 ` Linus Torvalds
  2007-03-18  5:30                                                   ` Julian Phillips
@ 2007-03-18 10:53                                                   ` Robin Rosenberg
  2007-03-18 17:34                                                     ` Linus Torvalds
  1 sibling, 1 reply; 79+ messages in thread
From: Robin Rosenberg @ 2007-03-18 10:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nicolas Pitre, Morten Welinder, Junio C Hamano, Git Mailing List

söndag 18 mars 2007 04:31 skrev Linus Torvalds:
> I'd still like to see the KDE repo, that thing went quiet after it was 
> supposed to hit sneaker-net..
> 
> If it was 30 seconds before to do a "git log" for some individual file, 
> after the recent optimizations it should hopefully be down to 10. And I 
> agree that I might be more motivated to try to get it down further if I 
> could just find a repository where it's that much. 

I don't have the KDE repo, but I do have an Eclipse import. Without your
patches I get (hot cache)

# time git log -- org.eclipse.core.resources/src/org/eclipse/core/resources/ >/dev/null
65.10user 0.50system 1:12.44elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+80242minor)pagefaults 0swaps

With patch 1&2 (hot cache)

# time ~/SW/GIT/git-log -- org.eclipse.core.resources/src/org/eclipse/core/resources/ >/dev/null
27.51user 0.21system 0:28.23elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+80266minor)pagefaults 0swaps

That's quite an improvement. The Eclipse repo is about 140k commits in the master branch and 
has a 3GB pack file (fromcvs import). 

-- robin

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 3/2] Avoid unnecessary strlen() calls
  2007-03-18  9:45                                                 ` Junio C Hamano
@ 2007-03-18 15:54                                                   ` Linus Torvalds
  2007-03-18 15:57                                                     ` Linus Torvalds
                                                                       ` (2 more replies)
  0 siblings, 3 replies; 79+ messages in thread
From: Linus Torvalds @ 2007-03-18 15:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Morten Welinder, Git Mailing List



On Sun, 18 Mar 2007, Junio C Hamano wrote:
> 
> 	git blame -C block/ll_rw_blk.c
> 
> Just for fun, these are the same for the kernel history with tglx-history 
> repository's history grafted behind it, i.e. with this grafts file:
> 
> $ cat .git/info/grafts
> 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 e7e173af42dbf37b1d946f9ee00219cb3b2bea6a
> 
> (v1.5.0)
> 73.80user 2.57system 1:16.40elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+773077minor)pagefaults 0swaps
> 
> (master + three patches)
> 65.14user 0.40system 1:05.55elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+125052minor)pagefaults 0swaps
> 
> In either case, it is showing drastic reduction of minor faults.

That's an interesting test-case (and I get 53 seconds, nyaah, nyaah ;)

However, it's almost totally *not* about object access any more with my 
patches. All the top profiling hits are about generating the patches and 
assigning blame:

	samples  %        image name               app name                 symbol name
	470352   15.5813  git                      git                      xdl_hash_record
	298683    9.8944  git                      git                      cmp_suspect
	225156    7.4587  git                      git                      assign_blame
	221308    7.3312  libc-2.5.so              libc-2.5.so              memcpy
	177621    5.8840  libc-2.5.so              libc-2.5.so              memchr
	163571    5.4186  vmlinux                  vmlinux                  __copy_user_nocache
	129301    4.2833  git                      git                      xdl_prepare_ctx
	99009     3.2799  libc-2.5.so              libc-2.5.so              _int_malloc
	83899     2.7793  git                      git                      xdiff_outf
	80588     2.6696  libz.so.1.2.3            libz.so.1.2.3            (no symbols)
	..

so as you can see, libz is down in the 2.5% range, and strlen and the tree 
accessor functions are totally in the noise. 

So it looks like it *used* to be somewhat of a problem (the object access 
itself must have been about 10 seconds, since that got shaved off the 
time), but realistically, if you want to speed up "git blame", we can 
totally ignore the git object data structures, and concentrate on xdiff 
and on blame itself (cmp_suspect and assign_blame probably have some nasty 
O(n^2) behaviour or something like that, that could hopefully be fixed 
fairly easily. The xdl hashing is a different thing, and I don't think 
it's necessarily easy to fix that one..)

			Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 3/2] Avoid unnecessary strlen() calls
  2007-03-18 15:54                                                   ` Linus Torvalds
@ 2007-03-18 15:57                                                     ` Linus Torvalds
  2007-03-18 21:38                                                       ` Shawn O. Pearce
  2007-03-20  3:05                                                     ` Johannes Schindelin
  2007-03-20  3:16                                                     ` Junio C Hamano
  2 siblings, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2007-03-18 15:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Morten Welinder, Git Mailing List



On Sun, 18 Mar 2007, Linus Torvalds wrote:
> 
> That's an interesting test-case (and I get 53 seconds, nyaah, nyaah ;)

Btw, it's also an example of why the incremental blame is so much nicer.

No way would I want to wait 53 seconds to get the whole blame. But doing

	git gui blame HEAD block/ll_rw_blk.c

(the "git gui" command line is a bit unwieldly) you get something quite 
usable!

Of course, the git gui blame colorization is clearly done by somebody who 
is still actively popping LSD with both fists and didn't realize that the 
60's are long done, but that's another issue.

		Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-18  5:30                                                   ` Julian Phillips
@ 2007-03-18 17:23                                                     ` Linus Torvalds
  0 siblings, 0 replies; 79+ messages in thread
From: Linus Torvalds @ 2007-03-18 17:23 UTC (permalink / raw)
  To: Julian Phillips
  Cc: Nicolas Pitre, Morten Welinder, Junio C Hamano, Git Mailing List



On Sun, 18 Mar 2007, Julian Phillips wrote:
> 
> (This is a rather unrealistic repository consisting of a long series of
> commits of new binary files, but I don't have access to the repository that is
> being approximated until I get back to work on Monday ...)

This is a *horrible* test repo.

Is this actually really trying to approximate anything you work with? If 
so, please check whether you have cyanide or some other effective poison 
to kill all your cow-orkers - it's really doing them a favor - and then do 
the honorable thing yourself? Use something especially painful on whoever 
came up with the idea to track 25000 files in a single directory.

I'll see what the profile is, but even without the repo fully generated 
yet, I can already tell you that you should *not* put tens of thousands of 
files in a single directory like this.

Not only is it usually horribly bad quite independently of any SCM issues 
(ie most filesystems will have some bad performance behaviour with things 
like this - if only because "readdir()" will inevitably be slow).

And for git it means that you lose all ability to efficiently prune away 
the parts of the tree that you don't care about. git will always end up 
working with a full linear filemanifest instead of a nice collection of 
recursive trees, and a lot of the nice tree-walking optimizations that git 
has will just end up being no-ops: each tree is always one *huge* 
manifest.

So it's not that git cannot handle it, it's that a lot of the nice things 
that make git really efficient simply won't trigger for your repository.

In short: avoiding tens of thousands of files in a single directory is 
*always* a good idea. With or without git.
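[Editorial sketch: the usual fix is the one git itself uses for loose objects — shard files into subdirectories by a short hash prefix, so no single directory ever holds more than a few hundred entries. A hypothetical helper sketching just the path computation:]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "root/xx/name", where "xx" is two hex digits derived from a
 * cheap hash of the name -- the same fan-out shape as git's
 * .git/objects/ab/cdef... layout.  Hypothetical helper, not git code. */
void fanout_path(char *out, size_t outlen, const char *root, const char *name)
{
	unsigned long hash = 5381;
	const char *p;

	for (p = name; *p; p++)
		hash = hash * 33 + (unsigned char)*p;	/* djb2 */
	snprintf(out, outlen, "%s/%02x/%s",
		 root, (unsigned)(hash & 0xff), name);
}
```

With 25,000 files spread over 256 buckets, each directory holds around 100 entries, so readdir() and per-directory lookups stay cheap regardless of the SCM.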

(Again, SCM's that are really just "one file at a time" like CVS, 
won't care as much. They never really track all files anyway, so while 
they are limited by potential filesystem performance bottlenecks, they 
won't have the fundamental issue of tracking 25,000 files..)

		Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-18 10:53                                                   ` Robin Rosenberg
@ 2007-03-18 17:34                                                     ` Linus Torvalds
  2007-03-18 18:29                                                       ` Robin Rosenberg
  0 siblings, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2007-03-18 17:34 UTC (permalink / raw)
  To: Robin Rosenberg
  Cc: Nicolas Pitre, Morten Welinder, Junio C Hamano, Git Mailing List



On Sun, 18 Mar 2007, Robin Rosenberg wrote:
> 
> I don't have the KDE repo, but I do have an Eclipse import. 
> 
> The eclipse repo is about 140k commits in the master branch and 
> has a 3GB pack file (fromcvs import). 

Do you happen to have a fast internet connection that you can expose this 
thing on?

3GB will take me a while to download, but it sounds like a great 
test-case. A 3GB pack-file is what we're supposed to be able to handle 
fairly comfortably right now, so it sounds like the ideal project to do 
performance testing on. 

		Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-18 17:34                                                     ` Linus Torvalds
@ 2007-03-18 18:29                                                       ` Robin Rosenberg
  2007-03-18 21:25                                                         ` Shawn O. Pearce
  2007-03-19 13:16                                                         ` David Brodsky
  0 siblings, 2 replies; 79+ messages in thread
From: Robin Rosenberg @ 2007-03-18 18:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nicolas Pitre, Morten Welinder, Junio C Hamano, Git Mailing List

söndag 18 mars 2007 18:34 skrev Linus Torvalds:
> 
> On Sun, 18 Mar 2007, Robin Rosenberg wrote:
> > 
> > I don't have the KDE repo, but I do have an Eclipse import. 
> > 
> > The eclipse repo is about 140k commits in the master branch and 
> > has a 3GB pack file (fromcvs import). 
> 
> Do you happen to have a fast internet connection that you can expose this 
> thing on?

Not that fast, and it would take me quite some time to move the files to a public
location (it's on my laptop). I'd rather dump it somewhere directly if someone can
provide me with some suitable coordinates.

-- robin

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-18 18:29                                                       ` Robin Rosenberg
@ 2007-03-18 21:25                                                         ` Shawn O. Pearce
  2007-03-19 13:16                                                         ` David Brodsky
  1 sibling, 0 replies; 79+ messages in thread
From: Shawn O. Pearce @ 2007-03-18 21:25 UTC (permalink / raw)
  To: Robin Rosenberg
  Cc: Linus Torvalds, Nicolas Pitre, Morten Welinder, Junio C Hamano,
	Git Mailing List

Robin Rosenberg <robin.rosenberg.lists@dewire.com> wrote:
> söndag 18 mars 2007 18:34 skrev Linus Torvalds:
> > 
> > On Sun, 18 Mar 2007, Robin Rosenberg wrote:
> > > 
> > > I don't have the KDE repo, but I do have an Eclipse import. 
> > > 
> > > The eclipse repo is about 140k commits in the master branch and 
> > > has a 3GB pack file (fromcvs import). 
> > 
> > Do you happen to have a fast internet connection that you can expose this 
> > thing on?
> 
> Not that fast and it would take me quite a time to move the files to a public
> location (it's on my laptop). I'd rather dump it somewhere directly if someone can
> provide me with some suitable coordinates.

I'd like to get a copy of one of these big repos too (KDE, Eclipse).

I probably could get a DVD onto both Internet and Internet2 from a
fast enough pipe that a few folks (e.g. Linus, Nico, Junio) could
pull it down, but I can't offer a public distribution point for
the world.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 3/2] Avoid unnecessary strlen() calls
  2007-03-18 15:57                                                     ` Linus Torvalds
@ 2007-03-18 21:38                                                       ` Shawn O. Pearce
  2007-03-18 21:48                                                         ` Linus Torvalds
  0 siblings, 1 reply; 79+ messages in thread
From: Shawn O. Pearce @ 2007-03-18 21:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Junio C Hamano, Nicolas Pitre, Morten Welinder, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Btw, it's also an example of why the incremental blame is so much nicer.
> 
> No way would I want to wait 53 seconds to get the whole blame. But doing
> 
> 	git gui blame HEAD block/ll_rw_blk.c
> 
> (the "git gui" command line is a bit unwieldly) you get something quite 
> usable!
> 
> Of course, the git gui blame colorization is clearly done by somebody who 
> is still actively popping LSD with both fists and didn't realize that the 
> 60's are long done, but that's another issue.

:-)

git-gui is open source.  I'd be happy to take a patch.  Or,
since that is horribly messy Tcl/Tk code, just a better color
suggestion. :-)

-- 
Shawn.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 3/2] Avoid unnecessary strlen() calls
  2007-03-18 21:38                                                       ` Shawn O. Pearce
@ 2007-03-18 21:48                                                         ` Linus Torvalds
  0 siblings, 0 replies; 79+ messages in thread
From: Linus Torvalds @ 2007-03-18 21:48 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Junio C Hamano, Nicolas Pitre, Morten Welinder, Git Mailing List



On Sun, 18 Mar 2007, Shawn O. Pearce wrote:
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > 
> > Of course, the git gui blame colorization is clearly done by somebody who 
> > is still actively popping LSD with both fists and didn't realize that the 
> > 60's are long done, but that's another issue.
> 
> :-)
> 
> git-gui is open source.  I'd be happy to take a patch.  Or,
> since that is horribly messy Tcl/Tk code, just a better color
> suggestion. :-)

Yeah, the Tcl/Tk part means that I take one look and decide that I have 
absolutely zero clue..

Also, I'm not entirely sure what the "right" color is, but the changing 
colors do confuse me. Also, maybe I'm some kind of white suburban 
house-wife or something, but I prefer calmer pastel colors over the bright 
ones you've selected.

I would suggest:

 - some special color for "currently selected" (which defaults to being 
   the first one coming out of the blame thing, of course). 

   I'd suggest "black text on pale green background", but that may be just 
   me. Patricia calls the current color "hot pink", and maybe that's 
   appropriate for a certain segment of the population, but I'm not sure I 
   want to even *meet* that segment ;)

 - some *stable* graduated color for the rest. I don't think it 
   necessarily needs to be "older" vs "newer", and in fact I'd suggest 
   just two slightly different shades of gray for the background - just 
   pick alternating shades for each blame entry that comes in (and leave 
   un-blamed lines white).

The flickering just makes me go "ooh, I'm really happy I don't have 
epilepsy, because otherwise I'd be writhing on the floor every time I 
tried to use this tool".

			Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-18 18:29                                                       ` Robin Rosenberg
  2007-03-18 21:25                                                         ` Shawn O. Pearce
@ 2007-03-19 13:16                                                         ` David Brodsky
  2007-03-20  6:35                                                           ` Robin Rosenberg
  1 sibling, 1 reply; 79+ messages in thread
From: David Brodsky @ 2007-03-19 13:16 UTC (permalink / raw)
  To: Robin Rosenberg
  Cc: Linus Torvalds, Nicolas Pitre, Morten Welinder, Junio C Hamano,
	Git Mailing List

Robin Rosenberg wrote:
> söndag 18 mars 2007 18:34 skrev Linus Torvalds:
>> On Sun, 18 Mar 2007, Robin Rosenberg wrote:
>>> I don't have the KDE repo, but I do have an Eclipse import. 
>>>
>>> The eclipse repo is about 140k commits in the master branch and 
>>> has a 3GB pack file (fromcvs import). 
>> Do you happen to have a fast internet connection that you can expose this 
>> thing on?
> 
> Not that fast and it would take me quite a time to move the files to a public
> location (it's on my laptop). I'd rather dump it somewhere directly if someone can
> provide me with some suitable coordinates.

I have access to a server with enough disk space and its internet
connection should be something like 10 Mbps (or even faster). I can
provide you anonymous ftp access for upload/download (temporarily) and
http for download (permanent).

Or you can send me a DVD, but that would take some time (at least a week
because I'm in the Czech Republic and I don't expect that you are
anywhere near...) and I don't know if the postal service can handle such
fragile things as DVDs.

This is the least I can do for you...


David Brodsky

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 3/2] Avoid unnecessary strlen() calls
  2007-03-18 15:54                                                   ` Linus Torvalds
  2007-03-18 15:57                                                     ` Linus Torvalds
@ 2007-03-20  3:05                                                     ` Johannes Schindelin
  2007-03-20  3:29                                                       ` Shawn O. Pearce
  2007-03-20  3:16                                                     ` Junio C Hamano
  2 siblings, 1 reply; 79+ messages in thread
From: Johannes Schindelin @ 2007-03-20  3:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Junio C Hamano, Nicolas Pitre, Morten Welinder, Git Mailing List

Hi,

On Sun, 18 Mar 2007, Linus Torvalds wrote:

> All the top profiling hits are about generating the patches and 
> assigning blame:
> 
> 	samples  %        image name               app name                 symbol name
> 	470352   15.5813  git                      git                      xdl_hash_record

I felt a little left out in all that performance slashing, and so I 
thought maybe, just maybe, a small change in xdl_hash_record() can do 
wonders (since it _is_ really simple, but still takes almost a 6th of the 
CPU time). I don't have a proper test case setup, so maybe you want to try 
this:

-- snipsnap --
[PATCH] xdiff/xutils.c(xdl_hash_record): factor out whitespace handling

Since in at least one use case, xdl_hash_record() takes over 15% of the
CPU time, it makes sense to even micro-optimize it. For many cases, no
whitespace special handling is needed, and in these cases we should not
even bother to check for whitespace in _every_ iteration of the loop.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>

---

	Please do not consider this patch _unless_ it is proven to enhance 
	the profile statistics substantially.

 xdiff/xutils.c |   22 ++++++++++++++++++++--
 1 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/xdiff/xutils.c b/xdiff/xutils.c
index 3653864..bf91c0f 100644
--- a/xdiff/xutils.c
+++ b/xdiff/xutils.c
@@ -236,12 +236,13 @@ int xdl_recmatch(const char *l1, long s1, const char *l2, long s2, long flags)
 	return 0;
 }
 
-unsigned long xdl_hash_record(char const **data, char const *top, long flags) {
+static unsigned long xdl_hash_record_with_whitespace(char const **data,
+		char const *top, long flags) {
 	unsigned long ha = 5381;
 	char const *ptr = *data;
 
 	for (; ptr < top && *ptr != '\n'; ptr++) {
-		if (isspace(*ptr) && (flags & XDF_WHITESPACE_FLAGS)) {
+		if (isspace(*ptr)) {
 			const char *ptr2 = ptr;
 			while (ptr + 1 < top && isspace(ptr[1])
 					&& ptr[1] != '\n')
@@ -270,6 +271,23 @@ unsigned long xdl_hash_record(char const **data, char const *top, long flags) {
 }
 
 
+unsigned long xdl_hash_record(char const **data, char const *top, long flags) {
+	unsigned long ha = 5381;
+	char const *ptr = *data;
+
+	if (flags & XDF_WHITESPACE_FLAGS)
+		return xdl_hash_record_with_whitespace(data, top, flags);
+
+	for (; ptr < top && *ptr != '\n'; ptr++) {
+		ha += (ha << 5);
+		ha ^= (unsigned long) *ptr;
+	}
+	*data = ptr < top ? ptr + 1: ptr;
+
+	return ha;
+}
+
+
 unsigned int xdl_hashbits(unsigned int size) {
 	unsigned int val = 1, bits = 0;
 

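[Editorial sketch: the generalizable trick in the patch above is worth spelling out — `flags` is loop-invariant, so test it once and dispatch to a specialized loop instead of branching on every character. A toy version with hypothetical names, using a NUL-terminated string rather than xdiff's pointer pair:]

```c
#include <assert.h>
#include <ctype.h>

#define IGNORE_WS 1	/* stand-in for XDF_WHITESPACE_FLAGS */

/* Rare path: keeps the per-character flag/whitespace handling. */
static unsigned long hash_line_ws(const char *s, int flags)
{
	unsigned long ha = 5381;

	for (; *s && *s != '\n'; s++) {
		if ((flags & IGNORE_WS) && isspace((unsigned char)*s))
			continue;	/* skip whitespace entirely */
		ha += ha << 5;
		ha ^= (unsigned long)(unsigned char)*s;
	}
	return ha;
}

/* Hot path: the flag test happens exactly once, outside the loop,
 * so the common no-whitespace-flags case runs a tight branch-free
 * body per character. */
static unsigned long hash_line(const char *s, int flags)
{
	unsigned long ha = 5381;

	if (flags & IGNORE_WS)
		return hash_line_ws(s, flags);
	for (; *s && *s != '\n'; s++) {
		ha += ha << 5;
		ha ^= (unsigned long)(unsigned char)*s;
	}
	return ha;
}
```

Both paths must produce identical hashes when no whitespace flags are set; that invariant is exactly what the split relies on.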
^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [PATCH 3/2] Avoid unnecessary strlen() calls
  2007-03-18 15:54                                                   ` Linus Torvalds
  2007-03-18 15:57                                                     ` Linus Torvalds
  2007-03-20  3:05                                                     ` Johannes Schindelin
@ 2007-03-20  3:16                                                     ` Junio C Hamano
  2007-03-20  4:31                                                       ` Linus Torvalds
  2 siblings, 1 reply; 79+ messages in thread
From: Junio C Hamano @ 2007-03-20  3:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Nicolas Pitre, Morten Welinder, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> So it looks like it *used* to be somewhat of a problem (the object access 
> itself must have been about 10 seconds, since that got shaved off the 
> time), but realistically, if you want to speed up "git blame", we can 
> totally ignore the git object data structures, an dconcentrate on xdiff 
> and on blame itself (cmp_suspect and assign_blame probably have some nasty 
> O(n^2) behaviour or something like that,...

With this stupidity-removal patch, it gets down to 7.80user from
8.72user (comparable number of minor faults) for blaming
block/ll_rw_blk.c (without tglx grafts):

diff --git a/builtin-blame.c b/builtin-blame.c
index b51cdc7..104521e 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -182,9 +182,8 @@ struct scoreboard {
 
 static int cmp_suspect(struct origin *a, struct origin *b)
 {
-	int cmp = hashcmp(a->commit->object.sha1, b->commit->object.sha1);
-	if (cmp)
-		return cmp;
+	if (a->commit != b->commit)
+		return 1;
 	return strcmp(a->path, b->path);
 }
 
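[Editorial sketch: the reason the pointer comparison is safe — git keeps a single in-memory struct per object SHA-1 (interning via its object hash), so two origins refer to the same commit if and only if the pointers are equal, and the 20-byte hashcmp() was pure overhead. A toy interning pool with hypothetical names:]

```c
#include <assert.h>
#include <string.h>

#define POOL_MAX 64

struct commit {
	char sha1[41];		/* hex object name, NUL-terminated */
};

static struct commit pool[POOL_MAX];
static int pool_used;

/* Return the single canonical struct for this SHA-1, creating it on
 * first lookup.  Because every caller goes through here, pointer
 * equality implies (and is implied by) SHA-1 equality. */
struct commit *lookup_commit(const char *sha1)
{
	int i;

	for (i = 0; i < pool_used; i++)
		if (!strcmp(pool[i].sha1, sha1))
			return &pool[i];
	strncpy(pool[pool_used].sha1, sha1, 40);
	return &pool[pool_used++];
}
```

After interning, an O(20)-byte compare collapses to a single pointer compare, which matters when cmp_suspect() runs in blame's innermost loop.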

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [PATCH 3/2] Avoid unnecessary strlen() calls
  2007-03-20  3:05                                                     ` Johannes Schindelin
@ 2007-03-20  3:29                                                       ` Shawn O. Pearce
  2007-03-20  3:40                                                         ` Shawn O. Pearce
  0 siblings, 1 reply; 79+ messages in thread
From: Shawn O. Pearce @ 2007-03-20  3:29 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Linus Torvalds, Junio C Hamano, Nicolas Pitre, Morten Welinder,
	Git Mailing List

Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> On Sun, 18 Mar 2007, Linus Torvalds wrote:
> 
> > All the top profiling hits are about generating the patches and 
> > assigning blame:
> > 
> > 	samples  %        image name               app name                 symbol name
> > 	470352   15.5813  git                      git                      xdl_hash_record
> 
> I felt a little left out in all that performance slashing, and so I 
> thought maybe, just maybe, a small change in xdl_hash_record() can do 
> wonders (since it _is_ really simple, but still takes almost a 6th of the 
> CPU time). I don't have a proper test case setup, so maybe you want to try 
> this:
> 
> -- snipsnap --
> [PATCH] xdiff/xutils.c(xdl_hash_record): factor out whitespace handling
> 
> Since in at least one use case, xdl_hash_record() takes over 15% of the
> CPU time, it makes sense to even micro-optimize it. For many cases, no
> whitespace special handling is needed, and in these cases we should not
> even bother to check for whitespace in _every_ iteration of the loop.
> 
> Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
> 
> ---
> 
> 	Please do not consider this patch _unless_ it is proven to enhance 
> 	the profile statistics substantially.

This is a massive difference for me.  I ran it on git-gui.sh in
the git-gui repository - this is a 6000 line file that has a lot of
revisions, and has been renamed a few times.  I applied the patch on
top of current 'master' (v1.5.1-rc1), so I was testing with Linus'
delta_base_cache.

# stock v1.5.1-rc1
$ for a in 1 2 3 4 5;do /usr/bin/time ../lt-blame blame --incremental HEAD git-gui.sh >/dev/null;done
        6.27 real         5.31 user         0.55 sys
        6.40 real         5.32 user         0.55 sys
        6.33 real         5.33 user         0.53 sys
        6.67 real         5.32 user         0.55 sys
        6.18 real         5.31 user         0.53 sys

# with the above patch
$ for a in 1 2 3 4 5;do /usr/bin/time ../js-blame blame --incremental HEAD git-gui.sh >/dev/null;done
        3.57 real         2.87 user         0.51 sys
        3.58 real         2.87 user         0.51 sys
        3.53 real         2.86 user         0.52 sys
        3.61 real         2.86 user         0.51 sys
        3.64 real         2.87 user         0.52 sys

For the record, both versions did produce identical output.

Given how small of a change it is, and how much of an improvement
it made, I say apply it.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 3/2] Avoid unnecessary strlen() calls
  2007-03-20  3:29                                                       ` Shawn O. Pearce
@ 2007-03-20  3:40                                                         ` Shawn O. Pearce
  2007-03-20  4:11                                                           ` Linus Torvalds
  0 siblings, 1 reply; 79+ messages in thread
From: Shawn O. Pearce @ 2007-03-20  3:40 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Linus Torvalds, Junio C Hamano, Nicolas Pitre, Morten Welinder,
	Git Mailing List

"Shawn O. Pearce" <spearce@spearce.org> wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > -- snipsnap --
> > [PATCH] xdiff/xutils.c(xdl_hash_record): factor out whitespace handling
...
> > ---
> > 
> > 	Please do not consider this patch _unless_ it is proven to enhance 
> > 	the profile statistics substantially.
> 
> This is a massive difference for me.
...
> # stock v1.5.1-rc1
> $ for a in 1 2 3 4 5;do /usr/bin/time ../lt-blame blame --incremental HEAD git-gui.sh >/dev/null;done
>         6.27 real         5.31 user         0.55 sys
>         6.40 real         5.32 user         0.55 sys
>         6.33 real         5.33 user         0.53 sys
>         6.67 real         5.32 user         0.55 sys
>         6.18 real         5.31 user         0.53 sys
> 
> # with the above patch
> $ for a in 1 2 3 4 5;do /usr/bin/time ../js-blame blame --incremental HEAD git-gui.sh >/dev/null;done
>         3.57 real         2.87 user         0.51 sys
>         3.58 real         2.87 user         0.51 sys
>         3.53 real         2.86 user         0.52 sys
>         3.61 real         2.86 user         0.51 sys
>         3.64 real         2.87 user         0.52 sys

DrNick suggested on #git to try flipping the isspace test around.
This is a smaller change and generated the same ~3.60 seconds run
as Dscho's patch.  I like DrNick's version better.  ;-)

-->8--
[PATCH] xdiff/xutils.c(xdl_hash_record): factor out whitespace handling

Since in at least one use case, xdl_hash_record() takes over 15%
of the CPU time, it makes sense to even micro-optimize it. For
many cases, no whitespace special handling is needed, and in these
cases we should not even bother to check for whitespace in _every_
iteration of the loop.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 xdiff/xutils.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/xdiff/xutils.c b/xdiff/xutils.c
index 3653864..7b1f213 100644
--- a/xdiff/xutils.c
+++ b/xdiff/xutils.c
@@ -241,7 +241,7 @@ unsigned long xdl_hash_record(char const **data, char const *top, long flags) {
 	char const *ptr = *data;
 
 	for (; ptr < top && *ptr != '\n'; ptr++) {
-		if (isspace(*ptr) && (flags & XDF_WHITESPACE_FLAGS)) {
+		if ((flags & XDF_WHITESPACE_FLAGS) && isspace(*ptr)) {
 			const char *ptr2 = ptr;
 			while (ptr + 1 < top && isspace(ptr[1])
 					&& ptr[1] != '\n')
-- 
1.5.1.rc1.595.gd1206

-- 
Shawn.

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [PATCH 3/2] Avoid unnecessary strlen() calls
  2007-03-20  3:40                                                         ` Shawn O. Pearce
@ 2007-03-20  4:11                                                           ` Linus Torvalds
  2007-03-20  4:18                                                             ` Shawn O. Pearce
  2007-03-20  5:44                                                             ` Junio C Hamano
  0 siblings, 2 replies; 79+ messages in thread
From: Linus Torvalds @ 2007-03-20  4:11 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Johannes Schindelin, Junio C Hamano, Nicolas Pitre,
	Morten Welinder, Git Mailing List



On Mon, 19 Mar 2007, Shawn O. Pearce wrote:
>
> DrNick suggested on #git to try flipping the isspace test around.
> This is a smaller change and generated the same ~3.60 seconds run
> as Dscho's patch.  I like DrNick's version better.  ;-)

For me, the result seems to be in the noise.

It may be due to running on Core 2. It's not very sensitive to 
micro-optimizations like this. It definitely makes sense to test the 
*stable* test first, since that will help branch prediction (the 
"isspace()" test is *not* very predictable), so I don't disagree with the 
patch, but I suspect it depends a lot on the microarchitecture just how 
much it matters.

Do you perhaps have a P4? It has a very bad branch mispredict penalty, so 
putting the predictable branch first could explain the big difference you 
see..

Dscho's bigger patch probably helps more on an in-order architecture, and 
should be equally good on a P4 (or Opteron). On Core 2, neither of the 
patches seem to make a huge difference.

			Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 3/2] Avoid unnecessary strlen() calls
  2007-03-20  4:11                                                           ` Linus Torvalds
@ 2007-03-20  4:18                                                             ` Shawn O. Pearce
  2007-03-20  4:45                                                               ` Linus Torvalds
  2007-03-20  5:44                                                             ` Junio C Hamano
  1 sibling, 1 reply; 79+ messages in thread
From: Shawn O. Pearce @ 2007-03-20  4:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Schindelin, Junio C Hamano, Nicolas Pitre,
	Morten Welinder, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Mon, 19 Mar 2007, Shawn O. Pearce wrote:
> >
> > DrNick suggested on #git to try flipping the isspace test around.
> > This is a smaller change and generated the same ~3.60 seconds run
> > as Dscho's patch.  I like DrNick's version better.  ;-)
> 
> For me, the result seems to be in the noise.
> 
> It may be due to running on Core 2. It's not very sensitive to 
> micro-optimizations like this. It definitely makes sense to test the 
> *stable* test first, since that will help branch prediction (the 
> "isspace()" test is *not* very predictable), so I don't disagree with the 
> patch, but I suspect it depends a lot on the microarchitecture just how 
> much it matters.
> 
> Do you perhaps have a P4? It has a very bad branch mispredict penalty, so 
> putting the predictable branch first could explain the big difference you 
> see..

I tested both patches on a PowerPC G4.  (Apple PowerBook, 1.5 GHz)
Running on Mac OS X 10.4.8.

Might be more of a Linux<->Darwin thing; perhaps my isspace is
significantly slower than yours is...  after all my mmap runs
like a PC from the 1980s...  ;-)

-- 
Shawn.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 3/2] Avoid unnecessary strlen() calls
  2007-03-20  3:16                                                     ` Junio C Hamano
@ 2007-03-20  4:31                                                       ` Linus Torvalds
  2007-03-20  4:39                                                         ` Shawn O. Pearce
  0 siblings, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2007-03-20  4:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Morten Welinder, Git Mailing List



On Mon, 19 Mar 2007, Junio C Hamano wrote:
> 
> With this stupidity-removal patch, it gets down to 7.80user from
> 8.72user (comparable number of minor faults) for blaming
> block/ll_rw_blk.c (without tglx grafts)

Yeah, this one works for me too. Even more than for you. For me, 

	git blame --incremental -C HEAD block/ll_rw_blk.c

takes 6.71s (best of ten) normally, and 4.85 (best of ten again) with your 
patch and Nico's one-liner. In fact, that's a much bigger improvement than 
I would have expected from the profile, but it may be that you just cut 
the data cache footprint down a lot, and thus made other things more 
efficient.

(I just double-checked. Nico's one-liner does help, but not nearly as 
radically as it did for Nico. The "best of ten" with *just* Nico's 
one-liner is 6.22 for me - better than before, but the combination of 
Nico's patch and yours is much more dramatic).

Btw, Dscho's slightly more invasive patch seems to *just* edge out Nico's 
one-liner for me, with best-of-ten being 6.17s.

The winner is your patch *with* Dscho's slightly more invasive one: 4.69s.

But the difference between the numbers of Dscho's bigger patch and Nico's 
one-liner really are totally in the noise. Dscho *just* wins the 
best-of-ten both with and without your patch, but in both cases it's 
*way* in the noise. For example, while 4.69s was the best for your+Dscho 
in my testing, the full series was

	0:05.69
	0:04.69
	0:04.82
	0:04.97
	0:04.85
	0:05.88
	0:04.77
	0:04.69
	0:05.12
	0:04.98

so the variability was big enough that I wouldn't say that 0.1s is really 
all that meaningful even for "best of ten". I didn't try to make the 
machine totally quiescent, I've got xmms playing in the background etc..

But these kinds of things will definitely vary from machine to machine. 
It's all good, though.

			Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 3/2] Avoid unnecessary strlen() calls
  2007-03-20  4:31                                                       ` Linus Torvalds
@ 2007-03-20  4:39                                                         ` Shawn O. Pearce
  2007-03-20  4:57                                                           ` Linus Torvalds
  0 siblings, 1 reply; 79+ messages in thread
From: Shawn O. Pearce @ 2007-03-20  4:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Junio C Hamano, Nicolas Pitre, Morten Welinder, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Btw, Dscho's slightly more invasive patch seems to *just* edge out Nico's 
> one-liner for me, with best-of-ten being 6.17s.

Uh, instead of Nico here don't you mean DrNick on #git?  He is in
real life Nicholas Miell.  Google says he's somewhat active in the
kernel world, so maybe you know him?  ;-)

-- 
Shawn.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 3/2] Avoid unnecessary strlen() calls
  2007-03-20  4:18                                                             ` Shawn O. Pearce
@ 2007-03-20  4:45                                                               ` Linus Torvalds
  0 siblings, 0 replies; 79+ messages in thread
From: Linus Torvalds @ 2007-03-20  4:45 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Johannes Schindelin, Junio C Hamano, Nicolas Pitre,
	Morten Welinder, Git Mailing List



On Tue, 20 Mar 2007, Shawn O. Pearce wrote:
> 
> I tested both patches on a PowerPC G4.  (Apple PowerBook, 1.5 GHz)
> Running on Mac OS X 10.4.8.
> 
> Might be more of a Linux<->Darwin thing; perhaps my isspace is
> significantly slower than yours is...  after all my mmap runs
> like a PC from the 1980s...  ;-)

No, we do a git-specific isspace().

But yeah, a G4 will explain the thing even more than a P4 would. The G4 
really isn't a very good uarch compared to the modern x86 ones. Not 
aggressively out-of-order with deep instruction queues and I don't think 
it does basically any memop re-ordering at all. I know Apple used to claim 
that they were the fastest PC around (both with the G4 and the G5), but 
let's face it, they lied.

The closer to in-order you are, the more instruction scheduling in sw 
tends to matter.

		Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 3/2] Avoid unnecessary strlen() calls
  2007-03-20  4:39                                                         ` Shawn O. Pearce
@ 2007-03-20  4:57                                                           ` Linus Torvalds
  0 siblings, 0 replies; 79+ messages in thread
From: Linus Torvalds @ 2007-03-20  4:57 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Junio C Hamano, Nicolas Pitre, Morten Welinder, Git Mailing List



On Tue, 20 Mar 2007, Shawn O. Pearce wrote:

> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > Btw, Dscho's slightly more invasive patch seems to *just* edge out Nico's 
> > one-liner for me, with best-of-ten being 6.17s.
> 
> Uh, instead of Nico here don't you mean DrNick on #git?  He is in
> real life Nicholas Miell.  Google says he's somewhat active in the
> kernel world, so maybe you know him?  ;-)

I actually meant you.

For some reason, I confuse you and Nico. I've done it several times, and 
even without any DrNick mention.

Time to take my meds, 

		Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 3/2] Avoid unnecessary strlen() calls
  2007-03-20  4:11                                                           ` Linus Torvalds
  2007-03-20  4:18                                                             ` Shawn O. Pearce
@ 2007-03-20  5:44                                                             ` Junio C Hamano
  1 sibling, 0 replies; 79+ messages in thread
From: Junio C Hamano @ 2007-03-20  5:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Shawn O. Pearce, Johannes Schindelin, Nicolas Pitre,
	Morten Welinder, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Dscho's bigger patch probably helps more on an in-order architecture, and 
> should be equally good on a P4 (or Opteron). On Core 2, neither of the 
> patches seem to make a huge difference.

Because hoisting the stable test outside the loop is always better
on any architecture, I thought picking between the Gitte and Gitney
patches was a no-brainer, and I didn't bother to compare-bench,
but I got curious.

(plain)
7.89user 0.15system 0:08.08elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+41608minor)pagefaults 0swaps
7.93user 0.18system 0:08.14elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+41608minor)pagefaults 0swaps

(gitte -- separate function for slow path)
6.98user 0.18system 0:07.17elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+41606minor)pagefaults 0swaps
7.14user 0.12system 0:07.26elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+41607minor)pagefaults 0swaps

(gitney -- cheap test first before isspace)
7.23user 0.18system 0:07.42elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+41608minor)pagefaults 0swaps
7.32user 0.14system 0:07.48elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+41607minor)pagefaults 0swaps

So it does not seem to make much difference on Athlon 64x2 either.

Will apply the "stupid hashcmp() removal" and Dscho's patch and
call it a day.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-19 13:16                                                         ` David Brodsky
@ 2007-03-20  6:35                                                           ` Robin Rosenberg
  2007-03-20  9:13                                                             ` David Brodsky
  0 siblings, 1 reply; 79+ messages in thread
From: Robin Rosenberg @ 2007-03-20  6:35 UTC (permalink / raw)
  To: David Brodsky
  Cc: Linus Torvalds, Nicolas Pitre, Morten Welinder, Junio C Hamano,
	Git Mailing List

Uploaded now.

David Brodsky provides the final location.

Linus: I noticed a large extra file that I don't know where it is from.
Seems to be from the first conversion. Perhaps you won't need to download it:

ECLIPSE.git/.git/objects/pack_sETUPg

-- robin

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-20  6:35                                                           ` Robin Rosenberg
@ 2007-03-20  9:13                                                             ` David Brodsky
  2007-03-21  2:37                                                               ` Linus Torvalds
  0 siblings, 1 reply; 79+ messages in thread
From: David Brodsky @ 2007-03-20  9:13 UTC (permalink / raw)
  To: Robin Rosenberg
  Cc: Linus Torvalds, Nicolas Pitre, Morten Welinder, Junio C Hamano,
	Git Mailing List

Robin Rosenberg wrote:
> Uploaded now.
> 
> David Brodsky provides the final location.

Anonymous ftp at agnes.kajka.koleje.cuni.cz:10000 - it will stay up for
a while, but since it is my desktop machine, I don't guarantee anything.

And (hopefully) permanent http://steamer.kajka.koleje.cuni.cz/Eclipse

Enjoy

David

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-20  9:13                                                             ` David Brodsky
@ 2007-03-21  2:37                                                               ` Linus Torvalds
  2007-03-21  2:54                                                                 ` Nicolas Pitre
  0 siblings, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2007-03-21  2:37 UTC (permalink / raw)
  To: David Brodsky
  Cc: Robin Rosenberg, Nicolas Pitre, Morten Welinder, Junio C Hamano,
	Git Mailing List



On Tue, 20 Mar 2007, David Brodsky wrote:
> 
> And (hopefully) permanent http://steamer.kajka.koleje.cuni.cz/Eclipse

Ok, thanks, downloaded. Although the pack-file is just 1.7GB for me, not 
3.7 like somebody said.

Anyway, doing a

	git blame --incremental HEAD -- org.eclipse.debug.ui/plugin.xml > /dev/null

on that thing (picked a random file that got modified in a recent commit) 
took something like 12 seconds, so this is certainly a perfectly fine 
test-case.

Sadly, "git-gui blame" doesn't work in a bare git repository, so I had to 
do an ugly

	ln -s . .git

to make git-gui happy, and that worked, and was pretty usable. Still, 
12 seconds should be something we can improve on.

And yeah, the profile is pretty horrid:

	samples  %        app name                 symbol name
	70307    20.9412  libc-2.5.so              strlen
	50925    15.1682  libz.so.1.2.3            (no symbols)
	24295     7.2364  git                      tree_entry_interesting
	19816     5.9023  libc-2.5.so              memcpy
	19569     5.8287  git                      tree_entry_extract
	17693     5.2699  vmlinux                  memcpy_c
	17032     5.0730  git                      assign_blame
	16956     5.0504  git                      get_mode
	12401     3.6937  git                      get_origin
	11815     3.5191  git                      skip_uninteresting
	10449     3.1123  git                      update_tree_entry
	10359     3.0855  git                      find_pack_entry_one
	7946      2.3667  git                      cmp_suspect
	4572      1.3618  libc-2.5.so              strncmp
	...

so I guess we need to find some more strlen's to remove ;)

			Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] Implement a simple delta_base cache
  2007-03-21  2:37                                                               ` Linus Torvalds
@ 2007-03-21  2:54                                                                 ` Nicolas Pitre
  0 siblings, 0 replies; 79+ messages in thread
From: Nicolas Pitre @ 2007-03-21  2:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Brodsky, Robin Rosenberg, Morten Welinder, Junio C Hamano,
	Git Mailing List

On Tue, 20 Mar 2007, Linus Torvalds wrote:

> 
> 
> On Tue, 20 Mar 2007, David Brodsky wrote:
> > 
> > And (hopefully) permanent http://steamer.kajka.koleje.cuni.cz/Eclipse
> 
> Ok, thanks, downloaded. Although the pack-file is just 1.7GB for me, not 
> 3.7 like somebody said.

There is a 1.3GB garbage pack_sETUPg file lying in eclipse.git/objects/,
probably resulting from an interrupted index-pack, making the
repository needlessly bigger.  It can be safely deleted.

We probably should make git-prune get rid of those automatically.
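
In the meantime, a hand-run cleanup can be sketched in shell
(illustrative only, not a git command — the directory argument and
function name are invented; review the output before deleting
anything): list files directly under objects/ that are neither
*.pack nor *.idx, e.g. an interrupted index-pack leftover like
pack_sETUPg.

```shell
# Illustrative sketch: report suspect leftovers under objects/
# for manual inspection.  Uses find's -maxdepth (GNU/BSD extension).
list_stray_packs() {
	find "$1" -maxdepth 1 -type f ! -name '*.pack' ! -name '*.idx'
}

# Example: list_stray_packs eclipse.git/objects
```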


Nicolas

^ permalink raw reply	[flat|nested] 79+ messages in thread

end of thread, other threads:[~2007-03-21  2:54 UTC | newest]

Thread overview: 79+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-16  1:04 cleaner/better zlib sources? Linus Torvalds
2007-03-16  1:10 ` Shawn O. Pearce
2007-03-16  1:11 ` Jeff Garzik
2007-03-16  1:14   ` Matt Mackall
2007-03-16  1:46   ` Linus Torvalds
2007-03-16  1:54     ` Linus Torvalds
2007-03-16  2:43       ` Davide Libenzi
2007-03-16  2:56         ` Linus Torvalds
2007-03-16  3:16           ` Davide Libenzi
2007-03-16 16:21             ` Linus Torvalds
2007-03-16 16:24               ` Davide Libenzi
2007-03-16 16:35                 ` Linus Torvalds
2007-03-16 19:21                   ` Davide Libenzi
2007-03-17  0:01                     ` Linus Torvalds
2007-03-17  1:11                       ` Linus Torvalds
2007-03-17  3:28                         ` Nicolas Pitre
2007-03-17  5:19                           ` Shawn O. Pearce
2007-03-17 17:55                           ` Linus Torvalds
2007-03-17 19:40                             ` Linus Torvalds
2007-03-17 19:42                               ` [PATCH 1/2] Make trivial wrapper functions around delta base generation and freeing Linus Torvalds
2007-03-17 19:44                               ` [PATCH 2/2] Implement a simple delta_base cache Linus Torvalds
2007-03-17 21:45                                 ` Linus Torvalds
2007-03-17 22:37                                   ` Junio C Hamano
2007-03-17 23:09                                     ` Linus Torvalds
2007-03-17 23:54                                       ` Linus Torvalds
2007-03-18  1:13                                     ` Nicolas Pitre
2007-03-18  7:47                                       ` Junio C Hamano
2007-03-17 23:12                                   ` Junio C Hamano
2007-03-17 23:24                                     ` Linus Torvalds
2007-03-17 23:52                                       ` Jon Smirl
2007-03-18  1:14                                   ` Morten Welinder
2007-03-18  1:29                                     ` Linus Torvalds
2007-03-18  1:38                                       ` Nicolas Pitre
2007-03-18  1:55                                         ` Linus Torvalds
2007-03-18  2:03                                           ` Nicolas Pitre
2007-03-18  2:20                                             ` Linus Torvalds
2007-03-18  3:00                                               ` Nicolas Pitre
2007-03-18  3:31                                                 ` Linus Torvalds
2007-03-18  5:30                                                   ` Julian Phillips
2007-03-18 17:23                                                     ` Linus Torvalds
2007-03-18 10:53                                                   ` Robin Rosenberg
2007-03-18 17:34                                                     ` Linus Torvalds
2007-03-18 18:29                                                       ` Robin Rosenberg
2007-03-18 21:25                                                         ` Shawn O. Pearce
2007-03-19 13:16                                                         ` David Brodsky
2007-03-20  6:35                                                           ` Robin Rosenberg
2007-03-20  9:13                                                             ` David Brodsky
2007-03-21  2:37                                                               ` Linus Torvalds
2007-03-21  2:54                                                                 ` Nicolas Pitre
2007-03-18  3:06                                               ` [PATCH 3/2] Avoid unnecessary strlen() calls Linus Torvalds
2007-03-18  9:45                                                 ` Junio C Hamano
2007-03-18 15:54                                                   ` Linus Torvalds
2007-03-18 15:57                                                     ` Linus Torvalds
2007-03-18 21:38                                                       ` Shawn O. Pearce
2007-03-18 21:48                                                         ` Linus Torvalds
2007-03-20  3:05                                                     ` Johannes Schindelin
2007-03-20  3:29                                                       ` Shawn O. Pearce
2007-03-20  3:40                                                         ` Shawn O. Pearce
2007-03-20  4:11                                                           ` Linus Torvalds
2007-03-20  4:18                                                             ` Shawn O. Pearce
2007-03-20  4:45                                                               ` Linus Torvalds
2007-03-20  5:44                                                             ` Junio C Hamano
2007-03-20  3:16                                                     ` Junio C Hamano
2007-03-20  4:31                                                       ` Linus Torvalds
2007-03-20  4:39                                                         ` Shawn O. Pearce
2007-03-20  4:57                                                           ` Linus Torvalds
2007-03-18  1:44                                       ` [PATCH 2/2] Implement a simple delta_base cache Linus Torvalds
2007-03-18  6:28                                     ` Avi Kivity
2007-03-17 22:44                                 ` Linus Torvalds
2007-03-16 16:35               ` cleaner/better zlib sources? Jeff Garzik
2007-03-16 16:42                 ` Matt Mackall
2007-03-16 16:51                 ` Linus Torvalds
2007-03-16 17:12                 ` Nicolas Pitre
2007-03-16 23:22                 ` Shawn O. Pearce
2007-03-16 17:06               ` Nicolas Pitre
2007-03-16 17:51                 ` Linus Torvalds
2007-03-16 18:09                   ` Nicolas Pitre
2007-03-16  1:33 ` Davide Libenzi
2007-03-16  2:06   ` Davide Libenzi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).