On Sun, 31 Oct 2004, linux-os wrote: > Timer overhead = 88 CPU clocks > push 3, pop 3 = 12 CPU clocks > push 3, pop 2 = 12 CPU clocks > push 3, pop 1 = 12 CPU clocks > push 3, pop none using ADD = 8 CPU clocks > push 3, pop none using LEA = 8 CPU clocks > push 3, pop into same register = 12 CPU clocks your microbenchmark makes assumptions about rdtsc which haven't been valid since the days of the 486. rdtsc has serializing aspects and overhead that you can't just eliminate by running it in a tight loop and subtracting out that "overhead". you have to run your inner loops at least a few thousand of times between rdtsc invocations and divide it out to find out the average cost in order to eliminate the problems associated with rdtsc. -dean