Re: [PATCH 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS

From: George Spelvin <lkml@sdf.org>
To: andriy.shevchenko@linux.intel.com, lkml@sdf.org
Cc: akpm@linux-foundation.org, daniel.wagner@siemens.com,
	dchinner@redhat.com, don.mullis@gmail.com, geert@linux-m68k.org,
	linux-kernel@vger.kernel.org, linux@rasmusvillemoes.dk,
	st5pub@yandex.ru
Subject: Re: [PATCH 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS
Date: Thu, 14 Mar 2019 09:41:41 GMT	[thread overview]
Message-ID: <201903140941.x2E9ffkx024340@sdf.org> (raw)
In-Reply-To: <20190314091041.GU9224@smile.fi.intel.com>

On Thu, 14 Mar 2019 at 11:10:41 +0200, Andy Shevchenko wrote:
> On Tue, Mar 05, 2019 at 03:06:44AM +0000, George Spelvin wrote:
>> +		/* Do merges corresponding to set lsbits in count */
>
>> +		for (bit = 1; count & bit; bit <<= 1) {
>> +			cur = merge(priv, (cmp_func)cmp, pending, cur);
>> +			pending = pending->prev;  /* Untouched by merge() */
>>  		}
>
> Wouldn't be it the same to
>
> 	bit = ffz(count);
> 	while (bit--) {
> 		...
> 	}
> ?
>
> Though I dunno which one is generating better code.

Yes, it's the same.  But count is an incrementing counter, so the
pattern of return values from ffz() is 01020103010201040102010301020105...,
which has mean value 1/2 + 1/4 + 1/8 + 1/16 +... = 1.

So spending one instruction on ffz() to save one instruction per loop
iteration is an even trade, and if the processor doesn't have an ffz()
instruction, it's a loss.

There's a third possible implementation:

>> +		for (bit = count; bit & 1; bit >>= 1) {

...which works fine, too.  (It even saves a few bytes of code, so I
might switch to it.)  I used the form I did because my test code
verified that the length of the lists being merged equalled "bit".

The other forms don't have that property.

Thank you for looking at the code!