linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Discrepancy between comments for sched_find_first_bit
@ 2010-03-29  3:37 Matt Turner
  2010-03-29 10:25 ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: Matt Turner @ 2010-03-29  3:37 UTC (permalink / raw)
  To: LKML, linux-alpha

include/asm-generic/bitops/sched.h says
/*
 * Every architecture must define this function. It's the fastest
 * way of searching a 100-bit bitmap.  It's guaranteed that at least
 * one of the 100 bits is cleared.
 */

arch/alpha/include/asm/bitops.h says
/*
 * Every architecture must define this function. It's the fastest
 * way of searching a 140-bit bitmap where the first 100 bits are
 * unlikely to be set. It's guaranteed that at least one of the 140
 * bits is set.
 */

Is the guarantee that one of the first 100-bits set (and that the last
40 are useless?), or 140-bits? If it's just the first 100 bits we care
about, then the alpha version needs to be fixed.

I'm just checking this out, because gcc produces horrendous code for
sched_find_first_bit on alpha. I rewrote it in assembly and it's
better than 4 times faster.

Also, is it even worth optimizing that function? It looks like it's
only used in kernel/sched_rt.c.

Thanks,
Matt

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Discrepancy between comments for sched_find_first_bit
  2010-03-29  3:37 Discrepancy between comments for sched_find_first_bit Matt Turner
@ 2010-03-29 10:25 ` Peter Zijlstra
  2010-04-02 20:16   ` Ingo Molnar
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2010-03-29 10:25 UTC (permalink / raw)
  To: Matt Turner; +Cc: LKML, linux-alpha, Ingo Molnar

On Sun, 2010-03-28 at 23:37 -0400, Matt Turner wrote:
> include/asm-generic/bitops/sched.h says
> /*
>  * Every architecture must define this function. It's the fastest
>  * way of searching a 100-bit bitmap.  It's guaranteed that at least
>  * one of the 100 bits is cleared.
>  */
> 
> arch/alpha/include/asm/bitops.h says
> /*
>  * Every architecture must define this function. It's the fastest
>  * way of searching a 140-bit bitmap where the first 100 bits are
>  * unlikely to be set. It's guaranteed that at least one of the 140
>  * bits is set.
>  */
> 
> Is the guarantee that one of the first 100-bits set (and that the last
> 40 are useless?), or 140-bits? If it's just the first 100 bits we care
> about, then the alpha version needs to be fixed.
> 
> I'm just checking this out, because gcc produces horrendous code for
> sched_find_first_bit on alpha. I rewrote it in assembly and it's
> better than 4 times faster.
> 
> Also, is it even worth optimizing that function? It looks like it's
> only used in kernel/sched_rt.c.

(might help if you CC the scheduler people on scheduler functions :-)

Right, so it used to be 140 bits with the old O(1) scheduler, currently
(as you noted) sched_rt is the only user left and will therefore only
care about the first 100 bits.

As it stands I think it might still make sense to optimize this as for
rt loads it still on a hot path.

As to the 100 vs 140 length, would it really make much of difference to
shorten the implementation to 100? If not I'd leave it at 140.

Ingo, any comments?


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Discrepancy between comments for sched_find_first_bit
  2010-03-29 10:25 ` Peter Zijlstra
@ 2010-04-02 20:16   ` Ingo Molnar
  2010-04-02 20:50     ` Matt Turner
  0 siblings, 1 reply; 5+ messages in thread
From: Ingo Molnar @ 2010-04-02 20:16 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Matt Turner, LKML, linux-alpha


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Sun, 2010-03-28 at 23:37 -0400, Matt Turner wrote:
> > include/asm-generic/bitops/sched.h says
> > /*
> >  * Every architecture must define this function. It's the fastest
> >  * way of searching a 100-bit bitmap.  It's guaranteed that at least
> >  * one of the 100 bits is cleared.
> >  */
> > 
> > arch/alpha/include/asm/bitops.h says
> > /*
> >  * Every architecture must define this function. It's the fastest
> >  * way of searching a 140-bit bitmap where the first 100 bits are
> >  * unlikely to be set. It's guaranteed that at least one of the 140
> >  * bits is set.
> >  */
> > 
> > Is the guarantee that one of the first 100-bits set (and that the last
> > 40 are useless?), or 140-bits? If it's just the first 100 bits we care
> > about, then the alpha version needs to be fixed.
> > 
> > I'm just checking this out, because gcc produces horrendous code for
> > sched_find_first_bit on alpha. I rewrote it in assembly and it's
> > better than 4 times faster.
> > 
> > Also, is it even worth optimizing that function? It looks like it's
> > only used in kernel/sched_rt.c.
> 
> (might help if you CC the scheduler people on scheduler functions :-)
> 
> Right, so it used to be 140 bits with the old O(1) scheduler, currently
> (as you noted) sched_rt is the only user left and will therefore only
> care about the first 100 bits.
> 
> As it stands I think it might still make sense to optimize this as for
> rt loads it still on a hot path.
> 
> As to the 100 vs 140 length, would it really make much of difference to
> shorten the implementation to 100? If not I'd leave it at 140.
> 
> Ingo, any comments?

I guess getting below the 128 bits boundary would shave an instruction and a 
branch off or so?

	Ingo

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Discrepancy between comments for sched_find_first_bit
  2010-04-02 20:16   ` Ingo Molnar
@ 2010-04-02 20:50     ` Matt Turner
  2010-04-02 21:25       ` Ingo Molnar
  0 siblings, 1 reply; 5+ messages in thread
From: Matt Turner @ 2010-04-02 20:50 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, LKML, linux-alpha

On Fri, Apr 2, 2010 at 4:16 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Peter Zijlstra <peterz@infradead.org> wrote:
>
>> On Sun, 2010-03-28 at 23:37 -0400, Matt Turner wrote:
>> > include/asm-generic/bitops/sched.h says
>> > /*
>> >  * Every architecture must define this function. It's the fastest
>> >  * way of searching a 100-bit bitmap.  It's guaranteed that at least
>> >  * one of the 100 bits is cleared.
>> >  */
>> >
>> > arch/alpha/include/asm/bitops.h says
>> > /*
>> >  * Every architecture must define this function. It's the fastest
>> >  * way of searching a 140-bit bitmap where the first 100 bits are
>> >  * unlikely to be set. It's guaranteed that at least one of the 140
>> >  * bits is set.
>> >  */
>> >
>> > Is the guarantee that one of the first 100-bits set (and that the last
>> > 40 are useless?), or 140-bits? If it's just the first 100 bits we care
>> > about, then the alpha version needs to be fixed.
>> >
>> > I'm just checking this out, because gcc produces horrendous code for
>> > sched_find_first_bit on alpha. I rewrote it in assembly and it's
>> > better than 4 times faster.
>> >
>> > Also, is it even worth optimizing that function? It looks like it's
>> > only used in kernel/sched_rt.c.
>>
>> (might help if you CC the scheduler people on scheduler functions :-)
>>
>> Right, so it used to be 140 bits with the old O(1) scheduler, currently
>> (as you noted) sched_rt is the only user left and will therefore only
>> care about the first 100 bits.
>>
>> As it stands I think it might still make sense to optimize this as for
>> rt loads it still on a hot path.
>>
>> As to the 100 vs 140 length, would it really make much of difference to
>> shorten the implementation to 100? If not I'd leave it at 140.
>>
>> Ingo, any comments?
>
> I guess getting below the 128 bits boundary would shave an instruction and a
> branch off or so?
>
>        Ingo
>

That's right. I should be able to get rid of a cmov, which kind of
counts as two instructions in EV6 scheduling.

So I should send a patch to reduce this to the first 100 (128) bits?

Thanks guys,
Matt

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Discrepancy between comments for sched_find_first_bit
  2010-04-02 20:50     ` Matt Turner
@ 2010-04-02 21:25       ` Ingo Molnar
  0 siblings, 0 replies; 5+ messages in thread
From: Ingo Molnar @ 2010-04-02 21:25 UTC (permalink / raw)
  To: Matt Turner; +Cc: Peter Zijlstra, LKML, linux-alpha


* Matt Turner <mattst88@gmail.com> wrote:

> On Fri, Apr 2, 2010 at 4:16 PM, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> >
> >> On Sun, 2010-03-28 at 23:37 -0400, Matt Turner wrote:
> >> > include/asm-generic/bitops/sched.h says
> >> > /*
> >> > ?* Every architecture must define this function. It's the fastest
> >> > ?* way of searching a 100-bit bitmap. ?It's guaranteed that at least
> >> > ?* one of the 100 bits is cleared.
> >> > ?*/
> >> >
> >> > arch/alpha/include/asm/bitops.h says
> >> > /*
> >> > ?* Every architecture must define this function. It's the fastest
> >> > ?* way of searching a 140-bit bitmap where the first 100 bits are
> >> > ?* unlikely to be set. It's guaranteed that at least one of the 140
> >> > ?* bits is set.
> >> > ?*/
> >> >
> >> > Is the guarantee that one of the first 100-bits set (and that the last
> >> > 40 are useless?), or 140-bits? If it's just the first 100 bits we care
> >> > about, then the alpha version needs to be fixed.
> >> >
> >> > I'm just checking this out, because gcc produces horrendous code for
> >> > sched_find_first_bit on alpha. I rewrote it in assembly and it's
> >> > better than 4 times faster.
> >> >
> >> > Also, is it even worth optimizing that function? It looks like it's
> >> > only used in kernel/sched_rt.c.
> >>
> >> (might help if you CC the scheduler people on scheduler functions :-)
> >>
> >> Right, so it used to be 140 bits with the old O(1) scheduler, currently
> >> (as you noted) sched_rt is the only user left and will therefore only
> >> care about the first 100 bits.
> >>
> >> As it stands I think it might still make sense to optimize this as for
> >> rt loads it still on a hot path.
> >>
> >> As to the 100 vs 140 length, would it really make much of difference to
> >> shorten the implementation to 100? If not I'd leave it at 140.
> >>
> >> Ingo, any comments?
> >
> > I guess getting below the 128 bits boundary would shave an instruction and a
> > branch off or so?
> >
> > ? ? ? ?Ingo
> >
> 
> That's right. I should be able to get rid of a cmov, which kind of
> counts as two instructions in EV6 scheduling.
> 
> So I should send a patch to reduce this to the first 100 (128) bits?

Sure, why not, every instruction counts :-)

Note, if you do it then please also include a disassembly of the area that 
changed, so that we document the effect.

	Ingo

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2010-04-02 21:25 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-03-29  3:37 Discrepancy between comments for sched_find_first_bit Matt Turner
2010-03-29 10:25 ` Peter Zijlstra
2010-04-02 20:16   ` Ingo Molnar
2010-04-02 20:50     ` Matt Turner
2010-04-02 21:25       ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).