* [PATCH] net: pfifo_fast - use ffs(x)-1 instead of array lookup
@ 2012-03-13  5:12 Maciej Żenczykowski
  2012-03-13  5:13 ` Maciej Żenczykowski
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Maciej Żenczykowski @ 2012-03-13  5:12 UTC (permalink / raw)
  To: Maciej Żenczykowski; +Cc: netdev

From: Maciej Żenczykowski <maze@google.com>

See ffs(x) definition in arch/x86/include/asm/bitops.h

  ffs - find first set bit in word

  ffs(value) returns 0 if value is 0 or the position of the first
  set bit if value is nonzero. The first (least significant) bit
  is at position 1.

On x86_64 ffs(x) is effectively:
  Z := -1
  BSFL X, Z
  return Z + 1

Since we subtract one, we effectively end up with:
  Z := -1
  BSFL X, Z
  return Z

This is certainly more readable than the open coded array that
was there before, supports an easier change in the number of bands,
and is probably faster to boot (no memory lookup).

However, on other architectures ffs() might not be so pretty,
hence use a clever arithmetic hack on other archs.
Unfortunately it only supports 3 bands.

Signed-off-by: Maciej Żenczykowski <maze@google.com>
---
 net/sched/sch_generic.c |   18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 67fc573e013a..935492d6a0b6 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -436,9 +436,19 @@ struct pfifo_fast_priv {
  * Convert a bitmap to the first band number where an skb is queued, where:
  * 	bitmap=0 means there are no skbs on any band.
  * 	bitmap=1 means there is an skb on band 0.
- *	bitmap=7 means there are skbs on all 3 bands, etc.
+ * 	bitmap=7 means there are skbs on all 3 bands, etc.
+ *
+ * This is equivalent to ffs(i) - 1
  */
-static const int bitmap2band[] = {-1, 0, 1, 0, 2, 0, 1, 0};
+static inline int bitmap2band(int i)
+{
+#if defined(CONFIG_X86_64)
+	return ffs(i) - 1; /* Known to be efficient. */
+#else
+	/* For i in 0..7 returns {-1, 0, 1, 0, 2, 0, 1, 0}[i] */
+	return ((26468 >> (i+i)) & 3) - 1;
+#endif
+}
 
 static inline struct sk_buff_head *band2list(struct pfifo_fast_priv *priv,
 					     int band)
@@ -464,7 +474,7 @@ static int pfifo_fast_enqueue(struct sk_buff *skb, struct Qdisc *qdisc)
 static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
 {
 	struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
-	int band = bitmap2band[priv->bitmap];
+	int band = bitmap2band(priv->bitmap);
 
 	if (likely(band >= 0)) {
 		struct sk_buff_head *list = band2list(priv, band);
@@ -483,7 +493,7 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
 static struct sk_buff *pfifo_fast_peek(struct Qdisc *qdisc)
 {
 	struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
-	int band = bitmap2band[priv->bitmap];
+	int band = bitmap2band(priv->bitmap);
 
 	if (band >= 0) {
 		struct sk_buff_head *list = band2list(priv, band);
-- 
1.7.9.4
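
For readers following along, here is a minimal userspace sketch (not part
of the patch) that checks the three forms of the mapping against each
other: the original lookup table, ffs(i) - 1, and the shift-based
expression. It assumes a GCC or Clang toolchain and uses __builtin_ffs()
as a stand-in for the kernel's ffs(); all function names are made up for
illustration.

```c
#include <assert.h>

/* The table removed by the patch, kept here only as a reference. */
static const int bitmap2band_table[] = {-1, 0, 1, 0, 2, 0, 1, 0};

/* Assumption: __builtin_ffs() (GCC/Clang) stands in for the kernel ffs(). */
static int bitmap2band_ffs(int i)
{
	return __builtin_ffs(i) - 1;
}

/* The arithmetic variant from the patch: 26468 packs eight 2-bit fields. */
static int bitmap2band_shift(int i)
{
	return ((26468 >> (i + i)) & 3) - 1;
}

/* Returns 1 if all three forms agree for every 3-band bitmap (0..7). */
static int variants_agree(void)
{
	int i;

	for (i = 0; i < 8; i++) {
		if (bitmap2band_ffs(i) != bitmap2band_table[i])
			return 0;
		if (bitmap2band_shift(i) != bitmap2band_table[i])
			return 0;
	}
	return 1;
}
```

Outside the 0..7 range the two computed forms diverge: for i = 8 (a
hypothetical fourth band) the ffs form yields 3 while the shift form
yields -1, which is the 3-band limitation the commit message mentions.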


* Re: [PATCH] net: pfifo_fast - use ffs(x)-1 instead of array lookup
  2012-03-13  5:12 [PATCH] net: pfifo_fast - use ffs(x)-1 instead of array lookup Maciej Żenczykowski
@ 2012-03-13  5:13 ` Maciej Żenczykowski
  2012-03-13  5:54 ` David Miller
  2012-03-13 11:04 ` David Laight
  2 siblings, 0 replies; 11+ messages in thread
From: Maciej Żenczykowski @ 2012-03-13  5:13 UTC (permalink / raw)
  To: Maciej Żenczykowski; +Cc: netdev

Not really sure about this one, this might be more of a discussion starter...


* Re: [PATCH] net: pfifo_fast - use ffs(x)-1 instead of array lookup
  2012-03-13  5:12 [PATCH] net: pfifo_fast - use ffs(x)-1 instead of array lookup Maciej Żenczykowski
  2012-03-13  5:13 ` Maciej Żenczykowski
@ 2012-03-13  5:54 ` David Miller
  2012-03-13  7:21   ` Maciej Żenczykowski
  2012-03-13 11:04 ` David Laight
  2 siblings, 1 reply; 11+ messages in thread
From: David Miller @ 2012-03-13  5:54 UTC (permalink / raw)
  To: zenczykowski; +Cc: maze, netdev

From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Mon, 12 Mar 2012 22:12:21 -0700

> From: Maciej Żenczykowski <maze@google.com>
> 
> See ffs(x) definition in arch/x86/include/asm/bitops.h
> 
>   ffs - find first set bit in word
> 
>   ffs(value) returns 0 if value is 0 or the position of the first
>   set bit if value is nonzero. The first (least significant) bit
>   is at position 1.
> 
> On x86_64 ffs(x) is effectively:
>   Z := -1
>   BSFL X, Z
>   return Z + 1
> 
> Since we subtract one, we effectively end up with:
>   Z := -1
>   BSFL X, Z
>   return Z
> 
> This is certainly more readable than the open coded array that
> was there before, supports an easier change in the number of bands,
> and is probably faster to boot (no memory lookup).
> 
> However, on other architectures ffs() might not be so pretty,
> hence use a clever arithmetic hack on other archs.
> Unfortunately it only supports 3 bands.
> 
> Signed-off-by: Maciej Żenczykowski <maze@google.com>

It's about the same cost, the non-ffs() version, so I would just
use that for now.  Conditionalized code is such a pain in the butt.


* Re: [PATCH] net: pfifo_fast - use ffs(x)-1 instead of array lookup
  2012-03-13  5:54 ` David Miller
@ 2012-03-13  7:21   ` Maciej Żenczykowski
  2012-03-13  7:23     ` Maciej Żenczykowski
  2012-03-13  7:25     ` David Miller
  0 siblings, 2 replies; 11+ messages in thread
From: Maciej Żenczykowski @ 2012-03-13  7:21 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

If I understand correctly, you're suggesting to go with

return ((26468 >> (i+i)) & 3) - 1;

correct?

2012/3/12 David Miller <davem@davemloft.net>:
> From: Maciej Żenczykowski <zenczykowski@gmail.com>
> Date: Mon, 12 Mar 2012 22:12:21 -0700
>
>> From: Maciej Żenczykowski <maze@google.com>
>>
>> See ffs(x) definition in arch/x86/include/asm/bitops.h
>>
>>   ffs - find first set bit in word
>>
>>   ffs(value) returns 0 if value is 0 or the position of the first
>>   set bit if value is nonzero. The first (least significant) bit
>>   is at position 1.
>>
>> On x86_64 ffs(x) is effectively:
>>   Z := -1
>>   BSFL X, Z
>>   return Z + 1
>>
>> Since we subtract one, we effectively end up with:
>>   Z := -1
>>   BSFL X, Z
>>   return Z
>>
>> This is certainly more readable than the open coded array that
>> was there before, supports an easier change in the number of bands,
>> and is probably faster to boot (no memory lookup).
>>
>> However, on other architectures ffs() might not be so pretty,
>> hence use a clever arithmetic hack on other archs.
>> Unfortunately it only supports 3 bands.
>>
>> Signed-off-by: Maciej Żenczykowski <maze@google.com>
>
> It's about the same cost, the non-ffs() version, so I would just
> use that for now.  Conditionalized code is such a pain in the butt.
>


* Re: [PATCH] net: pfifo_fast - use ffs(x)-1 instead of array lookup
  2012-03-13  7:21   ` Maciej Żenczykowski
@ 2012-03-13  7:23     ` Maciej Żenczykowski
  2012-03-13  7:25       ` David Miller
  2012-03-13  7:25     ` David Miller
  1 sibling, 1 reply; 11+ messages in thread
From: Maciej Żenczykowski @ 2012-03-13  7:23 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

I was actually leaning towards going with ffs()-1, since

it's excellent on x86-64
good on x86_cmov
decent on x86
good on armv5+

and I'm guessing it'll be good or better on any modern platform


* Re: [PATCH] net: pfifo_fast - use ffs(x)-1 instead of array lookup
  2012-03-13  7:21   ` Maciej Żenczykowski
  2012-03-13  7:23     ` Maciej Żenczykowski
@ 2012-03-13  7:25     ` David Miller
  1 sibling, 0 replies; 11+ messages in thread
From: David Miller @ 2012-03-13  7:25 UTC (permalink / raw)
  To: zenczykowski; +Cc: netdev

From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Tue, 13 Mar 2012 00:21:55 -0700

> If I understand correctly, you're suggesting to go with
> 
> return ((26468 >> (i+i)) & 3) - 1;
> 
> correct?

Yes, at least for now.


* Re: [PATCH] net: pfifo_fast - use ffs(x)-1 instead of array lookup
  2012-03-13  7:23     ` Maciej Żenczykowski
@ 2012-03-13  7:25       ` David Miller
  0 siblings, 0 replies; 11+ messages in thread
From: David Miller @ 2012-03-13  7:25 UTC (permalink / raw)
  To: zenczykowski; +Cc: netdev

From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Tue, 13 Mar 2012 00:23:27 -0700

> I was actually leaning towards going with ffs()-1, since
> 
> it's excellent on x86-64
> good on x86_cmov
> decent on x86
> good on armv5+
> 
> and I'm guessing it'll be good or better on any modern platform

Cycle count it with the pure integer operation variant, I bet it's
within a cycle or two, and at that point we're just splitting hairs
for an ugly-as-sin ifdef.
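
A rough way to try this from userspace is sketched below. It is only a
crude proxy for a cycle count: it measures CPU time with clock(), calls
each variant through a function pointer (which adds overhead of its own),
and uses __builtin_ffs() (GCC/Clang) as an assumed stand-in for the
kernel's ffs(). The names band_ffs, band_shift and time_variant are
hypothetical. Inspecting the generated assembly, or using a real cycle
counter, would be the more reliable comparison.

```c
#include <assert.h>
#include <time.h>

/* Assumption: __builtin_ffs() stands in for the kernel ffs(). */
static int band_ffs(int i)
{
	return __builtin_ffs(i) - 1;
}

/* Pure integer variant from the patch. */
static int band_shift(int i)
{
	return ((26468 >> (i + i)) & 3) - 1;
}

/*
 * Call fn on the bitmaps 0..7 in rotation, iters times in total.
 * Stores the elapsed CPU time in *seconds and returns the sum of the
 * results, which doubles as a checksum that both variants must share.
 */
static long time_variant(int (*fn)(int), long iters, double *seconds)
{
	volatile long sum = 0;	/* volatile keeps the loop from being elided */
	clock_t start = clock();
	long n;

	for (n = 0; n < iters; n++)
		sum += fn((int)(n & 7));

	*seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
	return sum;
}
```

Calling time_variant(band_ffs, N, &t) and time_variant(band_shift, N, &t)
with the same large N gives two timings to compare; the returned sums
should be identical if the variants agree.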


* RE: [PATCH] net: pfifo_fast - use ffs(x)-1 instead of array lookup
  2012-03-13  5:12 [PATCH] net: pfifo_fast - use ffs(x)-1 instead of array lookup Maciej Żenczykowski
  2012-03-13  5:13 ` Maciej Żenczykowski
  2012-03-13  5:54 ` David Miller
@ 2012-03-13 11:04 ` David Laight
  2012-03-13 16:55   ` David Laight
  2 siblings, 1 reply; 11+ messages in thread
From: David Laight @ 2012-03-13 11:04 UTC (permalink / raw)
  To: Maciej Zenczykowski, Maciej Zenczykowski; +Cc: netdev

 
> +	/* For i in 0..7 returns {-1, 0, 1, 0, 2, 0, 1, 0}[i] */
> +	return ((26468 >> (i+i)) & 3) - 1;

That expression doesn't seem quite right to me ...
	return (int)((0x12131210 >> (i * 4)) & 3) - 1;
probably does what is wanted.

	David


* RE: [PATCH] net: pfifo_fast - use ffs(x)-1 instead of array lookup
  2012-03-13 11:04 ` David Laight
@ 2012-03-13 16:55   ` David Laight
  2012-03-13 17:34     ` Eric Dumazet
  2012-03-13 20:37     ` Maciej Żenczykowski
  0 siblings, 2 replies; 11+ messages in thread
From: David Laight @ 2012-03-13 16:55 UTC (permalink / raw)
  To: David Laight, Maciej Zenczykowski, Maciej Zenczykowski; +Cc: netdev

 
> > +	/* For i in 0..7 returns {-1, 0, 1, 0, 2, 0, 1, 0}[i] */
> > +	return ((26468 >> (i+i)) & 3) - 1;
> 
> That expression doesn't seem quite right to me ...
> 	return (int)((0x12131210 >> (i * 4)) & 3) - 1;
> probably does what is wanted.

Hmmm... I'm going blind - I read that as 'i+1' not 'i+i'.
But using 'i * 4' and putting the constant in base 16
make the code rather less obscure.

	David
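
To make the comparison concrete, here is a small sketch of the two
encodings side by side: the nibble-per-entry hex constant suggested here
(with the parentheses balanced), and the original 2-bit-per-entry constant
written in hex (26468 is 0x6764). The function names are invented for
illustration, and the unsigned suffixes are an addition to keep the shifts
free of sign concerns.

```c
#include <assert.h>

/* One nibble per bitmap value; each nibble holds band + 1. */
static int band_nibble(int i)
{
	return (int)((0x12131210u >> (i * 4)) & 3) - 1;
}

/* Two bits per bitmap value; 0x6764 is the patch's 26468 in hex. */
static int band_pairs(int i)
{
	return (int)((0x6764u >> (i * 2)) & 3) - 1;
}
```

Both return {-1, 0, 1, 0, 2, 0, 1, 0}[i] for i in 0..7. The nibble form
is easier to read digit by digit but needs a 32-bit constant, whereas the
2-bit form fits in 16 bits.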


* RE: [PATCH] net: pfifo_fast - use ffs(x)-1 instead of array lookup
  2012-03-13 16:55   ` David Laight
@ 2012-03-13 17:34     ` Eric Dumazet
  2012-03-13 20:37     ` Maciej Żenczykowski
  1 sibling, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2012-03-13 17:34 UTC (permalink / raw)
  To: David Laight; +Cc: Maciej Zenczykowski, Maciej Zenczykowski, netdev

On Tue, 2012-03-13 at 16:55 +0000, David Laight wrote:
> > > +	/* For i in 0..7 returns {-1, 0, 1, 0, 2, 0, 1, 0}[i] */
> > > +	return ((26468 >> (i+i)) & 3) - 1;
> > 
> > That expression doesn't seem quite right to me ...
> > 	return (int)((0x12131210 >> (i * 4)) & 3) - 1;
> > probably does what is wanted.
> 
> Hmmm... I'm going blind - I read that as 'i+1' not 'i+i'.
> But using 'i * 4' and putting the constant in base 16
> make the code rather less obscure.

Point was to get short/fast code.

26468 can probably be expressed with a macro so that it is
self-explanatory.

Check the asm output to see if your way is better. It's probably not the
case, because some arches can use smaller code to manipulate small constants.

I am not sure the memory lookup is that expensive anyway
(we could probably use 'char' instead of 'int' to reduce the size of
this memory blob).

Code is also read from memory, so you have to trade icache/dcache
issues.
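
As an illustration of the two ideas above, the sketch below builds the
constant from named fields with a macro and also shows the lookup table
shrunk to 'signed char'. The macro names BAND_FIELD and BITMAP2BAND_MAGIC,
and the function and array names, are hypothetical; nothing like them
appears in the patch. The _Static_assert assumes a C11 compiler.

```c
#include <assert.h>

/* Field for one bitmap value: stores band + 1 in two bits. */
#define BAND_FIELD(bitmap, band) \
	((unsigned int)((band) + 1) << (2 * (bitmap)))

/* First band with an skb queued, for each bitmap value 0..7. */
#define BITMAP2BAND_MAGIC \
	(BAND_FIELD(0, -1) | BAND_FIELD(1, 0) | BAND_FIELD(2, 1) | \
	 BAND_FIELD(3, 0)  | BAND_FIELD(4, 2) | BAND_FIELD(5, 0) | \
	 BAND_FIELD(6, 1)  | BAND_FIELD(7, 0))

/* The macro should reproduce the literal used in the patch. */
_Static_assert(BITMAP2BAND_MAGIC == 26468, "magic constant mismatch");

static int bitmap2band_magic(int i)
{
	return (int)((BITMAP2BAND_MAGIC >> (2 * i)) & 3) - 1;
}

/* Alternative: keep the table but store it in 8 bytes instead of 8 ints. */
static const signed char bitmap2band_small[] = {-1, 0, 1, 0, 2, 0, 1, 0};
```

The macro form costs nothing at run time, since the compiler folds it to
the same constant; the char table trades a data-cache access for slightly
less code.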


* Re: [PATCH] net: pfifo_fast - use ffs(x)-1 instead of array lookup
  2012-03-13 16:55   ` David Laight
  2012-03-13 17:34     ` Eric Dumazet
@ 2012-03-13 20:37     ` Maciej Żenczykowski
  1 sibling, 0 replies; 11+ messages in thread
From: Maciej Żenczykowski @ 2012-03-13 20:37 UTC (permalink / raw)
  To: David Laight; +Cc: netdev

but a 16-bit constant will be better than a 32-bit one on 16-bit platforms

On Tue, Mar 13, 2012 at 9:55 AM, David Laight <David.Laight@aculab.com> wrote:
>
>> > +   /* For i in 0..7 returns {-1, 0, 1, 0, 2, 0, 1, 0}[i] */
>> > +   return ((26468 >> (i+i)) & 3) - 1;
>>
>> That expression doesn't seem quite right to me ...
>>       return (int)((0x12131210 >> (i * 4)) & 3) - 1;
>> probably does what is wanted.
>
> Hmmm... I'm going blind - I read that as 'i+1' not 'i+i'.
> But using 'i * 4' and putting the constant in base 16
> make the code rather less obscure.
>
>        David
>
>



-- 
Maciej A. Żenczykowski
Kernel Networking Developer @ Google
1600 Amphitheatre Parkway, Mountain View, CA 94043
tel: +1 (650) 253-0062
