linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] x86: bpf_jit_comp: simplify trivial boolean return
@ 2014-11-26 18:34 Alexei Starovoitov
  2014-11-26 18:41 ` Joe Perches
  2014-11-26 20:00 ` Joe Perches
  0 siblings, 2 replies; 17+ messages in thread
From: Alexei Starovoitov @ 2014-11-26 18:34 UTC (permalink / raw)
  To: Joe Perches
  Cc: Quentin Lambert, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, netdev, linux-kernel

On Wed, Nov 26, 2014 at 10:02 AM, Joe Perches <joe@perches.com> wrote:
> On Wed, 2014-11-26 at 09:23 -0800, Alexei Starovoitov wrote:
>> On Wed, Nov 26, 2014 at 8:58 AM, Joe Perches <joe@perches.com> wrote:
>
>> > Is there any value in reordering these tests for frequency
>> > or maybe using | instead of || to avoid multiple jumps?
>>
>> probably not. It's not a critical path.
>> compiler may fuse conditions depending on values anyway.
>> If it was a critical path, we could have used
>> (1 << reg) & mask trick.
>> I picked explicit 'return true' else 'return false' here,
>> because it felt easier to read. Just a matter of taste.
>
> There is a size difference though: (allyesconfig)
>
> $ size arch/x86/net/built-in.o*
>    text    data     bss     dec     hex filename
>   12999    1012    4336   18347    47ab arch/x86/net/built-in.o.new
>   13177    1076    4592   18845    499d arch/x86/net/built-in.o.old

interesting. Compiler obviously thinks that 178 byte increase
with -O2 is the right trade off. Which I agree with :)

If I think dropping 'inline' and using -Os will give bigger savings...
but I suspect 'tinification' folks will compile JIT out anyway...

> ---
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 3f62734..09e2cea 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -135,11 +135,11 @@ static const int reg2hex[] = {
>   */
>  static inline bool is_ereg(u32 reg)
>  {
> -       if (reg == BPF_REG_5 || reg == AUX_REG ||
> -           (reg >= BPF_REG_7 && reg <= BPF_REG_9))
> -               return true;
> -       else
> -               return false;
> +       return (1 << reg) & (BIT(BPF_REG_5) |
> +                            BIT(AUX_REG) |
> +                            BIT(BPF_REG_7) |
> +                            BIT(BPF_REG_8) |
> +                            BIT(BPF_REG_9));

thanks for giving it a shot :)
That's exactly what I had in mind.
imo it's less readable, but we probably not going
to mess much with this piece of code anyway.
Though to be safe in the future, we'd need to
add BUILD_BUG_ON that largest value (AUX_REG)
fits in 32bit (or 64bit) and add a comment that
verifier goes before the JIT and checks that
insn->src_reg, insn->dst_reg are less than MAX_BPF_REG,
so argument 'reg' also doesn't trigger too large shift.
Perfectionists r us. :)
... or just leave it as-is ;)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86: bpf_jit_comp: simplify trivial boolean return
  2014-11-26 18:34 [PATCH] x86: bpf_jit_comp: simplify trivial boolean return Alexei Starovoitov
@ 2014-11-26 18:41 ` Joe Perches
  2014-11-26 20:00 ` Joe Perches
  1 sibling, 0 replies; 17+ messages in thread
From: Joe Perches @ 2014-11-26 18:41 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Quentin Lambert, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, netdev, linux-kernel

On Wed, 2014-11-26 at 10:34 -0800, Alexei Starovoitov wrote:
> On Wed, Nov 26, 2014 at 10:02 AM, Joe Perches <joe@perches.com> wrote:
> > On Wed, 2014-11-26 at 09:23 -0800, Alexei Starovoitov wrote:
> >> On Wed, Nov 26, 2014 at 8:58 AM, Joe Perches <joe@perches.com> wrote:
> >
> >> > Is there any value in reordering these tests for frequency
> >> > or maybe using | instead of || to avoid multiple jumps?
> >>
> >> probably not. It's not a critical path.
> >> compiler may fuse conditions depending on values anyway.
> >> If it was a critical path, we could have used
> >> (1 << reg) & mask trick.
> >> I picked explicit 'return true' else 'return false' here,
> >> because it felt easier to read. Just a matter of taste.
> >
> > There is a size difference though: (allyesconfig)
> >
> > $ size arch/x86/net/built-in.o*
> >    text    data     bss     dec     hex filename
> >   12999    1012    4336   18347    47ab arch/x86/net/built-in.o.new
> >   13177    1076    4592   18845    499d arch/x86/net/built-in.o.old
> 
> interesting. Compiler obviously thinks that 178 byte increase
> with -O2 is the right trade off. Which I agree with :)

498 overall.

> If I think dropping 'inline' and using -Os will give bigger savings...
> but I suspect 'tinification' folks will compile JIT out anyway...

Smaller is generally better/faster in any case.

> thanks for giving it a shot :)
> That's exactly what I had in mind.
> imo it's less readable, but we probably not going
> to mess much with this piece of code anyway.
> Though to be safe in the future, we'd need to
> add BUILD_BUG_ON that largest value (AUX_REG)
> fits in 32bit (or 64bit) and add a comment that
> verifier goes before the JIT and checks that
> insn->src_reg, insn->dst_reg are less than MAX_BPF_REG,
> so argument 'reg' also doesn't trigger too large shift.
> Perfectionists r us. :)
> ... or just leave it as-is ;)

16 registers max anyway as it's stored
in a :4

No worries, it was just playtime anyway.

cheers, Joe



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86: bpf_jit_comp: simplify trivial boolean return
  2014-11-26 18:34 [PATCH] x86: bpf_jit_comp: simplify trivial boolean return Alexei Starovoitov
  2014-11-26 18:41 ` Joe Perches
@ 2014-11-26 20:00 ` Joe Perches
  2014-11-27 12:25   ` David Laight
  1 sibling, 1 reply; 17+ messages in thread
From: Joe Perches @ 2014-11-26 20:00 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Quentin Lambert, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, netdev, linux-kernel

On Wed, 2014-11-26 at 10:34 -0800, Alexei Starovoitov wrote:
> On Wed, Nov 26, 2014 at 10:02 AM, Joe Perches <joe@perches.com> wrote:
> > On Wed, 2014-11-26 at 09:23 -0800, Alexei Starovoitov wrote:
> >> On Wed, Nov 26, 2014 at 8:58 AM, Joe Perches <joe@perches.com> wrote:
> >
> >> > Is there any value in reordering these tests for frequency
> >> > or maybe using | instead of || to avoid multiple jumps?
> >>
> >> probably not. It's not a critical path.
> >> compiler may fuse conditions depending on values anyway.
> >> If it was a critical path, we could have used
> >> (1 << reg) & mask trick.
> >> I picked explicit 'return true' else 'return false' here,
> >> because it felt easier to read. Just a matter of taste.
> >
> > There is a size difference though: (allyesconfig)
> >
> > $ size arch/x86/net/built-in.o*
> >    text    data     bss     dec     hex filename
> >   12999    1012    4336   18347    47ab arch/x86/net/built-in.o.new
> >   13177    1076    4592   18845    499d arch/x86/net/built-in.o.old
> 
> interesting. Compiler obviously thinks that 178 byte increase
> with -O2 is the right trade off. Which I agree with :)
> 
> If I think dropping 'inline' and using -Os will give bigger savings...

This was allyesconfig which already uses -Os

Using -O2, there is no difference using inline
or not, but the size delta with the bitmask is
much larger

$ size arch/x86/net/built-in.o* (allyesconfig, but not -Os)
   text	   data	    bss	    dec	    hex	filename
  13410	    820	   3624	  17854	   45be	arch/x86/net/built-in.o.new
  16130	    884	   4200	  21214	   52de	arch/x86/net/built-in.o.old
  16130	    884	   4200	  21214	   52de	arch/x86/net/built-in.o.static



^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH] x86: bpf_jit_comp: simplify trivial boolean return
  2014-11-26 20:00 ` Joe Perches
@ 2014-11-27 12:25   ` David Laight
  2014-11-27 14:36     ` Quentin Lambert
                       ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: David Laight @ 2014-11-27 12:25 UTC (permalink / raw)
  To: 'Joe Perches', Alexei Starovoitov
  Cc: Quentin Lambert, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, netdev, linux-kernel

From: Joe Perches
> On Wed, 2014-11-26 at 10:34 -0800, Alexei Starovoitov wrote:
> > On Wed, Nov 26, 2014 at 10:02 AM, Joe Perches <joe@perches.com> wrote:
> > > On Wed, 2014-11-26 at 09:23 -0800, Alexei Starovoitov wrote:
> > >> On Wed, Nov 26, 2014 at 8:58 AM, Joe Perches <joe@perches.com> wrote:
> > >
> > >> > Is there any value in reordering these tests for frequency
> > >> > or maybe using | instead of || to avoid multiple jumps?
> > >>
> > >> probably not. It's not a critical path.
> > >> compiler may fuse conditions depending on values anyway.
> > >> If it was a critical path, we could have used
> > >> (1 << reg) & mask trick.
> > >> I picked explicit 'return true' else 'return false' here,
> > >> because it felt easier to read. Just a matter of taste.
> > >
> > > There is a size difference though: (allyesconfig)
> > >
> > > $ size arch/x86/net/built-in.o*
> > >    text    data     bss     dec     hex filename
> > >   12999    1012    4336   18347    47ab arch/x86/net/built-in.o.new
> > >   13177    1076    4592   18845    499d arch/x86/net/built-in.o.old
> >
> > interesting. Compiler obviously thinks that 178 byte increase
> > with -O2 is the right trade off. Which I agree with :)
> >
> > If I think dropping 'inline' and using -Os will give bigger savings...
> 
> This was allyesconfig which already uses -Os
> 
> Using -O2, there is no difference using inline
> or not, but the size delta with the bitmask is
> much larger
> 
> $ size arch/x86/net/built-in.o* (allyesconfig, but not -Os)
>    text	   data	    bss	    dec	    hex	filename
>   13410	    820	   3624	  17854	   45be	arch/x86/net/built-in.o.new
>   16130	    884	   4200	  21214	   52de	arch/x86/net/built-in.o.old
>   16130	    884	   4200	  21214	   52de	arch/x86/net/built-in.o.static

That is quite a big % change in the code size.
Why the change in data?

	David




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86: bpf_jit_comp: simplify trivial boolean return
  2014-11-27 12:25   ` David Laight
@ 2014-11-27 14:36     ` Quentin Lambert
  2014-11-27 18:35     ` Joe Perches
  2014-11-27 18:49     ` Joe Perches
  2 siblings, 0 replies; 17+ messages in thread
From: Quentin Lambert @ 2014-11-27 14:36 UTC (permalink / raw)
  To: David Laight, 'Joe Perches', Alexei Starovoitov
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, netdev, linux-kernel

On 27/11/2014 13:25, David Laight wrote:
> From: Joe Perches
>> On Wed, 2014-11-26 at 10:34 -0800, Alexei Starovoitov wrote:
>>> On Wed, Nov 26, 2014 at 10:02 AM, Joe Perches <joe@perches.com> wrote:
>>>> On Wed, 2014-11-26 at 09:23 -0800, Alexei Starovoitov wrote:
>>>>> On Wed, Nov 26, 2014 at 8:58 AM, Joe Perches <joe@perches.com> wrote:
>>>>>> Is there any value in reordering these tests for frequency
>>>>>> or maybe using | instead of || to avoid multiple jumps?
>>>>> probably not. It's not a critical path.
>>>>> compiler may fuse conditions depending on values anyway.
>>>>> If it was a critical path, we could have used
>>>>> (1 << reg) & mask trick.
>>>>> I picked explicit 'return true' else 'return false' here,
>>>>> because it felt easier to read. Just a matter of taste.
>>>> There is a size difference though: (allyesconfig)
>>>>
>>>> $ size arch/x86/net/built-in.o*
>>>>     text    data     bss     dec     hex filename
>>>>    12999    1012    4336   18347    47ab arch/x86/net/built-in.o.new
>>>>    13177    1076    4592   18845    499d arch/x86/net/built-in.o.old
>>> interesting. Compiler obviously thinks that 178 byte increase
>>> with -O2 is the right trade off. Which I agree with :)
>>>
>>> If I think dropping 'inline' and using -Os will give bigger savings...
>> This was allyesconfig which already uses -Os
>>
>> Using -O2, there is no difference using inline
>> or not, but the size delta with the bitmask is
>> much larger
>>
>> $ size arch/x86/net/built-in.o* (allyesconfig, but not -Os)
>>     text	   data	    bss	    dec	    hex	filename
>>    13410	    820	   3624	  17854	   45be	arch/x86/net/built-in.o.new
>>    16130	    884	   4200	  21214	   52de	arch/x86/net/built-in.o.old
>>    16130	    884	   4200	  21214	   52de	arch/x86/net/built-in.o.static
> That is quite a big % change in the code size.
> Why the change in data?
>
> 	David
>
>
>
Do you want me to propose a second version, or should I just
drop it all together ?

I am a new contributor so I have no experience in that sort of thing.

Quentin

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86: bpf_jit_comp: simplify trivial boolean return
  2014-11-27 12:25   ` David Laight
  2014-11-27 14:36     ` Quentin Lambert
@ 2014-11-27 18:35     ` Joe Perches
  2014-11-27 18:49     ` Joe Perches
  2 siblings, 0 replies; 17+ messages in thread
From: Joe Perches @ 2014-11-27 18:35 UTC (permalink / raw)
  To: David Laight
  Cc: Alexei Starovoitov, Quentin Lambert, David S. Miller,
	Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, netdev, linux-kernel

On Thu, 2014-11-27 at 12:25 +0000, David Laight wrote:
> From: Joe Perches
> > On Wed, 2014-11-26 at 10:34 -0800, Alexei Starovoitov wrote:
> > > On Wed, Nov 26, 2014 at 10:02 AM, Joe Perches <joe@perches.com> wrote:
> > > > On Wed, 2014-11-26 at 09:23 -0800, Alexei Starovoitov wrote:
> > > >> On Wed, Nov 26, 2014 at 8:58 AM, Joe Perches <joe@perches.com> wrote:
> > > >
> > > >> > Is there any value in reordering these tests for frequency
> > > >> > or maybe using | instead of || to avoid multiple jumps?
> > > >>
> > > >> probably not. It's not a critical path.
> > > >> compiler may fuse conditions depending on values anyway.
> > > >> If it was a critical path, we could have used
> > > >> (1 << reg) & mask trick.
> > > >> I picked explicit 'return true' else 'return false' here,
> > > >> because it felt easier to read. Just a matter of taste.
> > > >
> > > > There is a size difference though: (allyesconfig)
> > > >
> > > > $ size arch/x86/net/built-in.o*
> > > >    text    data     bss     dec     hex filename
> > > >   12999    1012    4336   18347    47ab arch/x86/net/built-in.o.new
> > > >   13177    1076    4592   18845    499d arch/x86/net/built-in.o.old
> > >
> > > interesting. Compiler obviously thinks that 178 byte increase
> > > with -O2 is the right trade off. Which I agree with :)
> > >
> > > If I think dropping 'inline' and using -Os will give bigger savings...
> > 
> > This was allyesconfig which already uses -Os
> > 
> > Using -O2, there is no difference using inline
> > or not, but the size delta with the bitmask is
> > much larger
> > 
> > $ size arch/x86/net/built-in.o* (allyesconfig, but not -Os)
> >    text	   data	    bss	    dec	    hex	filename
> >   13410	    820	   3624	  17854	   45be	arch/x86/net/built-in.o.new
> >   16130	    884	   4200	  21214	   52de	arch/x86/net/built-in.o.old
> >   16130	    884	   4200	  21214	   52de	arch/x86/net/built-in.o.static
> 
> That is quite a big % change in the code size.
> Why the change in data?

$ objdump -t arch/x86/net/bpf_jit_comp.o.new  > new
$ objdump -t arch/x86/net/bpf_jit_comp.o.old  > old
$ diff -urN old new
--- old	2014-11-27 10:31:36.654373756 -0800
+++ new	2014-11-27 10:31:31.254373453 -0800
@@ -1,5 +1,5 @@
 
-arch/x86/net/bpf_jit_comp.o.old:     file format elf64-x86-64
+arch/x86/net/bpf_jit_comp.o.new:     file format elf64-x86-64
 
 SYMBOL TABLE:
 0000000000000000 l    df *ABS*	0000000000000000 bpf_jit_comp.c
@@ -8,28 +8,26 @@
 0000000000000000 l    d  .bss	0000000000000000 .bss
 0000000000000000 l    d  .text.unlikely	0000000000000000 .text.unlikely
 0000000000000000 l     F .text	000000000000001f jit_fill_hole
-0000000000000098 l     O .bss	0000000000000008 __gcov0.jit_fill_hole
+0000000000000060 l     O .bss	0000000000000008 __gcov0.jit_fill_hole
 0000000000000000 l    d  .rodata.str1.1	0000000000000000 .rodata.str1.1
 0000000000000000 l    d  .rodata.str1.8	0000000000000000 .rodata.str1.8
-0000000000000020 l     F .text	00000000000030a2 do_jit
-00000000000000c0 l     O .bss	0000000000000b68 __gcov0.do_jit
+0000000000000020 l     F .text	000000000000260b do_jit
+0000000000000080 l     O .bss	0000000000000970 __gcov0.do_jit
 00000000000006e0 l     O .rodata	0000000000000034 reg2hex
 0000000000000000 l    d  .rodata	0000000000000000 .rodata
-0000000000000060 l     O .bss	0000000000000038 __gcov0.add_2mod
-0000000000000c28 l     O .bss	0000000000000008 __gcov0.bpf_jit_compile
-0000000000000c40 l     O .bss	00000000000003f0 __gcov0.bpf_int_jit_compile
+00000000000009f0 l     O .bss	0000000000000008 __gcov0.bpf_jit_compile
+0000000000000a00 l     O .bss	00000000000003f0 __gcov0.bpf_int_jit_compile
 0000000000000040 l     O .bss	0000000000000020 __gcov0.bpf_jit_dump
-0000000000001040 l     O .bss	0000000000000028 __gcov0.bpf_jit_free
+0000000000000e00 l     O .bss	0000000000000028 __gcov0.bpf_jit_free
 0000000000000010 l     O .bss	0000000000000018 __gcov0.bpf_prog_unlock_free
 0000000000000000 l    d  .text.startup	0000000000000000 .text.startup
 0000000000000000 l     F .text.startup	0000000000000012 _GLOBAL__sub_I_65535_0_bpf_jit_compile
 0000000000000000 l    d  .init_array	0000000000000000 .init_array
-0000000000000340 l     O .data	0000000000000028 __gcov_.bpf_jit_free
-0000000000000260 l     O .data	0000000000000028 __gcov_.bpf_int_jit_compile
-0000000000000220 l     O .data	0000000000000028 __gcov_.bpf_jit_compile
-00000000000001e0 l     O .data	0000000000000028 __gcov_.do_jit
-00000000000001a0 l     O .data	0000000000000028 __gcov_.jit_fill_hole
-0000000000000160 l     O .data	0000000000000028 __gcov_.add_2mod
+0000000000000300 l     O .data	0000000000000028 __gcov_.bpf_jit_free
+0000000000000220 l     O .data	0000000000000028 __gcov_.bpf_int_jit_compile
+00000000000001e0 l     O .data	0000000000000028 __gcov_.bpf_jit_compile
+00000000000001a0 l     O .data	0000000000000028 __gcov_.do_jit
+0000000000000160 l     O .data	0000000000000028 __gcov_.jit_fill_hole
 0000000000000120 l     O .data	0000000000000028 __gcov_.bpf_jit_dump
 00000000000000e0 l     O .data	0000000000000028 __gcov_.bpf_prog_unlock_free
 00000000000000a0 l     O .data	0000000000000028 __gcov_.__get_order
@@ -43,17 +41,17 @@
 0000000000000000         *UND*	0000000000000000 memset
 0000000000000000         *UND*	0000000000000000 sk_load_half
 0000000000000000         *UND*	0000000000000000 printk
+0000000000000000         *UND*	0000000000000000 sk_load_byte
+0000000000000000         *UND*	0000000000000000 sk_load_word
 0000000000000000         *UND*	0000000000000000 sk_load_half_positive_offset
 0000000000000000         *UND*	0000000000000000 sk_load_half_negative_offset
-0000000000000000         *UND*	0000000000000000 sk_load_word_positive_offset
-0000000000000000         *UND*	0000000000000000 sk_load_byte
 0000000000000000         *UND*	0000000000000000 sk_load_byte_positive_offset
 0000000000000000         *UND*	0000000000000000 sk_load_byte_negative_offset
-0000000000000000         *UND*	0000000000000000 sk_load_word
+0000000000000000         *UND*	0000000000000000 sk_load_word_positive_offset
 0000000000000000         *UND*	0000000000000000 __bpf_call_base
 0000000000000000         *UND*	0000000000000000 sk_load_word_negative_offset
-00000000000030d0 g     F .text	0000000000000013 bpf_jit_compile
-00000000000030f0 g     F .text	0000000000000352 bpf_int_jit_compile
+0000000000002630 g     F .text	0000000000000013 bpf_jit_compile
+0000000000002650 g     F .text	0000000000000352 bpf_int_jit_compile
 0000000000000000 g     O .data..read_mostly	0000000000000004 bpf_jit_enable
 0000000000000000         *UND*	0000000000000000 __kmalloc
 0000000000000000         *UND*	0000000000000000 bpf_jit_binary_alloc
@@ -62,7 +60,7 @@
 0000000000000000         *UND*	0000000000000000 set_memory_ro
 0000000000000000         *UND*	0000000000000000 kfree
 0000000000000000         *UND*	0000000000000000 bpf_jit_binary_free
-0000000000003450 g     F .text	000000000000008c bpf_jit_free
+00000000000029b0 g     F .text	000000000000008c bpf_jit_free
 0000000000000000         *UND*	0000000000000000 set_memory_rw
 0000000000000000         *UND*	0000000000000000 __bpf_prog_free
 0000000000000000         *UND*	0000000000000000 __gcov_init



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86: bpf_jit_comp: simplify trivial boolean return
  2014-11-27 12:25   ` David Laight
  2014-11-27 14:36     ` Quentin Lambert
  2014-11-27 18:35     ` Joe Perches
@ 2014-11-27 18:49     ` Joe Perches
  2014-12-04  9:26       ` Joe Perches
  2 siblings, 1 reply; 17+ messages in thread
From: Joe Perches @ 2014-11-27 18:49 UTC (permalink / raw)
  To: David Laight
  Cc: Alexei Starovoitov, Quentin Lambert, David S. Miller,
	Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, netdev, linux-kernel

On Thu, 2014-11-27 at 12:25 +0000, David Laight wrote:
> Why the change in data?

btw: without gcov and using -O2

$ size arch/x86/net/bpf_jit_comp.o*
   text	   data	    bss	    dec	    hex	filename
   9671	      4	      0	   9675	   25cb	arch/x86/net/bpf_jit_comp.o.new
  10679	      4	      0	  10683	   29bb	arch/x86/net/bpf_jit_comp.o.old



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86: bpf_jit_comp: simplify trivial boolean return
  2014-11-27 18:49     ` Joe Perches
@ 2014-12-04  9:26       ` Joe Perches
  2014-12-04 15:56         ` Alexei Starovoitov
  0 siblings, 1 reply; 17+ messages in thread
From: Joe Perches @ 2014-12-04  9:26 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Laight, Quentin Lambert, David S. Miller, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, netdev,
	linux-kernel

On Thu, 2014-11-27 at 10:49 -0800, Joe Perches wrote:
> On Thu, 2014-11-27 at 12:25 +0000, David Laight wrote:
> > Why the change in data?
> 
> btw: without gcov and using -O2
> 
> $ size arch/x86/net/bpf_jit_comp.o*
>    text	   data	    bss	    dec	    hex	filename
>    9671	      4	      0	   9675	   25cb	arch/x86/net/bpf_jit_comp.o.new
>   10679	      4	      0	  10683	   29bb	arch/x86/net/bpf_jit_comp.o.old

Alexei?

Is this 10% reduction in size a good reason to change the code?


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86: bpf_jit_comp: simplify trivial boolean return
  2014-12-04  9:26       ` Joe Perches
@ 2014-12-04 15:56         ` Alexei Starovoitov
  2014-12-04 18:05           ` Joe Perches
  0 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2014-12-04 15:56 UTC (permalink / raw)
  To: Joe Perches
  Cc: David Laight, Quentin Lambert, David S. Miller, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, netdev,
	linux-kernel

On Thu, Dec 4, 2014 at 1:26 AM, Joe Perches <joe@perches.com> wrote:
> On Thu, 2014-11-27 at 10:49 -0800, Joe Perches wrote:
>> On Thu, 2014-11-27 at 12:25 +0000, David Laight wrote:
>> > Why the change in data?
>>
>> btw: without gcov and using -O2
>>
>> $ size arch/x86/net/bpf_jit_comp.o*
>>    text          data     bss     dec     hex filename
>>    9671             4       0    9675    25cb arch/x86/net/bpf_jit_comp.o.new
>>   10679             4       0   10683    29bb arch/x86/net/bpf_jit_comp.o.old
>
> Alexei?
>
> Is this 10% reduction in size a good reason to change the code?

yes.
I believe you're seeing it with gcc 4.9. I wanted to double
check what 4.6 and 4.7 are doing. If they're not suddenly
increase code size then resubmit it for inclusion please.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86: bpf_jit_comp: simplify trivial boolean return
  2014-12-04 15:56         ` Alexei Starovoitov
@ 2014-12-04 18:05           ` Joe Perches
  0 siblings, 0 replies; 17+ messages in thread
From: Joe Perches @ 2014-12-04 18:05 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Laight, Quentin Lambert, David S. Miller, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, netdev,
	linux-kernel

On Thu, 2014-12-04 at 07:56 -0800, Alexei Starovoitov wrote:
> On Thu, Dec 4, 2014 at 1:26 AM, Joe Perches <joe@perches.com> wrote:
> > On Thu, 2014-11-27 at 10:49 -0800, Joe Perches wrote:
> >> On Thu, 2014-11-27 at 12:25 +0000, David Laight wrote:
> >> > Why the change in data?
> >>
> >> btw: without gcov and using -O2
> >>
> >> $ size arch/x86/net/bpf_jit_comp.o*
> >>    text          data     bss     dec     hex filename
> >>    9671             4       0    9675    25cb arch/x86/net/bpf_jit_comp.o.new
> >>   10679             4       0   10683    29bb arch/x86/net/bpf_jit_comp.o.old
> >
> > Alexei?
> >
> > Is this 10% reduction in size a good reason to change the code?
> 
> yes.
> I believe you're seeing it with gcc 4.9. I wanted to double
> check what 4.6 and 4.7 are doing. If they're not suddenly
> increase code size then resubmit it for inclusion please.

I get these sizes for these compilers
(x86-64, -O2, without profiling)

$ size arch/x86/net/bpf_jit_comp.o*
   text	   data	    bss	    dec	    hex	filename
   9266	      4	      0	   9270	   2436	arch/x86/net/bpf_jit_comp.o.4.4.new
  10042	      4	      0	  10046	   273e	arch/x86/net/bpf_jit_comp.o.4.4.old
   9109	      4	      0	   9113	   2399	arch/x86/net/bpf_jit_comp.o.4.6.new
   9717	      4	      0	   9721	   25f9	arch/x86/net/bpf_jit_comp.o.4.6.old
   8789	      4	      0	   8793	   2259	arch/x86/net/bpf_jit_comp.o.4.7.new
  10245	      4	      0	  10249	   2809	arch/x86/net/bpf_jit_comp.o.4.7.old
   9671	      4	      0	   9675	   25cb	arch/x86/net/bpf_jit_comp.o.4.9.new
  10679	      4	      0	  10683	   29bb	arch/x86/net/bpf_jit_comp.o.4.9.old

I am a bit surprised by the size variations



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86: bpf_jit_comp: simplify trivial boolean return
@ 2014-12-04 22:44 Alexei Starovoitov
  0 siblings, 0 replies; 17+ messages in thread
From: Alexei Starovoitov @ 2014-12-04 22:44 UTC (permalink / raw)
  To: Joe Perches
  Cc: David Laight, Quentin Lambert, David S. Miller, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, netdev,
	linux-kernel

On Thu, Dec 4, 2014 at 10:05 AM, Joe Perches <joe@perches.com> wrote:
> On Thu, 2014-12-04 at 07:56 -0800, Alexei Starovoitov wrote:
>> On Thu, Dec 4, 2014 at 1:26 AM, Joe Perches <joe@perches.com> wrote:
>> > On Thu, 2014-11-27 at 10:49 -0800, Joe Perches wrote:
>> >> On Thu, 2014-11-27 at 12:25 +0000, David Laight wrote:
>> >> > Why the change in data?
>> >>
>> >> btw: without gcov and using -O2
>> >>
>> >> $ size arch/x86/net/bpf_jit_comp.o*
>> >>    text          data     bss     dec     hex filename
>> >>    9671             4       0    9675    25cb arch/x86/net/bpf_jit_comp.o.new
>> >>   10679             4       0   10683    29bb arch/x86/net/bpf_jit_comp.o.old
>> >
>> > Alexei?
>> >
>> > Is this 10% reduction in size a good reason to change the code?
>>
>> yes.
>> I believe you're seeing it with gcc 4.9. I wanted to double
>> check what 4.6 and 4.7 are doing. If they're not suddenly
>> increase code size then resubmit it for inclusion please.
>
> I get these sizes for these compilers
> (x86-64, -O2, without profiling)
>
> $ size arch/x86/net/bpf_jit_comp.o*
>    text    data     bss     dec     hex filename
>    9266       4       0    9270    2436 arch/x86/net/bpf_jit_comp.o.4.4.new
>   10042       4       0   10046    273e arch/x86/net/bpf_jit_comp.o.4.4.old
>    9109       4       0    9113    2399 arch/x86/net/bpf_jit_comp.o.4.6.new
>    9717       4       0    9721    25f9 arch/x86/net/bpf_jit_comp.o.4.6.old
>    8789       4       0    8793    2259 arch/x86/net/bpf_jit_comp.o.4.7.new
>   10245       4       0   10249    2809 arch/x86/net/bpf_jit_comp.o.4.7.old
>    9671       4       0    9675    25cb arch/x86/net/bpf_jit_comp.o.4.9.new
>   10679       4       0   10683    29bb arch/x86/net/bpf_jit_comp.o.4.9.old
>
> I am a bit surprised by the size variations

yeah. the difference is surprising.
Just tried with 4.7 and my regular config and I see the same difference.
Looks like gcc wasn't able to fold conditions into cmov
and used a bunch of cmp/jmp
Since is_ereg() was inlined ~70 times on its own and as
part of other functions, the difference of 3-4 instructions
may a large difference in total size.
test_bpf also passes, so please resubmit properly.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86: bpf_jit_comp: simplify trivial boolean return
  2014-11-26 17:23 Alexei Starovoitov
@ 2014-11-26 18:02 ` Joe Perches
  0 siblings, 0 replies; 17+ messages in thread
From: Joe Perches @ 2014-11-26 18:02 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Quentin Lambert, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, netdev, linux-kernel

On Wed, 2014-11-26 at 09:23 -0800, Alexei Starovoitov wrote:
> On Wed, Nov 26, 2014 at 8:58 AM, Joe Perches <joe@perches.com> wrote:

> > Is there any value in reordering these tests for frequency
> > or maybe using | instead of || to avoid multiple jumps?
> 
> probably not. It's not a critical path.
> compiler may fuse conditions depending on values anyway.
> If it was a critical path, we could have used
> (1 << reg) & mask trick.
> I picked explicit 'return true' else 'return false' here,
> because it felt easier to read. Just a matter of taste.

There is a size difference though: (allyesconfig)

$ size arch/x86/net/built-in.o*
   text	   data	    bss	    dec	    hex	filename
  12999	   1012	   4336	  18347	   47ab	arch/x86/net/built-in.o.new
  13177	   1076	   4592	  18845	   499d	arch/x86/net/built-in.o.old
---
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 3f62734..09e2cea 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -135,11 +135,11 @@ static const int reg2hex[] = {
  */
 static inline bool is_ereg(u32 reg)
 {
-	if (reg == BPF_REG_5 || reg == AUX_REG ||
-	    (reg >= BPF_REG_7 && reg <= BPF_REG_9))
-		return true;
-	else
-		return false;
+	return (1 << reg) & (BIT(BPF_REG_5) |
+			     BIT(AUX_REG) |
+			     BIT(BPF_REG_7) |
+			     BIT(BPF_REG_8) |
+			     BIT(BPF_REG_9));
 }
 
 /* add modifiers if 'reg' maps to x64 registers r8..r15 */



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86: bpf_jit_comp: simplify trivial boolean return
  2014-11-26 16:58 ` Joe Perches
@ 2014-11-26 17:55   ` Daniel Borkmann
  0 siblings, 0 replies; 17+ messages in thread
From: Daniel Borkmann @ 2014-11-26 17:55 UTC (permalink / raw)
  To: Joe Perches
  Cc: Alexei Starovoitov, Quentin Lambert, David S. Miller,
	Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, netdev, linux-kernel

On 11/26/2014 05:58 PM, Joe Perches wrote:
...
>> imo existing code is fine and I don't think the time spent
>> reviewing such changes is worth it when there is no
>> improvement in readability.

+1

> Is there any value in reordering these tests for frequency
> or maybe using | instead of || to avoid multiple jumps?

No, it's not a fast-path.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86: bpf_jit_comp: simplify trivial boolean return
@ 2014-11-26 17:23 Alexei Starovoitov
  2014-11-26 18:02 ` Joe Perches
  0 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2014-11-26 17:23 UTC (permalink / raw)
  To: Joe Perches
  Cc: Quentin Lambert, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, netdev, linux-kernel

On Wed, Nov 26, 2014 at 8:58 AM, Joe Perches <joe@perches.com> wrote:
> On Wed, 2014-11-26 at 08:42 -0800, Alexei Starovoitov wrote:
>> On Wed, Nov 26, 2014 at 1:18 AM, Quentin Lambert
>> <lambert.quentin@gmail.com> wrote:
>> > Remove if then else statements preceding
>> > boolean return.
> []
>> > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> []
>> > @@ -135,11 +135,9 @@ static const int reg2hex[] = {
>> >   */
>> >  static inline bool is_ereg(u32 reg)
>> >  {
>> > -       if (reg == BPF_REG_5 || reg == AUX_REG ||
>> > -           (reg >= BPF_REG_7 && reg <= BPF_REG_9))
>> > -               return true;
>> > -       else
>> > -               return false;
>> > +       return (reg == BPF_REG_5 ||
>> > +               reg == AUX_REG ||
>> > +               (reg >= BPF_REG_7 && reg <= BPF_REG_9));
>>
>> please remove extra () around the whole expression, and
>> align in properly, and
>> don't move reg==AUX_REG check to a different line.
>> Subject is not warranted. I don't think it's a simplification.
>
> It's not really a simplification,
> gcc should emit the same object code.

exactly.

>> imo existing code is fine and I don't think the time spent
>> reviewing such changes is worth it when there is no
>> improvement in readability.
>
> Is there any value in reordering these tests for frequency
> or maybe using | instead of || to avoid multiple jumps?

probably not. It's not a critical path.
compiler may fuse conditions depending on values anyway.
If it was a critical path, we could have used
(1 << reg) & mask trick.
I picked explicit 'return true' else 'return false' here,
because it felt easier to read. Just a matter of taste.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86: bpf_jit_comp: simplify trivial boolean return
  2014-11-26 16:42 Alexei Starovoitov
@ 2014-11-26 16:58 ` Joe Perches
  2014-11-26 17:55   ` Daniel Borkmann
  0 siblings, 1 reply; 17+ messages in thread
From: Joe Perches @ 2014-11-26 16:58 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Quentin Lambert, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, netdev, linux-kernel

On Wed, 2014-11-26 at 08:42 -0800, Alexei Starovoitov wrote:
> On Wed, Nov 26, 2014 at 1:18 AM, Quentin Lambert
> <lambert.quentin@gmail.com> wrote:
> > Remove if then else statements preceding
> > boolean return.
[]
> > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
[]
> > @@ -135,11 +135,9 @@ static const int reg2hex[] = {
> >   */
> >  static inline bool is_ereg(u32 reg)
> >  {
> > -       if (reg == BPF_REG_5 || reg == AUX_REG ||
> > -           (reg >= BPF_REG_7 && reg <= BPF_REG_9))
> > -               return true;
> > -       else
> > -               return false;
> > +       return (reg == BPF_REG_5 ||
> > +               reg == AUX_REG ||
> > +               (reg >= BPF_REG_7 && reg <= BPF_REG_9));
> 
> please remove extra () around the whole expression, and
> align in properly, and
> don't move reg==AUX_REG check to a different line.
> Subject is not warranted. I don't think it's a simplification.

It's not really a simplification,
gcc should emit the same object code.

> imo existing code is fine and I don't think the time spent
> reviewing such changes is worth it when there is no
> improvement in readability.

Is there any value in reordering these tests for frequency
or maybe using | instead of || to avoid multiple jumps?



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86: bpf_jit_comp: simplify trivial boolean return
@ 2014-11-26 16:42 Alexei Starovoitov
  2014-11-26 16:58 ` Joe Perches
  0 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2014-11-26 16:42 UTC (permalink / raw)
  To: Quentin Lambert
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, netdev, linux-kernel

On Wed, Nov 26, 2014 at 1:18 AM, Quentin Lambert
<lambert.quentin@gmail.com> wrote:
> Remove if then else statements preceding
> boolean return. Occurences were found using
> Coccinelle.
>
> The semantic patch used was:
>
> @@
> expression expr;
> @@
>
>
> - if ( expr )
> -       return true;
> - else
> -       return false;
> + return expr;
>
> Signed-off-by: Quentin Lambert <lambert.quentin@gmail.com>
>
> ---
>  arch/x86/net/bpf_jit_comp.c |    8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 3f62734..1542f39 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -135,11 +135,9 @@ static const int reg2hex[] = {
>   */
>  static inline bool is_ereg(u32 reg)
>  {
> -       if (reg == BPF_REG_5 || reg == AUX_REG ||
> -           (reg >= BPF_REG_7 && reg <= BPF_REG_9))
> -               return true;
> -       else
> -               return false;
> +       return (reg == BPF_REG_5 ||
> +               reg == AUX_REG ||
> +               (reg >= BPF_REG_7 && reg <= BPF_REG_9));

please remove extra () around the whole expression, and
align in properly, and
don't move reg==AUX_REG check to a different line.
Subject is not warranted. I don't think it's a simplification.
imo existing code is fine and I don't think the time spent
reviewing such changes is worth it when there is no
improvement in readability.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] x86: bpf_jit_comp: simplify trivial boolean return
@ 2014-11-26  9:18 Quentin Lambert
  0 siblings, 0 replies; 17+ messages in thread
From: Quentin Lambert @ 2014-11-26  9:18 UTC (permalink / raw)
  To: David S. Miller
  Cc: Quentin Lambert, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, netdev, linux-kernel

Remove if then else statements preceding
boolean return. Occurences were found using
Coccinelle.

The semantic patch used was:

@@
expression expr;
@@


- if ( expr )
-	return true;
- else
-	return false;
+ return expr;

Signed-off-by: Quentin Lambert <lambert.quentin@gmail.com>

---
 arch/x86/net/bpf_jit_comp.c |    8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 3f62734..1542f39 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -135,11 +135,9 @@ static const int reg2hex[] = {
  */
 static inline bool is_ereg(u32 reg)
 {
-	if (reg == BPF_REG_5 || reg == AUX_REG ||
-	    (reg >= BPF_REG_7 && reg <= BPF_REG_9))
-		return true;
-	else
-		return false;
+	return (reg == BPF_REG_5 ||
+		reg == AUX_REG ||
+		(reg >= BPF_REG_7 && reg <= BPF_REG_9));
 }
 
 /* add modifiers if 'reg' maps to x64 registers r8..r15 */


^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2014-12-04 22:44 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-26 18:34 [PATCH] x86: bpf_jit_comp: simplify trivial boolean return Alexei Starovoitov
2014-11-26 18:41 ` Joe Perches
2014-11-26 20:00 ` Joe Perches
2014-11-27 12:25   ` David Laight
2014-11-27 14:36     ` Quentin Lambert
2014-11-27 18:35     ` Joe Perches
2014-11-27 18:49     ` Joe Perches
2014-12-04  9:26       ` Joe Perches
2014-12-04 15:56         ` Alexei Starovoitov
2014-12-04 18:05           ` Joe Perches
  -- strict thread matches above, loose matches on Subject: below --
2014-12-04 22:44 Alexei Starovoitov
2014-11-26 17:23 Alexei Starovoitov
2014-11-26 18:02 ` Joe Perches
2014-11-26 16:42 Alexei Starovoitov
2014-11-26 16:58 ` Joe Perches
2014-11-26 17:55   ` Daniel Borkmann
2014-11-26  9:18 Quentin Lambert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).