bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: BPF vs objtool again
       [not found]   ` <20200429215159.eah6ksnxq6g5adpx@treble>
@ 2020-04-29 23:41     ` Alexei Starovoitov
  2020-04-30  0:13       ` Josh Poimboeuf
  0 siblings, 1 reply; 6+ messages in thread
From: Alexei Starovoitov @ 2020-04-29 23:41 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, tglx, linux-kernel, mingo, hpa, ast, peterz, rdunlap,
	Arnd Bergmann, bpf, daniel

On Wed, Apr 29, 2020 at 04:51:59PM -0500, Josh Poimboeuf wrote:
> On Thu, Jul 18, 2019 at 12:14:08PM -0700, tip-bot for Josh Poimboeuf wrote:
> > Commit-ID:  3193c0836f203a91bef96d88c64cccf0be090d9c
> > Gitweb:     https://git.kernel.org/tip/3193c0836f203a91bef96d88c64cccf0be090d9c
> > Author:     Josh Poimboeuf <jpoimboe@redhat.com>
> > AuthorDate: Wed, 17 Jul 2019 20:36:45 -0500
> > Committer:  Thomas Gleixner <tglx@linutronix.de>
> > CommitDate: Thu, 18 Jul 2019 21:01:06 +0200
> > 
> > bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()
> 
> For some reason, this
> 
>   __attribute__((optimize("-fno-gcse")))
> 
> is disabling frame pointers in ___bpf_prog_run().  If you compile with
> CONFIG_FRAME_POINTER it'll show something like:
> 
>   kernel/bpf/core.o: warning: objtool: ___bpf_prog_run.cold()+0x7: call without frame pointer save/setup

you mean it started to disable frame pointers from some version of gcc?
It wasn't doing this before, since objtool wasn't complaining, right?
Sounds like gcc bug?

> Also, since GCC 9.1, the GCC docs say "The optimize attribute should be
> used for debugging purposes only. It is not suitable in production
> code."  That doesn't sound too promising.
> 
> So it seems like this commit should be reverted. But then we're back to
> objtool being broken again in the RETPOLINE=n case, which means no ORC
> coverage in this function.  (See above commit for the details)
> 
> Some ideas:
> 
> - Skip objtool checking of that func/file (at least for RETPOLINE=n) --
>   but then it won't have ORC coverage.
> 
> - Get rid of the "double goto" in ___bpf_prog_run(), which simplifies it
>   enough for objtool to understand -- but then the text explodes for
>   RETPOLINE=y.

How that will look like?
That could be the best option.

> - Add -fno-gfcse to the Makefile for kernel/bpf/core.c -- but then that
>   affects the optimization of other functions in the file.  However I
>   don't think the impact is significant.
> 
> - Move ___bpf_prog_run() to its own file with the -fno-gfcse flag.  I'm
>   thinking this could be the least bad option.  Alexei?

I think it would be easier to move some of the hot path
functions out of core.c instead.
Like *ksym*, BPF_CALL*, bpf_jit*, bpf_prog*.
I think resulting churn will be less.
imo it's more important to keep git blame history for interpreter
than for the other funcs.
Sounds like it's a fix that needs to be sent for the next RC ?
Please send a patch for bpf tree then.

Daniel, thoughts?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: BPF vs objtool again
  2020-04-29 23:41     ` BPF vs objtool again Alexei Starovoitov
@ 2020-04-30  0:13       ` Josh Poimboeuf
  2020-04-30  2:10         ` Alexei Starovoitov
  0 siblings, 1 reply; 6+ messages in thread
From: Josh Poimboeuf @ 2020-04-30  0:13 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: x86, tglx, linux-kernel, mingo, hpa, ast, peterz, rdunlap,
	Arnd Bergmann, bpf, daniel

On Wed, Apr 29, 2020 at 04:41:59PM -0700, Alexei Starovoitov wrote:
> On Wed, Apr 29, 2020 at 04:51:59PM -0500, Josh Poimboeuf wrote:
> > On Thu, Jul 18, 2019 at 12:14:08PM -0700, tip-bot for Josh Poimboeuf wrote:
> > > Commit-ID:  3193c0836f203a91bef96d88c64cccf0be090d9c
> > > Gitweb:     https://git.kernel.org/tip/3193c0836f203a91bef96d88c64cccf0be090d9c
> > > Author:     Josh Poimboeuf <jpoimboe@redhat.com>
> > > AuthorDate: Wed, 17 Jul 2019 20:36:45 -0500
> > > Committer:  Thomas Gleixner <tglx@linutronix.de>
> > > CommitDate: Thu, 18 Jul 2019 21:01:06 +0200
> > > 
> > > bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()
> > 
> > For some reason, this
> > 
> >   __attribute__((optimize("-fno-gcse")))
> > 
> > is disabling frame pointers in ___bpf_prog_run().  If you compile with
> > CONFIG_FRAME_POINTER it'll show something like:
> > 
> >   kernel/bpf/core.o: warning: objtool: ___bpf_prog_run.cold()+0x7: call without frame pointer save/setup
> 
> you mean it started to disable frame pointers from some version of gcc?
> It wasn't doing this before, since objtool wasn't complaining, right?
> Sounds like gcc bug?

I actually think this warning has been around for a while.  I just only
recently looked at it.  I don't think anything changed in GCC, it's just
that almost nobody uses CONFIG_FRAME_POINTER, so it wasn't really
noticed.

> > Also, since GCC 9.1, the GCC docs say "The optimize attribute should be
> > used for debugging purposes only. It is not suitable in production
> > code."  That doesn't sound too promising.
> > 
> > So it seems like this commit should be reverted. But then we're back to
> > objtool being broken again in the RETPOLINE=n case, which means no ORC
> > coverage in this function.  (See above commit for the details)
> > 
> > Some ideas:
> > 
> > - Skip objtool checking of that func/file (at least for RETPOLINE=n) --
> >   but then it won't have ORC coverage.
> > 
> > - Get rid of the "double goto" in ___bpf_prog_run(), which simplifies it
> >   enough for objtool to understand -- but then the text explodes for
> >   RETPOLINE=y.
> 
> How that will look like?
> That could be the best option.

For example:

#define GOTO    ({ goto *jumptable[insn->code]; })

and then replace all 'goto select_insn' with 'GOTO;'

The problem is that with RETPOLINE=y, the function text size grows from
5k to 7k, because for each of the 160+ retpoline JMPs, GCC (stupidly)
reloads the jump table register into a scratch register.

> > - Add -fno-gfcse to the Makefile for kernel/bpf/core.c -- but then that
> >   affects the optimization of other functions in the file.  However I
> >   don't think the impact is significant.
> > 
> > - Move ___bpf_prog_run() to its own file with the -fno-gfcse flag.  I'm
> >   thinking this could be the least bad option.  Alexei?
> 
> I think it would be easier to move some of the hot path
> functions out of core.c instead.
> Like *ksym*, BPF_CALL*, bpf_jit*, bpf_prog*.
> I think resulting churn will be less.
> imo it's more important to keep git blame history for interpreter
> than for the other funcs.
> Sounds like it's a fix that needs to be sent for the next RC ?
> Please send a patch for bpf tree then.

I can make a patch, what file would you recommend moving those hot path
functions to?

-- 
Josh


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: BPF vs objtool again
  2020-04-30  0:13       ` Josh Poimboeuf
@ 2020-04-30  2:10         ` Alexei Starovoitov
  2020-04-30  3:53           ` Josh Poimboeuf
  0 siblings, 1 reply; 6+ messages in thread
From: Alexei Starovoitov @ 2020-04-30  2:10 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, tglx, linux-kernel, mingo, hpa, ast, peterz, rdunlap,
	Arnd Bergmann, bpf, daniel

On Wed, Apr 29, 2020 at 07:13:00PM -0500, Josh Poimboeuf wrote:
> On Wed, Apr 29, 2020 at 04:41:59PM -0700, Alexei Starovoitov wrote:
> > On Wed, Apr 29, 2020 at 04:51:59PM -0500, Josh Poimboeuf wrote:
> > > On Thu, Jul 18, 2019 at 12:14:08PM -0700, tip-bot for Josh Poimboeuf wrote:
> > > > Commit-ID:  3193c0836f203a91bef96d88c64cccf0be090d9c
> > > > Gitweb:     https://git.kernel.org/tip/3193c0836f203a91bef96d88c64cccf0be090d9c
> > > > Author:     Josh Poimboeuf <jpoimboe@redhat.com>
> > > > AuthorDate: Wed, 17 Jul 2019 20:36:45 -0500
> > > > Committer:  Thomas Gleixner <tglx@linutronix.de>
> > > > CommitDate: Thu, 18 Jul 2019 21:01:06 +0200
> > > > 
> > > > bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()
> > > 
> > > For some reason, this
> > > 
> > >   __attribute__((optimize("-fno-gcse")))
> > > 
> > > is disabling frame pointers in ___bpf_prog_run().  If you compile with
> > > CONFIG_FRAME_POINTER it'll show something like:
> > > 
> > >   kernel/bpf/core.o: warning: objtool: ___bpf_prog_run.cold()+0x7: call without frame pointer save/setup
> > 
> > you mean it started to disable frame pointers from some version of gcc?
> > It wasn't doing this before, since objtool wasn't complaining, right?
> > Sounds like gcc bug?
> 
> I actually think this warning has been around for a while.  I just only
> recently looked at it.  I don't think anything changed in GCC, it's just
> that almost nobody uses CONFIG_FRAME_POINTER, so it wasn't really
> noticed.
> 
> > > Also, since GCC 9.1, the GCC docs say "The optimize attribute should be
> > > used for debugging purposes only. It is not suitable in production
> > > code."  That doesn't sound too promising.
> > > 
> > > So it seems like this commit should be reverted. But then we're back to
> > > objtool being broken again in the RETPOLINE=n case, which means no ORC
> > > coverage in this function.  (See above commit for the details)
> > > 
> > > Some ideas:
> > > 
> > > - Skip objtool checking of that func/file (at least for RETPOLINE=n) --
> > >   but then it won't have ORC coverage.
> > > 
> > > - Get rid of the "double goto" in ___bpf_prog_run(), which simplifies it
> > >   enough for objtool to understand -- but then the text explodes for
> > >   RETPOLINE=y.
> > 
> > How that will look like?
> > That could be the best option.
> 
> For example:
> 
> #define GOTO    ({ goto *jumptable[insn->code]; })
> 
> and then replace all 'goto select_insn' with 'GOTO;'
> 
> The problem is that with RETPOLINE=y, the function text size grows from
> 5k to 7k, because for each of the 160+ retpoline JMPs, GCC (stupidly)
> reloads the jump table register into a scratch register.

that would be a tiny change, right?
I'd rather go with that and gate it with 'ifdef CONFIG_FRAME_POINTER'
Like:
#ifndef CONFIG_FRAME_POINTER
#define CONT     ({ insn++; goto select_insn; })
#define CONT_JMP ({ insn++; goto select_insn; })
#else
#define CONT     ({ insn++; goto *jumptable[insn->code]; })
#define CONT_JMP ({ insn++; goto *jumptable[insn->code]; })
#endif

The reason this CONT and CONT_JMP macros are there because a combination
of different gcc versions together with different cpus make branch predictor
behave 'unpredictably'.

I've played with CONT and CONT_JMP either both doing direct goto or
indirect goto and observed quite different performance characteristics
from the interpreter.
What you see right now was the best tune I could get from a set of cpus
I had to play with and compilers. If I did the same tuning today the outcome
could have been different.
So I think it's totally fine to use above code. I think some cpus may actually
see performance gains with certain versions of gcc.
The retpoline text increase is unfortunate but when retpoline is on
other security knobs should be on too. In particular CONFIG_BPF_JIT_ALWAYS_ON
should be on as well. Which will remove interpreter from .text completely.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: BPF vs objtool again
  2020-04-30  2:10         ` Alexei Starovoitov
@ 2020-04-30  3:53           ` Josh Poimboeuf
  2020-04-30  4:24             ` Alexei Starovoitov
  0 siblings, 1 reply; 6+ messages in thread
From: Josh Poimboeuf @ 2020-04-30  3:53 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: x86, tglx, linux-kernel, mingo, hpa, ast, peterz, rdunlap,
	Arnd Bergmann, bpf, daniel

On Wed, Apr 29, 2020 at 07:10:52PM -0700, Alexei Starovoitov wrote:
> > For example:
> > 
> > #define GOTO    ({ goto *jumptable[insn->code]; })
> > 
> > and then replace all 'goto select_insn' with 'GOTO;'
> > 
> > The problem is that with RETPOLINE=y, the function text size grows from
> > 5k to 7k, because for each of the 160+ retpoline JMPs, GCC (stupidly)
> > reloads the jump table register into a scratch register.
> 
> that would be a tiny change, right?
> I'd rather go with that and gate it with 'ifdef CONFIG_FRAME_POINTER'
> Like:
> #ifndef CONFIG_FRAME_POINTER
> #define CONT     ({ insn++; goto select_insn; })
> #define CONT_JMP ({ insn++; goto select_insn; })
> #else
> #define CONT     ({ insn++; goto *jumptable[insn->code]; })
> #define CONT_JMP ({ insn++; goto *jumptable[insn->code]; })
> #endif
> 
> The reason this CONT and CONT_JMP macros are there because a combination
> of different gcc versions together with different cpus make branch predictor
> behave 'unpredictably'.
> 
> I've played with CONT and CONT_JMP either both doing direct goto or
> indirect goto and observed quite different performance characteristics
> from the interpreter.
> What you see right now was the best tune I could get from a set of cpus
> I had to play with and compilers. If I did the same tuning today the outcome
> could have been different.
> So I think it's totally fine to use above code. I think some cpus may actually
> see performance gains with certain versions of gcc.
> The retpoline text increase is unfortunate but when retpoline is on
> other security knobs should be on too. In particular CONFIG_BPF_JIT_ALWAYS_ON
> should be on as well. Which will remove interpreter from .text completely.

This would actually be contingent on RETPOLINE, not FRAME_POINTER.

(FRAME_POINTER was the other issue with the "optimize" attribute, which
we're reverting so it'll no longer be a problem.)

So if you're not concerned about the retpoline text growth, it could be
as simple as:

#define CONT     ({ insn++; goto *jumptable[insn->code]; })
#define CONT_JMP ({ insn++; goto *jumptable[insn->code]; })


Or, if you wanted to avoid the text growth, it could be:

#ifdef CONFIG_RETPOLINE
/*
 * Avoid a 40% increase in function text size by getting GCC to generate a
 * single retpoline jump instead of 160+.
 */
#define CONT	 ({ insn++; goto select_insn; })
#define CONT_JMP ({ insn++; goto select_insn; })

select_insn:
	goto *jumptable[insn->code];

#else /* !CONFIG_RETPOLINE */
#define CONT	 ({ insn++; goto *jumptable[insn->code]; })
#define CONT_JMP ({ insn++; goto *jumptable[insn->code]; })
#endif /* CONFIG_RETPOLINE */


But since this is legacy path, I think the first one is much nicer.


Also, JMP_TAIL_CALL has a "goto select_insn", is it ok to convert that
to CONT?

-- 
Josh


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: BPF vs objtool again
  2020-04-30  3:53           ` Josh Poimboeuf
@ 2020-04-30  4:24             ` Alexei Starovoitov
  2020-04-30  4:43               ` Josh Poimboeuf
  0 siblings, 1 reply; 6+ messages in thread
From: Alexei Starovoitov @ 2020-04-30  4:24 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, tglx, linux-kernel, mingo, hpa, ast, peterz, rdunlap,
	Arnd Bergmann, bpf, daniel

On Wed, Apr 29, 2020 at 10:53:15PM -0500, Josh Poimboeuf wrote:
> On Wed, Apr 29, 2020 at 07:10:52PM -0700, Alexei Starovoitov wrote:
> > > For example:
> > > 
> > > #define GOTO    ({ goto *jumptable[insn->code]; })
> > > 
> > > and then replace all 'goto select_insn' with 'GOTO;'
> > > 
> > > The problem is that with RETPOLINE=y, the function text size grows from
> > > 5k to 7k, because for each of the 160+ retpoline JMPs, GCC (stupidly)
> > > reloads the jump table register into a scratch register.
> > 
> > that would be a tiny change, right?
> > I'd rather go with that and gate it with 'ifdef CONFIG_FRAME_POINTER'
> > Like:
> > #ifndef CONFIG_FRAME_POINTER
> > #define CONT     ({ insn++; goto select_insn; })
> > #define CONT_JMP ({ insn++; goto select_insn; })
> > #else
> > #define CONT     ({ insn++; goto *jumptable[insn->code]; })
> > #define CONT_JMP ({ insn++; goto *jumptable[insn->code]; })
> > #endif
> > 
> > The reason this CONT and CONT_JMP macros are there because a combination
> > of different gcc versions together with different cpus make branch predictor
> > behave 'unpredictably'.
> > 
> > I've played with CONT and CONT_JMP either both doing direct goto or
> > indirect goto and observed quite different performance characteristics
> > from the interpreter.
> > What you see right now was the best tune I could get from a set of cpus
> > I had to play with and compilers. If I did the same tuning today the outcome
> > could have been different.
> > So I think it's totally fine to use above code. I think some cpus may actually
> > see performance gains with certain versions of gcc.
> > The retpoline text increase is unfortunate but when retpoline is on
> > other security knobs should be on too. In particular CONFIG_BPF_JIT_ALWAYS_ON
> > should be on as well. Which will remove interpreter from .text completely.
> 
> This would actually be contingent on RETPOLINE, not FRAME_POINTER.
> 
> (FRAME_POINTER was the other issue with the "optimize" attribute, which
> we're reverting so it'll no longer be a problem.)
> 
> So if you're not concerned about the retpoline text growth, it could be
> as simple as:
> 
> #define CONT     ({ insn++; goto *jumptable[insn->code]; })
> #define CONT_JMP ({ insn++; goto *jumptable[insn->code]; })
> 
> 
> Or, if you wanted to avoid the text growth, it could be:
> 
> #ifdef CONFIG_RETPOLINE

I'm a bit lost. So objtool is fine with the asm when retpoline is on?
Then pls do:
#if defined(CONFIG_RETPOLINE) || !defined(CONFIG_X86)

since there is no need to mess with other archs.

> /*
>  * Avoid a 40% increase in function text size by getting GCC to generate a
>  * single retpoline jump instead of 160+.
>  */
> #define CONT	 ({ insn++; goto select_insn; })
> #define CONT_JMP ({ insn++; goto select_insn; })
> 
> select_insn:
> 	goto *jumptable[insn->code];
> 
> #else /* !CONFIG_RETPOLINE */
> #define CONT	 ({ insn++; goto *jumptable[insn->code]; })
> #define CONT_JMP ({ insn++; goto *jumptable[insn->code]; })
> #endif /* CONFIG_RETPOLINE */
> 
> 
> But since this is legacy path, I think the first one is much nicer.
> 
> 
> Also, JMP_TAIL_CALL has a "goto select_insn", is it ok to convert that
> to CONT?

yep

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: BPF vs objtool again
  2020-04-30  4:24             ` Alexei Starovoitov
@ 2020-04-30  4:43               ` Josh Poimboeuf
  0 siblings, 0 replies; 6+ messages in thread
From: Josh Poimboeuf @ 2020-04-30  4:43 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: x86, tglx, linux-kernel, mingo, hpa, ast, peterz, rdunlap,
	Arnd Bergmann, bpf, daniel

On Wed, Apr 29, 2020 at 09:24:00PM -0700, Alexei Starovoitov wrote:
> > This would actually be contingent on RETPOLINE, not FRAME_POINTER.
> > 
> > (FRAME_POINTER was the other issue with the "optimize" attribute, which
> > we're reverting so it'll no longer be a problem.)
> > 
> > So if you're not concerned about the retpoline text growth, it could be
> > as simple as:
> > 
> > #define CONT     ({ insn++; goto *jumptable[insn->code]; })
> > #define CONT_JMP ({ insn++; goto *jumptable[insn->code]; })
> > 
> > 
> > Or, if you wanted to avoid the text growth, it could be:
> > 
> > #ifdef CONFIG_RETPOLINE
> 
> I'm a bit lost. So objtool is fine with the asm when retpoline is on?

Yeah, it's confusing... this has been quite an adventure with GCC.

Objtool is fine with the RETPOLINE double goto.  It's only the
!RETPOLINE double goto which is the problem, because that triggers
more GCC weirdness (see 3193c0836f20).

> Then pls do:
> #if defined(CONFIG_RETPOLINE) || !defined(CONFIG_X86)
> 
> since there is no need to mess with other archs.

Getting rid of select_insn altogether would make the code a lot simpler,
but it's your call.  I'll make a patch soon.

-- 
Josh


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-04-30  4:43 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <30c3ca29ba037afcbd860a8672eef0021addf9fe.1563413318.git.jpoimboe@redhat.com>
     [not found] ` <tip-3193c0836f203a91bef96d88c64cccf0be090d9c@git.kernel.org>
     [not found]   ` <20200429215159.eah6ksnxq6g5adpx@treble>
2020-04-29 23:41     ` BPF vs objtool again Alexei Starovoitov
2020-04-30  0:13       ` Josh Poimboeuf
2020-04-30  2:10         ` Alexei Starovoitov
2020-04-30  3:53           ` Josh Poimboeuf
2020-04-30  4:24             ` Alexei Starovoitov
2020-04-30  4:43               ` Josh Poimboeuf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).