All of lore.kernel.org
 help / color / mirror / Atom feed
* objtool - what if I want to clobber rbp?
@ 2017-11-21 21:55 Jason A. Donenfeld
  2017-11-22  3:31 ` Josh Poimboeuf
  0 siblings, 1 reply; 3+ messages in thread
From: Jason A. Donenfeld @ 2017-11-21 21:55 UTC (permalink / raw)
  To: Josh Poimboeuf, LKML; +Cc: Samuel Neves

Hi Josh,

We're working on some highly optimized assembly crypto primitive
implementations for WireGuard. The last 24 hours have been spent
trying to make objtool happy with a variety of tricks, some more
unfortunate than others. There's still one issue remaining, however,
and I just can't figure out how to make it go away:

poly1305-x86_64.o: warning: objtool: poly1305_blocks_avx uses BP as a
scratch register
poly1305-x86_64.o: warning: objtool: poly1305_blocks_avx2 uses BP as a
scratch register
poly1305-x86_64.o: warning: objtool: poly1305_blocks_avx512 uses BP as
a scratch register

The messages are right. We're using %rbp as a general purpose
register, writing into it all sorts of crypto gibberish certainly not
suitable for walking stack frames. It's hard to find a way of not
using it, without incurring a speed penalty. We really do just need
all the registers.

Of course the "problem" goes away with the new slick ORC unwinding,
which is great. But for frame pointer unwinding, this problem remains.

I'm wondering if you can think of any clever way of marking a function
or the like that will make this issue go away, somehow. Is there any
path forward without sacrificing %rbp and hence performance to a
useless frame pointer?

Thanks,
Jason

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: objtool - what if I want to clobber rbp?
  2017-11-21 21:55 objtool - what if I want to clobber rbp? Jason A. Donenfeld
@ 2017-11-22  3:31 ` Josh Poimboeuf
  2017-11-22 10:23   ` Jason A. Donenfeld
  0 siblings, 1 reply; 3+ messages in thread
From: Josh Poimboeuf @ 2017-11-22  3:31 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: LKML, Samuel Neves

On Tue, Nov 21, 2017 at 10:55:23PM +0100, Jason A. Donenfeld wrote:
> Hi Josh,
> 
> We're working on some highly optimized assembly crypto primitive
> implementations for WireGuard. The last 24 hours have been spent
> trying to make objtool happy with a variety of tricks, some more
> unfortunate than others. There's still one issue remaining, however,
> and I just can't figure out how to make it go away:
> 
> poly1305-x86_64.o: warning: objtool: poly1305_blocks_avx uses BP as a
> scratch register
> poly1305-x86_64.o: warning: objtool: poly1305_blocks_avx2 uses BP as a
> scratch register
> poly1305-x86_64.o: warning: objtool: poly1305_blocks_avx512 uses BP as
> a scratch register
> 
> The messages are right. We're using %rbp as a general purpose
> register, writing into it all sorts of crypto gibberish certainly not
> suitable for walking stack frames. It's hard to find a way of not
> using it, without incurring a speed penalty. We really do just need
> all the registers.
> 
> Of course the "problem" goes away with the new slick ORC unwinding,
> which is great. But for frame pointer unwinding, this problem remains.
> 
> I'm wondering if you can think of any clever way of marking a function
> or the like that will make this issue go away, somehow. Is there any
> path forward without sacrificing %rbp and hence performance to a
> useless frame pointer?

Hi Jason,

Unfortunately I don't have an easy answer.

The problem is that using %rbp as a scratch register isn't compatible
with CONFIG_FRAME_POINTER.  GCC doesn't do it, even on leaf functions,
and we also enforce that rule on asm code.  If your function gets
interrupted, and the interrupt handler needs to dump the stack, the
unwinder will get confused by the %rbp value and the rest of the unwind
will fail.

So, a few ideas:

- Make your feature conflict with CONFIG_FRAME_POINTER on x86_64.  The
  ORC unwinder is now the default anyway for 4.15, and we renamed the
  configs, so most people will be actively switching to ORC.

- Add some ifdefs so your code only uses %rbp as a scratch register when
  CONFIG_FRAME_POINTER is disabled.

- If one of the registers is used less often than the others, you could
  spill it to the stack.  I know you said you need all the registers,
  but I'd be willing to review the code for ideas, if that would help.
  Sometimes it helps to get fresh eyes on the problem.  We were able to
  fix this problem with all the existing crypto code without affecting
  performance measurably.  We had to get creative with a few of those.

BTW, since CONFIG_FRAME_POINTER is no longer the default and is becoming
deprecated, there has been some talk of disabling objtool with
CONFIG_FRAME_POINTER.  That would make your life easier.  However I
think we're not quite ready for that, so it might be a few more release
cycles before that happens.

-- 
Josh

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: objtool - what if I want to clobber rbp?
  2017-11-22  3:31 ` Josh Poimboeuf
@ 2017-11-22 10:23   ` Jason A. Donenfeld
  0 siblings, 0 replies; 3+ messages in thread
From: Jason A. Donenfeld @ 2017-11-22 10:23 UTC (permalink / raw)
  To: Josh Poimboeuf; +Cc: LKML, Samuel Neves

Hi Josh,

On Wed, Nov 22, 2017 at 4:31 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> - Make your feature conflict with CONFIG_FRAME_POINTER on x86_64.  The
>   ORC unwinder is now the default anyway for 4.15, and we renamed the
>   configs, so most people will be actively switching to ORC.
> BTW, since CONFIG_FRAME_POINTER is no longer the default and is becoming
> deprecated, there has been some talk of disabling objtool with
> CONFIG_FRAME_POINTER.  That would make your life easier.  However I
> think we're not quite ready for that, so it might be a few more release
> cycles before that happens.

Fortunately my code won't be ready to be merged for a few release
cycles, so thanks for letting me know about future plans. I think my
patch set and these changes will synchronize nicely.

>
> - Add some ifdefs so your code only uses %rbp as a scratch register when
>   CONFIG_FRAME_POINTER is disabled.
> - If one of the registers is used less often than the others, you could
>   spill it to the stack.  I know you said you need all the registers,
>   but I'd be willing to review the code for ideas, if that would help.
>   Sometimes it helps to get fresh eyes on the problem.  We were able to
>   fix this problem with all the existing crypto code without affecting
>   performance measurably.  We had to get creative with a few of those.

That actually could be interesting, if you're curious about looking at it.

Jason

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-11-22 10:23 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-21 21:55 objtool - what if I want to clobber rbp? Jason A. Donenfeld
2017-11-22  3:31 ` Josh Poimboeuf
2017-11-22 10:23   ` Jason A. Donenfeld

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.