linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] Handle clock_gettime(CLOCK_TAI) in VDSO
       [not found] <20180814011732.075EA6E0251@virtux32.softrans.com.au>
@ 2018-08-14 14:20 ` Andy Lutomirski
  2018-08-14 16:30   ` David Woodhouse
  0 siblings, 1 reply; 3+ messages in thread
From: Andy Lutomirski @ 2018-08-14 14:20 UTC (permalink / raw)
  To: Matt Rickard, LKML, Linus Torvalds, Dave Hansen, David Woodhouse,
	X86 ML, Kees Cook

[Added a whole bunch of ccs]

On Mon, Aug 13, 2018 at 6:17 PM, Matt Rickard <matt@softrans.com.au> wrote:
> Process clock_gettime(CLOCK_TAI) in VDSO. This makes the call about as fast as
> CLOCK_REALTIME instead of taking about four times as long.
>
> Signed-off-by: Matt Rickard <matt@softrans.com.au>
> ---
>  arch/x86/entry/vdso/vclock_gettime.c    | 30 ++++++++++++++++++++++++++++++
>  arch/x86/entry/vsyscall/vsyscall_gtod.c |  2 ++
>  arch/x86/include/asm/vgtod.h            |  1 +
>  3 files changed, 33 insertions(+)
>
> diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
> index f19856d95c60..bc8d8f086721 100644
> --- a/arch/x86/entry/vdso/vclock_gettime.c
> +++ b/arch/x86/entry/vdso/vclock_gettime.c

...

>  notrace static void do_realtime_coarse(struct timespec *ts)
>  {
>         unsigned long seq;
> @@ -284,8 +305,17 @@ notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
>                 do_monotonic_coarse(ts);
>                 break;
>         default:
> +       /* Doubled switch statement to work around kernel Makefile error */
> +       /* See: https://www.mail-archive.com/gcc-bugs@gcc.gnu.org/msg567499.html */

NAK.

The issue here (after reading that thread) is that, with our current
compile options, gcc generates a jump table once the switch statement
hits five entries.  And it uses retpolines for it, and somehow it
generates the relocations in such a way that the vDSO build fails.  We
need to address this so that the vDSO build is reliable, but there's
an important question here:

Should the vDSO be built with retpolines, or should it be built with
indirect branches?  Or should we go out of our way to make sure that
the vDSO contains neither retpolines nor indirect branches?

We could accomplish the latter (sort of) by manually converting the
switch into the appropriate if statements, but that's rather ugly.

(Hmm.  We should add exports to directly read each clock source.
They'll be noticeably faster, especially when
cache-and-predictor-code.)

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] Handle clock_gettime(CLOCK_TAI) in VDSO
  2018-08-14 14:20 ` [PATCH] Handle clock_gettime(CLOCK_TAI) in VDSO Andy Lutomirski
@ 2018-08-14 16:30   ` David Woodhouse
  2018-08-14 21:20     ` Andy Lutomirski
  0 siblings, 1 reply; 3+ messages in thread
From: David Woodhouse @ 2018-08-14 16:30 UTC (permalink / raw)
  To: Andy Lutomirski, Matt Rickard, LKML, Linus Torvalds, Dave Hansen,
	X86 ML, Kees Cook

[-- Attachment #1: Type: text/plain, Size: 1480 bytes --]

On Tue, 2018-08-14 at 07:20 -0700, Andy Lutomirski wrote:
> > +       /* Doubled switch statement to work around kernel Makefile error */
> > +       /* See: https://www.mail-archive.com/gcc-bugs@gcc.gnu.org/msg567499.html */
> 
> NAK.
> 
> The issue here (after reading that thread) is that, with our current
> compile options, gcc generates a jump table once the switch statement
> hits five entries.  And it uses retpolines for it, and somehow it
> generates the relocations in such a way that the vDSO build fails. 
> We
> need to address this so that the vDSO build is reliable, but there's
> an important question here:
> 
> Should the vDSO be built with retpolines, or should it be built with
> indirect branches?  Or should we go out of our way to make sure that
> the vDSO contains neither retpolines nor indirect branches?
> 
> We could accomplish the latter (sort of) by manually converting the
> switch into the appropriate if statements, but that's rather ugly.
> 
> (Hmm.  We should add exports to directly read each clock source.
> They'll be noticeably faster, especially when
> cache-and-predictor-code.)

Surely it's kind of expected that the vDSO can't find an externally
provided __x86_indirect_thunk_rax symbol, since we only provide one as
part of the kernel image.

Building the vDSO with -mindirect-branch=thunk(|-inline) should fix
that, if we want retpolines in the vDSO.

There's also -fno-jump-tables.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] Handle clock_gettime(CLOCK_TAI) in VDSO
  2018-08-14 16:30   ` David Woodhouse
@ 2018-08-14 21:20     ` Andy Lutomirski
  0 siblings, 0 replies; 3+ messages in thread
From: Andy Lutomirski @ 2018-08-14 21:20 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Andy Lutomirski, Matt Rickard, LKML, Linus Torvalds, Dave Hansen,
	X86 ML, Kees Cook

On Tue, Aug 14, 2018 at 9:30 AM, David Woodhouse <dwmw2@infradead.org> wrote:
> On Tue, 2018-08-14 at 07:20 -0700, Andy Lutomirski wrote:
>> > +       /* Doubled switch statement to work around kernel Makefile error */
>> > +       /* See: https://www.mail-archive.com/gcc-bugs@gcc.gnu.org/msg567499.html */
>>
>> NAK.
>>
>> The issue here (after reading that thread) is that, with our current
>> compile options, gcc generates a jump table once the switch statement
>> hits five entries.  And it uses retpolines for it, and somehow it
>> generates the relocations in such a way that the vDSO build fails.
>> We
>> need to address this so that the vDSO build is reliable, but there's
>> an important question here:
>>
>> Should the vDSO be built with retpolines, or should it be built with
>> indirect branches?  Or should we go out of our way to make sure that
>> the vDSO contains neither retpolines nor indirect branches?
>>
>> We could accomplish the latter (sort of) by manually converting the
>> switch into the appropriate if statements, but that's rather ugly.
>>
>> (Hmm.  We should add exports to directly read each clock source.
>> They'll be noticeably faster, especially when
>> cache-and-predictor-code.)
>
> Surely it's kind of expected that the vDSO can't find an externally
> provided __x86_indirect_thunk_rax symbol, since we only provide one as
> part of the kernel image.
>
> Building the vDSO with -mindirect-branch=thunk(|-inline) should fix
> that, if we want retpolines in the vDSO.

I think that, if we want retpolines in the kernel, we probably want
them in the vDSO as well.  Although there's an argument to be made
that IBPB gives enough protection, at least against most targets.

>
> There's also -fno-jump-tables.

I'll probably do this, conditioned on CONFIG_RETPOLINE.  Or we should
do it kernel-wide.

hjl filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952 based on
my comment in the other bug report.  gcc seems to be generating jump
tables when it shouldn't be doing so.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2018-08-14 21:21 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20180814011732.075EA6E0251@virtux32.softrans.com.au>
2018-08-14 14:20 ` [PATCH] Handle clock_gettime(CLOCK_TAI) in VDSO Andy Lutomirski
2018-08-14 16:30   ` David Woodhouse
2018-08-14 21:20     ` Andy Lutomirski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).