From: Nathan Chancellor <nathan@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>,
	Masahiro Yamada <masahiroy@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	clang-built-linux <clang-built-linux@googlegroups.com>
Subject: Re: Very slow clang kernel config ..
Date: Thu, 29 Apr 2021 17:52:45 -0700
Message-ID: <YItU3YrFi8REwkRA@archlinux-ax161>
In-Reply-To: <CAHk-=wjmNOoX8iPtYsM8PVa+7DE1=5bv-XVe_egP0ZOiuT=7CQ@mail.gmail.com>

On Thu, Apr 29, 2021 at 02:53:08PM -0700, Linus Torvalds wrote:
> I haven't looked into why this is so slow with clang, but it really is
> painfully slow:
> 
>    time make CC=clang allmodconfig
>    real 0m2.667s
> 
> vs the gcc case:
> 
>     time make CC=gcc allmodconfig
>     real 0m0.903s
> 
> Yeah, yeah, three seconds may sound like "not a lot of time", but
> considering that the subsequent full build (which for me is often
> empty) doesn't take all that much longer, the time that clang wastes
> on the config step is actually quite noticeable.
> 
> I actually don't do allmodconfig builds with clang, but I do my
> default kernel builds with it:
> 
>     time make oldconfig
>     real 0m2.748s
> 
>     time sh -c "make -j128 > ../makes"
>     real 0m3.546s
> 
> so that "make oldconfig" really is almost as slow as the whole
> "confirm build is done" thing. It's quite noticeable in my workflow.
> 
> The gcc config isn't super-fast either, but there's a big 3x
> difference, so the clang case really is doing something extra wrong.
> 
> I've not actually looked into _why_. Except I do see that "clang" gets
> invoked with small (empty?) test files several times, probably to
> check for command line flags being valid.
> 
> Sending this to relevant parties in the hope that somebody goes "Yeah,
> that's silly" and fixes it.
> 
> This is on my F34 machine:
> 
>      clang version 12.0.0 (Fedora 12.0.0-0.3.rc1.fc34)
> 
> in case it matters (but I don't see why it should).
> 
> Many many moons ago the promise for clang was faster build speeds.
> That didn't turn out to be true, but can we please at least try to
> make them not painfully much slower?

Hi Linus,

I benchmarked this with hyperfine on your latest tree
(8ca5297e7e38f2dc8c753d33a5092e7be181fff0), using my distribution's
versions of clang (11.1.0) and gcc (10.2.0), and I saw the same results.

$ hyperfine -L comp_var "","CC=clang " -r 100 -S /bin/sh -w 5 'make {comp_var}allmodconfig'
Benchmark #1: make allmodconfig
  Time (mean ± σ):      1.490 s ±  0.012 s    [User: 1.153 s, System: 0.374 s]
  Range (min … max):    1.462 s …  1.522 s    100 runs

Benchmark #2: make CC=clang allmodconfig
  Time (mean ± σ):      4.001 s ±  0.020 s    [User: 2.761 s, System: 1.274 s]
  Range (min … max):    3.939 s …  4.038 s    100 runs

Summary
  'make allmodconfig' ran
    2.69 ± 0.03 times faster than 'make CC=clang allmodconfig'

It was also reproducible in a Fedora Docker image, which has newer
versions of those tools than my distro does (GCC 11.1.0 and clang
12.0.0):

$ hyperfine -L comp_var "","CC=clang " -r 100 -S /bin/sh -w 5 'make {comp_var}allmodconfig'
Benchmark #1: make allmodconfig
  Time (mean ± σ):     989.9 ms ±   3.5 ms    [User: 747.0 ms, System: 271.1 ms]
  Range (min … max):   983.0 ms … 998.2 ms    100 runs

Benchmark #2: make CC=clang allmodconfig
  Time (mean ± σ):      3.328 s ±  0.005 s    [User: 2.408 s, System: 0.948 s]
  Range (min … max):    3.316 s …  3.343 s    100 runs

Summary
  'make allmodconfig' ran
    3.36 ± 0.01 times faster than 'make CC=clang allmodconfig'

Unfortunately, I doubt there is much that can be done on the kernel side
because this is reproducible by just invoking the compilers without any
source input.

Clang 11.1.0 and GCC 10.2.0:

$ hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5  'echo | {compiler} -x c -c -o /dev/null -'
Benchmark #1: echo | gcc -x c -c -o /dev/null -
  Time (mean ± σ):       9.6 ms ±   1.0 ms    [User: 6.5 ms, System: 3.4 ms]
  Range (min … max):     5.8 ms …  12.7 ms    5000 runs

Benchmark #2: echo | clang -x c -c -o /dev/null -
  Time (mean ± σ):      33.0 ms ±   0.8 ms    [User: 22.4 ms, System: 10.9 ms]
  Range (min … max):    30.3 ms …  36.0 ms    5000 runs

Summary
  'echo | gcc -x c -c -o /dev/null -' ran
    3.45 ± 0.39 times faster than 'echo | clang -x c -c -o /dev/null -'

$ hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5  'echo | {compiler} -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -'
Benchmark #1: echo | gcc -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -
  Time (mean ± σ):      11.9 ms ±   1.1 ms    [User: 10.5 ms, System: 1.8 ms]
  Range (min … max):     8.2 ms …  15.1 ms    5000 runs

  Warning: Ignoring non-zero exit code.

Benchmark #2: echo | clang -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -
  Time (mean ± σ):      31.0 ms ±   0.8 ms    [User: 20.3 ms, System: 10.9 ms]
  Range (min … max):    27.9 ms …  33.8 ms    5000 runs

  Warning: Ignoring non-zero exit code.

Summary
  'echo | gcc -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -' ran
    2.62 ± 0.26 times faster than 'echo | clang -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -'

Clang 12.0.0 and GCC 11.1.0:

$ hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5  'echo | {compiler} -x c -c -o /dev/null -'
Benchmark #1: echo | gcc -x c -c -o /dev/null -
  Time (mean ± σ):       8.5 ms ±   0.3 ms    [User: 5.6 ms, System: 3.3 ms]
  Range (min … max):     7.6 ms …   9.8 ms    5000 runs

Benchmark #2: echo | clang -x c -c -o /dev/null -
  Time (mean ± σ):      27.4 ms ±   0.4 ms    [User: 19.6 ms, System: 8.1 ms]
  Range (min … max):    26.4 ms …  29.1 ms    5000 runs

Summary
  'echo | gcc -x c -c -o /dev/null -' ran
    3.22 ± 0.13 times faster than 'echo | clang -x c -c -o /dev/null -'

$ hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5  'echo | {compiler} -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -'
Benchmark #1: echo | gcc -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -
  Time (mean ± σ):      12.2 ms ±   0.3 ms    [User: 11.5 ms, System: 1.0 ms]
  Range (min … max):    11.7 ms …  13.9 ms    5000 runs

  Warning: Ignoring non-zero exit code.

Benchmark #2: echo | clang -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -
  Time (mean ± σ):      26.3 ms ±   0.5 ms    [User: 19.1 ms, System: 7.5 ms]
  Range (min … max):    25.2 ms …  28.1 ms    5000 runs

  Warning: Ignoring non-zero exit code.

Summary
  'echo | gcc -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -' ran
    2.16 ± 0.06 times faster than 'echo | clang -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -'

It seems that GCC completes faster when it does not have to parse
warning flags, while clang shows no major variance. Thinking more about
it, cc-option gives clang an empty file, so it should not have to
actually parse anything. I therefore do not think '-fsyntax-only' will
gain us a whole lot, because we should not be dipping into the backend
at all.
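
For anyone following along, here is a minimal sketch of the kind of probe
I mean. It is not the actual Kconfig macro; the function name
cc_option_probe and its variables are made up for illustration. It only
assumes that a flag test boils down to compiling an empty input with
-Werror and looking at the exit status.

```shell
#!/bin/sh
# Hypothetical helper (not the kernel's cc-option implementation):
# succeeds if the compiler command "$1" accepts the flag "$2".
cc_option_probe() {
    probe_cc=$1
    probe_flag=$2
    # Scratch directory for the throwaway object file.
    probe_tmp=$(mktemp -d) || return 2
    # -Werror turns an "unknown warning option" diagnostic into a failure;
    # /dev/null stands in for the empty C source file.
    "$probe_cc" -Werror "$probe_flag" -c -x c /dev/null \
        -o "$probe_tmp/tmp.o" >/dev/null 2>&1
    probe_rc=$?
    rm -rf "$probe_tmp"
    return "$probe_rc"
}
```

Something like 'cc_option_probe clang -Wflag-that-does-not-exit || echo unsupported'
would exercise it. Each call pays for one full compiler startup, which is
why the per-invocation times above add up during configuration.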

Tangentially, my version of clang built with Profile Guided Optimization
gets me close to GCC. I am surprised to see this level of gain.

$ hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5  'echo | {compiler} -x c -c -o /dev/null -'
Benchmark #1: echo | gcc -x c -c -o /dev/null -
  Time (mean ± σ):       9.6 ms ±   1.0 ms    [User: 6.4 ms, System: 3.5 ms]
  Range (min … max):     5.6 ms …  12.9 ms    5000 runs

Benchmark #2: echo | clang -x c -c -o /dev/null -
  Time (mean ± σ):       8.7 ms ±   1.3 ms    [User: 4.3 ms, System: 4.9 ms]
  Range (min … max):     4.9 ms …  12.1 ms    5000 runs

  Warning: Command took less than 5 ms to complete. Results might be inaccurate.

Summary
  'echo | clang -x c -c -o /dev/null -' ran
    1.10 ± 0.20 times faster than 'echo | gcc -x c -c -o /dev/null -'

$ hyperfine -L comp_var "","CC=clang " -r 100 -S /bin/sh -w 5 'make {comp_var}allmodconfig'
Benchmark #1: make allmodconfig
  Time (mean ± σ):      1.531 s ±  0.011 s    [User: 1.180 s, System: 0.388 s]
  Range (min … max):    1.501 s …  1.561 s    100 runs

Benchmark #2: make CC=clang allmodconfig
  Time (mean ± σ):      1.828 s ±  0.015 s    [User: 1.209 s, System: 0.760 s]
  Range (min … max):    1.802 s …  1.872 s    100 runs

Summary
  'make allmodconfig' ran
    1.19 ± 0.01 times faster than 'make CC=clang allmodconfig'
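
As a sanity check on the three allmodconfig summaries, the ratios can be
recomputed from the reported means. This is only a sketch; the helper
name ratio is made up, and the inputs are the mean times copied from the
hyperfine output in this message.

```shell
#!/bin/sh
# Hypothetical helper: print slow/fast rounded to two decimal places.
ratio() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

ratio 4.001 1.490    # distro clang 11.1.0 vs. GCC 10.2.0
ratio 3.328 0.9899   # Fedora clang 12.0.0 vs. GCC 11.1.0
ratio 1.828 1.531    # PGO-built clang run
```

These print 2.69, 3.36, and 1.19, matching what hyperfine reported.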

I think that we should definitely see what we can do to speed up the front end.

Cheers,
Nathan
