LKML Archive on lore.kernel.org
From: Nick Desaulniers <ndesaulniers@google.com>
To: Nathan Chancellor <nathan@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Masahiro Yamada <masahiroy@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	clang-built-linux <clang-built-linux@googlegroups.com>
Subject: Re: Very slow clang kernel config ..
Date: Thu, 29 Apr 2021 19:21:21 -0700
Message-ID: <CAKwvOdk6cWE515-y_4Uek2caFQvThKs23kM5CVrS9eMdRuB-eQ@mail.gmail.com> (raw)
In-Reply-To: <YItU3YrFi8REwkRA@archlinux-ax161>

On Thu, Apr 29, 2021 at 5:52 PM Nathan Chancellor <nathan@kernel.org> wrote:
>
> On Thu, Apr 29, 2021 at 02:53:08PM -0700, Linus Torvalds wrote:
> > I haven't looked into why this is so slow with clang, but it really is
> > painfully slow:
> >
> >    time make CC=clang allmodconfig
> >    real 0m2.667s
> >
> > vs the gcc case:
> >
> >     time make CC=gcc allmodconfig
> >     real 0m0.903s
> >
> > Yeah, yeah, three seconds may sound like "not a lot of time", but
> > considering that the subsequent full build (which for me is often
> > empty) doesn't take all that much longer, that config time clang waste
> > is actually quite noticeable.
> >
> > I actually don't do allmodconfig builds with clang, but I do my
> > default kernel builds with it:
> >
> >     time make oldconfig
> >     real 0m2.748s
> >
> >     time sh -c "make -j128 > ../makes"
> >     real 0m3.546s
> >
> > so that "make oldconfig" really is almost as slow as the whole
> > "confirm build is done" thing. It's quite noticeable in my workflow.
> >
> > The gcc config isn't super-fast either, but there's a big 3x
> > difference, so the clang case really is doing something extra wrong.
> >
> > I've not actually looked into _why_. Except I do see that "clang" gets
> > invoked with small (empty?) test files several times, probably to
> > check for command line flags being valid.
> >
> > Sending this to relevant parties in the hope that somebody goes "Yeah,
> > that's silly" and fixes it.
> >
> > This is on my F34 machine:
> >
> >      clang version 12.0.0 (Fedora 12.0.0-0.3.rc1.fc34)
> >
> > in case it matters (but I don't see why it should).
> >
> > Many many moons ago the promise for clang was faster build speeds.
> > That didn't turn out to be true, but can we please at least try to
> > make them not painfully much slower?
>
> Hi Linus,
>
> I benchmarked this with your latest tree
> (8ca5297e7e38f2dc8c753d33a5092e7be181fff0) with my distribution versions
> of clang 11.1.0 and gcc 10.2.0 and I saw the same results, benchmarking
> with hyperfine.
>
> $ hyperfine -L comp_var "","CC=clang " -r 100 -S /bin/sh -w 5 'make {comp_var}allmodconfig'
> Benchmark #1: make allmodconfig
>   Time (mean ± σ):      1.490 s ±  0.012 s    [User: 1.153 s, System: 0.374 s]
>   Range (min … max):    1.462 s …  1.522 s    100 runs
>
> Benchmark #2: make CC=clang allmodconfig
>   Time (mean ± σ):      4.001 s ±  0.020 s    [User: 2.761 s, System: 1.274 s]
>   Range (min … max):    3.939 s …  4.038 s    100 runs
>
> Summary
>   'make allmodconfig' ran
>     2.69 ± 0.03 times faster than 'make CC=clang allmodconfig'

$ hyperfine -L comp_var "","CC=clang " -r 100 -S /bin/sh -w 5 'make {comp_var}allmodconfig'
Benchmark #1: make allmodconfig
  Time (mean ± σ):      2.095 s ±  0.025 s    [User: 1.285 s, System: 0.880 s]
  Range (min … max):    2.014 s …  2.168 s    100 runs

Benchmark #2: make CC=clang allmodconfig
  Time (mean ± σ):      2.930 s ±  0.034 s    [User: 1.522 s, System: 1.477 s]
  Range (min … max):    2.849 s …  3.005 s    100 runs

Summary
  'make allmodconfig' ran
    1.40 ± 0.02 times faster than 'make CC=clang allmodconfig'

Swapping the order, I get pretty similar results to my initial run:

$ hyperfine -L comp_var "CC=clang ","" -r 100 -S /bin/sh -w 5 'make {comp_var}allmodconfig'
Benchmark #1: make CC=clang allmodconfig
  Time (mean ± σ):      2.915 s ±  0.031 s    [User: 1.501 s, System: 1.482 s]
  Range (min … max):    2.825 s …  3.004 s    100 runs

Benchmark #2: make allmodconfig
  Time (mean ± σ):      2.093 s ±  0.022 s    [User: 1.284 s, System: 0.879 s]
  Range (min … max):    2.037 s …  2.136 s    100 runs

Summary
  'make allmodconfig' ran
    1.39 ± 0.02 times faster than 'make CC=clang allmodconfig'

So, yes, slower, but not quite as drastic as others have observed.
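
The ratio in each summary appears to be simply the quotient of the two
mean times, so it can be recomputed by hand when comparing runs from
different machines. A minimal sketch, assuming awk is available (the
`slowdown` helper is made up for illustration and is not part of
hyperfine):

```shell
#!/bin/sh
# slowdown BASE OTHER: print OTHER / BASE rounded to two decimals.
# Hypothetical helper for illustration only.
slowdown() {
    awk -v base="$1" -v other="$2" 'BEGIN { printf "%.2f\n", other / base }'
}

# Mean wall-clock times (seconds) copied from the two runs above.
slowdown 2.095 2.930   # gcc first, clang second -> 1.40
slowdown 2.093 2.915   # order swapped           -> 1.39
```

Both values match the summaries hyperfine printed, which suggests the
ordering of the two benchmarks has little effect here.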

>
> It was also reproducible in a Fedora Docker image, which has newer
> versions of those tools than my distro does (GCC 11.1.0 and clang
> 12.0.0):
>
> $ hyperfine -L comp_var "","CC=clang " -r 100 -S /bin/sh -w 5 'make {comp_var}allmodconfig'
> Benchmark #1: make allmodconfig
>   Time (mean ± σ):     989.9 ms ±   3.5 ms    [User: 747.0 ms, System: 271.1 ms]
>   Range (min … max):   983.0 ms … 998.2 ms    100 runs
>
> Benchmark #2: make CC=clang allmodconfig
>   Time (mean ± σ):      3.328 s ±  0.005 s    [User: 2.408 s, System: 0.948 s]
>   Range (min … max):    3.316 s …  3.343 s    100 runs
>
> Summary
>   'make allmodconfig' ran
>     3.36 ± 0.01 times faster than 'make CC=clang allmodconfig'
>
> Unfortunately, I doubt there is much that can be done on the kernel side
> because this is reproducible just invoking the compilers without any
> source input.
>
> Clang 11.1.0 and GCC 10.2.0:
>
> $ hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5  'echo | {compiler} -x c -c -o /dev/null -'
> Benchmark #1: echo | gcc -x c -c -o /dev/null -
>   Time (mean ± σ):       9.6 ms ±   1.0 ms    [User: 6.5 ms, System: 3.4 ms]
>   Range (min … max):     5.8 ms …  12.7 ms    5000 runs
>
> Benchmark #2: echo | clang -x c -c -o /dev/null -
>   Time (mean ± σ):      33.0 ms ±   0.8 ms    [User: 22.4 ms, System: 10.9 ms]
>   Range (min … max):    30.3 ms …  36.0 ms    5000 runs
>
> Summary
>   'echo | gcc -x c -c -o /dev/null -' ran
>     3.45 ± 0.39 times faster than 'echo | clang -x c -c -o /dev/null -'

$ hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5  'echo | {compiler} -x c -c -o /dev/null -'
Benchmark #1:  echo | gcc -x c -c -o /dev/null -
  Time (mean ± σ):      21.4 ms ±   2.4 ms    [User: 11.6 ms, System: 10.8 ms]
  Range (min … max):    12.8 ms …  27.3 ms    5000 runs

Benchmark #2:  echo | clang -x c -c -o /dev/null -
  Time (mean ± σ):      16.4 ms ±   2.3 ms    [User: 8.6 ms, System: 8.8 ms]
  Range (min … max):    10.4 ms …  25.4 ms    5000 runs

Summary
  ' echo | clang -x c -c -o /dev/null -' ran
    1.31 ± 0.24 times faster than ' echo | gcc -x c -c -o /dev/null -'

>
> $ hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5  'echo | {compiler} -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -'
> Benchmark #1: echo | gcc -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -
>   Time (mean ± σ):      11.9 ms ±   1.1 ms    [User: 10.5 ms, System: 1.8 ms]
>   Range (min … max):     8.2 ms …  15.1 ms    5000 runs
>
>   Warning: Ignoring non-zero exit code.
>
> Benchmark #2: echo | clang -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -
>   Time (mean ± σ):      31.0 ms ±   0.8 ms    [User: 20.3 ms, System: 10.9 ms]
>   Range (min … max):    27.9 ms …  33.8 ms    5000 runs
>
>   Warning: Ignoring non-zero exit code.
>
> Summary
>   'echo | gcc -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -' ran
>     2.62 ± 0.26 times faster than 'echo | clang -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -'

$ hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5  'echo | {compiler} -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -'
Benchmark #1:  echo | gcc -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -
  Time (mean ± σ):      18.5 ms ±   2.4 ms    [User: 17.0 ms, System: 2.7 ms]
  Range (min … max):    12.2 ms …  24.6 ms    5000 runs

  Warning: Ignoring non-zero exit code.

Benchmark #2:  echo | clang -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -
  Time (mean ± σ):      15.4 ms ±   2.3 ms    [User: 8.4 ms, System: 8.1 ms]
  Range (min … max):     9.5 ms …  22.6 ms    5000 runs

  Warning: Ignoring non-zero exit code.

Summary
  ' echo | clang -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -' ran
    1.20 ± 0.23 times faster than ' echo | gcc -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -'

>
> Clang 12.0.0 and GCC 11.1.0:
>
> $ hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5  'echo | {compiler} -x c -c -o /dev/null -'
> Benchmark #1: echo | gcc -x c -c -o /dev/null -
>   Time (mean ± σ):       8.5 ms ±   0.3 ms    [User: 5.6 ms, System: 3.3 ms]
>   Range (min … max):     7.6 ms …   9.8 ms    5000 runs
>
> Benchmark #2: echo | clang -x c -c -o /dev/null -
>   Time (mean ± σ):      27.4 ms ±   0.4 ms    [User: 19.6 ms, System: 8.1 ms]
>   Range (min … max):    26.4 ms …  29.1 ms    5000 runs
>
> Summary
>   'echo | gcc -x c -c -o /dev/null -' ran
>     3.22 ± 0.13 times faster than 'echo | clang -x c -c -o /dev/null -'
>
> $ hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5  'echo | {compiler} -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -'
> Benchmark #1: echo | gcc -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -
>   Time (mean ± σ):      12.2 ms ±   0.3 ms    [User: 11.5 ms, System: 1.0 ms]
>   Range (min … max):    11.7 ms …  13.9 ms    5000 runs
>
>   Warning: Ignoring non-zero exit code.
>
> Benchmark #2: echo | clang -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -
>   Time (mean ± σ):      26.3 ms ±   0.5 ms    [User: 19.1 ms, System: 7.5 ms]
>   Range (min … max):    25.2 ms …  28.1 ms    5000 runs
>
>   Warning: Ignoring non-zero exit code.
>
> Summary
>   'echo | gcc -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -' ran
>     2.16 ± 0.06 times faster than 'echo | clang -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -'
>
> Seems that GCC is faster to complete when it does not have to parse
> warning flags while clang shows no major variance. Thinking more about it,
> cc-option gives clang an empty file so it should not have to actually
> parse anything so I do not think '-fsyntax-only' will gain us a whole
> ton because we should not be dipping into the backend at all.
>
> Tangentially, my version of clang built with Profile Guided Optimization
> gets me close to GCC. I am surprised to see this level of gain.
>
> $ hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5  'echo | {compiler} -x c -c -o /dev/null -'
> Benchmark #1: echo | gcc -x c -c -o /dev/null -
>   Time (mean ± σ):       9.6 ms ±   1.0 ms    [User: 6.4 ms, System: 3.5 ms]
>   Range (min … max):     5.6 ms …  12.9 ms    5000 runs

$ hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5  'echo | {compiler} -x c -c -o /dev/null -'
Benchmark #1:  echo | gcc -x c -c -o /dev/null -
  Time (mean ± σ):      21.3 ms ±   2.4 ms    [User: 11.7 ms, System: 10.6 ms]
  Range (min … max):    12.2 ms …  27.4 ms    5000 runs

Benchmark #2:  echo | clang -x c -c -o /dev/null -
  Time (mean ± σ):      16.3 ms ±   2.3 ms    [User: 8.5 ms, System: 8.8 ms]
  Range (min … max):    10.1 ms …  25.2 ms    5000 runs

Summary
  ' echo | clang -x c -c -o /dev/null -' ran
    1.31 ± 0.24 times faster than ' echo | gcc -x c -c -o /dev/null -'

So now clang is faster?  Am I holding it wrong?
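
One thing worth ruling out when results flip like this is that `gcc`
and `clang` resolve to different binaries than expected (a wrapper, a
local build earlier in PATH, and so on). A rough sketch of such a check,
assuming a Linux system with GNU `readlink`; `show_compiler` is a
hypothetical helper name:

```shell
#!/bin/sh
# show_compiler NAME: print the binary NAME resolves to, with symlinks
# followed, and the first line of its version banner (if it prints one).
# Hypothetical helper for illustration only.
show_compiler() {
    path=$(command -v "$1") || { echo "$1: not found in PATH" >&2; return 1; }
    printf '%s -> %s\n' "$1" "$(readlink -f "$path")"
    "$path" --version 2>/dev/null | head -n 1
}

for cc in gcc clang; do
    show_compiler "$cc" || true
done
```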

>
> Benchmark #2: echo | clang -x c -c -o /dev/null -
>   Time (mean ± σ):       8.7 ms ±   1.3 ms    [User: 4.3 ms, System: 4.9 ms]
>   Range (min … max):     4.9 ms …  12.1 ms    5000 runs
>
>   Warning: Command took less than 5 ms to complete. Results might be inaccurate.
>
> Summary
>   'echo | clang -x c -c -o /dev/null -' ran
>     1.10 ± 0.20 times faster than 'echo | gcc -x c -c -o /dev/null -'
>
> $ hyperfine -L comp_var "","CC=clang " -r 100 -S /bin/sh -w 5 'make {comp_var}allmodconfig'
> Benchmark #1: make allmodconfig
>   Time (mean ± σ):      1.531 s ±  0.011 s    [User: 1.180 s, System: 0.388 s]
>   Range (min … max):    1.501 s …  1.561 s    100 runs
>
> Benchmark #2: make CC=clang allmodconfig
>   Time (mean ± σ):      1.828 s ±  0.015 s    [User: 1.209 s, System: 0.760 s]
>   Range (min … max):    1.802 s …  1.872 s    100 runs
>
> Summary
>   'make allmodconfig' ran
>     1.19 ± 0.01 times faster than 'make CC=clang allmodconfig'
>
> I think that we should definitely see what we can do to speed up the front end.

Numbers between machines probably aren't directly comparable, but I
would be curious whether the toolchain used to build clang makes a
difference, whether debug builds are significantly slower, and whether
distro toolchains vs. local builds from the same branch (all on the
same machine) are noticeably different.
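
As a starting point for the first question, compilers usually leave an
identification string in the ELF `.comment` section of what they build,
so the clang binary itself may reveal what compiled it. A sketch under
that assumption (stripped or repackaged binaries may carry nothing
useful; `built_with` is a hypothetical helper, and `readelf` comes from
binutils):

```shell
#!/bin/sh
# built_with NAME: print the resolved path of NAME and any compiler
# identification strings found in its ELF .comment section.
# Hypothetical helper for illustration only.
built_with() {
    bin=$(command -v "$1") || { echo "$1: not found in PATH" >&2; return 1; }
    bin=$(readlink -f "$bin")
    echo "== $bin"
    if command -v readelf >/dev/null 2>&1; then
        # GCC typically records "GCC: (...) x.y.z"; clang records "clang version ...".
        readelf -p .comment "$bin" 2>/dev/null | grep -E 'GCC|clang' \
            || echo "(no compiler strings found in .comment)"
    else
        echo "(readelf not available)"
    fi
}

built_with clang || true
```

Running the same function against a distro clang and a locally built one
on the same machine would show whether the two were produced by
different host compilers before any timing is compared.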


-- 
Thanks,
~Nick Desaulniers
