linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: David Laight <David.Laight@aculab.com>
Cc: Tom Tromey <tom@tromey.com>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Luc Van Oostenryck <luc.vanoostenryck@gmail.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Sparse Mailing-list <linux-sparse@vger.kernel.org>
Subject: Re: [PATCH 00/11] pragma once: treewide conversion
Date: Thu, 4 Mar 2021 12:16:14 -0800	[thread overview]
Message-ID: <CAHk-=wh8PptzH-=ak1D7C5Zp6EJ8eurYqVqGdQauupAFaNuG4g@mail.gmail.com> (raw)
In-Reply-To: <4835ec1d2ecc40b285596288a0df4f47@AcuMS.aculab.com>

On Thu, Mar 4, 2021 at 5:55 AM David Laight <David.Laight@aculab.com> wrote:
>
> >  (a) the traditional include guard optimization HAS NO HIDDEN SEMANTIC
> > MEANING. It's a pure optimization that doesn't actually change
> > anything else. If you don't do the optimization, absolutely nothing
> > changes.
>
> And if the parser is well written the optimisation is probably
> irrelevant compared to the compile time.

That's actually surprisingly not even remotely true.

People always think that the optimization phases of a compiler are the
expensive ones. And yes, there are certain optimizations that can be
*really* expensive, and people just don't even do them because they
are _so_ expensive and are exponential in time.

And yes, there can be some patterns that expose bad scaling in some
compiler algorithms, and histyorically that has often been seen when
people use generators to automatically generate huge functions (or
huge initializers), and then the compiler has some O(n**3) thing in it
that is entirely unnoticeable for normal code written by a human, but
means that a million-entry initializer takes five hours to compile.

But in reality, on very real normal code, the *front end* of the
compiler is often the most costly thing by far. Very much just the
lexer and the simple parser. Things that are "simple" and people think
are fast.

But they are are often the slowest part in C style languages, because
you have to lex and parse _everything_. You include maybe a dozen
header files, maybe more. Those in turn often include another dozen
support header files. You easily end up tokenizing and parsing
hundreds of header files for even the simplest programs - and 99,.9%
of that isn't ever *used*, and never actually generates any code that
needs to be optimized. Think of all the complex macros and functions
and type definitions you get when you include something basic like
<stdio.h>. Trust me, it's a _lot_ of small details that get tokenixed,
parsed and memoized.

And then all you do is a 'printf("Hello world");' and the amount of
actual code generation and optimization by the back-end is basically
zero.

And C++ is about an order of magnitude worse, because you really take
this whole approach and turn it up to 11 with templates, namespaces,
STL, and a lot of all that infrastructure that is there for all the
possible cases, but that most files that include it only use a small
tiny portion of.

So in C-style (and particularly C++) languages, reading header files,
tokenizing them and parsing them can _easily_ be the bulk of your
compile time. Absolutely every single serious C/C++ compiler
integrates tbe preprocessor _into_ the compiler, because the
traditional model of having a separate "cpp" phase actually made the
tokenization problem happen _twice_: once by the pre-processor, and
then a second time by the compiler. Integrating the two phases, so
that you can use just one tokenizer, and one single set of
identifiers, actually speeds up compile speed _enormously_.

Yes, yes, tokenization and parsing really is the simple part of a
compiler. Good register allocation is _hard_. SSA makes a lot of the
basic optimizations actually fairly straightforward and simple, but
more complex transformations (loop invariant stuff, vectorization,
various things like that) are really much much much more complex than
the simple front-end.

But that simple front end is the thing that touches absolutely
_everything_, whether it actually generates code or not.

And yes, this is mainly a C-style language problem. If you have
hardcoded modules approach and not the "include files that describe
all the different interfaces", you don't end up in that situation
where you spend a lot of time parsign the possible interfaces over and
over again.

And C++ really is hugely worse, and takes this issue to another level.
You can have simple single C++ files that end up basically having to
parse hundreds of thousands of lines of fairly complex code, because
they use several complex libraries, and that's how the library header
files are set up,m with multiple levels (ie the GUI header files
depend on, and include the lower-level object header files, which in
turn depend on "simple" core libraries like STL and Boost.

This is why pretty much every compiler out there does that include
file header guard optimization. Because avoiding having to read and
parse a file multiple times really does show up as a big big win. Even
when it's all hidden behind the #ifndef/#define/#endif guarding logic,
the cost of reading the file and lexing and parsing it enough to _see_
those guards is not cheap. Doing it once is required and important,
but then you memoize the fact that "oh, all the stuff I needed to
parse was behind this macro test, so next time I see this file, if
that macro is set, I can just ignore it".

So it is actually a very important optimization. It's just that the
optimization is better done with that explicit guard memoization than
it is with '#pragma once'

                       Linus

  reply	other threads:[~2021-03-04 20:18 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-28 16:57 [PATCH 00/11] pragma once: treewide conversion Alexey Dobriyan
2021-02-28 16:58 ` [PATCH 01/11] pragma once: delete include/linux/atm_suni.h Alexey Dobriyan
2021-02-28 19:05   ` Jakub Kicinski
2021-02-28 16:59 ` [PATCH 02/11] pragma once: convert arch/arm/tools/gen-mach-types Alexey Dobriyan
2021-03-01 10:19   ` Russell King - ARM Linux admin
2021-03-02 15:15     ` Alexey Dobriyan
2021-02-28 16:59 ` [PATCH 03/11] pragma once: convert arch/s390/tools/gen_facilities.c Alexey Dobriyan
2021-02-28 17:00 ` [PATCH 04/11] pragma once: convert drivers/gpu/drm/pl111/pl111_nomadik.h Alexey Dobriyan
2021-03-01 14:41   ` Linus Walleij
2021-02-28 17:01 ` [PATCH 05/11] pragma once: convert drivers/scsi/qla2xxx/qla_target.h Alexey Dobriyan
2021-02-28 22:07   ` Bart Van Assche
2021-02-28 17:02 ` [PATCH 06/11] pragma once: convert include/linux/cb710.h Alexey Dobriyan
2021-03-03 23:13   ` Michał Mirosław
2021-02-28 17:02 ` [PATCH 07/11] pragma once: convert kernel/time/timeconst.bc Alexey Dobriyan
2021-02-28 17:03 ` [PATCH 08/11] pragma once: convert scripts/atomic/ Alexey Dobriyan
2021-03-01  7:55   ` Peter Zijlstra
2021-02-28 17:04 ` [PATCH 09/11] pragma once: convert scripts/selinux/genheaders/genheaders.c Alexey Dobriyan
2021-02-28 18:37   ` Paul Moore
2021-02-28 18:57     ` Alexey Dobriyan
2021-02-28 17:05 ` [PATCH 10/11] pragma once: delete few backslashes Alexey Dobriyan
2021-03-01  8:54   ` Ido Schimmel
2021-03-02 19:00   ` Vineet Gupta
2021-03-04 14:22   ` Edward Cree
2021-03-23 10:09     ` Pavel Machek
2021-02-28 17:05 ` [PATCH 11/11] pragma once: conversion script (in Python 2) Alexey Dobriyan
2021-02-28 17:11 ` [PATCH 12/11] pragma once: scripted treewide conversion Alexey Dobriyan
2021-03-01 17:35   ` Darrick J. Wong
2021-02-28 17:46 ` [PATCH 00/11] pragma once: " Linus Torvalds
2021-02-28 19:34   ` Alexey Dobriyan
2021-02-28 20:00     ` Linus Torvalds
     [not found]       ` <877dmo10m3.fsf@tromey.com>
2021-03-03 20:17         ` Linus Torvalds
2021-03-04 13:55           ` David Laight
2021-03-04 20:16             ` Linus Torvalds [this message]
2021-03-05  9:19               ` David Laight
2021-03-05 21:23                 ` Linus Torvalds
2021-03-06 13:07                   ` Miguel Ojeda
2021-03-06 21:33                     ` Linus Torvalds
2021-03-23 10:03               ` Pavel Machek
2021-03-01  0:29     ` Luc Van Oostenryck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wh8PptzH-=ak1D7C5Zp6EJ8eurYqVqGdQauupAFaNuG4g@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=David.Laight@aculab.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-sparse@vger.kernel.org \
    --cc=luc.vanoostenryck@gmail.com \
    --cc=tom@tromey.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).