linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: David Laight <David.Laight@aculab.com>
Cc: Tom Tromey <tom@tromey.com>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Luc Van Oostenryck <luc.vanoostenryck@gmail.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Sparse Mailing-list <linux-sparse@vger.kernel.org>
Subject: Re: [PATCH 00/11] pragma once: treewide conversion
Date: Fri, 5 Mar 2021 13:23:38 -0800	[thread overview]
Message-ID: <CAHk-=whh3fiL7FcLD_r1rfx-gP9W4HWS7vTPM9LKUH+0xzF2=A@mail.gmail.com> (raw)
In-Reply-To: <44a0cc9cb5344add8ee4d91bffbf958f@AcuMS.aculab.com>

On Fri, Mar 5, 2021 at 1:19 AM David Laight <David.Laight@aculab.com> wrote:
>
> The point is that you can skip the unwanted parts of
> #if without having to parse the file at all.
> You just need to detect the line breaks.

That's not actually true AT ALL.

You still need to at the very least parse the preprocessor tokens,
looking for things like #if, #else, and #endif. Because those can -
and do - nest inside the whole thing, so you're not even looking for
the final #endif, you have to be aware that there might be new #if
statements that means that now you now have to increment the nesting
count for the endif.

And to do even just *THAT*, you need to do all the comment parsing,
and all the string parsing, because a #endif means something entirely
different if there was a "/*"  or a string on a previous line that
hasn't been terminated yet (yes, unterminated strings are bad
practice, but ..).

And regardless of even _those_ issues, you still should do all the
other syntactic tokenization stuff (ie all the quoting, the the
character handling: 'a' is a valid C token, but if you see the string
"it's" outside of a comment, that's a syntax error even if it's inside
a disabled region. IOW, this is an incorrect file:

   #if 0
   it's a bug to do this, and the compiler should scream
   #endif

because it's simply not a valid token sequence. The fact that it's
inside a "#if 0" region doesn't change that fact at all.  So you did
need to do all the tokenization logic.

The same goes for all the wide string stuff, the tri-graph horrors, etc etc.

End result: you need to still do basically all of the basic lexing,
and while you can then usually quickly throw the result mostly away
(and you could possibly use a simplified lexer _because_ you throw it
away), you actually didn't really win much. Doing a specialized lexer
just for the disabled regions is probably simply a bad idea: the fact
that you need to still do all the #if nesting etc checking means that
you still do need to do a modicum of tokenization etc.

Really: the whole "trivial" front-end parsing phase of C - and
particularly C++ - is a huge huge deal. It's going to show in the
profiles of the compiler quite decisively, unless you have a compiler
that then spends absolutely insane time on optimization and tries to
do things that basically no sane compiler does (because developers
wouldn't put up with the time sink).

So yes, I've even used things like super-optimizers that chew on small
pieces of code for _days_ because they have insane exponential costs
etc. I've never done it seriously, because it really isn't realistic,
but it can be a fun exercise to try.

Outside of those kinds of super-optimizers, lexing and parsing is a
big big deal.

(And again - this is very much language-specific.  The C/C++ model of
header files is very very flexible, and has a lot of conveniences, but
it's also a big part of why the front end is such a big deal. Other
language models have other trade-offs).

             Linus

  reply	other threads:[~2021-03-05 21:24 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-28 16:57 [PATCH 00/11] pragma once: treewide conversion Alexey Dobriyan
2021-02-28 16:58 ` [PATCH 01/11] pragma once: delete include/linux/atm_suni.h Alexey Dobriyan
2021-02-28 19:05   ` Jakub Kicinski
2021-02-28 16:59 ` [PATCH 02/11] pragma once: convert arch/arm/tools/gen-mach-types Alexey Dobriyan
2021-03-01 10:19   ` Russell King - ARM Linux admin
2021-03-02 15:15     ` Alexey Dobriyan
2021-02-28 16:59 ` [PATCH 03/11] pragma once: convert arch/s390/tools/gen_facilities.c Alexey Dobriyan
2021-02-28 17:00 ` [PATCH 04/11] pragma once: convert drivers/gpu/drm/pl111/pl111_nomadik.h Alexey Dobriyan
2021-03-01 14:41   ` Linus Walleij
2021-02-28 17:01 ` [PATCH 05/11] pragma once: convert drivers/scsi/qla2xxx/qla_target.h Alexey Dobriyan
2021-02-28 22:07   ` Bart Van Assche
2021-02-28 17:02 ` [PATCH 06/11] pragma once: convert include/linux/cb710.h Alexey Dobriyan
2021-03-03 23:13   ` Michał Mirosław
2021-02-28 17:02 ` [PATCH 07/11] pragma once: convert kernel/time/timeconst.bc Alexey Dobriyan
2021-02-28 17:03 ` [PATCH 08/11] pragma once: convert scripts/atomic/ Alexey Dobriyan
2021-03-01  7:55   ` Peter Zijlstra
2021-02-28 17:04 ` [PATCH 09/11] pragma once: convert scripts/selinux/genheaders/genheaders.c Alexey Dobriyan
2021-02-28 18:37   ` Paul Moore
2021-02-28 18:57     ` Alexey Dobriyan
2021-02-28 17:05 ` [PATCH 10/11] pragma once: delete few backslashes Alexey Dobriyan
2021-03-01  8:54   ` Ido Schimmel
2021-03-02 19:00   ` Vineet Gupta
2021-03-04 14:22   ` Edward Cree
2021-03-23 10:09     ` Pavel Machek
2021-02-28 17:05 ` [PATCH 11/11] pragma once: conversion script (in Python 2) Alexey Dobriyan
2021-02-28 17:11 ` [PATCH 12/11] pragma once: scripted treewide conversion Alexey Dobriyan
2021-03-01 17:35   ` Darrick J. Wong
2021-02-28 17:46 ` [PATCH 00/11] pragma once: " Linus Torvalds
2021-02-28 19:34   ` Alexey Dobriyan
2021-02-28 20:00     ` Linus Torvalds
     [not found]       ` <877dmo10m3.fsf@tromey.com>
2021-03-03 20:17         ` Linus Torvalds
2021-03-04 13:55           ` David Laight
2021-03-04 20:16             ` Linus Torvalds
2021-03-05  9:19               ` David Laight
2021-03-05 21:23                 ` Linus Torvalds [this message]
2021-03-06 13:07                   ` Miguel Ojeda
2021-03-06 21:33                     ` Linus Torvalds
2021-03-23 10:03               ` Pavel Machek
2021-03-01  0:29     ` Luc Van Oostenryck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=whh3fiL7FcLD_r1rfx-gP9W4HWS7vTPM9LKUH+0xzF2=A@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=David.Laight@aculab.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-sparse@vger.kernel.org \
    --cc=luc.vanoostenryck@gmail.com \
    --cc=tom@tromey.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).