All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rasmus Villemoes <linux@rasmusvillemoes.dk>
To: Andrew Morton <akpm@linux-foundation.org>,
	Alexey Dobriyan <adobriyan@gmail.com>
Cc: dsterba@suse.cz, Christoph Hellwig <hch@infradead.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] proc: use #pragma once
Date: Fri, 4 May 2018 00:14:57 +0200	[thread overview]
Message-ID: <85dc7f54-c17f-b49f-df4d-04a339b260d7@rasmusvillemoes.dk> (raw)
In-Reply-To: <20180501151319.33dcb1d48a8526ed521fae9c@linux-foundation.org>

On 2018-05-02 00:13, Andrew Morton wrote:
> On Thu, 26 Apr 2018 22:24:44 +0300 Alexey Dobriyan <adobriyan@gmail.com> wrote:
> 
>>> The LOC argument also does not sound very convincing.
>>
>> When was the last time you did -80 kLOC patch for free?
> 
> That would be the way to do it - sell the idea to Linus, send him a
> script to do it then stand back.  The piecemeal approach is ongoing
> pain.
> 

FWIW, it's not just removing some identifiers from cpp's hash tables, it
also reduces I/O: Due to our header mess, we have some cyclic includes,
e.g mm.h -> memremap.h -> mm.h. While parsing mm.h, cpp sees the #define
_LINUX_MM_H, then goes parsing memremap.h, but since it hasn't reached
the end of mm.h yet (seeing that there's nothing but comments outside
the #ifndef/#endif pair), it hasn't had a chance to set the internal
flag for mm.h, so it goes slurping in mm.h again. Obviously, the
definedness of _LINUX_MM_H at that point means it "only" has to parse
those 87K for comments and matching up #ifs, #ifdefs,#endifs etc. With
#pragma once, the flag gets set for mm.h immediately, so the #include
from memremap.h is entirely ignored. This can easily be verified with
strace. And mm.h is not the only header getting read twice.

I had some "extract the include guard" line noise lying around, so I
hacked up the below if someone wants to play some more with this. A few
not-very-careful kbuild timings didn't show anything significant, but
both the before and after times were way too noisy, and I only patched
include/linux/*.h.

Anyway, the first order of business is to figure out which ones to leave
alone. We have a bunch of #ifndef THAT_ONE #error "don't include
$this_one directly". The brute-force way is to simply record all macros
which are checked for definedness at least twice.

git grep -h -E '^\s*#\s*if(.*defined\s*\(|n?def)\s*[A-Za-z0-9_]+' | grep
-o -E '[A-Za-z_][A-Za-z_0-9]*' | sort | uniq --repeated > multest.txt

But there's also stuff like arch/x86/boot/compressed/kaslr.c that plays
games with pre-defining _EXPORT_H to avoid parsing export.h when it
inevitably gets included. Oh well, just add the list of macros that have
at least two definitions.

git grep -h -E '^\s*#\s*define\s+[A-Za-z0-9_]+' | grep -o -E
'^\s*#\s*define\s+[A-Za-z0-9_]+' | grep -oE '[A-Za-z0-9_]+' | sort |
uniq --repeated > muldef.txt

With those, one can just do

cat muldef.txt multest.txt | scripts/replace_ig.pl ...

This ends up detecting a lot of copy-pasting (e.g.
__LINUX_MFD_MAX8998_H), as well as lots of headers that for no obvious
reason do not have an include guard. Oh, and once.h has a redundant \.

Rasmus

wear sunglasses...

=== scripts/replace_ig.pl ===

#!/usr/bin/perl

use strict;
use warnings;
use File::Slurp;

my %preserve;

sub strip_comments {
    my $txt = shift;

    # Line continuations are handled before comment stripping, so
    # <slash> <backslash> <newline> <star> actually starts a comment,
    # and a // comment can swallow the following line. Let's just
    # assume nobody has modified the #if control flow using such dirty
    # tricks when we do a more naive line-by-line parsing below to
    # actually remove the include guard deffery.
    $txt =~ s/\\\n//g;

    # http://stackoverflow.com/a/911583/722859
    $txt =~ s{
		 /\*         ##  Start of /* ... */ comment
		 [^*]*\*+    ##  Non-* followed by 1-or-more *'s
		 (?:
		     [^/*][^*]*\*+
		 )*          ##  0-or-more things which don't start with /
		 ##    but do end with '*'
		 /           ##  End of /* ... */ comment

	     |
		 //     ## Start of // comment
		 [^\n]* ## Anything which is not a newline
		 (?=\n) ## End of // comment; use look-ahead to avoid consuming the
newline

	     |         ##     OR  various things which aren't comments:

		 (
		     "           ##  Start of " ... " string
		     (?:
			 \\.           ##  Escaped char
		     |               ##    OR
			 [^"\\]        ##  Non "\
		     )*
		     "           ##  End of " ... " string

		 |         ##     OR

		     '           ##  Start of ' ... ' string
		     (
			 \\.           ##  Escaped char
		     |               ##    OR
			 [^'\\]        ##  Non '\
		     )*
		     '           ##  End of ' ... ' string

		 |         ##     OR

		     .           ##  Anything other char
		     [^/"'\\]*   ##  Chars which doesn't start a comment, string or escape
		 )
	 }{defined $1 ? $1 : " "}gxse;

    return $txt;
}

sub include_guard {
    my $txt = shift;
    my @lines = (split /^/, $txt);
    my $i = 0;
    my $level = 1;
    my $name;

   # The first non-empty line must be an #ifndef or an #if !defined().
    ++$i while ($i < @lines && $lines[$i] =~ m/^\s*$/);
    goto not_found if ($i == @lines);
    goto not_found
	if (!($lines[$i] =~
m/^\s*#\s*ifndef\s+(?<name>[A-Za-z_][A-Za-z_0-9]*)\s*$/) &&
	    !($lines[$i] =~
m/^\s*#\s*if\s+!\s*defined\s*\(\s*(?<name>[A-Za-z_][A-Za-z_0-9]*)\s*\)\s*$/));
    $name = $+{name};

    # The next non-empty line must be a #define of that macro.
    1 while (++$i < @lines && $lines[$i] =~ m/^\s*$/);
    goto not_found if ($i == @lines);
    goto not_found if !($lines[$i] =~ m/^\s*#\s*define\s+\b$name\b/);

    # Now track #ifs and #endifs. #elifs and #elses don't change the level.
    while (++$i < @lines && $level > 0) {
	if ($lines[$i] =~ m/^\s*#\s*(?:if|ifdef|ifndef)\b/) {
	    $level++;
	} elsif ($lines[$i] =~ m/^\s*#\s*endif\b/) {
	    $level--;
	}
    }
    goto not_found if ($level > 0); # issue a warning?
    # Check that the rest of the file consists of empty lines.
    ++$i while ($i < @lines && $lines[$i] =~ m/^\s*$/);
    goto not_found if ($i < @lines);
    return $name;

 not_found:
    return undef;
}

sub do_file {
    my $fn = shift;
    my $src = read_file($fn);
    my $ig = include_guard(strip_comments($src));
    if (not defined $ig) {
	printf STDERR "%s: no include guard\n", $fn;
	return;
    }
    if (exists $preserve{$ig}) {
	printf STDERR "%s: include guard %s exempted\n", $fn, $ig;
	return;
    }

    # OK, the entire text should match this horrible regexp.
    if ($src =~ m{
  (.*?) # arbitrary stuff before #ifndef
  (^\s*\#\s*if(?:\s*!\s*defined\s*\(\s*$ig\s*\)|ndef\s*$ig) .*? \n #
  (?:^\s*\n)*
   ^\s*\#\s*define\s*$ig .*? \n) # 2/3 of include guard
  (.*(?=^\s*\#\s*endif)) # body of file
  (^\s*\#\s*endif .*? \n) # last 1/3
  (.*) # rest of file (trailing comments)
	}smx) {
	my $pre = $1;
	my $define = $2;
	my $body = $3;
	my $endif = $4;
	my $post = $5;
	$body =~ s/\n[ \t]*\n$/\n/g;
	$src = $pre . "#pragma once\n";
	$src .= $body . $post;
    } else {
	printf STDERR "%s: has include guard %s, but I failed to replace it
with #pragma once\n",
	$fn, $ig;
	return;
    }
    write_file($fn, $src);
}

while (<STDIN>) {
    chomp;
    $preserve{$_} = 1;
}

for (@ARGV) {
    do_file($_);
}

  reply	other threads:[~2018-05-03 22:15 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-23 21:35 [PATCH] proc: use #pragma once Alexey Dobriyan
2018-04-24 13:54 ` Christoph Hellwig
2018-04-25 20:55   ` Alexey Dobriyan
2018-04-26 10:26     ` David Sterba
2018-04-26 10:33       ` Christoph Hellwig
2018-04-26 19:25         ` Alexey Dobriyan
2018-04-26 19:24       ` Alexey Dobriyan
2018-05-01 22:13         ` Andrew Morton
2018-05-03 22:14           ` Rasmus Villemoes [this message]
2018-05-03 22:42             ` Al Viro
2018-05-03 23:23               ` Rasmus Villemoes
2018-05-04  2:58                 ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=85dc7f54-c17f-b49f-df4d-04a339b260d7@rasmusvillemoes.dk \
    --to=linux@rasmusvillemoes.dk \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=dsterba@suse.cz \
    --cc=hch@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.