linux-kernel.vger.kernel.org archive mirror
* [SCRIPT] chomp: trim trailing whitespace
@ 2006-05-27  2:27 Jeff Garzik
  2006-05-27  4:17 ` H. Peter Anvin
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Jeff Garzik @ 2006-05-27  2:27 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Linux Kernel

[-- Attachment #1: Type: text/plain, Size: 323 bytes --]


Attached to this email is chomp.pl, a Perl script which removes trailing 
whitespace from several files.  I've had this for years, as trailing 
whitespace is one of my pet peeves.

Now that git-applymbox complains loudly whenever a patch adds trailing 
whitespace, I figured this script may be useful to others.

	Jeff




[-- Attachment #2: chomp.pl --]
[-- Type: application/x-perl, Size: 1043 bytes --]
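
The attachment itself is not reproduced in this archive. As a rough
illustration only (this is not Jeff's chomp.pl), a script of this kind might
look like the sketch below. The helper name chomp_line and the format of the
"bytes chomped" report are assumptions made for the example.

```perl
#!/usr/bin/perl
# Illustrative sketch, not the attached chomp.pl.
use strict;
use warnings;

# Remove trailing spaces and tabs from one line, keeping its newline.
# Returns the cleaned line and the number of bytes removed.
sub chomp_line {
    my ($line) = @_;
    my $before = length $line;
    $line =~ s/[ \t]+$//;
    return ($line, $before - length $line);
}

foreach my $file (@ARGV) {
    my $in;
    unless (open($in, '<', $file)) {
        warn "$file: $!\n";
        next;
    }
    my ($total, @out) = (0);
    while (my $line = <$in>) {
        my ($clean, $n) = chomp_line($line);
        push @out, $clean;
        $total += $n;
    }
    close($in);

    next unless $total;    # nothing to trim: leave the file untouched

    open(my $out, '>', $file) or die "$file: $!\n";
    print {$out} @out;
    close($out) or die "$file: $!\n";
    print "$file: $total bytes chomped\n";
}
```

Run with the files to clean as arguments; files that need no trimming are
not rewritten.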


* Re: [SCRIPT] chomp: trim trailing whitespace
  2006-05-27  2:27 [SCRIPT] chomp: trim trailing whitespace Jeff Garzik
@ 2006-05-27  4:17 ` H. Peter Anvin
  2006-05-27 11:42   ` Jeff Garzik
  2006-05-27 10:15 ` Jan Engelhardt
  2006-05-27 15:28 ` Martin Langhoff
  2 siblings, 1 reply; 14+ messages in thread
From: H. Peter Anvin @ 2006-05-27  4:17 UTC (permalink / raw)
  To: Jeff Garzik, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 585 bytes --]

Jeff Garzik wrote:
> 
> Attached to this email is chomp.pl, a Perl script which removes trailing 
> whitespace from several files.  I've had this for years, as trailing 
> whitespace is one of my pet peeves.
> 
> Now that git-applymbox complains loudly whenever a patch adds trailing 
> whitespace, I figured this script may be useful to others.
> 

This is the script I use for the same purpose.  It's a bit more 
sophisticated, in that it detects and avoids binary files, and doesn't 
throw an error if it encounters a directory (which can happen if you 
give it a wildcard.)

	-hpa

[-- Attachment #2: cleanfile --]
[-- Type: text/plain, Size: 1126 bytes --]

#!/usr/bin/perl
#
# Clean a text file of stealth whitespace
#

use bytes;

$name = 'cleanfile';

foreach $f ( @ARGV ) {
    print STDERR "$name: $f\n";

    if (! -f $f) {
	print STDERR "$f: not a file\n";
	next;
    }
    
    if (!open(FILE, '+<', $f)) {
	print STDERR "$name: Cannot open file: $f: $!\n";
	next;
    }

    binmode FILE;

    # First, verify that it is not a binary file
    $is_binary = 0;

    while (read(FILE, $data, 65536) > 0) {
	if ($data =~ /\0/) {
	    $is_binary = 1;
	    last;
	}
    }

    if ($is_binary) {
	print STDERR "$name: $f: binary file\n";
	next;
    }

    seek(FILE, 0, 0);

    @blanks = ();
    @lines  = ();

    while ( defined($line = <FILE>) ) {
	$line =~ s/[ \t\r\n]*$/\n/;

	if ( $line eq "\n" ) {
	    push(@blanks, $line);
	} else {
	    push(@lines, @blanks);
	    push(@lines, $line);
	    @blanks = ();
	}
    }

    # Any blanks at the end of the file are discarded

    seek(FILE, 0, 0);
    print FILE @lines;

    if ( !defined($where = tell(FILE)) ||
	 !truncate(FILE, $where) ) {
	die "$name: Failed to truncate modified file: $f: $!\n";
    }
    close(FILE);
}
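
The handling of blank lines at end of file in the script above can be seen in
isolation in the following sketch. It applies the same substitution and the
same @blanks buffering to a list of lines held in memory. The function name
clean_lines is an assumption for illustration, and no file I/O is done.

```perl
use strict;
use warnings;

# Normalise line endings and drop blank lines at the end of the input.
# Blank lines are parked in @blanks and only emitted once a later
# non-blank line proves that they were not trailing.
# (clean_lines is a hypothetical name, not part of cleanfile.)
sub clean_lines {
    my @in = @_;
    my (@blanks, @out);
    foreach my $line (@in) {
        $line =~ s/[ \t\r\n]*$/\n/;
        if ($line eq "\n") {
            push @blanks, $line;
        } else {
            push @out, @blanks, $line;
            @blanks = ();
        }
    }
    return @out;    # whatever is still in @blanks is discarded
}

print clean_lines("a  \n", "\n", "b\t\r\n", "  \n", "\n");
```

The blank line between "a" and "b" survives, while the two whitespace-only
lines after "b" are dropped.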


* Re: [SCRIPT] chomp: trim trailing whitespace
  2006-05-27  2:27 [SCRIPT] chomp: trim trailing whitespace Jeff Garzik
  2006-05-27  4:17 ` H. Peter Anvin
@ 2006-05-27 10:15 ` Jan Engelhardt
  2006-05-27 10:24   ` Thomas Glanzmann
  2006-05-27 11:32   ` Jeff Garzik
  2006-05-27 15:28 ` Martin Langhoff
  2 siblings, 2 replies; 14+ messages in thread
From: Jan Engelhardt @ 2006-05-27 10:15 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Git Mailing List, Linux Kernel

> Attached to this email is chomp.pl, a Perl script which removes trailing
> whitespace from several files.  I've had this for years, as trailing whitespace
> is one of my pet peeves.
>
> Now that git-applymbox complains loudly whenever a patch adds trailing
> whitespace, I figured this script may be useful to others.
>

Pretty long script. How about this two-liner? It does not show 'bytes 
chomped' but it also trims trailing whitespace.

#!/usr/bin/perl -i -p
s/[ \t\r\n]+$//



Jan Engelhardt
-- 


* Re: [SCRIPT] chomp: trim trailing whitespace
  2006-05-27 10:15 ` Jan Engelhardt
@ 2006-05-27 10:24   ` Thomas Glanzmann
  2006-05-27 10:36     ` Neil Brown
  2006-05-27 11:32   ` Jeff Garzik
  1 sibling, 1 reply; 14+ messages in thread
From: Thomas Glanzmann @ 2006-05-27 10:24 UTC (permalink / raw)
  To: Linux Kernel; +Cc: GIT, Jan Engelhardt

Hello,

> #!/usr/bin/perl -i -p
> s/[ \t\r\n]+$//

perl -p -i -e 's/\s+$//' file1 file2 file3 ...

        Thomas


* Re: [SCRIPT] chomp: trim trailing whitespace
  2006-05-27 10:24   ` Thomas Glanzmann
@ 2006-05-27 10:36     ` Neil Brown
  0 siblings, 0 replies; 14+ messages in thread
From: Neil Brown @ 2006-05-27 10:36 UTC (permalink / raw)
  To: Thomas Glanzmann; +Cc: Linux Kernel, GIT, Jan Engelhardt

On Saturday May 27, sithglan@stud.uni-erlangen.de wrote:
> Hello,
> 
> > #!/usr/bin/perl -i -p
> > s/[ \t\r\n]+$//
> 
> perl -p -i -e 's/\s+$//' file1 file2 file3 ...
> 

Uhm... have either of you actually tried those?  When I tried, I lost
all the '\n' characters :-(

  perl -pi -e 's/[ \t\r]+$//'  *.[ch]

seems to actually work.

NeilBrown
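
The difference between the two substitutions can be reproduced without
touching any file. In the sketch below, each function stands in for one
iteration of "perl -p", where $_ holds a whole line including its "\n". The
function names are made up for the example.

```perl
use strict;
use warnings;

# \s includes "\n", so the line terminator is removed as well.
sub aggressive { my ($l) = @_; $l =~ s/\s+$//;      return $l }

# Without "\n" in the class, the match stops just before the newline.
sub careful    { my ($l) = @_; $l =~ s/[ \t\r]+$//; return $l }

my @lines = ("foo();  \n", "bar();\n");
print "aggressive: ", join('|', map { aggressive($_) } @lines), "\n";
print "careful:    ", join('|', map { careful($_) } @lines), "\n";
```

With the first form every line loses its "\n", so consecutive lines run
together when printed. The second form trims the blanks and leaves the
newline in place.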


* Re: [SCRIPT] chomp: trim trailing whitespace
  2006-05-27 10:15 ` Jan Engelhardt
  2006-05-27 10:24   ` Thomas Glanzmann
@ 2006-05-27 11:32   ` Jeff Garzik
  2006-05-27 11:48     ` Dmitry Fedorov
  2006-05-27 12:42     ` Jan Engelhardt
  1 sibling, 2 replies; 14+ messages in thread
From: Jeff Garzik @ 2006-05-27 11:32 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Git Mailing List, Linux Kernel

Jan Engelhardt wrote:
>> Attached to this email is chomp.pl, a Perl script which removes trailing
>> whitespace from several files.  I've had this for years, as trailing whitespace
>> is one of my pet peeves.
>>
>> Now that git-applymbox complains loudly whenever a patch adds trailing
>> whitespace, I figured this script may be useful to others.
>>
> 
> Pretty long script. How about this two-liner? It does not show 'bytes 
> chomped' but it also trims trailing whitespace.
> 
> #!/usr/bin/perl -i -p
> s/[ \t\r\n]+$//

Yes, it does, but it is a bit too aggressive for what we need :)

	Jeff





* Re: [SCRIPT] chomp: trim trailing whitespace
  2006-05-27  4:17 ` H. Peter Anvin
@ 2006-05-27 11:42   ` Jeff Garzik
  2006-05-28  9:24     ` H. Peter Anvin
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Garzik @ 2006-05-27 11:42 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

H. Peter Anvin wrote:
> Jeff Garzik wrote:
>>
>> Attached to this email is chomp.pl, a Perl script which removes 
>> trailing whitespace from several files.  I've had this for years, as 
>> trailing whitespace is one of my pet peeves.
>>
>> Now that git-applymbox complains loudly whenever a patch adds trailing 
>> whitespace, I figured this script may be useful to others.
>>
> 
> This is the script I use for the same purpose.  It's a bit more 
> sophisticated, in that it detects and avoids binary files, and doesn't 
> throw an error if it encounters a directory (which can happen if you 
> give it a wildcard.)

Chewing the EOF blanks is nice.  The only nit I have is that your script 
rewrites the file even if nothing was changed.

	Jeff





* Re: [SCRIPT] chomp: trim trailing whitespace
  2006-05-27 11:32   ` Jeff Garzik
@ 2006-05-27 11:48     ` Dmitry Fedorov
  2006-05-27 12:42     ` Jan Engelhardt
  1 sibling, 0 replies; 14+ messages in thread
From: Dmitry Fedorov @ 2006-05-27 11:48 UTC (permalink / raw)
  To: Git Mailing List, Linux Kernel

[-- Attachment #1: Type: text/plain, Size: 226 bytes --]

Jan Engelhardt wrote:
>> Attached to this email is chomp.pl, a Perl script which removes trailing
>> whitespace from several files.  I've had this for years, as trailing
>> whitespace is one of my pet peeves.

And here are my scripts.

[-- Attachment #2: find-text-files --]
[-- Type: application/octet-stream, Size: 7833 bytes --]

#!/usr/bin/perl -w

=head1 NAME

find-text-files - traverse a file tree and guess plain text files

=head1 SYNOPSIS

find-text-files [options] dir ...

=head1 DESCRIPTION

This program traverses a file tree, guesses which files are plain text,
and prints their names to STDOUT.

=cut

require 5.004;
use strict;
use integer;
use File::Find;
use Getopt::Long;
use IPC::Open2;


sub usage {
	warn "\n".join(" ", @_)."\n" if @_;
	warn <<EOF;

Usage:
 find-text-files [-exclude='perlre' ...] [-include='perlre' ...] \
                 [-total] [-excluded] [-included] [-selectors]   \
                      dir ...

EOF
	exit(1);
}


=head1 PARAMETERS

=over 4

=item dir ...

Directories list.

=back

=cut

=head1 OPTIONS

=over 4

=item -exclude='perlre' ...

Perl regular expression, case insensitive.
Matching file names are excluded from the output list.

=item -include='perlre' ...

Perl regular expression, case insensitive.
Matching file names are included in the output list.

=head2 Note

The directory part of the file name is stripped before matching, so
'^filename\.ext$' matches exactly filename.ext
in any directory.

=item -total

print statistic counters to STDERR.

=item -excluded

print to STDERR what files are excluded and why.

=item -included

print to STDERR what files are included and why.

=item -selectors

Print the exclude/include regular expressions and file suffixes, then exit.

=back


=head1 HOW IT WORKS

Each file name is checked in this order:

* check against exclude RE; a matching file is excluded (see -exclude option);

  if not matched, then:

* check against include RE; a matching file is included (see -include option);

  if not matched, then:

* check against the binary suffixes table; a matching file is excluded;

  if not matched, then:

* check against the text suffixes table; a matching file is included;

  if not matched, then:

* check with file(1)


All of this avoids file(1)'s misdetection of some text files
and reduces the time spent on file(1) calls.


=head1 NOTES

Does not follow symlinks.

Zero size files are skipped.

=cut


my $     help_option = 0;
my @  include_options;
my @  exclude_options;
my $    total_option = 0;
my $ excluded_option = 0;
my $ included_option = 0;
my $selectors_option = 0;

GetOptions(
	'help'		=> \$     help_option,
	'exclude=s'	=> \@  exclude_options,
	'include=s'	=> \@  include_options,
	'total'		=> \$    total_option,
	'excluded'	=> \$ excluded_option,
	'included'	=> \$ included_option,
	'selectors'	=> \$selectors_option,
) or usage;

usage if $help_option;


my %bin_suffices;
my %txt_suffices;

BEGIN
{
    map { $bin_suffices{$_} = undef }
    (
     'gif', 'tif', 'tiff', 'png', 'jpg', 'jpeg',
     'avi', 'mpg', 'mpeg',
     'o', 'obj', 'exe',
     'cab', 'a', 'rar', 'arj', 'zip', 'tar', 'cpio',
     'z', 'gz', 'bz', 'bz2', 'tgz', 'tbz', 'tbz2',
     'iso', 'bin', 'img', 'imag', 'image',
     'diff', 'patch' # diff/patch files could have EOL spaces!
    );

    map { $txt_suffices{$_} = undef }
    (
     'txt', 'text', 'html', 'htm', 'xml', 'php',
     'c', 'cpp', 'c++', 'cc', 'cxx',
     'h', 'hpp', 'h++', 'hh', 'hxx',
     'asm', 'inc', 'mod',
     'for', 'f77', 'g77',
     'java', 'jav',
     'bas', 'vb',
     'pl', 'pm', 'pod',
     'make', 'mak', 'mk',
     'awk', 'sh', 'bat', 'cmd', 'rexx', 'rex',
     'sql', 'def', 'man',
     'cvsignore'
    );
}


my $exclude_re = '(,v$)';
map { $exclude_re .= '|('.lc $_.')'; } @exclude_options;

my $include_re = '(^makefile$)';
map { $include_re .= '|('.lc $_.')'; } @include_options;


if ($selectors_option)
{
    my $bin_suffices = join(" ", sort keys %bin_suffices);
    my $txt_suffices = join(" ", sort keys %txt_suffices);
    print STDERR "\n";
    print STDERR "Exclude RE: ".$exclude_re."\n";
    print STDERR "\n";
    print STDERR "Include RE: ".$include_re."\n";
    print STDERR "\n";
    print STDERR "Exclude suffices: ".$bin_suffices."\n";
    print STDERR "\n";
    print STDERR "Include suffices: ".$txt_suffices."\n";
    print STDERR "\n";
    exit 0;
}


scalar(@ARGV) >= 1 or usage("no directory specified");


my (
    $total_files_checked,
    $total_files_empty,
    $total_files_excluded_by_re,
    $total_files_included_by_re,
    $total_files_excluded_by_suffix,
    $total_files_included_by_suffix,
    $total_files_excluded_by_file,
    $total_files_included_by_file
   ) = (0,0,0,0,0,0,0,0);


sub _by($$$$)
{
    my ($inex_option, $inex_str, $by, $name) = @_;
    printf(STDERR "%scluded by %13s: %s\n", $inex_str, $by, $name)
	if $inex_option;
}

sub inby($$) { _by($included_option, 'in', $_[0], $_[1]); }
sub exby($$) { _by($excluded_option, 'ex', $_[0], $_[1]); }


local *FILE_RH;
local *FILE_WH;
my $file_pid;

$SIG{PIPE} = sub
{
    close    FILE_WH;
    waitpid $file_pid, 0;
    die "file(1) pipe broken"
};

$file_pid = open2(\*FILE_RH, \*FILE_WH, "file -n -f -" )
    or die "can't fork: $!";


#+ main work
$| = 1; # STDOUT autoflush
find(\&onfile, @ARGV);
#- main work


close    FILE_WH;
waitpid $file_pid, 0;


format STDERR =

Total files: checked   empty
             -------  -------
             @>>>>>>  @>>>>>>
$total_files_checked, $total_files_empty

              suffix     re    file(1)
             -------  -------  -------
excluded by: @>>>>>>  @>>>>>>  @>>>>>>
$total_files_excluded_by_suffix, $total_files_excluded_by_re, $total_files_excluded_by_file
included by: @>>>>>>  @>>>>>>  @>>>>>>
$total_files_included_by_suffix, $total_files_included_by_re, $total_files_included_by_file

.

write STDERR if $total_option;

exit 0;


sub onfile()
{
    my $shortname = $_;
    my $ fullname = "$File::Find::name";

    return unless -f $shortname;
    $total_files_checked++;

    if ( ! -s $shortname )
    {
	$total_files_empty++;
	return;
    }

    my $lcshortname = lc $shortname;


    if ( $lcshortname =~ m/$exclude_re/o )
    {
	exby('RE', $fullname);
	$total_files_excluded_by_re++;
	return;
    }

    if ( $lcshortname =~ m/$include_re/o )
    {
	inby('RE', $fullname);
	$total_files_included_by_re++;
    }
    else # check by suffix
    {
	my ($suffix) = $lcshortname =~ m/\.([^\.]+)$/;

	if ( defined $suffix and length $suffix and
	     exists $bin_suffices{$suffix} )
	{
	    exby('binary suffix', $fullname);
	    $total_files_excluded_by_suffix++;
	    return;
	}

	if ( defined $suffix and length $suffix and
	     exists $txt_suffices{$suffix} )
	{
	    inby('text suffix', $fullname);
	    $total_files_included_by_suffix++;
	}
	else	# check by file(1)
	{
	    print FILE_WH $fullname."\n"
		or die "bad write to file(1) pipe: $! $?";

	    my $fread = <FILE_RH>;
	    defined $fread or die "bad read from file(1) pipe: $! $?";

	    chomp $fread;

	    unless ( $fread =~ m|^(.+):\s+(.+)$| )
	    {
		die "file(1) output does not match pattern:\n$fread\n";
	    }

	    my ($fname,$fdesc) = ($1,$2);
	    die "can't parse file(1) output:\n$fread\n"
		if (! defined $fname) or (! defined $fdesc);

	    die "file name after file(1) does not match the original one:\n".
		"\tbefore: $fullname\n".
		"\tafter : $fname\n"
		if $fname ne $fullname;

	    if ( $fdesc =~ m/ text|source/ )
	    {
		inby('file(1)', $fullname);
		$total_files_included_by_file++;
	    }
	    else
	    {
		exby('file(1)', $fullname);
		$total_files_excluded_by_file++;
		return;
	    }
	}
    }

    print $fullname . "\n";
}


=head1 AUTHOR

Dmitry Fedorov <dm.fedorov@gmail.com>

=head1 COPYRIGHT

Copyright (C) 2003 Dmitry Fedorov <dm.fedorov@gmail.com>

=head1 LICENSE

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License,
or (at your option) any later version.

=head1 DISCLAIMER

The author disclaims any responsibility for any mangling of your system
etc, that this script may cause.

=cut
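
The decision order described under HOW IT WORKS can be condensed into a single
function, which may make the cascade easier to follow. The sketch below is not
part of find-text-files: the name classify, the shortened suffix tables and the
'ask-file' return value are assumptions, and the call to file(1) is left out.

```perl
use strict;
use warnings;

# Hypothetical, shortened tables; the script's real lists are longer.
my %bin_suffix = map { $_ => 1 } qw(gif png o gz diff patch);
my %txt_suffix = map { $_ => 1 } qw(txt c h pl sh);

# Classify a bare file name (no directory part). Returns 'exclude',
# 'include', or 'ask-file' when only file(1) could decide.
sub classify {
    my ($name, $exclude_re, $include_re) = @_;
    my $lc = lc $name;

    return 'exclude' if $lc =~ $exclude_re;    # 1. exclude RE
    return 'include' if $lc =~ $include_re;    # 2. include RE

    my ($suffix) = $lc =~ /\.([^.]+)$/;
    if (defined $suffix) {
        return 'exclude' if $bin_suffix{$suffix};    # 3. binary suffix
        return 'include' if $txt_suffix{$suffix};    # 4. text suffix
    }
    return 'ask-file';                               # 5. file(1)
}

# The two default patterns used by the script.
my ($ex, $in) = (qr/,v$/, qr/^makefile$/);
printf "%-10s %s\n", $_, classify($_, $ex, $in)
    for (qw(Makefile main.c logo.PNG foo.c,v README));
```

Only names that fall through every table, such as README here, would cost a
round trip to file(1).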


[-- Attachment #3: truncate-eol-whitespace --]
[-- Type: application/octet-stream, Size: 4916 bytes --]

#!/usr/bin/perl -w

=head1 NAME

truncate-eol-whitespace - truncate white spaces at end of line.

=head1 SYNOPSIS

  truncate-eol-whitespace [-total] [-truncated] [-nontruncated] [-dry-run] \
	[file ...] [-f files-from]

=head1 DESCRIPTION

This program truncates extra white space just before the end of line
in the specified files. File names can be given as parameters
and/or read from a specified file, '-' for STDIN.

=head1 EXAMPLE

Truncate all text files under DIR:

 find-text-files DIR -total | truncate-eol-whitespace -total -f -

=cut

require 5.004;
use strict;
use integer;
use Getopt::Long;


sub usage {
	warn "\n".join(" ", @_)."\n" if @_;
	warn <<EOF;

Usage:
  truncate-eol-whitespace [-total] [-truncated] [-nontruncated] [-dry-run] \
	[file ...] [-f files-from]

Warning: this script truncates files! Use -dry-run for test first.

EOF
	exit(1);
}


=head1 OPTIONS

=over 4

=item -total

print statistic counters to STDERR.

=item -truncated

print to STDOUT which files were truncated;

=item -nontruncated

print to STDOUT which files were not truncated;

=item -dry-run

Do not write files, report only

=item file ...

Files to truncate (optional)

=item -f files-from

File name with file names to truncate, one name per line.
Use '-' for STDIN.

=back

=cut


my $        help_option = 0;
my $     dry_run_option = 0;
my $  files_from_option;
my $       total_option = 0;
my $   truncated_option = 0;
my $nontruncated_option = 0;

GetOptions(
	'help'		=> \$        help_option,
	'total'		=> \$       total_option,
	'truncated'	=> \$   truncated_option,
	'nontruncated'	=> \$nontruncated_option,
	'dry-run'	=> \$     dry_run_option,
	'f=s'		=> \$  files_from_option,
) or usage;

usage if $help_option;


usage("no files specified")
	if (! defined $files_from_option) and scalar(@ARGV) < 1;


my (
    $total_files_checked,
    $total_files_empty,
    $total_files_truncated,
    $total_files_no_chars_truncated
   ) = (0,0,0,0);

my ( $total_chars_readed, $total_chars_truncated ) = (0,0);


sub truncate_file($)
{
    my $fname = shift;

    $total_files_checked++;

    if ( ! -f $fname )
    {
	print STDERR "is not a plain file: ".$fname."\n";
	return;
    }

    if ( ! -s $fname )
    {
	print STDERR "zero size file: ".$fname."\n";
	$total_files_empty++;
	return;
    }


    local $/ = undef;	# no records, slurp mode

    local *IN;
    open   IN, "< $fname"
	or die "Can't open $fname: $!";

    my $file = <IN>;
    defined $file or die "Can't read $fname: $!";

    close IN;


    my $length_before = length $file;
    $total_chars_readed += $length_before;

    $file =~ s/[\000-\011\013-\040]+\n/\n/mg;

    my $length_after  = length $file;


    my $chars_truncated = $length_before - $length_after;

    die "size become greater after truncating: ".$fname
	if $chars_truncated < 0;


    if ( $chars_truncated > 0 )
    {
	$total_files_truncated++;
	$total_chars_truncated += $chars_truncated;
    }
    else
    {
	$total_files_no_chars_truncated++;
    }

    if    ( $chars_truncated >0 and $truncated_option )
    {
	printf(STDOUT  "%6u of %6u chars truncated from %s\n",
		       $chars_truncated, $length_before, $fname);
    }
    elsif ( $chars_truncated==0 and $nontruncated_option )
    {
	printf(STDOUT       "%-16s chars truncated from %s\n", 'no', $fname);
    }

    if ( ! $dry_run_option and $chars_truncated > 0 )
    {
	local *OUT;
	open   OUT, "> $fname" or die "Can't open $fname: $!";
	print  OUT $file or die "Can't write $fname: $!";
	close  OUT  or die "Error on closing $fname: $!";
    }
}



#+ main work

# do process file names from the @ARGV first
truncate_file($_) while defined ($_ = shift);

if (defined $files_from_option) # do process file names from file|STDIN
{
    local *IN;
    open  (IN, $files_from_option) or die "Can't open $files_from_option: $!";

    while ( my $fname = <IN> )
    {
	chomp $fname;
	next if length($fname) < 1; # skip empty lines

	truncate_file($fname);
    }
}

#- main work


format STDERR =

Total files: checked   empty   truncated  non-truncated
             -------  -------  -------   -------
             @>>>>>>  @>>>>>>  @>>>>>>   @>>>>>>
$total_files_checked, $total_files_empty, $total_files_truncated, $total_files_no_chars_truncated

Total chars truncated: @>>>>>> of @<<<<<<<<<<<<<<<<<
$total_chars_truncated, $total_chars_readed

.

write STDERR if $total_option;

exit 0;


=head1 AUTHOR

Dmitry Fedorov <dm.fedorov@gmail.com>

=head1 COPYRIGHT

Copyright (C) 2003 Dmitry Fedorov <dm.fedorov@gmail.com>

=head1 LICENSE

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License,
or (at your option) any later version.

=head1 DISCLAIMER

The author disclaims any responsibility for any mangling of your system
etc, that this script may cause.

=cut
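
To make the effect of the character class in truncate-eol-whitespace concrete,
the sketch below applies the same substitution to a string instead of a file.
The function name truncate_eol is an assumption for the example. The /m flag
is left out because the pattern uses neither ^ nor $.

```perl
use strict;
use warnings;

# Delete any run of bytes in \000-\011 or \013-\040 (everything up to
# and including space, except "\n") that sits directly before a newline.
# Returns the cleaned text and the number of characters removed.
# (truncate_eol is a hypothetical helper, not part of the script.)
sub truncate_eol {
    my ($text) = @_;
    (my $clean = $text) =~ s/[\000-\011\013-\040]+\n/\n/g;
    return ($clean, length($text) - length($clean));
}

my ($clean, $n) = truncate_eol("a \t\nb\r\nc  ");
printf "%d chars truncated\n", $n;
```

Two consequences follow from the pattern: a "\r" before "\n" is removed, so
CRLF line endings become LF, and blanks on a last line that has no newline
are left alone.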



* Re: [SCRIPT] chomp: trim trailing whitespace
  2006-05-27 11:32   ` Jeff Garzik
  2006-05-27 11:48     ` Dmitry Fedorov
@ 2006-05-27 12:42     ` Jan Engelhardt
  2006-05-28  8:33       ` Keith Owens
  1 sibling, 1 reply; 14+ messages in thread
From: Jan Engelhardt @ 2006-05-27 12:42 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Git Mailing List, Linux Kernel

>> Pretty long script. How about this two-liner? It does not show 'bytes
>> chomped' but it also trims trailing whitespace.
>> 
>> #!/usr/bin/perl -i -p
>> s/[ \t\r\n]+$//
>
> Yes, it does, but a bit too aggressive for what we need :)
>
Whoops, should have been s/[ \t\r]+$//
And the CL form is
  perl -i -pe '...'

Somehow, you can't group it to -ipe, but who cares.


Jan Engelhardt
-- 


* Re: [SCRIPT] chomp: trim trailing whitespace
  2006-05-27  2:27 [SCRIPT] chomp: trim trailing whitespace Jeff Garzik
  2006-05-27  4:17 ` H. Peter Anvin
  2006-05-27 10:15 ` Jan Engelhardt
@ 2006-05-27 15:28 ` Martin Langhoff
  2006-05-27 16:13   ` Linus Torvalds
  2 siblings, 1 reply; 14+ messages in thread
From: Martin Langhoff @ 2006-05-27 15:28 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Git Mailing List, Linux Kernel

I love perl golf for this kind of stuff... but git-stripspace is part
of git already. Even then, I tend to do it with perl -pi -e ''
constructs ;-)

cheers,


m


* Re: [SCRIPT] chomp: trim trailing whitespace
  2006-05-27 15:28 ` Martin Langhoff
@ 2006-05-27 16:13   ` Linus Torvalds
  2006-05-28 10:00     ` Johannes Schindelin
  0 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2006-05-27 16:13 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Jeff Garzik, Git Mailing List, Linux Kernel



On Sun, 28 May 2006, Martin Langhoff wrote:
>
> I love perl golf for this kind of stuff... but git-stripspace is part
> of git already. Even then, I tend to do it with perl -pi -e ''
> constructs ;-)

Well, git-stripspace actually does something slightly different, in that
it also removes extraneous all-whitespace lines from the beginning, the
end, and the middle (in the middle, the rule is: two or more empty lines
are collapsed into one).

I.e. it's a total hack for parsing just commit messages (and it is in C,
because I can personally write 25 lines of C in about a millionth of the
time I can write 3 lines of perl).

		Linus
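
For comparison with the Perl one-liners in this thread, the rules described
above could be sketched as follows. This is a rough rendition of the described
behaviour only, not git's C code, and the function name stripspace is borrowed
purely for illustration.

```perl
use strict;
use warnings;

# Trim trailing whitespace, drop blank lines at the beginning and the
# end, and collapse runs of blank lines in the middle into one.
# (Illustrative sketch; not taken from git.)
sub stripspace {
    my @lines = @_;
    my @out;
    my $pending_blank = 0;
    foreach my $line (@lines) {
        (my $l = $line) =~ s/\s+$//;
        if ($l eq '') {
            # Remember a blank only once real text has been seen.
            $pending_blank = 1 if @out;
            next;
        }
        push @out, "\n" if $pending_blank;    # at most one blank line
        push @out, "$l\n";
        $pending_blank = 0;
    }
    return @out;    # a pending blank at the end is never emitted
}

print stripspace("\n", "subject  \n", "\n", " \n", "body\n", "\n");
```

The blank line is emitted lazily, just before the next non-blank line, which
is what keeps leading and trailing blanks out of the result.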


* Re: [SCRIPT] chomp: trim trailing whitespace
  2006-05-27 12:42     ` Jan Engelhardt
@ 2006-05-28  8:33       ` Keith Owens
  0 siblings, 0 replies; 14+ messages in thread
From: Keith Owens @ 2006-05-28  8:33 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Jeff Garzik, Git Mailing List, Linux Kernel

Jan Engelhardt (on Sat, 27 May 2006 14:42:02 +0200 (MEST)) wrote:
>And the CL form is
>  perl -i -pe '...'
>Somehow, you can't group it to -ipe, but who cares.

-i takes an optional extension, which is used to create backup files.
As such, -i must be followed by a space if you want no extension (and
no backup); in -ipe, the "pe" is taken as the backup extension.
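
Inside a script, the same switch is available as the special variable $^I,
which makes the role of the extension easier to see. The sketch below is an
illustration under that assumption; the function name trim_in_place is
made up, and the substitution is the corrected one from earlier in the thread.

```perl
use strict;
use warnings;

# Trim trailing blanks in place. $backup_ext plays the role of the
# text after -i: '' means no backup, '.bak' keeps a copy as FILE.bak.
# (trim_in_place is a hypothetical helper name.)
sub trim_in_place {
    my ($backup_ext, @files) = @_;
    return unless @files;    # an empty @ARGV would make <> read STDIN

    local ($^I, @ARGV) = ($backup_ext, @files);
    while (<>) {
        s/[ \t\r]+$//;
        print;               # goes to the file being rewritten
    }
    return;
}

# Example calls (not executed here):
#   trim_in_place('',     glob('*.c'));   # like: perl -i -pe '...' *.c
#   trim_in_place('.bak', glob('*.c'));   # like: perl -i.bak -pe '...' *.c
```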



* Re: [SCRIPT] chomp: trim trailing whitespace
  2006-05-27 11:42   ` Jeff Garzik
@ 2006-05-28  9:24     ` H. Peter Anvin
  0 siblings, 0 replies; 14+ messages in thread
From: H. Peter Anvin @ 2006-05-28  9:24 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 907 bytes --]

Jeff Garzik wrote:
> H. Peter Anvin wrote:
>> Jeff Garzik wrote:
>>>
>>> Attached to this email is chomp.pl, a Perl script which removes 
>>> trailing whitespace from several files.  I've had this for years, as 
>>> trailing whitespace is one of my pet peeves.
>>>
>>> Now that git-applymbox complains loudly whenever a patch adds 
>>> trailing whitespace, I figured this script may be useful to others.
>>>
>>
>> This is the script I use for the same purpose.  It's a bit more 
>> sophisticated, in that it detects and avoids binary files, and doesn't 
>> throw an error if it encounters a directory (which can happen if you 
>> give it a wildcard.)
> 
> Chewing the EOF blanks is nice.  The only nit I have is that your script 
> rewrites the file even if nothing was changed.
> 

Ah, good point.  Attached version fixes that.  It still doesn't break 
hard links, which may be a desirable feature.

	-hpa

[-- Attachment #2: cleanfile --]
[-- Type: text/plain, Size: 1418 bytes --]

#!/usr/bin/perl
#
# Clean a text file of stealth whitespace
#

use bytes;

$name = 'cleanfile';

foreach $f ( @ARGV ) {
    print STDERR "$name: $f\n";

    if (! -f $f) {
	print STDERR "$f: not a file\n";
	next;
    }
    
    if (!open(FILE, '+<', $f)) {
	print STDERR "$name: Cannot open file: $f: $!\n";
	next;
    }

    binmode FILE;

    # First, verify that it is not a binary file
    $is_binary = 0;

    while (read(FILE, $data, 65536) > 0) {
	if ($data =~ /\0/) {
	    $is_binary = 1;
	    last;
	}
    }

    if ($is_binary) {
	print STDERR "$name: $f: binary file\n";
	next;
    }

    seek(FILE, 0, 0);

    $in_bytes = 0;
    $out_bytes = 0;
    $blank_bytes = 0;

    @blanks = ();
    @lines  = ();

    while ( defined($line = <FILE>) ) {
	$in_bytes += length($line);
	$line =~ s/[ \t\r\n]*$/\n/;

	if ( $line eq "\n" ) {
	    push(@blanks, $line);
	    $blank_bytes += length($line);
	} else {
	    push(@lines, @blanks);
	    $out_bytes += $blank_bytes;
	    push(@lines, $line);
	    $out_bytes += length($line);
	    @blanks = ();
	    $blank_bytes = 0;
	}
    }

    # Any blanks at the end of the file are discarded

    if ($in_bytes != $out_bytes) {
	# Only write to the file if changed
	seek(FILE, 0, 0);
	print FILE @lines;

	if ( !defined($where = tell(FILE)) ||
	     !truncate(FILE, $where) ) {
	    die "$name: Failed to truncate modified file: $f: $!\n";
	}
    }

    close(FILE);
}
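
Because the script above writes through the handle it opened with '+<', the
inode stays the same and every hard link sees the cleaned content. If breaking
hard links were wanted instead, one common approach is to write a temporary
file and rename it over the original. The sketch below shows that alternative
under stated assumptions: it is not part of cleanfile, and the function name
and temporary file template are invented for the example.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Replace $file with $content by writing a temporary file in the same
# directory and renaming it over the original. The original inode is
# left alone, so other hard links keep the old content.
# (replace_breaking_links is a hypothetical helper name.)
sub replace_breaking_links {
    my ($file, $content) = @_;

    my ($fh, $tmp) = tempfile('cleanXXXXXX', DIR => dirname($file));
    binmode $fh;
    print {$fh} $content or die "write $tmp: $!\n";
    close($fh)           or die "close $tmp: $!\n";

    # tempfile() creates the file with restrictive permissions, so
    # carry over the permission bits of the original if it exists.
    my $mode = (stat $file)[2];
    chmod($mode & 07777, $tmp) if defined $mode;

    rename($tmp, $file) or die "rename $tmp -> $file: $!\n";
    return;
}
```

The temporary file is created next to the target so that rename() stays
within one filesystem.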


* Re: [SCRIPT] chomp: trim trailing whitespace
  2006-05-27 16:13   ` Linus Torvalds
@ 2006-05-28 10:00     ` Johannes Schindelin
  0 siblings, 0 replies; 14+ messages in thread
From: Johannes Schindelin @ 2006-05-28 10:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Martin Langhoff, Jeff Garzik, Git Mailing List, Linux Kernel

Hi,

On Sat, 27 May 2006, Linus Torvalds wrote:

> Well, git-stripspace actually does something slightly differently, in that 
> it also removes extraneous all-whitespace lines from the beginning, the 
> end, and the middle (in the middle, the rule is: two or more empty lines 
> are collapsed into one).
> 
> Ie it's a total hack for parsing just commit messages (and it is in C, 
> because I can personally write 25 lines of C in about a millionth of the 
> time I can write 3 lines of perl).

But there is no good reason not to add some code and a command line 
switch, so that this tool with a very generic name actually performs what 
a normal person would expect from that name.

Ciao,
Dscho


