linux-kernel.vger.kernel.org archive mirror
* [PATCH] cleanfile: a script to clean up stealth whitespace
@ 2007-03-12 19:16 H. Peter Anvin
  2007-03-13  6:14 ` Andrew Morton
  2007-03-13 14:16 ` Arnaud Giersch
  0 siblings, 2 replies; 5+ messages in thread
From: H. Peter Anvin @ 2007-03-12 19:16 UTC
  To: Andrew Morton, Linux Kernel Mailing List

This script cleans up various classes of stealth whitespace.  In
particular, it cleans up:

- Whitespace (spaces or tabs) before newline;
- DOS line endings (CR before LF);
- Space before tab (spaces are deleted or converted to tabs);
- Empty lines at end of file.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 scripts/cleanfile |  126 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 126 insertions(+), 0 deletions(-)
 create mode 100755 scripts/cleanfile

diff --git a/scripts/cleanfile b/scripts/cleanfile
new file mode 100755
index 0000000..f1ba8aa
--- /dev/null
+++ b/scripts/cleanfile
@@ -0,0 +1,126 @@
+#!/usr/bin/perl -w
+#
+# Clean a text file -- or directory of text files -- of stealth whitespace.
+# WARNING: this can be a highly destructive operation.  Use with caution.
+#
+
+use bytes;
+use File::Basename;
+
+#
+# Clean up space-tab sequences, either by removing spaces or
+# replacing them with tabs.
+sub clean_space_tabs($)
+{
+    no bytes;			# Tab alignment depends on characters
+
+    my($li) = @_;
+    my($lo) = '';
+    my $pos = 0;
+    my $nsp = 0;
+    my($i, $c);
+
+    for ($i = 0; $i < length($li); $i++) {
+	$c = substr($li, $i, 1);
+	if ($c eq "\t") {
+	    my $npos = ($pos+$nsp+8) & ~7;
+	    my $ntab = ($npos >> 3) - ($pos >> 3);
+	    $lo .= "\t" x $ntab;
+	    $pos = $npos;
+	    $nsp = 0;
+	} elsif ($c eq "\n" || $c eq "\r") {
+	    $lo .= " " x $nsp;
+	    $pos += $nsp;
+	    $nsp = 0;
+	    $lo .= $c;
+	    $pos = 0;
+	} elsif ($c eq " ") {
+	    $nsp++;
+	} else {
+	    $lo .= " " x $nsp;
+	    $pos += $nsp;
+	    $nsp = 0;
+	    $lo .= $c;
+	    $pos++;
+	}
+    }
+    $lo .= " " x $nsp;
+    return $lo;
+}
+
+$name = basename($0);
+
+foreach $f ( @ARGV ) {
+    print STDERR "$name: $f\n";
+
+    if (! -f $f) {
+	print STDERR "$f: not a file\n";
+	next;
+    }
+
+    if (!open(FILE, '+<', $f)) {
+	print STDERR "$name: Cannot open file: $f: $!\n";
+	next;
+    }
+
+    binmode FILE;
+
+    # First, verify that it is not a binary file; consider any file
+    # with a zero byte to be a binary file.  Is there any better, or
+    # additional, heuristic that should be applied?
+    $is_binary = 0;
+
+    while (read(FILE, $data, 65536) > 0) {
+	if ($data =~ /\0/) {
+	    $is_binary = 1;
+	    last;
+	}
+    }
+
+    if ($is_binary) {
+	print STDERR "$name: $f: binary file\n";
+	next;
+    }
+
+    seek(FILE, 0, 0);
+
+    $in_bytes = 0;
+    $out_bytes = 0;
+    $blank_bytes = 0;
+
+    @blanks = ();
+    @lines  = ();
+
+    while ( defined($line = <FILE>) ) {
+	$in_bytes += length($line);
+	$line =~ s/[ \t\r]*$//;		# Remove trailing spaces, tabs and CRs
+	$line = clean_space_tabs($line);
+
+	if ( $line eq "\n" ) {
+	    push(@blanks, $line);
+	    $blank_bytes += length($line);
+	} else {
+	    push(@lines, @blanks);
+	    $out_bytes += $blank_bytes;
+	    push(@lines, $line);
+	    $out_bytes += length($line);
+	    @blanks = ();
+	    $blank_bytes = 0;
+	}
+    }
+
+    # Any blanks at the end of the file are discarded
+
+    if ($in_bytes != $out_bytes) {
+	# Only write to the file if changed
+	seek(FILE, 0, 0);
+	print FILE @lines;
+
+	if ( !defined($where = tell(FILE)) ||
+	     !truncate(FILE, $where) ) {
+	    die "$name: Failed to truncate modified file: $f: $!\n";
+	}
+    }
+
+    close(FILE);
+}
-- 
1.5.0.3
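
An aside for readers following the column arithmetic in clean_space_tabs()
above: ($pos+$nsp+8) & ~7 rounds up to the next 8-column tab stop, and the
difference of the two >> 3 terms is the number of tabs needed to reach that
stop from the last emitted column.  The sketch below isolates that
calculation so it can be tried on its own.  The helper names next_tab_stop
and tabs_needed are illustrative only and do not appear in the patch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the patch): column of the first 8-column
# tab stop strictly after column $col.  Columns count from 0.
sub next_tab_stop {
    my ($col) = @_;
    return ($col + 8) & ~7;
}

# Hypothetical helper (not in the patch): number of tabs that replace
# $nsp pending spaces followed by one tab, starting at column $pos.
# This mirrors the $npos / $ntab computation in clean_space_tabs().
sub tabs_needed {
    my ($pos, $nsp) = @_;
    my $npos = next_tab_stop($pos + $nsp);
    return ($npos >> 3) - ($pos >> 3);
}

print tabs_needed(0, 2), "\n";   # two spaces then a tab: prints 1
print tabs_needed(0, 8), "\n";   # eight spaces then a tab: prints 2
print tabs_needed(6, 3), "\n";   # "abcdef", three spaces, tab: prints 2
```

The second case shows why spaces are sometimes converted rather than simply
deleted: eight spaces already cross a tab stop, so dropping them would move
the following text one stop to the left.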


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] cleanfile: a script to clean up stealth whitespace
  2007-03-13  6:14 ` Andrew Morton
@ 2007-03-13  5:33   ` H. Peter Anvin
  2007-03-13  5:37     ` H. Peter Anvin
  0 siblings, 1 reply; 5+ messages in thread
From: H. Peter Anvin @ 2007-03-13  5:33 UTC
  To: Andrew Morton; +Cc: linux-kernel

Andrew Morton wrote:
>> On Mon, 12 Mar 2007 12:16:30 -0700 "H. Peter Anvin" <hpa@zytor.com> wrote:
>> This script cleans up various classes of stealth whitespace.  In
>> particular, it cleans up:
>>
>> - Whitespace (spaces or tabs) before newline;
>> - DOS line endings (CR before LF);
>> - Space before tab (spaces are deleted or converted to tabs);
>> - Empty lines at end of file.
> 
> Fair enough.
> 
> It'd be nice to have a clean-up-a-patch version of this.  So it does
> all these things, except it only changes lines which start with ^+.

It can do everything except kill empty lines at the end of the file; a 
patch simply doesn't contain enough information to know if blank lines 
are inserted at the end of a file as opposed to the middle of the file.

It can, of course, be done if the unpatched material is available, 
probably by applying the patch and seeing what happens.

Let me know if you still want it; I'll whip it up.

	-hpa
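
To make the idea concrete, here is a rough sketch of the per-line part of a
patch cleaner, limited to stripping trailing spaces, tabs and CRs from added
lines.  It is not the script that was eventually written; the function name
clean_patch_lines is made up for illustration, and it assumes a well-formed
unified diff whose context lines still carry their leading space.  Hunk
headers are parsed so that the "+++" file header is not mistaken for an
added line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch: clean only the added ("+") lines of a unified diff.
# Headers, context lines and removed lines pass through unchanged.
sub clean_patch_lines {
    my @out;
    my ($old_left, $new_left) = (0, 0);   # lines remaining in current hunk

    foreach my $orig (@_) {
        my $line = $orig;                 # copy; do not modify caller's data
        if ($old_left <= 0 && $new_left <= 0) {
            # Outside a hunk: look for a header such as "@@ -1,2 +1,3 @@".
            # A missing count means 1.
            if ($line =~ /^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@/) {
                $old_left = defined $1 ? $1 : 1;
                $new_left = defined $2 ? $2 : 1;
            }
        } elsif ($line =~ /^\+/) {
            $new_left--;
            $line =~ s/[ \t\r]+$//;       # "$" also matches before a final "\n"
        } elsif ($line =~ /^-/) {
            $old_left--;
        } elsif ($line =~ /^ /) {
            $old_left--;
            $new_left--;
        }
        # Anything else inside a hunk (e.g. "\ No newline at end of file")
        # is passed through without being counted.
        push @out, $line;
    }
    return @out;
}

my @patch = (
    "--- a/f\n",
    "+++ b/f\n",
    "@@ -1,2 +1,3 @@\n",
    " context  \n",
    "-old \n",
    "+new \t\n",
    "+also\r\n",
);
print clean_patch_lines(@patch);
```

Space-before-tab cleanup on added lines would additionally need to skip the
one-character "+" prefix before doing any column arithmetic; that is left
out here.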


* Re: [PATCH] cleanfile: a script to clean up stealth whitespace
  2007-03-13  5:33   ` H. Peter Anvin
@ 2007-03-13  5:37     ` H. Peter Anvin
  0 siblings, 0 replies; 5+ messages in thread
From: H. Peter Anvin @ 2007-03-13  5:37 UTC
  To: H. Peter Anvin; +Cc: Andrew Morton, linux-kernel

H. Peter Anvin wrote:
>>
>> Fair enough.
>>
>> It'd be nice to have a clean-up-a-patch version of this.  So it does
>> all these things, except it only changes lines which start with ^+.
> 
> It can do everything except kill empty lines at the end of the file; a 
> patch simply doesn't contain enough information to know if blank lines 
> are inserted at the end of a file as opposed to the middle of the file.
> 
> It can, of course, be done if the unpatched material is available, 
> probably by applying the patch and seeing what happens.
> 

Correction: for a context/unified diff it can be done by observing that 
there is no context left at the end of the file.  It won't work if the 
file already has empty space at the end of it, but that's probably good
enough.  I'll cook something up.

	-hpa
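
One possible reading of that observation, as a sketch only: if the body of a
hunk ends in added lines with no context after them, those lines sit at the
end of the new file (assuming the diff was generated with a nonzero number
of context lines), so any blank ones among them are candidates for removal.
The helper name trailing_added_blanks is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given the body lines of one hunk (everything after
# the "@@ ... @@" header), return how many blank added lines sit at the
# very end of the hunk with no context line after them.
sub trailing_added_blanks {
    my @hunk = @_;
    my $n = 0;
    for my $line (reverse @hunk) {
        next if $line =~ /^\\/;                    # "\ No newline ..." marker
        last unless $line =~ /^\+[ \t\r]*\n?$/;    # stop at first non-blank
        $n++;
    }
    return $n;
}

my @hunk = (
    " last real line\n",
    "+new text\n",
    "+\n",
    "+ \t\n",
);
print trailing_added_blanks(@hunk), "\n";   # prints 2
```

Actually dropping those lines would also mean lowering the new-file line
count in the hunk header by the same amount, which this sketch does not
attempt.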


* Re: [PATCH] cleanfile: a script to clean up stealth whitespace
  2007-03-12 19:16 [PATCH] cleanfile: a script to clean up stealth whitespace H. Peter Anvin
@ 2007-03-13  6:14 ` Andrew Morton
  2007-03-13  5:33   ` H. Peter Anvin
  2007-03-13 14:16 ` Arnaud Giersch
  1 sibling, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2007-03-13  6:14 UTC
  To: H. Peter Anvin; +Cc: linux-kernel

> On Mon, 12 Mar 2007 12:16:30 -0700 "H. Peter Anvin" <hpa@zytor.com> wrote:
> This script cleans up various classes of stealth whitespace.  In
> particular, it cleans up:
> 
> - Whitespace (spaces or tabs) before newline;
> - DOS line endings (CR before LF);
> - Space before tab (spaces are deleted or converted to tabs);
> - Empty lines at end of file.

Fair enough.

It'd be nice to have a clean-up-a-patch version of this.  So it does
all these things, except it only changes lines which start with ^+.


* Re: [PATCH] cleanfile: a script to clean up stealth whitespace
  2007-03-12 19:16 [PATCH] cleanfile: a script to clean up stealth whitespace H. Peter Anvin
  2007-03-13  6:14 ` Andrew Morton
@ 2007-03-13 14:16 ` Arnaud Giersch
  1 sibling, 0 replies; 5+ messages in thread
From: Arnaud Giersch @ 2007-03-13 14:16 UTC
  To: H. Peter Anvin; +Cc: Andrew Morton, Linux Kernel Mailing List

On Monday, 12 March 2007, at about 20:16:30 (+0100), H. Peter Anvin wrote:

> This script cleans up various classes of stealth whitespace.  In
> particular, it cleans up:
>
> - Whitespace (spaces or tabs) before newline;
> - DOS line endings (CR before LF);
> - Space before tab (spaces are deleted or converted to tabs);
> - Empty lines at end of file.

What about checking for a newline at end of file?  Something like:

[...]

> +    if ($is_binary) {
> +	print STDERR "$name: $f: binary file\n";
> +	next;
> +    }

    # Add a newline at end of file, if needed.
    seek(FILE, -1, 2);
    if (read(FILE, $last_char, 1) == 1 && $last_char ne "\n") {
	seek(FILE, 0, 2);
	print FILE "\n";
    }

> +    seek(FILE, 0, 0);
> +
> +    $in_bytes = 0;
> +    $out_bytes = 0;
> +    $blank_bytes = 0;

[...]

Regards,
        Arnaud Giersch
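
For trying the end-of-file check outside the script, the same logic can be
wrapped in a standalone function.  This is a sketch under a few assumptions:
the name ensure_final_newline is invented here, a lexical filehandle
replaces the script's FILE handle, and empty files are deliberately left
alone so that seeking to one byte before the end is never attempted on them.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(SEEK_END);

# Hypothetical standalone version of the check suggested above: append a
# "\n" to $path if the file is non-empty and does not already end in one.
# Returns 1 if a newline was added, 0 otherwise.
sub ensure_final_newline {
    my ($path) = @_;
    my $added = 0;

    open(my $fh, '+<', $path) or die "Cannot open file: $path: $!\n";
    binmode $fh;

    if (-s $fh) {                            # skip empty files
        seek($fh, -1, SEEK_END) or die "Cannot seek: $path: $!\n";
        my $last_char;
        my $got = read($fh, $last_char, 1);
        if (defined $got && $got == 1 && $last_char ne "\n") {
            # Reposition before switching from reading to writing.
            seek($fh, 0, SEEK_END) or die "Cannot seek: $path: $!\n";
            print $fh "\n";
            $added = 1;
        }
    }

    close($fh) or die "Cannot close file: $path: $!\n";
    return $added;
}
```

Calling it twice on the same file is harmless: the second call finds the
newline already present and returns 0.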

