* [PATCH v2] Add script for importing bits-and-pieces to Git.
@ 2009-08-24 17:09 Peter Krefting
  2009-08-24 19:10 ` Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Krefting @ 2009-08-24 17:09 UTC (permalink / raw)
  To: git

Allows the user to import version history that is stored in bits and
pieces in the file system, for instance snapshots of old development
trees, or day-by-day backups. A configuration file is used to
describe the relationship between the different files and allow
describing branches and merges, as well as authorship and commit
messages.

Output is created in a format compatible with git-fast-import.

Full documentation is provided inline in perldoc format.

Signed-off-by: Peter Krefting <peter@softwolves.pp.se>
---
Changed the use of stat() as suggested by Thomas Adam.

 contrib/fast-import/import-directories.perl |  332 +++++++++++++++++++++++++++
 1 files changed, 332 insertions(+), 0 deletions(-)
 create mode 100755 contrib/fast-import/import-directories.perl

diff --git a/contrib/fast-import/import-directories.perl b/contrib/fast-import/import-directories.perl
new file mode 100755
index 0000000..98079ad
--- /dev/null
+++ b/contrib/fast-import/import-directories.perl
@@ -0,0 +1,332 @@
+#!/usr/bin/perl -w
+#
+# Copyright 2008-2009 Peter Krefting <peter@softwolves.pp.se>
+#
+# ------------------------------------------------------------------------
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+#
+# ------------------------------------------------------------------------
+
+=pod
+
+=head1 NAME
+
+import-directories - Import bits and pieces to Git.
+
+=head1 SYNOPSIS
+
+B<import-directories.perl> F<configfile> F<outputfile>
+
+=head1 DESCRIPTION
+
+Script to import projects that were version controlled by the "copy the
+source directory to a new location and edit it there" method into a
+real version control system. Handles projects with arbitrary branching
+and version trees, taking a file describing the inputs and generating a
+file compatible with the L<git-fast-import(1)> format.
+
+=head1 CONFIGURATION FILE
+
+=head2 Format
+
+The configuration file is using a standard I<.ini> format.
+
+ ; Comments start with semi-colons
+ [section]
+ key=value
+
+=head2 Global configuration
+
+Global configuration is done in the B<[config]> section, which should be
+the first section in the file. Configuration can be changed by
+repeating configuration sections later on.
+
+ [config]
+ ; configure conversion of CRLFs. "convert" means that all CRLFs
+ ; should be converted into LFs (suitable for the core.autocrlf
+ ; setting set to true in Git). "none" means that all data is
+ ; treated as binary.
+ crlf=convert
+
+=head2 Revision configuration
+
+Each revision that is to be imported is described in three
+sections. Sections should be defined chronologically, so that a
+revision's parent has always been defined when a new revision
+is introduced. All sections for one revision should be defined
+before defining the next revision.
+
+Revisions are specified numerically, but the numbers need not be
+consecutive, only unique.
+
+=pod
+
+=head3 Revision description section
+
+A section whose section name is just an integer gives meta-data
+about the revision.
+
+ [3]
+ ; author sets the author of the revisions
+ author=Peter Krefting <peter@softwolves.pp.se>
+ ; branch sets the branch that the revision should be committed to
+ branch=master
+ ; parent describes the revision that is the parent of this commit
+ ; (optional)
+ parent=1
+ ; merges describes a revision that is merged into this commit
+ ; (optional; can be repeated)
+ merges=2
+ ; selects one file to take the timestamp from
+ ; (optional; if unspecified, the most recent file from the .files
+ ;  section is used)
+ timestamp=3/source.c
+
+=head3 Revision contents section
+
+A section whose section name is an integer followed by B<.files>
+describes the files included in this revision.
+
+ [3.files]
+ ; the key is the path inside the repository, the value is the path
+ ; as seen from the importer script.
+ source.c=3/source.c
+ source.h=3/source.h
+
+=head3 Revision commit message section
+
+A section whose section name is an integer followed by B<.message>
+gives the commit message. This section is read verbatim.
+
+ [3.message]
+ Implement foobar.
+ ; trailing blank lines are ignored.
+
+=cut
+
+# Globals
+use strict;
+use integer;
+my $crlfmode = 0;
+my @revs;
+my (%revmap, %message, %files, %author, %branch, %parent, %merges, %time, %timesource);
+my $sectiontype = 0;
+my $rev = 0;
+my $mark = 1;
+
+# Check command line
+if ($#ARGV != 1 || $ARGV[0] =~ /^--?h/)
+{
+    exec('perldoc', $0);
+    exit 1;
+}
+
+# Open configuration
+my $config = $ARGV[0];
+open CFG, '<', $config or die "Cannot open configuration file \"$config\": $!\n";
+
+# Open output
+my $output = $ARGV[1];
+open OUT, '>', $output or die "Cannot create output file \"$output\": $!\n";
+binmode OUT;
+
+LINE: while (my $line = <CFG>)
+{
+	$line =~ s/\r?\n$//;
+	next LINE if $sectiontype != 4 && $line eq '';
+	next LINE if $line =~ /^;/;
+	my $oldsectiontype = $sectiontype;
+	my $oldrev = $rev;
+
+	# Sections
+	if ($line =~ m"^\[(config|(\d+)(|\.files|\.message))\]$")
+	{
+		if ($1 eq 'config')
+		{
+			$sectiontype = 1;
+		}
+		elsif ($3 eq '')
+		{
+			$sectiontype = 2;
+			$rev = $2;
+			# Create a new revision
+			die "Duplicate rev: $line\n " if defined $revmap{$rev};
+			print "Reading revision $rev\n";
+			push @revs, $rev;
+			$revmap{$rev} = $mark ++;
+			$time{$revmap{$rev}} = 0;
+		}
+		elsif ($3 eq '.files')
+		{
+			$sectiontype = 3;
+			$rev = $2;
+			die "Revision mismatch: $line\n " unless $rev == $oldrev;
+		}
+		elsif ($3 eq '.message')
+		{
+			$sectiontype = 4;
+			$rev = $2;
+			die "Revision mismatch: $line\n " unless $rev == $oldrev;
+		}
+		else
+		{
+			die "Internal parse error: $line\n ";
+		}
+		next LINE;
+	}
+
+	# Parse data
+	if ($sectiontype != 4)
+	{
+		# Key and value
+		if ($line =~ m"^(.*)=(.*)$")
+		{
+			my ($key, $value) = ($1, $2);
+			# Global configuration
+			if (1 == $sectiontype)
+			{
+				if ($key eq 'crlf')
+				{
+					$crlfmode = 1, next LINE if $value eq 'convert';
+					$crlfmode = 0, next LINE if $value eq 'none';
+				}
+				die "Unknown configuration option: $line\n ";
+			}
+			# Revision specification
+			if (2 == $sectiontype)
+			{
+				my $current = $revmap{$rev};
+				$author{$current} = $value, next LINE if $key eq 'author';
+				$branch{$current} = $value, next LINE if $key eq 'branch';
+				$parent{$current} = $value, next LINE if $key eq 'parent';
+				$timesource{$current} = $value, next LINE if $key eq 'timestamp';
+				push(@{$merges{$current}}, $value), next LINE if $key eq 'merges';
+				die "Unknown revision option: $line\n ";
+			}
+			# Filespecs
+			if (3 == $sectiontype)
+			{
+				# Add the file and create a marker
+				die "File not found: $line\n " unless -f $value;
+				my $current = $revmap{$rev};
+				${$files{$current}}{$key} = $mark;
+				my $time = &fileblob($value, $crlfmode, $mark ++);
+
+				# Update revision timestamp if more recent than other
+				# files seen, or if this is the file we have selected
+				# to take the time stamp from using the "timestamp"
+				# directive.
+				if ((defined $timesource{$current} && $timesource{$current} eq $value)
+				    || $time > $time{$current})
+				{
+					$time{$current} = $time;
+				}
+			}
+		}
+		else
+		{
+			die "Parse error: $line\n ";
+		}
+	}
+	else
+	{
+		# Commit message
+		my $current = $revmap{$rev};
+		if (defined $message{$current})
+		{
+			$message{$current} .= "\n";
+		}
+		$message{$current} .= $line;
+	}
+}
+close CFG;
+
+# Start spewing out data for git-fast-import
+foreach my $commit (@revs)
+{
+	# Progress
+	print OUT "progress Creating revision $commit\n";
+
+	# Create commit header
+	my $mark = $revmap{$commit};
+
+	# Branch and commit id
+	print OUT "commit refs/heads/", $branch{$mark}, "\nmark :", $mark, "\n";
+
+	# Author and timestamp
+	die "No timestamp defined for $commit (no files?)\n" unless defined $time{$mark};
+	print OUT "committer ", $author{$mark}, " ", $time{$mark}, " +0100\n";
+
+	# Commit message
+	die "No message defined for $commit\n" unless defined $message{$mark};
+	my $message = $message{$mark};
+	$message =~ s/\n$//; # Kill trailing empty line
+	print OUT "data ", length($message), "\n", $message, "\n";
+
+	# Parent and any merges
+	print OUT "from :", $revmap{$parent{$mark}}, "\n" if defined $parent{$mark};
+	if (defined $merges{$mark})
+	{
+		foreach my $merge (@{$merges{$mark}})
+		{
+			print OUT "merge :", $revmap{$merge}, "\n";
+		}
+	}
+
+	# Output file marks
+	print OUT "deleteall\n"; # start from scratch
+	foreach my $file (sort keys %{$files{$mark}})
+	{
+		print OUT "M 644 :", ${$files{$mark}}{$file}, " $file\n";
+	}
+	print OUT "\n";
+}
+
+# Create one file blob
+sub fileblob
+{
+	my ($filename, $crlfmode, $mark) = @_;
+
+	# Import the file
+	print OUT "progress Importing $filename\nblob\nmark :$mark\n";
+	open FILE, '<', $filename or die "Cannot read $filename\n ";
+	binmode FILE;
+	my ($size, $mtime) = (stat(FILE))[7,9];
+	my $file;
+	read FILE, $file, $size;
+	close FILE;
+	$file =~ s/\r\n/\n/g if $crlfmode;
+	print OUT "data ", length($file), "\n", $file, "\n";
+
+	return $mtime;
+}
+
+__END__
+
+=pod
+
+=head1 EXAMPLES
+
+B<import-directories.perl> F<project.import>
+
+=head1 AUTHOR
+
+Copyright 2008-2009 Peter Krefting E<lt>peter@softwolves.pp.se>
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation.
+
+=cut
-- 
1.6.3.3


* Re: [PATCH v2] Add script for importing bits-and-pieces to Git.
  2009-08-24 17:09 [PATCH v2] Add script for importing bits-and-pieces to Git Peter Krefting
@ 2009-08-24 19:10 ` Junio C Hamano
  2009-08-25 18:59   ` Peter Krefting
  0 siblings, 1 reply; 4+ messages in thread
From: Junio C Hamano @ 2009-08-24 19:10 UTC (permalink / raw)
  To: Peter Krefting; +Cc: git

Peter Krefting <peter@softwolves.pp.se> writes:

> +=head1 NAME
> +
> +import-directories - Import bits and pieces to Git.
> +
> +=head1 SYNOPSIS
> +
> +B<import-directories.perl> F<configfile>
> +
> +=head1 DESCRIPTION
> +
> +Script to import projects that were version controlled by the "copy the
> +source directory to a new location and edit it there" method into a
> +real version control system. Handles projects with arbitrary branching
> +and version trees, taking a file describing the inputs and generating a
> +file compatible with the L<git-fast-import(1)> format.

Nice write-up.

> +=head1 CONFIGURATION FILE
> +
> +=head2 Format
> +
> +The configuration file is using a standard I<.ini> format.

You might want to mention that this format is different from what git uses
for its .git/config and .gitmodules files, and that none of the rules that
apply to them (namely, two/three-level names, case sensitivity, allowed
letters in variable names, stripping of whitespace around values, and value
quoting) described in 'git help config' apply to this file.

It was the first "huh" I had when reading your description below, when you
used "[3]" as a section name and "source.c" as a variable.

> +=head2 Revision configuration
> +
> +Each revision that is to be imported is described in three
> +sections. Sections should be defined chronologically, so that a
> +revision's parent has always been defined when a new revision
> +is introduced. All sections for one revision should be defined
> +before defining the next revision.
> +
> +Revisions are specified numerically, but the numbers need not be
> +consecutive, only unique.

You might want to clarify that they do not need to be monotonically
increasing either---you can have #3 as the root and then #1 with its
parent set to #3, right?
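
Reading the v2 parser, the only check made on a revision number appears to
be the duplicate test, so a file shaped like the sketch below should be
accepted. The snapshot directory names and the author are made up for
illustration, and the listed files have to exist on disk for the script to
run.

```ini
[config]
crlf=convert

; root revision, deliberately labelled 3
[3]
author=A U Thor <author@example.com>
branch=master

[3.files]
source.c=project-v0.0.1/source.c

[3.message]
Initial snapshot.

; child revision, labelled 1, with the root as its parent
[1]
author=A U Thor <author@example.com>
branch=master
parent=3

[1.files]
source.c=project-v0.0.2/source.c

[1.message]
Second snapshot.
```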

As you seem to be supporting merges, you might want to say topologically
instead of chronologically---this is minor, as you give more precise
definition "all parents must come before a child" in that sentence later.

> + timestamp=3/source.c
> + ...
> +=head3 Revision contents section
> +
> +A section whose section name is an integer followed by B<.files>
> +describes the files included in this revision.

To somebody who knows git it may be obvious but perhaps "describes all the
files" (or "lists all the files") would be clearer?  Otherwise, a naive
reader might be frustrated by getting unexpected results after listing
only modified or added files in this section.

> + [3.files]
> + ; the key is the path inside the repository, the value is the path
> + ; as seen from the importer script.
> + source.c=3/source.c
> + source.h=3/source.h

How are problematic characters in pathnames (say, SP, '=' or worse LF)
handled?  Do they need to be quoted, and if so how?
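
For reference, a minimal sketch of what the v2 parser does with such lines.
The pattern is the one the script applies to key=value lines; the helper
name split_keyvalue and the sample lines are hypothetical and exist only to
show the edge cases.

```perl
use strict;
use warnings;

# The same pattern the v2 script applies to lines outside .message sections.
# The first (.*) is greedy, so a line is split at its LAST '='.
sub split_keyvalue
{
    my ($line) = @_;
    return $line =~ /^(.*)=(.*)$/ ? ($1, $2) : ();
}

# Hypothetical config lines, chosen only to show the edge cases.
foreach my $line ('source.c=3/source.c',
                  'source.c = 3/source.c',
                  'a=b.txt=old/a=b.txt')
{
    my ($key, $value) = split_keyvalue($line);
    print "key [$key] value [$value]\n";
}
```

So whitespace around the '=' ends up in both the key and the value, and a
path containing '=' is cut at its last '='. Since the file is read line by
line, an LF in a pathname cannot be expressed at all.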

As an example in the documentation, 3/source.c is a bit unfortunate.  It
risks being misread as saying that the directory name must somehow
match the revision label (numeric section name), which I think is not what
you meant to say here.  Perhaps use something like:

	source.c = project-v0.0.3/source.c

to clarify?

> +=head3 Revision commit message section
> +
> +A section whose section name is an integer followed by B<.message>
> +gives the commit message. This section is read verbatim.

Meaning "everything up to the beginning of the next section"?  Can a
commit message have a line that begins with a '[', perhaps as long as it
does not contain a matching ']', so that such a line is not
misinterpreted as starting a new, possibly invalid, section?
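
A small sketch of how the v2 section test behaves on such lines. The
pattern is copied from the script; the helper name starts_section and the
sample lines are hypothetical.

```perl
use strict;
use warnings;

# Section-header pattern copied from the v2 script.
my $section = qr/^\[(config|(\d+)(|\.files|\.message))\]$/;

# True if the parser would treat this line as the start of a new section.
sub starts_section
{
    my ($line) = @_;
    return $line =~ $section ? 1 : 0;
}

# Hypothetical lines that might appear inside a [N.message] section.
foreach my $line ('[WIP] not finished', '[see bug 12]', '[12]',
                  '[12.files]', '[config]')
{
    printf "%-20s %s\n", $line,
        starts_section($line) ? 'section header' : 'message text';
}
```

Only a line consisting of exactly [config], [N], [N.files] or [N.message]
ends the message, so bracketed text such as "[WIP] ..." stays in it. The
script also skips lines that start with ';' before it looks at the section
type, so such lines never reach the commit message.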


* Re: [PATCH v2] Add script for importing bits-and-pieces to Git.
  2009-08-24 19:10 ` Junio C Hamano
@ 2009-08-25 18:59   ` Peter Krefting
  2009-08-25 20:42     ` Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Krefting @ 2009-08-25 18:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano:

> Nice write-up.

Thank you.

As to your questions, they are very good, and I'll try to update the 
documentation a bit to clarify the points you made. This is the first time
someone other than myself has read it, so I expected some rough edges...

Follow-ups on some of the points (I will address the rest as well):

> You might want to mention that this format is different from what git uses
> for its .git/config and .gitmodules files, and that none of the rules that
> apply to them (namely, two/three-level names, case sensitivity, allowed
> letters in variable names, stripping of whitespace around values, and value
> quoting) described in 'git help config' apply to this file.

A quick question on that: Is it possible to use the git-config parser 
stand-alone from a script like this? Then that note wouldn't need to apply.

> As you seem to be supporting merges, you might want to say topologically 
> instead of chronologically---this is minor, as you give more precise 
> definition "all parents must come before a child" in that sentence later.

I'm not sure I get the distinction here. Could you be a bit more specific 
(or point me to what I have missed in the Git manual)?

> How are problematic characters in pathnames (say, SP, '=' or worse LF)
> handled?  Do they need to be quoted, and if so how?

In the current version: Not at all. :-) I didn't need to, at the time.

-- 
\\// Peter - http://www.softwolves.pp.se/


* Re: [PATCH v2] Add script for importing bits-and-pieces to Git.
  2009-08-25 18:59   ` Peter Krefting
@ 2009-08-25 20:42     ` Junio C Hamano
  0 siblings, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2009-08-25 20:42 UTC (permalink / raw)
  To: Peter Krefting; +Cc: git

Peter Krefting <peter@softwolves.pp.se> writes:

>> You might want to mention that this format is different from what
>> git uses for its .git/config and .gitmodules files, and that none of
>> the rules that apply to them (namely, two/three-level names, case
>> sensitivity, allowed letters in variable names, stripping of
>> whitespace around values, and value quoting) described in 'git help
>> config' apply to this file.
>
> A quick question on that: Is it possible to use the git-config parser
> stand-alone from a script like this? Then that note wouldn't need to
> apply.

Yes, but then you have to update your data language, because some of the
section names and variable names you would want to use in your script are
illegal in "git config" configuration language.  Values and the second
level name in two level section names are more-or-less free form (they
need to be quoted as appropriate), but the first-level section names and
the variable names are case insensitive, do not allow SPs and funnies, and
there is no escaping.  You cannot have "source.c" as the variable name,
for example.

I'd recommend against re-using the git config format for that reason.

Another possibility would be to use something that does not even resemble
the git config format, say, YAML as your data language.  There is no risk
of confusion from the end users if you did so, and we wouldn't need the
note either.

>> As you seem to be supporting merges, you might want to say
>> topologically instead of chronologically---this is minor, as you
>> give more precise definition "all parents must come before a child"
>> in that sentence later.
>
> I'm not sure I get the distinction here. Could you be a bit more
> specific (or point me to what I have missed in the Git manual)?

Your history could be in this shape (numbers are timestamps recorded in
commit): 

             1--4
            /    \
        0--3--6---9--12

when somebody with a skewed clock forked the project at commit 3, worked
on a side branch to create two commits 1 and 4, which are pulled back to
the mainline at commit 9.

Chronological listing would mean 0 1 3 4 6 9 12.  Topological listing
would be either 0 3 1 4 6 9 12 or 0 3 6 1 4 9 12 or 0 3 1 6 4 9 12.
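
The difference can be checked mechanically. The sketch below encodes the
history drawn above (the %parents table and the helper name is_topological
are illustrative, not part of the script) and tests whether an ordering
lists every parent before its child.

```perl
use strict;
use warnings;

# Parents of each commit in the example history; 9 is the merge.
my %parents = (
    3  => [0],
    6  => [3],
    1  => [3],
    4  => [1],
    9  => [6, 4],
    12 => [9],
);

# An ordering is topological if every commit appears after all its parents.
sub is_topological
{
    my ($order, $parents) = @_;
    my %seen;
    foreach my $commit (@$order)
    {
        foreach my $parent (@{$parents->{$commit} || []})
        {
            return 0 unless $seen{$parent};
        }
        $seen{$commit} = 1;
    }
    return 1;
}

foreach my $order ([0, 1, 3, 4, 6, 9, 12],
                   [0, 3, 1, 4, 6, 9, 12],
                   [0, 3, 6, 1, 4, 9, 12],
                   [0, 3, 1, 6, 4, 9, 12])
{
    printf "%-20s %s\n", "@$order",
        is_topological($order, \%parents) ? 'parents first' : 'child before parent';
}
```

The chronological listing fails because commit 1 comes before its parent 3.
The three topological listings pass.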
