All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH/RFC 0/4] Add option to enable filters in git-svn
@ 2016-05-31 15:07 Matteo Bertini
  2016-05-31 15:07 ` [PATCH 1/4] hash-object.c: Allow distinct file/path in stdin mode too Matteo Bertini
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Matteo Bertini @ 2016-05-31 15:07 UTC (permalink / raw)
  To: git; +Cc: Matteo Bertini, normalperson, gitster

===========
Description
===========

This is a RFC for a patch that allows to enable filters in the repositories
created/managed by git-svn.


==================
Example scenario
==================

The usage scenario I’m facing is the migration of a large SVN repository to a
git repository, with big files being transparently handled through git-lfs

To recap, these migrations are usually handled through a transitioning repo
created by git-svn, which is then pushed to a remote git server; this server
acts as a read-only mirror for a while, until the main SVN server is shut down.
For large repos, the process can take several weeks to adjust the migration
scripts, branch mapping, authorship, etc., and it’s also common to update the
mirror incrementally.


Without this patch
==================

Without this patch, the migration needs to be setup with a pipeline with:

* incremental git svn fetch,
* bfg-repo-cleaner with —convert-to-git-lfs option (which puts invalid
  .gitattributes files, see https://github.com/rtyley/bfg-repo-cleaner/issues/143)
* git-filter-branch (to put in place the correct .gitattributes files)

A minimal change in the pipeline results in unrelated branches after
the rebases.

Each deviation from the svn sources must be committed back or saved
as patch and reapplied after the rebase.


With this patch
===============

Everything is much simpler: filters can be triggered during the fetch, so that
git-lfs already runs while the repository is fetched.
At the end of the process, the local git repository is already in its final
form, and no further git-filter-branch/rebases is required.

The transitioning git repo can diverge partially from the svn source
thanks to git svn rebase.


Details
=======

In the current implementation git-svn uses the hash-object command
with the --no-filters option. There is no gain in removing that option
because the git-svn uses temporary file names.

In this patch hash-object accepts a new format for the streaming input,
files and paths, tab separated. git-svn can provide the real path
to hash-object and thanks to that we can enable the filters.


Status
======

The patch is in RFC, no new test where written (yet), just runned for
regressions (https://travis-ci.org/naufraghi/git/builds/134170350).

I'd like to gain some feedback before spending more time on the
feature, mainly:

 * feature comments,
 * coding comments (neither Perl or C are my main languages),
 * user interface (option names, config placements) comments.

Best,
Matteo


Matteo Bertini (4):
  hash-object.c: Allow distinct file/path in stdin mode too.
  Git.pm: Add $path and $enable_filters arguments to
    hash_and_insert_object.
  SVN/Fetcher.pm: Add svn-remote.<id>.enable-filters to enable the
    filters.
  git-svn.perl: Add git svn init --enable-filters option.

 builtin/hash-object.c   | 29 +++++++++++++++++++++++++++--
 git-svn.perl            |  4 ++++
 perl/Git.pm             | 19 +++++++++++++------
 perl/Git/SVN/Fetcher.pm | 16 ++++++++++++----
 4 files changed, 56 insertions(+), 12 deletions(-)

-- 
2.9.0.rc0.39.gb9f310b.dirty

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/4] hash-object.c: Allow distinct file/path in stdin mode too.
  2016-05-31 15:07 [PATCH/RFC 0/4] Add option to enable filters in git-svn Matteo Bertini
@ 2016-05-31 15:07 ` Matteo Bertini
  2016-05-31 15:07 ` [PATCH 2/4] Git.pm: Add $path and $enable_filters arguments to hash_and_insert_object Matteo Bertini
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Matteo Bertini @ 2016-05-31 15:07 UTC (permalink / raw)
  To: git; +Cc: Matteo Bertini, normalperson, gitster, Matteo Bertini

From: Matteo Bertini <matteo@naufraghi.net>

The hash-object command has the --path option to use a name for the filters
that is different from the file containing the data.

This patch exposes the same functionality for the --stdin-paths, using \t
as separator.

Signed-off-by: Matteo Bertini <naufraghi@develer.com>
---
 builtin/hash-object.c | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/builtin/hash-object.c b/builtin/hash-object.c
index f7d3567..f74c6bd 100644
--- a/builtin/hash-object.c
+++ b/builtin/hash-object.c
@@ -58,20 +58,45 @@ static void hash_object(const char *path, const char *type, const char *vpath,
 static void hash_stdin_paths(const char *type, int no_filters, unsigned flags,
 			     int literally)
 {
+	int has_path = 0;
 	struct strbuf buf = STRBUF_INIT;
+	struct strbuf path = STRBUF_INIT;
 	struct strbuf unquoted = STRBUF_INIT;
+	struct strbuf **pair;
 
 	while (strbuf_getline(&buf, stdin) != EOF) {
+		pair = strbuf_split_max(&buf, '\t', 2);
+		if (pair[0]) {
+			if (pair[0]->len && pair[0]->buf[pair[0]->len - 1] == '\t') {
+				strbuf_setlen(pair[0], pair[0]->len - 1);
+			}
+			strbuf_swap(&buf, pair[0]);
+		}
+		if (pair[1]) {
+			strbuf_swap(&path, pair[1]);
+			has_path = 1;
+		}
+		strbuf_list_free(pair);
+
 		if (buf.buf[0] == '"') {
 			strbuf_reset(&unquoted);
 			if (unquote_c_style(&unquoted, buf.buf, NULL))
 				die("line is badly quoted");
 			strbuf_swap(&buf, &unquoted);
 		}
-		hash_object(buf.buf, type, no_filters ? NULL : buf.buf, flags,
-			    literally);
+
+		if (has_path && path.buf[0] == '"') {
+			strbuf_reset(&unquoted);
+			if (unquote_c_style(&unquoted, path.buf, NULL))
+				die("line is badly quoted");
+			strbuf_swap(&path, &unquoted);
+		}
+
+		hash_object(buf.buf, type, no_filters ? NULL : (has_path ? path.buf : buf.buf),
+				  flags, literally);
 	}
 	strbuf_release(&buf);
+	strbuf_release(&path);
 	strbuf_release(&unquoted);
 }
 
-- 
2.9.0.rc0.39.gb9f310b.dirty

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/4] Git.pm: Add $path and $enable_filters arguments to hash_and_insert_object.
  2016-05-31 15:07 [PATCH/RFC 0/4] Add option to enable filters in git-svn Matteo Bertini
  2016-05-31 15:07 ` [PATCH 1/4] hash-object.c: Allow distinct file/path in stdin mode too Matteo Bertini
@ 2016-05-31 15:07 ` Matteo Bertini
  2016-05-31 15:07 ` [PATCH 3/4] SVN/Fetcher.pm: Add svn-remote.<id>.enable-filters to enable the filters Matteo Bertini
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Matteo Bertini @ 2016-05-31 15:07 UTC (permalink / raw)
  To: git; +Cc: Matteo Bertini, normalperson, gitster, Matteo Bertini

From: Matteo Bertini <matteo@naufraghi.net>

The option $enable_filters skips the --no-filters option,
the $path argument provide a path to be used alike the --path argument
to hash-object in the non streaming invocation.

Signed-off-by: Matteo Bertini <naufraghi@develer.com>
---
 perl/Git.pm | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/perl/Git.pm b/perl/Git.pm
index ce7e4e8..6c2e3fc 100644
--- a/perl/Git.pm
+++ b/perl/Git.pm
@@ -963,11 +963,13 @@ The function returns the SHA1 hash.
 
 # TODO: Support for passing FILEHANDLE instead of FILENAME
 sub hash_and_insert_object {
-	my ($self, $filename) = @_;
-
-	carp "Bad filename \"$filename\"" if $filename =~ /[\r\n]/;
+	my ($self, $filename, $path, $enable_filters) = @_;
+	carp "Bad filename \"$filename\"" if $filename =~ /[\r\n\t]/;
+	if (defined($path)) {
+		$filename = join("\t", $filename, $path);
+	}
 
-	$self->_open_hash_and_insert_object_if_needed();
+	$self->_open_hash_and_insert_object_if_needed($enable_filters);
 	my ($in, $out) = ($self->{hash_object_in}, $self->{hash_object_out});
 
 	unless (print $out $filename, "\n") {
@@ -985,13 +987,18 @@ sub hash_and_insert_object {
 }
 
 sub _open_hash_and_insert_object_if_needed {
-	my ($self) = @_;
+	my ($self, $enable_filters) = @_;
 
 	return if defined($self->{hash_object_pid});
 
+	my @command = qw(hash-object -w --stdin-paths);
+	if (!$enable_filters) {
+		push(@command, "--no-filters");
+	}
+
 	($self->{hash_object_pid}, $self->{hash_object_in},
 	 $self->{hash_object_out}, $self->{hash_object_ctx}) =
-		$self->command_bidi_pipe(qw(hash-object -w --stdin-paths --no-filters));
+		$self->command_bidi_pipe(@command);
 }
 
 sub _close_hash_and_insert_object {
-- 
2.9.0.rc0.39.gb9f310b.dirty

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/4] SVN/Fetcher.pm: Add svn-remote.<id>.enable-filters to enable the filters.
  2016-05-31 15:07 [PATCH/RFC 0/4] Add option to enable filters in git-svn Matteo Bertini
  2016-05-31 15:07 ` [PATCH 1/4] hash-object.c: Allow distinct file/path in stdin mode too Matteo Bertini
  2016-05-31 15:07 ` [PATCH 2/4] Git.pm: Add $path and $enable_filters arguments to hash_and_insert_object Matteo Bertini
@ 2016-05-31 15:07 ` Matteo Bertini
  2016-05-31 15:07 ` [PATCH 4/4] git-svn.perl: Add git svn init --enable-filters option Matteo Bertini
  2016-05-31 15:34 ` [PATCH/RFC 0/4] Add option to enable filters in git-svn Matteo Bertini
  4 siblings, 0 replies; 9+ messages in thread
From: Matteo Bertini @ 2016-05-31 15:07 UTC (permalink / raw)
  To: git; +Cc: Matteo Bertini, normalperson, gitster, Matteo Bertini

From: Matteo Bertini <matteo@naufraghi.net>

Given the fact that git-svn uses temporary files to build the index,
provide the real $path to hash_and_insert_object if the filters are
enabled.

Signed-off-by: Matteo Bertini <naufraghi@develer.com>
---
 perl/Git/SVN/Fetcher.pm | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/perl/Git/SVN/Fetcher.pm b/perl/Git/SVN/Fetcher.pm
index d8c21ad..3557abe 100644
--- a/perl/Git/SVN/Fetcher.pm
+++ b/perl/Git/SVN/Fetcher.pm
@@ -1,7 +1,7 @@
 package Git::SVN::Fetcher;
 use vars qw/@ISA $_ignore_regex $_include_regex $_preserve_empty_dirs
             $_placeholder_filename @deleted_gpath %added_placeholder
-            $repo_id/;
+            $repo_id $_enable_filters/;
 use strict;
 use warnings;
 use SVN::Delta;
@@ -46,6 +46,10 @@ sub new {
 		$_placeholder_filename = $v;
 	}
 
+	$k = "svn-remote.$repo_id.enable-filters";
+	$v = eval { command_oneline('config', '--get', '--bool', $k) };
+	$_enable_filters = 1
+		if ($v && $v eq 'true');
 	# Load the list of placeholder files added during previous invocations.
 	$k = "svn-remote.$repo_id.added-placeholder";
 	$v = eval { command_oneline('config', '--get-all', $k) };
@@ -415,9 +419,13 @@ sub close_file {
 				Git::temp_release($tmp_fh, 1);
 			}
 		}
-
-		$hash = $::_repository->hash_and_insert_object(
-				Git::temp_path($fh));
+		if ($_enable_filters) {
+			$hash = $::_repository->hash_and_insert_object(
+					Git::temp_path($fh), $path, $_enable_filters);
+		} else {
+			$hash = $::_repository->hash_and_insert_object(
+					Git::temp_path($fh));
+		}
 		$hash =~ /^[a-f\d]{40}$/ or die "not a sha1: $hash\n";
 
 		Git::temp_release($fb->{base}, 1);
-- 
2.9.0.rc0.39.gb9f310b.dirty

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 4/4] git-svn.perl: Add git svn init --enable-filters option.
  2016-05-31 15:07 [PATCH/RFC 0/4] Add option to enable filters in git-svn Matteo Bertini
                   ` (2 preceding siblings ...)
  2016-05-31 15:07 ` [PATCH 3/4] SVN/Fetcher.pm: Add svn-remote.<id>.enable-filters to enable the filters Matteo Bertini
@ 2016-05-31 15:07 ` Matteo Bertini
  2016-05-31 15:34 ` [PATCH/RFC 0/4] Add option to enable filters in git-svn Matteo Bertini
  4 siblings, 0 replies; 9+ messages in thread
From: Matteo Bertini @ 2016-05-31 15:07 UTC (permalink / raw)
  To: git; +Cc: Matteo Bertini, normalperson, gitster, Matteo Bertini

From: Matteo Bertini <matteo@naufraghi.net>

The enabled option activates the propagation of the real paths towards
hash-object and the removal of the --no-filters option.

Signed-off-by: Matteo Bertini <naufraghi@develer.com>
---
 git-svn.perl | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/git-svn.perl b/git-svn.perl
index 05eced0..4d4d6c4 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -153,6 +153,7 @@ my %init_opts = ( 'template=s' => \$_template, 'shared:s' => \$_shared,
 		  'use-svnsync-props' => sub { $icv{useSvnsyncProps} = 1 },
 		  'rewrite-root=s' => sub { $icv{rewriteRoot} = $_[1] },
 		  'rewrite-uuid=s' => sub { $icv{rewriteUUID} = $_[1] },
+		  'enable-filters' => \$Git::SVN::Fetcher::_enable_filters,
                   %remote_opts );
 my %cmt_opts = ( 'edit|e' => \$_edit,
 		'rmdir' => \$Git::SVN::Editor::_rmdir,
@@ -495,6 +496,9 @@ sub do_git_init_db {
 		command_noisy('config', "$pfx.preserve-empty-dirs", 'true');
 		command_noisy('config', "$pfx.placeholder-filename", $$fname);
 	}
+
+	command_noisy('config', "$pfx.enable-filters", 'true')
+		if defined $Git::SVN::Fetcher::_enable_filters
 }
 
 sub init_subdir {
-- 
2.9.0.rc0.39.gb9f310b.dirty

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH/RFC 0/4] Add option to enable filters in git-svn
  2016-05-31 15:07 [PATCH/RFC 0/4] Add option to enable filters in git-svn Matteo Bertini
                   ` (3 preceding siblings ...)
  2016-05-31 15:07 ` [PATCH 4/4] git-svn.perl: Add git svn init --enable-filters option Matteo Bertini
@ 2016-05-31 15:34 ` Matteo Bertini
  2016-05-31 18:12   ` Eric Wong
  4 siblings, 1 reply; 9+ messages in thread
From: Matteo Bertini @ 2016-05-31 15:34 UTC (permalink / raw)
  To: git; +Cc: normalperson, gitster

Sorry to all, but I missed a Checksum mismatch error, I'll have a look 
and submit an update.

Cheers,
Matteo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH/RFC 0/4] Add option to enable filters in git-svn
  2016-05-31 15:34 ` [PATCH/RFC 0/4] Add option to enable filters in git-svn Matteo Bertini
@ 2016-05-31 18:12   ` Eric Wong
  2016-05-31 20:34     ` Matteo Bertini
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Wong @ 2016-05-31 18:12 UTC (permalink / raw)
  To: Matteo Bertini; +Cc: git, gitster

Matteo Bertini <naufraghi@develer.com> wrote:
> Sorry to all, but I missed a Checksum mismatch error, I'll have a
> look and submit an update.

Sure, when you reroll can you also ensure lines are wrapped at
80 cols or less? (at least for the git-svn code, but probably
for the rest of it, too).

As for the option name, the "enable" prefix doesn't seem
right.  We already have some "use" prefixes (use-svm-props),
so maybe "--use-filters" is better.

I haven't looked into filters at all, yet, but you can probably
use the existing rot13 filter in t0021 for writing tests.

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH/RFC 0/4] Add option to enable filters in git-svn
  2016-05-31 18:12   ` Eric Wong
@ 2016-05-31 20:34     ` Matteo Bertini
  2016-06-02 21:13       ` Eric Wong
  0 siblings, 1 reply; 9+ messages in thread
From: Matteo Bertini @ 2016-05-31 20:34 UTC (permalink / raw)
  To: Eric Wong; +Cc: git, gitster

Il 2016-05-31 20:12 Eric Wong ha scritto:
> Matteo Bertini <naufraghi@develer.com> wrote:
>> Sorry to all, but I missed a Checksum mismatch error, I'll have a
>> look and submit an update.

I've had a look, the problem is a missing smudge filter.
Unluckily the small svn revision range i was using was never updating
a filtered file.

The code actually uses `cat-file --batch` to get the blobs,
but the stored blob is not what SVN::TxDelta::apply need.
What I need is the smudged blob, but cat-file does not supports it.

I can modify cat-file with a new option to call 
`convert_to_working_tree`,
and have the filters applied, for example --use-filters, that expects an
extra field, like $sha\t$path, in the stdin stream.

I don't like a lot putting an high level feature in a low level command,
but --textconv seems very similar to this.

Opinions or other suggestions?

> Sure, when you reroll can you also ensure lines are wrapped at
> 80 cols or less? (at least for the git-svn code, but probably
> for the rest of it, too).

Sure

> As for the option name, the "enable" prefix doesn't seem
> right.  We already have some "use" prefixes (use-svm-props),
> so maybe "--use-filters" is better.

Ok

> I haven't looked into filters at all, yet, but you can probably
> use the existing rot13 filter in t0021 for writing tests.

Thanks for the pointer, I'll have a look.

-- 
Matteo Bertini - naufraghi@develer.com
Develer S.r.l. - http://www.develer.com/
.hardware .software .innovation
Tel.: +39 055 3986627 - ext.: 211

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH/RFC 0/4] Add option to enable filters in git-svn
  2016-05-31 20:34     ` Matteo Bertini
@ 2016-06-02 21:13       ` Eric Wong
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Wong @ 2016-06-02 21:13 UTC (permalink / raw)
  To: Matteo Bertini; +Cc: git, gitster

Matteo Bertini <naufraghi@develer.com> wrote:
> Il 2016-05-31 20:12 Eric Wong ha scritto:
> >Matteo Bertini <naufraghi@develer.com> wrote:
> >>Sorry to all, but I missed a Checksum mismatch error, I'll have a
> >>look and submit an update.
> 
> I've had a look, the problem is a missing smudge filter.
> Unluckily the small svn revision range i was using was never updating
> a filtered file.
> 
> The code actually uses `cat-file --batch` to get the blobs,
> but the stored blob is not what SVN::TxDelta::apply need.
> What I need is the smudged blob, but cat-file does not supports it.
> 
> I can modify cat-file with a new option to call
> `convert_to_working_tree`,
> and have the filters applied, for example --use-filters, that expects an
> extra field, like $sha\t$path, in the stdin stream.
> 
> I don't like a lot putting an high level feature in a low level command,
> but --textconv seems very similar to this.
> 
> Opinions or other suggestions?

In the absense of other opinions or suggestions, I suggest you
try it and see what others think :)

I'm not that experienced at the C code of this project,
but I can try to follow along as best as I can.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-06-02 21:13 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-31 15:07 [PATCH/RFC 0/4] Add option to enable filters in git-svn Matteo Bertini
2016-05-31 15:07 ` [PATCH 1/4] hash-object.c: Allow distinct file/path in stdin mode too Matteo Bertini
2016-05-31 15:07 ` [PATCH 2/4] Git.pm: Add $path and $enable_filters arguments to hash_and_insert_object Matteo Bertini
2016-05-31 15:07 ` [PATCH 3/4] SVN/Fetcher.pm: Add svn-remote.<id>.enable-filters to enable the filters Matteo Bertini
2016-05-31 15:07 ` [PATCH 4/4] git-svn.perl: Add git svn init --enable-filters option Matteo Bertini
2016-05-31 15:34 ` [PATCH/RFC 0/4] Add option to enable filters in git-svn Matteo Bertini
2016-05-31 18:12   ` Eric Wong
2016-05-31 20:34     ` Matteo Bertini
2016-06-02 21:13       ` Eric Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.