* GSOC remote-svn
@ 2012-07-22 21:03 Florian Achleitner
  2012-07-22 21:43 ` Jonathan Nieder
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Florian Achleitner @ 2012-07-22 21:03 UTC (permalink / raw)
  To: jrnieder, davidbarr; +Cc: git

Hi!

Referring to Jonathan's concerns in Saturday night's IRC log:

> [22:59:34] <jrnieder> barrbrain, flyingflo: I'm worried about the remote
> helper project
> [23:00:05] <jrnieder> someone needs to review remote-svn.c to catch things
> like that refspec issue which should be straightforward to an experienced eye

Let me explain the refspec issue:

In the existing code in contrib/svn-fe, commits are always imported to
refs/heads/master; that was hardcoded. So I thought that couldn't be it.
I made the name of the branch to import variable, depending on the name of
the remote.
But my remote-helper didn't advertise the refspec capability, so
transport-helper assumed it imports to refs/heads/master, which is the
default import refspec. The subsequent update of references in
store_updated_refs led to wrong values after the fetch, which I considered
a bug and tried to fix.

In fact I didn't realize that the actual updating of references is not done by 
the remote-helper. I thought the remote-helper would have to evaluate the 
fetch refspec and tell fast-import the correct target branch.
Furthermore I confused 'private namespace' with refs/remotes/<remote's name>/, 
which I considered somehow private too.

After several mailing iterations, showing me that I was wrong, I found what 
the right point is, namely that the remote helper writes references to a 
really private dir in refs/<remote name>/, it doesn't touch anything else, and 
by advertising the 'refspec' capability, git-fetch knows where the private 
refs are and updates non-private references according to the fetch refspec in 
some post-processing in store_updated_refs. (Ok, you will say "of course!", 
but I didn't know that I was wrong and it's hidden in some 1000 lines of 
code).

For me that was not very easy to figure out, and it took a lot of time, but I 
think now remote-svn does it right.
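
As a rough illustration of the two-step mapping described above, here is a
minimal sketch in Python. It is not code from remote-svn.c or
transport-helper.c: the function name map_ref and the remote name "origin" are
made up for the example, and only refspecs with a single '*' per side are
handled.

```python
def map_ref(refspec, ref):
    """Map `ref` through a '<src>:<dst>' refspec (at most one '*' per side).

    Returns the mapped ref name, or None if `ref` does not match the source
    side. A leading '+' (the force flag) does not affect the mapping.
    """
    if refspec.startswith('+'):
        refspec = refspec[1:]
    src, dst = refspec.split(':', 1)
    if '*' not in src:
        return dst if ref == src else None
    prefix, suffix = src.split('*', 1)
    if len(ref) < len(prefix) + len(suffix):
        return None
    if not (ref.startswith(prefix) and ref.endswith(suffix)):
        return None
    middle = ref[len(prefix):len(ref) - len(suffix)]
    return dst.replace('*', middle, 1)


# Helper side: the advertised 'refspec' capability tells git where the
# helper writes its private refs (hypothetical remote name "origin").
private = map_ref('refs/heads/*:refs/origin/*', 'refs/heads/master')

# Fetch side: git itself applies the remote's fetch refspec afterwards.
tracking = map_ref('+refs/heads/*:refs/remotes/origin/*', 'refs/heads/master')

print(private)   # refs/origin/master
print(tracking)  # refs/remotes/origin/master
```

The point of the split is that the helper only ever needs the first mapping;
the second one is applied by git-fetch without the helper's involvement.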

> [23:00:38] <jrnieder> (also, remote-svn.c should be at the toplevel so it
> can be tested more easily with tests in t/
> [23:01:10] <jrnieder> and it should not be named remote-svn, since we
> haven't pinned down details about the svn:: conversion yet.  That's why
> Dmitry's was called git-remote-svn-alpha)

Ok. Why is that important? I think if it's not called remote-svn git doesn't 
find it as a helper for the 'svn' protocol. Actually in my local git tree, I 
have a symlink in the toplevel (to simplify PATH).

> [23:01:45] <jrnieder> I'm happy to review patches but I don't have a lot of
> time for it, which has been a problem:
> [23:02:11] <jrnieder>  * I think I wasn't cc-ed on earlier discussion so
> they seem to come out of the blue.  That's fine, but
> [23:03:05] <jrnieder>  * I really rely on patches that do one logical thing
> with a commit message describing the context and what the patch is trying to
> accomplish.  That makes review way, way easier when it is happening.

Probably I should stop sending proposals or incomplete stuff to the list/you.
The current state may probably be viewed easier in my github repo.

I think for creating patches that are acceptable I will need to squash and 
split a lot of my development commits after the code is somehow finished and 
no longer experimental.

> [23:04:42] <jrnieder> Also it seems very chaotic: there are basic things
> about remote-svn.c that need fixing, and then patches for other things are
> appearing on top of that.
> [23:04:49] <jrnieder> Help?
> [23:05:26] <jrnieder> thanks, and hope that helps

About the current state:

Tester:
I wrote a small simulation script in python that mimics svnrdump's behaviour by 
replaying an existing svn dump file from a start rev up to an end rev to test 
incremental imports. I use it together with a little testrepo shell script.
Will need to bring that into t/ later, after figuring out how the test 
framework works. As it's not finished it's not published.

Incremental import:
By reading the latest svn revision number from a note attached to the private 
master ref, it starts future imports from the next svn revision. That 
basically works well.
It doesn't reuse mark files. What's the point of reusing them? Dmitry's svn-
alpha did that.
All I need to know is the revision to start from and the branch I want to add 
commits to, right? It now simply reads that from the note.
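
A minimal sketch of that lookup, in Python. The note layout is an assumption
made only for this example (a line of the form "Revision-number: N"), as is
the function name next_start_revision; the actual note format used by
remote-svn is not shown in this thread.

```python
import re


def next_start_revision(note_text):
    """Return the svn revision the next incremental import should start from.

    `note_text` is the content of the note on the private master ref, or
    None/'' if nothing has been imported yet. The 'Revision-number: N' line
    is a hypothetical format chosen for this sketch.
    """
    if not note_text:
        return 0  # nothing imported so far: start from the first revision
    match = re.search(r'^Revision-number: (\d+)$', note_text, re.MULTILINE)
    if match is None:
        raise ValueError('note does not record a revision number')
    return int(match.group(1)) + 1


print(next_start_revision('Revision-number: 41\n'))
```

With the note recording revision 41, the next import would start at 42; with
no note at all, the sketch falls back to a full import from revision 0.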

This got stuck on another problem:
Incremental update of the note tree doesn't work. fast-import refuses to 
update the notes tree: '<newsha1> doesn't contain <oldsha1>'.
I don't yet know what's the reason for this.
I'm digging into the internals of notes to find out why..
(no problem with the file tree).

This state hasn't hit the list of course, as it's in no way useful nor 
complete.

I often get caught in the triangle of those three processes (git
transport-helper, fast-import, remote-svn), needing to understand a lot about
the existing two to understand why things don't work and why they need to work
like they do.

--
Florian

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: GSOC remote-svn
  2012-07-22 21:03 GSOC remote-svn Florian Achleitner
@ 2012-07-22 21:43 ` Jonathan Nieder
  2012-07-23  9:42   ` Florian Achleitner
  2012-07-23  7:59 ` GSOC remote-svn Matthieu Moy
  2012-07-23  9:42 ` Florian Achleitner
  2 siblings, 1 reply; 21+ messages in thread
From: Jonathan Nieder @ 2012-07-22 21:43 UTC (permalink / raw)
  To: Florian Achleitner; +Cc: davidbarr, git

Hi,

Florian Achleitner wrote:

> After several mailing iterations, showing me that I was wrong, I found what 
> the right point is, namely that the remote helper writes references to a 
> really private dir in refs/<remote name>/, it doesn't touch anything else, and 
> by advertising the 'refspec' capability, git-fetch knows where the private 
> refs are and updates non-private references according to the fetch refspec in 
> some post-processing in store_updated_refs.

Right, that's fine.  And you did a fine job of navigating the existing
documentation (which could be improved, hint hint).

What I am more concerned about is that you had code you sent a while
ago for review, that this would have been obvious to David, Ram,
Dmitry, or me if we had seen it, and yet none of us gave you that
help.  We are failing at _our_ job of giving you prompt advice and
instead you have had to work on your own.

Isn't it likely that there are multiple other bugs like that which
still haven't been fixed?

That's why I think we need to get into a habit of giving and getting
feedback quickly and incrementally improving work.  Soon, before the
summer ends.

[...]
>> [23:01:10] <jrnieder> and it should not be named remote-svn, since we
>> haven't pinned down details about the svn:: conversion yet.  That's why
>> Dmitry's was called git-remote-svn-alpha)
>
> Ok. Why is that important? I think if it's not called remote-svn git doesn't 
> find it as a helper for the 'svn' protocol.

It finds it as a helper for the 'svn-alpha' protocol instead.

The point is that when I perform the following steps:

	git clone svn://path/to/remote/repo

	... wait a day, update git

	cd repo
	git pull

nobody would expect the result to be a non-fast-forward update caused
by the details of svn-to-git conversion changing.  Using a name like
testsvn or svn-alpha would help in managing expectations --- the
remote helper is meant for experimentation for now and not meant to be
something people can rely on for collaboration.
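
To make the naming rule concrete, here is a small Python sketch of how a URL
selects a helper program. It is an illustration only, not git's
implementation: helper_for is a made-up name, and the set of natively handled
schemes is a simplified assumption.

```python
# Schemes git handles without a remote helper (simplified assumption).
NATIVE_SCHEMES = {'git', 'ssh', 'git+ssh', 'ssh+git', 'file'}


def helper_for(url):
    """Return the remote helper program implied by `url`, or None.

    Two spellings select a helper:
      <transport>::<address>   explicit helper selection
      <transport>://<address>  helper named after the URL scheme
    """
    if '::' in url:
        transport = url.split('::', 1)[0]
        # Only treat the part before '::' as a transport name if it looks
        # like a plain word, not like the start of an ordinary URL or path.
        if transport and '/' not in transport and ':' not in transport:
            return 'git-remote-' + transport
    if '://' in url:
        scheme = url.split('://', 1)[0]
        if scheme not in NATIVE_SCHEMES:
            return 'git-remote-' + scheme
    return None


print(helper_for('svn://path/to/remote/repo'))
print(helper_for('svn-alpha::svn://path/to/remote/repo'))
```

The first URL maps to git-remote-svn and the second to git-remote-svn-alpha,
so a helper installed under an experimental name is only reached by users who
ask for it explicitly.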

[...]
>> [23:03:05] <jrnieder>  * I really rely on patches that do one logical thing
>> with a commit message describing the context and what the patch is trying to
>> accomplish.  That makes review way, way easier when it is happening.
>
> Probably I should stop sending proposals or incomplete stuff to the list/you.
> The current state may probably be viewed easier in my github repo.

No, incomplete stuff is nice.  Just please do explain the _purpose_ of
the code you are sending out.  The best possible outcome is if someone
realizes that something that would have taken hours doesn't need to be
done at all.

[...]
> I wrote a small simulation script in python that mimics svnrdump's behaviour by 
> replaying an existing svn dump file from a start rev up to an end rev to test 
> incremental imports. I use it together with a little testrepo shell script.
> Will need to bring that into t/ later, after figuring out how the test 
> framework works. As it's not finished it's not published.

Sounds neat --- how can one try it out?

> Incremental import:
> By reading the latest svn revision number from a note attached to the private 
> master ref, it starts future imports from the next svn revision. That 
> basically works well.
> It doesn't reuse mark files. What's the point of reusing them? Dmitry's svn-
> alpha did that.
> All I need to know is the revision to start from and the branch I want to add 
> commits to, right? It now simply reads that from the note.

The marks are used to handle copyfrom operations referring to older
revisions.  Are you sure you want to abandon them?  Can you explain a
little more about your plan?

[...]
> This state hasn't hit the list of course, as it's in no way useful nor 
> complete.

A good habit to get into is to make sure your partial progress toward
a goal is in a usable state periodically, even if it is not complete.
That makes it a lot easier to test and to get other people to look it
over.

A rule of thumb is that unless you are adding a new and complicated
feature, it should be possible to fit each change in a patch of around
250 lines including context (not including documentation and tests).

> I often get caught in the triangle of those three processes (git transport-
> helper, fast-import, remote-svn) needing to understand a lot about the 
> existing two to understand why things don't work and why they need to work 
> like they do.

Probably you are finding documentation bugs right and left (yes,
information not being easy to find is a bug) that don't get fixed
because no one has reported them.  Questions are welcome and very
useful.

Hope that helps,
Jonathan

* Re: GSOC remote-svn
  2012-07-22 21:03 GSOC remote-svn Florian Achleitner
  2012-07-22 21:43 ` Jonathan Nieder
@ 2012-07-23  7:59 ` Matthieu Moy
  2012-07-23 11:59   ` Jonathan Nieder
  2012-07-23  9:42 ` Florian Achleitner
  2 siblings, 1 reply; 21+ messages in thread
From: Matthieu Moy @ 2012-07-23  7:59 UTC (permalink / raw)
  To: Florian Achleitner; +Cc: jrnieder, davidbarr, git

Florian Achleitner <florian.achleitner.2.6.31@gmail.com> writes:

> But my remote-helper didn't advertise the refspec capability, so transport-
> helper assumed it imports to refs/heads/master, which is the default import 
> refspec.

The man page for git-remote-helpers says:

refspec <refspec>
    [...] It is recommended that all importers providing the import
    capability use this.

I'm not sure I fully understand the rationale, but one difference
between refs/<remote name>/* and refs/remotes/<remote name> is that
refs/remotes/ is automatically updated by Git on push, while the private
namespace isn't (it only receives updates when importing).

I played a bit with that in git-remote-mediawiki. There's a
configuration variable mediawiki.dumbPush that controls what "push"
does. It can either export the local history without touching local
metadata (and then, it is expected that the user does a "git pull
--rebase" right after to re-import the history), or update the local
metadata (private ref, and notes that keep the <local commit> <->
<remote revision number> map), so that the next "git pull" says
"Already up to date.".

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

* Re: GSOC remote-svn
  2012-07-22 21:43 ` Jonathan Nieder
@ 2012-07-23  9:42   ` Florian Achleitner
  2012-07-23 12:04     ` Jonathan Nieder
  2012-07-23 12:44     ` [PATCH] Add a svnrdump-simulator replaying a dump file for testing Florian Achleitner
  0 siblings, 2 replies; 21+ messages in thread
From: Florian Achleitner @ 2012-07-23  9:42 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Florian Achleitner, davidbarr, git

On Sunday 22 July 2012 16:43:33 Jonathan Nieder wrote:
> Hi,
> 
> Florian Achleitner wrote:
> > After several mailing iterations, showing me that I was wrong, I found
> > what
> > the right point is, namely that the remote helper writes references to a
> > really private dir in refs/<remote name>/, it doesn't touch anything else,
> > and by advertising the 'refspec' capability, git-fetch knows where the
> > private refs are and updates non-private references according to the
> > fetch refspec in some post-processing in store_updated_refs.
> 
> Right, that's fine.  And you did a fine job of navigating the existing
> documentation (which could be improved, hint hint).
> 
> What I am more concerned about is that you had code you sent a while
> ago for review, that this would have been obvious to David, Ram,
> Dmitry, or me if we had seen it, and yet none of us gave you that
> help.  We are failing at _our_ job of giving you prompt advice and
> instead you have had to work on your own.
> 
> Isn't it likely that there are multiple other bugs like that which
> still haven't been fixed?
> 
> That's why I think we need to get into a habit of giving and getting
> feedback quickly and incrementally improving work.  Soon, before the
> summer ends.
> 
> [...]
> 
> >> [23:01:10] <jrnieder> and it should not be named remote-svn, since we
> >> haven't pinned down details about the svn:: conversion yet.  That's why
> >> Dmitry's was called git-remote-svn-alpha)
> > 
> > Ok. Why is that important? I think if it's not called remote-svn git
> > doesn't find it as a helper for the 'svn' protocol.
> 
> It finds it as a helper for the 'svn-alpha' protocol instead.
> 
> The point is that when I perform the following steps:
> 
> 	git clone svn://path/to/remote/repo
> 
> 	... wait a day, update git
> 
> 	cd repo
> 	git pull
> 
> nobody would expect the result to be a non-fast-forward update caused
> by the details of svn-to-git conversion changing.  Using a name like
> testsvn or svn-alpha would help in managing expectations --- the
> remote helper is meant for experimentation for now and not meant to be
> something people can rely on for collaboration.

Ok, that makes sense. Renaming is easily done!

> 
> > I wrote a small simulation script in python that mimics svnrdump's
> > behaviour by replaying an existing svn dump file from a start rev up to
> > an end rev to test incremental imports. I use it together with a little
> > testrepo shell script. Will need to bring that into t/ later, after
> > figuring out how the test framework works. As it's not finished it's not
> > published.
> 
> Sounds neat --- how can one try it out?

I'll send a patch ...

> 
> > Incremental import:
> > By reading the latest svn revision number from a note attached to the
> > private master ref, it starts future imports from the next svn revision.
> > That basically works well.
> > It doesn't reuse mark files. What's the point of reusing them? Dmitry's
> > svn-alpha did that.
> > All I need to know is the revision to start from and the branch I want to
> > add commits to, right? It now simply reads that from the note.
> 
> The marks are used to handle copyfrom operations referring to older
> revisions.  Are you sure you want to abandon them?  Can you explain a
> little more about your plan?

Ok, that makes sense. I didn't need the marks for incremental import. But to 
evaluate the copyfrom props we need some revision->sha1 mapping.
I just added the options to save and import marks to fast-import's command 
line. 
If the file is missing, it will need to be generated from the notes, or the 
whole history will be reimported.
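
A sketch of the revision->sha1 lookup this implies, in Python. The marks file
format (one ":<id> <sha1>" line per mark) is what fast-import writes; that the
mark id equals the svn revision number is an assumption of this example, and
the names read_marks and resolve_copyfrom are made up.

```python
def read_marks(lines):
    """Parse fast-import marks lines (':<id> <sha1>') into {id: sha1}."""
    marks = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        mark, sha1 = line.split(' ', 1)
        if not mark.startswith(':'):
            raise ValueError('malformed marks line: %r' % line)
        marks[int(mark[1:])] = sha1
    return marks


def resolve_copyfrom(marks, copyfrom_rev):
    """Return the commit for a copyfrom source revision, or None if unknown.

    Assumes (for this sketch only) that mark ids are svn revision numbers.
    """
    return marks.get(copyfrom_rev)


# Made-up object names, just to exercise the lookup.
sample = [
    ':1 1111111111111111111111111111111111111111',
    ':2 2222222222222222222222222222222222222222',
]
marks = read_marks(sample)
print(resolve_copyfrom(marks, 2))
print(resolve_copyfrom(marks, 5))
```

A lookup that returns None corresponds to the "file is missing" case: the
mapping would have to be rebuilt from the notes or by reimporting.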

But when I fetch from a git repo that imported from svn, the notes are not 
fetched automatically. In this case I currently lose marks and notes.
What can I do?

Florian

* Re: GSOC remote-svn
  2012-07-22 21:03 GSOC remote-svn Florian Achleitner
  2012-07-22 21:43 ` Jonathan Nieder
  2012-07-23  7:59 ` GSOC remote-svn Matthieu Moy
@ 2012-07-23  9:42 ` Florian Achleitner
  2 siblings, 0 replies; 21+ messages in thread
From: Florian Achleitner @ 2012-07-23  9:42 UTC (permalink / raw)
  To: Florian Achleitner; +Cc: jrnieder, davidbarr, git

On Sunday 22 July 2012 23:03:27 Florian Achleitner wrote:
> This got stuck on another problem:
> Incremental update of the note tree doesn't work. fast-import refuses to 
> update the notes tree: '<newsha1> doesn't contain <oldsha1>'.
> I don't yet know what's the reason for this.
> I'm digging into the internals of notes to find out why..
> (no problem with the file tree).

btw, this one was rather simple, just a syntax error in the fast-import 
stream..

* Re: GSOC remote-svn
  2012-07-23  7:59 ` GSOC remote-svn Matthieu Moy
@ 2012-07-23 11:59   ` Jonathan Nieder
  0 siblings, 0 replies; 21+ messages in thread
From: Jonathan Nieder @ 2012-07-23 11:59 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: Florian Achleitner, davidbarr, git

Matthieu Moy wrote:

> The man page for git-remote-helpers says:
>
> refspec <refspec>
>     [...] It is recommended that all importers providing the import
>     capability use this.
>
> I'm not sure I fully understand the rationale, but one difference
> between refs/<remote name>/* and refs/remotes/<remote name> is that
> refs/remotes/ is automatically updated by Git on push, while the private
> namespace isn't (it only receives updates when importing).

It's mostly to allow "git fetch" to avoid non fast-forward updates
unless -f was used or the refspec starts with +.
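
Stated as code, the rule reads roughly as follows. This is a sketch of the
sentence above, not of git's fetch implementation; update_allowed is a made-up
name, and whether the update is a fast-forward is passed in rather than
computed.

```python
def update_allowed(refspec, is_fast_forward, force=False):
    """Decide whether a fetch may update the destination ref.

    Non-fast-forward updates go through only when forced, either globally
    (-f) or per refspec (leading '+').
    """
    return is_fast_forward or force or refspec.startswith('+')


print(update_allowed('refs/heads/*:refs/remotes/origin/*', False))
print(update_allowed('+refs/heads/*:refs/remotes/origin/*', False))
```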

I always liked the idea of tweaking the fast-import stream format to
allow the import to happen on no branch at all since it would avoid
all these questions.  Maybe another day.

Thanks,
Jonathan

* Re: GSOC remote-svn
  2012-07-23  9:42   ` Florian Achleitner
@ 2012-07-23 12:04     ` Jonathan Nieder
  2012-07-23 12:44     ` [PATCH] Add a svnrdump-simulator replaying a dump file for testing Florian Achleitner
  1 sibling, 0 replies; 21+ messages in thread
From: Jonathan Nieder @ 2012-07-23 12:04 UTC (permalink / raw)
  To: Florian Achleitner; +Cc: Florian Achleitner, davidbarr, git

Florian Achleitner wrote:

> But when I fetch from a git repo that imported from svn, the notes are not 
> fetched automatically. In this case I currently lose marks and notes.
> What can I do?

In the long term, git will need to be tweaked to automatically fetch
notes along with branches by default.  There are other reasons not
related to remote helpers to want this, too.

In the short term, we can document which ref the notes are expected to
be fetched to.  Maybe someone interested would provide commands like
"git remote-testsvn --clone <repo>" and "git remote-testsvn
--add-remote <nickname> <repo>" as a stopgap to set up the appropriate
fetch refspec automatically so the user doesn't have to worry about
it.

Thanks,
Jonathan

* [PATCH] Add a svnrdump-simulator replaying a dump file for testing.
  2012-07-23  9:42   ` Florian Achleitner
  2012-07-23 12:04     ` Jonathan Nieder
@ 2012-07-23 12:44     ` Florian Achleitner
  2012-07-23 12:59       ` Jonathan Nieder
                         ` (2 more replies)
  1 sibling, 3 replies; 21+ messages in thread
From: Florian Achleitner @ 2012-07-23 12:44 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Florian Achleitner, davidbarr, git

To ease testing without depending on a reachable svn server, this
compact python script mimics parts of svnrdump's behaviour.
It requires the remote url to start with sim://.
Start and end revisions are evaluated.
If the requested revision doesn't exist, as it is the case with
incremental imports, if no new commit was added, it returns 1
(like svnrdump).
To allow using the same dump file for simulating multiple
incremental imports the highest revision can be limited by setting
the environment variable SVNRMAX to that value. This simulates the
situation where higher revs don't exist yet.
---
 contrib/svn-fe/svnrdump_sim.py |   53 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)
 create mode 100755 contrib/svn-fe/svnrdump_sim.py

diff --git a/contrib/svn-fe/svnrdump_sim.py b/contrib/svn-fe/svnrdump_sim.py
new file mode 100755
index 0000000..4701d76
--- /dev/null
+++ b/contrib/svn-fe/svnrdump_sim.py
@@ -0,0 +1,53 @@
+#!/usr/bin/python
+"""
+Simulates svnrdump by replaying an existing dump from a file, taking care
+of the specified revision range.
+To simulate incremental imports the environment variable SVNRMAX can be set
+to the highest revision that should be available.
+"""
+import sys, os
+
+
+def getrevlimit():
+	var = 'SVNRMAX'
+	if os.environ.has_key(var):
+		return os.environ[var]
+	return None
+	
+def writedump(url, lower, upper):
+	if url.startswith('sim://'):
+		filename = url[6:]
+		if filename[-1] == '/': filename = filename[:-1] #remove terminating slash
+	else:
+		raise ValueError('sim:// url required')
+	f = open(filename, 'r');
+	state = 'header'
+	wroterev = False
+	while(True):
+		l = f.readline()
+		if l == '': break
+		if state == 'header' and l.startswith('Revision-number: '):
+			state = 'prefix'
+		if state == 'prefix' and l == 'Revision-number: %s\n' % lower:
+			state = 'selection'
+		if not upper == 'HEAD' and state == 'selection' and l == 'Revision-number: %s\n' % upper:
+			break;
+
+		if state == 'header' or state == 'selection':
+			if state == 'selection': wroterev = True
+			sys.stdout.write(l)
+	return wroterev
+
+if __name__ == "__main__":
+	if not (len(sys.argv) in (3, 4, 5)):
+		print "usage: %s dump URL -rLOWER:UPPER" % sys.argv[0]
+		sys.exit(1)
+	if not sys.argv[1] == 'dump': raise NotImplementedError('only "dump" is supported.')
+	url = sys.argv[2]
+	r = ['0', 'HEAD'] # a list, so that SVNRMAX can override r[1] below
+	if len(sys.argv) == 4 and sys.argv[3][0:2] == '-r':
+		r = sys.argv[3][2:].lstrip().split(':')
+	if not getrevlimit() is None: r[1] = getrevlimit()
+	if writedump(url, r[0], r[1]): ret = 0
+	else: ret = 1
+	sys.exit(ret)
\ No newline at end of file
-- 
1.7.9.5

* Re: [PATCH] Add a svnrdump-simulator replaying a dump file for testing.
  2012-07-23 12:44     ` [PATCH] Add a svnrdump-simulator replaying a dump file for testing Florian Achleitner
@ 2012-07-23 12:59       ` Jonathan Nieder
  2012-07-23 13:16         ` Florian Achleitner
  2012-07-23 13:16         ` Florian Achleitner
  2012-07-23 15:06       ` Junio C Hamano
  2012-07-24 12:06       ` Erik Faye-Lund
  2 siblings, 2 replies; 21+ messages in thread
From: Jonathan Nieder @ 2012-07-23 12:59 UTC (permalink / raw)
  To: Florian Achleitner; +Cc: davidbarr, git

Florian Achleitner wrote:

> To ease testing without depending on a reachable svn server, this
> compact python script mimics parts of svnrdump's behaviour.

Thanks.  Mind if I forge your sign-off?

* Re: [PATCH] Add a svnrdump-simulator replaying a dump file for testing.
  2012-07-23 12:59       ` Jonathan Nieder
@ 2012-07-23 13:16         ` Florian Achleitner
  2012-07-23 13:16         ` Florian Achleitner
  1 sibling, 0 replies; 21+ messages in thread
From: Florian Achleitner @ 2012-07-23 13:16 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Florian Achleitner, davidbarr, git

On Monday 23 July 2012 07:59:21 Jonathan Nieder wrote:
> Florian Achleitner wrote:
> > To ease testing without depending on a reachable svn server, this
> > compact python script mimics parts of svnrdump's behaviour.
> 
> Thanks.  Mind if I forge your sign-off?

Oops. No problem, I've added it locally anyway, so here's the new version ..

* Re: [PATCH] Add a svnrdump-simulator replaying a dump file for testing.
  2012-07-23 12:59       ` Jonathan Nieder
  2012-07-23 13:16         ` Florian Achleitner
@ 2012-07-23 13:16         ` Florian Achleitner
  2012-07-23 16:24           ` Matthieu Moy
  1 sibling, 1 reply; 21+ messages in thread
From: Florian Achleitner @ 2012-07-23 13:16 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Florian Achleitner, davidbarr, git

To ease testing without depending on a reachable svn server, this
compact python script mimics parts of svnrdump's behaviour.
It requires the remote url to start with sim://.
Start and end revisions are evaluated.
If the requested revision doesn't exist, as it is the case with
incremental imports, if no new commit was added, it returns 1
(like svnrdump).
To allow using the same dump file for simulating multiple
incremental imports the highest revision can be limited by setting
the environment variable SVNRMAX to that value. This simulates the
situation where higher revs don't exist yet.

Signed-off-by: Florian Achleitner <florian.achleitner.2.6.31@gmail.com>
---

I had to fix the missing sign-off anyways..

 contrib/svn-fe/svnrdump_sim.py |   53 
++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)
 create mode 100755 contrib/svn-fe/svnrdump_sim.py

diff --git a/contrib/svn-fe/svnrdump_sim.py b/contrib/svn-fe/svnrdump_sim.py
new file mode 100755
index 0000000..4701d76
--- /dev/null
+++ b/contrib/svn-fe/svnrdump_sim.py
@@ -0,0 +1,53 @@
+#!/usr/bin/python
+"""
+Simulates svnrdump by replaying an existing dump from a file, taking care
+of the specified revision range.
+To simulate incremental imports the environment variable SVNRMAX can be set
+to the highest revision that should be available.
+"""
+import sys, os
+
+
+def getrevlimit():
+	var = 'SVNRMAX'
+	if os.environ.has_key(var):
+		return os.environ[var]
+	return None
+	
+def writedump(url, lower, upper):
+	if url.startswith('sim://'):
+		filename = url[6:]
+		if filename[-1] == '/': filename = filename[:-1] #remove terminating slash
+	else:
+		raise ValueError('sim:// url required')
+	f = open(filename, 'r');
+	state = 'header'
+	wroterev = False
+	while(True):
+		l = f.readline()
+		if l == '': break
+		if state == 'header' and l.startswith('Revision-number: '):
+			state = 'prefix'
+		if state == 'prefix' and l == 'Revision-number: %s\n' % lower:
+			state = 'selection'
+		if not upper == 'HEAD' and state == 'selection' and l == 'Revision-
number: %s\n' % upper:
+			break;
+
+		if state == 'header' or state == 'selection':
+			if state == 'selection': wroterev = True
+			sys.stdout.write(l)
+	return wroterev
+
+if __name__ == "__main__":
+	if not (len(sys.argv) in (3, 4, 5)):
+		print "usage: %s dump URL -rLOWER:UPPER" % sys.argv[0]
+		sys.exit(1)
+	if not sys.argv[1] == 'dump': raise NotImplementedError('only "dump" is 
supported.')
+	url = sys.argv[2]
+	r = ['0', 'HEAD'] # a list, so that SVNRMAX can override r[1] below
+	if len(sys.argv) == 4 and sys.argv[3][0:2] == '-r':
+		r = sys.argv[3][2:].lstrip().split(':')
+	if not getrevlimit() is None: r[1] = getrevlimit()
+	if writedump(url, r[0], r[1]): ret = 0
+	else: ret = 1
+	sys.exit(ret)
\ No newline at end of file
-- 
1.7.9.5

* Re: [PATCH] Add a svnrdump-simulator replaying a dump file for testing.
  2012-07-23 12:44     ` [PATCH] Add a svnrdump-simulator replaying a dump file for testing Florian Achleitner
  2012-07-23 12:59       ` Jonathan Nieder
@ 2012-07-23 15:06       ` Junio C Hamano
  2012-07-23 20:08         ` Florian Achleitner
  2012-07-24 12:06       ` Erik Faye-Lund
  2 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2012-07-23 15:06 UTC (permalink / raw)
  To: Florian Achleitner; +Cc: Jonathan Nieder, davidbarr, git

Florian Achleitner <florian.achleitner.2.6.31@gmail.com> writes:

> It requires the remote url to start with sim://.
> Start and end revisions are evaluated.

It is a bit unclear where "start" and "end" comes from, and if
"evaluated" is the most important aspect of the handling of these
two values.  Do you mean the tool takes start and end revisions as
arguments?  If so, describe "how".  E.g. as two arguments (-rSTART
-rEND)? As an argument that shows a range (-rSTART-END? -rSTART,END)?

Do not answer with "It is in the code" (I cheated and peeked to find
out it is -rSTART:END, but the reader should not have to peek).

> If the requested revision doesn't exist, as it is the case with
> incremental imports, if no new commit was added, it returns 1
> (like svnrdump).

This sentence does not parse for me.  What is it trying to say?
Requested revision does not exist _where_?  It is unclear how
"incremental import" and "revision doesn't exist" are related.  "no
new commit was added" to _what_ by _whom_?  I presume that nobody is
adding new commit _to_ an existing dump file, and the only thing
this script does is to read and selectively write part of a dump file,
so that would not add any new commit either.

Puzzled.
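
For reference, the argument handling in question can be sketched like this in
Python. It is an illustration, not the patch's code: parse_range is a made-up
name, and the defaults ('0' and 'HEAD') and the SVNRMAX override are taken
from the script as posted.

```python
def parse_range(argv, environ):
    """Return (lower, upper) from an optional '-rLOWER:UPPER' argument.

    Without '-r', the range defaults to ('0', 'HEAD'). If the environment
    variable SVNRMAX is set, it replaces the upper bound, which simulates a
    repository in which later revisions do not exist yet.
    """
    lower, upper = '0', 'HEAD'
    for arg in argv:
        if arg.startswith('-r') and ':' in arg:
            lower, upper = arg[2:].strip().split(':', 1)
    limit = environ.get('SVNRMAX')
    if limit is not None:
        upper = limit
    return lower, upper


print(parse_range(['dump', 'sim://dumpfile', '-r3:7'], {}))
print(parse_range(['dump', 'sim://dumpfile'], {'SVNRMAX': '5'}))
```

An incremental import then amounts to calling the tool again with LOWER set
to the first revision not yet imported; if the dump (or SVNRMAX) ends before
that revision, nothing is written and the exit status is 1.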

* Re: [PATCH] Add a svnrdump-simulator replaying a dump file for testing.
  2012-07-23 13:16         ` Florian Achleitner
@ 2012-07-23 16:24           ` Matthieu Moy
  2012-07-23 19:28             ` Florian Achleitner
  0 siblings, 1 reply; 21+ messages in thread
From: Matthieu Moy @ 2012-07-23 16:24 UTC (permalink / raw)
  To: Florian Achleitner; +Cc: Jonathan Nieder, davidbarr, git

Florian Achleitner <florian.achleitner.2.6.31@gmail.com> writes:

> I had to fix the missing sign-off anyways..
>
>  contrib/svn-fe/svnrdump_sim.py |   53 
> ++++++++++++++++++++++++++++++++++++++++

You also have whitespace damages (i.e. line wrapping introduced by your
mailer). Using git-send-email avoids this kind of problem (there are
also some advice for some mailers in Documentation/SubmittingPatches).

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

* Re: [PATCH] Add a svnrdump-simulator replaying a dump file for testing.
  2012-07-23 16:24           ` Matthieu Moy
@ 2012-07-23 19:28             ` Florian Achleitner
  2012-07-23 19:46               ` Matthieu Moy
  0 siblings, 1 reply; 21+ messages in thread
From: Florian Achleitner @ 2012-07-23 19:28 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: Florian Achleitner, Jonathan Nieder, davidbarr, git

On Monday 23 July 2012 18:24:40 Matthieu Moy wrote:
> You also have whitespace damages (i.e. line wrapping introduced by your
> mailer). Using git-send-email avoids this kind of problem (there are
> also some advice for some mailers in Documentation/SubmittingPatches).

Damn. That's usually no problem with kmail either, if the config is right.
I've already used git-send-email several times.
But for replying to threads and adding several Cc: addresses it's a little 
cumbersome.
How do you do that in a nice way?

--
Florian

* Re: [PATCH] Add a svnrdump-simulator replaying a dump file for testing.
  2012-07-23 19:28             ` Florian Achleitner
@ 2012-07-23 19:46               ` Matthieu Moy
  2012-07-23 20:02                 ` Jeff King
  0 siblings, 1 reply; 21+ messages in thread
From: Matthieu Moy @ 2012-07-23 19:46 UTC (permalink / raw)
  To: Florian Achleitner; +Cc: Jonathan Nieder, davidbarr, git

Florian Achleitner <florian.achleitner.2.6.31@gmail.com> writes:

> On Monday 23 July 2012 18:24:40 Matthieu Moy wrote:
>> You also have whitespace damage (i.e. line wrapping introduced by your
>> mailer). Using git-send-email avoids this kind of problem (there is
>> also some advice for some mailers in Documentation/SubmittingPatches).
>
> Damn. That's usually no problem with kmail either, if the config is right.
> I've already used git-send-email several times.
> But for replying to threads and adding several Cc: addresses it's a little 
> cumbersome.
> How do you do that in a nice way?

For the threading itself, I usually find the message-id, and use
"git send-email --in-reply-to='<cut-and-pasted-id>'". The painful part
is when you want to reproduce a Cc: list, but I have no magic trick for
that ;-).

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

* Re: [PATCH] Add a svnrdump-simulator replaying a dump file for testing.
  2012-07-23 19:46               ` Matthieu Moy
@ 2012-07-23 20:02                 ` Jeff King
  0 siblings, 0 replies; 21+ messages in thread
From: Jeff King @ 2012-07-23 20:02 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: Florian Achleitner, Jonathan Nieder, davidbarr, git

On Mon, Jul 23, 2012 at 09:46:49PM +0200, Matthieu Moy wrote:

> > Damn. That's usually no problem with kmail either, if the config is right.
> > I've already used git-send-email several times.
> > But for replying to threads and adding several Cc: addresses it's a little 
> > cumbersome.
> > How do you do that in a nice way?
> 
> For the threading itself, I usually find the message-id, and use
> "git send-email --in-reply-to='<cut-and-pasted-id>'". The painful part
> is when you want to reproduce a Cc: list, but I have no magic trick for
> that ;-).

I save a copy of the message I am replying to (usually my cover letter,
which I generated by just replying in my MUA) into a well-known location
(I use a mutt hot-key to do this), and then run it through this
monstrosity:

  get_reply_headers() {
    perl -ne '
      if (defined $opt && /^\s+(.*)/) {
        $val .= " $1";
        next;
      }
      if (defined $opt) {
        print "--$opt=", quotemeta($val), " ";
        $opt = $val = undef;
      }
      if (/^(cc|to):\s*(.*)/i) {
        $opt = lc($1);
        $val = $2;
      }
      elsif (/^message-id:\s*(.*)/i) {
        $opt = "in-reply-to";
        $val = $1;
      }
    '
  }

which can be used on the command line of format-patch or send-email, like:

  eval "git format-patch $(get_reply_headers <your-patch-file)"

I put the result in an mbox, then review and send it out in mutt (using
the resend-message command), but you could invoke send-email directly,
or format-patch into a file for review and send it with send-email.
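
For readers who would rather avoid the Perl, here is a minimal sketch of
the same idea in Python. It is not part of the original message: the
function names reply_options and as_shell_words are made up for
illustration, and it assumes the saved message is available as a string.
Like the shell function above, it turns the To:, Cc: and Message-ID:
headers into --to, --cc and --in-reply-to options.

```python
import email
import shlex


def reply_options(raw_message):
    """Build --to/--cc/--in-reply-to options from a saved message.

    Hypothetical helper, not from the thread; it mirrors what the
    get_reply_headers shell function extracts.
    """
    msg = email.message_from_string(raw_message)
    opts = []
    for name in ("to", "cc"):
        for value in msg.get_all(name, []):
            # Folded headers keep their line breaks; collapse them
            # into a single line before using the value.
            opts.append("--%s=%s" % (name, " ".join(value.split())))
    msgid = msg.get("message-id")
    if msgid:
        opts.append("--in-reply-to=%s" % msgid.strip())
    return opts


def as_shell_words(opts):
    """Quote each option so the result is safe to paste into a shell."""
    return " ".join(shlex.quote(opt) for opt in opts)


if __name__ == "__main__":
    saved = (
        "From: someone@example.com\n"
        "To: first@example.com,\n"
        " second@example.com\n"
        "Cc: list@example.com\n"
        "Message-ID: <id-1234@example.com>\n"
        "\n"
        "body\n"
    )
    print(as_shell_words(reply_options(saved)))
```

The string printed by as_shell_words() could be spliced into a
format-patch or send-email command line in the same way as the output
of get_reply_headers.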

-Peff

* [PATCH] Add a svnrdump-simulator replaying a dump file for testing.
  2012-07-23 15:06       ` Junio C Hamano
@ 2012-07-23 20:08         ` Florian Achleitner
  2012-07-23 20:38           ` Junio C Hamano
  0 siblings, 1 reply; 21+ messages in thread
From: Florian Achleitner @ 2012-07-23 20:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: florian.achleitner.2.6.31, Jonathan Nieder, davidbarr, git

To ease testing without depending on a reachable svn server, this
compact Python script mimics parts of svnrdump's behaviour.
It requires the remote url to start with sim://.
Eventual slashes at the end of the url are stripped.
The url specifies the path of the svn dump file (as created by
svnrdump). Selectable parts of it, or the whole file, are written
to stdout. The part is selectable by giving start and end revision
on the command line.

Start and end revisions can be specified on the command line
(-rSTART:END, like for svnrdump).
Only revisions between START and excluding END are replayed from
the dumpfile specified by the url. END can also be HEAD.

If the start revision specified on the command line doesn't exist
in the dump file, it returns 1.
This emulates the behaviour of svnrdump when START>HEAD, i.e. the
requested start revision doesn't exist on the server.

To allow using the same dump file for simulating multiple
incremental imports the highest visible revision can be limited by
setting the environment variable SVNRMAX to that value. This
effectively limits HEAD to simulate the situation where higher
revs don't exist yet.

Signed-off-by: Florian Achleitner <florian.achleitner.2.6.31@gmail.com>
---
 contrib/svn-fe/svnrdump_sim.py |   53 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)
 create mode 100755 contrib/svn-fe/svnrdump_sim.py

diff --git a/contrib/svn-fe/svnrdump_sim.py b/contrib/svn-fe/svnrdump_sim.py
new file mode 100755
index 0000000..4701d76
--- /dev/null
+++ b/contrib/svn-fe/svnrdump_sim.py
@@ -0,0 +1,53 @@
+#!/usr/bin/python
+"""
+Simulates svnrdump by replaying an existing dump from a file, taking care
+of the specified revision range.
+To simulate incremental imports the environment variable SVNRMAX can be set
+to the highest revision that should be available.
+"""
+import sys, os
+
+
+def getrevlimit():
+	var = 'SVNRMAX'
+	if os.environ.has_key(var):
+		return os.environ[var]
+	return None
+	
+def writedump(url, lower, upper):
+	if url.startswith('sim://'):
+		filename = url[6:]
+		if filename[-1] == '/': filename = filename[:-1] #remove terminating slash
+	else:
+		raise ValueError('sim:// url required')
+	f = open(filename, 'r');
+	state = 'header'
+	wroterev = False
+	while(True):
+		l = f.readline()
+		if l == '': break
+		if state == 'header' and l.startswith('Revision-number: '):
+			state = 'prefix'
+		if state == 'prefix' and l == 'Revision-number: %s\n' % lower:
+			state = 'selection'
+		if not upper == 'HEAD' and state == 'selection' and l == 'Revision-number: %s\n' % upper:
+			break;
+
+		if state == 'header' or state == 'selection':
+			if state == 'selection': wroterev = True
+			sys.stdout.write(l)
+	return wroterev
+
+if __name__ == "__main__":
+	if not (len(sys.argv) in (3, 4, 5)):
+		print "usage: %s dump URL -rLOWER:UPPER" % sys.argv[0]
+		sys.exit(1)
+	if not sys.argv[1] == 'dump': raise NotImplementedError('only "dump" is supported.')
+	url = sys.argv[2]
+	r = ['0', 'HEAD'] # a list, so r[1] can be replaced by the revision limit below
+	if len(sys.argv) == 4 and sys.argv[3][0:2] == '-r':
+		r = sys.argv[3][2:].lstrip().split(':')
+	if not getrevlimit() is None: r[1] = getrevlimit()
+	if writedump(url, r[0], r[1]): ret = 0
+	else: ret = 1
+	sys.exit(ret)
\ No newline at end of file
-- 
1.7.9.5

* Re: [PATCH] Add a svnrdump-simulator replaying a dump file for testing.
  2012-07-23 20:08         ` Florian Achleitner
@ 2012-07-23 20:38           ` Junio C Hamano
  2012-07-24 19:50             ` Jonathan Nieder
  0 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2012-07-23 20:38 UTC (permalink / raw)
  To: Florian Achleitner; +Cc: Jonathan Nieder, davidbarr, git

Florian Achleitner <florian.achleitner.2.6.31@gmail.com> writes:

> To ease testing without depending on a reachable svn server, this
> compact Python script mimics parts of svnrdump's behaviour.
> It requires the remote url to start with sim://.
> Eventual slashes at the end of the url are stripped.

s/ventual/xcess/ perhaps?

> The url specifies the path of the svn dump file (as created by
> svnrdump). Selectable parts of it, or the whole file, are written
> to stdout. The part is selectable by giving start and end revision
> on the command line.
>
> Start and end revisions can be specified on the command line
> (-rSTART:END, like for svnrdump).
> Only revisions between START and excluding END are replayed from
> the dumpfile specified by the url. END can also be HEAD.
>
> If the start revision specified on the command line doesn't exist
> in the dump file, it returns 1.
> This emulates the behaviour of svnrdump when START>HEAD, i.e. the
> requested start revision doesn't exist on the server.

Much more understandable than before.

> To allow using the same dump file for simulating multiple
> incremental imports the highest visible revision can be limited by
> setting the environment variable SVNRMAX to that value. This
> effectively limits HEAD to simulate the situation where higher
> revs don't exist yet.

It is unclear how this is different from giving the ceiling by
specifying it as the "END" in -rSTART:END command line.  Is this
feature really needed?

* Re: [PATCH] Add a svnrdump-simulator replaying a dump file for testing.
  2012-07-23 12:44     ` [PATCH] Add a svnrdump-simulator replaying a dump file for testing Florian Achleitner
  2012-07-23 12:59       ` Jonathan Nieder
  2012-07-23 15:06       ` Junio C Hamano
@ 2012-07-24 12:06       ` Erik Faye-Lund
  2 siblings, 0 replies; 21+ messages in thread
From: Erik Faye-Lund @ 2012-07-24 12:06 UTC (permalink / raw)
  To: Florian Achleitner; +Cc: Jonathan Nieder, davidbarr, git

On Mon, Jul 23, 2012 at 2:44 PM, Florian Achleitner
<florian.achleitner.2.6.31@gmail.com> wrote:
> +       sys.exit(ret)
> \ No newline at end of file

Nit: add a \n after "sys.exit(ret)", perhaps?

* Re: [PATCH] Add a svnrdump-simulator replaying a dump file for testing.
  2012-07-23 20:38           ` Junio C Hamano
@ 2012-07-24 19:50             ` Jonathan Nieder
  2012-07-25  6:20               ` Florian Achleitner
  0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Nieder @ 2012-07-24 19:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Florian Achleitner, davidbarr, git

Hi,

Junio C Hamano wrote:
> Florian Achleitner <florian.achleitner.2.6.31@gmail.com> writes:

>> To ease testing without depending on a reachable svn server, this
>> compact Python script mimics parts of svnrdump's behaviour.
>> It requires the remote url to start with sim://.
[...]
>> To allow using the same dump file for simulating multiple
>> incremental imports the highest visible revision can be limited by
>> setting the environment variable SVNRMAX to that value. This
>> effectively limits HEAD to simulate the situation where higher
>> revs don't exist yet.
>
> It is unclear how this is different from giving the ceiling by
> specifying it as the "END" in -rSTART:END command line.  Is this
> feature really needed?

I think the idea is that you put this script (or a symlink to it) on
your $PATH with higher precedence than svnrdump and run a command
that expects to be able to use svnrdump.  Then instead of going to
the network, the command you run magically uses your test data
instead.

If the command you are testing wanted to run "svnrdump" without the
upper endpoint set, we need to handle that request, either by emitting
all the revs we have, or by stopping somewhere.  The revlimit feature
provides the "stopping somewhere" behavior which is not strictly
needed but is presumably very useful when testing incremental fetch.
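
The $PATH trick described above can be sketched as follows. This is only
an illustration under stated assumptions, not code from the patch: the
helper name shadow_command is hypothetical, the stand-in script is a
trivial shell script rather than svnrdump_sim.py, and a POSIX system is
assumed.

```python
import os
import shutil
import tempfile


def shadow_command(name, script_text, env=None):
    """Return (env, shim_path) where env's PATH resolves `name` to a stand-in.

    Hypothetical helper: it writes script_text into a fresh directory
    and puts that directory first on PATH, so the stand-in takes
    precedence over any real program of the same name.
    """
    env = dict(os.environ if env is None else env)
    shim_dir = tempfile.mkdtemp(prefix="shim-")
    shim = os.path.join(shim_dir, name)
    with open(shim, "w") as f:
        f.write(script_text)
    os.chmod(shim, 0o755)  # the stand-in must be executable to be found
    env["PATH"] = shim_dir + os.pathsep + env.get("PATH", "")
    return env, shim


if __name__ == "__main__":
    env, shim = shadow_command("svnrdump", "#!/bin/sh\necho simulated\n")
    # A lookup along the modified PATH now finds the stand-in first.
    print(shutil.which("svnrdump", path=env["PATH"]))
```

A command under test that is started with this environment would pick
up the stand-in whenever it looks up "svnrdump" on its PATH.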

Florian, do you mind if I make the revlimit feature a separate patch
when applying this?

Anyway, it looks good and reasonable to me, so will apply.

Thanks.
Jonathan

* Re: [PATCH] Add a svnrdump-simulator replaying a dump file for testing.
  2012-07-24 19:50             ` Jonathan Nieder
@ 2012-07-25  6:20               ` Florian Achleitner
  0 siblings, 0 replies; 21+ messages in thread
From: Florian Achleitner @ 2012-07-25  6:20 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Junio C Hamano, Florian Achleitner, davidbarr, git

On Tuesday 24 July 2012 14:50:49 Jonathan Nieder wrote:
> > It is unclear how this is different from giving the ceiling by
> > specifying it as the "END" in -rSTART:END command line.  Is this
> > feature really needed?
> 
> I think the idea is that you put this script (or a symlink to it) on
> your $PATH with higher precedence than svnrdump and run a command
> that expects to be able to use svnrdump.  Then instead of going to
> the network, the command you run magically uses your test data
> instead.
> 
> If the command you are testing wanted to run "svnrdump" without the
> upper endpoint set, we need to handle that request, either by emitting
> all the revs we have, or by stopping somewhere.  The revlimit feature
> provides the "stopping somewhere" behavior which is not strictly
> needed but is presumably very useful when testing incremental fetch.

Exactly, the purpose is to transparently replace svnrdump.
Callers of svnrdump usually will specify -rSTART:HEAD, because they want to 
fetch everything they don't yet have.
This feature allows limiting HEAD and simulating incremental fetches using 
the same dump file.
For me it proved very useful.
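
As a rough illustration, two incremental fetches from one dump might
look like the sketch below. It follows the state machine of writedump()
in the posted script, but it is a simplified rewrite: select_revisions
and the toy TOY_DUMP data are assumptions made for this example, and a
real dump file contains far more than these lines. The cap on HEAD is
passed as the exclusive upper bound, which is how the script applies
SVNRMAX.

```python
def select_revisions(lines, lower, upper="HEAD"):
    """Yield the dump header plus revisions from lower up to, excluding, upper."""
    state = "header"
    for line in lines:
        if state == "header" and line.startswith("Revision-number: "):
            state = "prefix"  # header is over, skip until `lower` shows up
        if state == "prefix" and line == "Revision-number: %s\n" % lower:
            state = "selection"
        if (upper != "HEAD" and state == "selection"
                and line == "Revision-number: %s\n" % upper):
            break  # the upper bound itself is not emitted
        if state in ("header", "selection"):
            yield line


# Toy stand-in for a dump file; purely illustrative.
TOY_DUMP = [
    "SVN-fs-dump-format-version: 3\n",
    "\n",
    "Revision-number: 0\n",
    "r0 data\n",
    "Revision-number: 1\n",
    "r1 data\n",
    "Revision-number: 2\n",
    "r2 data\n",
]

if __name__ == "__main__":
    # First fetch: HEAD is capped, so only revisions 0 and 1 are visible.
    first = list(select_revisions(TOY_DUMP, "0", "2"))
    # Second fetch: the cap is lifted and the import resumes at revision 2.
    second = list(select_revisions(TOY_DUMP, "2"))
    print("".join(first))
    print("".join(second))
```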

> Florian, do you mind if I make the revlimit feature a separate patch
> when applying this?

No problem.

> 
> Anyway, it looks good and reasonable to me, so will apply.
> 
> Thanks.
> Jonathan

--
Florian

Thread overview: 21+ messages
2012-07-22 21:03 GSOC remote-svn Florian Achleitner
2012-07-22 21:43 ` Jonathan Nieder
2012-07-23  9:42   ` Florian Achleitner
2012-07-23 12:04     ` Jonathan Nieder
2012-07-23 12:44     ` [PATCH] Add a svnrdump-simulator replaying a dump file for testing Florian Achleitner
2012-07-23 12:59       ` Jonathan Nieder
2012-07-23 13:16         ` Florian Achleitner
2012-07-23 13:16         ` Florian Achleitner
2012-07-23 16:24           ` Matthieu Moy
2012-07-23 19:28             ` Florian Achleitner
2012-07-23 19:46               ` Matthieu Moy
2012-07-23 20:02                 ` Jeff King
2012-07-23 15:06       ` Junio C Hamano
2012-07-23 20:08         ` Florian Achleitner
2012-07-23 20:38           ` Junio C Hamano
2012-07-24 19:50             ` Jonathan Nieder
2012-07-25  6:20               ` Florian Achleitner
2012-07-24 12:06       ` Erik Faye-Lund
2012-07-23  7:59 ` GSOC remote-svn Matthieu Moy
2012-07-23 11:59   ` Jonathan Nieder
2012-07-23  9:42 ` Florian Achleitner
