* Rewriting references to existing commits in commit messages with filter-branch
@ 2013-03-11 11:45 Vadim Zeitlin
  2013-03-11 12:06 ` Lawrence Mitchell
  2013-03-11 13:53 ` Michael Haggerty
  0 siblings, 2 replies; 7+ messages in thread
From: Vadim Zeitlin @ 2013-03-11 11:45 UTC
  To: git

 Hello,

 I'm in the process of converting an existing svn repository to git. The
first step was a standard "git svn clone" that went successfully (after
taking 15 hours to complete). However, I don't want to stop there and would
like to massage the repository a little before making it publicly available.

 The first thing I'd like to do is to replace all references to subversion
revision numbers in the commit messages with the corresponding git commit
SHA1s. I've written a small message filter script called "svnmsg2git" that
searches for all occurrences of r12345, runs "git svn find-rev r12345" and
then -- and this is the important part -- looks up the new commit id
corresponding to this under .git-rewrite/map. This seemed to work well in
limited testing I did initially but after running

git filter-branch --msg-filter svnmsg2git --tag-name-filter cat -- --all

on all ~50k revisions, I have a couple of dozen errors, which happen
because the file .git-rewrite/map/$commit doesn't exist yet when I'm trying
to look it up.

 Does anybody know of a way to fix this? Apparently this happens because
filter-branch doesn't process the commits in their svn order: when one
commit is on a branch and the other is on the trunk, the commit that
references a previous svn revision can be processed before the commit
corresponding to that revision. At least this is the only explanation I
see. But even if my hypothesis is correct, I still have no idea how to
force filter-branch to do things in the "right" order.

 Thanks in advance for any ideas!
VZ


* Re: Rewriting references to existing commits in commit messages with filter-branch
  2013-03-11 11:45 Rewriting references to existing commits in commit messages with filter-branch Vadim Zeitlin
@ 2013-03-11 12:06 ` Lawrence Mitchell
  2013-03-11 12:23   ` Vadim Zeitlin
  2013-03-11 13:53 ` Michael Haggerty
  1 sibling, 1 reply; 7+ messages in thread
From: Lawrence Mitchell @ 2013-03-11 12:06 UTC
  To: git

Vadim Zeitlin wrote:

[...]


> git filter-branch --msg-filter svnmsg2git --tag-name-filter cat -- --all

git rev-list lists by default in reverse chronological order.  Do you
want to pass --topo-order as one of the rev-list options?
[...]

Lawrence
-- 
Lawrence Mitchell <wence@gmx.li>


* Re: Rewriting references to existing commits in commit messages with filter-branch
  2013-03-11 12:06 ` Lawrence Mitchell
@ 2013-03-11 12:23   ` Vadim Zeitlin
  2013-03-11 13:11     ` Thomas Rast
  0 siblings, 1 reply; 7+ messages in thread
From: Vadim Zeitlin @ 2013-03-11 12:23 UTC
  To: git

Lawrence Mitchell <wence <at> gmx.li> writes:

> Vadim Zeitlin wrote:
> 
> [...]
> 
> > git filter-branch --msg-filter svnmsg2git --tag-name-filter cat -- --all
> 
> git rev-list lists by default in chronological order.  Do you
> want to pass --topo-order as one of the rev-list options?

 Thanks, this looked like a good idea, but reading the git-filter-branch code, it
seems to do this already: at
https://github.com/git/git/blob/master/git-filter-branch.sh#L269 you can see
that it runs "git rev-list --reverse --topo-order ...".

 So this probably won't help (I could try it just in case I'm missing something,
but the first errors appear after almost 2 hours of running...). Note that I
could well be wrong in my explanation of what happens; perhaps it's not related
to the order in which the branches and trunk are processed at all. All I know is
that when git-filter-branch processes a chronologically later commit that refers
to an earlier one on a different branch, sometimes (or perhaps even always) the
file corresponding to the earlier commit is not yet present in the
.git-rewrite/map directory.

 Thanks again for any help with this,
VZ


* Re: Rewriting references to existing commits in commit messages with filter-branch
  2013-03-11 12:23   ` Vadim Zeitlin
@ 2013-03-11 13:11     ` Thomas Rast
  2013-03-11 15:58       ` Vadim Zeitlin
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Rast @ 2013-03-11 13:11 UTC
  To: Vadim Zeitlin; +Cc: git

Vadim Zeitlin <vz-git@zeitlins.org> writes:

> Lawrence Mitchell <wence <at> gmx.li> writes:
>
>> Vadim Zeitlin wrote:
>> 
>> [...]
>> 
>> > git filter-branch --msg-filter svnmsg2git --tag-name-filter cat -- --all
>> 
>> git rev-list lists by default in chronological order.  Do you
>> want to pass --topo-order as one of the rev-list options?
>
>  Thanks, this looked like a good idea but reading git-filter-branch code it
> seems to already do it, at
> https://github.com/git/git/blob/master/git-filter-branch.sh#L269 you can see
> that it does "git rev-list --reverse --topo-order ...".

Try overriding that with --date-order (you may have to patch the source).
--topo-order doesn't order by dates.  --date-order does somewhat
(respecting topology), which in the absence of clock skew should do what
you are looking for.

Note that you cannot *remove* --topo-order and use the default, which is
to only respect dates and not topology; that would break filter-branch.
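The difference between the two orderings can be seen in a throwaway
repository. The following is only a sketch: the branch names (trunk, side),
the commit subjects and the timestamps are all made up for illustration, and
it uses "git log" rather than filter-branch itself to list the commits
oldest-first.

```shell
#!/bin/sh
# Throwaway repository: trunk gets A and C, side gets B and D, and the
# commit dates increase in the order A, B, C, D. All names are hypothetical.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/trunk

GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

# commit SUBJECT TIMESTAMP: create an empty commit with a fixed date.
commit() {
    GIT_AUTHOR_DATE="$2 +0000" GIT_COMMITTER_DATE="$2 +0000" \
        git commit -q --allow-empty -m "$1"
}

commit A 1300000001
git checkout -q -b side
commit B 1300000002
git checkout -q trunk
commit C 1300000003
git checkout -q side
commit D 1300000004

# order FLAG: print the commit subjects, oldest first, on one line.
order() {
    git log --reverse "$1" --all --format=%s | paste -s -d ' ' -
}

topo_order=$(order --topo-order)
date_order=$(order --date-order)
echo "topo-order: $topo_order"
echo "date-order: $date_order"
```

With --date-order the subjects come out as A B C D, i.e. interleaved across
the two branches by date. With --topo-order each branch is listed as a
contiguous run, so a commit on one branch can be visited before an older
commit on the other.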

-- 
Thomas Rast
trast@{inf,student}.ethz.ch


* Re: Rewriting references to existing commits in commit messages with filter-branch
  2013-03-11 11:45 Rewriting references to existing commits in commit messages with filter-branch Vadim Zeitlin
  2013-03-11 12:06 ` Lawrence Mitchell
@ 2013-03-11 13:53 ` Michael Haggerty
  2013-03-11 14:05   ` Vadim Zeitlin
  1 sibling, 1 reply; 7+ messages in thread
From: Michael Haggerty @ 2013-03-11 13:53 UTC
  To: Vadim Zeitlin; +Cc: git

On 03/11/2013 12:45 PM, Vadim Zeitlin wrote:
> [...]
>  The first thing I'd like to do is to replace all references to subversion
> revision numbers in the commit messages with the corresponding git commit
> SHA1s. [...] I have a couple of dozen errors which happen
> because the file .git-rewrite/map/$commit doesn't exist yet when I'm trying
> to look it up.

The quick and dirty solution would be to rewrite your script such that
if the commit is still unknown to Git, it emits a warning and leaves the
commit message unchanged (i.e., leaves the Subversion revision number
untouched).  Then simply run the filter-branch a few times until it
emits no warnings.
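
A minimal sketch of that fallback, as a helper the message filter could call
for each reference it finds. The function name map_or_keep and the MAP_DIR
variable are hypothetical, and the default of ../map is an assumption about
where the map directory sits relative to the directory the filter runs in;
the real svnmsg2git script is not shown in this thread.

```shell
#!/bin/sh
# map_or_keep REF OLD_ID
#   REF:    the reference as written in the message, e.g. r12345
#   OLD_ID: the pre-rewrite commit id it resolves to
# Prints the rewritten id if filter-branch has already recorded one,
# otherwise warns on stderr and prints REF unchanged.
# MAP_DIR (assumed, hypothetical) is the directory of old-id -> new-id files.
map_or_keep() {
    if [ -r "${MAP_DIR:-../map}/$2" ]; then
        cat "${MAP_DIR:-../map}/$2"
    else
        echo "warning: $1 ($2) is not rewritten yet, leaving it as is" >&2
        printf '%s\n' "$1"
    fi
}
```

Because the warning goes to stderr, it does not end up in the rewritten
commit message, which the filter writes to stdout.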

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/


* Re: Rewriting references to existing commits in commit messages with filter-branch
  2013-03-11 13:53 ` Michael Haggerty
@ 2013-03-11 14:05   ` Vadim Zeitlin
  0 siblings, 0 replies; 7+ messages in thread
From: Vadim Zeitlin @ 2013-03-11 14:05 UTC
  To: git

Michael Haggerty <mhagger <at> alum.mit.edu> writes:

> 
> On 03/11/2013 12:45 PM, Vadim Zeitlin wrote:
> > [...]
> >  The first thing I'd like to do is to replace all references to subversion
> > revision numbers in the commit messages with the corresponding git commit
> > SHA1s. [...] I have a couple of dozen errors which happen
> > because the file .git-rewrite/map/$commit doesn't exist yet when I'm trying
> > to look it up.
> 
> The quick and dirty solution would be to rewrite your script such that
> if the commit is still unknown to Git, it emits a warning and leaves the
> commit message unchanged (i.e., leaves the Subversion revision number
> untouched).  Then simply run the filter-branch a few times until it
> emits no warnings.

 Thanks, I did think about this but the trouble is that after the first rewrite
"git svn find-rev" wouldn't work any more, so I'd have to do the substitutions
manually. Which might be doable as there are not that many of them but, if
possible, I'd rather do it automatically.

 FWIW I'm playing with --date-order now (thanks Thomas) but somehow it seems to
create other problems while fixing (some of) the existing ones. I need to look
at this more closely to understand what's going on here...

VZ


* Re: Rewriting references to existing commits in commit messages with filter-branch
  2013-03-11 13:11     ` Thomas Rast
@ 2013-03-11 15:58       ` Vadim Zeitlin
  0 siblings, 0 replies; 7+ messages in thread
From: Vadim Zeitlin @ 2013-03-11 15:58 UTC
  To: git

Thomas Rast <trast <at> student.ethz.ch> writes:

> 
> Vadim Zeitlin <vz-git <at> zeitlins.org> writes:
> 
> > Lawrence Mitchell <wence <at> gmx.li> writes:
> >
> >> Vadim Zeitlin wrote:
> >> 
> >> [...]
> >> 
> >> > git filter-branch --msg-filter svnmsg2git --tag-name-filter cat -- --all
> >> 
> >> git rev-list lists by default in chronological order.  Do you
> >> want to pass --topo-order as one of the rev-list options?
> >
> >  Thanks, this looked like a good idea but reading git-filter-branch code it
> > seems to already do it, at
> > https://github.com/git/git/blob/master/git-filter-branch.sh#L269 you can see
> > that it does "git rev-list --reverse --topo-order ...".
> 
> Try overriding that with --date-order (you may have to patch the source).

 Thanks for the hint, this was indeed the solution. And there is actually no
need to patch the source because, considering the way git-filter-branch.sh is
written, the user-specified parameters come after the hard-coded --topo-order
and it seems that --date-order overrides it if it comes after it. So I just had
to use

git filter-branch --msg-filter svnmsg2git --tag-name-filter cat -- --date-order --all

instead of my original command. The only remaining question I have is: why isn't
--date-order the default? At least when using a message filter, it seems to me
that we always want to rewrite commits in chronological order to deal with
possible back references (even when not migrating from svn, commit messages can
still refer to previous commits, as the ones created by "git revert" do,
and those references need to be updated when rewriting history). So why not use
it in git-filter-branch.sh?


 BTW, the explanation for the new errors I was getting with --date-order was
that I had some artificial commits generated by cvs2svn in the history of this
repository which had _exactly_ the same date as the previous commit and
--date-order sorted them in the wrong order for some reason. I got round this by
simply checking for the specific form of the message (which is "This commit was
generated by cvs2svn to compensate for changes in rNNNNN, which included commits
to RCS files with non-trunk default branches.") and replacing "rNNNNN" with "the
previous commit" in this particular case in order to avoid the problem.
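
 For illustration, that special case could look roughly like the following
sed step, run on the message before any rNNNNN lookup. This is only a sketch:
the function name fix_cvs2svn_msg is made up, and it assumes that the
sentence up to and including "rNNNNN," sits on a single line of the message.

```shell
#!/bin/sh
# Hypothetical pre-processing step for the message filter: in the synthetic
# cvs2svn messages, replace the rNNNNN reference with "the previous commit".
# Reads the commit message on stdin, writes it to stdout; any other
# message passes through unchanged.
fix_cvs2svn_msg() {
    sed 's/^\(This commit was generated by cvs2svn to compensate for changes in \)r[0-9][0-9]*,/\1the previous commit,/'
}
```

 Only lines that start with the cvs2svn sentence are touched, so ordinary
references such as "fixes r12345" elsewhere are still left for the normal
lookup.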

 Thanks again for your help!
VZ
