git.vger.kernel.org archive mirror
* CVS to git: parsing ChangeLog entries?
@ 2008-03-12 19:08 Ralf Wildenhues
  2008-03-13 17:03 ` Jeff King
  2008-03-14 18:40 ` Michael Haggerty
  0 siblings, 2 replies; 4+ messages in thread
From: Ralf Wildenhues @ 2008-03-12 19:08 UTC (permalink / raw)
  To: git

Hello,

when migrating a project that uses GNU-style ChangeLogs from CVS
to git, is it possible to extract patch author information from
the ChangeLog entries rather than from the CVS commit logs?
For simplicity let's first assume the project used only one
ChangeLog file.

Asking because it is not uncommon that patches are committed on
behalf of other people, and it would be nice to credit them.

Related question: when CVS commit logs have varying encoding,
say, some latin1 and some UTF-8, is it possible to have uniformly
encoded git log entries?

Thank you, and please Cc: me on replies,
Ralf

* Re: CVS to git: parsing ChangeLog entries?
  2008-03-12 19:08 CVS to git: parsing ChangeLog entries? Ralf Wildenhues
@ 2008-03-13 17:03 ` Jeff King
  2008-03-15 10:38   ` Ralf Wildenhues
  2008-03-14 18:40 ` Michael Haggerty
  1 sibling, 1 reply; 4+ messages in thread
From: Jeff King @ 2008-03-13 17:03 UTC (permalink / raw)
  To: Ralf Wildenhues, git

On Wed, Mar 12, 2008 at 08:08:27PM +0100, Ralf Wildenhues wrote:

> when migrating a project that uses GNU-style ChangeLogs from CVS
> to git, is it possible to extract patch author information from
> the ChangeLog entries rather than from the CVS commit logs?
> For simplicity let's first assume the project used only one
> ChangeLog file.

I don't think there is a way to do this automatically with
git-cvsimport. However, once imported, I think you could rewrite history
using git-filter-branch with a filter that looked at the diff of
ChangeLog for that commit and rewrote the author. See the documentation
for git-filter-branch.
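Jeff's suggestion (read the author from the lines a commit *adds* to ChangeLog) can be sketched as a small shell function. This is a minimal sketch, not part of git or git-cvsimport. It assumes GNU-style entry headers of the form `YYYY-MM-DD  Name  <email>`, and it assumes the newest entry is the first added line that starts with a date. The function name `changelog_author_from_diff` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper, not part of git: read a unified diff of ChangeLog on
# stdin and print the author of the first *added* entry header, i.e. a line
#   +2008-03-12  Jane Doe  <jane@example.org>
# as two lines (name, then email). Returns 1 if no such header was added.
changelog_author_from_diff() {
  # First added line that starts with an ISO date (YYYY-MM-DD) and a space.
  header=$(sed -n '/^+[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] /{p;q;}')
  [ -n "$header" ] || return 1

  # Drop the leading "+", the date, and the spaces after it.
  rest=$(printf '%s\n' "$header" | sed 's/^+[0-9-]* *//')

  # Require "Name <email>"; anything after ">" (e.g. "(tiny change)") is ignored.
  case $rest in
    *'<'*'>'*) ;;
    *) return 1 ;;
  esac

  name=${rest%%"<"*}                              # text before "<"
  name=$(printf '%s\n' "$name" | sed 's/ *$//')   # trim trailing blanks
  email=${rest#*"<"}                              # text after "<" ...
  email=${email%%">"*}                            # ... up to ">"

  printf '%s\n%s\n' "$name" "$email"
}
```

To wire this into `git filter-branch --env-filter`, where `$GIT_COMMIT` names the commit being rewritten, one might source the function from an absolute path and feed it `git show --pretty=format: "$GIT_COMMIT" -- ChangeLog`. The two output lines would then go into `GIT_AUTHOR_NAME` and `GIT_AUTHOR_EMAIL`, which must be exported, and the whole step should be guarded with `if` so that commits that add no ChangeLog entry keep their original author. An env filter needs no tree checkout, so this should also be cheaper than a `--tree-filter`.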

> Related question: when CVS commit logs have varying encoding,
> say, some latin1 and some UTF-8, is it possible to have uniformly
> encoded git log entries?

I don't think git-cvsimport does much with encodings at all. But again,
you could probably go back through the imported repo with
git-filter-branch and iconv the commit messages as appropriate.
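One way to follow this suggestion is a `--msg-filter` that re-encodes each message to UTF-8. The sketch below is an illustration under stated assumptions, not something git or git-cvsimport provides. It tries the candidate source encodings listed in the hypothetical `ENCODINGS` variable in order and keeps the first one that `iconv` converts without error. Strict encodings must come first, because ISO-8859-1 accepts every byte sequence and therefore always "succeeds".

```sh
#!/bin/sh
# Sketch: re-encode one commit message (stdin -> stdout) to UTF-8 by trying
# candidate source encodings in order; the first that decodes cleanly wins.
# ENCODINGS is a hypothetical knob; the default assumes a UTF-8/Latin-1 mix.
# ISO-8859-1 accepts any byte, so it only makes sense as the last resort.
ENCODINGS=${ENCODINGS:-"UTF-8 ISO-8859-1"}

to_utf8() {
  tmp=$(mktemp) || return 1
  cat >"$tmp"
  status=1
  for enc in $ENCODINGS; do
    # iconv exits non-zero on an invalid input sequence, so a failed
    # conversion means "not this encoding"; its partial output is discarded.
    if iconv -f "$enc" -t UTF-8 "$tmp" >"$tmp.out" 2>/dev/null; then
      cat "$tmp.out"
      status=0
      break
    fi
  done
  rm -f "$tmp" "$tmp.out"
  return $status
}
```

You could save this to a file, for example `/path/to/to-utf8.sh` (a hypothetical location). An absolute path is safest, since filter-branch runs its filters from a temporary directory. It could then be used as `git filter-branch --msg-filter '. /path/to/to-utf8.sh && to_utf8'`. Messages that are already valid UTF-8 pass through unchanged, because UTF-8 is tried first.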

-Peff

* Re: CVS to git: parsing ChangeLog entries?
  2008-03-12 19:08 CVS to git: parsing ChangeLog entries? Ralf Wildenhues
  2008-03-13 17:03 ` Jeff King
@ 2008-03-14 18:40 ` Michael Haggerty
  1 sibling, 0 replies; 4+ messages in thread
From: Michael Haggerty @ 2008-03-14 18:40 UTC (permalink / raw)
  To: Ralf Wildenhues, git

I'll answer your questions from the point of view of cvs2git, a.k.a.
cvs2svn (http://cvs2svn.tigris.org).

Ralf Wildenhues wrote:
> when migrating a project that uses GNU-style ChangeLogs from CVS
> to git, is it possible to extract patch author information from
> the ChangeLog entries rather than from the CVS commit logs?
> For simplicity let's first assume the project used only one
> ChangeLog file.
> 
> Asking because it is not uncommon that patches are committed on
> behalf of other people, and it would be nice to credit them.

There is no built-in support for this in cvs2git.  But the place where
the author is determined knows about the whole changeset, including any
simultaneous changes to the changelog.  So it should be possible to add
this functionality without too much work.

> Related question: when CVS commit logs have varying encoding,
> say, some latin1 and some UTF-8, is it possible to have uniformly
> encoded git log entries?

cvs2git allows you to specify multiple encodings.  It tries one after
the other until one works successfully.  It also has hooks where you can
add your own decoder using arbitrary Python code.

That reminds me that there is a Python universal decoder that uses
heuristics to determine the encoding of an arbitrary octet stream.  That
might be a nice thing to add support for....

Michael

* Re: CVS to git: parsing ChangeLog entries?
  2008-03-13 17:03 ` Jeff King
@ 2008-03-15 10:38   ` Ralf Wildenhues
  0 siblings, 0 replies; 4+ messages in thread
From: Ralf Wildenhues @ 2008-03-15 10:38 UTC (permalink / raw)
  To: Jeff King; +Cc: git

* Jeff King wrote on Thu, Mar 13, 2008 at 06:03:22PM CET:
> On Wed, Mar 12, 2008 at 08:08:27PM +0100, Ralf Wildenhues wrote:
> 
> > when migrating a project that uses GNU-style ChangeLogs from CVS
> > to git, is it possible to extract patch author information from
> > the ChangeLog entries rather than from the CVS commit logs?

> I don't think there is a way to do this automatically with
> git-cvsimport. However, once imported, I think you could rewrite history
> using git-filter-branch with a filter that looked at the diff of
> ChangeLog for that commit and rewrote the author. See the documentation
> for git-filter-branch.

Thank you, I just learned about a cool new tool!

FWIW, here's what I used successfully on a repository with only one
ChangeLog file (it took some fiddling to cope with format variations):

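git filter-branch -d /dev/shm/t --tree-filter '
  # Take the first non-empty line of ChangeLog (the newest entry header),
  # strip the leading date and any "(tiny change)" marker: "Name  <email>".
  line=`sed -n "s,^[12][90][0-9][0-9]-[0-1][0-9]-[0-3][0-9]  *\([A-Za-z].*\),\1,;
                s,.*  \([A-Za-z].*\),\1,;
                /./{
                        s/(tiny change)//
                        s/ *$//
                        p
                        q
                }" ChangeLog`
  # Author name: everything before the first "<" or "(".
  author=`echo "$line" | sed "s, *[<(].*,,"`
  # Email: the text inside <...> or (...), rewrapped as <...>.
  email=`echo "$line" | sed "s,[^<(]*[<(],<,; s/[)>].*/>/"`
  # Override the author only if both parts were found.
  test -n "$author" && test -n "$email" && {
        GIT_AUTHOR_NAME="$author"
        GIT_AUTHOR_EMAIL="$email"
        export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL
  }'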

> > Related question: when CVS commit logs have varying encoding,
> > say, some latin1 and some UTF-8, is it possible to have uniformly
> > encoded git log entries?
> 
> I don't think git-cvsimport does much with encodings at all. But again,
> you could probably go back through the imported repo with
> git-filter-branch and iconv the commit messages as appropriate.

I'll try that, too.

Thanks!
Ralf
