git.vger.kernel.org archive mirror
* Storing (hidden) per-commit metadata
@ 2010-02-19 17:11 Jelmer Vernooij
  2010-02-20 17:41 ` Ben Gamari
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Jelmer Vernooij @ 2010-02-19 17:11 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1008 bytes --]

To allow round-tripping pushes from Bazaar into Git, I'm looking for a
good place to store Bazaar semantics that can not be represented in Git
at the moment. This data should ideally be hidden from the user as much
as possible; it would e.g. contain mappings from git hashes to Bazaar
ids. 

One option would be to store it (as hg-git does) at the bottom of each
git commit message. However, given the amount and kind of data,
it would be annoying to have it displayed by e.g. "git show" or "git
log".
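For reference, the footer approach described above can be sketched as follows. The `--BZR--` marker and the `key: value` layout are hypothetical (loosely modeled on hg-git's `--HG--` line, whose exact format may differ). The point is that everything after the marker is still part of the message that "git show" and "git log" print.

```python
# Hypothetical footer marker, loosely modeled on hg-git's "--HG--" line.
FOOTER_MARKER = "--BZR--"


def append_footer(message: str, metadata: dict[str, str]) -> str:
    """Append key/value metadata below a marker line at the end of a message."""
    lines = [message.rstrip("\n"), "", FOOTER_MARKER]
    lines += [f"{key}: {value}" for key, value in sorted(metadata.items())]
    return "\n".join(lines) + "\n"


def split_footer(message: str) -> tuple[str, dict[str, str]]:
    """Separate the human-written text from the metadata footer, if any."""
    head, sep, tail = message.rpartition("\n" + FOOTER_MARKER + "\n")
    if not sep:
        return message, {}
    metadata = {}
    for line in tail.splitlines():
        key, _, value = line.partition(": ")
        metadata[key] = value
    return head.rstrip("\n") + "\n", metadata


full = append_footer("Fix frobnication\n", {"revision-id": "example-revid-1"})
# The footer is plain message text, so any log display shows it verbatim.
print(full)
```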

Some people have suggested I use the new git notes to store this
metadata, but I haven't quite figured out how to add notes that aren't
displayed by git log/show and are still propagated along with the
revision. Is that at all possible using notes, and are they the right
thing to use here?

There also doesn't appear to be any documentation on notes in
Documentation/technical at the moment. I'm happy to contribute some if
somebody can provide pointers.

Cheers,

Jelmer

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Storing (hidden) per-commit metadata
  2010-02-19 17:11 Storing (hidden) per-commit metadata Jelmer Vernooij
@ 2010-02-20 17:41 ` Ben Gamari
  2010-02-20 18:57   ` Avery Pennarun
  2010-02-22  5:11 ` Gabriel Filion
  2010-02-22 22:13 ` "Alejandro R. Sedeño"
  2 siblings, 1 reply; 19+ messages in thread
From: Ben Gamari @ 2010-02-20 17:41 UTC (permalink / raw)
  To: git

Excerpts from Jelmer Vernooij's message of Fri Feb 19 12:11:25 -0500 2010:
> To allow round-tripping pushes from Bazaar into Git, I'm looking for a
> good place to store Bazaar semantics that can not be represented in Git
> at the moment. This data should ideally be hidden from the user as much
> as possible; it would e.g. contain mappings from git hashes to Bazaar
> ids. 
> 
Are you sure you want to hide this? I believe git-svn puts this
information in its commit messages (although I don't know whether it's
stored elsewhere as well).

- Ben

* Re: Storing (hidden) per-commit metadata
  2010-02-20 17:41 ` Ben Gamari
@ 2010-02-20 18:57   ` Avery Pennarun
  2010-02-21  6:34     ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: Avery Pennarun @ 2010-02-20 18:57 UTC (permalink / raw)
  To: Ben Gamari; +Cc: git

On Sat, Feb 20, 2010 at 12:41 PM, Ben Gamari <bgamari@gmail.com> wrote:
> Excerpts from Jelmer Vernooij's message of Fri Feb 19 12:11:25 -0500 2010:
>> To allow round-tripping pushes from Bazaar into Git, I'm looking for a
>> good place to store Bazaar semantics that can not be represented in Git
>> at the moment. This data should ideally be hidden from the user as much
>> as possible; it would e.g. contain mappings from git hashes to Bazaar
>> ids.
>>
> Are you sure you want to hide this? I believe git-svn puts this
> information in its commit messages (although I don't know whether it's
> stored elsewhere as well).

Note that git-svn doesn't store *all* the stuff from svn in the git
repository.  So you couldn't, for example, regenerate an svn repo
identical to the original from its git-svn clone.  This limitation is
rarely noticed since the stuff git-svn doesn't store is stuff that git
mostly does differently/automatically/etc.  But that's why git-svn can
get away with cluttering your commit messages with "only" one line of
git-svn cruft each.
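The single line of git-svn cruft has the form `git-svn-id: <url>@<revision> <uuid>`. A minimal parser, assuming that layout (an illustrative helper, not git-svn's own code):

```python
import re

# git-svn appends one line per commit: "git-svn-id: <url>@<revision> <uuid>".
GIT_SVN_ID = re.compile(
    r"^git-svn-id: (?P<url>\S+)@(?P<rev>\d+) (?P<uuid>\S+)$", re.MULTILINE
)


def parse_git_svn_id(message: str):
    """Return (url, revision, repository uuid) for a git-svn commit, else None."""
    match = GIT_SVN_ID.search(message)
    if match is None:
        return None
    return match.group("url"), int(match.group("rev")), match.group("uuid")
```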

However, this does bring up the question: how important is it *really*
to be able to "round trip" from bzr to git and back without losing
information?  Maybe you only need to store enough information to pull
from bzr and then push back your commits.

As for git-notes, they sound like they would be useful for this sort
of thing.  I haven't tried them yet, but my understanding is that
notes anywhere other than the "default" notes ref are not shown in
commit messages, so you can use them for whatever you want.
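A simplified in-memory model of that behavior, assuming only the default ref `refs/notes/commits` is displayed. `refs/notes/bzr` is a hypothetical ref name, and real notes are stored as a tree in Git rather than a Python dict. Because a note is keyed by the commit id, adding one leaves the commit object itself untouched:

```python
# Git's default notes ref; other refs under refs/notes/ are separate namespaces.
DEFAULT_NOTES_REF = "refs/notes/commits"


def notes_for_display(notes_refs: dict, commit_id: str,
                      display_refs=(DEFAULT_NOTES_REF,)) -> list:
    """Notes a log display would print for commit_id (simplified model).

    notes_refs maps a notes ref name to {commit id: note text}. Only refs
    listed in display_refs are consulted; all other notes stay out of view.
    """
    return [
        notes_refs[ref][commit_id]
        for ref in display_refs
        if commit_id in notes_refs.get(ref, {})
    ]
```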

Have fun,

Avery

* Re: Storing (hidden) per-commit metadata
  2010-02-20 18:57   ` Avery Pennarun
@ 2010-02-21  6:34     ` Jeff King
  2010-02-21  8:49       ` Johannes Schindelin
                         ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Jeff King @ 2010-02-21  6:34 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Ben Gamari, git

On Sat, Feb 20, 2010 at 01:57:31PM -0500, Avery Pennarun wrote:

> As for git-notes, they sound like they would be useful for this sort
> of thing.  I haven't tried them yet, but my understanding is that
> notes anywhere other than the "default" notes ref are not shown in
> commit messages, so you can use them for whatever you want.

I would want to hear more about the actual data being stored. The
strength of notes is that you can _change_ them after the commit has
been created. And the price you pay is that they are more annoying to
move around, because they are in a totally different ref.

If this is data that is being generated at the time the commit is
created and then set in stone, then it probably should be part of the
commit object.
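The trade-off comes down to how Git names objects: a commit id is the SHA-1 of the serialized commit, so anything stored in the commit changes its id, whereas a note is looked up by that id and can be changed freely. A sketch, where the ident, timestamp and `bzr-revision-id:` line are made up for illustration:

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Object id as Git computes it: SHA-1 over "<type> <size>\\0" plus the body."""
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()


def commit_body(tree: str, parents: list, ident: str, message: str) -> bytes:
    """Serialize a minimal commit object (author and committer share one ident)."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [f"author {ident}", f"committer {ident}"]
    return ("\n".join(lines) + "\n\n" + message).encode()


EMPTY_TREE = git_object_id("tree", b"")  # the well-known empty tree id
IDENT = "A U Thor <author@example.com> 1266600000 +0000"

plain = git_object_id("commit", commit_body(EMPTY_TREE, [], IDENT, "Import\n"))
embedded = git_object_id(
    "commit",
    commit_body(EMPTY_TREE, [], IDENT, "Import\n\nbzr-revision-id: example-revid-1\n"),
)
# A note lives outside the commit, keyed by the unchanged commit id.
notes = {plain: "bzr-revision-id: example-revid-1"}
```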

If the only problem is that the data is ugly in "git show", then perhaps
we need a "suppress these pseudo-headers" feature for showing logs. It
keeps them easily available for inspection or for --grep, but most of
the time you would not see them.
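No such option exists at this point in the thread; a display-side filter could look roughly like the sketch below. The function and its default prefix list are hypothetical. Because only the rendered output is filtered, the stored message (and therefore --grep and the commit id) stays the same.

```python
def hide_pseudo_headers(message: str, hidden_prefixes: tuple = ("git-svn-id:",)) -> str:
    """Return message without lines starting with a hidden prefix (display only)."""
    kept = [line for line in message.splitlines() if not line.startswith(hidden_prefixes)]
    # Drop blank lines left dangling at the end once the trailers are removed.
    while kept and not kept[-1].strip():
        kept.pop()
    return "\n".join(kept) + "\n"


raw = ("Fix typo\n\n"
       "git-svn-id: https://svn.example.org/repo/trunk@1234 6a1b2c3d-0000-4000-8000-000000000000\n")
shown = hide_pseudo_headers(raw)
```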

-Peff

* Re: Storing (hidden) per-commit metadata
  2010-02-21  6:34     ` Jeff King
@ 2010-02-21  8:49       ` Johannes Schindelin
  2010-02-21  8:52         ` Jeff King
  2010-02-21 12:17       ` Jelmer Vernooij
  2010-02-22 14:57       ` Jelmer Vernooij
  2 siblings, 1 reply; 19+ messages in thread
From: Johannes Schindelin @ 2010-02-21  8:49 UTC (permalink / raw)
  To: Jeff King; +Cc: Avery Pennarun, Ben Gamari, git

Hi,

On Sun, 21 Feb 2010, Jeff King wrote:

> If the only problem is that the data is ugly in "git show", then perhaps 
> we need a "suppress these pseudo-headers" feature for showing logs. It 
> keeps them easily available for inspection or for --grep, but most of 
> the time you would not see them.

Whoa. Even more processing to do for each commit during a "git log" run? 
You know, other people are working on _accelerating_ git log as we speak!

And really, while I can understand that the OP wanted to hide the 
information, I am really against that. For example, when I see a log with 
git-svn footers, it gives me _additional_ information which I actually 
like (it tells me where these commits really come from). If they do not 
need bidirectional operation, they can skip those footers.

But I do agree that it is better to put the information into the same 
objects rather than notes, lest the information get out-of-sync.

Ciao,
Dscho

* Re: Storing (hidden) per-commit metadata
  2010-02-21  8:49       ` Johannes Schindelin
@ 2010-02-21  8:52         ` Jeff King
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2010-02-21  8:52 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Avery Pennarun, Ben Gamari, git

On Sun, Feb 21, 2010 at 09:49:28AM +0100, Johannes Schindelin wrote:

> > If the only problem is that the data is ugly in "git show", then perhaps 
> > we need a "suppress these pseudo-headers" feature for showing logs. It 
> > keeps them easily available for inspection or for --grep, but most of 
> > the time you would not see them.
> 
> Whoa. Even more processing to do for each commit during a "git log" run? 
> You know, other people are working on _accelerating_ git log as we speak!

I think this is premature.  You would not need to pay the price for such
a feature if you were not actually using it. On top of which, as it does
not yet exist, it has not actually been benchmarked. So any complaints
should wait until it is actually implemented.

> And really, while I can understand that the OP wanted to hide the 
> information, I am really against that. For example, when I see a log with 
> git-svn footers, it gives me _additional_ information which I actually 
> like (it tells me where these commits really come from). If they do not 
> need bidirectional operation, they can skip those footers.

I think it depends on what the information is, which I'm still not clear
on. But most importantly, I think it makes sense to put control of
whether that information is seen in the hands of the user who is
invoking git.

-Peff

PS I tried to keep this message short. Short enough? :)

* Re: Storing (hidden) per-commit metadata
  2010-02-21  6:34     ` Jeff King
  2010-02-21  8:49       ` Johannes Schindelin
@ 2010-02-21 12:17       ` Jelmer Vernooij
  2010-02-22  5:17         ` Dmitry Potapov
  2010-02-22 14:57       ` Jelmer Vernooij
  2 siblings, 1 reply; 19+ messages in thread
From: Jelmer Vernooij @ 2010-02-21 12:17 UTC (permalink / raw)
  To: Jeff King; +Cc: Avery Pennarun, Ben Gamari, git

[-- Attachment #1: Type: text/plain, Size: 2012 bytes --]

On Sun, 2010-02-21 at 01:34 -0500, Jeff King wrote:
> On Sat, Feb 20, 2010 at 01:57:31PM -0500, Avery Pennarun wrote:
> > As for git-notes, they sound like they would be useful for this sort
> > of thing.  I haven't tried them yet, but my understanding is that
> > notes anywhere other than the "default" notes ref are not shown in
> > commit messages, so you can use them for whatever you want.
> I would want to hear more about the actual data being stored. The
> strength of notes is that you can _change_ them after the commit has
> been created. And the price you pay is that they are more annoying to
> move around, because they are in a totally different ref.
> 
> If this is data that is being generated at the time the commit is
> created and then set in stone, then it probably should be part of the
> commit object.
This data is supposed to be set in stone, since Bazaar revisions are
intended to be immutable, like Git commits are.

For each file we would need to store:

 * the Bazaar revision id
 * any Bazaar revision properties. This is typically a list of URLs of
bugs that were fixed, name of the branch the commit was on, any
additional parents, or anything arbitrary set by plugins (e.g. the
rebase plugin sets 'rebase-of' to the id of the original revision)
 * For each file that was added or moved around in the revision, a path
to fileid mapping
 * Optionally, a list of ghost parent ids and "unusual" revisions for
each file but these should be rare.

This is at least a couple of lines of data and in some cases a lot more.
I would rather avoid confronting git users who don't care about Bazaar
with it.
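To give a sense of the volume, the items listed above could be modeled as follows. The class, its field names and the line format are hypothetical, the rarer per-file "unusual" revisions are left out, and a real serialization would need escaping for paths that contain ": ":

```python
from dataclasses import dataclass, field


@dataclass
class BzrRevisionMetadata:
    """Hypothetical container for the Bazaar-only data listed above."""

    revision_id: str
    properties: dict = field(default_factory=dict)     # e.g. bug URLs, branch name, rebase-of
    file_ids: dict = field(default_factory=dict)       # path -> Bazaar file id (adds/moves only)
    ghost_parents: list = field(default_factory=list)  # expected to be rare

    def to_lines(self) -> list:
        """One "key: value" line per item (no escaping in this sketch)."""
        lines = [f"revision-id: {self.revision_id}"]
        lines += [f"property {k}: {v}" for k, v in sorted(self.properties.items())]
        lines += [f"file-id {path}: {fid}" for path, fid in sorted(self.file_ids.items())]
        lines += [f"ghost-parent: {g}" for g in self.ghost_parents]
        return lines
```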

> If the only problem is that the data is ugly in "git show", then perhaps
> we need a "suppress these pseudo-headers" feature for showing logs. It
> keeps them easily available for inspection or for --grep, but most of
> the time you would not see them.
That seems like a sensible thing to do, and would work well for me.

Cheers,

Jelmer

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

* Re: Storing (hidden) per-commit metadata
  2010-02-19 17:11 Storing (hidden) per-commit metadata Jelmer Vernooij
  2010-02-20 17:41 ` Ben Gamari
@ 2010-02-22  5:11 ` Gabriel Filion
  2010-02-22  9:49   ` Jelmer Vernooij
  2010-02-22 22:13 ` "Alejandro R. Sedeño"
  2 siblings, 1 reply; 19+ messages in thread
From: Gabriel Filion @ 2010-02-22  5:11 UTC (permalink / raw)
  To: Jelmer Vernooij; +Cc: git

Hello,

On 2010-02-19 12:11, Jelmer Vernooij wrote:
> To allow round-tripping pushes from Bazaar into Git, I'm looking for a
> good place to store Bazaar semantics that can not be represented in Git
> at the moment. This data should ideally be hidden from the user as much
> as possible; it would e.g. contain mappings from git hashes to Bazaar
> ids. 
> 
What are you currently using for interacting with Bazaar repositories?

If you already have code for a remote helper, I would be interested in
helping you out. I started a discussion recently on this list about
starting such a script. Would you be willing to collaborate on having
this implemented?

-- 
Gabriel Filion

* Re: Storing (hidden) per-commit metadata
  2010-02-21 12:17       ` Jelmer Vernooij
@ 2010-02-22  5:17         ` Dmitry Potapov
  2010-02-22  9:56           ` Jelmer Vernooij
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Potapov @ 2010-02-22  5:17 UTC (permalink / raw)
  To: Jelmer Vernooij; +Cc: Jeff King, Avery Pennarun, Ben Gamari, git

On Sun, Feb 21, 2010 at 01:17:26PM +0100, Jelmer Vernooij wrote:
> 
> For each file we would need to store:
> 
>  * the Bazaar revision id
>  * any Bazaar revision properties. This is typically a list of URLs of
> bugs that were fixed, name of the branch the commit was on, any
> additional parents, or anything arbitrary set by plugins (e.g. the
> rebase plugin sets 'rebase-of' to the id of the original revision)
>  * For each file that was added or moved around in the revision, a path
> to fileid mapping
>  * Optionally, a list of ghost parent ids and "unusual" revisions for
> each file but these should be rare.
> 
> This is at least a couple of lines of data and in some cases a lot more.
> I would rather avoid confronting git users who don't care about Bazaar
> with it.

The problem with storing this metadata in the commit object is that
any newly created commits in Git will not have this information, and you
probably have to add it later when you export these commits to Bazaar,
which means that the history in Git should be re-written, and Git users
will have to rebase their branches from one commit to another that are
identical except this Bazaar-specific information, which you try to hide
from them. So much for don't care about Bazaar!

In other words, no matter what git-log displays, as long as you put this
meta data wherever it changes commit-id, it is visible to Git users, and
trying to hide this fact is utterly stupid.

There are many ways to store Bazaar data in Git without confronting git
users who don't care about Bazaar with it. For instance, you can create
a separate branch that will hold this meta data.

   master      bzr/master

      /---------o
     o          |
     |          |
     |/---------o
     o          |
     |          |

Commits on bzr/master are fast-forward merges that have the same tree-id
as corresponding commits on master, but the commit message contains
Bazaar specific information. So, if someone does not care about Bazaar,
this is a throw away branch for him. Also, there is no problem to add
Bazaar specific information to any git commit later when it is pushed
to Bazaar. The only problem is if you try to rebase commits that were
pushed to Bazaar, but AFAIK Bazaar does not support overwriting history,
so you cannot expect anything good of this attempt anyway. The published
history should not be rebased.
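A rough model of that layout. It assumes, reading the diagram, that each bzr/master commit has the previous bzr/master commit and the corresponding master commit as parents; the helper names, ident and placeholder tree ids are made up:

```python
import hashlib

IDENT = "A U Thor <author@example.com> 1266600000 +0000"


def commit_id(tree: str, parents: list, message: str) -> str:
    """Git-style commit id for a minimal commit (fixed author/committer ident)."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [f"author {IDENT}", f"committer {IDENT}"]
    body = ("\n".join(lines) + "\n\n" + message).encode()
    return hashlib.sha1(f"commit {len(body)}\0".encode() + body).hexdigest()


def shadow_commit(master_id: str, master_tree: str, previous_shadow, bzr_metadata: str) -> dict:
    """A bzr/master commit: reuses master's tree, carries metadata in its message."""
    parents = ([previous_shadow] if previous_shadow else []) + [master_id]
    return {"id": commit_id(master_tree, parents, bzr_metadata),
            "tree": master_tree, "parents": parents}


# master history is built first and is never rewritten afterwards.
TREE_1, TREE_2 = "1" * 40, "2" * 40  # placeholder tree ids
m1 = commit_id(TREE_1, [], "First\n")
m2 = commit_id(TREE_2, [m1], "Second\n")

s1 = shadow_commit(m1, TREE_1, None, "revision-id: example-revid-1\n")
s2 = shadow_commit(m2, TREE_2, s1["id"], "revision-id: example-revid-2\n")
```

Users who only fetch master never see the shadow commits, and adding metadata later only appends to bzr/master.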


Dmitry

* Re: Storing (hidden) per-commit metadata
  2010-02-22  5:11 ` Gabriel Filion
@ 2010-02-22  9:49   ` Jelmer Vernooij
  0 siblings, 0 replies; 19+ messages in thread
From: Jelmer Vernooij @ 2010-02-22  9:49 UTC (permalink / raw)
  To: Gabriel Filion; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 991 bytes --]

On Mon, 2010-02-22 at 00:11 -0500, Gabriel Filion wrote:
> On 2010-02-19 12:11, Jelmer Vernooij wrote:
> > To allow round-tripping pushes from Bazaar into Git, I'm looking for a
> > good place to store Bazaar semantics that can not be represented in Git
> > at the moment. This data should ideally be hidden from the user as much
> > as possible; it would e.g. contain mappings from git hashes to Bazaar
> > ids. 
> What are you currently using for interacting with Bazaar repositories?
It's the other way around :-) I'm the developer of the bzr-git plugin,
which allows Bazaar to be used with Git repositories. 

> If you already have code for a remote helper, I would be interested in
> helping you out. I started a discussion recently on this list about
> starting such a script. Would you be willing to collaborate on having
> this implemented?
I'm happy to discuss a remote helper for Bazaar in Git, but not
particularly interested in contributing.

Cheers,

Jelmer

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

* Re: Storing (hidden) per-commit metadata
  2010-02-22  5:17         ` Dmitry Potapov
@ 2010-02-22  9:56           ` Jelmer Vernooij
  2010-02-22 11:28             ` Dmitry Potapov
  0 siblings, 1 reply; 19+ messages in thread
From: Jelmer Vernooij @ 2010-02-22  9:56 UTC (permalink / raw)
  To: Dmitry Potapov; +Cc: Jeff King, Avery Pennarun, Ben Gamari, git

[-- Attachment #1: Type: text/plain, Size: 3397 bytes --]

On Mon, 2010-02-22 at 08:17 +0300, Dmitry Potapov wrote:
> On Sun, Feb 21, 2010 at 01:17:26PM +0100, Jelmer Vernooij wrote:
> > For each file we would need to store:
> > 
> >  * the Bazaar revision id
> >  * any Bazaar revision properties. This is typically a list of URLs of
> > bugs that were fixed, name of the branch the commit was on, any
> > additional parents, or anything arbitrary set by plugins (e.g. the
> > rebase plugin sets 'rebase-of' to the id of the original revision)
> >  * For each file that was added or moved around in the revision, a path
> > to fileid mapping
> >  * Optionally, a list of ghost parent ids and "unusual" revisions for
> > each file but these should be rare.
> > 
> > This is at least a couple of lines of data and in some cases a lot more.
> > I would rather avoid confronting git users who don't care about Bazaar
> > with it.
> The problem with storing this metadata in the commit object is that
> any newly created commits in Git will not have this information, and you
> probably have to add it later when you export these commits to Bazaar,
> which means that the history in Git should be re-written, and Git users
> will have to rebase their branches from one commit to another that are
> identical except this Bazaar-specific information, which you try to hide
> from them. So much for don't care about Bazaar!

> In other words, no matter what git-log displays, as long as you put this
> meta data wherever it changes commit-id, it is visible to Git users, and
> trying to hide this fact is utterly stupid.
> 
> There are many ways to store Bazaar data in Git without confronting git
> users who don't care about Bazaar with it. For instance, you can create
> a separate branch that will hold this meta data.
> 
>    master      bzr/master
> 
>       /---------o
>      o          |
>      |          |
>      |/---------o
>      o          |
>      |          |
> 
> Commits on bzr/master are fast-forward merges that have the same tree-id
> as corresponding commits on master, but the commit message contains
> Bazaar specific information. So, if someone does not care about Bazaar,
> this is a throw away branch for him. Also, there is no problem to add
> Bazaar specific information to any git commit later when it is pushed
> to Bazaar. The only problem is if you try to rebase commits that were
> pushed to Bazaar, but AFAIK Bazaar does not support overwriting history,
> so you cannot expect anything good of this attempt anyway. The published
> history should not be rebased.

There is no need for that data to be added later for revisions that did
not originate from Bazaar. All of the metadata that has to be stored
will be known at the time the commit is created. Those commits that were
made in Git later will not have any metadata that can not be represented
in Git (they were made with Git, after all). There is no need for
rebasing/overwriting history for existing revisions to enable access by
Bazaar.

Having a bzr/master ref means that the extra metadata will not always be
copied around (unless git is patched), so if I push my work from Bazaar
into Git, somebody works on it in Git and pushes a derived branch and
then somebody else clones that derived Git branch into Bazaar again, I
will not be able to communicate with that person's branch.

Cheers,

Jelmer

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

* Re: Storing (hidden) per-commit metadata
  2010-02-22  9:56           ` Jelmer Vernooij
@ 2010-02-22 11:28             ` Dmitry Potapov
  2010-02-22 11:59               ` Jelmer Vernooij
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Potapov @ 2010-02-22 11:28 UTC (permalink / raw)
  To: Jelmer Vernooij; +Cc: Jeff King, Avery Pennarun, Ben Gamari, git

On Mon, Feb 22, 2010 at 10:56:47AM +0100, Jelmer Vernooij wrote:
> 
> There is no need for that data to be added later for revisions that did
> not originate from Bazaar. All of the metadata that has to be stored
> will be known at the time the commit is created. Those commits that were
> made in Git later will not have any metadata that can not be represented
> in Git (they were made with Git, after all).

If so, I do not see why any metadata should be stored in Git at all. If
you can work without them, then why do you want to add them to Git? And then
what about a commit that originated in Git, was then exported to Bazaar and then
imported back into Git? It still originated in Git and thus should not
have any metadata despite being imported from Bazaar.

> Having a bzr/master ref means that the extra metadata will not always be
> copied around (unless git is patched), so if I push my work from Bazaar
> into Git, somebody works on it in Git and pushes a derived branch and
> then somebody else clones that derived Git branch into Bazaar again, I
> will not be able to communicate with that person's branch.

No matter how many times a branch was cloned, it is exactly same branch
(i.e. it consists of commits having exactly the same id). So, if you can
work with the original branch, you can work with any cloned branch. So,
I see no need to copy this data around for people who do not work with
Bazaar directly.


Dmitry

* Re: Storing (hidden) per-commit metadata
  2010-02-22 11:28             ` Dmitry Potapov
@ 2010-02-22 11:59               ` Jelmer Vernooij
  2010-02-22 13:08                 ` Dmitry Potapov
  0 siblings, 1 reply; 19+ messages in thread
From: Jelmer Vernooij @ 2010-02-22 11:59 UTC (permalink / raw)
  To: Dmitry Potapov; +Cc: Jeff King, Avery Pennarun, Ben Gamari, git

[-- Attachment #1: Type: text/plain, Size: 2573 bytes --]

On Mon, 2010-02-22 at 14:28 +0300, Dmitry Potapov wrote:
> On Mon, Feb 22, 2010 at 10:56:47AM +0100, Jelmer Vernooij wrote:
> > 
> > There is no need for that data to be added later for revisions that did
> > not originate from Bazaar. All of the metadata that has to be stored
> > will be known at the time the commit is created. Those commits that were
> > made in Git later will not have any metadata that can not be represented
> > in Git (they were made with Git, after all).
> If so, I do not see why any metadata should be stored in Git at all. If
> you can work without them, then why do you want to add them to Git? And then
> what about a commit that originated in Git, was then exported to Bazaar and then
> imported back into Git? It still originated in Git and thus should not
> have any metadata despite being imported from Bazaar.
Commits that originated in Git do not contain any Bazaar-specific
metadata, even if they also lived in Bazaar at some point, because they
could not have been set by Git at commit time. 

We would only add the metadata for revisions that did not come out of
Git originally. 

We'd like to have the extra metadata in Git so that we can push Bazaar
commits into a Git repository losslessly. If we can't do this losslessly
then the identity of the commit changes just like it does in git if you
aren't able to produce the same tree, blob and commit objects.

> > Having a bzr/master ref means that the extra metadata will not always be
> > copied around (unless git is patched), so if I push my work from Bazaar
> > into Git, somebody works on it in Git and pushes a derived branch and
> > then somebody else clones that derived Git branch into Bazaar again, I
> > will not be able to communicate with that person's branch.
> No matter how many times a branch was cloned, it is exactly same branch
> (i.e. it consists of commits having exactly the same id). So, if you can
> work with the original branch, you can work with any cloned branch. So,
> I see no need to copy this data around for people who do not work with
> Bazaar directly.
The original branch is a Bazaar branch here, so that's not true. You can
only work with any cloned branch if the matching bzr/ branch is also
around. If it isn't then you won't be able to find the original commit. 

hg-git already does something similar by putting a --HG-- line followed
by hg-git specific metadata in the commit message when it pushes into
Git. I'd like to find a place to put this data that's not as intrusive
for users.

Cheers,

Jelmer

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

* Re: Storing (hidden) per-commit metadata
  2010-02-22 11:59               ` Jelmer Vernooij
@ 2010-02-22 13:08                 ` Dmitry Potapov
  2010-02-22 13:44                   ` Jelmer Vernooij
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Potapov @ 2010-02-22 13:08 UTC (permalink / raw)
  To: Jelmer Vernooij; +Cc: Jeff King, Avery Pennarun, Ben Gamari, git

On Mon, Feb 22, 2010 at 12:59:32PM +0100, Jelmer Vernooij wrote:
> 
> We'd like to have the extra metadata in Git so that we can push Bazaar
> commits into a Git repository losslessly. If we can't do this losslessly
> then the identity of the commit changes just like it does in git if you
> aren't able to produce the same tree, blob and commit objects.

But the problem is that you may want to add some information when you
import Git commits into Bazaar. For instance, Git does not record file
renames explicitly and relies on the content of files to detect renames
automatically. So, when I use gitk, I can see which file was renamed.
If you work in Bazaar, you probably also want to see renames, but this
requires that you add this information when you import commits into
Bazaar. But if you do that, the export to Git will produce a different
commit just because you added this Bazaar-specific data.
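A minimal sketch of that kind of inference, restricted to exact matches: a deleted path and an added path with the same blob id are paired as a rename. Git's own detection also pairs files whose content is merely similar, using a similarity score, which this sketch does not attempt. Trees are modeled as hypothetical `{path: blob id}` dicts:

```python
def exact_renames(old_tree: dict, new_tree: dict) -> list:
    """Pair deleted paths with added paths that have an identical blob id."""
    deleted = {path: blob for path, blob in old_tree.items() if path not in new_tree}
    added = {path: blob for path, blob in new_tree.items() if path not in old_tree}
    # Index added paths by content so each deleted blob can find its new home.
    by_blob = {}
    for path, blob in added.items():
        by_blob.setdefault(blob, []).append(path)
    renames = []
    for old_path, blob in sorted(deleted.items()):
        candidates = by_blob.get(blob)
        if candidates:
            renames.append((old_path, candidates.pop(0)))
    return renames
```

Bazaar records file ids explicitly instead, which is why an importer has to compute something like this when the commit enters Bazaar.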

> 
> > > Having a bzr/master ref means that the extra metadata will not always be
> > > copied around (unless git is patched), so if I push my work from Bazaar
> > > into Git, somebody works on it in Git and pushes a derived branch and
> > > then somebody else clones that derived Git branch into Bazaar again, I
> > > will not be able to communicate with that person's branch.
> > No matter how many times a branch was cloned, it is exactly same branch
> > (i.e. it consists of commits having exactly the same id). So, if you can
> > work with the original branch, you can work with any cloned branch. So,
> > I see no need to copy this data around for people who do not work with
> > Bazaar directly.
> The original branch is a Bazaar branch here, so that's not true. You can
> only work with any cloned branch if the matching bzr/ branch is also
> around. If it isn't then you won't be able to find the original commit. 

Obviously bzr/ branch should be around somewhere, but it does not have
to be in any cloned repo. It is sufficient to have it in one place,
because it refers to commit-id, which does not change when you clone it.

> 
> hg-git already does something similar by putting a --HG-- line followed
> by hg-git specific metadata in the commit message when it pushes into
> Git. I'd like to find a place to put this data that's not as intrusive
> for users.

I still think it is wrong to hide some information in the commit object.
I am not sure that the commit object is the right place to store that
metadata, but hiding this information is even more problematic. Let's
suppose that someone cherry-picks your Bazaar-originated commit. Now when
you try to synchronize with Bazaar, your synchronizer will see that it
has some Bazaar revision ID and branch name, but, in fact, it is a new
commit on a completely different branch...


Dmitry

* Re: Storing (hidden) per-commit metadata
  2010-02-22 13:08                 ` Dmitry Potapov
@ 2010-02-22 13:44                   ` Jelmer Vernooij
  2010-02-22 14:20                     ` Dmitry Potapov
  0 siblings, 1 reply; 19+ messages in thread
From: Jelmer Vernooij @ 2010-02-22 13:44 UTC (permalink / raw)
  To: Dmitry Potapov; +Cc: Jeff King, Avery Pennarun, Ben Gamari, git

[-- Attachment #1: Type: text/plain, Size: 4256 bytes --]

On Mon, 2010-02-22 at 16:08 +0300, Dmitry Potapov wrote:
> On Mon, Feb 22, 2010 at 12:59:32PM +0100, Jelmer Vernooij wrote:
> > We'd like to have the extra metadata in Git so that we can push Bazaar
> > commits into a Git repository losslessly. If we can't do this losslessly
> > then the identity of the commit changes just like it does in git if you
> > aren't able to produce the same tree, blob and commit objects.
> But the problem is that you may want to add some information when you
> import Git commits into Bazaar. For instance, Git does not record file
> renames explicitly and relies on the content of files to detect renames
> automatically. So, when I use gitk, I can see which file was renamed.
> If you work in Bazaar, you probably also want to see renames, but this
> requires that you add this information when you import commits into
> Bazaar. But if you do that, the export to Git will produce a different
> commit just because you added this Bazaar-specific data.
We can already do it the other way around: Bazaar allows storing arbitrary
revision properties, so we use those to store things that cannot be
represented in Bazaar but exist in Git. Examples of this are the
unusual file modes created by older versions of git, or non-UTF-8 commit
messages. Those extra revision properties are set at the moment the
Git commit is imported into Bazaar, not afterwards, and there is no
need to update them later.

The fact that we have this extra metadata allows us to reproduce the
original Git commit bit for bit so we can actually extract the same
revision that went in, with the same git sha1.
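One such case, non-UTF-8 messages, could be handled roughly like this: keep the exact original bytes in a revision property whenever decoding would be lossy, and use them on export. The property name `git-raw-message` and the hex encoding are assumptions for illustration, not necessarily what bzr-git does:

```python
def import_message(raw: bytes) -> tuple:
    """Decode a Git commit message for Bazaar, keeping raw bytes if lossy."""
    try:
        return raw.decode("utf-8"), {}
    except UnicodeDecodeError:
        # Hypothetical property name; stores the exact original bytes as hex.
        return raw.decode("utf-8", errors="replace"), {"git-raw-message": raw.hex()}


def export_message(text: str, properties: dict) -> bytes:
    """Rebuild the original Git message bytes from the Bazaar revision."""
    if "git-raw-message" in properties:
        return bytes.fromhex(properties["git-raw-message"])
    return text.encode("utf-8")
```

Since the exported bytes match the original exactly, the recomputed commit has the same sha1.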

> > > > Having a bzr/master ref means that the extra metadata will not always be
> > > > copied around (unless git is patched), so if I push my work from Bazaar
> > > > into Git, somebody works on it in Git and pushes a derived branch and
> > > > then somebody else clones that derived Git branch into Bazaar again, I
> > > > will not be able to communicate with that person's branch.
> > > No matter how many times a branch was cloned, it is exactly same branch
> > > (i.e. it consists of commits having exactly the same id). So, if you can
> > > work with the original branch, you can work with any cloned branch. So,
> > > I see no need to copy this data around for people who do not work with
> > > Bazaar directly.
> > The original branch is a Bazaar branch here, so that's not true. You can
> > only work with any cloned branch if the matching bzr/ branch is also
> > around. If it isn't then you won't be able to find the original commit. 
> Obviously bzr/ branch should be around somewhere, but it does not have
> to be in any cloned repo. It is sufficient to have it in one place,
> because it refers to commit-id, which does not change when you clone it.
If some other Bazaar user clones that repo, they end up without the
Bazaar-specific metadata and thus with different Bazaar commits. If they
then try to communicate with the Bazaar user that pushed the revisions
in, their histories appear unrelated.

> > hg-git already does something similar by putting a --HG-- line followed
> > by hg-git specific metadata in the commit message when it pushes into
> > Git. I'd like to find a place to put this data that's not as intrusive
> > for users.
> I still think it is wrong to hide some information in the commit object.
What exactly is the problem with doing so? The "encoding" header is already
there and, as far as I can tell, it is not displayed directly to the user.
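For context, a raw commit object (as printed by `git cat-file -p <commit>`) is a list of header lines (tree, parents, author, committer and, optionally, encoding), a blank line, and then the message. The sketch below splits such an object into its headers and its message, to show where a header like `encoding` lives relative to the text that `git log` displays. It assumes every header value fits on one line; the sample object is made up.

```python
from __future__ import annotations


def split_commit(raw: bytes) -> tuple[list[tuple[str, str]], bytes]:
    """Split a raw commit object into (header name, value) pairs and the message."""
    # The first blank line separates the headers from the free-form message.
    head, _, message = raw.partition(b"\n\n")
    headers = []
    for line in head.split(b"\n"):
        # Each header is "<name> <value>"; this sketch ignores multi-line values.
        name, _, value = line.partition(b" ")
        headers.append((name.decode("ascii"), value.decode("utf-8", "replace")))
    return headers, message


raw = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1266599485 +0100\n"
    b"committer A U Thor <author@example.com> 1266599485 +0100\n"
    b"encoding ISO-8859-1\n"
    b"\n"
    b"Caf\xe9 support\n"
)

headers, message = split_commit(raw)
print([name for name, _ in headers])  # ['tree', 'author', 'committer', 'encoding']
print(dict(headers)["encoding"])      # ISO-8859-1
# The message stays as raw bytes; only the header says how to decode it.
print(message.decode(dict(headers)["encoding"]))  # Café support
```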

> I am not sure that the commit object is the right place to store that
> metadata, but hiding this information is even more problematic. Let's
> suppose that someone cherry-picks your Bazaar-originated commit. Now when
> you try to synchronize with Bazaar, your synchronizer will see that it
> has some Bazaar revision ID and branch name, but, in fact, it is a new
> commit on a completely different branch...
I don't see how the fact that the bzr-git/hg-git data is being hidden is
the problem in the scenario you mention.

It'd be nice if this sort of information was discarded by "git rebase",
but that's another good reason to treat it in a different way from the
commit message instead.

Cheers,

Jelmer

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Storing (hidden) per-commit metadata
  2010-02-22 13:44                   ` Jelmer Vernooij
@ 2010-02-22 14:20                     ` Dmitry Potapov
  2010-02-22 19:13                       ` Jelmer Vernooij
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Potapov @ 2010-02-22 14:20 UTC (permalink / raw)
  To: Jelmer Vernooij; +Cc: Jeff King, Avery Pennarun, Ben Gamari, git

On Mon, Feb 22, 2010 at 02:44:49PM +0100, Jelmer Vernooij wrote:
> On Mon, 2010-02-22 at 16:08 +0300, Dmitry Potapov wrote:
> > I am not sure that the commit object is the right place to store that
> > metadata, but hiding this information is even more problematic. Let's
> > suppose that someone cherry-picks your Bazaar-originated commit. Now when
> > you try to synchronize with Bazaar, your synchronizer will see that it
> > has some Bazaar revision ID and branch name, but, in fact, it is a new
> > commit on a completely different branch...
> I don't see how the fact that the bzr-git/hg-git data is being hidden is
> the problem in the scenario you mention.

Because you can easily remove that information manually when you cherry-pick
some commit. It is more difficult to do when it is hidden.

> It'd be nice if this sort of information was discarded by "git rebase",
> but that's another good reason to treat it in a different way from the
> commit message instead.

Well, I do not see any other place in the commit object aside from the commit
message where you can easily put information, and I do not think it is a
good idea for "git rebase" to edit the commit message automatically.
Maybe you should look at git-notes. (I don't know enough about them to
tell whether they are suitable or not.)


Dmitry

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Storing (hidden) per-commit metadata
  2010-02-21  6:34     ` Jeff King
  2010-02-21  8:49       ` Johannes Schindelin
  2010-02-21 12:17       ` Jelmer Vernooij
@ 2010-02-22 14:57       ` Jelmer Vernooij
  2 siblings, 0 replies; 19+ messages in thread
From: Jelmer Vernooij @ 2010-02-22 14:57 UTC (permalink / raw)
  To: Jeff King; +Cc: Avery Pennarun, Ben Gamari, git

[-- Attachment #1: Type: text/plain, Size: 2014 bytes --]

On Sun, 2010-02-21 at 01:34 -0500, Jeff King wrote:
> On Sat, Feb 20, 2010 at 01:57:31PM -0500, Avery Pennarun wrote:
> > As for git-notes, they sound like they would be useful for this sort
> > of thing.  I haven't tried them yet, but my understanding is that
> > notes anywhere other than the "default" notes ref are not shown in
> > commit messages, so you can use them for whatever you want.
> I would want to hear more about the actual data being stored. The
> strength of notes is that you can _change_ them after the commit has
> been created. And the price you pay is that they are more annoying to
> move around, because they are in a totally different ref.
> 
> If this is data that is being generated at the time the commit is
> created and then set in stone, then it probably should be part of the
> commit object.
This data is supposed to be set in stone, since Bazaar revisions are
intended to be immutable, like Git commits are.

For each revision we would need to store:

 * the Bazaar revision id
 * any Bazaar revision properties. This is typically a list of URLs of
bugs that were fixed, the name of the branch the commit was on, any
additional parents, or anything arbitrary set by plugins (e.g. the
rebase plugin sets 'rebase-of' to the id of the original revision)
 * For each file that was added or moved around in the revision, a
path-to-fileid mapping
 * Optionally, a list of ghost parent ids and "unusual" revisions for
each file, but these should be rare.

This is at least a couple of lines of data and in some cases a lot more.
I would rather avoid confronting git users who don't care about Bazaar
with it.
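One way to picture the "couple of lines of data" is to flatten the items listed above into key/value lines. The class, its field names and the `bzr-` key prefix below are hypothetical, chosen only for illustration; they are not bzr-git's actual storage format. The sketch assumes no value contains a newline and no file id contains a space.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BzrRevisionMetadata:
    """Hypothetical bundle of the Bazaar-only data for one revision."""

    revision_id: str
    properties: dict[str, str] = field(default_factory=dict)  # e.g. bugs, branch-nick
    file_ids: dict[str, str] = field(default_factory=dict)  # path -> Bazaar file id
    ghost_parents: list[str] = field(default_factory=list)

    def to_lines(self) -> list[str]:
        """Flatten the metadata into 'key: value' lines."""
        lines = [f"bzr-revision-id: {self.revision_id}"]
        lines += [f"bzr-property-{k}: {v}" for k, v in sorted(self.properties.items())]
        # File id first (assumed to contain no spaces), so the path may contain spaces.
        lines += [f"bzr-file-id: {fid} {path}" for path, fid in sorted(self.file_ids.items())]
        lines += [f"bzr-ghost-parent: {p}" for p in self.ghost_parents]
        return lines


meta = BzrRevisionMetadata(
    revision_id="author@example.com-20100219171125-0123456789abcdef",
    properties={
        "branch-nick": "trunk",
        "rebase-of": "author@example.com-20100218000000-fedcba9876543210",
    },
    file_ids={"README": "readme-20100219171125-abcdef-1"},
)
print("\n".join(meta.to_lines()))
```

Even this small example produces four lines, and a revision that touches many files adds one line per file.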

> If the only problem is that the data is ugly in "git show", then perhaps
> we need a "suppress these pseudo-headers" feature for showing logs. It
> keeps them easily available for inspection or for --grep, but most of
> the time you would not see them.
That seems like a sensible thing to do, and would work well for me.
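At the time of this thread Git had no option like the one Jeff describes, so the following is only a sketch of what a display-time filter could do. It assumes the Bazaar data is stored as message lines with a recognizable prefix (the hypothetical `bzr-` prefix here). The stored message is left untouched, so a search such as `git log --grep` would still see the hidden lines; only the text shown to the user is filtered.

```python
from __future__ import annotations


def hide_pseudo_headers(message: str, prefixes: tuple[str, ...] = ("bzr-",)) -> str:
    """Return the message as a log viewer might display it, minus hidden lines.

    Any line starting (case-insensitively) with one of the lowercase
    prefixes is dropped; the stored message itself is not modified.
    """
    kept = [line for line in message.splitlines() if not line.lower().startswith(prefixes)]
    # Filtering a trailing block can leave blank lines at the end; trim them.
    while kept and not kept[-1].strip():
        kept.pop()
    return "\n".join(kept) + "\n" if kept else ""


stored = (
    "Fix frobnication of widgets\n"
    "\n"
    "Longer explanation of the change.\n"
    "\n"
    "bzr-revision-id: author@example.com-20100219171125-0123456789abcdef\n"
    "bzr-property-branch-nick: trunk\n"
)
print(hide_pseudo_headers(stored), end="")
```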

Cheers,

Jelmer


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Storing (hidden) per-commit metadata
  2010-02-22 14:20                     ` Dmitry Potapov
@ 2010-02-22 19:13                       ` Jelmer Vernooij
  0 siblings, 0 replies; 19+ messages in thread
From: Jelmer Vernooij @ 2010-02-22 19:13 UTC (permalink / raw)
  To: Dmitry Potapov; +Cc: Jeff King, Avery Pennarun, Ben Gamari, git

[-- Attachment #1: Type: text/plain, Size: 1843 bytes --]

On Mon, 2010-02-22 at 17:20 +0300, Dmitry Potapov wrote:
> On Mon, Feb 22, 2010 at 02:44:49PM +0100, Jelmer Vernooij wrote:
> > On Mon, 2010-02-22 at 16:08 +0300, Dmitry Potapov wrote:
> > > I am not sure that the commit object is the right place to store that
> > > metadata, but hiding this information is even more problematic. Let's
> > > suppose that someone cherry-picks your Bazaar-originated commit. Now when
> > > you try to synchronize with Bazaar, your synchronizer will see that it
> > > has some Bazaar revision ID and branch name, but, in fact, it is a new
> > > commit on a completely different branch...
> > I don't see how the fact that the bzr-git/hg-git data is being hidden is
> > the problem in the scenario you mention.
> Because you can easily remove that information manually when you cherry-pick
> some commit. It is more difficult to do when it is hidden.
My point is that if you don't make it part of the user-visible commit
message there is no need to remove it at all, it'll just disappear by
itself.

> > It'd be nice if this sort of information was discarded by "git rebase",
> > but that's another good reason to treat it in a different way from the
> > commit message instead.
> Well, I do not see any other place in the commit object aside from the commit
> message where you can easily put information, and I do not think it is a
> good idea for "git rebase" to edit the commit message automatically.
> Maybe you should look at git-notes. (I don't know enough about them to
> tell whether they are suitable or not.)
Some other people have suggested putting e.g. an RFC 822-style header in
the commit message field and using the headers in it to allow custom
revision properties, only displaying the body in "git log", "git show"
etc. What do you think about that?
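A sketch of the header-block variant, assuming the headers sit at the very top of the message, one "Name: value" per line, ending at the first blank line. The header names in the example are hypothetical. A message whose first line is not header-shaped is treated as having no header block. One drawback shows up immediately: an ordinary subject line such as "doc: fix typo" also looks like a header, so a real scheme would need a stricter rule (for example, a fixed name prefix).

```python
from __future__ import annotations

import re

# "Name: value", where the name is letters, digits and dashes (RFC 822 style).
HEADER_RE = re.compile(r"([A-Za-z0-9-]+):[ \t]*(.*)")


def split_header_block(message: str) -> tuple[dict[str, str], str]:
    """Split a commit message into a leading header block and the displayable body."""
    lines = message.split("\n")
    headers: dict[str, str] = {}
    for i, line in enumerate(lines):
        if line == "":
            # The first blank line ends the header block.
            return headers, "\n".join(lines[i + 1:])
        match = HEADER_RE.fullmatch(line)
        if match is None:
            # Not a header line: treat the whole message as an ordinary body.
            return {}, message
        headers[match.group(1)] = match.group(2)
    # Only headers and no blank line: there is nothing to display.
    return headers, ""


msg = (
    "Bzr-Revision-Id: author@example.com-20100219171125-0123456789abcdef\n"
    "Bzr-Branch-Nick: trunk\n"
    "\n"
    "Fix frobnication of widgets\n"
)
headers, body = split_header_block(msg)
print(headers["Bzr-Branch-Nick"])  # trunk
print(body, end="")                # Fix frobnication of widgets
```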

Cheers,

Jelmer

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Storing (hidden) per-commit metadata
  2010-02-19 17:11 Storing (hidden) per-commit metadata Jelmer Vernooij
  2010-02-20 17:41 ` Ben Gamari
  2010-02-22  5:11 ` Gabriel Filion
@ 2010-02-22 22:13 ` "Alejandro R. Sedeño"
  2 siblings, 0 replies; 19+ messages in thread
From: "Alejandro R. Sedeño" @ 2010-02-22 22:13 UTC (permalink / raw)
  To: Jelmer Vernooij; +Cc: git

On 02/19/2010 12:11 PM, Jelmer Vernooij wrote:
> To allow round-tripping pushes from Bazaar into Git, I'm looking for a
> good place to store Bazaar semantics that can not be represented in Git
> at the moment. This data should ideally be hidden from the user as much
> as possible; it would e.g. contain mappings from git hashes to Bazaar
> ids.

I've been having similar thoughts for git-svn, since I am working with a
very large svn repository that uses svn:keywords and svn:externals in a
few places. I've written some scripts to parse the git-svn
unhandled.log, but that does not propagate to other git clones of the
repo, and rebuilding the log is about as expensive as cloning the svn
repo again. Also, querying for externals is slow, presumably because
git-svn needs to talk to svn to fetch them.

So far, I have been leaning towards having an optional tree associated
with commits, in which metadata could be stored. This metadata tree
would be propagated by git clone, used by remote helper scripts, and be
quite ignorable if unneeded. The metadata tree would use sub-trees for
namespaces. For instance, a sub-tree called git-svn would contain the
revision map, externals, ignores, properties, etc.

This isn't fully thought out yet, and I'm not even sure if it would
work, or be backwards-compatible. However, since the topic came up, I
figured I should mention what I had so far.
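Git has no notion of a second, per-commit metadata tree, so the sketch below only shows how a namespaced metadata tree like the one described could be hashed as an ordinary Git tree object, with one sub-tree per tool. The namespace and file names (`git-svn/rev-map` and so on) and their contents are hypothetical. It assumes the standard tree entry format (mode, space, name, NUL byte, 20-byte binary object ID) and Git's ordering rule, which sorts a sub-tree as if its name ended in "/".

```python
from __future__ import annotations

import hashlib


def object_id(obj_type: str, body: bytes) -> bytes:
    """Binary SHA-1 of a Git object: a '<type> <size>' header, a NUL byte, then the body."""
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode("ascii") + body).digest()


def tree_id(entries: dict[str, bytes | dict]) -> bytes:
    """Hash a nested dict as a Git tree: bytes values are blobs, dicts are sub-trees."""
    records = []
    for name, value in entries.items():
        if isinstance(value, dict):
            # Sub-trees use mode 40000 and sort as if their name ended in "/".
            mode, oid, sort_key = b"40000", tree_id(value), name + "/"
        else:
            mode, oid, sort_key = b"100644", object_id("blob", value), name
        records.append((sort_key.encode(), mode + b" " + name.encode() + b"\0" + oid))
    records.sort()
    return object_id("tree", b"".join(record for _, record in records))


metadata = {
    "git-svn": {
        "rev-map": b"r1234 <commit id>\n",
        "externals": b"vendor/lib https://svn.example.com/lib\n",
    },
    "bzr": {"revision-ids": b"<commit id> <bazaar revision id>\n"},
}
print(tree_id(metadata).hex())
```

Because each namespace is its own sub-tree, a tool that does not care about `git-svn` data never needs to read it, which matches the "quite ignorable if unneeded" goal.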

Preemptive: No, I don't like svn:keywords, but I can't just ignore them.

Preemptive: Yes, I considered notes in a different GIT_NOTES_REF, but I
feel those are too loosely coupled to the commits. I could be wrong, and
have not completely dismissed them yet.

-Alejandro

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2010-02-22 22:18 UTC | newest]

Thread overview: 19+ messages
2010-02-19 17:11 Storing (hidden) per-commit metadata Jelmer Vernooij
2010-02-20 17:41 ` Ben Gamari
2010-02-20 18:57   ` Avery Pennarun
2010-02-21  6:34     ` Jeff King
2010-02-21  8:49       ` Johannes Schindelin
2010-02-21  8:52         ` Jeff King
2010-02-21 12:17       ` Jelmer Vernooij
2010-02-22  5:17         ` Dmitry Potapov
2010-02-22  9:56           ` Jelmer Vernooij
2010-02-22 11:28             ` Dmitry Potapov
2010-02-22 11:59               ` Jelmer Vernooij
2010-02-22 13:08                 ` Dmitry Potapov
2010-02-22 13:44                   ` Jelmer Vernooij
2010-02-22 14:20                     ` Dmitry Potapov
2010-02-22 19:13                       ` Jelmer Vernooij
2010-02-22 14:57       ` Jelmer Vernooij
2010-02-22  5:11 ` Gabriel Filion
2010-02-22  9:49   ` Jelmer Vernooij
2010-02-22 22:13 ` "Alejandro R. Sedeño"
