git.vger.kernel.org archive mirror
* Using trees for metatagging
@ 2010-02-18  4:12 martin f krafft
  2010-02-18 18:57 ` Avery Pennarun
  2010-02-18 21:00 ` Johan Herland
  0 siblings, 2 replies; 9+ messages in thread
From: martin f krafft @ 2010-02-18  4:12 UTC (permalink / raw)
  To: git discussion list

Git's object store uses trees mainly to represent a hierarchical
filesystem. It occurs to me that you could layer additional
hierarchies on top — specifically, you could use it to track subsets
of files, i.e. "tagging".

For instance you want some sort of representation for "the set of
files that need review". You /could/ create a new tree and reference
all files in that set as children. Now if you wanted to find out
what to review, you'd list the children of this tree. After
reviewing a file, you write a new tree with the set less that file's
ref. Obviously, if you made changes to the file, it should be
reconnected to all other trees that referenced it.

I have a couple of questions about this:

1. Does Git provide plumbing for me to find out which trees
   reference a given blob? If not, I will have to iterate all trees
   and record which ones have a given message as a child.

2. Is there a way you can fathom by which unlinking a blob from the
   main hierarchy also causes it to be unlinked from this meta tree
   I am speaking of as well? Similarly, if a blob is rewritten, how
   could I make sure it replaces the old blob in all referencing
   trees?

3. Am I right in assuming that I'd have to track a completely
   separate ancestry for this tree, that is create e.g. a commit
   object, point refs/metatrees/mytree to it, and reference the tree
   from the commit?

4. Since this hierarchy is not really to be mapped into the
   filesystem, how would one resolve conflicts when merging
   ancestries? Of course it would be nice if I could check out this
   meta tree into the filesystem, make changes, and be assured that
   new blobs replace old blobs in other referencing trees, as per
   (2.), but that's a pipedream maybe.

5. Do you know of similar efforts? Are there must-reads out there,
   apart from the design of Git?

Thank you,

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
kill ugly radio
                                                        -- frank zappa
 
spamtraps: madduck.bogus@madduck.net

* Re: Using trees for metatagging
  2010-02-18  4:12 Using trees for metatagging martin f krafft
@ 2010-02-18 18:57 ` Avery Pennarun
  2010-02-18 22:53   ` martin f krafft
  2010-02-18 21:00 ` Johan Herland
  1 sibling, 1 reply; 9+ messages in thread
From: Avery Pennarun @ 2010-02-18 18:57 UTC (permalink / raw)
  To: git discussion list

On Wed, Feb 17, 2010 at 11:12 PM, martin f krafft <madduck@madduck.net> wrote:
> Git's object store uses trees mainly to represent a hierarchical
> filesystem. It occurs to me that you could layer additional
> hierarchies on top — specifically, you could use it to track subsets
> of files, i.e. "tagging".

I think what you *really* want here is to create a branch containing a
single file, which is the list of all the files you want to review.
Then when you're done reviewing a file, delete it from your list and
commit it.  Then just check out that file list branch in another clone
of your repository and manipulate it however you like.

Sorry to be boring.

> 1. Does Git provide plumbing for me to find out which trees
>   reference a given blob? If not, I will have to iterate all trees
>   and record which ones have a given message as a child.

No, you will have to iterate.  Also, if *other* people have trees
referencing that blob in *their* repositories, you won't know, so you
can never be sure that you've successfully found all objects in the
universe that refer to a particular blob.
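
The iteration described above could look roughly like the following
Python sketch. It is only an illustration, not an existing Git feature:
the helper names (parse_ls_tree, index_children, trees_referencing) are
made up for this example, it assumes a Git recent enough to support
"git cat-file --batch-all-objects", and it ignores the path quoting that
ls-tree applies to unusual file names. It only sees objects in the local
repository.

```python
import subprocess
from collections import defaultdict


def parse_ls_tree(output):
    """Parse non-recursive `git ls-tree` output into (mode, type, id, name) tuples."""
    entries = []
    for line in output.splitlines():
        if not line:
            continue
        # Each line is "<mode> <type> <object>\t<name>".
        meta, name = line.split("\t", 1)
        mode, obj_type, sha = meta.split(" ")
        entries.append((mode, obj_type, sha, name))
    return entries


def index_children(trees):
    """Map each child object id to the set of tree ids that list it directly.

    `trees` maps a tree id to the raw `git ls-tree` output for that tree.
    """
    parents = defaultdict(set)
    for tree_sha, listing in trees.items():
        for _mode, _type, child_sha, _name in parse_ls_tree(listing):
            parents[child_sha].add(tree_sha)
    return parents


def git(*args):
    """Run a git command in the current repository and return its stdout."""
    return subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    ).stdout


def trees_referencing(blob_sha):
    """Return the ids of all local trees that contain blob_sha as a direct child."""
    # Default --batch-check output is "<id> <type> <size>" per object.
    listing = git("cat-file", "--batch-check", "--batch-all-objects")
    tree_ids = [
        sha
        for sha, obj_type, _size in (line.split() for line in listing.splitlines())
        if obj_type == "tree"
    ]
    trees = {sha: git("ls-tree", sha) for sha in tree_ids}
    return index_children(trees)[blob_sha]
```

Calling trees_referencing() with a blob id inside a repository walks
every tree once, so the cost grows with the size of the object store;
caching the index between runs would be the obvious next step.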

> 2. Is there a way you can fathom by which unlinking a blob from the
>   main hierarchy also causes it to be unlinked from this meta tree
>   I am speaking of as well? Similarly, if a blob is rewritten, how
>   could I make sure it replaces the old blob in all referencing
>   trees?

blobs cannot replace other blobs.  And a tree that contains a
particular blob (indexed by sha1) will never *not* contain that blob,
because the identity of that tree is based on the identity of the
blobs it contains.  You can create a new tree that doesn't contain the
blob, but the commit that contained the old tree will never contain
the new tree.  You would have to create a new commit that contains the
new tree, but any commits based on your old commit will never be based
on your new commit.  And so on.

That's just the way content-addressed storage works.  Sounds like you
need to read more about it.
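
To make the point about tree identity concrete, here is a small Python
sketch that computes object ids the way a SHA-1 Git repository does
(hash of "<type> <size>\0" followed by the payload). The function names
are invented for this example, and the tree helper is simplified: it
only handles regular files (mode 100644) in a single flat tree.

```python
import hashlib


def object_id(obj_type, payload):
    """Hash "<type> <size>\\0<payload>" with SHA-1, as Git does for loose objects."""
    header = f"{obj_type} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def blob_id(content):
    """Id of a blob holding the given bytes."""
    return object_id("blob", content)


def tree_id(entries):
    """Id of a flat tree; entries is an iterable of (name, blob_hex_id) pairs.

    Simplification: every entry is a regular file, so sorting by name bytes
    matches Git's ordering (subdirectories would need extra care).
    """
    payload = b""
    for name, hex_id in sorted(entries, key=lambda e: e[0].encode()):
        # Each entry is "<mode> <name>\0" followed by the raw 20-byte id.
        payload += b"100644 " + name.encode() + b"\0" + bytes.fromhex(hex_id)
    return object_id("tree", payload)


old_blob = blob_id(b"hello\n")
new_blob = blob_id(b"hello, world\n")
old_tree = tree_id([("greeting.txt", old_blob)])
new_tree = tree_id([("greeting.txt", new_blob)])

# Changing the blob changes the tree id, so the old tree still names the old blob.
print(old_tree != new_tree)
```

The same reasoning carries upward: a commit's id covers its tree id, so
a new tree requires a new commit, exactly as described above.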

Have fun,

Avery

* Re: Using trees for metatagging
  2010-02-18  4:12 Using trees for metatagging martin f krafft
  2010-02-18 18:57 ` Avery Pennarun
@ 2010-02-18 21:00 ` Johan Herland
  2010-02-18 22:57   ` martin f krafft
  1 sibling, 1 reply; 9+ messages in thread
From: Johan Herland @ 2010-02-18 21:00 UTC (permalink / raw)
  To: martin f krafft; +Cc: git

On Thursday 18 February 2010, martin f krafft wrote:
> Git's object store uses trees mainly to represent a hierarchical
> filesystem. It occurs to me that you could layer additional
> hierarchies on top — specifically, you could use it to track subsets
> of files, i.e. "tagging".
>
> For instance you want some sort of representation for "the set of
> files that need review". You /could/ create a new tree and reference
> all files in that set as children. Now if you wanted to find out
> what to review, you'd list the children of this tree. After
> reviewing a file, you write a new tree with the set less that file's
> ref. Obviously, if you made changes to the file, it should be
> reconnected to all other trees that referenced it.
>
> I have a couple of questions about this:
>
> 1. Does Git provide plumbing for me to find out which trees
>    reference a given blob? If not, I will have to iterate all trees
>    and record which ones have a given message as a child.
>
> 2. Is there a way you can fathom by which unlinking a blob from the
>    main hierarchy also causes it to be unlinked from this meta tree
>    I am speaking of as well? Similarly, if a blob is rewritten, how
>    could I make sure it replaces the old blob in all referencing
>    trees?
>
> 3. Am I right in assuming that I'd have to track a completely
>    separate ancestry for this tree, that is create e.g. a commit
>    object, point refs/metatrees/mytree to it, and reference the tree
>    from the commit?
>
> 4. Since this hierarchy is not really to be mapped into the
>    filesystem, how would one resolve conflicts when merging
>    ancestries? Of course it would be nice if I could check out this
>    meta tree into the filesystem, make changes, and be assured that
>    new blobs replace old blobs in other referencing trees, as per
>    (2.), but that's a pipedream maybe.
>
> 5. Do you know of similar efforts? Are there must-reads out there,
>    apart from the design of Git?

Take a look at the (relatively) new notes feature. (See the jh/notes 
series in 'pu' and various recent discussions on this mailing list.) 
Git notes probably won't satisfy the exact requirements you list above, 
but it _does_ tackle some parallel issues (e.g. how to maintain a tree 
that is not checked out, storing metadata associated with Git objects, 
etc.). If you take a step back and reconsider your original problem, 
you might find that it's solvable by using commit notes.

For example, you could add a simple note to each blob that has been 
reviewed, on the refs/notes/reviewed notes ref. You could then write a 
simple script (using "git notes list") that lists all blobs (i.e. 
files) without a corresponding note in refs/notes/reviewed.
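
A script along those lines might look like the Python sketch below. It
is a rough illustration under a few assumptions: the function names are
made up, the notes ref refs/notes/reviewed already exists, and file
names are plain enough that ls-tree does not quote them. It relies on
"git notes list" printing one "<note-object> <annotated-object>" pair
per line.

```python
import subprocess


def annotated_objects(notes_list_output):
    """Return the set of object ids that carry a note.

    `git notes list` prints "<note-object> <annotated-object>" per line.
    """
    return {
        line.split()[1]
        for line in notes_list_output.splitlines()
        if line.strip()
    }


def unreviewed_paths(ls_tree_output, notes_list_output):
    """Return paths from `git ls-tree -r` output whose blob has no note."""
    reviewed = annotated_objects(notes_list_output)
    paths = []
    for line in ls_tree_output.splitlines():
        if not line:
            continue
        meta, path = line.split("\t", 1)
        _mode, obj_type, sha = meta.split(" ")
        # Skip submodule entries (type "commit"); only blobs are files.
        if obj_type == "blob" and sha not in reviewed:
            paths.append(path)
    return paths


def git(*args):
    """Run a git command in the current repository and return its stdout."""
    return subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    ).stdout


def list_unreviewed(rev="HEAD", notes_ref="reviewed"):
    """List files at `rev` that lack a note in refs/notes/<notes_ref>."""
    files = git("ls-tree", "-r", rev)
    notes = git("notes", "--ref=" + notes_ref, "list")
    return unreviewed_paths(files, notes)
```

Because the note is attached to the blob id, a file whose content
changes gets a new blob and shows up as unreviewed again without any
extra bookkeeping.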


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

* Re: Using trees for metatagging
  2010-02-18 18:57 ` Avery Pennarun
@ 2010-02-18 22:53   ` martin f krafft
  0 siblings, 0 replies; 9+ messages in thread
From: martin f krafft @ 2010-02-18 22:53 UTC (permalink / raw)
  To: Avery Pennarun, git discussion list

also sprach Avery Pennarun <apenwarr@gmail.com> [2010.02.19.0757 +1300]:
> > 1. Does Git provide plumbing for me to find out which trees
> >   reference a given blob? If not, I will have to iterate all
> >   trees and record which ones have a given message as a child.
> 
> No, you will have to iterate.  Also, if *other* people have trees
> referencing that blob in *their* repositories, you won't know, so
> you can never be sure that you've successfully found all objects
> in the universe that refer to a particular blob.

The idea is obviously that you could merge ancestries and thus
propagate all those changes.

> > 2. Is there a way you can fathom by which unlinking a blob from the
> >   main hierarchy also causes it to be unlinked from this meta tree
> >   I am speaking of as well? Similarly, if a blob is rewritten, how
> >   could I make sure it replaces the old blob in all referencing
> >   trees?
> 
> blobs cannot replace other blobs.

That was shorthand on my part. I meant that a new tree is written
with the ref to the old blob removed and the ref to the new blob
added.

> And a tree that contains a particular blob (indexed by sha1) will
> never *not* contain that blob, because the identity of that tree
> is based on the identity of the blobs it contains.  You can
> create a new tree that doesn't contain the blob, but the commit
> that contained the old tree will never contain the new tree.  You
> would have to create a new commit that contains the new tree, but
> any commits based on your old commit will never be based on your
> new commit.  And so on.

Right, this is the basis of merging. I understand all this.
I suppose I didn't express myself clearly enough.

So I am trying to figure out:

1. how to create new trees for all trees that reference a blob that
   is superseded by a new blob, in some sort of scalable way;

2. how to maintain a separate ancestry of commits pointing to those
   trees in a way that lets me harness Git's merging capabilities.

Is this clearer?
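
For point 2, one possible shape of the plumbing is sketched below in
Python. This is an assumption-laden illustration rather than an
established workflow: the helper names are invented, the ref name
refs/metatrees/mytree is the one floated earlier in this thread, entry
names are treated as flat labels (git mktree does not accept slashes in
a name), and committer identity is assumed to be configured.

```python
import subprocess


def mktree_input(blobs):
    """Format (name, blob_id) pairs in the ls-tree format that `git mktree` reads."""
    return "".join(f"100644 blob {sha}\t{name}\n" for name, sha in sorted(blobs))


def git(*args, stdin=None):
    """Run a git command, optionally feeding stdin, and return stripped stdout."""
    return subprocess.run(
        ["git", *args], input=stdin, check=True, capture_output=True, text=True
    ).stdout.strip()


def commit_metatree(blobs, message, ref="refs/metatrees/mytree"):
    """Write a tree for `blobs`, commit it on top of `ref`, and advance `ref`."""
    tree = git("mktree", stdin=mktree_input(blobs))

    # rev-parse exits non-zero and prints nothing if the ref does not exist yet.
    parent = subprocess.run(
        ["git", "rev-parse", "--verify", "--quiet", ref],
        capture_output=True, text=True,
    ).stdout.strip()

    args = ["commit-tree", tree, "-m", message]
    if parent:
        args += ["-p", parent]
    commit = git(*args)

    git("update-ref", ref, commit)
    return commit
```

Since the result is an ordinary commit chain under its own ref, it can
be fetched, pushed, and merged like any other history, although the
working tree never has to contain it.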

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
"alle vorurteile kommen aus den eingeweiden."
                                                 - friedrich nietzsche
 
spamtraps: madduck.bogus@madduck.net

* Re: Using trees for metatagging
  2010-02-18 21:00 ` Johan Herland
@ 2010-02-18 22:57   ` martin f krafft
  2010-02-18 23:06     ` Avery Pennarun
  2010-02-19  0:43     ` Johan Herland
  0 siblings, 2 replies; 9+ messages in thread
From: martin f krafft @ 2010-02-18 22:57 UTC (permalink / raw)
  To: Johan Herland, git

also sprach Johan Herland <johan@herland.net> [2010.02.19.1000 +1300]:
> Take a look at the (relatively) new notes feature. (See the
> jh/notes series in 'pu' and various recent discussions on this
> mailing list.) Git notes probably won't satisfy the exact
> requirements you list above, but it _does_ tackle some parallel
> issues (e.g. how to maintain a tree that is not checked out,
> storing metadata associated with Git objects, etc.). If you take
> a step back and reconsider your original problem, you might find
> that it's solvable by using commit notes.
> 
> For example, you could add a simple note to each blob that has
> been reviewed, on the refs/notes/reviewed notes ref. You could
> then write a simple script (using "git notes list") that lists all
> blobs (i.e. files) without a corresponding note in
> refs/notes/reviewed.

I am aware of notes, but so far I stayed away from them, simply
because it seems hackish to represent tag trees as text when dealing
with a tool that is essentially all about trees and refs.

Can I use notes to append information to blobs and trees, or just
commit objects?

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
"without music, life would be a mistake."
                                                 - friedrich nietzsche
 
spamtraps: madduck.bogus@madduck.net

* Re: Using trees for metatagging
  2010-02-18 22:57   ` martin f krafft
@ 2010-02-18 23:06     ` Avery Pennarun
  2010-02-18 23:25       ` martin f krafft
  2010-02-19  0:43     ` Johan Herland
  1 sibling, 1 reply; 9+ messages in thread
From: Avery Pennarun @ 2010-02-18 23:06 UTC (permalink / raw)
  To: martin f krafft; +Cc: Johan Herland, git

On Thu, Feb 18, 2010 at 5:57 PM, martin f krafft <madduck@madduck.net> wrote:
> I am aware of notes, but so far I stayed away from them, simply
> because it seems hackish to represent tag trees as text when dealing
> with a tool that is essentially all about trees and refs.

I think you're using the wrong definition of hacky vs. elegant.  A
"tree" is really just a file containing a list of objects.  A "ref" is
just a file that contains an object id.

So checking in a file that contains a list of object ids (or
filenames) is perfectly appropriate.

Have fun,

Avery

* Re: Using trees for metatagging
  2010-02-18 23:06     ` Avery Pennarun
@ 2010-02-18 23:25       ` martin f krafft
  2010-02-18 23:32         ` Avery Pennarun
  0 siblings, 1 reply; 9+ messages in thread
From: martin f krafft @ 2010-02-18 23:25 UTC (permalink / raw)
  To: Avery Pennarun, Johan Herland, git

also sprach Avery Pennarun <apenwarr@gmail.com> [2010.02.19.1206 +1300]:
> > I am aware of notes, but so far I stayed away from them, simply
> > because it seems hackish to represent tag trees as text when dealing
> > with a tool that is essentially all about trees and refs.
> 
> I think you're using the wrong definition of hacky vs. elegant.  A
> "tree" is really just a file containing a list of objects.  A "ref" is
> just a file that contains an object id.
>
> So checking in a file that contains a list of object ids (or
> filenames) is perfectly appropriate.

Indeed. But Git provides a lot of tools to manipulate all those,
which I would not be able to reuse in the text-file approach.

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
"good advice is something a man gives
 when he is too old to set a bad example."
                                                  -- la rochefoucauld
 
spamtraps: madduck.bogus@madduck.net

* Re: Using trees for metatagging
  2010-02-18 23:25       ` martin f krafft
@ 2010-02-18 23:32         ` Avery Pennarun
  0 siblings, 0 replies; 9+ messages in thread
From: Avery Pennarun @ 2010-02-18 23:32 UTC (permalink / raw)
  To: martin f krafft; +Cc: Johan Herland, git

On Thu, Feb 18, 2010 at 6:25 PM, martin f krafft <madduck@madduck.net> wrote:
> also sprach Avery Pennarun <apenwarr@gmail.com> [2010.02.19.1206 +1300]:
>> So checking in a file that contains a list of object ids (or
>> filenames) is perfectly appropriate.
>
> Indeed. But Git provides a lot of tools to manipulate all those,
> which I would not be able to reuse in the text-file approach.

But you're talking about using a nonstandard approach anyway (unless
you use git-notes).  So you'd end up rolling your own code with git's
plumbing anyway, which you can do just as easily with a text file full
of object ids.

Once you have an object id, there are plenty of existing git commands
to do whatever you want.

Have fun,

Avery

* Re: Using trees for metatagging
  2010-02-18 22:57   ` martin f krafft
  2010-02-18 23:06     ` Avery Pennarun
@ 2010-02-19  0:43     ` Johan Herland
  1 sibling, 0 replies; 9+ messages in thread
From: Johan Herland @ 2010-02-19  0:43 UTC (permalink / raw)
  To: martin f krafft; +Cc: git

On Thursday 18 February 2010, martin f krafft wrote:
> Can I use notes to append information to blobs and trees, or just
> commit objects?

Yes, notes can be attached to blobs and trees as well as commits,
since patch #2 in the jh/notes series in 'pu'.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
