* Using trees for metatagging
@ 2010-02-18 4:12 martin f krafft
2010-02-18 18:57 ` Avery Pennarun
2010-02-18 21:00 ` Johan Herland
0 siblings, 2 replies; 9+ messages in thread
From: martin f krafft @ 2010-02-18 4:12 UTC (permalink / raw)
To: git discussion list
Git's object store uses trees mainly to represent a hierarchical
filesystem. It occurs to me that you could layer additional
hierarchies on top — specifically, you could use it to track subsets
of files, i.e. "tagging".
For instance, say you want some sort of representation for "the set of
files that need review". You /could/ create a new tree and reference
all files in that set as children. Now if you wanted to find out
what to review, you'd list the children of this tree. After
reviewing a file, you write a new tree with the set less that file's
ref. Obviously, if you made changes to the file, the new blob should be
reconnected to all other trees that referenced the old one.
I have a couple of questions about this:
1. Does Git provide plumbing for me to find out which trees
reference a given blob? If not, I will have to iterate all trees
and record which ones have that blob as a child.
2. Is there a way you can fathom by which unlinking a blob from the
main hierarchy also causes it to be unlinked from this meta tree
I am speaking of as well? Similarly, if a blob is rewritten, how
could I make sure it replaces the old blob in all referencing
trees?
3. Am I right in assuming that I'd have to track a completely
separate ancestry for this tree, that is, create e.g. a commit
object, point refs/metatrees/mytree to it, and reference the tree
from the commit?
4. Since this hierarchy is not really to be mapped into the
filesystem, how would one resolve conflicts when merging
ancestries? Of course it would be nice if I could check out this
meta tree into the filesystem, make changes, and be assured that
new blobs replace old blobs in other referencing trees, as per
(2.), but that's a pipedream maybe.
5. Do you know of similar efforts? Are there must-reads out there,
apart from the design of Git?
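For question 3, here is a rough sketch of what a separately tracked meta tree could look like, written in Python on top of Git plumbing. The helper names (`mktree_input`, `git`, `write_metatree`), the commit message, and the restriction to flat entry names are assumptions made for illustration. `git mktree`, `git commit-tree`, `git rev-parse`, and `git update-ref` are existing commands, and refs/metatrees/mytree is the ref name from the question.

```python
import subprocess


def mktree_input(entries):
    """Format {name: blob_id} as `git ls-tree`-style lines for `git mktree`.

    Names are assumed to be flat (no "/"), since one tree object holds
    only its direct children.
    """
    lines = []
    for name, blob_id in sorted(entries.items()):
        if "/" in name:
            raise ValueError(f"entry name must be flat: {name!r}")
        lines.append(f"100644 blob {blob_id}\t{name}\n")
    return "".join(lines)


def git(*args, stdin=None):
    """Run a git command in the current repository and return its stdout."""
    result = subprocess.run(
        ["git", *args], input=stdin, check=True,
        capture_output=True, text=True,
    )
    return result.stdout.strip()


def write_metatree(entries, ref="refs/metatrees/mytree", message="update metatree"):
    """Write a tree, wrap it in a commit, and point `ref` at that commit."""
    tree_id = git("mktree", stdin=mktree_input(entries))
    # Chain onto the previous commit of the ref, if the ref already exists.
    parent = subprocess.run(
        ["git", "rev-parse", "--verify", "--quiet", ref],
        capture_output=True, text=True,
    ).stdout.strip()
    parent_args = ["-p", parent] if parent else []
    commit_id = git("commit-tree", tree_id, *parent_args, "-m", message)
    git("update-ref", ref, commit_id)
    return commit_id
```

Calling `write_metatree({"README": "<blob id>"})` inside a repository would add one commit to the meta tree's own history. Nothing here keeps that history in step with the main branch, which is the open part of questions 2 and 4.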
Thank you,
--
martin | http://madduck.net/ | http://two.sentenc.es/
kill ugly radio
-- frank zappa
spamtraps: madduck.bogus@madduck.net
* Re: Using trees for metatagging
From: Avery Pennarun @ 2010-02-18 18:57 UTC (permalink / raw)
To: git discussion list
On Wed, Feb 17, 2010 at 11:12 PM, martin f krafft <madduck@madduck.net> wrote:
> Git's object store uses trees mainly to represent a hierarchical
> filesystem. It occurs to me that you could layer additional
> hierarchies on top — specifically, you could use it to track subsets
> of files, i.e. "tagging".
I think what you *really* want here is to create a branch containing a
single file, which is the list of all the files you want to review.
Then when you're done reviewing a file, delete it from your list and
commit it. Then just check out that file list branch in another clone
of your repository and manipulate it however you like.
Sorry to be boring.
> 1. Does Git provide plumbing for me to find out which trees
> reference a given blob? If not, I will have to iterate all trees
> and record which ones have that blob as a child.
No, you will have to iterate. Also, if *other* people have trees
referencing that blob in *their* repositories, you won't know, so you
can never be sure that you've successfully found all objects in the
universe that refer to a particular blob.
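As a sketch of what "iterate" could mean in practice, the Python below scans a set of tree listings for a given blob. The function names and the sample ids in the usage note are hypothetical. The line format parsed here is the one printed by `git ls-tree` (mode, type, and object id, then a tab and the name). In current Git versions the local tree ids could probably be collected with `git cat-file --batch-all-objects --batch-check`, but that only covers objects in your own repository.

```python
def parse_ls_tree(output):
    """Parse `git ls-tree <tree>` output into (mode, type, object_id, name) tuples."""
    entries = []
    for line in output.splitlines():
        if not line:
            continue
        meta, name = line.split("\t", 1)
        mode, obj_type, object_id = meta.split(" ")
        entries.append((mode, obj_type, object_id, name))
    return entries


def trees_referencing(blob_id, trees):
    """Return the ids of all trees that list `blob_id` as a direct child.

    `trees` maps a tree id to the raw `git ls-tree` text for that tree.
    """
    hits = []
    for tree_id, listing in trees.items():
        for _mode, obj_type, object_id, _name in parse_ls_tree(listing):
            if obj_type == "blob" and object_id == blob_id:
                hits.append(tree_id)
                break
    return sorted(hits)
```

With two hypothetical listings, `trees_referencing("aaa", {"t1": "100644 blob aaa\tfoo.c\n", "t2": "100644 blob bbb\tbar.c\n"})` returns `["t1"]`. The cost grows with the number of trees, so a real tool would likely want to cache a reverse index.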
> 2. Is there a way you can fathom by which unlinking a blob from the
> main hierarchy also causes it to be unlinked from this meta tree
> I am speaking of as well? Similarly, if a blob is rewritten, how
> could I make sure it replaces the old blob in all referencing
> trees?
Blobs cannot replace other blobs. And a tree that contains a
particular blob (indexed by sha1) will never *not* contain that blob,
because the identity of that tree is based on the identity of the
blobs it contains. You can create a new tree that doesn't contain the
blob, but the commit that contained the old tree will never contain
the new tree. You would have to create a new commit that contains the
new tree, but any commits based on your old commit will never be based
on your new commit. And so on.
That's just the way content-addressed storage works. Sounds like you
need to read more about it.
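To make the content-addressing point concrete, here is a small Python sketch of how SHA-1 object ids are derived: the hash covers a "<type> <size>" header, a NUL byte, and the object body. The function names are illustrative, and the tree helper is simplified to flat trees of regular files (mode 100644) with ASCII names.

```python
import hashlib


def object_id(obj_type, body):
    """Hash an object the way Git does: SHA-1 over '<type> <size>', NUL, body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content):
    """Id of a blob holding `content` (bytes)."""
    return object_id("blob", content)


def tree_id(entries):
    """Id of a flat tree given {name: blob hex id}; regular files only."""
    body = b""
    for name in sorted(entries):
        # Each entry is "<mode> <name>", a NUL byte, then the raw 20-byte id.
        body += b"100644 " + name.encode() + b"\0" + bytes.fromhex(entries[name])
    return object_id("tree", body)


old = blob_id(b"draft\n")
new = blob_id(b"final\n")
# The same file name with different content yields a different tree id.
print(tree_id({"notes.txt": old}) == tree_id({"notes.txt": new}))  # False
```

Because the tree id is a hash over the ids of its children, "editing" a tree always means writing a new tree under a new id, and the same holds one level up for the commit that points at it.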
Have fun,
Avery
* Re: Using trees for metatagging
From: Johan Herland @ 2010-02-18 21:00 UTC (permalink / raw)
To: martin f krafft; +Cc: git
On Thursday 18 February 2010, martin f krafft wrote:
> Git's object store uses trees mainly to represent a hierarchical
> filesystem. It occurs to me that you could layer additional
> hierarchies on top — specifically, you could use it to track subsets
> of files, i.e. "tagging".
>
> For instance you want some sort of representation for "the set of
> files that need review". You /could/ create a new tree and reference
> all files in that set as children. Now if you wanted to find out
> what to review, you'd list the children of this tree. After
> reviewing a file, you write a new tree with the set less that file's
> ref. Obviously, if you made changes to the file, it should be
> reconnected to all other trees that referenced it.
>
> I have a couple of questions about this:
>
> 1. Does Git provide plumbing for me to find out which trees
> reference a given blob? If not, I will have to iterate all trees
> and record which ones have that blob as a child.
>
> 2. Is there a way you can fathom by which unlinking a blob from the
> main hierarchy also causes it to be unlinked from this meta tree
> I am speaking of as well? Similarly, if a blob is rewritten, how
> could I make sure it replaces the old blob in all referencing
> trees?
>
> 3. Am I right in assuming that I'd have to track a completely
> separate ancestry for this tree, that is, create e.g. a commit
> object, point refs/metatrees/mytree to it, and reference the tree
> from the commit?
>
> 4. Since this hierarchy is not really to be mapped into the
> filesystem, how would one resolve conflicts when merging
> ancestries? Of course it would be nice if I could check out this
> meta tree into the filesystem, make changes, and be assured that
> new blobs replace old blobs in other referencing trees, as per
> (2.), but that's a pipedream maybe.
>
> 5. Do you know of similar efforts? Are there must-reads out there,
> apart from the design of Git?
Take a look at the (relatively) new notes feature. (See the jh/notes
series in 'pu' and various recent discussions on this mailing list.)
Git notes probably won't satisfy the exact requirements you list above,
but it _does_ tackle some parallel issues (e.g. how to maintain a tree
that is not checked out, storing metadata associated with Git objects,
etc.). If you take a step back and reconsider your original problem,
you might find that it's solvable by using commit notes.
For example, you could add a simple note to each blob that has been
reviewed, on the refs/notes/reviewed notes ref. You could then write a
simple script (using "git notes list") that lists all blobs (i.e.
files) without a corresponding note in refs/notes/reviewed.
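A possible shape for such a script, sketched in Python. It assumes the two inputs were captured from `git notes --ref=reviewed list` (one "<note object> <annotated object>" pair per line) and `git ls-tree -r <commit>`. The function names and the sample ids in the usage note are hypothetical.

```python
def annotated_objects(notes_list_output):
    """Object ids that carry a note, from `git notes --ref=reviewed list` output.

    Each line has the form "<note object> <annotated object>".
    """
    return {
        line.split()[1]
        for line in notes_list_output.splitlines()
        if line.strip()
    }


def unreviewed(ls_tree_output, notes_list_output):
    """Paths from `git ls-tree -r <commit>` whose blob has no note."""
    reviewed = annotated_objects(notes_list_output)
    paths = []
    for line in ls_tree_output.splitlines():
        if not line:
            continue
        meta, path = line.split("\t", 1)
        _mode, obj_type, object_id = meta.split(" ")
        if obj_type == "blob" and object_id not in reviewed:
            paths.append(path)
    return paths
```

For example, with notes on hypothetical blobs `aaa` and `ccc`, a listing that also contains blob `bbb` at `src/main.c` yields `["src/main.c"]`. Since the note is attached to the blob id, a file that changes gets a new id and shows up as unreviewed again without any extra bookkeeping.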
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
* Re: Using trees for metatagging
From: martin f krafft @ 2010-02-18 22:53 UTC (permalink / raw)
To: Avery Pennarun, git discussion list
thus spoke Avery Pennarun <apenwarr@gmail.com> [2010.02.19.0757 +1300]:
> > 1. Does Git provide plumbing for me to find out which trees
> > reference a given blob? If not, I will have to iterate all
> > trees and record which ones have that blob as a child.
>
> No, you will have to iterate. Also, if *other* people have trees
> referencing that blob in *their* repositories, you won't know, so
> you can never be sure that you've successfully found all objects
> in the universe that refer to a particular blob.
The idea is obviously that you could merge ancestries and thus
propagate all those changes.
> > 2. Is there a way you can fathom by which unlinking a blob from the
> > main hierarchy also causes it to be unlinked from this meta tree
> > I am speaking of as well? Similarly, if a blob is rewritten, how
> > could I make sure it replaces the old blob in all referencing
> > trees?
>
> blobs cannot replace other blobs.
It was a shortcut on my part. I meant that a new tree is written
with the ref to the old blob removed and the ref to the new blob
added.
> And a tree that contains a particular blob (indexed by sha1) will
> never *not* contain that blob, because the identity of that tree
> is based on the identity of the blobs it contains. You can
> create a new tree that doesn't contain the blob, but the commit
> that contained the old tree will never contain the new tree. You
> would have to create a new commit that contains the new tree, but
> any commits based on your old commit will never be based on your
> new commit. And so on.
Right, this is the basis of merging. I understand all this.
I suppose I didn't express myself clearly enough.
So I am trying to figure out:
1. how to create new trees for all trees that reference a blob that
is superseded by a new blob in some sort of scalable way;
2. how to maintain a separate ancestry of commits pointing to those
trees in a way to be able to harness Git's merging capabilities.
Is this clearer?
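To pin down point 1, here is a toy in-memory model in Python. The function name and the dict layout (tag name mapped to entry names and blob ids) are hypothetical. A real version would have to write the resulting trees with Git plumbing and commit them.

```python
def supersede(tag_trees, old_blob, new_blob):
    """Return new tag trees in which `old_blob` is replaced by `new_blob`.

    `tag_trees` maps a tag name to {entry name: blob id}. Only trees that
    referenced `old_blob` appear in the result, and the input trees are
    left untouched, mirroring how Git writes new objects instead of
    editing old ones.
    """
    updated = {}
    for tag, entries in tag_trees.items():
        if old_blob in entries.values():
            updated[tag] = {
                name: (new_blob if blob == old_blob else blob)
                for name, blob in entries.items()
            }
    return updated
```

This still walks every tag tree for every superseded blob. Making it scale would probably need a reverse index from blob id to the trees that reference it, which Git does not maintain.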
--
martin | http://madduck.net/ | http://two.sentenc.es/
"all prejudices come from the intestines."
- friedrich nietzsche
spamtraps: madduck.bogus@madduck.net
* Re: Using trees for metatagging
From: martin f krafft @ 2010-02-18 22:57 UTC (permalink / raw)
To: Johan Herland, git
thus spoke Johan Herland <johan@herland.net> [2010.02.19.1000 +1300]:
> Take a look at the (relatively) new notes feature. (See the
> jh/notes series in 'pu' and various recent discussions on this
> mailing list.) Git notes probably won't satisfy the exact
> requirements you list above, but it _does_ tackle some parallel
> issues (e.g. how to maintain a tree that is not checked out,
> storing metadata associated with Git objects, etc.). If you take
> a step back and reconsider your original problem, you might find
> that it's solvable by using commit notes.
>
> For example, you could add a simple note to each blob that has
> been reviewed, on the refs/notes/reviewed notes ref. You could
> then write a simple script (using "git notes list") that lists all
> blobs (i.e. files) without a corresponding note in
> refs/notes/reviewed.
I am aware of notes, but so far I stayed away from them, simply
because it seems hackish to represent tag trees as text when dealing
with a tool that is essentially all about trees and refs.
Can I use notes to append information to blobs and trees, or just
commit objects?
--
martin | http://madduck.net/ | http://two.sentenc.es/
"without music, life would be a mistake."
- friedrich nietzsche
spamtraps: madduck.bogus@madduck.net
* Re: Using trees for metatagging
From: Avery Pennarun @ 2010-02-18 23:06 UTC (permalink / raw)
To: martin f krafft; +Cc: Johan Herland, git
On Thu, Feb 18, 2010 at 5:57 PM, martin f krafft <madduck@madduck.net> wrote:
> I am aware of notes, but so far I stayed away from them, simply
> because it seems hackish to represent tag trees as text when dealing
> with a tool that is essentially all about trees and refs.
I think you're using the wrong definition of hacky vs. elegant. A
"tree" is really just a file containing a list of objects. A "ref" is
just a file that contains an object id.
So checking in a file that contains a list of object ids (or
filenames) is perfectly appropriate.
Have fun,
Avery
* Re: Using trees for metatagging
From: martin f krafft @ 2010-02-18 23:25 UTC (permalink / raw)
To: Avery Pennarun, Johan Herland, git
thus spoke Avery Pennarun <apenwarr@gmail.com> [2010.02.19.1206 +1300]:
> > I am aware of notes, but so far I stayed away from them, simply
> > because it seems hackish to represent tag trees as text when dealing
> > with a tool that is essentially all about trees and refs.
>
> I think you're using the wrong definition of hacky vs. elegant. A
> "tree" is really just a file containing a list of objects. A "ref" is
> just a file that contains an object id.
>
> So checking in a file that contains a list of object ids (or
> filenames) is perfectly appropriate.
Indeed. But Git provides a lot of tools to manipulate all those,
which I would not be able to reuse in the text-file approach.
--
martin | http://madduck.net/ | http://two.sentenc.es/
"good advice is something a man gives
when he is too old to set a bad example."
-- la rochefoucauld
spamtraps: madduck.bogus@madduck.net
* Re: Using trees for metatagging
From: Avery Pennarun @ 2010-02-18 23:32 UTC (permalink / raw)
To: martin f krafft; +Cc: Johan Herland, git
On Thu, Feb 18, 2010 at 6:25 PM, martin f krafft <madduck@madduck.net> wrote:
> thus spoke Avery Pennarun <apenwarr@gmail.com> [2010.02.19.1206 +1300]:
>> So checking in a file that contains a list of object ids (or
>> filenames) is perfectly appropriate.
>
> Indeed. But Git provides a lot of tools to manipulate all those,
> which I would not be able to reuse in the text-file approach.
But you're talking about using a nonstandard approach anyway (unless
you use git-notes). So you'd end up rolling your own code with git's
plumbing anyway, which you can do just as easily with a text file full
of object ids.
Once you have an object id, there are plenty of existing git commands
to do whatever you want.
Have fun,
Avery
* Re: Using trees for metatagging
From: Johan Herland @ 2010-02-19 0:43 UTC (permalink / raw)
To: martin f krafft; +Cc: git
On Thursday 18 February 2010, martin f krafft wrote:
> Can I use notes to append information to blobs and trees, or just
> commit objects?
Yes, notes can annotate blobs and trees as well as commits, since
patch #2 in the jh/notes series in 'pu'.
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net