* [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
@ 2009-08-06 9:55 Nick Edelen
2009-08-06 14:48 ` Johannes Schindelin
0 siblings, 1 reply; 34+ messages in thread
From: Nick Edelen @ 2009-08-06 9:55 UTC (permalink / raw)
To: Junio C Hamano, Johannes Schindelin, Jeff King, Sam Vilain,
Shawn O. Pearce
SUGGESTED FOR 'PU':
Traversing objects is currently very costly, as every commit and tree must be
loaded and parsed. Much time and energy could be saved by caching metadata and
topological info in an efficient, easily accessible manner. Furthermore, this
could improve git's interfacing potential, by providing a condensed summary of
a repository's commit tree.
This is a series to implement such a revision caching mechanism, aptly named
rev-cache. The series will provide:
- a core API to manipulate and traverse caches
- an integration into the internal revision walker
- a porcelain front-end providing access to users and (shell) applications
- a series of tests to verify/demonstrate correctness
- documentation of the API, porcelain and core concepts
In cold starts rev-cache has sped up packing and walking by a factor of 4, and
over twice that on warm starts. Some times on slax for the linux repository:
rev-list --all --objects >/dev/null
default
cold 1:13
warm 0:43
rev-cache'd
cold 0:19
warm 0:02
pack-objects --revs --all --stdout >/dev/null
default
cold 2:44
warm 1:21
rev-cache'd
cold 0:44
warm 0:10
The mechanism is minimally intrusive: most of the changes take place in
seperate files, and only a handful of git's existing functions are modified.
Hope you find this useful.
- Nick
Documentation/rev-cache.txt | 51 +
Documentation/technical/rev-cache.txt | 336 ++++++
Makefile | 2 +
blob.c | 1 +
blob.h | 1 +
builtin-rev-cache.c | 284 +++++
builtin.h | 1 +
commit.c | 3 +
commit.h | 2 +
git.c | 1 +
list-objects.c | 49 +-
rev-cache.c | 1832 +++++++++++++++++++++++++++++++++
revision.c | 89 ++-
revision.h | 46 +-
t/t6015-rev-cache-list.sh | 228 ++++
t/t6015-sha1-dump-diff.py | 36 +
16 files changed, 2937 insertions(+), 25 deletions(-)
create mode 100755 Documentation/rev-cache.txt
create mode 100755 Documentation/technical/rev-cache.txt
create mode 100755 builtin-rev-cache.c
create mode 100755 rev-cache.c
create mode 100755 t/t6015-rev-cache-list.sh
create mode 100755 t/t6015-sha1-dump-diff.py
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-06 9:55 [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking Nick Edelen
@ 2009-08-06 14:48 ` Johannes Schindelin
2009-08-06 14:58 ` Michael J Gruber
0 siblings, 1 reply; 34+ messages in thread
From: Johannes Schindelin @ 2009-08-06 14:48 UTC (permalink / raw)
To: Nick Edelen
Cc: Junio C Hamano, Jeff King, Sam Vilain, Shawn O. Pearce,
Andreas Ericsson, Christian Couder, git
Hi,
On Thu, 6 Aug 2009, Nick Edelen wrote:
> SUGGESTED FOR 'PU':
>
> Traversing objects is currently very costly, as every commit and tree must be
> loaded and parsed. Much time and energy could be saved by caching metadata and
> topological info in an efficient, easily accessible manner. Furthermore, this
> could improve git's interfacing potential, by providing a condensed summary of
> a repository's commit tree.
>
> This is a series to implement such a revision caching mechanism, aptly named
> rev-cache. The series will provide:
> - a core API to manipulate and traverse caches
> - an integration into the internal revision walker
> - a porcelain front-end providing access to users and (shell) applications
> - a series of tests to verify/demonstrate correctness
> - documentation of the API, porcelain and core concepts
>
> In cold starts rev-cache has sped up packing and walking by a factor of 4, and
> over twice that on warm starts. Some times on slax for the linux repository:
>
> rev-list --all --objects >/dev/null
> default
> cold 1:13
> warm 0:43
> rev-cache'd
> cold 0:19
> warm 0:02
>
> pack-objects --revs --all --stdout >/dev/null
> default
> cold 2:44
> warm 1:21
> rev-cache'd
> cold 0:44
> warm 0:10
Nice!
> The mechanism is minimally intrusive: most of the changes take place in
> seperate files, and only a handful of git's existing functions are
> modified.
Sorry, I forgot the details, could you quickly remind me why these caches
are not in the pack index files?
> Documentation/rev-cache.txt | 51 +
> Documentation/technical/rev-cache.txt | 336 ++++++
> Makefile | 2 +
> blob.c | 1 +
> blob.h | 1 +
> builtin-rev-cache.c | 284 +++++
> builtin.h | 1 +
> commit.c | 3 +
> commit.h | 2 +
> git.c | 1 +
> list-objects.c | 49 +-
> rev-cache.c | 1832 +++++++++++++++++++++++++++++++++
> revision.c | 89 ++-
> revision.h | 46 +-
> t/t6015-rev-cache-list.sh | 228 ++++
> t/t6015-sha1-dump-diff.py | 36 +
Hmpf.
We got rid of the last Python script in Git a long time ago, but now two
different patch series try to sneak that dependency (at least for testing)
back in.
That's all the worse because we cannot use Python in msysGit, and Windows
should be a platform benefitting dramatically from your work.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-06 14:48 ` Johannes Schindelin
@ 2009-08-06 14:58 ` Michael J Gruber
2009-08-06 17:39 ` Nick Edelen
0 siblings, 1 reply; 34+ messages in thread
From: Michael J Gruber @ 2009-08-06 14:58 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Nick Edelen, Junio C Hamano, Jeff King, Sam Vilain,
Shawn O. Pearce, Andreas Ericsson, Christian Couder, git
Johannes Schindelin venit, vidit, dixit 06.08.2009 16:48:
> Hi,
>
> On Thu, 6 Aug 2009, Nick Edelen wrote:
>
>> SUGGESTED FOR 'PU':
>>
>> Traversing objects is currently very costly, as every commit and tree must be
>> loaded and parsed. Much time and energy could be saved by caching metadata and
>> topological info in an efficient, easily accessible manner. Furthermore, this
>> could improve git's interfacing potential, by providing a condensed summary of
>> a repository's commit tree.
>>
>> This is a series to implement such a revision caching mechanism, aptly named
>> rev-cache. The series will provide:
>> - a core API to manipulate and traverse caches
>> - an integration into the internal revision walker
>> - a porcelain front-end providing access to users and (shell) applications
>> - a series of tests to verify/demonstrate correctness
>> - documentation of the API, porcelain and core concepts
>>
>> In cold starts rev-cache has sped up packing and walking by a factor of 4, and
>> over twice that on warm starts. Some times on slax for the linux repository:
>>
>> rev-list --all --objects >/dev/null
>> default
>> cold 1:13
>> warm 0:43
>> rev-cache'd
>> cold 0:19
>> warm 0:02
>>
>> pack-objects --revs --all --stdout >/dev/null
>> default
>> cold 2:44
>> warm 1:21
>> rev-cache'd
>> cold 0:44
>> warm 0:10
>
> Nice!
>
>> The mechanism is minimally intrusive: most of the changes take place in
>> seperate files, and only a handful of git's existing functions are
>> modified.
>
> Sorry, I forgot the details, could you quickly remind me why these caches
> are not in the pack index files?
>
>> Documentation/rev-cache.txt | 51 +
>> Documentation/technical/rev-cache.txt | 336 ++++++
>> Makefile | 2 +
>> blob.c | 1 +
>> blob.h | 1 +
>> builtin-rev-cache.c | 284 +++++
>> builtin.h | 1 +
>> commit.c | 3 +
>> commit.h | 2 +
>> git.c | 1 +
>> list-objects.c | 49 +-
>> rev-cache.c | 1832 +++++++++++++++++++++++++++++++++
>> revision.c | 89 ++-
>> revision.h | 46 +-
>> t/t6015-rev-cache-list.sh | 228 ++++
>> t/t6015-sha1-dump-diff.py | 36 +
>
> Hmpf.
>
> We got rid of the last Python script in Git a long time ago, but now two
> different patch series try to sneak that dependency (at least for testing)
> back in.
>
> That's all the worse because we cannot use Python in msysGit, and Windows
> should be a platform benefitting dramatically from your work.
In fact, the test the script performs could be easily rephrased with
"sort", "uniq" and "comm".
OTOH: If the walker is supposed to return the exact same orderd list of
commits you can just use test_cmp.
Michael
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-06 14:58 ` Michael J Gruber
@ 2009-08-06 17:39 ` Nick Edelen
2009-08-06 19:06 ` Johannes Schindelin
0 siblings, 1 reply; 34+ messages in thread
From: Nick Edelen @ 2009-08-06 17:39 UTC (permalink / raw)
To: Michael J Gruber
Cc: Johannes Schindelin, Junio C Hamano, Jeff King, Sam Vilain,
Shawn O. Pearce, Andreas Ericsson, Christian Couder, git
Hi there,
> Sorry, I forgot the details, could you quickly remind me why these caches
> are not in the pack index files?
Er, I'm not sure what you mean. Are you asking why these revision
caches are required if we have a pack index, or why they aren't in the
pack index, or something different? I'm thinking probably the second:
the short answer is that cache slices are totally independant of pack
files.
It might be possible to somehow merge revision cache slices with pack
indexes, but I don't think it'd be a very suitable modification. The
rev-cache slices are meant to act almost like topo-relation pack
files: new slices are simply new files, seperate slice files can be
fused ("repacked") into a larger one, the index is a (recreatable)
single file associating file (positions) with objects. The format was
geared to reducing potential cache/data loss and preventing overly
large cache slices.
>> Hmpf.
>>
>> We got rid of the last Python script in Git a long time ago, but now two
>> different patch series try to sneak that dependency (at least for testing)
>> back in.
>>
>> That's all the worse because we cannot use Python in msysGit, and Windows
>> should be a platform benefitting dramatically from your work.
>
> In fact, the test the script performs could be easily rephrased with
> "sort", "uniq" and "comm".
> OTOH: If the walker is supposed to return the exact same orderd list of
> commits you can just use test_cmp.
The language that script is written in isn't important. I originally
wrote it in python because I wanted something quick and wasn't much of
a sh guru (sorry :-/ ). As Micheal said I've no doubt it can easily
be converted to shell script -- in fact, I'll try to get a shell
version working today.
- Nick
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-06 17:39 ` Nick Edelen
@ 2009-08-06 19:06 ` Johannes Schindelin
2009-08-06 20:01 ` Nick Edelen
` (2 more replies)
0 siblings, 3 replies; 34+ messages in thread
From: Johannes Schindelin @ 2009-08-06 19:06 UTC (permalink / raw)
To: Nick Edelen
Cc: Michael J Gruber, Junio C Hamano, Jeff King, Sam Vilain,
Shawn O. Pearce, Andreas Ericsson, Christian Couder, git
Hi,
On Thu, 6 Aug 2009, Nick Edelen wrote:
> > Sorry, I forgot the details, could you quickly remind me why these
> > caches are not in the pack index files?
>
> Er, I'm not sure what you mean. Are you asking why these revision
> caches are required if we have a pack index, or why they aren't in the
> pack index, or something different? I'm thinking probably the second:
Yep.
> the short answer is that cache slices are totally independant of pack
> files.
My idea with that was that you already have a SHA-1 map in the pack index,
and if all you want to be able to accelerate the revision walker, you'd
probably need something that adds yet another mapping, from commit to
parents and tree, and from tree to sub-tree and blob (so you can avoid
unpacking commit and tree objects).
I just thought that it could be more efficient to do it at the time the
pack index is written _anyway_, as nothing will change in the pack after
that anyway.
But I guess I can answer my question easily myself: the boundary commits
will not be handled that way.
Still, there is some redundancy between the pack index and your cache, as
you have to write out the whole list of SHA-1s all over again. I guess it
is time to look at the code instead of asking stupid questions.
> It might be possible to somehow merge revision cache slices with pack
> indexes, but I don't think it'd be a very suitable modification. The
> rev-cache slices are meant to act almost like topo-relation pack files:
> new slices are simply new files, seperate slice files can be fused
> ("repacked") into a larger one, the index is a (recreatable) single file
> associating file (positions) with objects. The format was geared to
> reducing potential cache/data loss and preventing overly large cache
> slices.
>
> >> Hmpf.
> >>
> >> We got rid of the last Python script in Git a long time ago, but now
> >> two different patch series try to sneak that dependency (at least for
> >> testing) back in.
> >>
> >> That's all the worse because we cannot use Python in msysGit, and
> >> Windows should be a platform benefitting dramatically from your work.
> >
> > In fact, the test the script performs could be easily rephrased with
> > "sort", "uniq" and "comm". OTOH: If the walker is supposed to return
> > the exact same orderd list of commits you can just use test_cmp.
>
> The language that script is written in isn't important. I originally
> wrote it in python because I wanted something quick and wasn't much of
> a sh guru (sorry :-/ ). As Micheal said I've no doubt it can easily
> be converted to shell script
That is not what I wanted to hear.
> -- in fact, I'll try to get a shell version working today.
That is.
Thanks,
Dscho
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-06 19:06 ` Johannes Schindelin
@ 2009-08-06 20:01 ` Nick Edelen
2009-08-06 20:30 ` Nick Edelen
2009-08-07 2:47 ` Sam Vilain
2009-08-08 18:57 ` Junio C Hamano
2 siblings, 1 reply; 34+ messages in thread
From: Nick Edelen @ 2009-08-06 20:01 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Michael J Gruber, Junio C Hamano, Jeff King, Sam Vilain,
Shawn O. Pearce, Andreas Ericsson, Christian Couder, git
Hi,
>My idea with that was that you already have a SHA-1 map in the pack index,
>and if all you want to be able to accelerate the revision walker, you'd
>probably need something that adds yet another mapping, from commit to
>parents and tree, and from tree to sub-tree and blob (so you can avoid
>unpacking commit and tree objects).
As I mention in one of the patch descriptions, along with each commit
a list of objects introduced per commit is cached, so no extra I/O is
necessary for tree recursion, etc. during traversal.
>I just thought that it could be more efficient to do it at the time the
>pack index is written _anyway_, as nothing will change in the pack after
>that anyway.
Nothing might change in the pack, but the slices were made to allow
for continual addition and refinement of the cache. In a typical
usage slices will be added and fused on a regular basis, which would
require tinkering in pack indexes if they were combined.
>But I guess I can answer my question easily myself: the boundary commits
>will not be handled that way.
>
>Still, there is some redundancy between the pack index and your cache, as
>you have to write out the whole list of SHA-1s all over again. I guess it
>is time to look at the code instead of asking stupid questions.
The whole revision cache is redundant, technically speaking: nothing
in it can't be found by rummaging through packs or objects. The point
of it was to distill out important information for fast, easy access
of the commit tree.
On another note, I've eliminated the python dependancy. Shall I
resend the patchset now or should I wait until it has been further
reviewed? (don't want to flood the list with resubmits)
- Nick
On Thu, Aug 6, 2009 at 9:06 PM, Johannes
Schindelin<Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> On Thu, 6 Aug 2009, Nick Edelen wrote:
>
>> > Sorry, I forgot the details, could you quickly remind me why these
>> > caches are not in the pack index files?
>>
>> Er, I'm not sure what you mean. Are you asking why these revision
>> caches are required if we have a pack index, or why they aren't in the
>> pack index, or something different? I'm thinking probably the second:
>
> Yep.
>
>> the short answer is that cache slices are totally independant of pack
>> files.
>
> My idea with that was that you already have a SHA-1 map in the pack index,
> and if all you want to be able to accelerate the revision walker, you'd
> probably need something that adds yet another mapping, from commit to
> parents and tree, and from tree to sub-tree and blob (so you can avoid
> unpacking commit and tree objects).
>
> I just thought that it could be more efficient to do it at the time the
> pack index is written _anyway_, as nothing will change in the pack after
> that anyway.
>
> But I guess I can answer my question easily myself: the boundary commits
> will not be handled that way.
>
> Still, there is some redundancy between the pack index and your cache, as
> you have to write out the whole list of SHA-1s all over again. I guess it
> is time to look at the code instead of asking stupid questions.
>
>> It might be possible to somehow merge revision cache slices with pack
>> indexes, but I don't think it'd be a very suitable modification. The
>> rev-cache slices are meant to act almost like topo-relation pack files:
>> new slices are simply new files, seperate slice files can be fused
>> ("repacked") into a larger one, the index is a (recreatable) single file
>> associating file (positions) with objects. The format was geared to
>> reducing potential cache/data loss and preventing overly large cache
>> slices.
>>
>> >> Hmpf.
>> >>
>> >> We got rid of the last Python script in Git a long time ago, but now
>> >> two different patch series try to sneak that dependency (at least for
>> >> testing) back in.
>> >>
>> >> That's all the worse because we cannot use Python in msysGit, and
>> >> Windows should be a platform benefitting dramatically from your work.
>> >
>> > In fact, the test the script performs could be easily rephrased with
>> > "sort", "uniq" and "comm". OTOH: If the walker is supposed to return
>> > the exact same orderd list of commits you can just use test_cmp.
>>
>> The language that script is written in isn't important. I originally
>> wrote it in python because I wanted something quick and wasn't much of
>> a sh guru (sorry :-/ ). As Micheal said I've no doubt it can easily
>> be converted to shell script
>
> That is not what I wanted to hear.
>
>> -- in fact, I'll try to get a shell version working today.
>
> That is.
>
> Thanks,
> Dscho
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-06 20:01 ` Nick Edelen
@ 2009-08-06 20:30 ` Nick Edelen
2009-08-06 20:32 ` Shawn O. Pearce
2009-08-07 4:42 ` Nicolas Pitre
0 siblings, 2 replies; 34+ messages in thread
From: Nick Edelen @ 2009-08-06 20:30 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Michael J Gruber, Junio C Hamano, Jeff King, Sam Vilain,
Shawn O. Pearce, Andreas Ericsson, Christian Couder, git
Hrmm, I just realized that it dosn't actually cache paths/names...
This obviously has no bearing on its use in packing, but I should
either add that in or restrict usage in non-packing-related walks.
Weird how things like that escape you.
I think I may go ahead and add support for this tomorrow. It should
have no effect on performance and very little impact on cache slice
size.
On Thu, Aug 6, 2009 at 10:01 PM, Nick Edelen<sirnot@gmail.com> wrote:
> Hi,
>
>>My idea with that was that you already have a SHA-1 map in the pack index,
>>and if all you want to be able to accelerate the revision walker, you'd
>>probably need something that adds yet another mapping, from commit to
>>parents and tree, and from tree to sub-tree and blob (so you can avoid
>>unpacking commit and tree objects).
>
> As I mention in one of the patch descriptions, along with each commit
> a list of objects introduced per commit is cached, so no extra I/O is
> necessary for tree recursion, etc. during traversal.
>
>>I just thought that it could be more efficient to do it at the time the
>>pack index is written _anyway_, as nothing will change in the pack after
>>that anyway.
>
> Nothing might change in the pack, but the slices were made to allow
> for continual addition and refinement of the cache. In a typical
> usage slices will be added and fused on a regular basis, which would
> require tinkering in pack indexes if they were combined.
>
>>But I guess I can answer my question easily myself: the boundary commits
>>will not be handled that way.
>>
>>Still, there is some redundancy between the pack index and your cache, as
>>you have to write out the whole list of SHA-1s all over again. I guess it
>>is time to look at the code instead of asking stupid questions.
>
> The whole revision cache is redundant, technically speaking: nothing
> in it can't be found by rummaging through packs or objects. The point
> of it was to distill out important information for fast, easy access
> of the commit tree.
>
> On another note, I've eliminated the python dependancy. Shall I
> resend the patchset now or should I wait until it has been further
> reviewed? (don't want to flood the list with resubmits)
>
> - Nick
>
> On Thu, Aug 6, 2009 at 9:06 PM, Johannes
> Schindelin<Johannes.Schindelin@gmx.de> wrote:
>> Hi,
>>
>> On Thu, 6 Aug 2009, Nick Edelen wrote:
>>
>>> > Sorry, I forgot the details, could you quickly remind me why these
>>> > caches are not in the pack index files?
>>>
>>> Er, I'm not sure what you mean. Are you asking why these revision
>>> caches are required if we have a pack index, or why they aren't in the
>>> pack index, or something different? I'm thinking probably the second:
>>
>> Yep.
>>
>>> the short answer is that cache slices are totally independant of pack
>>> files.
>>
>> My idea with that was that you already have a SHA-1 map in the pack index,
>> and if all you want to be able to accelerate the revision walker, you'd
>> probably need something that adds yet another mapping, from commit to
>> parents and tree, and from tree to sub-tree and blob (so you can avoid
>> unpacking commit and tree objects).
>>
>> I just thought that it could be more efficient to do it at the time the
>> pack index is written _anyway_, as nothing will change in the pack after
>> that anyway.
>>
>> But I guess I can answer my question easily myself: the boundary commits
>> will not be handled that way.
>>
>> Still, there is some redundancy between the pack index and your cache, as
>> you have to write out the whole list of SHA-1s all over again. I guess it
>> is time to look at the code instead of asking stupid questions.
>>
>>> It might be possible to somehow merge revision cache slices with pack
>>> indexes, but I don't think it'd be a very suitable modification. The
>>> rev-cache slices are meant to act almost like topo-relation pack files:
>>> new slices are simply new files, seperate slice files can be fused
>>> ("repacked") into a larger one, the index is a (recreatable) single file
>>> associating file (positions) with objects. The format was geared to
>>> reducing potential cache/data loss and preventing overly large cache
>>> slices.
>>>
>>> >> Hmpf.
>>> >>
>>> >> We got rid of the last Python script in Git a long time ago, but now
>>> >> two different patch series try to sneak that dependency (at least for
>>> >> testing) back in.
>>> >>
>>> >> That's all the worse because we cannot use Python in msysGit, and
>>> >> Windows should be a platform benefitting dramatically from your work.
>>> >
>>> > In fact, the test the script performs could be easily rephrased with
>>> > "sort", "uniq" and "comm". OTOH: If the walker is supposed to return
>>> > the exact same orderd list of commits you can just use test_cmp.
>>>
>>> The language that script is written in isn't important. I originally
>>> wrote it in python because I wanted something quick and wasn't much of
>>> a sh guru (sorry :-/ ). As Micheal said I've no doubt it can easily
>>> be converted to shell script
>>
>> That is not what I wanted to hear.
>>
>>> -- in fact, I'll try to get a shell version working today.
>>
>> That is.
>>
>> Thanks,
>> Dscho
>>
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-06 20:30 ` Nick Edelen
@ 2009-08-06 20:32 ` Shawn O. Pearce
2009-08-06 23:35 ` A Large Angry SCM
2009-08-07 4:42 ` Nicolas Pitre
1 sibling, 1 reply; 34+ messages in thread
From: Shawn O. Pearce @ 2009-08-06 20:32 UTC (permalink / raw)
To: Nick Edelen
Cc: Johannes Schindelin, Michael J Gruber, Junio C Hamano, Jeff King,
Sam Vilain, Andreas Ericsson, Christian Couder, git
Nick Edelen <sirnot@gmail.com> wrote:
> Hrmm, I just realized that it dosn't actually cache paths/names...
> This obviously has no bearing on its use in packing, but I should
> either add that in or restrict usage in non-packing-related walks.
> Weird how things like that escape you.
>
> I think I may go ahead and add support for this tomorrow. It should
> have no effect on performance and very little impact on cache slice
> size.
You may not need the path name, but instead the hash value that
pack-objects computes from the path name. All that matters is
the hash, so pack-objects can schedule the objects into the right
buckets when its doing delta computation for objects which are not
yet delta compressed, or whose delta cannot be suitably reused.
--
Shawn.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-06 20:32 ` Shawn O. Pearce
@ 2009-08-06 23:35 ` A Large Angry SCM
2009-08-06 23:37 ` Shawn O. Pearce
0 siblings, 1 reply; 34+ messages in thread
From: A Large Angry SCM @ 2009-08-06 23:35 UTC (permalink / raw)
To: Shawn O. Pearce
Cc: Nick Edelen, Johannes Schindelin, Michael J Gruber,
Junio C Hamano, Jeff King, Sam Vilain, Andreas Ericsson,
Christian Couder, git
Shawn O. Pearce wrote:
> Nick Edelen <sirnot@gmail.com> wrote:
>> Hrmm, I just realized that it dosn't actually cache paths/names...
>> This obviously has no bearing on its use in packing, but I should
>> either add that in or restrict usage in non-packing-related walks.
>> Weird how things like that escape you.
>>
>> I think I may go ahead and add support for this tomorrow. It should
>> have no effect on performance and very little impact on cache slice
>> size.
>
> You may not need the path name, but instead the hash value that
> pack-objects computes from the path name. All that matters is
> the hash, so pack-objects can schedule the objects into the right
> buckets when its doing delta computation for objects which are not
> yet delta compressed, or whose delta cannot be suitably reused.
>
Please do NOT expose the hash values. The hash used by pack-objects is
an implementation detail of the heuristics used by the _current_ object
packing code. It would be a real shame to have to maintain backward
compatibility with it at some future date after the packing machinery
has changed.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-06 23:35 ` A Large Angry SCM
@ 2009-08-06 23:37 ` Shawn O. Pearce
2009-08-06 23:43 ` A Large Angry SCM
2009-08-07 6:05 ` Johannes Schindelin
0 siblings, 2 replies; 34+ messages in thread
From: Shawn O. Pearce @ 2009-08-06 23:37 UTC (permalink / raw)
To: A Large Angry SCM
Cc: Nick Edelen, Johannes Schindelin, Michael J Gruber,
Junio C Hamano, Jeff King, Sam Vilain, Andreas Ericsson,
Christian Couder, git
A Large Angry SCM <gitzilla@gmail.com> wrote:
> Shawn O. Pearce wrote:
>> Nick Edelen <sirnot@gmail.com> wrote:
>>> Hrmm, I just realized that it dosn't actually cache paths/names...
>>
>> You may not need the path name, but instead the hash value that
>> pack-objects computes from the path name.
>
> Please do NOT expose the hash values. The hash used by pack-objects is
> an implementation detail of the heuristics used by the _current_ object
> packing code. It would be a real shame to have to maintain backward
> compatibility with it at some future date after the packing machinery
> has changed.
This is a local cache. If there was a version number in the header,
and the hash function changes, we could just bump the version number
and invalidate all of the caches.
No sense in storing (and doing IO of) huge duplicate string values
for something where we really only need 32 bits, and where a
recompute from scratch only costs a minute.
--
Shawn.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-06 23:37 ` Shawn O. Pearce
@ 2009-08-06 23:43 ` A Large Angry SCM
2009-08-07 0:15 ` Nick Edelen
2009-08-07 6:05 ` Johannes Schindelin
1 sibling, 1 reply; 34+ messages in thread
From: A Large Angry SCM @ 2009-08-06 23:43 UTC (permalink / raw)
To: Shawn O. Pearce
Cc: Nick Edelen, Johannes Schindelin, Michael J Gruber,
Junio C Hamano, Jeff King, Sam Vilain, Andreas Ericsson,
Christian Couder, git
Shawn O. Pearce wrote:
> A Large Angry SCM <gitzilla@gmail.com> wrote:
>> Shawn O. Pearce wrote:
>>> Nick Edelen <sirnot@gmail.com> wrote:
>>>> Hrmm, I just realized that it dosn't actually cache paths/names...
>>> You may not need the path name, but instead the hash value that
>>> pack-objects computes from the path name.
>> Please do NOT expose the hash values. The hash used by pack-objects is
>> an implementation detail of the heuristics used by the _current_ object
>> packing code. It would be a real shame to have to maintain backward
>> compatibility with it at some future date after the packing machinery
>> has changed.
>
> This is a local cache. If there was a version number in the header,
> and the hash function changes, we could just bump the version number
> and invalidate all of the caches.
>
> No sense in storing (and doing IO of) huge duplicate string values
> for something where we really only need 32 bits, and where a
> recompute from scratch only costs a minute.
>
That will work for me if the cache gets a version number and iff the
pack-objects hash code gets big warning comments about the cache code
dependency.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-06 23:43 ` A Large Angry SCM
@ 2009-08-07 0:15 ` Nick Edelen
0 siblings, 0 replies; 34+ messages in thread
From: Nick Edelen @ 2009-08-07 0:15 UTC (permalink / raw)
To: gitzilla
Cc: Shawn O. Pearce, Johannes Schindelin, Michael J Gruber,
Junio C Hamano, Jeff King, Sam Vilain, Andreas Ericsson,
Christian Couder, git
That would work, but I sorta like the idea of caching the actual
names. I'm thinking of having a block of slice-unique, null-seperated
names at the end of each slice (ie. not in the mapping) which is
loaded into memory (it wouldn't be very big). Then each blob/tree
object would have an variable length index referencing a particular
name.
Using the actual names would give us greater flexbility, and allow
rev-cache to output proper rev-list type output (with the names after
the hashes).
On Fri, Aug 7, 2009 at 1:43 AM, A Large Angry SCM<gitzilla@gmail.com> wrote:
> Shawn O. Pearce wrote:
>>
>> A Large Angry SCM <gitzilla@gmail.com> wrote:
>>>
>>> Shawn O. Pearce wrote:
>>>>
>>>> Nick Edelen <sirnot@gmail.com> wrote:
>>>>>
>>>>> Hrmm, I just realized that it dosn't actually cache paths/names...
>>>>
>>>> You may not need the path name, but instead the hash value that
>>>> pack-objects computes from the path name.
>>>
>>> Please do NOT expose the hash values. The hash used by pack-objects is
>>> an implementation detail of the heuristics used by the _current_ object
>>> packing code. It would be a real shame to have to maintain backward
>>> compatibility with it at some future date after the packing machinery has
>>> changed.
>>
>> This is a local cache. If there was a version number in the header,
>> and the hash function changes, we could just bump the version number
>> and invalidate all of the caches.
>>
>> No sense in storing (and doing IO of) huge duplicate string values
>> for something where we really only need 32 bits, and where a
>> recompute from scratch only costs a minute.
>>
>
> That will work for me if the cache gets a version number and iff the
> pack-objects hash code gets big warning comments about the cache code
> dependency.
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-06 19:06 ` Johannes Schindelin
2009-08-06 20:01 ` Nick Edelen
@ 2009-08-07 2:47 ` Sam Vilain
2009-08-07 4:35 ` Nicolas Pitre
2009-08-07 6:12 ` Johannes Schindelin
2009-08-08 18:57 ` Junio C Hamano
2 siblings, 2 replies; 34+ messages in thread
From: Sam Vilain @ 2009-08-07 2:47 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Nick Edelen, Michael J Gruber, Junio C Hamano, Jeff King,
Shawn O. Pearce, Andreas Ericsson, Christian Couder, git
Johannes Schindelin wrote:
>> the short answer is that cache slices are totally independant of pack
>> files.
>>
>
> My idea with that was that you already have a SHA-1 map in the pack index,
> and if all you want to be able to accelerate the revision walker, you'd
> probably need something that adds yet another mapping, from commit to
> parents and tree, and from tree to sub-tree and blob (so you can avoid
> unpacking commit and tree objects).
>
Tying indexes together like that is not a good idea in the database
world. Especially as in this case as Nick mentions, the domain is subtly
different (ie pack vs dag). Unfortunately you just can't try to pretend
that they will always be the same; you can't force a full repack on
every ref change!
> Still, there is some redundancy between the pack index and your cache, as
> you have to write out the whole list of SHA-1s all over again. I guess it
> is time to look at the code instead of asking stupid questions.
>
"Disk is cheap" :-) It should be a welcome trade-off; perhaps it's worth
including numbers about how big the indexes are with the time
statistics. It sounds though like it should be a significant win as a
single index can be used to accelerate a wide range of rev-list
operations, and store indexes for many different questions that can be
asked.
Sam
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-07 2:47 ` Sam Vilain
@ 2009-08-07 4:35 ` Nicolas Pitre
2009-08-07 6:08 ` Johannes Schindelin
2009-08-07 6:12 ` Johannes Schindelin
1 sibling, 1 reply; 34+ messages in thread
From: Nicolas Pitre @ 2009-08-07 4:35 UTC (permalink / raw)
To: Sam Vilain
Cc: Johannes Schindelin, Nick Edelen, Michael J Gruber,
Junio C Hamano, Jeff King, Shawn O. Pearce, Andreas Ericsson,
Christian Couder, git
On Fri, 7 Aug 2009, Sam Vilain wrote:
> Johannes Schindelin wrote:
> >> the short answer is that cache slices are totally independant of pack
> >> files.
> >>
> >
> > My idea with that was that you already have a SHA-1 map in the pack index,
> > and if all you want to be able to accelerate the revision walker, you'd
> > probably need something that adds yet another mapping, from commit to
> > parents and tree, and from tree to sub-tree and blob (so you can avoid
> > unpacking commit and tree objects).
> >
>
> Tying indexes together like that is not a good idea in the database
> world. Especially as in this case as Nick mentions, the domain is subtly
> different (ie pack vs dag). Unfortunately you just can't try to pretend
> that they will always be the same; you can't force a full repack on
> every ref change!
Right. And the rev cache must work even if the repository is not
packed. So pack index and rev caching are orthogonal things and are best
kept separate on disk.
How big this cache might get would be interesting indeed.
Nicolas
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-06 20:30 ` Nick Edelen
2009-08-06 20:32 ` Shawn O. Pearce
@ 2009-08-07 4:42 ` Nicolas Pitre
1 sibling, 0 replies; 34+ messages in thread
From: Nicolas Pitre @ 2009-08-07 4:42 UTC (permalink / raw)
To: Nick Edelen
Cc: Johannes Schindelin, Michael J Gruber, Junio C Hamano, Jeff King,
Sam Vilain, Shawn O. Pearce, Andreas Ericsson, Christian Couder,
git
On Thu, 6 Aug 2009, Nick Edelen wrote:
> Hrmm, I just realized that it dosn't actually cache paths/names...
> This obviously has no bearing on its use in packing, but I should
> either add that in or restrict usage in non-packing-related walks.
> Weird how things like that escape you.
Actually it is really the packing related walk that would benefit the
most from this work. The "counting objects" phase of a clone may take
quite a while with some repositories. Most other operations don't care
as much because the rev walk is done incrementally whereas the packing
operation needs to perform it all up front.
Nicolas
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-06 23:37 ` Shawn O. Pearce
2009-08-06 23:43 ` A Large Angry SCM
@ 2009-08-07 6:05 ` Johannes Schindelin
1 sibling, 0 replies; 34+ messages in thread
From: Johannes Schindelin @ 2009-08-07 6:05 UTC (permalink / raw)
To: Shawn O. Pearce
Cc: A Large Angry SCM, Nick Edelen, Michael J Gruber, Junio C Hamano,
Jeff King, Sam Vilain, Andreas Ericsson, Christian Couder, git
On Thu, 6 Aug 2009, Shawn O. Pearce wrote:
> A Large Angry SCM <gitzilla@gmail.com> wrote:
> > Shawn O. Pearce wrote:
> >> Nick Edelen <sirnot@gmail.com> wrote:
> >>> Hrmm, I just realized that it dosn't actually cache paths/names...
> >>
> >> You may not need the path name, but instead the hash value that
> >> pack-objects computes from the path name.
> >
> > Please do NOT expose the hash values. The hash used by pack-objects is
> > an implementation detail of the heuristics used by the _current_
> > object packing code. It would be a real shame to have to maintain
> > backward compatibility with it at some future date after the packing
> > machinery has changed.
>
> This is a local cache. If there was a version number in the header, and
> the hash function changes, we could just bump the version number and
> invalidate all of the caches.
>
> No sense in storing (and doing IO of) huge duplicate string values for
> something where we really only need 32 bits, and where a recompute from
> scratch only costs a minute.
FWIW it was this redundancy in duplicate (unpacked) string redundancy
I meant, but I did a poor job at expressing myself, and consequently Nick
did not understand what I want (and I'm on a slow connection, so I deleted
the mail halfway through looking if there are some real answers hidden in
the huge quoted part instead of replying).
And the fragility of the dependency to the implementation detail of the
pack index suggests to me that my intuition that this whole thing should
be more tightly integrated with the pack index was not totally off the
mark.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-07 4:35 ` Nicolas Pitre
@ 2009-08-07 6:08 ` Johannes Schindelin
2009-08-07 14:18 ` Nicolas Pitre
0 siblings, 1 reply; 34+ messages in thread
From: Johannes Schindelin @ 2009-08-07 6:08 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Sam Vilain, Nick Edelen, Michael J Gruber, Junio C Hamano,
Jeff King, Shawn O. Pearce, Andreas Ericsson, Christian Couder,
git
Hi,
On Fri, 7 Aug 2009, Nicolas Pitre wrote:
> On Fri, 7 Aug 2009, Sam Vilain wrote:
>
> > Johannes Schindelin wrote:
> > >> the short answer is that cache slices are totally independant of
> > >> pack files.
> > >>
> > >
> > > My idea with that was that you already have a SHA-1 map in the pack
> > > index, and if all you want to be able to accelerate the revision
> > > walker, you'd probably need something that adds yet another mapping,
> > > from commit to parents and tree, and from tree to sub-tree and blob
> > > (so you can avoid unpacking commit and tree objects).
> > >
> >
> > Tying indexes together like that is not a good idea in the database
> > world. Especially as in this case as Nick mentions, the domain is
> > subtly different (ie pack vs dag). Unfortunately you just can't try to
> > pretend that they will always be the same; you can't force a full
> > repack on every ref change!
>
> Right. And the rev cache must work even if the repository is not
> packed.
Umm, why? AFAICT the principal purpose of the rev cache is to help work
loads on, say, www.kernel.org.
I am unlikely to notice the improvements in my regular "git log" calls
that only show a couple of pages before I quit the pager.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-07 2:47 ` Sam Vilain
2009-08-07 4:35 ` Nicolas Pitre
@ 2009-08-07 6:12 ` Johannes Schindelin
2009-08-07 15:00 ` Nicolas Pitre
1 sibling, 1 reply; 34+ messages in thread
From: Johannes Schindelin @ 2009-08-07 6:12 UTC (permalink / raw)
To: Sam Vilain
Cc: Nick Edelen, Michael J Gruber, Junio C Hamano, Jeff King,
Shawn O. Pearce, Andreas Ericsson, Christian Couder, git
Hi,
On Fri, 7 Aug 2009, Sam Vilain wrote:
> Johannes Schindelin wrote:
> >> the short answer is that cache slices are totally independant of pack
> >> files.
> >>
> >
> > My idea with that was that you already have a SHA-1 map in the pack
> > index, and if all you want to be able to accelerate the revision
> > walker, you'd probably need something that adds yet another mapping,
> > from commit to parents and tree, and from tree to sub-tree and blob
> > (so you can avoid unpacking commit and tree objects).
> >
>
> Tying indexes together like that is not a good idea in the database
> world.
This is not the same index as in the database world. It is more
comparable with a cached view. And there, it is generally a good idea to
keep related things in the same cached view (with an outer join), rather
than having two primary keys for almost every record.
> Especially as in this case as Nick mentions, the domain is subtly
> different (ie pack vs dag). Unfortunately you just can't try to pretend
> that they will always be the same; you can't force a full repack on
> every ref change!
No, but you do not need that, either. In the setting that is most likely
the most thankful one, i.e. a git:// server, you _want_ to keep the
repository "as packed as possible", otherwise the rev cache improvements
will be lost in the bad packing performance anyway.
> > Still, there is some redundancy between the pack index and your cache,
> > as you have to write out the whole list of SHA-1s all over again. I
> > guess it is time to look at the code instead of asking stupid
> > questions.
> >
>
> "Disk is cheap" :-)
Disk I/O ain't.
(Size of the I/O caches, yaddayadda, I'm sure you get my point).
Ciao,
Dscho
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-07 6:08 ` Johannes Schindelin
@ 2009-08-07 14:18 ` Nicolas Pitre
2009-08-08 15:18 ` Johannes Schindelin
0 siblings, 1 reply; 34+ messages in thread
From: Nicolas Pitre @ 2009-08-07 14:18 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Sam Vilain, Nick Edelen, Michael J Gruber, Junio C Hamano,
Jeff King, Shawn O. Pearce, Andreas Ericsson, Christian Couder,
git
On Fri, 7 Aug 2009, Johannes Schindelin wrote:
> Hi,
>
> On Fri, 7 Aug 2009, Nicolas Pitre wrote:
>
> > On Fri, 7 Aug 2009, Sam Vilain wrote:
> >
> > > Johannes Schindelin wrote:
> > > >> the short answer is that cache slices are totally independant of
> > > >> pack files.
> > > >>
> > > >
> > > > My idea with that was that you already have a SHA-1 map in the pack
> > > > index, and if all you want to be able to accelerate the revision
> > > > walker, you'd probably need something that adds yet another mapping,
> > > > from commit to parents and tree, and from tree to sub-tree and blob
> > > > (so you can avoid unpacking commit and tree objects).
> > > >
> > >
> > > Tying indexes together like that is not a good idea in the database
> > > world. Especially as in this case as Nick mentions, the domain is
> > > subtly different (ie pack vs dag). Unfortunately you just can't try to
> > > pretend that they will always be the same; you can't force a full
> > > repack on every ref change!
> >
> > Right. And the rev cache must work even if the repository is not
> > packed.
>
> Umm, why? AFAICT the principal purpose of the rev cache is to help work
> loads on, say, www.kernel.org.
So what?
Speeding up rev-list with a rev cache is completely orthogonal to
whether the repository is packed or not. It is like having a "git diff"
result cache: no one would think of stuffing that in the pack index.
If we want to improve on the repository packing format, that must be
doable without bothering with an independent concept such as a rev
cache.
> I am unlikely to notice the improvements in my regular "git log" calls
> that only show a couple of pages before I quit the pager.
Indeed. But what is your point again?
Nicolas
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-07 6:12 ` Johannes Schindelin
@ 2009-08-07 15:00 ` Nicolas Pitre
2009-08-07 22:02 ` Nick Edelen
0 siblings, 1 reply; 34+ messages in thread
From: Nicolas Pitre @ 2009-08-07 15:00 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Sam Vilain, Nick Edelen, Michael J Gruber, Junio C Hamano,
Jeff King, Shawn O. Pearce, Andreas Ericsson, Christian Couder,
git
On Fri, 7 Aug 2009, Johannes Schindelin wrote:
> On Fri, 7 Aug 2009, Sam Vilain wrote:
>
> > Especially as in this case as Nick mentions, the domain is subtly
> > different (ie pack vs dag). Unfortunately you just can't try to pretend
> > that they will always be the same; you can't force a full repack on
> > every ref change!
>
> No, but you do not need that, either. In the setting that is most likely
> the most thankful one, i.e. a git:// server, you _want_ to keep the
> repository "as packed as possible", otherwise the rev cache improvements
> will be lost in the bad packing performance anyway.
Yes and no.
Currently, the number #1 latency in any initial git clone is the famous
"counting objects" phase, even if the repo is perfectly packed. And
that's all this rev cache can and will improve. The packing does play
its performance role of course, but for a totally different reason.
Hence the repository needs no be perfectly packed for a rev cache to
speed up its own part of the game.
> > > Still, there is some redundancy between the pack index and your cache,
> > > as you have to write out the whole list of SHA-1s all over again. I
> > > guess it is time to look at the code instead of asking stupid
> > > questions.
> > >
> >
> > "Disk is cheap" :-)
>
> Disk I/O ain't.
>
> (Size of the I/O caches, yaddayadda, I'm sure you get my point).
I don't know about the size of the rev cache on disk yet (I asked Nick
about that) nor do I really know how this cache is implemented. But I
know damn well about git packs and associated index and I for sure don't
want to see a revision cache coupled with it.
And for a clone the disk IO will certainly be a magnitude larger than
for the cache (or so I hope). Maybe the IO for the rev cache might be a
significant overhead for operations performed on a client (aka
developer) repository, in which case it would be a good idea to have a
config variable to control the cache size, or even to turn it off
entirely. We do it for delta depth and many other things already.
Nicolas
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-07 15:00 ` Nicolas Pitre
@ 2009-08-07 22:02 ` Nick Edelen
2009-08-07 22:48 ` Junio C Hamano
0 siblings, 1 reply; 34+ messages in thread
From: Nick Edelen @ 2009-08-07 22:02 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Johannes Schindelin, Sam Vilain, Michael J Gruber,
Junio C Hamano, Jeff King, Shawn O. Pearce, Andreas Ericsson,
Christian Couder, git
> I don't know about the size of the rev cache on disk yet (I asked Nick
> about that) nor do I really know how this cache is implemented.
The cache file for all of the linux repository (as of a few weeks ago)
is around 42MB, without names. The names would probably add 2 or 3 MB
on top of that. That's probably about as big as I'd want to get, as
the whole slice (minus the name list) is mapped to memory (then again
bigger might be ok; I'm not an expert on mem mapping). The rev-cache
command fuse provides functionality to ignore certain cache sizes,
which was geared towards preventing overly large slices.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-07 22:02 ` Nick Edelen
@ 2009-08-07 22:48 ` Junio C Hamano
2009-08-07 22:53 ` Nick Edelen
2009-08-08 2:50 ` Jeff King
0 siblings, 2 replies; 34+ messages in thread
From: Junio C Hamano @ 2009-08-07 22:48 UTC (permalink / raw)
To: Nick Edelen
Cc: Nicolas Pitre, Johannes Schindelin, Sam Vilain, Michael J Gruber,
Junio C Hamano, Jeff King, Shawn O. Pearce, Andreas Ericsson,
Christian Couder, git
Nick Edelen <sirnot@gmail.com> writes:
> The cache file for all of the linux repository (as of a few weeks ago)
> is around 42MB, without names. The names would probably add 2 or 3 MB
> on top of that. That's probably about as big as I'd want to get,
Hmm. .git/objects/ as of today is about 482M here, so we are talking
about roughly 10% overhead?
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-07 22:48 ` Junio C Hamano
@ 2009-08-07 22:53 ` Nick Edelen
2009-08-08 3:11 ` Junio C Hamano
2009-08-08 2:50 ` Jeff King
1 sibling, 1 reply; 34+ messages in thread
From: Nick Edelen @ 2009-08-07 22:53 UTC (permalink / raw)
To: Junio C Hamano
Cc: Nicolas Pitre, Johannes Schindelin, Sam Vilain, Michael J Gruber,
Jeff King, Shawn O. Pearce, Andreas Ericsson, Christian Couder,
git
> Hmm. .git/objects/ as of today is about 482M here, so we are talking
> about roughly 10% overhead?
Yes, that sounds about right. The cache file for git's repository is
3MB, and my repo (partly packed) is ~35MB.
By the way, what would be the best way of posting a revised patchset?
Should I just reply to my older posts, or make new ones?
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-07 22:48 ` Junio C Hamano
2009-08-07 22:53 ` Nick Edelen
@ 2009-08-08 2:50 ` Jeff King
1 sibling, 0 replies; 34+ messages in thread
From: Jeff King @ 2009-08-08 2:50 UTC (permalink / raw)
To: Nick Edelen
Cc: Junio C Hamano, Nicolas Pitre, Johannes Schindelin, Sam Vilain,
Michael J Gruber, Shawn O. Pearce, Andreas Ericsson,
Christian Couder, git
On Fri, Aug 07, 2009 at 03:48:51PM -0700, Junio C Hamano wrote:
> > The cache file for all of the linux repository (as of a few weeks ago)
> > is around 42MB, without names. The names would probably add 2 or 3 MB
> > on top of that. That's probably about as big as I'd want to get,
>
> Hmm. .git/objects/ as of today is about 482M here, so we are talking
> about roughly 10% overhead?
IIRC from previous discussions, kernel.org's main performance problem is
I/O, not CPU. Are there any provisions for sharing rev-caches between
similar repositories, as we already do for objects?
-Peff
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-07 22:53 ` Nick Edelen
@ 2009-08-08 3:11 ` Junio C Hamano
2009-08-08 7:27 ` Nick Edelen
0 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2009-08-08 3:11 UTC (permalink / raw)
To: Nick Edelen
Cc: Nicolas Pitre, Johannes Schindelin, Sam Vilain, Michael J Gruber,
Jeff King, Shawn O. Pearce, Andreas Ericsson, Christian Couder,
git
Nick Edelen <sirnot@gmail.com> writes:
> By the way, what would be the best way of posting a revised patchset?
> Should I just reply to my older posts, or make new ones?
That depends primarily on how heavily the patches needed to change in
response to review comments, but until the series lands in 'next', you
would typically send updated series as a replacement, not incremental.
Many people seemed to be interested in the series and had a volume of
comments on it. I suspect the updated series would be quite different
from the original, so for the next round I would suspect it would be best
to start anew, marking them as [PATCH N/M (v2)], in a fresh thread. It
would help reviewers if you said "this corresponds to [PATCH 3/5] in the
original series, with the following improvements based on X and Y's
comments" after the three-dash line.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-08 3:11 ` Junio C Hamano
@ 2009-08-08 7:27 ` Nick Edelen
2009-08-08 7:30 ` Jeff King
0 siblings, 1 reply; 34+ messages in thread
From: Nick Edelen @ 2009-08-08 7:27 UTC (permalink / raw)
To: Junio C Hamano
Cc: Nicolas Pitre, Johannes Schindelin, Sam Vilain, Michael J Gruber,
Jeff King, Shawn O. Pearce, Andreas Ericsson, Christian Couder,
git
> IIRC from previous discussions, kernel.org's main performance problem is
> I/O, not CPU. Are there any provisions for sharing rev-caches between
> similar repositories, as we already do for objects?
I haven't implemented a transmission protocol or anything, but it
would be perfectly possible to copy cache slices from one repo to
another. Generating the revision cache from scratch on large repos
can take several minutes, so this wouldn't be a bad idea.
> That depends primarily on how heavily the patches needed to change in
> response to review comments, but until the series lands in 'next', you
> would typically send updated series as a replacement, not incremental.
>
> Many people seemed to be interested in the series and had a volume of
> comments on it. I suspect the updated series would be quite different
> from the original, so for the next round I would suspect it would be best
> to start anew, marking them as [PATCH N/M (v2)], in a fresh thread. It
> would help reviewers if you said "this corresponds to [PATCH 3/5] in the
> original series, with the following improvements based on X and Y's
> comments" after the three-dash line.
Ok, that sounds good. I've added a new patch as well so the numbering changes.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-08 7:27 ` Nick Edelen
@ 2009-08-08 7:30 ` Jeff King
2009-08-08 7:40 ` Nick Edelen
0 siblings, 1 reply; 34+ messages in thread
From: Jeff King @ 2009-08-08 7:30 UTC (permalink / raw)
To: Nick Edelen
Cc: Junio C Hamano, Nicolas Pitre, Johannes Schindelin, Sam Vilain,
Michael J Gruber, Shawn O. Pearce, Andreas Ericsson,
Christian Couder, git
On Sat, Aug 08, 2009 at 09:27:55AM +0200, Nick Edelen wrote:
> > IIRC from previous discussions, kernel.org's main performance problem is
> > I/O, not CPU. Are there any provisions for sharing rev-caches between
> > similar repositories, as we already do for objects?
>
> I haven't implemented a transmission protocol or anything, but it
> would be perfectly possible to copy cache slices from one repo to
> another. Generating the revision cache from scratch on large repos
> can take several minutes, so this wouldn't be a bad idea.
That might be useful, but I was thinking more of an "alternates"-like
mechanism between repos. So that the data is stored only once on disk
and in the disk cache, which is helpful for sites like kernel.org which
serve many similar repositories.
-Peff
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-08 7:30 ` Jeff King
@ 2009-08-08 7:40 ` Nick Edelen
0 siblings, 0 replies; 34+ messages in thread
From: Nick Edelen @ 2009-08-08 7:40 UTC (permalink / raw)
To: Jeff King
Cc: Junio C Hamano, Nicolas Pitre, Johannes Schindelin, Sam Vilain,
Michael J Gruber, Shawn O. Pearce, Andreas Ericsson,
Christian Couder, git
> That might be useful, but I was thinking more of an "alternates"-like
> mechanism between repos. So that the data is stored only once on disk
> and in the disk cache, which is helpful for sites like kernel.org which
> serve many similar repositories.
Oh, right. Yes, that seems like it could work. We'd have to be
careful that a shared cache slice wouldn' change (like in a fuse or
something), but other than that we could have something as simple as a
link.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-07 14:18 ` Nicolas Pitre
@ 2009-08-08 15:18 ` Johannes Schindelin
2009-08-08 16:07 ` Junio C Hamano
` (2 more replies)
0 siblings, 3 replies; 34+ messages in thread
From: Johannes Schindelin @ 2009-08-08 15:18 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Sam Vilain, Nick Edelen, Michael J Gruber, Junio C Hamano,
Jeff King, Shawn O. Pearce, Andreas Ericsson, Christian Couder,
git
Hi,
On Fri, 7 Aug 2009, Nicolas Pitre wrote:
> On Fri, 7 Aug 2009, Johannes Schindelin wrote:
>
> > Hi,
> >
> > On Fri, 7 Aug 2009, Nicolas Pitre wrote:
> >
> > > On Fri, 7 Aug 2009, Sam Vilain wrote:
> > >
> > > > Johannes Schindelin wrote:
> > > > >> the short answer is that cache slices are totally independant of
> > > > >> pack files.
> > > > >>
> > > > >
> > > > > My idea with that was that you already have a SHA-1 map in the pack
> > > > > index, and if all you want to be able to accelerate the revision
> > > > > walker, you'd probably need something that adds yet another mapping,
> > > > > from commit to parents and tree, and from tree to sub-tree and blob
> > > > > (so you can avoid unpacking commit and tree objects).
> > > > >
> > > >
> > > > Tying indexes together like that is not a good idea in the database
> > > > world. Especially as in this case as Nick mentions, the domain is
> > > > subtly different (ie pack vs dag). Unfortunately you just can't try to
> > > > pretend that they will always be the same; you can't force a full
> > > > repack on every ref change!
> > >
> > > Right. And the rev cache must work even if the repository is not
> > > packed.
> >
> > Umm, why? AFAICT the principal purpose of the rev cache is to help work
> > loads on, say, www.kernel.org.
>
> So what?
>
> Speeding up rev-list with a rev cache is completely orthogonal to
> whether the repository is packed or not.
No, it is not.
For both technical and practical reasons, caching revision walker data is
very closely related to packing.
You are _very_ unlikely helped by speeding up revision walking in the
general case, _especially_ when you do stuff like blame or -S that needs
to unpack tons of objects _anyway_.
The one big kicker argument for speeding up revision walking _is_ to
relieve the loads on big ass servers, and they _should_ be as packed as
possible (as I will patiently explain over and over again).
> It is like having a "git diff" result cache: no one would think of
> stuffing that in the pack index.
Do you want to try to kid me? You'll have to try harder. Caching "git
diff" results... no, really!
> If we want to improve on the repository packing format, that must be
> doable without bothering with an independent concept such as a rev
> cache.
I would love to tell you that you're right, but the single fact that
pack v4 is startig to compete with Duke Nukem Forever just prevents me
from doing that.
> > I am unlikely to notice the improvements in my regular "git log" calls
> > that only show a couple of pages before I quit the pager.
>
> Indeed. But what is your point again?
Oh? My point? Being that the rev cache has a certain target audience,
and that the regular user is not part of that audience, and that it just
so happens that the _technical_ similarities with the pack index can be
exploited in those scenarios?
IOW we can be pretty certain that a heavy-load server has a fully (or
next-to-fully) packed object database. The pack indices already contain a
SHA-1 table that we can simply reuse. And it should not be hard (or
fragile) at all to put the "cached" information about parents,
referenced tree and blob objects into that file, into a different section.
After all, the parents, referenced tree and blob objects to change as
often as the objects in the pack: never.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-08 15:18 ` Johannes Schindelin
@ 2009-08-08 16:07 ` Junio C Hamano
2009-08-08 23:54 ` Sam Vilain
2009-08-09 2:37 ` Nicolas Pitre
2 siblings, 0 replies; 34+ messages in thread
From: Junio C Hamano @ 2009-08-08 16:07 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Nicolas Pitre, Sam Vilain, Nick Edelen, Michael J Gruber,
Jeff King, Shawn O. Pearce, Andreas Ericsson, Christian Couder,
git
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>> It is like having a "git diff" result cache: no one would think of
>> stuffing that in the pack index.
>
> Do you want to try to kid me? You'll have to try harder. Caching "git
> diff" results... no, really!
I thought Nico meant caching the rename similarity matrix when he said
"git diff" cache. In a narrow context of "log --follow" it may make
sense. It also might help "blame" without -C.
But I do not see how we would gain anything if we had that cache tied to
the pack index.
> After all, the parents, referenced tree and blob objects to change as
> often as the objects in the pack: never.
But I notice that the aspects of objects you listed: the parents,
referenced tree and blob objects. The frequency of them changing does not
depend on where the actual object is, either packed or loose. These
aspects of an object (more specifically, you are talking about a commit
object) never change either way. So I am somewhat puzzled.
It does change if a particular commit and its associated objects are
relevant to the traversal as refs change (especially when they rewind).
Just like an old "kept" pack can suddenly have tons of irrelevant objects
after refs are pruned (e.g. a branch is dropped), cached reachability
data, even though they may stay correct, would become irrelevant when
nobody starts traversing from an object that is no longer reachable.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-06 19:06 ` Johannes Schindelin
2009-08-06 20:01 ` Nick Edelen
2009-08-07 2:47 ` Sam Vilain
@ 2009-08-08 18:57 ` Junio C Hamano
2 siblings, 0 replies; 34+ messages in thread
From: Junio C Hamano @ 2009-08-08 18:57 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Nick Edelen, Michael J Gruber, Junio C Hamano, Jeff King,
Sam Vilain, Shawn O. Pearce, Andreas Ericsson, Christian Couder,
git
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> My idea with that was that you already have a SHA-1 map in the pack index,
> and if all you want to be able to accelerate the revision walker, you'd
> probably need something that adds yet another mapping, from commit to
> parents and tree, and from tree to sub-tree and blob (so you can avoid
> unpacking commit and tree objects).
>
> I just thought that it could be more efficient to do it at the time the
> pack index is written _anyway_, as nothing will change in the pack after
> that anyway.
After reading the version 2 of the "documentation" patch and commenting
heavily on it, I partly share the same feeling with you. The codepath to
pack objects is _one of the places_ you can generate rev-cache and slice
information without redoing a lot of work that has already been done
anyway.
But
- You can write that information separately out to a different file.
Logically it does not have to be _in_ the same pack idx file; and
- You may want to generate rev-cache information even if you do not pack
the repository. They may practically go hand-in-hand, but logically
they are orthogonal.
And I am not sure if it is easy to retrofit "rev-list | pack-objects" code
to additionally produce this information, while keeping the standalone
version of rev-cache generation.
Having said all that.
I haven't read the side of the patch that _uses_ the information stored in
the rev-cache to figure out what it optimizes and what its limitations are
(e.g. how it interacts with pathspecs). Perhaps the rev-cache may turn
out to be _only_ useful for pack-objects and nothing else, in which case
we may not care about standalone version of rev-cache generator after all.
If that is the case, I think it is also a reasonable implementation if the
rev-cache is generated only by "rev-list | pack-objects" codepath as a
side effect of traversal it already does, and it might even make sense to
introduce the version 3 of pack idx format that let you record additional
information, like you suggest. I am not ready to make that judgement as I
haven't read the rest, but my gut feeling tells me that you might be
right.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-08 15:18 ` Johannes Schindelin
2009-08-08 16:07 ` Junio C Hamano
@ 2009-08-08 23:54 ` Sam Vilain
2009-08-09 2:37 ` Nicolas Pitre
2 siblings, 0 replies; 34+ messages in thread
From: Sam Vilain @ 2009-08-08 23:54 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Nicolas Pitre, Nick Edelen, Michael J Gruber, Junio C Hamano,
Jeff King, Shawn O. Pearce, Andreas Ericsson, Christian Couder,
git
On Sat, 2009-08-08 at 17:18 +0200, Johannes Schindelin wrote:
> > Speeding up rev-list with a rev cache is completely orthogonal to
> > whether the repository is packed or not.
>
> No, it is not.
>
> For both technical and practical reasons, caching revision walker data is
> very closely related to packing.
> [...]
> ... the rev cache has a certain target audience,
> and that the regular user is not part of that audience, and that it just
> so happens that the _technical_ similarities with the pack index can be
> exploited in those scenarios?
>
> IOW we can be pretty certain that a heavy-load server has a fully (or
> next-to-fully) packed object database. The pack indices already contain a
> SHA-1 table that we can simply reuse. And it should not be hard (or
> fragile) at all to put the "cached" information about parents,
> referenced tree and blob objects into that file, into a different section.
I think your argument would work better if packs and bundles were the
same thing, and we always stored bundles in the objects/packs directory,
but they're not and we don't. You can't assume that a pack has any
particular properties, such as representing the objects returned from a
single rev-cache walk. And I will say that *especially* on a busy git
server, serving active projects you can't expect people to repack their
repository for every single update. Repacking daily or so by a batch
job, sure. Expecting the repository to always be fully packed? No.
Too much churn, or inefficient packing. You can't just pretend that the
mixed packed/loose case doesn't exist.
The 10% size seems a very good bang for your buck to me and a good
start.
Sam
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-08 15:18 ` Johannes Schindelin
2009-08-08 16:07 ` Junio C Hamano
2009-08-08 23:54 ` Sam Vilain
@ 2009-08-09 2:37 ` Nicolas Pitre
2009-08-09 13:42 ` Nick Edelen
2 siblings, 1 reply; 34+ messages in thread
From: Nicolas Pitre @ 2009-08-09 2:37 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Sam Vilain, Nick Edelen, Michael J Gruber, Junio C Hamano,
Jeff King, Shawn O. Pearce, Andreas Ericsson, Christian Couder,
git
On Sat, 8 Aug 2009, Johannes Schindelin wrote:
> Hi,
>
> On Fri, 7 Aug 2009, Nicolas Pitre wrote:
>
> > On Fri, 7 Aug 2009, Johannes Schindelin wrote:
> >
> > > Hi,
> > >
> > > On Fri, 7 Aug 2009, Nicolas Pitre wrote:
> > >
> > > > On Fri, 7 Aug 2009, Sam Vilain wrote:
> > > >
> > > > > Johannes Schindelin wrote:
> > > > > >> the short answer is that cache slices are totally independant of
> > > > > >> pack files.
> > > > > >>
> > > > > >
> > > > > > My idea with that was that you already have a SHA-1 map in the pack
> > > > > > index, and if all you want to be able to accelerate the revision
> > > > > > walker, you'd probably need something that adds yet another mapping,
> > > > > > from commit to parents and tree, and from tree to sub-tree and blob
> > > > > > (so you can avoid unpacking commit and tree objects).
> > > > > >
> > > > >
> > > > > Tying indexes together like that is not a good idea in the database
> > > > > world. Especially as in this case as Nick mentions, the domain is
> > > > > subtly different (ie pack vs dag). Unfortunately you just can't try to
> > > > > pretend that they will always be the same; you can't force a full
> > > > > repack on every ref change!
> > > >
> > > > Right. And the rev cache must work even if the repository is not
> > > > packed.
> > >
> > > Umm, why? AFAICT the principal purpose of the rev cache is to help work
> > > loads on, say, www.kernel.org.
> >
> > So what?
> >
> > Speeding up rev-list with a rev cache is completely orthogonal to
> > whether the repository is packed or not.
>
> No, it is not.
>
> For both technical and practical reasons, caching revision walker data is
> very closely related to packing.
No it is not.
> You are _very_ unlikely helped by speeding up revision walking in the
> general case, _especially_ when you do stuff like blame or -S that needs
> to unpack tons of objects _anyway_.
I completely agree. And this is why I wish for the rev cache to be
enabled with a config variable. Why? Because "client" repositories are
unlikely to benefit as much as "server" repositories from this cache.
> The one big kicker argument for speeding up revision walking _is_ to
> relieve the loads on big ass servers, and they _should_ be as packed as
> possible (as I will patiently explain over and over again).
Please do a shortlog on builtin-pack-objects.c and realize who spent a
lot of energy making git repository packing what it is now.
Then re-read what I said on this list for more than a year now about the
remaining latency on large git clone operations. After explaining it a
couple times already I will need some of your patience to make people
understand that fully packing your repo is _not_ going to help it as
we're always talking about fully packed repos to start with.
> > It is like having a "git diff" result cache: no one would think of
> > stuffing that in the pack index.
>
> Do you want to try to kid me? You'll have to try harder. Caching "git
> diff" results... no, really!
Why not? You worked on git diff yourself, so you certainly are well
positionned to appreciate my suggestion, no?
> > If we want to improve on the repository packing format, that must be
> > doable without bothering with an independent concept such as a rev
> > cache.
>
> I would love to tell you that you're right, but the single fact that
> pack v4 is startig to compete with Duke Nukem Forever just prevents me
> from doing that.
Because of the current economy, I was waiting to be laid off so to have
the time to make pack v4 a reality. Unfortunately for git they didn't
fire me yet. Oh well...
> > > I am unlikely to notice the improvements in my regular "git log" calls
> > > that only show a couple of pages before I quit the pager.
> >
> > Indeed. But what is your point again?
>
> Oh? My point? Being that the rev cache has a certain target audience,
> and that the regular user is not part of that audience, and that it just
> so happens that the _technical_ similarities with the pack index can be
> exploited in those scenarios?
I don't see a similarity with the pack index at all, certainly not a
technical one.
> IOW we can be pretty certain that a heavy-load server has a fully (or
> next-to-fully) packed object database. The pack indices already contain a
> SHA-1 table that we can simply reuse. And it should not be hard (or
> fragile) at all to put the "cached" information about parents,
> referenced tree and blob objects into that file, into a different section.
And then someone does a few pushes. So most of the time your repository
contains a few packs and not only a single one. So in which pack index
files should you put the rev cache? What do yo do with that cache if it
happens to be split across multiple pack index files when a repack is
performed? Can't you see all the disadvantages to tie rev cache data
which happens to share no issue with repacking into the same file?
So what do you do to keep the code simple and maintainable? Yes, you
abstract things and use separate files, thank you very much. After all,
the packed refs are not stored in the pack index file either even if the
packed-refs file contains a list of SHA1's.
Nicolas
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking
2009-08-09 2:37 ` Nicolas Pitre
@ 2009-08-09 13:42 ` Nick Edelen
0 siblings, 0 replies; 34+ messages in thread
From: Nick Edelen @ 2009-08-09 13:42 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Johannes Schindelin, Sam Vilain, Michael J Gruber,
Junio C Hamano, Jeff King, Shawn O. Pearce, Andreas Ericsson,
Christian Couder, git
I can see the logic behind Johannes's ideas, but I'm still not sure
it'd be a great modification. If you wanted to associate revision
caching much more strongly with packs, then packs and slices could be
merged reasonably well. Say you just attached the actual slice data
at the end of the pack, then stored offsets of the slice payload in
the pack index. Since you'd (presumably) have to search the index for
the object anyway, you wouldn't have to deal with searching a
rev-cache index on top of that (although it's not exactly unoptimized
now).
However, that would sorta be preemptively limiting rev-cache to
pack-related optimizations. I mean at the moment that's the main
target, but it could be improved in the future to be more relavant to
other operations as well. Leaving the rev-cache as a seperate system
would keep both it and packing much more flexible, and open to
longer-term developments.
>I haven't read the side of the patch that _uses_ the information stored in
>the rev-cache to figure out what it optimizes and what its limitations are
>(e.g. how it interacts with pathspecs). Perhaps the rev-cache may turn
>out to be _only_ useful for pack-objects and nothing else, in which case
>we may not care about standalone version of rev-cache generator after all.
rev-cache's cache slice traversal basically emulates git's revision
walker, on a smaller scale. At the moment it only really handles date
limiting (and obviously slop stuff) so it's not used for any pruning.
That's not to say it couldn't be updated in the future though.
^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads:[~2009-08-09 13:42 UTC | newest]
Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-08-06 9:55 [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking Nick Edelen
2009-08-06 14:48 ` Johannes Schindelin
2009-08-06 14:58 ` Michael J Gruber
2009-08-06 17:39 ` Nick Edelen
2009-08-06 19:06 ` Johannes Schindelin
2009-08-06 20:01 ` Nick Edelen
2009-08-06 20:30 ` Nick Edelen
2009-08-06 20:32 ` Shawn O. Pearce
2009-08-06 23:35 ` A Large Angry SCM
2009-08-06 23:37 ` Shawn O. Pearce
2009-08-06 23:43 ` A Large Angry SCM
2009-08-07 0:15 ` Nick Edelen
2009-08-07 6:05 ` Johannes Schindelin
2009-08-07 4:42 ` Nicolas Pitre
2009-08-07 2:47 ` Sam Vilain
2009-08-07 4:35 ` Nicolas Pitre
2009-08-07 6:08 ` Johannes Schindelin
2009-08-07 14:18 ` Nicolas Pitre
2009-08-08 15:18 ` Johannes Schindelin
2009-08-08 16:07 ` Junio C Hamano
2009-08-08 23:54 ` Sam Vilain
2009-08-09 2:37 ` Nicolas Pitre
2009-08-09 13:42 ` Nick Edelen
2009-08-07 6:12 ` Johannes Schindelin
2009-08-07 15:00 ` Nicolas Pitre
2009-08-07 22:02 ` Nick Edelen
2009-08-07 22:48 ` Junio C Hamano
2009-08-07 22:53 ` Nick Edelen
2009-08-08 3:11 ` Junio C Hamano
2009-08-08 7:27 ` Nick Edelen
2009-08-08 7:30 ` Jeff King
2009-08-08 7:40 ` Nick Edelen
2009-08-08 2:50 ` Jeff King
2009-08-08 18:57 ` Junio C Hamano
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.