* A system administration use case for git
@ 2009-04-22 8:33 Jon Seymour
2009-04-22 8:48 ` Martin Langhoff
2009-04-23 9:55 ` Uwe Kleine-König
0 siblings, 2 replies; 8+ messages in thread
From: Jon Seymour @ 2009-04-22 8:33 UTC (permalink / raw)
To: Git Mailing List
Hello all, it's been quite a while.
And, no, I don't care that you ripped out git-rev-list --merge-order,
it was fun while it lasted and it was still a cool (if
underappreciated) algorithm :-)
I've got a system administration use case that I know git does 99.9%
of. I am wondering about the last 0.1%.
It'd be nice to do this on an arbitrary (non-git-controlled) file system tree:
* calculate the git hashes of the tree (without making copies of the
files in the tree)
* archive the tree hashes
* rsync the tree hashes to another place
* work out which files aren't available in the other place's git repo
* rsync those files to the other place
Is there a way to easily achieve this with git's existing tool set or
a higher layer?
jon.
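The hashing step in the list above can be sketched without creating any git objects. A blob id is the SHA-1 of the header "blob <size>\0" followed by the file contents, which is also what `git hash-object <file>` prints when run without -w. The sketch assumes GNU coreutils' sha1sum is installed; blob_hash and manifest are hypothetical helper names, not git commands, and file names containing newlines are not handled.

```shell
# Hypothetical helpers; assumes sha1sum (GNU coreutils) is on PATH.

# Print the git blob id of one file without writing anything to a repo.
blob_hash() {
    size=$(wc -c < "$1") || return 1
    # git hashes "blob <size>\0" followed by the raw file contents
    { printf 'blob %d\0' "$((size))"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

# Print "<blob id>  <path>" for every regular file below a directory,
# in bytewise path order. Paths containing newlines are not supported.
manifest() {
    ( cd "$1" && find . -type f | LC_ALL=C sort |
        while IFS= read -r f; do
            printf '%s  %s\n' "$(blob_hash "$f")" "$f"
        done )
}
```

Running manifest on the tree gives one line per file, and that small text file is what would be archived and rsynced. At the other place, the hash column could be fed to `git cat-file --batch-check`, which reports objects the repo lacks as "<hash> missing"; the paths on those lines are the files that still need to be transferred.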
* Re: A system administration use case for git
2009-04-22 8:33 A system administration use case for git Jon Seymour
@ 2009-04-22 8:48 ` Martin Langhoff
[not found] ` <2cfc40320904220208g5acc2200w6144668ba2da5a09@mail.gmail.com>
2009-04-23 9:55 ` Uwe Kleine-König
1 sibling, 1 reply; 8+ messages in thread
From: Martin Langhoff @ 2009-04-22 8:48 UTC (permalink / raw)
To: Jon Seymour; +Cc: Git Mailing List
On Wed, Apr 22, 2009 at 10:33 AM, Jon Seymour <jon.seymour@gmail.com> wrote:
> Hello all, it's been quite a while.
> It'd be nice to do this on an arbitrary (non-git-controlled) file system tree:
>
> * calculate the git hashes of the tree (without making copies of the
> files in the tree)
> * archive the tree hashes
> * rsync the tree hashes to another place
> * work out which files aren't available in the other place's git repo
> * rsync those files to the other place
Steps:
1 - rsync to the "other place"
2 - use the git repo in that "other place"
3 - if the tree hashes are needed "here", copy them from the "other
place" git to here.
rsync + git = awesomenesssssss
--
martin.langhoff@gmail.com
martin@laptop.org -- School Server Architect
- ask interesting questions
- don't get distracted with shiny stuff - working code first
- http://wiki.laptop.org/go/User:Martinlanghoff
* A system administration use case for git
[not found] ` <2cfc40320904220208g5acc2200w6144668ba2da5a09@mail.gmail.com>
@ 2009-04-22 9:22 ` Jon Seymour
2009-04-22 10:07 ` Martin Langhoff
0 siblings, 1 reply; 8+ messages in thread
From: Jon Seymour @ 2009-04-22 9:22 UTC (permalink / raw)
To: Git Mailing List
Martin,
One disadvantage of that approach is if the file system is very large
and only has a few deltas, then I effectively have to have two copies
of the reference file system - one in the GIT repo and one that I can
damage on a regular basis with rsync for the purpose of calculating the
git deltas. I know I can then use git to repair the damage, but then
the reference has to be protected from concurrent access.
I'd prefer not to have to maintain a copy of the tree that has to be
"damaged" on a regular basis, and I'd also prefer not to maintain a
copy of the git objects for identical files.
In an ideal world, storage requirements at the other place would be
those of the reference file system + those of the various deltas, but
no more.
jon seymour
* Re: A system administration use case for git
2009-04-22 9:22 ` Jon Seymour
@ 2009-04-22 10:07 ` Martin Langhoff
2009-04-22 10:18 ` Jon Seymour
0 siblings, 1 reply; 8+ messages in thread
From: Martin Langhoff @ 2009-04-22 10:07 UTC (permalink / raw)
To: Jon Seymour; +Cc: Git Mailing List
On Wed, Apr 22, 2009 at 11:22 AM, Jon Seymour <jon.seymour@gmail.com> wrote:
> One disadvantage of that approach is if the file system is very large
> and only has a few deltas, then I effectively have to have two copies
> of the reference file system - one in the GIT repo and one that I can
You could minimise the on-disk footprint, and protect it from
concurrent changes, by using a hardlinked tree on the destination
side. rsync knows to break hardlinks, etc.
Currently, you can't "rsync into git" which would save you that step.
It's a ton of work to do that -- if anyone is planning on working on
something like that, perhaps writing directly into the fast-import
protocol is a good shortcut.
I'd like to have something like that for my OLPC School Server, which
could benefit from using git as the backup backend -- it currently
uses hardlinked directories.
> In an ideal world, storage requirements at the other place would be
> those of the reference file system + those of the various deltas, but
> no more.
rsync + hardlinked trees + git gets you quite close to that.
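One way to read this suggestion, as a sketch only: keep each backup run as its own directory and let rsync hardlink unchanged files against the previous run with --link-dest. A changed file is written as a new file in the new run, so older snapshots are left untouched. The function name snapshot, the "latest" symlink and the directory layout are assumptions for illustration, not a description of the School Server setup.

```shell
# Hypothetical helper: snapshot SRC BACKUP_ROOT NAME
# Creates BACKUP_ROOT/NAME as a copy of SRC. Files unchanged since the
# previous run are hardlinked to it instead of being stored again.
snapshot() {
    src=$1
    # --link-dest treats relative paths as relative to the destination,
    # so resolve the backup root to an absolute path first
    root=$(cd "$2" && pwd) || return 1
    name=$3
    if [ -e "$root/latest" ]; then
        rsync -a --delete --link-dest="$root/latest" "$src/" "$root/$name/"
    else
        # first run: nothing to link against yet
        rsync -a "$src/" "$root/$name/"
    fi &&
    # remember this run as the link target for the next one
    ln -sfn "$name" "$root/latest"
}
```

Each run directory could then be imported into git (or hashed) at leisure, since later rsync runs never modify it in place.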
cheers,
m
--
martin.langhoff@gmail.com
martin@laptop.org -- School Server Architect
- ask interesting questions
- don't get distracted with shiny stuff - working code first
- http://wiki.laptop.org/go/User:Martinlanghoff
* Re: A system administration use case for git
2009-04-22 10:07 ` Martin Langhoff
@ 2009-04-22 10:18 ` Jon Seymour
0 siblings, 0 replies; 8+ messages in thread
From: Jon Seymour @ 2009-04-22 10:18 UTC (permalink / raw)
To: Martin Langhoff; +Cc: Git Mailing List
Martin,
Thanks for the info about hard-linked trees...they may well do exactly
what I need - thank you!
jon.
* Re: A system administration use case for git
2009-04-22 8:33 A system administration use case for git Jon Seymour
2009-04-22 8:48 ` Martin Langhoff
@ 2009-04-23 9:55 ` Uwe Kleine-König
2009-04-23 10:38 ` Johannes Sixt
1 sibling, 1 reply; 8+ messages in thread
From: Uwe Kleine-König @ 2009-04-23 9:55 UTC (permalink / raw)
To: Jon Seymour; +Cc: Git Mailing List
Hello Jon,
On Wed, Apr 22, 2009 at 06:33:12PM +1000, Jon Seymour wrote:
> * calculate the git hashes of the tree (without making copies of the
> files in the tree)
You can hand-craft a tree object:
for f in $filelist; do
	# one tree entry per file: mode, name, NUL, then the raw 20-byte SHA-1
	printf '100644 %s\0' "$f"
	git hash-object "$f" |
		perl -ne 'chomp; print pack("H*", $_)'
done |
git hash-object -t tree --stdin
You get some extra points for directories that include subdirs :-)
There is a problem though, which I think has to do with the order of
the files in the tree object. First, here is a real-life example that
works, from the Linux kernel (I chose the usr subdirectory because it
is small and doesn't contain subdirs):
ukleinek@cepheus:~/gsrc/linux-2.6/usr$ for f in .gitignore Kconfig Makefile gen_init_cpio.c initramfs_data.S initramfs_data.bz2.S initramfs_data.gz.S initramfs_data.lzma.S; do printf "100644 %s\x00" $f; git hash-object $f | perl -n -e 'chomp; for $c (split(/(.{2})/)) { printf("%c", hex($c)) if $c }'; done | git hash-object -t tree --stdin
64f2ca854bc19fd29479b198be11beba56a26b1e
ukleinek@cepheus:~/gsrc/linux-2.6/usr$ git rev-parse HEAD:usr
64f2ca854bc19fd29479b198be11beba56a26b1e
There is a practical problem though: The filelist has to be sorted in a
way that is not provided by ls, so:
ukleinek@cepheus:~/gsrc/linux-2.6/usr$ for f in $(ls -A); do printf "100644 %s\x00" $f; git hash-object $f | perl -n -e 'chomp; for $c (split(/(.{2})/)) { printf("%c", hex($c)) if $c }'; done | git hash-object -t tree -w --stdin
a0a6efb3f1de956badc7607c7d372cc325a18846
ukleinek@cepheus:~/gsrc/linux-2.6/usr$ git ls-tree a0a6efb3f1de956badc7607c7d372cc325a18846 | wc -l
0
doesn't work :-/
This would make a nice plumbing:
git hash-tree $directory
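As a rough stand-in for such a command, here is a git-free sketch for a flat directory. It builds the same entry stream as the loop above, prefixes the "tree <size>\0" header and hashes the result. It assumes sha1sum and perl are available, treats every file as mode 100644, and ignores subdirectories, executables and symlinks; blob_hash and hash_flat_tree are hypothetical names.

```shell
# Hypothetical helpers; assumes sha1sum (GNU coreutils) and perl.

# git blob id of one file: SHA-1 over "blob <size>\0" plus contents
blob_hash() {
    size=$(wc -c < "$1") || return 1
    { printf 'blob %d\0' "$((size))"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

# Tree id of a directory that contains only plain, non-executable files.
hash_flat_tree() {
    dir=$1
    body=$(mktemp) || return 1
    ( cd "$dir" || exit 1
      # list regular files (including dotfiles) in bytewise order
      for f in * .[!.]* ..?*; do
          [ -f "$f" ] && printf '%s\n' "$f"
      done | LC_ALL=C sort |
      while IFS= read -r f; do
          # entry layout: "<mode> <name>\0" followed by the binary SHA-1
          printf '100644 %s\0' "$f"
          blob_hash "$f" | perl -ne 'chomp; print pack("H*", $_)'
      done ) > "$body"
    # a tree object is hashed as "tree <size>\0" plus the entry stream
    { printf 'tree %d\0' "$(( $(wc -c < "$body") ))"; cat "$body"; } |
        sha1sum | cut -d' ' -f1
    rm -f "$body"
}
```

For an empty directory this prints git's well-known empty-tree id, 4b825dc642cb6eb9a060e54bf8d69288fbee4904. Handling subdirectories would mean recursing, emitting mode 40000 entries, and sorting directory names as if they ended in a slash.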
Best regards
Uwe
--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | http://www.pengutronix.de/ |
* Re: A system administration use case for git
2009-04-23 9:55 ` Uwe Kleine-König
@ 2009-04-23 10:38 ` Johannes Sixt
2009-04-23 11:39 ` Uwe Kleine-König
0 siblings, 1 reply; 8+ messages in thread
From: Johannes Sixt @ 2009-04-23 10:38 UTC (permalink / raw)
To: Uwe Kleine-König; +Cc: Jon Seymour, Git Mailing List
Uwe Kleine-König wrote:
> There is a practical problem though: The filelist has to be sorted in a
> way that is not provided by ls, so:
>
> ukleinek@cepheus:~/gsrc/linux-2.6/usr$ for f in $(ls -A); do printf "100644 %s\x00" $f; git hash-object $f | perl -n -e 'chomp; for $c (split(/(.{2})/)) { printf("%c", hex($c)) if $c }'; done | git hash-object -t tree -w --stdin
> a0a6efb3f1de956badc7607c7d372cc325a18846
Does ... $(LANG=C ls -A) ... make a difference for you?
But note that this is still not the whole story because, IIUC, in tree
objects directories are sorted as if they had a slash appended; i.e.
directory "foo" is sorted *after* file "foo.c", but *before* file "foo0".
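The ordering rule described here can be sketched in shell: append a slash to directory names, sort bytewise in the C locale, then strip the slash again. git_tree_order is a hypothetical helper name, and names containing newlines are not handled.

```shell
# Hypothetical helper: list the entries of a directory in the order git
# uses inside a tree object (directories compare as if they ended in "/").
git_tree_order() {
    ( cd "$1" || exit 1
      for e in * .[!.]* ..?*; do
          # skip glob patterns that matched nothing
          [ -e "$e" ] || [ -L "$e" ] || continue
          if [ -d "$e" ] && [ ! -L "$e" ]; then
              printf '%s/\n' "$e"    # real directory: sort with a trailing slash
          else
              printf '%s\n' "$e"     # file or symlink: sort by the plain name
          fi
      done | LC_ALL=C sort | sed 's|/$||' )
}
```

With a directory "foo" next to the files "foo.c" and "foo0", this prints foo.c, foo, foo0, whereas a plain `LC_ALL=C ls -A` would put foo first. A real tool would presumably also skip a .git directory.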
-- Hannes
* Re: A system administration use case for git
2009-04-23 10:38 ` Johannes Sixt
@ 2009-04-23 11:39 ` Uwe Kleine-König
0 siblings, 0 replies; 8+ messages in thread
From: Uwe Kleine-König @ 2009-04-23 11:39 UTC (permalink / raw)
To: Johannes Sixt; +Cc: Jon Seymour, Git Mailing List
Hi Hannes,
On Thu, Apr 23, 2009 at 12:38:09PM +0200, Johannes Sixt wrote:
> Uwe Kleine-König wrote:
> > There is a practical problem though: The filelist has to be sorted in a
> > way that is not provided by ls, so:
> >
> > ukleinek@cepheus:~/gsrc/linux-2.6/usr$ for f in $(ls -A); do printf "100644 %s\x00" $f; git hash-object $f | perl -n -e 'chomp; for $c (split(/(.{2})/)) { printf("%c", hex($c)) if $c }'; done | git hash-object -t tree -w --stdin
> > a0a6efb3f1de956badc7607c7d372cc325a18846
>
> Does ... $(LANG=C ls -A) ... make a difference for you?
Oh, up to now I thought that C and en_US.UTF-8 used the same sorting. So
yes, it works correctly then.
Best regards and thanks
Uwe
--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | http://www.pengutronix.de/ |