All of lore.kernel.org
* A system administration use case for git
@ 2009-04-22  8:33 Jon Seymour
  2009-04-22  8:48 ` Martin Langhoff
  2009-04-23  9:55 ` Uwe Kleine-König
  0 siblings, 2 replies; 8+ messages in thread
From: Jon Seymour @ 2009-04-22  8:33 UTC (permalink / raw)
  To: Git Mailing List

Hello all, it's been quite a while.

And no, I don't care that you ripped out git-rev-list --merge-order;
it was fun while it lasted, and it was still a cool (if
underappreciated) algorithm :-)

I've got a system administration use case that I know git handles 99.9%
of. I am wondering about the last 0.1%.

It'd be nice to do this on an arbitrary (non-git-controlled) file-system tree:

* calculate the git hashes of the tree (without making copies of the
files in the tree)
* archive the tree hashes
* rsync the tree hashes to another place
* work out which files aren't available in the other place's git repo
* rsync those files to the other place

Is there a way to easily achieve this with git's existing tool set or
a higher layer?

jon.
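The first and fourth steps can be sketched in plain shell, as a rough illustration rather than an existing tool. A git blob id is the SHA-1 of the header `blob <size>` plus a NUL byte, followed by the file's bytes, so it can be computed by streaming each file through `sha1sum` without copying it (`git hash-object <file>` without `-w` computes the same id and writes nothing). The helper names `blob_hash` and `manifest` are hypothetical, and the sketch assumes GNU coreutils `sha1sum` and path names without tabs or newlines.

```shell
#!/bin/sh
# Hypothetical helper: print the git blob id of file $1 without copying it.
# The id is SHA-1("blob <size>" NUL <contents>). Assumes GNU sha1sum.
blob_hash() {
	size=$(($(wc -c < "$1")))   # arithmetic expansion strips any padding
	{ printf 'blob %s\000' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

# Hypothetical helper: print "<blob id><TAB><path>" for every regular file
# below directory $1. Assumes no tabs or newlines in path names.
manifest() {
	find "$1" -type f | while IFS= read -r f; do
		printf '%s\t%s\n' "$(blob_hash "$f")" "$f"
	done
}
```

A manifest (`manifest /some/tree > tree.manifest`) is small enough to rsync to the other place. There, piping its first column (`cut -f1 tree.manifest`) into `git cat-file --batch-check` inside the repository prints `<id> missing` for every blob the repository lacks; those are the files that still need to be transferred.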

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: A system administration use case for git
  2009-04-22  8:33 A system administration use case for git Jon Seymour
@ 2009-04-22  8:48 ` Martin Langhoff
       [not found]   ` <2cfc40320904220208g5acc2200w6144668ba2da5a09@mail.gmail.com>
  2009-04-23  9:55 ` Uwe Kleine-König
  1 sibling, 1 reply; 8+ messages in thread
From: Martin Langhoff @ 2009-04-22  8:48 UTC (permalink / raw)
  To: Jon Seymour; +Cc: Git Mailing List

On Wed, Apr 22, 2009 at 10:33 AM, Jon Seymour <jon.seymour@gmail.com> wrote:
> Hello all, it's been quite a while.
> It'd be nice to do this on arbitrary (non-git-controlled) file system tree:
>
> * calculate the git hashes of the tree (without making copies of the
> files in the tree)
> * archive the tree hashes
> * rsync the tree hashes to another place
> * work out which files aren't available in the other place's git repo
> * rsync those files to the other place

Steps:

1 - rsync to the "other place"
2 - use the git repo in that "other place"
3 - if the tree hashes are needed "here", copy them from the "other
place" git to here.

rsync + git = awesomenesssssss




-- 
 martin.langhoff@gmail.com
 martin@laptop.org -- School Server Architect
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff

* A system administration use case for git
       [not found]   ` <2cfc40320904220208g5acc2200w6144668ba2da5a09@mail.gmail.com>
@ 2009-04-22  9:22     ` Jon Seymour
  2009-04-22 10:07       ` Martin Langhoff
  0 siblings, 1 reply; 8+ messages in thread
From: Jon Seymour @ 2009-04-22  9:22 UTC (permalink / raw)
  To: Git Mailing List

Martin,

One disadvantage of that approach is that if the file system is very large
and has only a few deltas, I effectively have to keep two copies
of the reference file system: one in the git repo and one that I can
damage on a regular basis with rsync for the purpose of calculating the
git deltas. I know I can then use git to repair the damage, but then
the reference has to be protected from concurrent access.

I'd prefer not to maintain a copy of the tree that gets "damaged" on a
regular basis, and I'd also rather not maintain a copy of the git
objects for identical files.

In an ideal world, storage requirements at the other place would be
those of the reference file system + those of the various deltas, but
no more.

jon seymour

* Re: A system administration use case for git
  2009-04-22  9:22     ` Jon Seymour
@ 2009-04-22 10:07       ` Martin Langhoff
  2009-04-22 10:18         ` Jon Seymour
  0 siblings, 1 reply; 8+ messages in thread
From: Martin Langhoff @ 2009-04-22 10:07 UTC (permalink / raw)
  To: Jon Seymour; +Cc: Git Mailing List

On Wed, Apr 22, 2009 at 11:22 AM, Jon Seymour <jon.seymour@gmail.com> wrote:
> One disadvantage of that approach is if the file system is very large
> and only has a few deltas, then I effectively have to have two copies
> of the reference file system - one in the GIT repo and one that I can

You could minimise the on-disk footprint, and protect it from
concurrent access (concurrent change), by using a hardlinked tree on
the destination side. rsync knows to break hardlinks, etc.

Currently, you can't "rsync into git", which would save you that step.
It's a ton of work to do that -- if anyone is planning on working on
something like that, perhaps writing directly into the fast-import
protocol is a good shortcut.
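Martin's fast-import idea could look roughly like this: a shell function that emits a `git fast-import` stream recording the regular files under a directory as one new commit on a branch. The stream starts the tree with `deleteall`, so files removed from the directory also disappear from the snapshot, and continues an existing branch with `from refs/heads/<branch>^0`, the form the fast-import documentation gives for resuming from a branch's current value. A minimal sketch, not an existing tool: `snapshot_stream` and the `backup` committer identity are hypothetical, it must run inside the target repository, and it assumes file names need no quoting (no newlines, no leading double quote) and that executable bits and symlinks can be ignored. File contents still end up in the object database; what it saves is the separate work tree.

```shell
#!/bin/sh
# Hypothetical helper: write a fast-import stream that records the regular
# files under directory $1 as a new commit on branch $2.
snapshot_stream() {
	dir=$1 branch=$2
	msg="snapshot of $dir"
	printf 'commit refs/heads/%s\n' "$branch"
	printf 'committer backup <backup@localhost> %s +0000\n' "$(date +%s)"
	# "data <n>" is followed by exactly n bytes (here: the message plus LF).
	printf 'data %s\n%s\n' "$(($(printf '%s\n' "$msg" | wc -c)))" "$msg"
	# Continue an existing branch; fast-import needs the ^0 form here.
	if git rev-parse --verify -q "refs/heads/$branch" >/dev/null; then
		printf 'from refs/heads/%s^0\n' "$branch"
	fi
	printf 'deleteall\n'            # start the new tree from scratch
	find "$dir" -type f | while IFS= read -r f; do
		printf 'M 100644 inline %s\n' "${f#"$dir"/}"
		printf 'data %s\n' "$(($(wc -c < "$f")))"
		cat "$f"
		printf '\n'                 # optional LF after the raw data
	done
}
```

Usage, from inside the repository: `snapshot_stream /srv/data backup | git fast-import --quiet`.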

I'd like to have something like that for my OLPC School Server, which
could benefit from using git as the backup backend -- it currently
uses hardlinked directories.

> In an ideal world, storage requirements at the other place would be
> those of the reference file system + those of the various deltas, but
> no more.

rsync + hardlinked trees + git gets you quite close to that.

cheers,



m

* Re: A system administration use case for git
  2009-04-22 10:07       ` Martin Langhoff
@ 2009-04-22 10:18         ` Jon Seymour
  0 siblings, 0 replies; 8+ messages in thread
From: Jon Seymour @ 2009-04-22 10:18 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Git Mailing List

Martin,

Thanks for the info about hard-linked trees...they may well do exactly
what I need - thank you!

jon.

* Re: A system administration use case for git
  2009-04-22  8:33 A system administration use case for git Jon Seymour
  2009-04-22  8:48 ` Martin Langhoff
@ 2009-04-23  9:55 ` Uwe Kleine-König
  2009-04-23 10:38   ` Johannes Sixt
  1 sibling, 1 reply; 8+ messages in thread
From: Uwe Kleine-König @ 2009-04-23  9:55 UTC (permalink / raw)
  To: Jon Seymour; +Cc: Git Mailing List

Hello Jon,

On Wed, Apr 22, 2009 at 06:33:12PM +1000, Jon Seymour wrote:
> * calculate the git hashes of the tree (without making copies of the
> files in the tree)
You can hand-craft a tree object:

	for f in $filelist; do
		# entry header: mode, space, file name, NUL (100644 = regular file)
		printf '100644 %s\000' "$f"
		# then the 20-byte binary SHA-1 (not the 40-char hex string)
		git hash-object "$f" | perl -ne 'chomp; print pack("H*", $_)'
	done |
		git hash-object -t tree --stdin


You get some extra points for directories that include subdirs :-)

There is a catch, though; I think it has to do with the order of the
files in the tree object. Here is a real-life example from the Linux
kernel (I chose the usr subdirectory because it's small and doesn't
contain subdirs):

	ukleinek@cepheus:~/gsrc/linux-2.6/usr$ for f in .gitignore Kconfig Makefile gen_init_cpio.c initramfs_data.S initramfs_data.bz2.S initramfs_data.gz.S initramfs_data.lzma.S; do printf "100644 %s\x00" $f; git hash-object $f | perl -n -e 'chomp; for $c (split(/(.{2})/)) { printf("%c", hex($c)) if $c }'; done | git hash-object -t tree --stdin
	64f2ca854bc19fd29479b198be11beba56a26b1e

	ukleinek@cepheus:~/gsrc/linux-2.6/usr$ git rev-parse HEAD:usr
	64f2ca854bc19fd29479b198be11beba56a26b1e

There is a practical problem though:  The filelist has to be sorted in a
way that is not provided by ls, so:

	ukleinek@cepheus:~/gsrc/linux-2.6/usr$ for f in $(ls -A); do printf "100644 %s\x00" $f; git hash-object $f | perl -n -e 'chomp; for $c (split(/(.{2})/)) { printf("%c", hex($c)) if $c }'; done | git hash-object -t tree -w --stdin
	a0a6efb3f1de956badc7607c7d372cc325a18846

	ukleinek@cepheus:~/gsrc/linux-2.6/usr$ git ls-tree a0a6efb3f1de956badc7607c7d372cc325a18846 | wc -l
	0

doesn't work :-/

This would make a nice plumbing:

	git hash-tree $directory
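The proposed `git hash-tree $directory` can be approximated with existing plumbing. `git mktree` reads `ls-tree`-style lines (`<mode> SP <type> SP <id> TAB <name>`), normalizes them into tree order itself (which sidesteps the `ls` ordering problem above), and with `--missing` accepts blob ids that are not in the object database, so `git hash-object` can run without `-w` and no file contents are copied. A minimal sketch, assuming it is run from inside some git repository (mktree stores the small tree objects there) and ignoring symlinks, submodules and names containing tabs or newlines; `hash_tree` is a hypothetical name. Empty directories yield the empty tree, whereas git itself would not record them.

```shell
#!/bin/sh
# Hypothetical "git hash-tree": print the tree id git would record for
# directory $1. Must run inside a git repository, because git mktree
# stores the (small) tree objects there; file contents are not copied.
hash_tree() {
	for p in "$1"/* "$1"/.[!.]* "$1"/..?*; do
		[ -e "$p" ] || continue          # skip globs that matched nothing
		name=${p##*/}
		if [ -d "$p" ]; then             # subdirectory: recurse
			printf '040000 tree %s\t%s\n' "$(hash_tree "$p")" "$name"
		elif [ -x "$p" ]; then           # executable file
			printf '100755 blob %s\t%s\n' "$(git hash-object -- "$p")" "$name"
		else                             # regular file
			printf '100644 blob %s\t%s\n' "$(git hash-object -- "$p")" "$name"
		fi
	done | git mktree --missing        # mktree sorts entries into tree order
}
```

The result can be checked against git itself: after `git add <dir>`, `git write-tree --prefix=<dir>/` should print the same id.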

Best regards
Uwe

-- 
Pengutronix e.K.                              | Uwe Kleine-König            |
Industrial Linux Solutions                    | http://www.pengutronix.de/  |

* Re: A system administration use case for git
  2009-04-23  9:55 ` Uwe Kleine-König
@ 2009-04-23 10:38   ` Johannes Sixt
  2009-04-23 11:39     ` Uwe Kleine-König
  0 siblings, 1 reply; 8+ messages in thread
From: Johannes Sixt @ 2009-04-23 10:38 UTC (permalink / raw)
  To: Uwe Kleine-König; +Cc: Jon Seymour, Git Mailing List

Uwe Kleine-König schrieb:
> There is a practical problem though:  The filelist has to be sorted in a
> way that is not provided by ls, so:
> 
> 	ukleinek@cepheus:~/gsrc/linux-2.6/usr$ for f in $(ls -A); do printf "100644 %s\x00" $f; git hash-object $f | perl -n -e 'chomp; for $c (split(/(.{2})/)) { printf("%c", hex($c)) if $c }'; done | git hash-object -t tree -w --stdin
> 	a0a6efb3f1de956badc7607c7d372cc325a18846

Does ... $(LANG=C ls -A) ... make a difference for you?

But note that this is still not the whole story because, IIUC, in tree
objects directories are sorted as if they had a slash appended; i.e.
directory "foo" is sorted *after* file "foo.c", but *before* file "foo0".
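Johannes' ordering rule can be checked with a small sketch: compare names byte-wise in the C locale, with a `/` appended to directory names for the comparison only. It uses `LC_ALL=C` rather than `LANG=C` because `LC_ALL` and `LC_COLLATE` take precedence over `LANG`, so `LANG=C` alone has no effect if either of them is set. The function name is hypothetical, and names containing newlines are not handled.

```shell
#!/bin/sh
# Hypothetical helper: list the entries of directory $1 in the order git
# uses inside a tree object. Names are compared byte-wise (C locale), with
# "/" appended to directory names for the comparison only.
git_tree_order() {
	for p in "$1"/* "$1"/.[!.]* "$1"/..?*; do
		[ -e "$p" ] || continue          # skip globs that matched nothing
		name=${p##*/}
		if [ -d "$p" ]; then
			printf '%s/\n' "$name"       # sort key for a directory
		else
			printf '%s\n' "$name"
		fi
	done | LC_ALL=C sort | sed 's,/$,,'  # sort, then drop the helper "/"
}
```

In a directory containing a subdirectory `foo` and files `foo.c` and `foo0`, this prints `foo.c`, `foo`, `foo0`, in that order, whereas a plain byte-wise sort of the bare names would put `foo` first.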

-- Hannes

* Re: A system administration use case for git
  2009-04-23 10:38   ` Johannes Sixt
@ 2009-04-23 11:39     ` Uwe Kleine-König
  0 siblings, 0 replies; 8+ messages in thread
From: Uwe Kleine-König @ 2009-04-23 11:39 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Jon Seymour, Git Mailing List

Hi Hannes,

On Thu, Apr 23, 2009 at 12:38:09PM +0200, Johannes Sixt wrote:
> Uwe Kleine-König schrieb:
> > There is a practical problem though:  The filelist has to be sorted in a
> > way that is not provided by ls, so:
> > 
> > 	ukleinek@cepheus:~/gsrc/linux-2.6/usr$ for f in $(ls -A); do printf "100644 %s\x00" $f; git hash-object $f | perl -n -e 'chomp; for $c (split(/(.{2})/)) { printf("%c", hex($c)) if $c }'; done | git hash-object -t tree -w --stdin
> > 	a0a6efb3f1de956badc7607c7d372cc325a18846
> 
> Does ... $(LANG=C ls -A) ... make a difference for you?
Oh, until now I thought that C and en_US.UTF-8 use the same sorting. So
yes, with LANG=C it gets it right.

Best regards and thanks
Uwe
