* Using git for code deployment on webservers?
@ 2009-06-15 23:11 Ingo Oeser
  2009-06-16  7:13 ` Allan Wind
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Ingo Oeser @ 2009-06-15 23:11 UTC (permalink / raw)
  To: git; +Cc: Ingo Oeser

[please CC me, as I'm not subscribed]

Hi there,

I am trying to use git in a rather unusual way.

I have a bunch of servers (hundreds), which get regular pulls of web developer code.
The code consists of images, flash files, scripting language files, you name it.
An exported repo (just the files, no SCM metadata) contains up to 4GB of files.

Now I want to distribute the changes the developers made in a tree-like structure:

main server --> slave_1 --> webserver_0815
            |-> slave_2 --> webserver_2342
                        |-> webserver_4711

But with the following constraints:
- Store as little as possible on the webservers.
  One selected revision/tag is enough.
- Transfer as little data as possible.
  Cancel out addition and deletion on the fly.
- Nearly atomic update of file tree (easy to implement outside git)

Nice to have:
- Instead of copying the files to their proper names, 
  hardlink them to their git objects.

At the moment I always get more data than I need and have to store
the repository AND the checked out data.

I couldn't find a way so far to get around this. Is this possible? 
Any ideas are welcome.

Many Thanks in Advance!

Best Regards

Ingo Oeser


* Re: Using git for code deployment on webservers?
  2009-06-15 23:11 Using git for code deployment on webservers? Ingo Oeser
@ 2009-06-16  7:13 ` Allan Wind
  2009-06-17 17:42   ` Ingo Oeser
  2009-06-16  8:01 ` Thomas Koch
  2009-06-16 17:49 ` Daniel Barkalow
  2 siblings, 1 reply; 10+ messages in thread
From: Allan Wind @ 2009-06-16  7:13 UTC (permalink / raw)
  To: git; +Cc: ioe-git

On 2009-06-16T01:11:47, Ingo Oeser wrote:
> - Transfer as little data as possible.
>   Cancel out addition and deletion on the fly.

I use `git diff` from a post-receive hook to distribute changes 
to my web server.  A diff carries the previous content of every 
file you delete, and in my case those were large MPEG files, which 
somewhat defeated the purpose.

If you do not mind having a full repository on the web servers, 
then pushing changes might work better.  This appears to be what 
you are doing now though.

If I had to scale this I would probably build a master image 
(either locally or remotely) and use rsync to distribute the 
content instead of git.

> - Nearly atomic update of file tree (easy to implement outside git)

stow can be handy for this.
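
A different route to the same goal (not stow) is to keep each release
in its own directory and switch a symlink by renaming a new one over
it. This is only a minimal sketch: the `releases/` and `current` names
and the function name are hypothetical, and `mv -T` assumes GNU
coreutils.

```shell
#!/bin/sh
# Sketch: switch the live tree by atomically replacing a symlink.
# Assumptions (not from the thread): GNU coreutils (mv -T) and a
# hypothetical layout of $root/releases/<name> plus $root/current.
activate_release() {    # usage: activate_release <root> <name>
    root=$1
    name=$2
    [ -d "$root/releases/$name" ] || return 1
    # Build the new symlink under a temporary name first ...
    ln -s "releases/$name" "$root/current.tmp.$$" || return 1
    # ... then rename it over the old one. The rename replaces the
    # destination in one step, so readers see the old or the new tree.
    mv -T "$root/current.tmp.$$" "$root/current"
}
```

The web server's document root would then point at `current`, and a
rollback is just another call with the previous release name.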


/Allan
-- 
Allan Wind
Life Integrity, LLC
http://lifeintegrity.com


* Re: Using git for code deployment on webservers?
  2009-06-15 23:11 Using git for code deployment on webservers? Ingo Oeser
  2009-06-16  7:13 ` Allan Wind
@ 2009-06-16  8:01 ` Thomas Koch
  2009-06-17 17:27   ` Ingo Oeser
  2009-06-16 17:49 ` Daniel Barkalow
  2 siblings, 1 reply; 10+ messages in thread
From: Thomas Koch @ 2009-06-16  8:01 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: git

Would it help to share a read-only git object store among all webservers via 
NFS?

Best regards, Thomas Koch

> [please CC me, as I'm not subscribed]
>
> Hi there,
>
> I am trying to use git in a rather unusual way.
>
> I have a bunch of servers (hundreds), which get regular pulls of web
> developer code. The code consists of images, flash files, scripting
> language files, you name it. An exported repo (just the files, no SCM
> metadata) contains up to 4GB of files.
>
> Now I want to distribute the changes the developers made in a tree-like
> structure:
>
> main server --> slave_1 --> webserver_0815
>             |-> slave_2 --> webserver_2342
>                         |-> webserver_4711
>
> But with the following constraints:
> - Store as little as possible on the webservers.
>   One selected revision/tag is enough.
> - Transfer as little data as possible.
>   Cancel out addition and deletion on the fly.
> - Nearly atomic update of file tree (easy to implement outside git)
>
> Nice to have:
> - Instead of copying the files to their proper names,
>   hardlink them to their git objects.
>
> At the moment I always get more data than I need and have to store
> the repository AND the checked out data.
>
> I couldn't find a way so far to get around this. Is this possible?
> Any ideas are welcome.
>
> Many Thanks in Advance!
>
> Best Regards
>
> Ingo Oeser
Thomas Koch, http://www.koch.ro


* Re: Using git for code deployment on webservers?
  2009-06-15 23:11 Using git for code deployment on webservers? Ingo Oeser
  2009-06-16  7:13 ` Allan Wind
  2009-06-16  8:01 ` Thomas Koch
@ 2009-06-16 17:49 ` Daniel Barkalow
  2009-06-17 17:23   ` Ingo Oeser
  2 siblings, 1 reply; 10+ messages in thread
From: Daniel Barkalow @ 2009-06-16 17:49 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: git

On Tue, 16 Jun 2009, Ingo Oeser wrote:

> [please CC me, as I'm not subscribed]
> 
> Hi there,
> 
> I am trying to use git in a rather unusual way.
> 
> I have a bunch of servers (hundreds), which get regular pulls of web developer code.
> The code consists of images, flash files, scripting language files, you name it.
> An exported repo (just the files, no SCM metadata) contains up to 4GB of files.
> 
> Now I want to distribute the changes the developers made in a tree-like structure:
> 
> main server --> slave_1 --> webserver_0815
>             |-> slave_2 --> webserver_2342
>                         |-> webserver_4711
> 
> But with the following constraints:
> - Store as little as possible on the webservers.
>   One selected revision/tag is enough.
> - Transfer as little data as possible.
>   Cancel out addition and deletion on the fly.
> - Nearly atomic update of file tree (easy to implement outside git)
> 
> Nice to have:
> - Instead of copying the files to their proper names, 
>   hardlink them to their git objects.
> 
> At the moment I always get more data than I need and have to store
> the repository AND the checked out data.

You should be able to have the slave repositories store tags for tree 
objects (instead of commit objects), and have the webservers fetch those. 
You'll still have the object database, but it will only contain stuff 
that's been deployed to that webserver, not intermediate versions or 
historical versions. You'll still have to store both the repo and the 
checked out data (but git stores the content delta-compressed against each 
other in one big file, normally, so there really aren't files to hard link 
to).
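
The tree-tag idea could be sketched roughly as follows. Nothing here
is a tested deployment recipe: the tag name, the paths and the three
function names are made up for illustration, and the sketch assumes a
git new enough to have `git -C`.

```shell
#!/bin/sh
# Sketch of deploying via a tag that points at a tree, not a commit.
# Tag names, paths and function names are hypothetical.

# On the slave: tag the tree of the revision that should be deployed.
tag_tree() {            # usage: tag_tree <repo> <tagname> <revision>
    git -C "$1" tag -f "$2" "$3^{tree}"
}

# On the webserver: fetch only that tag into a bare repository.
# Only objects reachable from the tree arrive, so no history is stored.
fetch_tree() {          # usage: fetch_tree <bare-repo> <url> <tagname>
    [ -d "$1" ] || git init -q --bare "$1" || return 1
    git -C "$1" fetch -q --no-tags "$2" "+refs/tags/$3:refs/tags/$3"
}

# On the webserver: unpack the tagged tree into a plain directory.
export_tree() {         # usage: export_tree <bare-repo> <tagname> <dir>
    mkdir -p "$3" && git -C "$1" archive "$2" | tar -xf - -C "$3"
}
```

How much data a later fetch of a different tree tag transfers is worth
measuring rather than assuming, since fetch negotiation is, as far as
I know, built around commits rather than trees.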

Of course, the other possibility is to check out versions on the slaves, 
and rsync that to the webservers, which is probably the optimal method if 
you're not in a situation where you benefit from anything git does in 
transit.
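
A minimal sketch of the rsync variant, assuming the slave already has
a checked-out tree in a directory. The release names, directory layout
and function name are hypothetical; `-a`, `--delete` and `--link-dest`
are standard rsync options.

```shell
#!/bin/sh
# Sketch: mirror a checked-out tree into a new release directory and
# hard-link files that are unchanged from the previous release.
# <dest-root> may be local or remote (host:/path) and must exist.
sync_release() {   # usage: sync_release <src> <dest-root> <new> [<prev>]
    src=$1
    dest=$2
    new=$3
    prev=${4:-}
    if [ -n "$prev" ]; then
        # A relative --link-dest is resolved against the destination
        # directory, so ../<prev> names the sibling release.
        rsync -a --delete --link-dest="../$prev" "$src"/ "$dest/$new/"
    else
        rsync -a --delete "$src"/ "$dest/$new/"
    fi
}
```

This shares only files that keep the same path and attributes between
two releases; it does not de-duplicate copies within one tree.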

	-Daniel
*This .sig left intentionally blank*


* Re: Using git for code deployment on webservers?
  2009-06-16 17:49 ` Daniel Barkalow
@ 2009-06-17 17:23   ` Ingo Oeser
  2009-06-17 19:26     ` Daniel Barkalow
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Oeser @ 2009-06-17 17:23 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Ingo Oeser, git

Hi Daniel,

On Tuesday 16 June 2009, Daniel Barkalow wrote:
> You should be able to have the slave repositories store tags for tree 
> objects (instead of commit objects), and have the webservers fetch those. 
> You'll still have the object database, but it will only contain stuff 
> that's been deployed to that webserver, not intermediate versions or 
> historical versions.

Ah, that sounds like a great solution. I'll try that.

> You'll still have to store both the repo and the checked out data 
> (but git stores the content delta-compressed against each 
> other in one big file, normally, so there really aren't files to hard link 
> to).

OK. I was assuming that the core of git is basically a 
content-addressable file system. But that seems to be history :-)

> Of course, the other possibility is to check out versions on the slaves, 
> and rsync that to the webservers, which is probably the optimal method if 
> you're not in a situation where you benefit from anything git does in 
> transit.

I would benefit from noticing local changes. But plain rsync is what we are trying now.
The problem is that rsync gives us no de-duplication, which git could provide.

Many thanks for your suggestions!


Best Regards 

Ingo Oeser


* Re: Using git for code deployment on webservers?
  2009-06-16  8:01 ` Thomas Koch
@ 2009-06-17 17:27   ` Ingo Oeser
  0 siblings, 0 replies; 10+ messages in thread
From: Ingo Oeser @ 2009-06-17 17:27 UTC (permalink / raw)
  To: thomas; +Cc: git

Hi Thomas,

On Tuesday 16 June 2009, Thomas Koch wrote:
> Would it help to share a read-only git object store among all webservers via 
> NFS?

NFS on hundreds of web servers has severe scaling problems. That is inherent in its
design and is solved by alternative file systems or, soon, pNFS.

We tried such a setup already.


Best Regards

Ingo Oeser


* Re: Using git for code deployment on webservers?
  2009-06-16  7:13 ` Allan Wind
@ 2009-06-17 17:42   ` Ingo Oeser
  0 siblings, 0 replies; 10+ messages in thread
From: Ingo Oeser @ 2009-06-17 17:42 UTC (permalink / raw)
  To: git, ioe-git

Hi Allan,

On Tuesday 16 June 2009, Allan Wind wrote:
> If you do not mind having a full repository on the web servers, 
> then pushing changes might work better.  This appears to be what 
> you are doing now though.

No, at the moment we have built our own version of a content-addressable 
filesystem and are distributing changes to it. We have symlinks to the real file names.

I just thought that git could do something similar with its core, 
before trying to solve a solved problem :-)

> If I had to scale this I would probably build a master image 
> (either locally or remotely) and use rsync to distribute the 
> content instead of git.

We do something similar at the moment. De-duplication is important, because
web people copy lots of image and flash data around when doing things.
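
To get a feel for how much such copying costs inside a git repository,
one can compare the number of paths in a tree with the number of
distinct blobs behind them; identical copies share one blob. A small
sketch, with hypothetical function names:

```shell
#!/bin/sh
# Sketch: compare paths against distinct blobs in one tree.
# `git ls-tree -r` prints "<mode> <type> <object>\t<path>" per entry.
count_paths() {    # usage: count_paths <repo> <tree-ish>
    git -C "$1" ls-tree -r "$2" | wc -l
}
count_blobs() {    # usage: count_blobs <repo> <tree-ish>
    git -C "$1" ls-tree -r "$2" | awk '{ print $3 }' | sort -u | wc -l
}
```

If `count_paths` is much larger than `count_blobs`, the tree holds many
byte-identical copies that the object store keeps only once.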

> > - Nearly atomic update of file tree (easy to implement outside git)
> 
> stow can be handy for this.

Ah! Will have a look.

Many Thanks!


Best Regards

Ingo Oeser


* Re: Using git for code deployment on webservers?
  2009-06-17 17:23   ` Ingo Oeser
@ 2009-06-17 19:26     ` Daniel Barkalow
  2009-06-17 20:26       ` Alex Riesen
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Barkalow @ 2009-06-17 19:26 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: git

On Wed, 17 Jun 2009, Ingo Oeser wrote:

> Hi Daniel,
> 
> On Tuesday 16 June 2009, Daniel Barkalow wrote:
> > You should be able to have the slave repositories store tags for tree 
> > objects (instead of commit objects), and have the webservers fetch those. 
> > You'll still have the object database, but it will only contain stuff 
> > that's been deployed to that webserver, not intermediate versions or 
> > historical versions.
> 
> Ah, that sounds like a great solution. I'll try that.
> 
> > You'll still have to store both the repo and the checked out data 
> > (but git stores the content delta-compressed against each 
> > other in one big file, normally, so there really aren't files to hard link 
> > to).
> 
> OK. I was assuming that the core of git is basically a 
> content-addressable file system. But that seems to be history :-)

It is (based on) a content-addressable file system, but it's not a host 
file system. It's a file system in the sense that you can put octet 
sequences into it and look them up by their names, but you can't mount 
it from the kernel and link to it. It's like a tar file, although it's 
more limited in that it doesn't provide a "list" operation.
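
The put and look-up operations can be tried directly with plumbing
commands. A minimal sketch; the repository path is whatever you pass
in and the function names are invented for the example:

```shell
#!/bin/sh
# Sketch: store an octet sequence and look it up by its object name.
store_blob() {     # usage: store_blob <repo> < data ; prints the name
    git -C "$1" hash-object -w --stdin
}
load_blob() {      # usage: load_blob <repo> <object-name> ; prints data
    git -C "$1" cat-file blob "$2"
}
```

For example, `name=$(printf 'hello\n' | store_blob /path/to/repo)`
followed by `load_blob /path/to/repo "$name"` gives the bytes back, and
storing the same bytes again yields the same name.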

There's no fundamental reason there couldn't be a kernel driver (or, 
more likely, FUSE helper) which could mount it, but that's not the normal 
method.

> > Of course, the other possibility is to check out versions on the slaves, 
> > and rsync that to the webservers, which is probably the optimal method if 
> > you're not in a situation where you benefit from anything git does in 
> > transit.
> 
> I would benefit from noticing local changes. But plain rsync is what we are trying now.
> The problem is that rsync gives us no de-duplication, which git could provide.

In that case, fetching trees is probably the right thing; that should give 
you a point-to-point de-duplication without any history (although you may 
also turn up git bugs, since this isn't how git is normally used).

	-Daniel
*This .sig left intentionally blank*


* Re: Using git for code deployment on webservers?
  2009-06-17 19:26     ` Daniel Barkalow
@ 2009-06-17 20:26       ` Alex Riesen
  2009-06-17 20:33         ` Alex Riesen
  0 siblings, 1 reply; 10+ messages in thread
From: Alex Riesen @ 2009-06-17 20:26 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: Daniel Barkalow, git

2009/6/17 Daniel Barkalow <barkalow@iabervon.org>:
> On Wed, 17 Jun 2009, Ingo Oeser wrote:
>> > Of course, the other possibility is to check out versions on the slaves,
>> > and rsync that to the webservers, which is probably the optimal method if
>> > you're not in a situation where you benefit from anything git does in
>> > transit.
>>
>> I would benefit from noticing local changes. But plain rsync is what we are trying now.
>> The problem is that rsync gives us no de-duplication, which git could provide.
>
> In that case, fetching trees is probably the right thing; that should give
> you a point-to-point de-duplication without any history (although you may
> also turn up git bugs, since this isn't how git is normally used).

Or you can just keep a namespace for each server in the intermediate
repositories, which records the version the server has and the version
it should have. Then you can use git diff-tree to find out which files
have to be transferred. You won't be able to record changes on the servers,
though.


* Re: Using git for code deployment on webservers?
  2009-06-17 20:26       ` Alex Riesen
@ 2009-06-17 20:33         ` Alex Riesen
  0 siblings, 0 replies; 10+ messages in thread
From: Alex Riesen @ 2009-06-17 20:33 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: Daniel Barkalow, git

2009/6/17 Alex Riesen <raa.lkml@gmail.com>:
> Or, you can just keep a namespace for each server in the intermediate

I mean namespace of branches:

  refs/heads/webserver_1/master (current)
  refs/heads/webserver_1/next (to be updated to)

> repositories, which records the version the server has and the version
> it should have. Then you can use git diff-tree to find out which files
> have to be transferred. You won't be able to record changes on the servers,
> though.

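Something like this:

# Copy added/modified paths from a checkout of webserver_1/next.
git diff-tree -r --name-only --diff-filter=AM webserver_1/master webserver_1/next |
while read -r f; do scp "$f" webserver_1:"$f" || break; done
# Remove deleted paths; -n keeps ssh from consuming the path list on stdin.
git diff-tree -r --name-only --diff-filter=D webserver_1/master webserver_1/next |
while read -r f; do ssh -n webserver_1 rm -f "$f" || break; done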
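
The bookkeeping around the branch namespace above could look roughly
like this. It is a sketch only: the three function names are invented,
and the branch layout is the `<server>/master` and `<server>/next`
pair from this message.

```shell
#!/bin/sh
# Sketch: per-server branches recording deployed and target versions.

# Schedule <revision> for <server> by moving its "next" branch.
set_target() {     # usage: set_target <repo> <server> <revision>
    git -C "$1" update-ref "refs/heads/$2/next" "$3^{commit}"
}

# After a successful transfer, record that the server now has "next".
mark_deployed() {  # usage: mark_deployed <repo> <server>
    git -C "$1" update-ref "refs/heads/$2/master" "refs/heads/$2/next"
}

# Exit status 0 only if both branches exist and name the same commit.
is_current() {     # usage: is_current <repo> <server>
    next=$(git -C "$1" rev-parse -q --verify "refs/heads/$2/next") || return 1
    have=$(git -C "$1" rev-parse -q --verify "refs/heads/$2/master") || return 1
    [ "$have" = "$next" ]
}
```

Calling `mark_deployed` only after the copy and delete loops have
finished without error keeps `<server>/master` an honest record of
what is on the machine.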

