All of lore.kernel.org
* Adding git hooks
@ 2014-04-26  9:34 Suvorov Ivan
  2014-04-26 17:24 ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Suvorov Ivan @ 2014-04-26  9:34 UTC
  To: git

Hello.
I want to extend git's functionality so that a user repository can be split into two parts: one part stored as usual, under git version control, and the second part stored in another location, such as an FTP server.

The goal is to separate a repository's binary data from its source code, so that the binary data can be stored separately.

For example, GitHub currently does not allow uploading files larger than 100 MB, but users would still like to keep some large files under version control.

The proposed split of the repository would make that possible. I expect this functionality to be developed mostly separately from git itself, using git's hook mechanism. However, git currently lacks the hooks needed to implement it. For example, it would be nice to have a hook for the git status command, so that its output could be supplemented with a list of binary files that are not tracked in the main repository but are versioned elsewhere, for example on the FTP server.

Would the Git community consider extending the hook mechanism to support this?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Adding git hooks
  2014-04-26  9:34 Adding git hooks Suvorov Ivan
@ 2014-04-26 17:24 ` Junio C Hamano
  2014-04-26 17:50   ` Jeff King
  0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2014-04-26 17:24 UTC
  To: Suvorov Ivan; +Cc: git

Suvorov Ivan <sv_91@inbox.ru> writes:

> I want to extend git's functionality so that a user repository can be
> split into two parts: one part stored as usual, under git version
> control, and the second part stored in another location, such as an
> FTP server.

Sounds like you are looking for git-annex, perhaps?

* Re: Adding git hooks
  2014-04-26 17:24 ` Junio C Hamano
@ 2014-04-26 17:50   ` Jeff King
  2014-04-28 16:43     ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff King @ 2014-04-26 17:50 UTC
  To: Junio C Hamano; +Cc: Suvorov Ivan, git

On Sat, Apr 26, 2014 at 10:24:50AM -0700, Junio C Hamano wrote:

> Suvorov Ivan <sv_91@inbox.ru> writes:
> 
> > I want to extend git's functionality so that a user repository can be
> > split into two parts: one part stored as usual, under git version
> > control, and the second part stored in another location, such as an
> > FTP server.
> 
> Sounds like you are looking for git-annex, perhaps?

I agree that is the right approach, but git-annex and git-media work
_above_ the object layer, and taint the history by storing symlinks in
git instead of the real sha1s. I'd love to see a solution that does the
same thing, but lives at the pack/loose object layer. Basically:

  1. Teach sha1_file.c to look for missing objects by hitting an
     external script, like:

        git config odb.command "curl https://example.com/%s"

     and place them in an alternates-like separate object database.

  2. Teach the git protocol a new extension to say "don't bother sending
     blobs over size X". You'd have to coordinate that X with the source
     from your odb.command.
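
A minimal Python sketch of step 1, for illustration only: the `ObjectStore` class, the cache directory, and the way the command template is expanded are assumptions (`odb.command` is the configuration key proposed above, not an existing git option). A missing object is looked up locally first, then in an alternates-like cache, and only then fetched by running the command with `%s` replaced by the hex object id. Because the external source is not trusted, the fetched bytes are checked against the requested id using git's blob hashing (SHA-1 over `blob <size>`, a NUL byte, then the content). Objects are kept uncompressed here for simplicity, unlike git's zlib-compressed loose objects.

```python
import hashlib
import shlex
import subprocess
from pathlib import Path


def blob_id(data: bytes) -> str:
    """git's SHA-1 object id for a blob: sha1(b"blob <size>\\0" + content)."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


class ObjectStore:
    """Hypothetical object lookup with an external fallback (sketch only)."""

    def __init__(self, local: dict[str, bytes], cache_dir: Path, command: str):
        self.local = local          # stands in for packed + loose objects
        self.cache_dir = cache_dir  # alternates-like store for fetched objects
        self.command = command      # e.g. "curl https://example.com/%s"

    def read_object(self, oid: str) -> bytes:
        if oid in self.local:
            return self.local[oid]
        cached = self.cache_dir / oid
        if cached.exists():
            return cached.read_bytes()
        # Expand %s in each argument; the command is run without a shell.
        argv = [arg.replace("%s", oid) for arg in shlex.split(self.command)]
        data = subprocess.run(argv, capture_output=True, check=True).stdout
        if blob_id(data) != oid:
            raise ValueError(f"external object {oid} failed verification")
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        cached.write_bytes(data)
        return data
```

Once an object has been fetched it is served from the cache, so the external command runs at most once per object.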

You'd probably want to wrap up the odb.command in a more fancy helper.
For example, for performance, we'd probably want to be able to query it
for "which objects do you have", as well as "fetch this object". And it
would be nice if it could auto-query the "X" for step 2, and manage
pruning local objects (e.g., when they become deep in history).
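
One way such a helper could look, sketched in Python under assumptions: the `OdbHelper` protocol and its `have`, `fetch` and `size_threshold` methods are hypothetical names for the three queries described above, and `InMemoryHelper` stands in for a real remote. `prune_cache` illustrates one possible pruning policy, keeping only cached external objects referenced by the newest `keep_depth` commits.

```python
from typing import Iterable, Protocol


class OdbHelper(Protocol):
    """Hypothetical helper interface: one method per query."""

    def have(self) -> set[str]: ...          # "which objects do you have"
    def fetch(self, oid: str) -> bytes: ...  # "fetch this object"
    def size_threshold(self) -> int: ...     # the "X" used in step 2


class InMemoryHelper:
    """Toy helper backed by a dict; a real one would talk to a server."""

    def __init__(self, objects: dict[str, bytes], threshold: int):
        self.objects = objects
        self.threshold = threshold

    def have(self) -> set[str]:
        return set(self.objects)

    def fetch(self, oid: str) -> bytes:
        return self.objects[oid]

    def size_threshold(self) -> int:
        return self.threshold


def prune_cache(cache: dict[str, bytes], history: Iterable[set[str]],
                keep_depth: int) -> None:
    """Drop cached objects not referenced by the newest keep_depth commits.

    `history` yields, newest first, the set of external blob ids that each
    commit references.
    """
    recent: set[str] = set()
    for depth, blobs in enumerate(history):
        if depth >= keep_depth:
            break
        recent |= blobs
    for oid in list(cache):
        if oid not in recent:
            del cache[oid]
```

A single `have()` call lets the client classify objects as external without one round trip per object, which is the performance point made above.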

We'd probably also want to teach a few places in git to treat external
objects specially. For example, they should probably be auto-treated as
binary, so that a "log -p" does not try to fetch all of them. And
likewise, things like "log -S" should probably ignore them by default.
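
A sketch of that special treatment, assuming a hypothetical `is_external` predicate (for example, "listed by the helper and not present locally"): the diff step reports such blobs as binary without fetching them, and a `log -S`-style check, which looks for a change in the number of occurrences of a string, skips them. The function names and the exact message text are illustrative, not git's code.

```python
import difflib
from typing import Callable

IsExternal = Callable[[str], bool]
Reader = Callable[[str], bytes]


def diff_blob(path: str, old_oid: str, new_oid: str,
              is_external: IsExternal, read: Reader) -> str:
    """Unified diff of two blobs; external blobs are reported, never fetched."""
    if is_external(old_oid) or is_external(new_oid):
        return f"Binary files a/{path} and b/{path} differ\n"
    old = read(old_oid).decode().splitlines(keepends=True)
    new = read(new_oid).decode().splitlines(keepends=True)
    return "".join(difflib.unified_diff(old, new, f"a/{path}", f"b/{path}"))


def pickaxe_changed(needle: bytes, old_oid: str, new_oid: str,
                    is_external: IsExternal, read: Reader) -> bool:
    """True if the occurrence count of needle changed (as `log -S` checks);
    pairs involving external blobs are skipped by default."""
    if is_external(old_oid) or is_external(new_oid):
        return False
    return read(old_oid).count(needle) != read(new_oid).count(needle)
```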

I have a messy sketch of step 1 that I did quite a while ago, but
haven't proceeded further on it.

-Peff

* Re: Adding git hooks
  2014-04-26 17:50   ` Jeff King
@ 2014-04-28 16:43     ` Junio C Hamano
  2014-04-28 19:11       ` Jeff King
  0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2014-04-28 16:43 UTC
  To: Jeff King; +Cc: Suvorov Ivan, git

Jeff King <peff@peff.net> writes:

> On Sat, Apr 26, 2014 at 10:24:50AM -0700, Junio C Hamano wrote:
>
>> Suvorov Ivan <sv_91@inbox.ru> writes:
>> 
>> > I want to extend git's functionality so that a user repository can be
>> > split into two parts: one part stored as usual, under git version
>> > control, and the second part stored in another location, such as an
>> > FTP server.
>> 
>> Sounds like you are looking for git-annex, perhaps?
>
> I agree that is the right approach, but git-annex and git-media work
> _above_ the object layer, and taint the history by storing symlinks in
> git instead of the real sha1s. I'd love to see a solution that does the
> same thing, but lives at the pack/loose object layer. Basically:
>
>   1. Teach sha1_file.c to look for missing objects by hitting an
>      external script, like:
>
>         git config odb.command "curl https://example.com/%s"
>
>      and place them in an alternates-like separate object database.
>
>   2. Teach the git protocol a new extension to say "don't bother sending
>      blobs over size X". You'd have to coordinate that X with the source
>      from your odb.command.

Yes, I'd love to see something along that line in the longer term,
showing all the objects as just regular objects under the hood, with
implementation details hidden in the object layer (just like there
is no distinction between packed and loose objects from the point of
view of read_sha1_file() users), as a real solution to address
issues in larger trees.

Also see http://thread.gmane.org/gmane.comp.version-control.git/241940
where Shawn had an interesting experiment.

> You'd probably want to wrap up the odb.command in a more fancy helper.
> For example, for performance, we'd probably want to be able to query it
> for "which objects do you have", as well as "fetch this object". And it
> would be nice if it could auto-query the "X" for step 2, and manage
> pruning local objects (e.g., when they become deep in history).
>
> We'd probably also want to teach a few places in git to treat external
> objects specially. For example, they should probably be auto-treated as
> binary, so that a "log -p" does not try to fetch all of them. And
> likewise, things like "log -S" should probably ignore them by default.
>
> I have a messy sketch of step 1 that I did quite a while ago, but
> haven't proceeded further on it.

* Re: Adding git hooks
  2014-04-28 16:43     ` Junio C Hamano
@ 2014-04-28 19:11       ` Jeff King
  0 siblings, 0 replies; 5+ messages in thread
From: Jeff King @ 2014-04-28 19:11 UTC
  To: Junio C Hamano; +Cc: Suvorov Ivan, git

On Mon, Apr 28, 2014 at 09:43:10AM -0700, Junio C Hamano wrote:

> Yes, I'd love to see something along that line in the longer term,
> showing all the objects as just regular objects under the hood, with
> implementation details hidden in the object layer (just like there
> is no distinction between packed and loose objects from the point of
> view of read_sha1_file() users), as a real solution to address
> issues in larger trees.
> 
> Also see http://thread.gmane.org/gmane.comp.version-control.git/241940
> where Shawn had an interesting experiment.

Yeah, I think it's pretty clear that a naive high-latency object store
is unusably slow. You mentioned in that thread trying to do pre-fetching
based on commits/trees, and I recall that Shawn's Cassandra experiments
did that (and maybe the BigTable-backed Google Code does, too?).

There's also a question of deltas. You don't want to get trees or text
blobs individually without deltas, because your total size ends up way
bigger.

But I think for large object support, we can side-step the issue. The
objects will all be blobs (so they cannot refer to anything else), they
will typically not delta well, and the connection setup and latency will
be dwarfed by actual transfer time. My plan was to have all clones fetch
all commits and trees (and small blobs, too), and then download and
cache the large blobs as-needed.
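
The split described here (commits, trees and small blobs fetched up front, large blobs on demand) amounts to a size filter on what a clone transfers, i.e. the "don't bother sending blobs over size X" extension from step 2. A minimal sketch, assuming objects are described by a hypothetical `Obj` record and `threshold` plays the role of "X"; `plan_clone` and `LazyBlobCache` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Obj:
    oid: str
    kind: str  # "commit", "tree", "blob" or "tag"
    size: int


def plan_clone(objects: list[Obj], threshold: int) -> tuple[list[str], list[str]]:
    """Split objects into (send now, defer) lists."""
    eager: list[str] = []
    deferred: list[str] = []
    for obj in objects:
        if obj.kind == "blob" and obj.size > threshold:
            deferred.append(obj.oid)  # large blob: fetched on demand later
        else:
            eager.append(obj.oid)     # commits, trees, tags, small blobs
    return eager, deferred


@dataclass
class LazyBlobCache:
    """Fetch a deferred blob the first time it is needed, then keep it."""
    fetch: Callable[[str], bytes]
    store: dict[str, bytes] = field(default_factory=dict)

    def get(self, oid: str) -> bytes:
        if oid not in self.store:
            self.store[oid] = self.fetch(oid)
        return self.store[oid]
```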

That doesn't help with repositories where the actual commit history or
tree size is a problem. But we already have shallow clones to help with
the former. And for the latter, I think we would want a narrow clone
that behaves differently than what I described above. You'd probably
want a specific "widen" operation that would fetch all of the objects
for the newly-widened part of the tree in one go (including deltas), and
you wouldn't want it to happen on an as-needed basis.
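
A sketch of the "widen" idea using a toy tree model, under assumptions: `Tree` is a hypothetical stand-in for git tree objects (entries map names to a sub-tree or a blob id), and `widen_batch` collects every tree and blob id under the newly widened path so they can be requested in one batch rather than one object at a time. Delta negotiation for that batch is left to the transfer layer and not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    oid: str
    # Each entry is either a sub-tree or a blob id (str).
    entries: dict[str, "Tree | str"] = field(default_factory=dict)


def subtree(root: Tree, path: str) -> Tree:
    """Walk a slash-separated path from the root tree."""
    node = root
    for part in filter(None, path.split("/")):
        child = node.entries[part]
        if not isinstance(child, Tree):
            raise ValueError(f"{path} is not a directory")
        node = child
    return node


def widen_batch(root: Tree, path: str) -> list[str]:
    """All object ids under path (trees and blobs), for one batched request."""
    batch: list[str] = []
    stack = [subtree(root, path)]
    while stack:
        tree = stack.pop()
        batch.append(tree.oid)
        for entry in tree.entries.values():
            if isinstance(entry, Tree):
                stack.append(entry)
            else:
                batch.append(entry)
    return batch
```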

-Peff

end of thread, other threads:[~2014-04-28 19:12 UTC | newest]

Thread overview: 5+ messages
2014-04-26  9:34 Adding git hooks Suvorov Ivan
2014-04-26 17:24 ` Junio C Hamano
2014-04-26 17:50   ` Jeff King
2014-04-28 16:43     ` Junio C Hamano
2014-04-28 19:11       ` Jeff King