From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: Neal Kreitzinger <nkreitzinger@gmail.com>,
	Bo Chen <chen@chenirvine.org>,
	Sergio <sergio.callegari@gmail.com>,
	git@vger.kernel.org
Subject: Re: GSoC - Some questions on the idea of
Date: Mon, 02 Apr 2012 15:19:35 -0700
Message-ID: <7vvclhdbew.fsf@alter.siamese.dyndns.org>
In-Reply-To: <20120402214049.GB28926@sigill.intra.peff.net> (Jeff King's message of "Mon, 2 Apr 2012 17:40:49 -0400")

Jeff King <peff@peff.net> writes:

>   1. You really have 100G of data in the current version that doesn't
>      compress well (e.g., you are storing your music collection). You
>      can't afford to store two copies on your laptop (because you have a
>      fancy SSD, and 100G is expensive again).  You need the working tree
>      version, but it's OK to stream the repo version of a blob from the
>      network when you actually need it (mostly "checkout", assuming you
>      have marked the file as "-diff").

This feels like a good candidate for an independent project that lets
you FUSE-mount a remote repository to give you the illusion that you
have a checkout of a specific version.  Such a remote FUSE server would
be an application built using Git, but I do not think we have any
business on the client end of such a setup.

So I'll write it off as a "non-Git" issue for now.

The other parts of your message are much more interesting.

> Right. This is the same concept, except over the network. So people's
> working repositories are on their own workstations instead of a central
> server. You could even do it today by network-mounting a filesystem and
> pointing your alternates file at it. However, I think it's worth making
> git aware that the objects are on the network for a few reasons:
>
>   1. Git can be more careful about how it handles the objects, including
>      when to fetch, when to stream, and when to cache. For example,
>      you'd want to fetch the manifest of objects and cache it in your
>      local repository, because you want fast lookups of "do I have this
>      object".
>
>   2. Providing remote filesystems on an Internet scale is a management
>      pain (and it's a pain for the user, too). My thought was that this
>      would be implemented on top of http (the connection setup cost is
>      negligible, since these objects would generally be large).
>
>   3. Usually alternate repositories are full repositories that meet the
>      connectivity requirements (so you could run "git fsck" in them).
>      But this is explicitly about taking just a few disconnected large
>      blobs out of the repository and putting them elsewhere. So it needs
>      a new set of tools for managing the upstream repository.

Or you can split the really large write-only blobs out of SCM control.
Every time you introduce a new blob, throw it verbatim in an append-only
directory on a networked filesystem under some unique ID as its filename,
and maintain a symlink into that networked filesystem under SCM control.

I think git-annex already does something like that...
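
To make the symlink scheme above concrete, here is a minimal sketch in
Python. It is not how git-annex or Git implements anything; the function
name store_large_blob, the choice of SHA-256 as the "unique ID", and the
directory layout are all assumptions made for illustration only.

```python
import hashlib
import os
import shutil
from pathlib import Path


def store_large_blob(src: Path, store_dir: Path, link: Path) -> Path:
    """Copy src into an append-only store and leave a symlink at link.

    Hypothetical helper: store_dir stands in for the directory on the
    networked filesystem, and link is the path kept under SCM control.
    """
    # Assumption: the content's SHA-256 serves as the unique ID. The
    # original message only asks for "some unique ID".
    digest = hashlib.sha256()
    with src.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    target = store_dir / digest.hexdigest()

    # Append-only: an existing entry is never overwritten.
    if not target.exists():
        store_dir.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, target)

    # Replace whatever sits at link (possibly src itself) with a symlink
    # into the store; only this symlink would be committed.
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink() or link.exists():
        link.unlink()
    os.symlink(target, link)
    return target
```

Calling store_large_blob(path, store, path) replaces the file in the
working tree with a symlink, so the SCM records only a short link target
rather than the blob itself. Anyone checking out the tree then needs the
networked filesystem mounted at the same location for the link to resolve.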
