* Features from GitSurvey 2010
@ 2011-01-29 10:01 Dmitry S. Kravtsov
  2011-01-29 23:13 ` Jonathan Nieder
  0 siblings, 1 reply; 36+ messages in thread
From: Dmitry S. Kravtsov @ 2011-01-29 10:01 UTC (permalink / raw)
  To: git

Hello,

I want to dedicate my coursework at University to the implementation of
some useful git feature. So I'm interested in a list of the
development status of these features:
https://git.wiki.kernel.org/index.php/GitSurvey2010#17._Which_of_the_following_features_would_you_like_to_see_implemented_in_git.3F

Or I'll be glad to know what features are now 'free' and what are
currently in active development.

Best Regards
-- 
Dmitry S. Kravtsov

* Re: Features from GitSurvey 2010
  2011-01-29 10:01 Features from GitSurvey 2010 Dmitry S. Kravtsov
@ 2011-01-29 23:13 ` Jonathan Nieder
  2011-02-01 13:51   ` Jakub Narebski
  2011-02-01 17:44   ` Matthieu Moy
  0 siblings, 2 replies; 36+ messages in thread
From: Jonathan Nieder @ 2011-01-29 23:13 UTC (permalink / raw)
  To: Dmitry S. Kravtsov; +Cc: git, Jakub Narebski

Hi Dmitry,

Dmitry S. Kravtsov wrote:

> I want to dedicate my coursework at University to implementation of
> some useful git feature. So I'm interested in some kind of list of
> development status of these features
[...]
> Or I'll be glad to know what features are now 'free' and what are
> currently in active development.

Interesting question.  The short answer is that they are all "free".
Generally people seem to be happy to learn of an alternative approach
to what they have been working on.

[For the following pointers, the easiest way to follow up is probably
to search the mailing list archives.]

> better support for big files (large media)

For a conservative approach, you might want to get in touch with Sam
Hocevar, Nicolas Pitre, and Miklos Vajna.  The idea is to stream big
files directly to pack and not waste time trying to compress them.

For an alternative approach, Joey Hess's git-annex might be
interesting.

> resumable clone/fetch (and other remote operations)

Jakub Narebski seems to be interested in this and Nicolas Pitre has
given some good advice about it.  You can get something usable today
by putting up a git bundle for download over HTTP or rsync, so it is
possible that this just involves some UI (porcelain) and documentation
work to become standard practice.
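
A minimal sketch of the bundle workflow mentioned above, written as a Python helper that only builds the command lines and does not run them. The file name `repo.bundle`, the URL, and the function name are hypothetical; `git bundle create`, `wget -c`, and cloning from a bundle file are existing commands. It assumes the downloaded file keeps the name `repo.bundle`.

```python
def bundle_workflow(bundle="repo.bundle",
                    url="http://example.org/repo.bundle",
                    target="repo"):
    """Return (server, client) command lists for a clone via a bundle."""
    server = [
        # Pack all refs and their history into one static file.
        ["git", "bundle", "create", bundle, "--all"],
    ]
    client = [
        # A plain file download can be interrupted and continued with -c.
        ["wget", "-c", url],
        # Clone from the local bundle file once it is complete.
        ["git", "clone", bundle, target],
    ]
    return server, client


server, client = bundle_workflow()
for argv in server + client:
    print(" ".join(argv))
```

After such a clone, the remote would still point at the bundle file, so it would have to be redirected to the real repository before fetching newer history.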
 
> GitTorrent Protocol, or git-mirror

Sam Vilain and Jonas Fonseca did some good work on this, but it's
stalled.
 
> lazy clone / on-demand fetching of object

There's a patch.  As is, it is not likely to be useful outside
specialized circumstances imho.

http://thread.gmane.org/gmane.comp.version-control.git/73117
 
> subtree clone

Nguyễn Thái Ngọc Duy and Elijah Newren have done some design and
prototyping work.
 
> support for tracking empty directories

Tricky to get the UI right.  I am interested in and would be glad to
help with this one.
 
> environment variables in config
> better undo/abort/continue, and for more commands

These might involve nice "bite-sized" projects.  Christian Couder
and Johannes Schindelin have discussed cherry-pick --abort/--continue
and they might be interested in patches on that subject.  Stephen
Beyer's sequencer might be interesting for inspiration:

git://repo.or.cz/git/sbeyer.git

> '-n' like option for each command, which describes what would happen
> warn before/when rewriting published history
> git push --create
> "commands issued" (or "command equivalents") in git-gui / gitk

Go for it.  "git init --remote" might be a good companion to "git push
--create".
 
> side-by-side diffs and/or color-words diff in gitweb
> admin and/or write features in gitweb
> graphical history view in gitweb
 
There may or may not have been design work on some of these for Pavan
Kumar Sunkara's last summer of code project.  John 'Warthog9' Hawley,
Jakub Narebski, and Petr Baudis might have advice.

> GUI for rebase in git-gui
> GUI for creating repository in git-gui
> graphical diff/merge tool integrated with git-gui
> syntax highlighting in git-gui
 
Pat Thoyts is probably the one to talk to.

> filename encoding (in repository vs in filesystem)
 
This is important for the Windows port and likely to be a nuisance
on Unix.  I think there has been some work on it on the msysgit list?

> localization of command-line messages (i18n)

Ævar Arnfjörð Bjarmason did some work which is in pu.  It needs
some polishing.  I am also interested.
 
> wholesale directory rename detection

Yann Dirson wrote a patch.  It needs some polishing (I'd be glad to
help --- it would be exciting to see this move forward).
 
> union checkouts (some files from one branch, some from other)

Not sure I understand the use case?
 
> advisory locking / "this file is being edited"

Probably better to implement out of band (using hooks?).  I don't
know of any work or documentation in that direction.
 
> built-in gitjour/bananajour support
 
A good start might be to submit one or both of these to contrib?

> better support for submodules	

Jens Lehmann has done some great work on this and presumably would
be happy for help.

https://github.com/jlehmann/git-submod-enhancements/wiki

Hope that helps,
Jonathan

* Re: Features from GitSurvey 2010
  2011-01-29 23:13 ` Jonathan Nieder
@ 2011-02-01 13:51   ` Jakub Narebski
  2011-02-01 15:52     ` Nguyen Thai Ngoc Duy
                       ` (4 more replies)
  2011-02-01 17:44   ` Matthieu Moy
  1 sibling, 5 replies; 36+ messages in thread
From: Jakub Narebski @ 2011-02-01 13:51 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Dmitry S. Kravtsov, git

On Sun, 30 Jan 2011, Jonathan Nieder wrote:
> Hi Dmitry,
> 
> Dmitry S. Kravtsov wrote:
> 
> > I want to dedicate my coursework at University to implementation of
> > some useful git feature. So I'm interested in some kind of list of
> > development status of these features
> [...]
> > Or I'll be glad to know what features are now 'free' and what are
> > currently in active development.
> 
> Interesting question.  The short answer is that they are all "free".
> Generally people seem to be happy to learn of an alternative approach
> to what they have been working on.
> 
> [For the following pointers, the easiest way to follow up is probably
> to search the mailing list archives.]
> 
> > better support for big files (large media)
> 
> For a conservative approach, you might want to get in touch with Sam
> Hocevar, Nicolas Pitre, and Miklos Vajna.  The idea is to stream big
> files directly to pack and not waste time trying to compress them.

There is also the git-bigfiles project, which is supposedly stalled.

> 
> For an alternative approach, Joey Hess's git-annex might be
> interesting.

Note that this feature would not be easy to implement; you would need
both quite good knowledge of git internals, and some realistic use case
that you can test performance of your improvements with.

> > resumable clone/fetch (and other remote operations)
> 
> Jakub Narebski seems to be interested in this and Nicolas Pitre has
> given some good advice about it.  You can get something usable today
> by putting up a git bundle for download over HTTP or rsync, so it is
> possible that this just involves some UI (porcelain) and documentation
> work to become standard practice.

I wouldn't say that: it is Nicolas Pitre (IIRC) who was doing the work;
I was only an interested party posting comments, but no code.

Again, this feature is not very easy to implement, and would require 
knowledge of git internals including "smart" git transport ("Pro Git"
book can help there).

> > GitTorrent Protocol, or git-mirror
> 
> Sam Vilain and Jonas Fonseca did some good work on this, but it's
> stalled.

There was some recent discussion of this on the git mailing list, but
without any code.

One would need to know the same areas as for the "resumable clone"
feature, plus something about P2P transport in the GitTorrent case.

> > lazy clone / on-demand fetching of object
> 
> There's a patch.  As is, it is not likely to be useful outside
> specialized circumstances imho.
> 
> http://thread.gmane.org/gmane.comp.version-control.git/73117

For that, one would need to know how git checks for consistency, e.g.
via git-fsck.

> > subtree clone
> 
> Nguyễn Thái Ngọc Duy and Elijah Newren have done some design and
> prototyping work.

Git mailing list archives should contain proof of concept / RFC patches
for this feature.  Quite interesting.

> > support for tracking empty directories
> 
> Tricky to get the UI right.  I am interested in and would be glad to
> help with this one.

Also one needs to remember that this would require adding an extension
to the git index, because currently it tracks only files, and not
directories.  Explicitly tracking directories in the index could be
useful for other purposes...

The major difficulty of this is IMHO not the UI, but tracking all those
tricky corner cases (like directory/file conflict, etc.).

[...] 
> > side-by-side diffs and/or color-words diff in gitweb
> > admin and/or write features in gitweb
> > graphical history view in gitweb
>  
> There may or may not have been design work on some of these for Pavan
> Kumar Sunkara's last summer of code project.  John 'Warthog9' Hawley,
> Jakub Narebski, and Petr Baudis might have advice.

Pavan was to work on admin and/or write features in gitweb, but he
didn't even finish splitting gitweb (perhaps he was too ambitious in
trying to split the whole of gitweb at once, instead of splitting off
well-defined pieces?).

There are existing Perl modules on CPAN and/or Perl web apps that use
side-by-side diffs and/or color-words diff, so there is code to take as
an example, to use in gitweb, or to borrow from.

The graphical history view would be more difficult, but not very
difficult, I think.  You would have to decide how to draw the graph:
should gitweb generate an image on the fly, use some smart combination
of CSS and pre-made images (perhaps with transparency), use Unicode
graph characters, or use an ASCII-art graph?  You would also have to
decide whether to base it on some existing graph layout algorithm,
borrowing from tig, git-browser, gitk or git-forest (the last is in
Perl, like gitweb), or whether to make use of "git log --graph" output.

> > GUI for rebase in git-gui
> > GUI for creating repository in git-gui
> > graphical diff/merge tool integrated with git-gui
> > syntax highlighting in git-gui
>  
> Pat Thoyts is probably the one to talk to.

Note that for graphical diff/merge tool you should be able to borrow 
from xxdiff graphical diff/merge tool, which is also written in Tcl/Tk.

> > filename encoding (in repository vs in filesystem)
>  
> This is important for the Windows port and likely to be a nuisance
> on Unix.  I think there has been some work on it on the msysgit list?

That would need a good design (note also different forms of Unicode),
and overcoming inertia.

> > localization of command-line messages (i18n)
> 
> Ævar Arnfjörð Bjarmason did some work which is in pu.  It needs
> some polishing.  I am also interested.

I don't know how much is left to do (beside actually translating 
messages).  But this series could use a little help.
  
> > union checkouts (some files from one branch, some from other)
> 
> Not sure I understand the use case?

I don't know if it would be really useful, but the concept is similar to
union mounts.  With a union mount (unionfs, aufs,...) you can e.g. mount
a CD-ROM read-only and overlay some read-write filesystem on top of it.

The idea is to have some files (some directories) in working area come 
from one branch, some from other branch, persistently in some way.  Not 
sure if it would be actually useful.
  
> > advisory locking / "this file is being edited"
> 
> Probably better to implement out of band (using hooks?).  I don't
> know of any work or documentation in that direction.

Yes, it would probably be best as an external project, only with git
integration.

> > built-in gitjour/bananajour support
>  
> A good start might be to submit one or both of these to contrib?

If I remember correctly there are some third-party extensions to be 
found, and perhaps even some patches in git mailing list.


Best choose something you are both interested in, and proficient with.

HTH
-- 
Jakub Narebski
Poland

* Re: Features from GitSurvey 2010
  2011-02-01 13:51   ` Jakub Narebski
@ 2011-02-01 15:52     ` Nguyen Thai Ngoc Duy
  2011-02-01 16:33       ` Shawn Pearce
  2011-02-01 16:27     ` Shawn Pearce
                       ` (3 subsequent siblings)
  4 siblings, 1 reply; 36+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-02-01 15:52 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Jonathan Nieder, Dmitry S. Kravtsov, git, Shawn O. Pearce

On Tue, Feb 1, 2011 at 8:51 PM, Jakub Narebski <jnareb@gmail.com> wrote:
> On Sun, 30 Jan 2011, Jonathan Nieder wrote:
>> > support for tracking empty directories
>>
>> Tricky to get the UI right.  I am interested in and would be glad to
>> help with this one.
>
> Also one needs to remember that this would require adding extension
> to git index, because currently it tracks only files, and not
> directories.  Explicitly tracking directories in the index could be
> useful for other purposes...
>
> The major difficulty of this is IMHO not the UI, but tracking all those
> tricky corner cases (like directory/file conflict, etc.).

Sort order in index is quite special/strange and must be handled
correctly when dirs and files are mixed. There are already special
directories in index: the submodules. Current git code treats
S_ISDIR() and S_ISGITLINK() the same in ce_to_dtype() and some more
places. You need to decouple it somehow.
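
The coupling can be modelled roughly as follows. This is a simplified Python sketch of the behaviour described for ce_to_dtype(), not the actual C code; the constant name `S_IFGITLINK` and the string return values are chosen for illustration, while 0160000 is the mode git uses for gitlink entries.

```python
import stat

S_IFGITLINK = 0o160000  # file-type bits git uses for submodule (gitlink) entries


def is_gitlink(mode):
    """True if the file-type bits of mode mark a gitlink."""
    return stat.S_IFMT(mode) == S_IFGITLINK


def ce_to_dtype(mode):
    """Simplified model: map an index entry mode to a directory-entry type."""
    if stat.S_ISREG(mode):
        return "DT_REG"
    if stat.S_ISDIR(mode) or is_gitlink(mode):
        # Real directories and gitlinks are folded together here; an
        # empty-directory entry would need these two cases told apart.
        return "DT_DIR"
    if stat.S_ISLNK(mode):
        return "DT_LNK"
    return "DT_UNKNOWN"


print(ce_to_dtype(0o100644), ce_to_dtype(0o160000), ce_to_dtype(0o040000))
```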

I tried this (for another purpose) and pulled back. I recall Shawn had
a tree-based index implementation, don't know if he still has it.
Could be a good point to start adding dirs to index.

Actually a tree-based index with a dictionary (something like trees in
packv4) is a good feature in itself. It could shrink the index size a
lot. The index is frequently read/written, so a small index helps
(webkit's index is 16M, 4M when gzipped).
-- 
Duy

* Re: Features from GitSurvey 2010
  2011-02-01 13:51   ` Jakub Narebski
  2011-02-01 15:52     ` Nguyen Thai Ngoc Duy
@ 2011-02-01 16:27     ` Shawn Pearce
  2011-02-01 17:05       ` Nguyen Thai Ngoc Duy
  2011-02-01 17:11       ` Nguyen Thai Ngoc Duy
  2011-02-01 17:28     ` Tracking empty directories Jonathan Nieder
                       ` (2 subsequent siblings)
  4 siblings, 2 replies; 36+ messages in thread
From: Shawn Pearce @ 2011-02-01 16:27 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Jonathan Nieder, Dmitry S. Kravtsov, git

On Tue, Feb 1, 2011 at 05:51, Jakub Narebski <jnareb@gmail.com> wrote:
>
>> > resumable clone/fetch (and other remote operations)
>>
>> Jakub Narebski seems to be interested in this and Nicolas Pitre has
>> given some good advice about it.  You can get something usable today
>> by putting up a git bundle for download over HTTP or rsync, so it is
>> possible that this just involves some UI (porcelain) and documentation
>> work to become standard practice.
>
> I wouldn't say that: it is Nicolas Pitre (IIRC) who was doing the work;
> I was only interested party posting comments, but no code.
>
> Again, this feature is not very easy to implement, and would require
> knowledge of git internals including "smart" git transport ("Pro Git"
> book can help there).

I think Nico and I have mostly solved this with the pack caching idea.
If we cache the pack file, we can resume anywhere in about 97% of the
transfer.  The first 3% cannot be resumed easily; it's back to the old
"git cannot be resumed" issue.  Fixing that last 3% is incredibly
difficult... but resuming within the remaining 97% is a pretty simple
extension of the protocol.  The hard part is the client side
infrastructure to remember where we left off and restart.

>> > GitTorrent Protocol, or git-mirror
>>
>> Sam Vilain and Jonas Fonseca did some good work on this, but it's
>> stalled.
>
> There was some recent discussion on this on git mailing list, but
> without any code.
>
> One would need to know similar areas as for "resumable clone" feature.
> Plus some knowledge on P2P transport in GitTorrent case.

I think this is very similar to resumable clone.  With the cached
pack, clients could use torrent to find it.  But right now Nico and I
are sort of expecting a cached pack to live for about the release
cycle of a project... e.g. only a couple of months.  I don't know if
that can be seeded fast enough on P2P networks to make it useful to
torrent the ~97% of the project that is the cached pack during an
initial clone request.

>> > subtree clone
>>
>> Nguyễn Thái Ngọc Duy and Elijah Newren have done some design and
>> prototyping work.
>
> Git mailing list archives should contain proof of concept / RFC patches
> for this feature.  Quite interesting.

I think Junio has already started thinking about this one.

-- 
Shawn.

* Re: Features from GitSurvey 2010
  2011-02-01 15:52     ` Nguyen Thai Ngoc Duy
@ 2011-02-01 16:33       ` Shawn Pearce
  0 siblings, 0 replies; 36+ messages in thread
From: Shawn Pearce @ 2011-02-01 16:33 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy
  Cc: Jakub Narebski, Jonathan Nieder, Dmitry S. Kravtsov, git

On Tue, Feb 1, 2011 at 07:52, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> On Tue, Feb 1, 2011 at 8:51 PM, Jakub Narebski <jnareb@gmail.com> wrote:
>> On Sun, 30 Jan 2011, Jonathan Nieder wrote:
>>> > support for tracking empty directories
>>>
>>> Tricky to get the UI right.  I am interested in and would be glad to
>>> help with this one.
>>
>> Also one needs to remember that this would require adding extension
>> to git index, because currently it tracks only files, and not
>> directories.  Explicitly tracking directories in the index could be
>> useful for other purposes...
>>
>> The major difficulty of this is IMHO not the UI, but tracking all those
>> tricky corner cases (like directory/file conflict, etc.).
>
> Sort order in index is quite special/strange and must be handled
> correctly when dirs and files are mixed.

It's not the order in the index that is confusing, it's the order in the
tree objects.  The index sort order is simple, since every path is a
full path string from the top of the repository... you use strcmp() to
order them into a natural order.  This however skews where a
subdirectory should live relative to a sibling file, because the
"subdirectory" sorts as though its name ends with '/'.
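
A small Python sketch of the difference, using made-up entry names. The helper `tree_sort_key` is illustrative only and not a git function; it models the rule that a subdirectory in a tree object sorts as though its name ended with '/'.

```python
def tree_sort_key(entry):
    """Sort key for one tree entry given as (name, is_directory)."""
    name, is_directory = entry
    # A subdirectory sorts as though its name ended with '/'.
    return name + b"/" if is_directory else name


entries = [(b"foo", True), (b"foo.c", False), (b"foo-bar", False)]

# Ordering by bare name puts the directory "foo" first.
plain = sorted(name for name, _ in entries)

# Tree-object ordering moves the directory after its sibling files here.
tree_order = [name for name, _ in sorted(entries, key=tree_sort_key)]

# Index entries are full paths, so a plain bytewise sort is enough.
index_paths = sorted([b"foo/x", b"foo.c", b"foo-bar"])

print(plain)
print(tree_order)
print(index_paths)
```

With these names, the tree ordering and the full-path index ordering agree with each other, while sorting the bare names does not.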

> There are already special
> directories in index: the submodules. Current git code treats
> S_ISDIR() and S_ISGITLINK() the same in ce_to_dtype() and some more
> places. You need to decouple it somehow.

More confusingly, the GITLINK type is handled as though it's *not* a
directory.  Storing an empty directory probably means tracking it like
a real directory, but using the empty tree SHA-1 as its value.
Otherwise we probably have all sorts of stuff broken.
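
For reference, the empty tree SHA-1 can be computed from git's object naming rule (SHA-1 over a "<type> <size>" header, a NUL byte, and the payload). A short Python sketch; the helper name `git_object_id` is made up for illustration.

```python
import hashlib


def git_object_id(obj_type, payload):
    """SHA-1 object name: '<type> <size>' header, a NUL byte, then the payload."""
    header = f"{obj_type} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


# A tree with no entries has an empty payload.
EMPTY_TREE = git_object_id("tree", b"")
print(EMPTY_TREE)  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```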

> I tried this (for another purpose) and pulled back. I recall Shawn had
> a tree-based index implementation, don't know if he still has it.

No, we threw out the tree-based index that was used inside of EGit
years ago.  It turned out to be a horrible idea because it wasn't
compatible with the C tools, and it didn't have the inode stat cache
to tell us which files were clean or dirty quickly.

> Actually tree-based index with dictionary (something like trees in
> packv4) is a good feature itself. It could shrink index size down a
> lot. index is frequently read/written so small index helps (webkit's
> index is 16M, 4M after gzipped).

I think a lot of the reason the webkit index is 16M (4M gzipped) is
because of the duplicate path prefixes that appear on all files within
the same directory.  If the index was still a single file, but was
organized into sections by tree (like the TREE extension within the
index itself) you could avoid having the full path within the index
file and save a lot of space when there are many files within
subdirectories.  But this does complicate the C code because you would
need to copy each of those path segments together into a path buffer
in order to access the file in the working tree.
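
A rough Python sketch of the idea, assuming a made-up layout in which each directory name is stored once, followed by the base names of its files. It only illustrates the space saving and the extra work of joining segments back into full paths; it says nothing about how an actual on-disk format would look. The example paths are invented.

```python
import posixpath


def split_by_directory(paths):
    """Group full paths into {directory: [base names]} sections."""
    sections = {}
    for path in sorted(paths):
        directory, name = posixpath.split(path)
        sections.setdefault(directory, []).append(name)
    return sections


def join_sections(sections):
    """Rebuild the sorted list of full paths (the copying step on read)."""
    return sorted(
        posixpath.join(directory, name)
        for directory, names in sections.items()
        for name in names
    )


def stored_bytes(sections):
    """Characters needed when each directory prefix is stored only once."""
    return sum(len(d) + sum(len(n) for n in names)
               for d, names in sections.items())


paths = [
    "Source/WebCore/dom/Node.cpp",
    "Source/WebCore/dom/Node.h",
    "Source/WebCore/dom/Element.cpp",
    "README",
]
sections = split_by_directory(paths)
full_size = sum(len(p) for p in paths)
print(full_size, stored_bytes(sections))
```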

It's probably faster to copy those path segments on read into a big
path buffer, and break them apart on write, than to have a huge index
file.  We already reformat the index during reading/writing to expand
some of the fields for in-memory only flags.

-- 
Shawn.

* Re: Features from GitSurvey 2010
  2011-02-01 16:27     ` Shawn Pearce
@ 2011-02-01 17:05       ` Nguyen Thai Ngoc Duy
  2011-02-01 21:27         ` Junio C Hamano
  2011-02-01 21:44         ` Nicolas Pitre
  2011-02-01 17:11       ` Nguyen Thai Ngoc Duy
  1 sibling, 2 replies; 36+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-02-01 17:05 UTC (permalink / raw)
  To: Shawn Pearce
  Cc: Jakub Narebski, Jonathan Nieder, Dmitry S. Kravtsov, git, Junio C Hamano

On Tue, Feb 1, 2011 at 11:27 PM, Shawn Pearce <spearce@spearce.org> wrote:
>>> > subtree clone
>>>
>>> Nguyễn Thái Ngọc Duy and Elijah Newren have done some design and
>>> prototyping work.
>>
>> Git mailing list archives should contain proof of concept / RFC patches
>> for this feature.  Quite interesting.
>
> I think Junio has already started thinking about this one.

I need to get nd/pathspec right and implement negative pathspecs
before returning to this feature. But there are still interesting
issues:

 - narrow by directories or pathspecs (or unpack_trees() by
directories or pathspecs)
 - widen a clone (negative pathspecs should help calculating necessary objects)
 - should commit objects that do not update the narrow area be fetched
(I recall they consume a considerable amount. Security is also an
issue)
 - push support for shallow clone (Elijah's approach is not based on
shallow clone, so it's a non-issue)
-- 
Duy

* Re: Features from GitSurvey 2010
  2011-02-01 16:27     ` Shawn Pearce
  2011-02-01 17:05       ` Nguyen Thai Ngoc Duy
@ 2011-02-01 17:11       ` Nguyen Thai Ngoc Duy
  2011-02-01 17:34         ` Shawn Pearce
  1 sibling, 1 reply; 36+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-02-01 17:11 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Jakub Narebski, Jonathan Nieder, Dmitry S. Kravtsov, git

On Tue, Feb 1, 2011 at 11:27 PM, Shawn Pearce <spearce@spearce.org> wrote:
> On Tue, Feb 1, 2011 at 05:51, Jakub Narebski <jnareb@gmail.com> wrote:
>>
>>> > resumable clone/fetch (and other remote operations)
>>>
>>> Jakub Narebski seems to be interested in this and Nicolas Pitre has
>>> given some good advice about it.  You can get something usable today
>>> by putting up a git bundle for download over HTTP or rsync, so it is
>>> possible that this just involves some UI (porcelain) and documentation
>>> work to become standard practice.
>>
>> I wouldn't say that: it is Nicolas Pitre (IIRC) who was doing the work;
>> I was only interested party posting comments, but no code.
>>
>> Again, this feature is not very easy to implement, and would require
>> knowledge of git internals including "smart" git transport ("Pro Git"
>> book can help there).
>
> I think Nico and I have mostly solved this with the pack caching idea.
>  If we cache the pack file, we can resume anywhere in about 97% of the
> transfer.  The first 3% cannot be resumed easily, it's back to the old
> "git cannot be resumed" issue.  Fixing that last 3% is incredibly

I thought the cached pack contained everything and for an initial clone,
we simply send the pack. What is this 3%? Commit list? Initial commit?

> difficult... but resuming within the remaining 97% is a pretty simple
> extension of the protocol.  The hard part is the client side
> infrastructure to remember where we left off and restart.

Narrow/Subtree clone is still just an idea, but can pack cache support
be made to work for resuming an initial narrow clone too?
-- 
Duy

* Tracking empty directories
  2011-02-01 13:51   ` Jakub Narebski
  2011-02-01 15:52     ` Nguyen Thai Ngoc Duy
  2011-02-01 16:27     ` Shawn Pearce
@ 2011-02-01 17:28     ` Jonathan Nieder
  2011-02-01 17:54       ` Nguyen Thai Ngoc Duy
  2011-02-01 21:36     ` Features from GitSurvey 2010 Nicolas Pitre
  2011-02-01 22:50     ` big files in git was: " david
  4 siblings, 1 reply; 36+ messages in thread
From: Jonathan Nieder @ 2011-02-01 17:28 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: Dmitry S. Kravtsov, git, Nguyen Thai Ngoc Duy, Shawn Pearce

Jakub Narebski wrote:

> Also one needs to remember that this would require adding extension
> to git index, because currently it tracks only files, and not 
> directories.  Explicitly tracking directories in the index could be 
> useful for other purposes...
>
> The major difficulty of this is IMHO not the UI, but tracking all those
> tricky corner cases (like directory/file conflict, etc.).

I have ideas about how to resolve those tricky corner cases, but not
about what the UI should look like.  How does one go about adding a
directory?  Does it ever get implicitly removed?

Would this actually require an index extension, strictly speaking?
Certainly one ought to register an extension name or bump the version
number to avoid confusing gits that don't know about the feature.
But after that, couldn't we (e.g.) allow the directory name (ending
with '/') as index entry?

A related question is backward compatibility (both for alternative git
implementations and for scripts that did not know that "git ls-files"
might mention an empty directory) which somehow seems less
daunting. ;-)

Jonathan

* Re: Features from GitSurvey 2010
  2011-02-01 17:11       ` Nguyen Thai Ngoc Duy
@ 2011-02-01 17:34         ` Shawn Pearce
  2011-02-01 21:51           ` Nicolas Pitre
  0 siblings, 1 reply; 36+ messages in thread
From: Shawn Pearce @ 2011-02-01 17:34 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy
  Cc: Jakub Narebski, Jonathan Nieder, Dmitry S. Kravtsov, git

On Tue, Feb 1, 2011 at 09:11, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> On Tue, Feb 1, 2011 at 11:27 PM, Shawn Pearce <spearce@spearce.org> wrote:
>> On Tue, Feb 1, 2011 at 05:51, Jakub Narebski <jnareb@gmail.com> wrote:
>>>
>>>> > resumable clone/fetch (and other remote operations)
>>>>
>>>> Jakub Narebski seems to be interested in this and Nicolas Pitre has
>>>> given some good advice about it.  You can get something usable today
>>>> by putting up a git bundle for download over HTTP or rsync, so it is
>>>> possible that this just involves some UI (porcelain) and documentation
>>>> work to become standard practice.
>>>
>>> I wouldn't say that: it is Nicolas Pitre (IIRC) who was doing the work;
>>> I was only interested party posting comments, but no code.
>>>
>>> Again, this feature is not very easy to implement, and would require
>>> knowledge of git internals including "smart" git transport ("Pro Git"
>>> book can help there).
>>
>> I think Nico and I have mostly solved this with the pack caching idea.
>>  If we cache the pack file, we can resume anywhere in about 97% of the
>> transfer.  The first 3% cannot be resumed easily, it's back to the old
>> "git cannot be resumed" issue.  Fixing that last 3% is incredibly
>
> I thought the cached pack contained anything and for initial clone, we
> simply send the pack. What is this 3%? Commit list? Initial commit?

It's the recent changes.  If the cached pack starts from the tip of
master, it's probably 0%.  But if the repository owner pushes new
changes after the cached pack was created, these are sent as a thin
pack in front of the cached pack... and make up that ~3% guess.  For
linux-2.6 I tested a 2 week period when the merge window was open right
after a release, and the new delta was about 3% of the overall
repository size.
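
A toy Python model of that layout, assuming the client somehow knows the sizes of both parts. The function name, the return values, and the notion of a byte offset into the cached pack are all hypothetical; the real protocol extension is not specified in this thread.

```python
def resume_plan(bytes_received, thin_pack_size, cached_pack_size):
    """Decide how to continue an interrupted clone (illustrative model only).

    The stream is modelled as a freshly generated thin pack followed by
    the cached pack.
    """
    if bytes_received < thin_pack_size:
        # Interrupted inside the thin pack: nothing stable to resume from.
        return ("restart", 0)
    offset = bytes_received - thin_pack_size
    if offset >= cached_pack_size:
        return ("done", cached_pack_size)
    # Interrupted inside the cached pack: continue at this byte offset.
    return ("resume", offset)


# Sizes chosen to mirror the ~3% / ~97% split mentioned above.
print(resume_plan(2, 3, 97))
print(resume_plan(50, 3, 97))
```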

> Narrow/Subtree clone is still just an idea, but can pack cache support
> be made to resumable initial narrow clone too?

This would be very hard to do.  We could do cached packs for a popular
set of path specifications (e.g. Documentation/ if documentation only
editing is common), but once we start getting random requests for path
specifications that we cannot predict in advance and pre-pack we'd
have to fall back to the normal enumerate code path.

-- 
Shawn.

* Re: Features from GitSurvey 2010
  2011-01-29 23:13 ` Jonathan Nieder
  2011-02-01 13:51   ` Jakub Narebski
@ 2011-02-01 17:44   ` Matthieu Moy
  2011-02-01 18:42     ` Jonathan Nieder
  1 sibling, 1 reply; 36+ messages in thread
From: Matthieu Moy @ 2011-02-01 17:44 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Dmitry S. Kravtsov, git, Jakub Narebski

Jonathan Nieder <jrnieder@gmail.com> writes:

>> support for tracking empty directories
>
> Tricky to get the UI right.  I am interested in and would be glad to
> help with this one.

A starting point, with some proposed (broken) patches:

http://thread.gmane.org/gmane.comp.version-control.git/56310/focus=56348

>> advisory locking / "this file is being edited"
>
> Probably better to implement out of band (using hooks?).  I don't
> know of any work or documentation in that direction.

File locking and a distributed tool are conflicting interests. A
file-locking tool for git should be able to use a centralized locks
database (for example, one can imagine a simple PHP script hosted
somewhere independently of the Git repo, keeping a list of locked
files up to date).

That needs to be integrated with Git, but it should probably still
remain out of the Git core, because different users would want
different locking databases. Hooks and git-* commands in the $PATH are
probably sufficient.
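
As a sketch of what such a hook could check, here is a small Python function. The lock table format (path mapped to owner) and all names are hypothetical; a real hook would have to fetch the table from whatever central service is used, and could list the staged paths with `git diff --cached --name-only`.

```python
def locked_by_others(staged_paths, locks, user):
    """Return {path: owner} for staged paths that someone else has locked."""
    return {
        path: locks[path]
        for path in staged_paths
        if path in locks and locks[path] != user
    }


# Made-up lock table, as it might come back from a central lock service.
locks = {"art/logo.psd": "alice", "docs/manual.odt": "bob"}
conflicts = locked_by_others(["art/logo.psd", "README"], locks, user="bob")

# A pre-commit hook would refuse the commit if this is non-empty.
print(conflicts)
```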

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

* Re: Tracking empty directories
  2011-02-01 17:28     ` Tracking empty directories Jonathan Nieder
@ 2011-02-01 17:54       ` Nguyen Thai Ngoc Duy
  2011-02-01 18:15         ` Ilari Liusvaara
  2011-02-01 18:35         ` Jonathan Nieder
  0 siblings, 2 replies; 36+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-02-01 17:54 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Jakub Narebski, Dmitry S. Kravtsov, git, Shawn Pearce

On Wed, Feb 2, 2011 at 12:28 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Jakub Narebski wrote:
>
>> Also one needs to remember that this would require adding extension
>> to git index, because currently it tracks only files, and not
>> directories.  Explicitly tracking directories in the index could be
>> useful for other purposes...
>>
>> The major difficulty of this is IMHO not the UI, but tracking all those
>> tricky corner cases (like directory/file conflict, etc.).
>
> I have ideas about how to resolve those tricky corner cases, but not
> about what the UI should look like.  How does one go about adding a
> directory?  Does it ever get implicitly removed?

I suppose a special command for it is appropriate (git-keepdir?). Many
index-related commands are recursive by default and hard to change.

Yes I think it should be automatically removed from index when a file
is added inside tracked directories. Removing those files will also
remove the containing directory though.

> Would this actually require an index extension, strictly speaking?

Could it be done with an index extension? Interesting.

> Certainly one ought to register an extension name or bump the version
> number to avoid confusing gits that don't know about the feature.

Index extensions with lowercase names are "necessary for correct
operation". Older git will abort on unknown required extensions. If
you add to the main part of the index, better bump the version number.
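
A minimal Python sketch of the naming rule above, for illustration only.
The function name and the "dirs" signature are hypothetical; "TREE" is
the existing cache-tree extension.

```python
def extension_is_optional(signature: bytes) -> bool:
    """Return True if a reader that does not know this extension may skip it."""
    if len(signature) != 4:
        raise ValueError("index extension signatures are exactly 4 bytes")
    # First byte 'A'..'Z': optional, an older git can ignore the extension.
    # Anything else (e.g. lowercase): required for correct operation, so an
    # older git that does not understand it is expected to abort.
    return b"A" <= signature[:1] <= b"Z"


# "dirs" is a made-up name for a would-be required extension.
print(extension_is_optional(b"TREE"))
print(extension_is_optional(b"dirs"))
```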

> But after that, couldn't we (e.g.) allow the directory name (ending
> with '/') as index entry?

You could. You also need to strip '/' sometimes because certain parts
of git do not expect '/' to be there (traverse_trees or
unpack_trees, I don't remember).
-- 
Duy

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Tracking empty directories
  2011-02-01 17:54       ` Nguyen Thai Ngoc Duy
@ 2011-02-01 18:15         ` Ilari Liusvaara
  2011-02-01 18:31           ` Jakub Narebski
  2011-02-01 18:35         ` Jonathan Nieder
  1 sibling, 1 reply; 36+ messages in thread
From: Ilari Liusvaara @ 2011-02-01 18:15 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy
  Cc: Jonathan Nieder, Jakub Narebski, Dmitry S. Kravtsov, git, Shawn Pearce

On Wed, Feb 02, 2011 at 12:54:35AM +0700, Nguyen Thai Ngoc Duy wrote:
> On Wed, Feb 2, 2011 at 12:28 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> 
> Could it be done with an index extension? Interesting.
> 
> > Certainly one ought to register an extension name or bump the version
> > number to avoid confusing gits that don't know about the feature.
> 
> Index extensions with lowercase names are "necessary for correct
> operation". Older git will abort on unknown required extensions. If
> you add to the main part of the index, better bump version number.

Worse problem than the index: Tree entries. Those are actually transferable
and IIRC older (current?) git versions don't handle empty subdirectories
(pointing entry of type directory to empty tree hash) all too well...

Worse yet, there isn't an easy way to break the tree parser to keep current
git versions from screwing things up (IIRC, when I tested, invalid octal
numbers finally broke it, invalid file types didn't do the trick)...

-Ilari

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Tracking empty directories
  2011-02-01 18:15         ` Ilari Liusvaara
@ 2011-02-01 18:31           ` Jakub Narebski
  2011-02-01 19:09             ` Ilari Liusvaara
  0 siblings, 1 reply; 36+ messages in thread
From: Jakub Narebski @ 2011-02-01 18:31 UTC (permalink / raw)
  To: Ilari Liusvaara
  Cc: Nguyen Thai Ngoc Duy, Jonathan Nieder, Dmitry S. Kravtsov, git,
	Shawn Pearce

On Tuesday, 1 February 2011 19:15, Ilari Liusvaara wrote:
> On Wed, Feb 02, 2011 at 12:54:35AM +0700, Nguyen Thai Ngoc Duy wrote:
> > On Wed, Feb 2, 2011 at 12:28 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> > 
> > Could it be done with an index extension? Interesting.
> > 
> > > Certainly one ought to register an extension name or bump the version
> > > number to avoid confusing gits that don't know about the feature.
> > 
> > Index extensions with lowercase names are "necessary for correct
> > operation". Older git will abort on unknown required extensions. If
> > you add to the main part of the index, better bump version number.
> 
> Worse problem than the index: Tree entries. Those are actually transferable
> and IIRC older (current?) git versions don't handle empty subdirectories
> (pointing entry of type directory to empty tree hash) all too well...

What did you mean by "don't handle" here?  The following entry

  040000 tree 22d5826c087c4b9dcc72e2131c2cfb061403f7eb	empty

should not be a problem; the empty tree is hardcoded, and there shouldn't
be a problem with such an object either.  Is the problem when checking out
such a tree (writing to the index and/or working area)?

> Worse yet, there isn't an easy way to break the tree parser to keep current
> git versions from screwing things up (IIRC, when I tested, invalid octal
> numbers finally broke it, invalid file types didn't do the trick)...

Well, then version 1.8.0 could be a good place to break backwards
compatibility; we did a similar thing when introducing submodule entries,
didn't we?

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Tracking empty directories
  2011-02-01 17:54       ` Nguyen Thai Ngoc Duy
  2011-02-01 18:15         ` Ilari Liusvaara
@ 2011-02-01 18:35         ` Jonathan Nieder
  2011-02-01 19:03           ` Jakub Narebski
  1 sibling, 1 reply; 36+ messages in thread
From: Jonathan Nieder @ 2011-02-01 18:35 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy
  Cc: Jakub Narebski, Dmitry S. Kravtsov, git, Shawn Pearce

Nguyen Thai Ngoc Duy wrote:
> On Wed, Feb 2, 2011 at 12:28 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:

>> I have ideas about how to resolve those tricky corner cases, but not
>> about what the UI should look like.  How does one go about adding a
>> directory?  Does it ever get implicitly removed?
>
> I suppose a special command for it is appropriate (git-keepdir?). Many
> index-related commands are recursive by default and hard to change.
>
> Yes I think it should be automatically removed from index when a file
> is added inside tracked directories. Removing those files will also
> remove the containing directory though.

Okay, I'm convinced.  This fits a "worse is better" point of view
nicely.

To add, one would use "git update-index --add".  The magic disappears
when you register a file within that directory; to tell git you want
to keep it, one would mkdir and "git update-index --add" again.  Once
it's working, we can think about if there is a need for making that
last step automatic after all (my guess: "no"). ;-)

Use case: [1]
Nice starting point: [2]
Motivational word of wisdom: [3]

This treatment leaves out the backward compatibility detail.  I still
think that's the easy part (at worst, we can always implement read
support, wait a year, and then turn on write support).

Jonathan

[1] http://thread.gmane.org/gmane.comp.version-control.git/46947/focus=47278
[2] http://thread.gmane.org/gmane.comp.version-control.git/52813/focus=52908
[3] http://thread.gmane.org/gmane.comp.version-control.git/53494

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Features from GitSurvey 2010
  2011-02-01 17:44   ` Matthieu Moy
@ 2011-02-01 18:42     ` Jonathan Nieder
  2011-02-01 20:23       ` Matthieu Moy
  0 siblings, 1 reply; 36+ messages in thread
From: Jonathan Nieder @ 2011-02-01 18:42 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: Dmitry S. Kravtsov, git, Jakub Narebski

Matthieu Moy wrote:

>>> support for tracking empty directories
[...]
> http://thread.gmane.org/gmane.comp.version-control.git/56310/focus=56348

Thanks!  I followed up a bit on this lead in the "tracking empty
directories" thread.

>>> advisory locking / "this file is being edited"
[...]
> That needs to be integrated with Git, but it should probably still
> remain out of the Git core, because different users would want
> different locking databases. Hooks and git-* commands in the $PATH are
> probably sufficient.

Yes, a nice side effect could be the addition of a couple of hooks.

An rcs-style workflow (lockable files absent from worktree until
locked) on top of git sounds fun.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Tracking empty directories
  2011-02-01 18:35         ` Jonathan Nieder
@ 2011-02-01 19:03           ` Jakub Narebski
  2011-02-02  3:54             ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 36+ messages in thread
From: Jakub Narebski @ 2011-02-01 19:03 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Nguyen Thai Ngoc Duy, Dmitry S. Kravtsov, git, Shawn Pearce

On Tuesday, 1 February 2011 19:35, Jonathan Nieder wrote:
> Nguyen Thai Ngoc Duy wrote:
>> On Wed, Feb 2, 2011 at 12:28 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> 
>>> I have ideas about how to resolve those tricky corner cases, but not
>>> about what the UI should look like.  How does one go about adding a
>>> directory?  Does it ever get implicitly removed?
>>
>> I suppose a special command for it is appropriate (git-keepdir?). Many
>> index-related commands are recursive by default and hard to change.
>>
>> Yes I think it should be automatically removed from index when a file
>> is added inside tracked directories. Removing those files will also
>> remove the containing directory though.
> 
> Okay, I'm convinced.  This fits a "worse is better" point of view
> nicely.
> 
> To add, one would use "git update-index --add".

Porcelain version could be "git add -N <directory>", don't you agree?

> The magic disappears when you register a file within that directory;
> to tell git you want to keep it, one would mkdir and
> "git update-index --add" again.  Once it's working, we can think about
> if there is a need for making that last step automatic after all
> (my guess: "no"). ;-) 

Hmmm... could we use a mechanism similar to assume-unchanged to mark a
directory as explicitly tracked, so that git does not remove it
when it becomes empty?

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Tracking empty directories
  2011-02-01 18:31           ` Jakub Narebski
@ 2011-02-01 19:09             ` Ilari Liusvaara
  0 siblings, 0 replies; 36+ messages in thread
From: Ilari Liusvaara @ 2011-02-01 19:09 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: Nguyen Thai Ngoc Duy, Jonathan Nieder, Dmitry S. Kravtsov, git,
	Shawn Pearce

On Tue, Feb 01, 2011 at 07:31:38PM +0100, Jakub Narebski wrote:
> On Tuesday, 1 February 2011 19:15, Ilari Liusvaara wrote:
> > 
> > Worse problem than the index: Tree entries. Those are actually transferable
> > and IIRC older (current?) git versions don't handle empty subdirectories
> > (pointing entry of type directory to empty tree hash) all too well...
> 
> What did you mean by "don't handle" here?  The following entry
> 
>   040000 tree 22d5826c087c4b9dcc72e2131c2cfb061403f7eb	empty
> 
> should not be a problem; the empty tree is hardcoded, and there shouldn't
> be a problem with such an object either.  Is the problem when checking out
> such a tree (writing to the index and/or working area)?

Yes, writing to index/working area. IIRC, having such an entry in a tree
causes a "ghost directory". I don't exactly recall what such a thing broke,
but I remember that it broke something (merging?)...

Those ghosts also had an annoying tendency to persist between commits.
Commits didn't kill them. Rm didn't work. You had to create something on
top/inside to get rid of them.

> > Worse yet, there isn't an easy way to break the tree parser to keep current
> > git versions from screwing things up (IIRC, when I tested, invalid octal
> > numbers finally broke it, invalid file types didn't do the trick)...
> 
> Well, then version 1.8.0 could be a good place to break backwards
> compatibility; we did a similar thing when introducing submodule entries,
> didn't we?

Hint: Entry of mode "88888" blows up the tree parser nicely... :-)
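
To illustrate why "88888" is fatal to a strict parser: tree objects store
each entry's mode as ASCII octal digits, so '8' is not a valid digit.
The Python sketch below is only an illustration of that idea, not git's
actual parsing code; the function name is made up.

```python
from typing import Optional


def parse_tree_mode(mode: bytes) -> Optional[int]:
    """Parse an ASCII-octal tree entry mode; return None if it is malformed."""
    if not mode:
        return None
    value = 0
    for byte in mode:
        # Only the digits '0'..'7' are valid in an octal mode string.
        if not (0x30 <= byte <= 0x37):
            return None
        value = (value << 3) | (byte - 0x30)
    return value


# A directory entry is stored as "40000"; "88888" cannot be parsed at all.
print(parse_tree_mode(b"40000") == 0o40000)
print(parse_tree_mode(b"88888"))
```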

At the same time, it could be useful to have manually tracked directories
(indicated via the "sticky" bit of the tree entry mode?)

-Ilari

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Features from GitSurvey 2010
  2011-02-01 18:42     ` Jonathan Nieder
@ 2011-02-01 20:23       ` Matthieu Moy
  0 siblings, 0 replies; 36+ messages in thread
From: Matthieu Moy @ 2011-02-01 20:23 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Dmitry S. Kravtsov, git, Jakub Narebski

Jonathan Nieder <jrnieder@gmail.com> writes:

> An rcs-style workflow (lockable files absent from worktree until
> locked) on top of git sounds fun.

They need not be absent, they can just be read-only (and lock would do
a chmod u+w on the file).
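
A rough Python sketch of that read-only idea, assuming a POSIX-style
permission model. The lock() and unlock() names are hypothetical, and a
real tool would also have to record who holds the lock somewhere (the
"locking database" mentioned earlier), which is not shown here.

```python
import os
import stat


def lock(path: str) -> None:
    """Take the advisory lock: make the file writable by its owner (chmod u+w)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IWUSR)


def unlock(path: str) -> None:
    """Release the lock: drop all write bits so the file is read-only again."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
```

Lockable files would start out unlocked (read-only) in the worktree, so
an editor refuses to save changes until the user has run the lock step.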

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Features from GitSurvey 2010
  2011-02-01 17:05       ` Nguyen Thai Ngoc Duy
@ 2011-02-01 21:27         ` Junio C Hamano
  2011-02-01 21:44         ` Nicolas Pitre
  1 sibling, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2011-02-01 21:27 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy
  Cc: Shawn Pearce, Jakub Narebski, Jonathan Nieder, Dmitry S. Kravtsov, git

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> On Tue, Feb 1, 2011 at 11:27 PM, Shawn Pearce <spearce@spearce.org> wrote:
> ...
>> I think Junio has already started thinking about this one.
>
> I need to get nd/pathspec right and implement negative pathspecs
> before returning to this feature.

I don't think we need negative pathspecs before going forward.

I wanted a unified "We have a path; is it inside this set of pathspecs?"
(and its sibling, "We have a leading path and a name_entry taken from that
tree; is it inside this set of pathspecs?"), and with that we can run:

	$ git clone git://k.org/pub/scm/git/git.git -- Documentation '*.sh'

that would limit the clone (not just checkout) to the given parts of the
tree.  By recording the pathspecs in the repository (and initially making
it frozen---we can design extending the scope in later rounds), we can
limit "fsck", "unpack-trees", "log", etc. all using the unified pathspec
API.
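
As a toy model of the "is this path inside this set of pathspecs?"
question, here is a short Python sketch. It is not git's pathspec code:
the function name is made up, and the matching rules (leading-directory
match plus a shell-style glob) are a simplifying assumption.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def path_in_pathspecs(path: str, pathspecs: Iterable[str]) -> bool:
    """Return True if path is covered by at least one pathspec."""
    for spec in pathspecs:
        spec = spec.rstrip("/")
        # A pathspec naming a directory covers everything below it.
        if path == spec or path.startswith(spec + "/"):
            return True
        # Otherwise treat the pathspec as a glob over the whole path.
        if fnmatchcase(path, spec):
            return True
    return False


specs = ["Documentation", "*.sh"]
print(path_in_pathspecs("Documentation/git.txt", specs))
print(path_in_pathspecs("t/test-lib.sh", specs))
print(path_in_pathspecs("builtin/clone.c", specs))
```

With one predicate like this, each command only needs to ask the same
question for every path it visits, which is what makes the narrow-clone
part independent of how pathspecs themselves evolve.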

We may later want to add negative or imaginary pathspecs to the mix, but
as long as the unified pathspec API understands that, the narrow-clone
part should be able to be unaware of that.

So I think that is (or at least _should be_ if the pathspec API is done
right) pretty much orthogonal.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Features from GitSurvey 2010
  2011-02-01 13:51   ` Jakub Narebski
                       ` (2 preceding siblings ...)
  2011-02-01 17:28     ` Tracking empty directories Jonathan Nieder
@ 2011-02-01 21:36     ` Nicolas Pitre
  2011-02-01 22:50     ` big files in git was: " david
  4 siblings, 0 replies; 36+ messages in thread
From: Nicolas Pitre @ 2011-02-01 21:36 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Jonathan Nieder, Dmitry S. Kravtsov, git

On Tue, 1 Feb 2011, Jakub Narebski wrote:
> On Sun, 30 Jan 2011, Jonathan Nieder wrote:
> > Dmitry S. Kravtsov wrote:
> > 
> > > resumable clone/fetch (and other remote operations)
> > 
> > Jakub Narebski seems to be interested in this and Nicolas Pitre has
> > given some good advice about it.  You can get something usable today
> > by putting up a git bundle for download over HTTP or rsync, so it is
> > possible that this just involves some UI (porcelain) and documentation
> > work to become standard practice.
> 
> I wouldn't say that: it is Nicolas Pitre (IIRC) who was doing the work;
> I was only interested party posting comments, but no code.

No, I'm not working on that.  I provided suggestions on how to go about 
it in the past:

1) The git-archive based solution:
   http://article.gmane.org/gmane.comp.version-control.git/126431
   Relatively simple to implement, with questionable efficiency if you 
   care about the full history, but perfectly suited for shallow clones 
   which is what people with flaky connections should aim for anyway.

2) The bundle based solution:
   http://article.gmane.org/gmane.comp.version-control.git/164699
   (see towards the end of the message)
   This was about BitTorrent distribution, but any resumable transport 
   can be applied to the bundle.
   Extremely simple to implement, as this all can be scripted on top of 
   existing tools.  Good for the bulk of history, but there is always a 
   risk for problems during the update of the repository from the 
   bundle's state up to the most recent commits which has to fall back 
   to the non resumable smart Git protocol.

There is also some possibility that the cache pack work might be 
leveraged to provide a resumable clone solution similar to #2 above, but 
that would of course share the same flaws.

> Again, this feature is not very easy to implement, and would require 
> knowledge of git internals including "smart" git transport ("Pro Git"
> book can help there).

The two proposed solutions above require no prior knowledge of the smart 
Git protocol, and they should be pretty simple to implement. Certainly 
in the reach of a GSOC student.

> > > GitTorrent Protocol, or git-mirror
> > 
> > Sam Vilain and Jonas Fonseca did some good work on this, but it's
> > stalled.
> 
> There was some recent discussion on this on the git mailing list, but
> without any code.
> 
> One would need to know similar areas as for "resumable clone" feature.
> Plus some knowledge on P2P transport in GitTorrent case.

Again, please see 

http://article.gmane.org/gmane.comp.version-control.git/164699

This is simple, and with guaranteed results.  Why no one was interested 
in implementing that yet I don't know.


Nicolas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Features from GitSurvey 2010
  2011-02-01 17:05       ` Nguyen Thai Ngoc Duy
  2011-02-01 21:27         ` Junio C Hamano
@ 2011-02-01 21:44         ` Nicolas Pitre
  1 sibling, 0 replies; 36+ messages in thread
From: Nicolas Pitre @ 2011-02-01 21:44 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy
  Cc: Shawn Pearce, Jakub Narebski, Jonathan Nieder,
	Dmitry S. Kravtsov, git, Junio C Hamano

On Wed, 2 Feb 2011, Nguyen Thai Ngoc Duy wrote:

>  - push support for shallow clone (Elijah's approach is not based on
> shallow clone, so it's a non-issue)

Push support from a shallow clone should be really simple.  Either you 
can update the remote, or you can't.  If your local history does include 
the remote branch head you wish to update then there is nothing 
currently that would prevent the push from proceeding as implemented 
today.  If the remote head is outside the history subset you have 
locally then the push simply cannot proceed.
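
The check described above can be modelled in a few lines of Python. This
is a sketch over a toy commit graph, not git's implementation: the
parents mapping and both function names are assumptions, and commits cut
off by the shallow boundary are simply absent from the mapping.

```python
from typing import Dict, Sequence, Set


def local_history(tip: str, parents: Dict[str, Sequence[str]]) -> Set[str]:
    """Collect every commit reachable from tip that exists locally."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        # Commits beyond the shallow boundary are not in the mapping.
        if commit in seen or commit not in parents:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


def can_push(tip: str, remote_head: str, parents: Dict[str, Sequence[str]]) -> bool:
    """The push can proceed only if the remote head is part of local history."""
    return remote_head in local_history(tip, parents)


# History A <- B <- C, shallow-cloned so that A is missing locally.
shallow = {"C": ["B"], "B": ["A"]}
print(can_push("C", "B", shallow))
print(can_push("C", "A", shallow))
```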


Nicolas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Features from GitSurvey 2010
  2011-02-01 17:34         ` Shawn Pearce
@ 2011-02-01 21:51           ` Nicolas Pitre
  2011-02-02  0:26             ` Shawn Pearce
  2011-02-03 14:38             ` Geert Bosch
  0 siblings, 2 replies; 36+ messages in thread
From: Nicolas Pitre @ 2011-02-01 21:51 UTC (permalink / raw)
  To: Shawn Pearce
  Cc: Nguyen Thai Ngoc Duy, Jakub Narebski, Jonathan Nieder,
	Dmitry S. Kravtsov, git

On Tue, 1 Feb 2011, Shawn Pearce wrote:

> On Tue, Feb 1, 2011 at 09:11, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> > Narrow/Subtree clone is still just an idea, but can pack cache support
> > be made to support a resumable initial narrow clone too?
> 
> This would be very hard to do.  We could do cached packs for a popular
> set of path specifications (e.g. Documentation/ if documentation only
> editing is common), but once we start getting random requests for path
> specifications that we cannot predict in advance and pre-pack we'd
> have to fall back to the normal enumerate code path.

Also... people interested in Narrow clones are likely to be shallow 
clone users too, right?


Nicolas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* big files in git was: Re: Features from GitSurvey 2010
  2011-02-01 13:51   ` Jakub Narebski
                       ` (3 preceding siblings ...)
  2011-02-01 21:36     ` Features from GitSurvey 2010 Nicolas Pitre
@ 2011-02-01 22:50     ` david
  2011-02-03  6:25       ` Nicolas Pitre
  4 siblings, 1 reply; 36+ messages in thread
From: david @ 2011-02-01 22:50 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Jonathan Nieder, Dmitry S. Kravtsov, git

On Tue, 1 Feb 2011, Jakub Narebski wrote:

> On Sun, 30 Jan 2011, Jonathan Nieder wrote:
>> Hi Dmitry,
>>
>> Dmitry S. Kravtsov wrote:
>>
>>> I want to dedicate my coursework at University to implementation of
>>> some useful git feature. So I'm interesting in some kind of list of
>>> development status of these features
>> [...]
>>> Or I'll be glad to know what features are now 'free' and what are
>>> currently in active development.
>>
>> Interesting question.  The short answer is that they are all "free".
>> Generally people seem to be happy to learn of an alternative approach
>> to what they have been working on.
>>
>> [For the following pointers, the easiest way to follow up is probably
>> to search the mailing list archives.]
>>
>>> better support for big files (large media)
>>
>> For a conservative approach, you might want to get in touch with Sam
>> Hocevar, Nicolas Pitre, and Miklos Vajna.  The idea is to stream big
>> files directly to pack and not waste time trying to compress them.
>
> There is also, supposedly stalled, git-bigfiles project.

why is the clean/smudge approach that came through the list a week or two 
ago not acceptable?

While people talked about how it would be nice to store the large files on 
$remote_destination, just create a .git/bigfiles and store them in there.

with the ability to pass the filename to the clean/smudge scripts, you can 
even avoid the copy (replacing it with a mv) and have a working, if 
bare-bones system.
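
A bare-bones Python sketch of such a clean/smudge pair, to make the idea
concrete. Everything here is an assumption for illustration: the pointer
format, the function names, and the use of a SHA-1 of the content as the
file name inside the store directory (for example .git/bigfiles).

```python
import hashlib
from pathlib import Path

# Hypothetical one-line pointer that replaces the big file inside git.
POINTER_PREFIX = b"bigfile-sha1 "


def clean(data: bytes, store: Path) -> bytes:
    """Stash the real content in the store and return a small pointer."""
    digest = hashlib.sha1(data).hexdigest()
    store.mkdir(parents=True, exist_ok=True)
    target = store / digest
    if not target.exists():
        target.write_bytes(data)
    return POINTER_PREFIX + digest.encode("ascii") + b"\n"


def smudge(pointer: bytes, store: Path) -> bytes:
    """Turn a pointer back into the real content on checkout."""
    if not pointer.startswith(POINTER_PREFIX):
        return pointer  # not a pointer: pass the content through unchanged
    digest = pointer[len(POINTER_PREFIX):].strip().decode("ascii")
    return (store / digest).read_bytes()
```

In a real setup these would be small scripts reading stdin and writing
stdout, wired up through the filter.<name>.clean and filter.<name>.smudge
configuration and a filter attribute in .gitattributes. This sketch
copies the data rather than moving it.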

Then people can create/submit enhanced versions of these scripts that 
store the large files elsewhere if they want, but we would be past the 
"git can't handle large files" into "git handles large files less 
efficiently", which is a much better place to be.

If nobody else has time to take those e-mails and create a set of 
clean/smudge scripts, I'll do so later this week (unless there is some 
reason why they wouldn't be acceptable)

I guess the only question is how to tell what files need to be handled 
this way, but can't we have something in .gitattributes about the file 
size? (and if that's a problem for checking files out, have the stored 
file be a sparse file, that way it's large, but doesn't take much space on 
sane filesystems)

David Lang

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Features from GitSurvey 2010
  2011-02-01 21:51           ` Nicolas Pitre
@ 2011-02-02  0:26             ` Shawn Pearce
  2011-02-02  2:11               ` Nicolas Pitre
  2011-02-03 14:38             ` Geert Bosch
  1 sibling, 1 reply; 36+ messages in thread
From: Shawn Pearce @ 2011-02-02  0:26 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Nguyen Thai Ngoc Duy, Jakub Narebski, Jonathan Nieder,
	Dmitry S. Kravtsov, git

On Tue, Feb 1, 2011 at 13:51, Nicolas Pitre <nico@fluxnic.net> wrote:
> On Tue, 1 Feb 2011, Shawn Pearce wrote:
>
>> On Tue, Feb 1, 2011 at 09:11, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
>> > Narrow/Subtree clone is still just an idea, but can pack cache support
>> > be made to support a resumable initial narrow clone too?
>>
>> This would be very hard to do.  We could do cached packs for a popular
>> set of path specifications (e.g. Documentation/ if documentation only
>> editing is common), but once we start getting random requests for path
>> specifications that we cannot predict in advance and pre-pack we'd
>> have to fall back to the normal enumerate code path.
>
> Also... people interested in Narrow clones are likely to be shallow
> clone users too, right?

I think that depends.  Some users might want the full history of the
files they are working on.  Others wouldn't care and just want the tip
revision so they can make changes.  Obviously a shallow clone of depth
1 is very cheap to implement on the server; there really isn't any
caching required.

Probably 50% want full history, 50% want shallow clone.  So I doubt we
can assume that narrow implies shallow and thus is cheap.  :-(

-- 
Shawn.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Features from GitSurvey 2010
  2011-02-02  0:26             ` Shawn Pearce
@ 2011-02-02  2:11               ` Nicolas Pitre
  2011-02-02  2:23                 ` david
  0 siblings, 1 reply; 36+ messages in thread
From: Nicolas Pitre @ 2011-02-02  2:11 UTC (permalink / raw)
  To: Shawn Pearce
  Cc: Nguyen Thai Ngoc Duy, Jakub Narebski, Jonathan Nieder,
	Dmitry S. Kravtsov, git

On Tue, 1 Feb 2011, Shawn Pearce wrote:

> On Tue, Feb 1, 2011 at 13:51, Nicolas Pitre <nico@fluxnic.net> wrote:
> > On Tue, 1 Feb 2011, Shawn Pearce wrote:
> >
> >> On Tue, Feb 1, 2011 at 09:11, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> >> > Narrow/Subtree clone is still just an idea, but can pack cache support
> >> > be made to support a resumable initial narrow clone too?
> >>
> >> This would be very hard to do.  We could do cached packs for a popular
> >> set of path specifications (e.g. Documentation/ if documentation only
> >> editing is common), but once we start getting random requests for path
> >> specifications that we cannot predict in advance and pre-pack we'd
> >> have to fall back to the normal enumerate code path.
> >
> > Also... people interested in Narrow clones are likely to be shallow
> > clone users too, right?
> 
> I think that depends.  Some users might want the full history of the
> files they are working on.  Others wouldn't care and just want the tip
> revision so they can make changes.  Obviously a shallow clone of depth
> 1 is very cheap to implement on the server; there really isn't any
> caching required.
> 
> Probably 50% want full history, 50% want shallow clone.  So I doubt we
> can assume that narrow implies shallow and thus is cheap.  :-(

Let's see what happens when this gets used in the wild.


Nicolas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Features from GitSurvey 2010
  2011-02-02  2:11               ` Nicolas Pitre
@ 2011-02-02  2:23                 ` david
  0 siblings, 0 replies; 36+ messages in thread
From: david @ 2011-02-02  2:23 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski,
	Jonathan Nieder, Dmitry S. Kravtsov, git

On Tue, 1 Feb 2011, Nicolas Pitre wrote:

> On Tue, 1 Feb 2011, Shawn Pearce wrote:
>
>> On Tue, Feb 1, 2011 at 13:51, Nicolas Pitre <nico@fluxnic.net> wrote:
>>> On Tue, 1 Feb 2011, Shawn Pearce wrote:
>>>
>>>> On Tue, Feb 1, 2011 at 09:11, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
>>>>> Narrow/Subtree clone is still just an idea, but can pack cache support
>>>>> be made to support a resumable initial narrow clone too?
>>>>
>>>> This would be very hard to do.  We could do cached packs for a popular
>>>> set of path specifications (e.g. Documentation/ if documentation only
>>>> editing is common), but once we start getting random requests for path
>>>> specifications that we cannot predict in advance and pre-pack we'd
>>>> have to fall back to the normal enumerate code path.
>>>
>>> Also... people interested in Narrow clones are likely to be shallow
>>> clone users too, right?
>>
>> I think that depends.  Some users might want the full history of the
>> files they are working on.  Others wouldn't care and just want the tip
>> revision so they can make changes.  Obviously a shallow clone of depth
>> 1 is very cheap to implement on the server; there really isn't any
>> caching required.
>>
>> Probably 50% want full history, 50% want shallow clone.  So I doubt we
>> can assume that narrow implies shallow and thus is cheap.  :-(
>
> Let's see what happens when this gets used in the wild.

Also, many users may assume that a full clone is very expensive, but for
code-based projects a full clone is usually comparable in size to
downloading a single tarball.

if you have large binary files this changes, but most projects don't.

David Lang

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Tracking empty directories
  2011-02-01 19:03           ` Jakub Narebski
@ 2011-02-02  3:54             ` Nguyen Thai Ngoc Duy
  2011-02-02 12:31               ` Kevin P. Fleming
  0 siblings, 1 reply; 36+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-02-02  3:54 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Jonathan Nieder, Dmitry S. Kravtsov, git, Shawn Pearce

On Wed, Feb 2, 2011 at 2:03 AM, Jakub Narebski <jnareb@gmail.com> wrote:
>> To add, one would use "git update-index --add".
>
> Porcelain version could be "git add -N <directory>", don't you agree?

"git add" is recursive, with or without -N. What I worry about is a user
accidentally running "git add -N <dir>" where <dir> is not empty, which
adds everything in <dir>.

>> The magic disappears when you register a file within that directory;
>> to tell git you want to keep it, one would mkdir and
>> "git update-index --add" again.  Once it's working, we can think about
>> if there is a need for making that last step automatic after all
>> (my guess: "no"). ;-)
>
> Hmmm... could we use a mechanism similar to assume-unchanged to mark a
> directory as explicitly tracked, so that git does not remove it
> when it becomes empty?

I think git-attr suits better, more persistent. Although if you insist
the directory must stay, why not just put a hidden file in there?
-- 
Duy

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Tracking empty directories
  2011-02-02  3:54             ` Nguyen Thai Ngoc Duy
@ 2011-02-02 12:31               ` Kevin P. Fleming
  0 siblings, 0 replies; 36+ messages in thread
From: Kevin P. Fleming @ 2011-02-02 12:31 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy
  Cc: Jakub Narebski, Jonathan Nieder, Dmitry S. Kravtsov, git, Shawn Pearce

On 02/01/2011 09:54 PM, Nguyen Thai Ngoc Duy wrote:
> On Wed, Feb 2, 2011 at 2:03 AM, Jakub Narebski <jnareb@gmail.com> wrote:
>>> To add, one would use "git update-index --add".
>>
>> Porcelain version could be "git add -N <directory>", don't you agree?
>
> "git add" is recursive, with or without -N. What I worry about is a user
> accidentally running "git add -N <dir>" where <dir> is not empty, which
> adds everything in <dir>.
>
>>> The magic disappears when you register a file within that directory;
>>> to tell git you want to keep it, one would mkdir and
>>> "git update-index --add" again.  Once it's working, we can think about
>>> if there is a need for making that last step automatic after all
>>> (my guess: "no"). ;-)
>>
>> Hmmm... could we use a mechanism similar to assume-unchanged to mark a
>> directory as explicitly tracked, so that git does not remove it
>> when it becomes empty?
>
> I think git-attr suits better, more persistent. Although if you insist
> the directory must stay, why not just put a hidden file in there?

That's what I do now... in fact, since the empty directory needs to 
exist in checkouts *and* be empty, adding a .gitignore file with content 
'*' works quite well.
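
A small Python helper along those lines, as a sketch only; the function
name is made up. One deviation from the description above is labelled as
an assumption: the extra "!.gitignore" line keeps the .gitignore file
itself from being matched by '*', so it can be added without "git add -f".

```python
from pathlib import Path


def keep_empty_dir(directory: str) -> Path:
    """Create directory with a .gitignore that hides everything but itself."""
    d = Path(directory)
    d.mkdir(parents=True, exist_ok=True)
    ignore = d / ".gitignore"
    # '*' ignores every file in the directory; '!.gitignore' (an addition
    # to the trick described above) re-includes this file so git tracks it.
    ignore.write_text("*\n!.gitignore\n")
    return ignore
```

After running it, adding and committing the returned .gitignore path is
enough for the directory to appear in every checkout.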

-- 
Kevin P. Fleming
Digium, Inc. | Director of Software Technologies
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
skype: kpfleming | jabber: kfleming@digium.com
Check us out at www.digium.com & www.asterisk.org

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: big files in git was: Re: Features from GitSurvey 2010
  2011-02-01 22:50     ` big files in git was: " david
@ 2011-02-03  6:25       ` Nicolas Pitre
  0 siblings, 0 replies; 36+ messages in thread
From: Nicolas Pitre @ 2011-02-03  6:25 UTC (permalink / raw)
  To: david; +Cc: Jakub Narebski, Jonathan Nieder, Dmitry S. Kravtsov, git

On Tue, 1 Feb 2011, david@lang.hm wrote:

> On Tue, 1 Feb 2011, Jakub Narebski wrote:
> 
> > There is also, supposedly stalled, git-bigfiles project.
> 
> why is the clean/smudge approach that came through the list a week or two ago
> not acceptable?

No idea.

I suppose that's because it is not complicated enough to actually be 
interesting.  This is like my suggestion for simply distributing bundles 
with BitTorrent.

> If nobody else has time to take those e-mails and create a set of clean/smudge
> scripts, I'll do so later this week (unless there is some reason why they
> wouldn't be acceptable)

Please do so.  The contrib directory would be a pretty good place to put 
them.

> I guess the only question is how to tell what files need to be handled this
> way, but can't we have something in .gitattributes about the file size?

Surely.  There is even a core.bigFileThreshold config variable already 
which could be reused right away for this purpose.


Nicolas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Features from GitSurvey 2010
  2011-02-01 21:51           ` Nicolas Pitre
  2011-02-02  0:26             ` Shawn Pearce
@ 2011-02-03 14:38             ` Geert Bosch
  2011-02-03 17:39               ` Narrow clone (Re: features from GitSurvey 2010) Jonathan Nieder
  2011-02-03 21:33               ` Features from GitSurvey 2010 Nicolas Pitre
  1 sibling, 2 replies; 36+ messages in thread
From: Geert Bosch @ 2011-02-03 14:38 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski,
	Jonathan Nieder, Dmitry S. Kravtsov, git


On Feb 1, 2011, at 16:51, Nicolas Pitre wrote:

> Also... people interested in Narrow clones are likely to be shallow 
> clone users too, right?

Not necessarily. Many corporate repositories are huge (caused by
the concept of 1 central repository with everything in it) and have
tons of crud (like marketing materials, media-heavy powerpoint
presentations).  Here you really want a narrow clone (such as the
sources of the project you're working on), but don't mind having
the whole history.

Looking at it from another angle: typically the whole history of a
project is not much bigger than a checkout, so it is fine to have
a deep history. On the other hand, for these monster repositories
one would typically do a narrow clone of only a single subdirectory
that may be more than an order of magnitude smaller.

These narrow clones are especially important for imports of unwieldy
svn repositories where there is a large amount of unstructured
branching.

Regards,
   -Geert


* Narrow clone (Re: features from GitSurvey 2010)
  2011-02-03 14:38             ` Geert Bosch
@ 2011-02-03 17:39               ` Jonathan Nieder
  2011-02-03 21:23                 ` Geert Bosch
  2011-02-03 21:33               ` Features from GitSurvey 2010 Nicolas Pitre
  1 sibling, 1 reply; 36+ messages in thread
From: Jonathan Nieder @ 2011-02-03 17:39 UTC (permalink / raw)
  To: Geert Bosch
  Cc: Nicolas Pitre, Shawn Pearce, Nguyen Thai Ngoc Duy,
	Jakub Narebski, Dmitry S. Kravtsov, git

Geert Bosch wrote:

> These narrow clones are especially important for imports of unwieldy
> svn repositories where there is a large amount of unstructured
> branching.

Wouldn't a more careful import be a better solution to that problem?
Practically speaking, I'd rather work with an enormous svn repo like
that by using git-svn to extract subsets than with a botched import
that treats it as one huge (git-managed) project.

svn-all-fast-import, for example, has a fairly simple configuration
syntax that lets you extract whatever subrepositories are needed.


* Re: Narrow clone (Re: features from GitSurvey 2010)
  2011-02-03 17:39               ` Narrow clone (Re: features from GitSurvey 2010) Jonathan Nieder
@ 2011-02-03 21:23                 ` Geert Bosch
  2011-02-03 21:33                   ` Jonathan Nieder
  2011-02-03 21:38                   ` Jonathan Nieder
  0 siblings, 2 replies; 36+ messages in thread
From: Geert Bosch @ 2011-02-03 21:23 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Nicolas Pitre, Shawn Pearce, Nguyen Thai Ngoc Duy,
	Jakub Narebski, Dmitry S. Kravtsov, git


On Feb 3, 2011, at 12:39, Jonathan Nieder wrote:

> Geert Bosch wrote:
> 
>> These narrow clones are especially important for imports of unwieldy
>> svn repositories where there is a large amount of unstructured
>> branching.
> 
> Wouldn't a more careful import be a better solution to that problem?

Yes.

> Practically speaking, I'd rather work with an enormous svn repo like
> that by using git-svn to extract subsets than with a botched import
> that treats it as one huge (git-managed) project.

Practically speaking, I don't always have a say in the organization
of repositories that I have to work with. Some would rather
spend 30+ days of CPU time to import an entire SVN repository with
branch forests straight into git than consider its organization.
Of course the resulting git repository will be less than useful.
And that's where the narrow clone comes in handy...

  -Geert


* Re: Narrow clone (Re: features from GitSurvey 2010)
  2011-02-03 21:23                 ` Geert Bosch
@ 2011-02-03 21:33                   ` Jonathan Nieder
  2011-02-03 21:38                   ` Jonathan Nieder
  1 sibling, 0 replies; 36+ messages in thread
From: Jonathan Nieder @ 2011-02-03 21:33 UTC (permalink / raw)
  To: Geert Bosch
  Cc: Nicolas Pitre, Shawn Pearce, Nguyen Thai Ngoc Duy,
	Jakub Narebski, Dmitry S. Kravtsov, git

Geert Bosch wrote:

>                                           Some would rather
> spend 30+ days of CPU time to import an entire SVN repository with
> branch forests straight into git than consider its organization.
> Of course the resulting git repository will be less than useful.
> And that's where the narrow clone comes in handy...

Right, narrow clone gives them an excuse to do it.  Ergo we should
not have narrow clone. ;-)


* Re: Features from GitSurvey 2010
  2011-02-03 14:38             ` Geert Bosch
  2011-02-03 17:39               ` Narrow clone (Re: features from GitSurvey 2010) Jonathan Nieder
@ 2011-02-03 21:33               ` Nicolas Pitre
  1 sibling, 0 replies; 36+ messages in thread
From: Nicolas Pitre @ 2011-02-03 21:33 UTC (permalink / raw)
  To: Geert Bosch
  Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski,
	Jonathan Nieder, Dmitry S. Kravtsov, git

On Thu, 3 Feb 2011, Geert Bosch wrote:

> 
> On Feb 1, 2011, at 16:51, Nicolas Pitre wrote:
> 
> > Also... people interested in Narrow clones are likely to be shallow 
> > clone users too, right?
> 
> Not necessarily. Many corporate repositories are huge (caused by
> the concept of 1 central repository with everything in it) and have
> tons of crud (like marketing materials, media-heavy powerpoint
> presentations).  Here you really want a narrow clone (such as the
> sources of the project you're working on), but don't mind having
> the whole history.

OK.  I was asking just to see if the cache pack concept might have to 
cater for that case too.  But let's wait for proper narrow clone support 
first.


Nicolas


* Re: Narrow clone (Re: features from GitSurvey 2010)
  2011-02-03 21:23                 ` Geert Bosch
  2011-02-03 21:33                   ` Jonathan Nieder
@ 2011-02-03 21:38                   ` Jonathan Nieder
  1 sibling, 0 replies; 36+ messages in thread
From: Jonathan Nieder @ 2011-02-03 21:38 UTC (permalink / raw)
  To: Geert Bosch
  Cc: Nicolas Pitre, Shawn Pearce, Nguyen Thai Ngoc Duy,
	Jakub Narebski, Dmitry S. Kravtsov, git

Geert Bosch wrote:

> Of course the resulting git repository will be less than useful.
> And that's where the narrow clone comes in handy...

Side note: while clearly I don't consider this particular use case to
be a strong motivation, there are other use cases that do strongly
motivate the feature.  And if this particular use case happens to
motivate someone to work on it, I'd be happy for it still.  I just
hope the documentation does not encourage people to do such things.


end of thread, other threads:[~2011-02-03 21:38 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
2011-01-29 10:01 Features from GitSurvey 2010 Dmitry S. Kravtsov
2011-01-29 23:13 ` Jonathan Nieder
2011-02-01 13:51   ` Jakub Narebski
2011-02-01 15:52     ` Nguyen Thai Ngoc Duy
2011-02-01 16:33       ` Shawn Pearce
2011-02-01 16:27     ` Shawn Pearce
2011-02-01 17:05       ` Nguyen Thai Ngoc Duy
2011-02-01 21:27         ` Junio C Hamano
2011-02-01 21:44         ` Nicolas Pitre
2011-02-01 17:11       ` Nguyen Thai Ngoc Duy
2011-02-01 17:34         ` Shawn Pearce
2011-02-01 21:51           ` Nicolas Pitre
2011-02-02  0:26             ` Shawn Pearce
2011-02-02  2:11               ` Nicolas Pitre
2011-02-02  2:23                 ` david
2011-02-03 14:38             ` Geert Bosch
2011-02-03 17:39               ` Narrow clone (Re: features from GitSurvey 2010) Jonathan Nieder
2011-02-03 21:23                 ` Geert Bosch
2011-02-03 21:33                   ` Jonathan Nieder
2011-02-03 21:38                   ` Jonathan Nieder
2011-02-03 21:33               ` Features from GitSurvey 2010 Nicolas Pitre
2011-02-01 17:28     ` Tracking empty directories Jonathan Nieder
2011-02-01 17:54       ` Nguyen Thai Ngoc Duy
2011-02-01 18:15         ` Ilari Liusvaara
2011-02-01 18:31           ` Jakub Narebski
2011-02-01 19:09             ` Ilari Liusvaara
2011-02-01 18:35         ` Jonathan Nieder
2011-02-01 19:03           ` Jakub Narebski
2011-02-02  3:54             ` Nguyen Thai Ngoc Duy
2011-02-02 12:31               ` Kevin P. Fleming
2011-02-01 21:36     ` Features from GitSurvey 2010 Nicolas Pitre
2011-02-01 22:50     ` big files in git was: " david
2011-02-03  6:25       ` Nicolas Pitre
2011-02-01 17:44   ` Matthieu Moy
2011-02-01 18:42     ` Jonathan Nieder
2011-02-01 20:23       ` Matthieu Moy
