On Sun, Apr 05, 2009 at 12:17:03PM -0700, Shawn O. Pearce wrote:
> Another option is to use rsync:// for initial clones.
>   git clone rsync://git.gentoo.org/tree.git
> rsync should be more efficient at dragging 1.6GiB over the network,
> as its only streaming the files. But it may fall over if the server
> has a lot of loose objects; many more small files to create.

I just tried this, and ran into a segfault.

Original command:
# git clone rsync://git.overlays.gentoo.org/vcs-public-gitroot/exp/gentoo-x86.git

It looks at a glance like the linked list has a null value it hits
during the internal while loop, not checking 'list' before using
'list->next'.

gdb> bt
#0 strcmp () at ../sysdeps/x86_64/strcmp.S:30
#1 0x000000000049474c in get_refs_via_rsync (transport=, for_push=) at transport.c:123
#2 0x000000000049234c in transport_get_remote_refs (transport=0x725fc9) at transport.c:1045
#3 0x000000000041620a in cmd_clone (argc=, argv=0x7fff908c8550, prefix=) at builtin-clone.c:487
#4 0x0000000000404f59 in handle_internal_command (argc=0x2, argv=0x7fff908c8550) at git.c:244
#5 0x0000000000405167 in main (argc=0x2, argv=0x7fff908c8550) at git.c:434
gdb> up
#1 0x000000000049474c in get_refs_via_rsync (transport=, for_push=) at transport.c:123
123 (cmp = strcmp(buffer + 41,
gdb> print list
$1 = {nr = 0x0, alloc = 0x0, name = 0x0}

If I go into the repo thereafter and manually run git-fetch again, it
does work fine.

> One way around that would be to use two repositories on the server;
> a historical repository that is fully packed and contains the full
> history, and a bleeding edge repository that users would normally
> work against:

Yup, we've been considering similar. We do have one specific need with
that however: to prevent resource abuse, we would like to DENY the
ability to do the initial clone with git:// then - just so that nobody
tries to DoS our servers by doing a couple of hungry initial clones at
once.

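Going back to the segfault: the kind of guard described above (checking
the current node before dereferencing it) could look roughly like the
following. This is only an untested-against-git sketch of a sorted
linked-list insert, not the code in transport.c; struct ref_node and
insert_sorted() are hypothetical names made up for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type for illustration only; not git's own struct. */
struct ref_node {
	char *name;
	struct ref_node *next;
};

/*
 * Insert a copy of `name` into the name-sorted list at *head.
 * Returns 0 on success (or if the name is already present),
 * -1 if memory allocation fails.
 */
int insert_sorted(struct ref_node **head, const char *name)
{
	struct ref_node **pos = head;
	struct ref_node *node;
	size_t len;

	assert(head != NULL && name != NULL);

	/*
	 * Test *pos before touching (*pos)->name, so that an empty
	 * list (or the end of the list) never leads to a NULL
	 * pointer being handed to strcmp().
	 */
	while (*pos && strcmp(name, (*pos)->name) > 0)
		pos = &(*pos)->next;

	/* Already present: nothing to do. */
	if (*pos && strcmp(name, (*pos)->name) == 0)
		return 0;

	node = malloc(sizeof(*node));
	if (!node)
		return -1;
	len = strlen(name) + 1;
	node->name = malloc(len);
	if (!node->name) {
		free(node);
		return -1;
	}
	memcpy(node->name, name, len);

	/* Splice the new node in front of the first larger entry. */
	node->next = *pos;
	*pos = node;
	return 0;
}
```

Calling insert_sorted() on a list whose head is NULL simply creates the
first node, which is the case that the backtrace above suggests is not
handled.
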
> That caching GSoC project may help, but didn't I see earlier in
> this thread that you have >4.8 million objects in your repository?
> Any proposals on that project would still have Git malloc()'ing
> data per object; its ~80 bytes per object needed so that's a data
> segment of 384+ MiB, per concurrent clone client.

384MiB or even 512MiB I can cover. It's the 200+ wallclock minutes of
CPU burn with no download that aren't acceptable.

P.S. The -v output of the rsync-mode git-fetch is almost devoid of
output. Can we maybe pipe the rsync progress back?

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Infra Guy
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85