From: Duy Nguyen <pclouds@gmail.com>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: David Turner <dturner@twopensource.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 18/19] index-helper: autorun
Date: Fri, 18 Mar 2016 14:44:41 +0700
Message-ID: <CACsJy8Amdr-2WqwYjYjyaag0jR_pq=h36QFKMk3BYQmE_A-DOw@mail.gmail.com>
In-Reply-To: <alpine.DEB.2.20.1603180752540.4690@virtualbox>

On Fri, Mar 18, 2016 at 2:14 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi Duy,
>
> On Fri, 18 Mar 2016, Duy Nguyen wrote:
>
>> On Thu, Mar 17, 2016 at 9:43 PM, Johannes Schindelin
>> <Johannes.Schindelin@gmx.de> wrote:
>>
>> > I know of use cases where the index weighs 300MB, and falling back to
>> > reading it directly *really* hurts.
>>
>> For crying out loud, what do you store in that repo? What I have in
>> mind for all this work are indexes in the 10MB range, or maybe 50MB max.
>
> Welcome to the real world.
>
>> Very unscientifically, git.git index is about 274kb and contains ~3000
>> entries, so 94 bytes per entry on average.
>
> In terms of software projects' size, git.git is but a toy. Most developers
> deal with vastly larger (and often messier) repositories. This is
> especially true outside Open Source. Even the Linux kernel's repository is
> *tiny* compared to real-world repositories.
>
> I am sure that David could tell many a tale about repository/working
> directory size, too.

I know a few real-world repos, decades old, but I don't remember if
any of them reached this size. Did I get that 3 million entry number
right? Because even my whole /home uses over 1m inodes, and the whole
of /usr only has 300k files. If the number of entries is lower, maybe
there's some improvement we can make to reduce index size a bit.
There's some fast compression we can do, for starters.
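
For reference, the back-of-the-envelope arithmetic behind that 3 million
figure can be sketched as below. This is only a rough estimate under
stated assumptions: "kb" and "MB" are read as binary units, git.git is
taken to have exactly 3000 entries, and index size is assumed to scale
linearly with entry count (ignoring path-length differences and index
extensions). The helper names are made up for illustration.

```python
# Rough estimate only; the input figures are the approximate ones quoted
# in this thread, and the linear-scaling model is an assumption.
KIB = 1024
MIB = 1024 * KIB


def bytes_per_entry(index_bytes, entries):
    """Average on-disk bytes per index entry."""
    return index_bytes / entries


def estimated_entries(index_bytes, per_entry):
    """Entry count implied by an index size, assuming linear scaling."""
    return round(index_bytes / per_entry)


# git.git: ~274kb index, ~3000 entries -> about 94 bytes per entry
per_entry = round(bytes_per_entry(274 * KIB, 3000))

if __name__ == "__main__":
    print("bytes per entry:", per_entry)
    for size_mib in (10, 50, 300):
        print(size_mib, "MiB ->", estimated_entries(size_mib * MIB, per_entry), "entries")
```

Under these assumptions a 300MB index works out to a bit over 3 million
entries, and a 50MB index to somewhere around half a million.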

> So yeah, this is the challenge: to make Git work at real-world scale
> (didn't we hear a lot about this at the latest Git Merge?)

I'm all for making Junio cry by using Git for what it is (or was) not
intended for, but this seems too much. A repo of about 500k files or
fewer I think I can deal with, but not those in the million range.
-- 
Duy
