All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ben Peart <peartben@gmail.com>
To: Stefan Beller <sbeller@google.com>, Ben Peart <Ben.Peart@microsoft.com>
Cc: git <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v1] read-cache: speed up index load through parallelization
Date: Thu, 23 Aug 2018 15:44:58 -0400	[thread overview]
Message-ID: <12b29bbf-4565-fe0e-97d4-19da2b4ddf5e@gmail.com> (raw)
In-Reply-To: <CAGZ79kbXfPPvcQ1rnUdiOqWs5wC2qccGCnf8DvCVnp8QV126MA@mail.gmail.com>



On 8/23/2018 1:31 PM, Stefan Beller wrote:
> On Thu, Aug 23, 2018 at 8:45 AM Ben Peart <Ben.Peart@microsoft.com> wrote:
>>
>> This patch helps address the CPU cost of loading the index by creating
>> multiple threads to divide the work of loading and converting the cache
>> entries across all available CPU cores.
>>
>> It accomplishes this by having the primary thread loop across the index file
>> tracking the offset and (for V4 indexes) expanding the name. It creates a
>> thread to process each block of entries as it comes to them. Once the
>> threads are complete and the cache entries are loaded, the rest of the
>> extensions can be loaded and processed normally on the primary thread.
>>
>> Performance impact:
>>
>> read cache .git/index times on a synthetic repo with:
>>
>> 100,000 entries
>> FALSE       TRUE        Savings     %Savings
>> 0.014798767 0.009580433 0.005218333 35.26%
>>
>> 1,000,000 entries
>> FALSE       TRUE        Savings     %Savings
>> 0.240896533 0.1751243   0.065772233 27.30%
>>
>> read cache .git/index times on an actual repo with:
>>
>> ~3M entries
>> FALSE       TRUE        Savings     %Savings
>> 0.59898098  0.4513169   0.14766408  24.65%
>>
>> Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
>> ---
>>
>> Notes:
>>      Base Ref: master
>>      Web-Diff: https://github.com/benpeart/git/commit/67a700419b
>>      Checkout: git fetch https://github.com/benpeart/git read-index-multithread-v1 && git checkout 67a700419b
>>
>>   Documentation/config.txt |   8 ++
>>   config.c                 |  13 +++
>>   config.h                 |   1 +
>>   read-cache.c             | 218 ++++++++++++++++++++++++++++++++++-----
>>   4 files changed, 216 insertions(+), 24 deletions(-)
>>
>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> index 1c42364988..3344685cc4 100644
>> --- a/Documentation/config.txt
>> +++ b/Documentation/config.txt
>> @@ -899,6 +899,14 @@ relatively high IO latencies.  When enabled, Git will do the
>>   index comparison to the filesystem data in parallel, allowing
>>   overlapping IO's.  Defaults to true.
>>
>> +core.fastIndex::
>> +       Enable parallel index loading
>> ++
>> +This can speed up operations like 'git diff' and 'git status' especially
>> +when the index is very large.  When enabled, Git will do the index
>> +loading from the on disk format to the in-memory format in parallel.
>> +Defaults to true.
> 
> "fast" is a non-descriptive word as we try to be fast in any operation?
> Maybe core.parallelIndexReading as that just describes what it
> turns on/off, without second guessing its effects?
> (Are there still computers with just a single CPU, where this would not
> make it faster? ;-))
> 

How about core.parallelReadIndex?  Slightly shorter and matches the 
function names better.

> 
>> +int git_config_get_fast_index(void)
>> +{
>> +       int val;
>> +
>> +       if (!git_config_get_maybe_bool("core.fastindex", &val))
>> +               return val;
>> +
>> +       if (getenv("GIT_FASTINDEX_TEST"))
>> +               return 1;
> 
> We look at this env value just before calling this function,
> can be write it to only look at the evn variable once?
> 

Sure, I didn't like the fact that it was called twice but didn't get 
around to cleaning it up.

>> +++ b/config.h
>> @@ -250,6 +250,7 @@ extern int git_config_get_untracked_cache(void);
>>   extern int git_config_get_split_index(void);
>>   extern int git_config_get_max_percent_split_change(void);
>>   extern int git_config_get_fsmonitor(void);
>> +extern int git_config_get_fast_index(void);
> 
> Oh. nd/no-extern did not cover config.h
> 
> 
>>
>> +#ifndef min
>> +#define min(a,b) (((a) < (b)) ? (a) : (b))
>> +#endif
> 
> We do not have a minimum function in the tree,
> except for xdiff/xmacros.h:29: XDL_MIN.
> I wonder what the rationale is for not having a MIN()
> definition, I think we discussed that on the list a couple
> times but the rationale escaped me.
> 
> If we introduce a min/max macro, can we put it somewhere
> more prominent? (I would find it useful elsewhere)
>

I'll avoid that particular rabbit hole and just remove the min macro 
definition.  ;-)

>> +/*
>> +* Mostly randomly chosen maximum thread counts: we
>> +* cap the parallelism to online_cpus() threads, and we want
>> +* to have at least 7500 cache entries per thread for it to
>> +* be worth starting a thread.
>> +*/
>> +#define THREAD_COST            (7500)
> 
> This reads very similar to preload-index.c THREAD_COST
> 
>> +       /* loop through index entries starting a thread for every thread_nr entries */
>> +       consumed = thread = 0;
>> +       for (i = 0; ; i++) {
>> +               struct ondisk_cache_entry *ondisk;
>> +               const char *name;
>> +               unsigned int flags;
>> +
>> +               /* we've reached the begining of a block of cache entries, kick off a thread to process them */
> 
> beginning
> 

Thanks

> Thanks,
> Stefan
> 

  reply	other threads:[~2018-08-23 19:45 UTC|newest]

Thread overview: 199+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-23 15:41 [PATCH v1] read-cache: speed up index load through parallelization Ben Peart
2018-08-23 17:31 ` Stefan Beller
2018-08-23 19:44   ` Ben Peart [this message]
2018-08-24 18:40   ` Duy Nguyen
2018-08-28 14:53     ` Ben Peart
2018-08-23 18:06 ` Junio C Hamano
2018-08-23 20:33   ` Ben Peart
2018-08-24 15:37     ` Duy Nguyen
2018-08-24 15:57       ` Duy Nguyen
2018-08-24 17:28         ` Ben Peart
2018-08-25  6:44         ` [PATCH] read-cache.c: optimize reading index format v4 Nguyễn Thái Ngọc Duy
2018-08-27 19:36           ` Junio C Hamano
2018-08-28 19:25             ` Duy Nguyen
2018-08-28 23:54               ` Ben Peart
2018-08-29 17:14               ` Junio C Hamano
2018-09-04 16:08             ` Duy Nguyen
2018-09-02 13:19           ` [PATCH v2 0/1] " Nguyễn Thái Ngọc Duy
2018-09-02 13:19             ` [PATCH v2 1/1] read-cache.c: " Nguyễn Thái Ngọc Duy
2018-09-04 18:58               ` Junio C Hamano
2018-09-04 19:31               ` Junio C Hamano
2018-08-24 18:20       ` [PATCH v1] read-cache: speed up index load through parallelization Duy Nguyen
2018-08-24 18:40         ` Ben Peart
2018-08-24 19:00           ` Duy Nguyen
2018-08-24 19:57             ` Ben Peart
2018-08-29 15:25 ` [PATCH v2 0/3] " Ben Peart
2018-08-29 15:25   ` [PATCH v2 1/3] " Ben Peart
2018-08-29 17:14     ` Junio C Hamano
2018-08-29 21:35       ` Ben Peart
2018-09-03 19:16     ` Duy Nguyen
2018-08-29 15:25   ` [PATCH v2 2/3] read-cache: load cache extensions on worker thread Ben Peart
2018-08-29 17:12     ` Junio C Hamano
2018-08-29 21:42       ` Ben Peart
2018-08-29 22:19         ` Junio C Hamano
2018-09-03 19:21     ` Duy Nguyen
2018-09-03 19:27       ` Duy Nguyen
2018-08-29 15:25   ` [PATCH v2 3/3] read-cache: micro-optimize expand_name_field() to speed up V4 index parsing Ben Peart
2018-09-06 21:03 ` [PATCH v3 0/4] read-cache: speed up index load through parallelization Ben Peart
2018-09-06 21:03   ` [PATCH v3 1/4] read-cache: optimize expand_name_field() to speed up V4 index parsing Ben Peart
2018-09-06 21:03   ` [PATCH v3 2/4] eoie: add End of Index Entry (EOIE) extension Ben Peart
2018-09-07 17:55     ` Junio C Hamano
2018-09-07 20:23       ` Ben Peart
2018-09-08  6:29         ` Martin Ågren
2018-09-08 14:03           ` Ben Peart
2018-09-08 17:08             ` Martin Ågren
2018-09-06 21:03   ` [PATCH v3 3/4] read-cache: load cache extensions on a worker thread Ben Peart
2018-09-07 21:10     ` Junio C Hamano
2018-09-08 14:56       ` Ben Peart
2018-09-06 21:03   ` [PATCH v3 4/4] read-cache: speed up index load through parallelization Ben Peart
2018-09-07  4:16     ` Torsten Bögershausen
2018-09-07 13:43       ` Ben Peart
2018-09-07 17:21   ` [PATCH v3 0/4] " Junio C Hamano
2018-09-07 18:31     ` Ben Peart
2018-09-08 13:18     ` Duy Nguyen
2018-09-11 23:26 ` [PATCH v4 0/5] " Ben Peart
2018-09-11 23:26   ` [PATCH v4 1/5] eoie: add End of Index Entry (EOIE) extension Ben Peart
2018-09-11 23:26   ` [PATCH v4 2/5] read-cache: load cache extensions on a worker thread Ben Peart
2018-09-11 23:26   ` [PATCH v4 3/5] read-cache: speed up index load through parallelization Ben Peart
2018-09-11 23:26   ` [PATCH v4 4/5] read-cache.c: optimize reading index format v4 Ben Peart
2018-09-11 23:26   ` [PATCH v4 5/5] read-cache: clean up casting and byte decoding Ben Peart
2018-09-12 14:34   ` [PATCH v4 0/5] read-cache: speed up index load through parallelization Ben Peart
2018-09-12 16:18 ` [PATCH v5 " Ben Peart
2018-09-12 16:18   ` [PATCH v5 1/5] eoie: add End of Index Entry (EOIE) extension Ben Peart
2018-09-13 22:44     ` Junio C Hamano
2018-09-15 10:02     ` Duy Nguyen
2018-09-17 14:54       ` Ben Peart
2018-09-17 16:05         ` Duy Nguyen
2018-09-17 17:31           ` Junio C Hamano
2018-09-17 17:38             ` Duy Nguyen
2018-09-17 19:08               ` Junio C Hamano
2018-09-12 16:18   ` [PATCH v5 2/5] read-cache: load cache extensions on a worker thread Ben Peart
2018-09-15 10:22     ` Duy Nguyen
2018-09-15 10:24       ` Duy Nguyen
2018-09-17 16:38         ` Ben Peart
2018-09-15 16:23       ` Duy Nguyen
2018-09-17 17:19         ` Junio C Hamano
2018-09-17 16:26       ` Ben Peart
2018-09-17 16:45         ` Duy Nguyen
2018-09-17 21:32       ` Junio C Hamano
2018-09-12 16:18   ` [PATCH v5 3/5] read-cache: load cache entries on worker threads Ben Peart
2018-09-15 10:31     ` Duy Nguyen
2018-09-17 17:25       ` Ben Peart
2018-09-15 11:07     ` Duy Nguyen
2018-09-15 11:09       ` Duy Nguyen
2018-09-17 18:52         ` Ben Peart
2018-09-15 11:29     ` Duy Nguyen
2018-09-12 16:18   ` [PATCH v5 4/5] read-cache.c: optimize reading index format v4 Ben Peart
2018-09-12 16:18   ` [PATCH v5 5/5] read-cache: clean up casting and byte decoding Ben Peart
2018-09-26 19:54 ` [PATCH v6 0/7] speed up index load through parallelization Ben Peart
2018-09-26 19:54   ` [PATCH v6 1/7] read-cache.c: optimize reading index format v4 Ben Peart
2018-09-26 19:54   ` [PATCH v6 2/7] read-cache: clean up casting and byte decoding Ben Peart
2018-09-26 19:54   ` [PATCH v6 3/7] eoie: add End of Index Entry (EOIE) extension Ben Peart
2018-09-28  0:19     ` SZEDER Gábor
2018-09-28 18:38       ` Ben Peart
2018-09-29  0:51     ` SZEDER Gábor
2018-09-29  5:45     ` Duy Nguyen
2018-09-29 18:24       ` Junio C Hamano
2018-09-26 19:54   ` [PATCH v6 4/7] config: add new index.threads config setting Ben Peart
2018-09-28  0:26     ` SZEDER Gábor
2018-09-28 13:39       ` Ben Peart
2018-09-28 17:07         ` Junio C Hamano
2018-09-28 19:41           ` Ben Peart
2018-09-28 20:30             ` Ramsay Jones
2018-09-28 22:15               ` Junio C Hamano
2018-10-01 13:17                 ` Ben Peart
2018-10-01 15:06                   ` SZEDER Gábor
2018-09-26 19:54   ` [PATCH v6 5/7] read-cache: load cache extensions on a worker thread Ben Peart
2018-09-26 19:54   ` [PATCH v6 6/7] ieot: add Index Entry Offset Table (IEOT) extension Ben Peart
2018-09-26 19:54   ` [PATCH v6 7/7] read-cache: load cache entries on worker threads Ben Peart
2018-09-26 22:06   ` [PATCH v6 0/7] speed up index load through parallelization Junio C Hamano
2018-09-27 17:13   ` Duy Nguyen
2018-10-01 13:45 ` [PATCH v7 " Ben Peart
2018-10-01 13:45   ` [PATCH v7 1/7] read-cache.c: optimize reading index format v4 Ben Peart
2018-10-01 13:45   ` [PATCH v7 2/7] read-cache: clean up casting and byte decoding Ben Peart
2018-10-01 15:10     ` Duy Nguyen
2018-10-01 13:45   ` [PATCH v7 3/7] eoie: add End of Index Entry (EOIE) extension Ben Peart
2018-10-01 15:17     ` SZEDER Gábor
2018-10-02 14:34       ` Ben Peart
2018-10-01 15:30     ` Duy Nguyen
2018-10-02 15:13       ` Ben Peart
2018-10-01 13:45   ` [PATCH v7 4/7] config: add new index.threads config setting Ben Peart
2018-10-01 13:45   ` [PATCH v7 5/7] read-cache: load cache extensions on a worker thread Ben Peart
2018-10-01 15:50     ` Duy Nguyen
2018-10-02 15:00       ` Ben Peart
2018-10-01 13:45   ` [PATCH v7 6/7] ieot: add Index Entry Offset Table (IEOT) extension Ben Peart
2018-10-01 16:27     ` Duy Nguyen
2018-10-02 16:34       ` Ben Peart
2018-10-02 17:02         ` Duy Nguyen
2018-10-01 13:45   ` [PATCH v7 7/7] read-cache: load cache entries on worker threads Ben Peart
2018-10-01 17:09     ` Duy Nguyen
2018-10-02 19:09       ` Ben Peart
2018-10-10 15:59 ` [PATCH v8 0/7] speed up index load through parallelization Ben Peart
2018-10-10 15:59   ` [PATCH v8 1/7] read-cache.c: optimize reading index format v4 Ben Peart
2018-10-10 15:59   ` [PATCH v8 2/7] read-cache: clean up casting and byte decoding Ben Peart
2018-10-10 15:59   ` [PATCH v8 3/7] eoie: add End of Index Entry (EOIE) extension Ben Peart
2018-10-10 15:59   ` [PATCH v8 4/7] config: add new index.threads config setting Ben Peart
2018-10-10 15:59   ` [PATCH v8 5/7] read-cache: load cache extensions on a worker thread Ben Peart
2018-10-10 15:59   ` [PATCH v8 6/7] ieot: add Index Entry Offset Table (IEOT) extension Ben Peart
2018-10-10 15:59   ` [PATCH v8 7/7] read-cache: load cache entries on worker threads Ben Peart
2018-10-19 16:11     ` Jeff King
2018-10-22  2:14       ` Junio C Hamano
2018-10-22 14:40         ` Ben Peart
2018-10-12  3:18   ` [PATCH v8 0/7] speed up index load through parallelization Junio C Hamano
2018-10-14 12:28   ` Duy Nguyen
2018-10-15 17:33     ` Ben Peart
2018-11-13  0:38   ` [PATCH 0/3] Avoid confusing messages from new index extensions (Re: [PATCH v8 0/7] speed up index load through parallelization) Jonathan Nieder
2018-11-13  0:39     ` [PATCH 1/3] eoie: default to not writing EOIE section Jonathan Nieder
2018-11-13  1:05       ` Junio C Hamano
2018-11-13 15:14         ` Ben Peart
2018-11-13 18:25           ` Jonathan Nieder
2018-11-14  1:36             ` Junio C Hamano
2018-11-15  0:19               ` Jonathan Nieder
2018-11-13  0:39     ` [PATCH 2/3] ieot: default to not writing IEOT section Jonathan Nieder
2018-11-13  0:58       ` Jonathan Tan
2018-11-13  1:09       ` Junio C Hamano
2018-11-13  1:12         ` Jonathan Nieder
2018-11-13 15:37           ` Duy Nguyen
2018-11-13 18:09         ` Jonathan Nieder
2018-11-13 15:22       ` Ben Peart
2018-11-13 18:18         ` Jonathan Nieder
2018-11-13 19:15           ` Ben Peart
2018-11-13 21:08             ` Jonathan Nieder
2018-11-14 18:09               ` Ben Peart
2018-11-15  0:05                 ` Jonathan Nieder
2018-11-14  3:05         ` Junio C Hamano
2018-11-20  6:09           ` [PATCH v2 0/5] Avoid confusing messages from new index extensions Jonathan Nieder
2018-11-20  6:11             ` [PATCH 1/5] eoie: default to not writing EOIE section Jonathan Nieder
2018-11-20 13:06               ` Ben Peart
2018-11-20 13:21                 ` SZEDER Gábor
2018-11-21 16:46                   ` Jeff King
2018-11-22  0:47                     ` Junio C Hamano
2018-11-20 15:01               ` Ben Peart
2018-11-20  6:12             ` [PATCH 2/5] ieot: default to not writing IEOT section Jonathan Nieder
2018-11-20 13:07               ` Ben Peart
2018-11-26 19:59                 ` Stefan Beller
2018-11-26 21:47                   ` Ben Peart
2018-11-26 22:02                     ` Stefan Beller
2018-11-27  0:50                   ` Junio C Hamano
2018-11-20  6:12             ` [PATCH 3/5] index: do not warn about unrecognized extensions Jonathan Nieder
2018-11-20  6:14             ` [PATCH 4/5] index: make index.threads=true enable ieot and eoie Jonathan Nieder
2018-11-20 13:24               ` Ben Peart
2018-11-20  6:15             ` [PATCH 5/5] index: offer advice for unknown index extensions Jonathan Nieder
2018-11-20  9:26               ` Ævar Arnfjörð Bjarmason
2018-11-20 13:30                 ` Ben Peart
2018-11-21  0:22                   ` Junio C Hamano
2018-11-21  0:39                     ` Jonathan Nieder
2018-11-21  0:44                       ` Jonathan Nieder
2018-11-21  5:01                       ` Junio C Hamano
2018-11-21  5:04                         ` Jonathan Nieder
2018-11-21  5:15                         ` Junio C Hamano
2018-11-21  5:31                           ` Junio C Hamano
2018-11-21  1:03                     ` Jonathan Nieder
2018-11-21  4:23                       ` Junio C Hamano
2018-11-21  4:57                         ` Jonathan Nieder
2018-11-21  9:30                       ` Ævar Arnfjörð Bjarmason
2018-11-13  0:40     ` [PATCH 3/3] index: do not warn about unrecognized extensions Jonathan Nieder
2018-11-13  1:10       ` Junio C Hamano
2018-11-13 15:25       ` Ben Peart
2018-11-14  3:24       ` Junio C Hamano
2018-11-14 18:19         ` Ben Peart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=12b29bbf-4565-fe0e-97d4-19da2b4ddf5e@gmail.com \
    --to=peartben@gmail.com \
    --cc=Ben.Peart@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.