All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ben Peart <peartben@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: benpeart@microsoft.com, git@vger.kernel.org
Subject: Re: [PATCH v1] add: speed up cmd_add() by utilizing read_cache_preload()
Date: Fri, 2 Nov 2018 12:14:22 -0400	[thread overview]
Message-ID: <3dc46005-016f-e1c3-32ae-2797317aed08@gmail.com> (raw)
In-Reply-To: <xmqqy3abo64r.fsf@gitster-ct.c.googlers.com>



On 11/2/2018 11:23 AM, Junio C Hamano wrote:
> Ben Peart <peartben@gmail.com> writes:
> 
>> From: Ben Peart <benpeart@microsoft.com>
>>
>> During an "add", a call is made to run_diff_files() which calls
>> check_remove() for each index-entry.  The preload_index() code
>> distributes some of the costs across multiple threads.
> 
> Nice.  I peeked around and noticed that we already do this in
> builtin_diff_index() before running run_diff_index() when !cached,
> and builtin_diff_files(), of course.
> 

Thanks for double checking!

>> Because the files checked are restricted to pathspec, adding individual
>> files makes no measurable impact but on a Windows repo with ~200K files,
>> 'git add .' drops from 6.3 seconds to 3.3 seconds for a 47% savings.
> 
> ;-)
> 
>> diff --git a/builtin/add.c b/builtin/add.c
>> index ad49806ebf..f65c172299 100644
>> --- a/builtin/add.c
>> +++ b/builtin/add.c
>> @@ -445,11 +445,6 @@ int cmd_add(int argc, const char **argv, const char *prefix)
>>   		return 0;
>>   	}
>>   
>> -	if (read_cache() < 0)
>> -		die(_("index file corrupt"));
>> -
>> -	die_in_unpopulated_submodule(&the_index, prefix);
>> -
>>   	/*
>>   	 * Check the "pathspec '%s' did not match any files" block
>>   	 * below before enabling new magic.
> 
> It is not explained why this is not a mere s/read_cache/&_preload/
> in the log message.  I can see it is because you wanted to make the
> pathspec available to preload to further cut down the preloaded
> paths, and I do not think it has any unintended (negatie) side
> effect to parse the pathspec before populating the in-core index,
> but that would have been a good thing to mention in the proposed log
> message.
> 

That is correct.  parse_pathspec() was after read_cache() because it 
_used_ to use the index to determine whether a pathspec is in a 
submodule.  That was fixed by Brandon in Aug 2017 when he cleaned up all 
submodule config code so it is safe to move read_cache_preload() after 
the call to parse_pathspec().

How about this for a revised commit message?



During an "add", a call is made to run_diff_files() which calls
check_remove() for each index-entry.  The preload_index() code 
distributes some of the costs across multiple threads.

Limit read_cache_preload() to pathspec, so that it doesn't process more 
entries than are needed and end up slowing things down instead of 
speeding them up.

Because the files checked are restricted to pathspec, adding individual
files makes no measurable impact but on a Windows repo with ~200K files,
'git add .' drops from 6.3 seconds to 3.3 seconds for a 47% savings.



>> @@ -459,6 +454,10 @@ int cmd_add(int argc, const char **argv, const char *prefix)
>>   		       PATHSPEC_SYMLINK_LEADING_PATH,
>>   		       prefix, argv);
>>   
>> +	if (read_cache_preload(&pathspec) < 0)
>> +		die(_("index file corrupt"));
>> +
>> +	die_in_unpopulated_submodule(&the_index, prefix);
>>   	die_path_inside_submodule(&the_index, &pathspec);
>>   
>>   	if (add_new_files) {
>>
>> base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2

  reply	other threads:[~2018-11-02 16:14 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-02 13:30 [PATCH v1] add: speed up cmd_add() by utilizing read_cache_preload() Ben Peart
2018-11-02 15:23 ` Junio C Hamano
2018-11-02 16:14   ` Ben Peart [this message]
2018-11-02 15:49 ` Duy Nguyen
2018-11-03  0:38   ` Junio C Hamano
2018-11-03  4:47     ` Duy Nguyen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3dc46005-016f-e1c3-32ae-2797317aed08@gmail.com \
    --to=peartben@gmail.com \
    --cc=benpeart@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.