All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing
@ 2017-04-11 20:08 git
  2017-04-11 20:08 ` [PATCH v4 1/2] string-list: use ALLOC_GROW macro when reallocing string_list git
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: git @ 2017-04-11 20:08 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Version 4 greatly simplifies the p0005 perf test. It now uses an existing
repo -- either real-world or artificial from t/perf/repos/many-files.sh.

Jeff Hostetler (2):
  string-list: use ALLOC_GROW macro when reallocing string_list
  p0005-status: time status on very large repo

 string-list.c          |  5 +----
 t/perf/p0005-status.sh | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+), 4 deletions(-)
 create mode 100755 t/perf/p0005-status.sh

-- 
2.9.3


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v4 1/2] string-list: use ALLOC_GROW macro when reallocing string_list
  2017-04-11 20:08 [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing git
@ 2017-04-11 20:08 ` git
  2017-04-11 21:29   ` Jonathan Nieder
  2017-04-11 20:08 ` [PATCH v4 2/2] p0005-status: time status on very large repo git
  2017-04-11 20:11 ` [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing Jeff King
  2 siblings, 1 reply; 6+ messages in thread
From: git @ 2017-04-11 20:08 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Use ALLOC_GROW() macro when reallocing a string_list array
rather than simply increasing it by 32.  This is a performance
optimization.

During status on a very large repo and there are many changes,
a significant percentage of the total run time is spent
reallocing the wt_status.changes array.

This change decreases the time in wt_status_collect_changes_worktree()
from 125 seconds to 45 seconds on my very large repository.

This produced a modest gain on my 1M file artificial repo, but
broke even on linux.git.

Test                                            HEAD^^            HEAD
---------------------------------------------------------------------------------------
0005.2: read-tree status br_ballast (1000001)   8.29(5.62+2.62)   8.22(5.57+2.63) -0.8%

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 string-list.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/string-list.c b/string-list.c
index 45016ad..003ca18 100644
--- a/string-list.c
+++ b/string-list.c
@@ -41,10 +41,7 @@ static int add_entry(int insert_at, struct string_list *list, const char *string
 	if (exact_match)
 		return -1 - index;
 
-	if (list->nr + 1 >= list->alloc) {
-		list->alloc += 32;
-		REALLOC_ARRAY(list->items, list->alloc);
-	}
+	ALLOC_GROW(list->items, list->nr+1, list->alloc);
 	if (index < list->nr)
 		memmove(list->items + index + 1, list->items + index,
 				(list->nr - index)
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v4 2/2] p0005-status: time status on very large repo
  2017-04-11 20:08 [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing git
  2017-04-11 20:08 ` [PATCH v4 1/2] string-list: use ALLOC_GROW macro when reallocing string_list git
@ 2017-04-11 20:08 ` git
  2017-04-11 21:35   ` Jonathan Nieder
  2017-04-11 20:11 ` [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing Jeff King
  2 siblings, 1 reply; 6+ messages in thread
From: git @ 2017-04-11 20:08 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 t/perf/p0005-status.sh | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)
 create mode 100755 t/perf/p0005-status.sh

diff --git a/t/perf/p0005-status.sh b/t/perf/p0005-status.sh
new file mode 100755
index 0000000..0469aee
--- /dev/null
+++ b/t/perf/p0005-status.sh
@@ -0,0 +1,51 @@
+#!/bin/sh
+##
+## This test measures the performance of various read-tree
+## and status operations.  It is primarily interested in
+## the algorithmic costs of index operations and recursive
+## tree traversal -- and NOT disk I/O on thousands of files.
+
+test_description="Tests performance of read-tree"
+
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+## If the test repo was generated by ./repos/many-files.sh
+## then we know something about the data shape and branches,
+## so we can isolate testing to the ballast-related commits
+## and setup sparse-checkout so we don't have to populate
+## the ballast files and directories.
+##
+## Otherwise, we make some general assumptions about the
+## repo and consider the entire history of the current
+## branch to be the ballast.
+
+git branch | grep p0006-ballast >/dev/null 2>&1
+synthetic=$?
+if test "$synthetic" = 0
+then
+    echo Assuming synthetic repo from many-files.sh
+    git branch br_base            master
+    git branch br_ballast         p0006-ballast
+    echo '/*'          >.git/info/sparse-checkout
+    echo '!ballast/*' >>.git/info/sparse-checkout
+    git config --local core.sparsecheckout 1
+else
+    echo Assuming non-synthetic repo...
+    git branch br_base            $(git rev-list HEAD | tail -n 1)
+    git branch br_ballast         HEAD
+fi
+
+test_expect_success 'setup' '
+	git checkout -q br_ballast
+'
+
+nr_files=$(git ls-files | wc -l)
+
+test_perf "read-tree status br_ballast ($nr_files)" '
+	git read-tree HEAD &&
+	git status
+'
+
+test_done
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing
  2017-04-11 20:08 [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing git
  2017-04-11 20:08 ` [PATCH v4 1/2] string-list: use ALLOC_GROW macro when reallocing string_list git
  2017-04-11 20:08 ` [PATCH v4 2/2] p0005-status: time status on very large repo git
@ 2017-04-11 20:11 ` Jeff King
  2 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2017-04-11 20:11 UTC (permalink / raw)
  To: git; +Cc: git, gitster, Jeff Hostetler

On Tue, Apr 11, 2017 at 08:08:00PM +0000, git@jeffhostetler.com wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
> 
> Version 4 greatly simplifies the p0005 perf test. It now uses an existing
> repo -- either real-world or artificial from t/perf/repos/many-files.sh.
> 
> Jeff Hostetler (2):
>   string-list: use ALLOC_GROW macro when reallocing string_list
>   p0005-status: time status on very large repo

Looks fine, though we may want to re-order when applying to introduce
p0005 before we show its output in the commit message of the other.
commit.

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4 1/2] string-list: use ALLOC_GROW macro when reallocing string_list
  2017-04-11 20:08 ` [PATCH v4 1/2] string-list: use ALLOC_GROW macro when reallocing string_list git
@ 2017-04-11 21:29   ` Jonathan Nieder
  0 siblings, 0 replies; 6+ messages in thread
From: Jonathan Nieder @ 2017-04-11 21:29 UTC (permalink / raw)
  To: git; +Cc: git, gitster, peff, Jeff Hostetler

git@jeffhostetler.com wrote:

> During status on a very large repo and there are many changes,
> a significant percentage of the total run time is spent
> reallocing the wt_status.changes array.
>
> This change decreases the time in wt_status_collect_changes_worktree()
> from 125 seconds to 45 seconds on my very large repository.
>
> This produced a modest gain on my 1M file artificial repo, but
> broke even on linux.git.
>
> Test                                            HEAD^^            HEAD
> ---------------------------------------------------------------------------------------
> 0005.2: read-tree status br_ballast (1000001)   8.29(5.62+2.62)   8.22(5.57+2.63) -0.8%
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  string-list.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)

Nice.  After rebasing (or just squashing together) with the p0005
patch as peff suggested,

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4 2/2] p0005-status: time status on very large repo
  2017-04-11 20:08 ` [PATCH v4 2/2] p0005-status: time status on very large repo git
@ 2017-04-11 21:35   ` Jonathan Nieder
  0 siblings, 0 replies; 6+ messages in thread
From: Jonathan Nieder @ 2017-04-11 21:35 UTC (permalink / raw)
  To: git; +Cc: git, gitster, peff, Jeff Hostetler

Hi,

Jeff Hostetler <jeffhost@microsoft.com> wrote:

> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>

Usually the commit message is a place to put some of the motivation
behind the patch or why I should want to apply it.  In this example,
everything is answered by the previous patch, which suggests that it
would be easier to understand if they were squashed into a single
patch.

[...]
> +++ b/t/perf/p0005-status.sh
> @@ -0,0 +1,51 @@
> +#!/bin/sh
> +##
> +## This test measures the performance of various read-tree
> +## and status operations.  It is primarily interested in
> +## the algorithmic costs of index operations and recursive
> +## tree traversal -- and NOT disk I/O on thousands of files.

Nit: git usually uses a single # to start comments in shell scripts
(and likewise below).

[...]
> +## If the test repo was generated by ./repos/many-files.sh
> +## then we know something about the data shape and branches,
> +## so we can isolate testing to the ballast-related commits
> +## and setup sparse-checkout so we don't have to populate
> +## the ballast files and directories.
> +##
> +## Otherwise, we make some general assumptions about the
> +## repo and consider the entire history of the current
> +## branch to be the ballast.
> +
> +git branch | grep p0006-ballast >/dev/null 2>&1
> +synthetic=$?
> +if test "$synthetic" = 0

nit: no need to run a command and read $? when using the
command directly from if would do:

	if git branch | grep p0006-ballast

This can be simplified further by using a lower-level command
for the test:

	if git rev-parse --verify refs/heads/p0006-ballast^{commit}

> +then
> +    echo Assuming synthetic repo from many-files.sh
> +    git branch br_base            master
> +    git branch br_ballast         p0006-ballast
> +    echo '/*'          >.git/info/sparse-checkout
> +    echo '!ballast/*' >>.git/info/sparse-checkout
> +    git config --local core.sparsecheckout 1
> +else
> +    echo Assuming non-synthetic repo...
> +    git branch br_base            $(git rev-list HEAD | tail -n 1)
> +    git branch br_ballast         HEAD
> +fi

This kind of setup instructions usually go in a test_expect_success
block so that the perf harness can detect whether they failed.

> +
> +test_expect_success 'setup' '
> +	git checkout -q br_ballast

nit: no need to use "-q" --- most output from test_expect_success
is suppressed unless the script is run with "-v", at which point
it becomes useful for debugging.

> +'
> +
> +nr_files=$(git ls-files | wc -l)

This also can go in the test_expect_success block.

> +
> +test_perf "read-tree status br_ballast ($nr_files)" '
> +	git read-tree HEAD &&
> +	git status
> +'

Looks nice and simple.  Thanks for writing it.

Hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-04-11 21:35 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-11 20:08 [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing git
2017-04-11 20:08 ` [PATCH v4 1/2] string-list: use ALLOC_GROW macro when reallocing string_list git
2017-04-11 21:29   ` Jonathan Nieder
2017-04-11 20:08 ` [PATCH v4 2/2] p0005-status: time status on very large repo git
2017-04-11 21:35   ` Jonathan Nieder
2017-04-11 20:11 ` [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.