All of lore.kernel.org
 help / color / mirror / Atom feed
* [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
@ 2015-04-17  7:30 Stefan Saasen
  2015-04-17 14:03 ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Saasen @ 2015-04-17  7:30 UTC (permalink / raw)
  To: Git Mailing List, Jeff King

We became aware of slow merge times with the following setup:

The merge is created in a temporary location that uses alternates. The
temporary repository is on a local disk, the alternate object database
on an NFS mount.

After some investigation we believe that #33d4221 (present in git
2.2.0, absent in 2.1.4) is causing this regression in merge time.

The following are merge times (in seconds) with git@33d4221~
(2.1.2-393-gabcb865) (before the change)

      Elapsed         System            User
 Min.   :0.3700   Min.   :0.04000   Min.   :0.3000
 1st Qu.:0.3800   1st Qu.:0.05000   1st Qu.:0.3100
 Median :0.4000   Median :0.06000   Median :0.3300
 Mean   :0.4295   Mean   :0.05905   Mean   :0.3519
 3rd Qu.:0.4600   3rd Qu.:0.07000   3rd Qu.:0.3600
 Max.   :0.5900   Max.   :0.09000   Max.   :0.4900


The following are merge times with git@33d4221 (2.1.2-394-g33d4221):

      Elapsed         System            User
 Min.   : 8.58   Min.   :1.46   Min.   :0.4400
 1st Qu.: 9.63   1st Qu.:1.60   1st Qu.:0.4400
 Median :10.64   Median :1.66   Median :0.4800
 Mean   :10.50   Mean   :1.69   Mean   :0.4986
 3rd Qu.:11.13   3rd Qu.:1.81   3rd Qu.:0.5000
 Max.   :13.96   Max.   :2.05   Max.   :0.6700


As you can see the merge times are an order of magnitude slower after
the change.

The effect of  #33d4221 can be seen when strace'ing the merge:

Running strace on git@33d4221 yields
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 70.79    0.714852         178      4018           utime
 14.73    0.148789           3     50141     50123 lstat
 13.88    0.140198          17      8074      8067 access
  0.24    0.002455         614         4           rename
  0.15    0.001493           3       577           write
  0.06    0.000618          10        65           close
  0.04    0.000453           3       152           brk
  0.04    0.000433          27        16           mkdir
  0.03    0.000310           8        41           fstat


Running strace on git@33d4221~ yields

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.37    0.138516           3     50141     50123 lstat
  0.92    0.001292           2       577           write
  0.37    0.000520          14        38        31 access
  0.18    0.000252          36         7           getcwd
  0.17    0.000237           7        36        20 stat
  0.00    0.000000           0        40           read
  0.00    0.000000           0        87        30 open


My current hypothesis is that the additional `access`, but more
importantly the additional `utime` calls are responsible in the
increased merge times that we see.
NFS stats on the server for the tests seem to confirm this (see
nfsstat-{after,before}-change.txt on
https://bitbucket.org/snippets/ssaasen/oend).
This is certainly due to the fact that this will all happen over NFS
but in 2.1.4 this worked fine and starting with 2.2 this has become
very slow.

Looking at the detailed strace shows that utime will be called
repeatedly in same cases (e.g.
https://bitbucket.org/snippets/ssaasen/oend shows an example where the
same packfile will be updated more than 4000 times in a single merge).

http://www.spinics.net/lists/git/msg240106.html discusses a potential
improvement for this case. Would that be an acceptable avenue to
improve this situation?

Best regards,
Stefan Saasen

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
  2015-04-17  7:30 [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects Stefan Saasen
@ 2015-04-17 14:03 ` Jeff King
  2015-04-17 15:50   ` Junio C Hamano
  2015-04-18  3:35   ` Stefan Saasen
  0 siblings, 2 replies; 19+ messages in thread
From: Jeff King @ 2015-04-17 14:03 UTC (permalink / raw)
  To: Stefan Saasen; +Cc: Git Mailing List

On Fri, Apr 17, 2015 at 05:30:22PM +1000, Stefan Saasen wrote:

> The merge is created in a temporary location that uses alternates. The
> temporary repository is on a local disk, the alternate object database
> on an NFS mount.

Is the alternate writeable? If we can't freshen the object, we fall back
to storing the object locally, which could have a performance impact.
But it looks from your tables below like the utime() call is succeeding,
so that is probably not what is happening here.

> My current hypothesis is that the additional `access`, but more
> importantly the additional `utime` calls are responsible in the
> increased merge times that we see.

Yeah, that makes sense from your tables. The commit in question flips
the order of the loose/packed check, and the packed check should be much
faster on your NFS mount. Can you try:

diff --git a/sha1_file.c b/sha1_file.c
index 88f06ba..822aaef 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3014,7 +3014,7 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign
 	write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
 	if (returnsha1)
 		hashcpy(returnsha1, sha1);
-	if (freshen_loose_object(sha1) || freshen_packed_object(sha1))
+	if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
 		return 0;
 	return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
 }

I think that should clear up the access() calls, but leave the utime()
ones.

> Looking at the detailed strace shows that utime will be called
> repeatedly in same cases (e.g.
> https://bitbucket.org/snippets/ssaasen/oend shows an example where the
> same packfile will be updated more than 4000 times in a single merge).
> 
> http://www.spinics.net/lists/git/msg240106.html discusses a potential
> improvement for this case. Would that be an acceptable avenue to
> improve this situation?

I think so. Here's a tentative patch:

diff --git a/cache.h b/cache.h
index 3d3244b..72c6888 100644
--- a/cache.h
+++ b/cache.h
@@ -1174,6 +1174,7 @@ extern struct packed_git {
 	int pack_fd;
 	unsigned pack_local:1,
 		 pack_keep:1,
+		 freshened:1,
 		 do_not_close:1;
 	unsigned char sha1[20];
 	/* something like ".git/objects/pack/xxxxx.pack" */
diff --git a/sha1_file.c b/sha1_file.c
index 822aaef..f27cbf1 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2999,7 +2999,11 @@ static int freshen_loose_object(const unsigned char *sha1)
 static int freshen_packed_object(const unsigned char *sha1)
 {
 	struct pack_entry e;
-	return find_pack_entry(sha1, &e) && freshen_file(e.p->pack_name);
+	if (!find_pack_entry(sha1, &e))
+		return 0;
+	if (e.p->freshened)
+		return 1;
+	return freshen_file(e.p->pack_name);
 }
 
 int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *returnsha1)


If it's not a problem, I'd love to see timings for your case with just
the first patch, and then with both.

You may also be interested in:

  http://thread.gmane.org/gmane.comp.version-control.git/266370

which addresses another performance problem related to the
freshen/recent code in v2.2.

-Peff

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
  2015-04-17 14:03 ` Jeff King
@ 2015-04-17 15:50   ` Junio C Hamano
  2015-04-18  3:35   ` Stefan Saasen
  1 sibling, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2015-04-17 15:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Saasen, Git Mailing List

Jeff King <peff@peff.net> writes:

> If it's not a problem, I'd love to see timings for your case with just
> the first patch, and then with both.

Thanks for two quick progress patches.

> You may also be interested in:
>
>   http://thread.gmane.org/gmane.comp.version-control.git/266370
>
> which addresses another performance problem related to the
> freshen/recent code in v2.2.

That, too.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
  2015-04-17 14:03 ` Jeff King
  2015-04-17 15:50   ` Junio C Hamano
@ 2015-04-18  3:35   ` Stefan Saasen
  2015-04-20 19:53     ` Jeff King
  1 sibling, 1 reply; 19+ messages in thread
From: Stefan Saasen @ 2015-04-18  3:35 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List

> If it's not a problem, I'd love to see timings for your case with just
> the first patch, and then with both.

Thanks for the swift response, much appreciated Jeff!

Here are the timings for the two patches:

Patch 1 on top of 33d4221c79

       Elapsed           System              User
 Min.   :6.110   Min.   :0.6700   Min.   :0.3600
 1st Qu.:6.580   1st Qu.:0.6900   1st Qu.:0.3900
 Median :7.260   Median :0.7100   Median :0.4100
 Mean   :7.347   Mean   :0.7248   Mean   :0.4214
 3rd Qu.:8.000   3rd Qu.:0.7400   3rd Qu.:0.4600
 Max.   :8.860   Max.   :0.8700   Max.   :0.5100

I've had to slightly tweak your second patch (`freshened` was never
set) but applying the modified patch yielded even better results
compared to patch 1:

       Elapsed           System              User
 Min.   :0.38   Min.   :0.03000   Min.   :0.2900
 1st Qu.:0.38   1st Qu.:0.04000   1st Qu.:0.3100
 Median :0.39   Median :0.06000   Median :0.3200
 Mean   :0.43   Mean   :0.05667   Mean   :0.3519
 3rd Qu.:0.42   3rd Qu.:0.07000   3rd Qu.:0.3600
 Max.   :0.68   Max.   :0.08000   Max.   :0.5700

This is pretty much back to the "before" state.
The graph really tells the whole story:
https://bytebucket.org/snippets/ssaasen/GeRE/raw/7367353a58c50ccd7c493af40ffb6ba1533e1490/git-merge-timing-patched.png
(After is the change in #33d4221, Before the parent of #33d4221 and so on)
The graph and the NFS stats can be found here:
https://bitbucket.org/snippets/ssaasen/GeRE

My tweaked version of your second patch is:

diff --git a/cache.h b/cache.h
index 51ee856..8982055 100644
--- a/cache.h
+++ b/cache.h
@@ -1168,6 +1168,7 @@ extern struct packed_git {
        int pack_fd;
        unsigned pack_local:1,
                 pack_keep:1,
+               freshened:1,
                 do_not_close:1;
        unsigned char sha1[20];
        /* something like ".git/objects/pack/xxxxx.pack" */
diff --git a/sha1_file.c b/sha1_file.c
index bc6322e..c0ccd4b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2999,7 +2999,11 @@ static int freshen_loose_object(const unsigned
char *sha1)
 static int freshen_packed_object(const unsigned char *sha1)
 {
        struct pack_entry e;
-       return find_pack_entry(sha1, &e) && freshen_file(e.p->pack_name);
+       if (!find_pack_entry(sha1, &e))
+              return 0;
+       if (e.p->freshened)
+              return 1;
+       return e.p->freshened = freshen_file(e.p->pack_name);
 }

 int write_sha1_file(const void *buf, unsigned long len, const char
*type, unsigned char *returnsha1)



The only change is that I assign the result of `freshen_file` to the
`freshened` flag.

I've only ran this with the test case I was using before but it looks
like this is pretty much fixing the merge time changes we observed.

Thanks again for the swift response. I've got my test setup sitting
here, happy to rerun the tests if that'd be useful.

Is there a chance to backport those changes to the 2.2+ branches?

> You may also be interested in:
>
>   http://thread.gmane.org/gmane.comp.version-control.git/266370
>
> which addresses another performance problem related to the
> freshen/recent code in v2.2.

Thanks for the pointer, I'll have a look at that as well.

Cheers,
Stefan

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
  2015-04-18  3:35   ` Stefan Saasen
@ 2015-04-20 19:53     ` Jeff King
  2015-04-20 19:54       ` [PATCH 1/2] sha1_file: freshen pack objects before loose Jeff King
                         ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Jeff King @ 2015-04-20 19:53 UTC (permalink / raw)
  To: Stefan Saasen; +Cc: Junio C Hamano, Git Mailing List

On Sat, Apr 18, 2015 at 01:35:51PM +1000, Stefan Saasen wrote:

> Here are the timings for the two patches:
> [...]

Thanks, that matches what I was hoping for.

> My tweaked version of your second patch is:
> [...]
> -       return find_pack_entry(sha1, &e) && freshen_file(e.p->pack_name);
> +       if (!find_pack_entry(sha1, &e))
> +              return 0;
> +       if (e.p->freshened)
> +              return 1;
> +       return e.p->freshened = freshen_file(e.p->pack_name);
>  }

Whooops, yeah, setting the flag is probably helpful. :)

We usually try to avoid assignments in a return like this, so I've
written it out a little more verbosely in my final version. I'll send
those patches in a moment.

  [1/2]: sha1_file: freshen pack objects before loose
  [2/2]: sha1_file: only freshen packs once per run

> Is there a chance to backport those changes to the 2.2+ branches?

That's up to Junio. These patches can be applied straight to the
jk/prune-mtime topic. Usually he would then merge the topic up to
"maint", which at this would potentially become the next v2.3.x. If an
issue is critical (e.g., a security vulnerability), he'll sometimes
merge and roll maintenance releases for older versions. But I don't know
if this counts as critical (it is for you, certainly, but I don't think
that many people are affected, as the crucial factor here is really the
slow NFS filesystem operations).

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/2] sha1_file: freshen pack objects before loose
  2015-04-20 19:53     ` Jeff King
@ 2015-04-20 19:54       ` Jeff King
  2015-04-21  0:46         ` Stefan Saasen
  2015-04-20 19:55       ` [PATCH 2/2] sha1_file: only freshen packs once per run Jeff King
  2015-04-20 20:04       ` [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects Junio C Hamano
  2 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2015-04-20 19:54 UTC (permalink / raw)
  To: Stefan Saasen; +Cc: Junio C Hamano, Git Mailing List

When writing out an object file, we first check whether it
already exists and if so optimize out the write. Prior to
33d4221, we did this by calling has_sha1_file(), which will
check for packed objects followed by loose. Since that
commit, we check loose objects first.

For the common case of a repository whose objects are mostly
packed, this means we will make a lot of extra access()
system calls checking for loose objects. We should follow
the same packed-then-loose order that all of our other
lookups use.

Reported-by: Stefan Saasen <ssaasen@atlassian.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index 88f06ba..822aaef 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3014,7 +3014,7 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign
 	write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
 	if (returnsha1)
 		hashcpy(returnsha1, sha1);
-	if (freshen_loose_object(sha1) || freshen_packed_object(sha1))
+	if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
 		return 0;
 	return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
 }
-- 
2.4.0.rc2.384.g7297a4a

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/2] sha1_file: only freshen packs once per run
  2015-04-20 19:53     ` Jeff King
  2015-04-20 19:54       ` [PATCH 1/2] sha1_file: freshen pack objects before loose Jeff King
@ 2015-04-20 19:55       ` Jeff King
  2015-04-21  0:45         ` Stefan Saasen
  2015-04-20 20:04       ` [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects Junio C Hamano
  2 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2015-04-20 19:55 UTC (permalink / raw)
  To: Stefan Saasen; +Cc: Junio C Hamano, Git Mailing List

Since 33d4221 (write_sha1_file: freshen existing objects,
2014-10-15), we update the mtime of existing objects that we
would have written out (had they not existed). For the
common case in which many objects are packed, we may update
the mtime on a single packfile repeatedly. This can result
in a noticeable performance problem if calling utime() is
expensive (e.g., because your storage is on NFS).

We can fix this by keeping a per-pack flag that lets us
freshen only once per program invocation.

An alternative would be to keep the packed_git.mtime flag up
to date as we freshen, and freshen only once every N
seconds. In practice, it's not worth the complexity. We are
racing against prune expiration times here, which inherently
must be set to accomodate reasonable program running times
(because they really care about the time between an object
being written and it becoming referenced, and the latter is
typically the last step a program takes).

Signed-off-by: Jeff King <peff@peff.net>
---
Hopefully I didn't botch the flag logic again. :) I tested with "strace
-c" myself this time, so I think it is good.

 cache.h     | 1 +
 sha1_file.c | 9 ++++++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index 3d3244b..72c6888 100644
--- a/cache.h
+++ b/cache.h
@@ -1174,6 +1174,7 @@ extern struct packed_git {
 	int pack_fd;
 	unsigned pack_local:1,
 		 pack_keep:1,
+		 freshened:1,
 		 do_not_close:1;
 	unsigned char sha1[20];
 	/* something like ".git/objects/pack/xxxxx.pack" */
diff --git a/sha1_file.c b/sha1_file.c
index 822aaef..26b9b2b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2999,7 +2999,14 @@ static int freshen_loose_object(const unsigned char *sha1)
 static int freshen_packed_object(const unsigned char *sha1)
 {
 	struct pack_entry e;
-	return find_pack_entry(sha1, &e) && freshen_file(e.p->pack_name);
+	if (!find_pack_entry(sha1, &e))
+		return 0;
+	if (e.p->freshened)
+		return 1;
+	if (!freshen_file(e.p->pack_name))
+		return 0;
+	e.p->freshened = 1;
+	return 1;
 }
 
 int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *returnsha1)
-- 
2.4.0.rc2.384.g7297a4a

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
  2015-04-20 19:53     ` Jeff King
  2015-04-20 19:54       ` [PATCH 1/2] sha1_file: freshen pack objects before loose Jeff King
  2015-04-20 19:55       ` [PATCH 2/2] sha1_file: only freshen packs once per run Jeff King
@ 2015-04-20 20:04       ` Junio C Hamano
  2015-04-20 20:09         ` Jeff King
  2 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2015-04-20 20:04 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Saasen, Git Mailing List

Jeff King <peff@peff.net> writes:

> ... But I don't know
> if this counts as critical (it is for you, certainly, but I don't think
> that many people are affected, as the crucial factor here is really the
> slow NFS filesystem operations).

If it is critical to some people, they can downmerge to their custom
old installations of Git they maintain with ease, of course, and
that "with ease" part is the reason why I try to apply fixes to tip
of the original topic branch even though they were merged to the
mainline eons ago ;-).

Thanks.  The patches look good from cursory reading.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
  2015-04-20 20:04       ` [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects Junio C Hamano
@ 2015-04-20 20:09         ` Jeff King
  2015-04-20 20:12           ` Junio C Hamano
  2015-04-21  1:49           ` Stefan Saasen
  0 siblings, 2 replies; 19+ messages in thread
From: Jeff King @ 2015-04-20 20:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Saasen, Git Mailing List

On Mon, Apr 20, 2015 at 01:04:11PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > ... But I don't know
> > if this counts as critical (it is for you, certainly, but I don't think
> > that many people are affected, as the crucial factor here is really the
> > slow NFS filesystem operations).
> 
> If it is critical to some people, they can downmerge to their custom
> old installations of Git they maintain with ease, of course, and
> that "with ease" part is the reason why I try to apply fixes to tip
> of the original topic branch even though they were merged to the
> mainline eons ago ;-).

I think it is a bigger deal for folks who do not ship a custom
installation, but expect to ship a third-party system that interacts
with whatever version of git their customers happen to have (in which
case they can only recommend their customers to upgrade).

I don't know how Stash or GitLab installations work. GitHub ships our
own custom git (which I maintain), though we are already on 2.3.x.

Either way, though, I do not think it is the upstream Git project's
problem.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
  2015-04-20 20:09         ` Jeff King
@ 2015-04-20 20:12           ` Junio C Hamano
  2015-04-20 20:28             ` Jeff King
  2015-04-21  1:49           ` Stefan Saasen
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2015-04-20 20:12 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Saasen, Git Mailing List

Jeff King <peff@peff.net> writes:

> Either way, though, I do not think it is the upstream Git project's
> problem.

The commit to pick where to queue the fixes actually is my problem,
as I have this illusion that I'd be helping these derived works by
making it easier for them to merge, not cherry-pick.

But I would imagine that they may go the cherry-pick route anyway,
in which case I may be wasting my time worrying about them X-<.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
  2015-04-20 20:12           ` Junio C Hamano
@ 2015-04-20 20:28             ` Jeff King
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2015-04-20 20:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Saasen, Git Mailing List

On Mon, Apr 20, 2015 at 01:12:54PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Either way, though, I do not think it is the upstream Git project's
> > problem.
> 
> The commit to pick where to queue the fixes actually is my problem,
> as I have this illusion that I'd be helping these derived works by
> making it easier for them to merge, not cherry-pick.

True, I had just meant the actual rolling of the releases.

> But I would imagine that they may go the cherry-pick route anyway,
> in which case I may be wasting my time worrying about them X-<.

FWIW, I typically cherry-pick rather than merge. The resulting history
is not as nice, but it means I don't have to think as hard about the
history when doing so. It also means that topics may not be as well
tested (e.g., they may have been implicitly relying on some other thing
that happened upstream that I did _not_ cherry-pick). But we treat even
cherry-picked upstream topics as their own feature branches, and do our
normal internal testing and review.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] sha1_file: only freshen packs once per run
  2015-04-20 19:55       ` [PATCH 2/2] sha1_file: only freshen packs once per run Jeff King
@ 2015-04-21  0:45         ` Stefan Saasen
  0 siblings, 0 replies; 19+ messages in thread
From: Stefan Saasen @ 2015-04-21  0:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Git Mailing List

I can confirm that this patch is equivalent to the previous one.

https://bitbucket.org/snippets/ssaasen/9AXg shows both the timing and
the NFS stats showing the effect of applying this patch.

Thanks for the fix Jeff!

Cheers,
Stefan

On 21 April 2015 at 05:55, Jeff King <peff@peff.net> wrote:
> Since 33d4221 (write_sha1_file: freshen existing objects,
> 2014-10-15), we update the mtime of existing objects that we
> would have written out (had they not existed). For the
> common case in which many objects are packed, we may update
> the mtime on a single packfile repeatedly. This can result
> in a noticeable performance problem if calling utime() is
> expensive (e.g., because your storage is on NFS).
>
> We can fix this by keeping a per-pack flag that lets us
> freshen only once per program invocation.
>
> An alternative would be to keep the packed_git.mtime flag up
> to date as we freshen, and freshen only once every N
> seconds. In practice, it's not worth the complexity. We are
> racing against prune expiration times here, which inherently
> must be set to accomodate reasonable program running times
> (because they really care about the time between an object
> being written and it becoming referenced, and the latter is
> typically the last step a program takes).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> Hopefully I didn't botch the flag logic again. :) I tested with "strace
> -c" myself this time, so I think it is good.
>
>  cache.h     | 1 +
>  sha1_file.c | 9 ++++++++-
>  2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/cache.h b/cache.h
> index 3d3244b..72c6888 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1174,6 +1174,7 @@ extern struct packed_git {
>         int pack_fd;
>         unsigned pack_local:1,
>                  pack_keep:1,
> +                freshened:1,
>                  do_not_close:1;
>         unsigned char sha1[20];
>         /* something like ".git/objects/pack/xxxxx.pack" */
> diff --git a/sha1_file.c b/sha1_file.c
> index 822aaef..26b9b2b 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2999,7 +2999,14 @@ static int freshen_loose_object(const unsigned char *sha1)
>  static int freshen_packed_object(const unsigned char *sha1)
>  {
>         struct pack_entry e;
> -       return find_pack_entry(sha1, &e) && freshen_file(e.p->pack_name);
> +       if (!find_pack_entry(sha1, &e))
> +               return 0;
> +       if (e.p->freshened)
> +               return 1;
> +       if (!freshen_file(e.p->pack_name))
> +               return 0;
> +       e.p->freshened = 1;
> +       return 1;
>  }
>
>  int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *returnsha1)
> --
> 2.4.0.rc2.384.g7297a4a

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] sha1_file: freshen pack objects before loose
  2015-04-20 19:54       ` [PATCH 1/2] sha1_file: freshen pack objects before loose Jeff King
@ 2015-04-21  0:46         ` Stefan Saasen
  0 siblings, 0 replies; 19+ messages in thread
From: Stefan Saasen @ 2015-04-21  0:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Git Mailing List

I didn't expect anything else (as the patch is the same as the
previous one) but I verified that applying this patch has the desired
effect (https://bitbucket.org/snippets/ssaasen/9AXg).

Thanks for the fix Jeff.

On 21 April 2015 at 05:54, Jeff King <peff@peff.net> wrote:
> When writing out an object file, we first check whether it
> already exists and if so optimize out the write. Prior to
> 33d4221, we did this by calling has_sha1_file(), which will
> check for packed objects followed by loose. Since that
> commit, we check loose objects first.
>
> For the common case of a repository whose objects are mostly
> packed, this means we will make a lot of extra access()
> system calls checking for loose objects. We should follow
> the same packed-then-loose order that all of our other
> lookups use.
>
> Reported-by: Stefan Saasen <ssaasen@atlassian.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  sha1_file.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sha1_file.c b/sha1_file.c
> index 88f06ba..822aaef 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -3014,7 +3014,7 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign
>         write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
>         if (returnsha1)
>                 hashcpy(returnsha1, sha1);
> -       if (freshen_loose_object(sha1) || freshen_packed_object(sha1))
> +       if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
>                 return 0;
>         return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
>  }
> --
> 2.4.0.rc2.384.g7297a4a
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
  2015-04-20 20:09         ` Jeff King
  2015-04-20 20:12           ` Junio C Hamano
@ 2015-04-21  1:49           ` Stefan Saasen
  2015-04-21 17:05             ` Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Stefan Saasen @ 2015-04-21  1:49 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano; +Cc: Git Mailing List

>> If it is critical to some people, they can downmerge to their custom
>> old installations of Git they maintain with ease, of course, and
>> that "with ease" part is the reason why I try to apply fixes to tip
>> of the original topic branch even though they were merged to the
>> mainline eons ago ;-).
>
> I think it is a bigger deal for folks who do not ship a custom
> installation, but expect to ship a third-party system that interacts
> with whatever version of git their customers happen to have (in which
> case they can only recommend their customers to upgrade).

Yes, this is the situation we are facing. We allow our customers to
use the git version that is supported/available on their OS (within a
certain range of supported versions) so our customers usually don't
compile from source.

> Either way, though, I do not think it is the upstream Git project's
> problem.

That's fair enough, I was mostly enquiring about the official git
versions this will land in so that we can advise customers what git
version to use (or not to use).

I've noticed Peff's patches on pu which suggest they will be available
in git 2.5?
Do you Junio, have plans to merge them to maint (2.3.x) and/or next (2.4)?

While I certainly agree that this is specific to Git on NFS and not a
more widespread git performance problem, I'd love to be able to
message something other than "skip all the git version between and
including git 2.2 - 2.4".

I appreciate your consideration and thanks again for the swift response on this.

Cheers,
Stefan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
  2015-04-21  1:49           ` Stefan Saasen
@ 2015-04-21 17:05             ` Junio C Hamano
  2015-04-21 21:45               ` Junio C Hamano
  2015-04-22  0:04               ` Stefan Saasen
  0 siblings, 2 replies; 19+ messages in thread
From: Junio C Hamano @ 2015-04-21 17:05 UTC (permalink / raw)
  To: Stefan Saasen; +Cc: Jeff King, Git Mailing List

Stefan Saasen <ssaasen@atlassian.com> writes:

> I've noticed Peff's patches on pu which suggest they will be available
> in git 2.5?

Being on 'pu' (or 'next' for that matter) is not a suggestion for a
change to appear in any future version at all, even though it often
means that it would soon be merged to 'master' and will be in the
upcoming release to be on 'next' in early part of a development
cycle.  Some larger topics would stay on 'next' for a few cycles.

> Do you Junio, have plans to merge them to maint (2.3.x) and/or next (2.4)?

The topic will hopefully be merged to 'master' after 2.4 final is
released end of this month, down to 'maint' early May and will ship
with 2.4.1, unless there is unforeseen issues discovered in the
change while people try it out while it is in 'next' (which will
happen today, hopefully).

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
  2015-04-21 17:05             ` Junio C Hamano
@ 2015-04-21 21:45               ` Junio C Hamano
  2015-04-22  1:46                 ` Stefan Saasen
  2015-04-22  0:04               ` Stefan Saasen
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2015-04-21 21:45 UTC (permalink / raw)
  To: Stefan Saasen; +Cc: Jeff King, Git Mailing List, Kyle J. McKay

Junio C Hamano <gitster@pobox.com> writes:

> Stefan Saasen <ssaasen@atlassian.com> writes:
>
>> I've noticed Peff's patches on pu which suggest they will be available
>> in git 2.5?
>
> Being on 'pu' (or 'next' for that matter) is not a suggestion for a
> change to appear in any future version at all, even though it often
> means that it would soon be merged to 'master' and will be in the
> upcoming release to be on 'next' in early part of a development
> cycle.  Some larger topics would stay on 'next' for a few cycles.
>
>> Do you Junio, have plans to merge them to maint (2.3.x) and/or next (2.4)?
>
> The topic will hopefully be merged to 'master' after 2.4 final is
> released end of this month, down to 'maint' early May and will ship
> with 2.4.1, unless there is unforeseen issues discovered in the
> change while people try it out while it is in 'next' (which will
> happen today, hopefully).

... and then if I do not forget and if the topic is really important
for real-world users, I am OK to merge it down to 2.3 and even 2.2
maintenance tracks later.  But that will happen only after the topic
hits 'maint', which will happen only after the topic hits 'master'.

What you _can_ help is the "if I do not forget" part ;-)  Also see a
similar discussion we had recently

  http://thread.gmane.org/gmane.comp.version-control.git/264365

The key sentence from my part in the thread is 

> When I say "the tip of 'master' is meant to be more stable than
> any tagged versions", I do mean it.

and the reasoning behind it that is given in the paragraph before
that, though.

Perhaps companies like Atlassian that rely on the stability of the
open source Git can spare some resources and join forces with like
minded folks on LTS of older maintenance tracks, if they are truly
interested in.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
  2015-04-21 17:05             ` Junio C Hamano
  2015-04-21 21:45               ` Junio C Hamano
@ 2015-04-22  0:04               ` Stefan Saasen
  1 sibling, 0 replies; 19+ messages in thread
From: Stefan Saasen @ 2015-04-22  0:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Git Mailing List

>> I've noticed Peff's patches on pu which suggest they will be available
>> in git 2.5?
>
> Being on 'pu' (or 'next' for that matter) is not a suggestion for a
> change to appear in any future version at all, even though it often
> means that it would soon be merged to 'master' and will be in the
> upcoming release to be on 'next' in early part of a development
> cycle.  Some larger topics would stay on 'next' for a few cycles.
>
>> Do you Junio, have plans to merge them to maint (2.3.x) and/or next (2.4)?
>
> The topic will hopefully be merged to 'master' after 2.4 final is
> released end of this month, down to 'maint' early May and will ship
> with 2.4.1, unless there is unforeseen issues discovered in the
> change while people try it out while it is in 'next' (which will
> happen today, hopefully).

Thanks for the clarification Junio.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
  2015-04-21 21:45               ` Junio C Hamano
@ 2015-04-22  1:46                 ` Stefan Saasen
  2015-04-22 22:00                   ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Saasen @ 2015-04-22  1:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Git Mailing List, Kyle J. McKay

> Perhaps companies like Atlassian that rely on the stability of the
> open source Git can spare some resources and join forces with like
> minded folks on LTS of older maintenance tracks, if they are truly
> interested in.

We certainly can and would like to. I'm not entirely sure what that
would entail though?

>From reading through $gmane/264365 I've identified the following
responsibilities/opportunities to help:

>    - Monitor "git log --first-parent maint-lts..master" and find
>      the tip of topic branches that need to be down-merged;
>
>    - Down-merge such topics to maint-lts; this might involve
>      cherry-picking instead of merge, as the bugfix topics may
>      originally be done on the codebase newer than maint-lts;

and more importantly testing the maint-lts version to ensure
backported changes don't introduce regressions and the maint-lts
branch is stable.

This suggests specific, spaced LTS versions but in the same thread you
mention maint-2.1or maint-2.2.
So a different model could be maintaining old versions in a sliding
window fashion (e.g. critical issues would be backported to the last 6
months worth of git releases).

Maybe I'm getting ahead of myself here :)

Anyway, long story short. We're interested to help but I'm not
entirely sure what that would look like at the moment. Are there
formed ideas floating around or would you be looking for some form of
proposal instead?

Cheers,
Stefan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
  2015-04-22  1:46                 ` Stefan Saasen
@ 2015-04-22 22:00                   ` Junio C Hamano
  0 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2015-04-22 22:00 UTC (permalink / raw)
  To: Stefan Saasen; +Cc: Jeff King, Git Mailing List, Kyle J. McKay

Stefan Saasen <ssaasen@atlassian.com> writes:

> Anyway, long story short. We're interested to help but I'm not
> entirely sure what that would look like at the moment. Are there
> formed ideas floating around or would you be looking for some form of
> proposal instead?

I am not proposing anything or looking for proposals myself,
actually.  It is just somebody expressed interest in having tested
older maintenance track that is kept alive in the past, so I was
merely trying to help connect you with that old thread.

If those who are interested in having such LTS track(s) need
something specific from me, and if it will not be unrealistic
maintenance burden, I am willing to help.  That's all.

For example, LTS group for whatever reason may nominate 2.2.x track
as a base that they want to keep alive longer than other maintenance
tracks and promise to test changes to them to keep it stable.  Then
I can help the effort by making sure people's bugfix patches would
apply down to 2.2.x track (often people make mistake of using newer
facility to fix or test the fix for an ancient bug, and bugfix topic
branch ends up forked at a point much newer than where it should
be).

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2015-04-22 22:00 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-17  7:30 [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects Stefan Saasen
2015-04-17 14:03 ` Jeff King
2015-04-17 15:50   ` Junio C Hamano
2015-04-18  3:35   ` Stefan Saasen
2015-04-20 19:53     ` Jeff King
2015-04-20 19:54       ` [PATCH 1/2] sha1_file: freshen pack objects before loose Jeff King
2015-04-21  0:46         ` Stefan Saasen
2015-04-20 19:55       ` [PATCH 2/2] sha1_file: only freshen packs once per run Jeff King
2015-04-21  0:45         ` Stefan Saasen
2015-04-20 20:04       ` [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects Junio C Hamano
2015-04-20 20:09         ` Jeff King
2015-04-20 20:12           ` Junio C Hamano
2015-04-20 20:28             ` Jeff King
2015-04-21  1:49           ` Stefan Saasen
2015-04-21 17:05             ` Junio C Hamano
2015-04-21 21:45               ` Junio C Hamano
2015-04-22  1:46                 ` Stefan Saasen
2015-04-22 22:00                   ` Junio C Hamano
2015-04-22  0:04               ` Stefan Saasen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.