* [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
@ 2015-04-17 7:30 Stefan Saasen
2015-04-17 14:03 ` Jeff King
0 siblings, 1 reply; 19+ messages in thread
From: Stefan Saasen @ 2015-04-17 7:30 UTC (permalink / raw)
To: Git Mailing List, Jeff King
We became aware of slow merge times with the following setup:
The merge is created in a temporary location that uses alternates. The
temporary repository is on a local disk, the alternate object database
on an NFS mount.
After some investigation we believe that #33d4221 (present in git
2.2.0, absent in 2.1.4) is causing this regression in merge time.
The following are merge times (in seconds) with git@33d4221~
(2.1.2-393-gabcb865) (before the change)
Elapsed System User
Min. :0.3700 Min. :0.04000 Min. :0.3000
1st Qu.:0.3800 1st Qu.:0.05000 1st Qu.:0.3100
Median :0.4000 Median :0.06000 Median :0.3300
Mean :0.4295 Mean :0.05905 Mean :0.3519
3rd Qu.:0.4600 3rd Qu.:0.07000 3rd Qu.:0.3600
Max. :0.5900 Max. :0.09000 Max. :0.4900
The following are merge times with git@33d4221 (2.1.2-394-g33d4221):
Elapsed System User
Min. : 8.58 Min. :1.46 Min. :0.4400
1st Qu.: 9.63 1st Qu.:1.60 1st Qu.:0.4400
Median :10.64 Median :1.66 Median :0.4800
Mean :10.50 Mean :1.69 Mean :0.4986
3rd Qu.:11.13 3rd Qu.:1.81 3rd Qu.:0.5000
Max. :13.96 Max. :2.05 Max. :0.6700
As you can see the merge times are an order of magnitude slower after
the change.
The effect of #33d4221 can be seen when strace'ing the merge:
Running strace on git@33d4221 yields
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
70.79 0.714852 178 4018 utime
14.73 0.148789 3 50141 50123 lstat
13.88 0.140198 17 8074 8067 access
0.24 0.002455 614 4 rename
0.15 0.001493 3 577 write
0.06 0.000618 10 65 close
0.04 0.000453 3 152 brk
0.04 0.000433 27 16 mkdir
0.03 0.000310 8 41 fstat
Running strace on git@33d4221~ yields
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
98.37 0.138516 3 50141 50123 lstat
0.92 0.001292 2 577 write
0.37 0.000520 14 38 31 access
0.18 0.000252 36 7 getcwd
0.17 0.000237 7 36 20 stat
0.00 0.000000 0 40 read
0.00 0.000000 0 87 30 open
My current hypothesis is that the additional `access`, but more
importantly the additional `utime` calls are responsible in the
increased merge times that we see.
NFS stats on the server for the tests seem to confirm this (see
nfsstat-{after,before}-change.txt on
https://bitbucket.org/snippets/ssaasen/oend).
This is certainly due to the fact that this will all happen over NFS
but in 2.1.4 this worked fine and starting with 2.2 this has become
very slow.
Looking at the detailed strace shows that utime will be called
repeatedly in same cases (e.g.
https://bitbucket.org/snippets/ssaasen/oend shows an example where the
same packfile will be updated more than 4000 times in a single merge).
http://www.spinics.net/lists/git/msg240106.html discusses a potential
improvement for this case. Would that be an acceptable avenue to
improve this situation?
Best regards,
Stefan Saasen
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
2015-04-17 7:30 [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects Stefan Saasen
@ 2015-04-17 14:03 ` Jeff King
2015-04-17 15:50 ` Junio C Hamano
2015-04-18 3:35 ` Stefan Saasen
0 siblings, 2 replies; 19+ messages in thread
From: Jeff King @ 2015-04-17 14:03 UTC (permalink / raw)
To: Stefan Saasen; +Cc: Git Mailing List
On Fri, Apr 17, 2015 at 05:30:22PM +1000, Stefan Saasen wrote:
> The merge is created in a temporary location that uses alternates. The
> temporary repository is on a local disk, the alternate object database
> on an NFS mount.
Is the alternate writeable? If we can't freshen the object, we fall back
to storing the object locally, which could have a performance impact.
But it looks from your tables below like the utime() call is succeeding,
so that is probably not what is happening here.
> My current hypothesis is that the additional `access`, but more
> importantly the additional `utime` calls are responsible in the
> increased merge times that we see.
Yeah, that makes sense from your tables. The commit in question flips
the order of the loose/packed check, and the packed check should be much
faster on your NFS mount. Can you try:
diff --git a/sha1_file.c b/sha1_file.c
index 88f06ba..822aaef 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3014,7 +3014,7 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign
write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
if (returnsha1)
hashcpy(returnsha1, sha1);
- if (freshen_loose_object(sha1) || freshen_packed_object(sha1))
+ if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
return 0;
return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
}
I think that should clear up the access() calls, but leave the utime()
ones.
> Looking at the detailed strace shows that utime will be called
> repeatedly in same cases (e.g.
> https://bitbucket.org/snippets/ssaasen/oend shows an example where the
> same packfile will be updated more than 4000 times in a single merge).
>
> http://www.spinics.net/lists/git/msg240106.html discusses a potential
> improvement for this case. Would that be an acceptable avenue to
> improve this situation?
I think so. Here's a tentative patch:
diff --git a/cache.h b/cache.h
index 3d3244b..72c6888 100644
--- a/cache.h
+++ b/cache.h
@@ -1174,6 +1174,7 @@ extern struct packed_git {
int pack_fd;
unsigned pack_local:1,
pack_keep:1,
+ freshened:1,
do_not_close:1;
unsigned char sha1[20];
/* something like ".git/objects/pack/xxxxx.pack" */
diff --git a/sha1_file.c b/sha1_file.c
index 822aaef..f27cbf1 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2999,7 +2999,11 @@ static int freshen_loose_object(const unsigned char *sha1)
static int freshen_packed_object(const unsigned char *sha1)
{
struct pack_entry e;
- return find_pack_entry(sha1, &e) && freshen_file(e.p->pack_name);
+ if (!find_pack_entry(sha1, &e))
+ return 0;
+ if (e.p->freshened)
+ return 1;
+ return freshen_file(e.p->pack_name);
}
int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *returnsha1)
If it's not a problem, I'd love to see timings for your case with just
the first patch, and then with both.
You may also be interested in:
http://thread.gmane.org/gmane.comp.version-control.git/266370
which addresses another performance problem related to the
freshen/recent code in v2.2.
-Peff
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
2015-04-17 14:03 ` Jeff King
@ 2015-04-17 15:50 ` Junio C Hamano
2015-04-18 3:35 ` Stefan Saasen
1 sibling, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2015-04-17 15:50 UTC (permalink / raw)
To: Jeff King; +Cc: Stefan Saasen, Git Mailing List
Jeff King <peff@peff.net> writes:
> If it's not a problem, I'd love to see timings for your case with just
> the first patch, and then with both.
Thanks for two quick progress patches.
> You may also be interested in:
>
> http://thread.gmane.org/gmane.comp.version-control.git/266370
>
> which addresses another performance problem related to the
> freshen/recent code in v2.2.
That, too.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
2015-04-17 14:03 ` Jeff King
2015-04-17 15:50 ` Junio C Hamano
@ 2015-04-18 3:35 ` Stefan Saasen
2015-04-20 19:53 ` Jeff King
1 sibling, 1 reply; 19+ messages in thread
From: Stefan Saasen @ 2015-04-18 3:35 UTC (permalink / raw)
To: Jeff King; +Cc: Git Mailing List
> If it's not a problem, I'd love to see timings for your case with just
> the first patch, and then with both.
Thanks for the swift response, much appreciated Jeff!
Here are the timings for the two patches:
Patch 1 on top of 33d4221c79
Elapsed System User
Min. :6.110 Min. :0.6700 Min. :0.3600
1st Qu.:6.580 1st Qu.:0.6900 1st Qu.:0.3900
Median :7.260 Median :0.7100 Median :0.4100
Mean :7.347 Mean :0.7248 Mean :0.4214
3rd Qu.:8.000 3rd Qu.:0.7400 3rd Qu.:0.4600
Max. :8.860 Max. :0.8700 Max. :0.5100
I've had to slightly tweak your second patch (`freshened` was never
set) but applying the modified patch yielded even better results
compared to patch 1:
Elapsed System User
Min. :0.38 Min. :0.03000 Min. :0.2900
1st Qu.:0.38 1st Qu.:0.04000 1st Qu.:0.3100
Median :0.39 Median :0.06000 Median :0.3200
Mean :0.43 Mean :0.05667 Mean :0.3519
3rd Qu.:0.42 3rd Qu.:0.07000 3rd Qu.:0.3600
Max. :0.68 Max. :0.08000 Max. :0.5700
This is pretty much back to the "before" state.
The graph really tells the whole story:
https://bytebucket.org/snippets/ssaasen/GeRE/raw/7367353a58c50ccd7c493af40ffb6ba1533e1490/git-merge-timing-patched.png
(After is the change in #33d4221, Before the parent of #33d4221 and so on)
The graph and the NFS stats can be found here:
https://bitbucket.org/snippets/ssaasen/GeRE
My tweaked version of your second patch is:
diff --git a/cache.h b/cache.h
index 51ee856..8982055 100644
--- a/cache.h
+++ b/cache.h
@@ -1168,6 +1168,7 @@ extern struct packed_git {
int pack_fd;
unsigned pack_local:1,
pack_keep:1,
+ freshened:1,
do_not_close:1;
unsigned char sha1[20];
/* something like ".git/objects/pack/xxxxx.pack" */
diff --git a/sha1_file.c b/sha1_file.c
index bc6322e..c0ccd4b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2999,7 +2999,11 @@ static int freshen_loose_object(const unsigned
char *sha1)
static int freshen_packed_object(const unsigned char *sha1)
{
struct pack_entry e;
- return find_pack_entry(sha1, &e) && freshen_file(e.p->pack_name);
+ if (!find_pack_entry(sha1, &e))
+ return 0;
+ if (e.p->freshened)
+ return 1;
+ return e.p->freshened = freshen_file(e.p->pack_name);
}
int write_sha1_file(const void *buf, unsigned long len, const char
*type, unsigned char *returnsha1)
The only change is that I assign the result of `freshen_file` to the
`freshened` flag.
I've only ran this with the test case I was using before but it looks
like this is pretty much fixing the merge time changes we observed.
Thanks again for the swift response. I've got my test setup sitting
here, happy to rerun the tests if that'd be useful.
Is there a chance to backport those changes to the 2.2+ branches?
> You may also be interested in:
>
> http://thread.gmane.org/gmane.comp.version-control.git/266370
>
> which addresses another performance problem related to the
> freshen/recent code in v2.2.
Thanks for the pointer, I'll have a look at that as well.
Cheers,
Stefan
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
2015-04-18 3:35 ` Stefan Saasen
@ 2015-04-20 19:53 ` Jeff King
2015-04-20 19:54 ` [PATCH 1/2] sha1_file: freshen pack objects before loose Jeff King
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Jeff King @ 2015-04-20 19:53 UTC (permalink / raw)
To: Stefan Saasen; +Cc: Junio C Hamano, Git Mailing List
On Sat, Apr 18, 2015 at 01:35:51PM +1000, Stefan Saasen wrote:
> Here are the timings for the two patches:
> [...]
Thanks, that matches what I was hoping for.
> My tweaked version of your second patch is:
> [...]
> - return find_pack_entry(sha1, &e) && freshen_file(e.p->pack_name);
> + if (!find_pack_entry(sha1, &e))
> + return 0;
> + if (e.p->freshened)
> + return 1;
> + return e.p->freshened = freshen_file(e.p->pack_name);
> }
Whooops, yeah, setting the flag is probably helpful. :)
We usually try to avoid assignments in a return like this, so I've
written it out a little more verbosely in my final version. I'll send
those patches in a moment.
[1/2]: sha1_file: freshen pack objects before loose
[2/2]: sha1_file: only freshen packs once per run
> Is there a chance to backport those changes to the 2.2+ branches?
That's up to Junio. These patches can be applied straight to the
jk/prune-mtime topic. Usually he would then merge the topic up to
"maint", which at this would potentially become the next v2.3.x. If an
issue is critical (e.g., a security vulnerability), he'll sometimes
merge and roll maintenance releases for older versions. But I don't know
if this counts as critical (it is for you, certainly, but I don't think
that many people are affected, as the crucial factor here is really the
slow NFS filesystem operations).
-Peff
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 1/2] sha1_file: freshen pack objects before loose
2015-04-20 19:53 ` Jeff King
@ 2015-04-20 19:54 ` Jeff King
2015-04-21 0:46 ` Stefan Saasen
2015-04-20 19:55 ` [PATCH 2/2] sha1_file: only freshen packs once per run Jeff King
2015-04-20 20:04 ` [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects Junio C Hamano
2 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2015-04-20 19:54 UTC (permalink / raw)
To: Stefan Saasen; +Cc: Junio C Hamano, Git Mailing List
When writing out an object file, we first check whether it
already exists and if so optimize out the write. Prior to
33d4221, we did this by calling has_sha1_file(), which will
check for packed objects followed by loose. Since that
commit, we check loose objects first.
For the common case of a repository whose objects are mostly
packed, this means we will make a lot of extra access()
system calls checking for loose objects. We should follow
the same packed-then-loose order that all of our other
lookups use.
Reported-by: Stefan Saasen <ssaasen@atlassian.com>
Signed-off-by: Jeff King <peff@peff.net>
---
sha1_file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sha1_file.c b/sha1_file.c
index 88f06ba..822aaef 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3014,7 +3014,7 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign
write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
if (returnsha1)
hashcpy(returnsha1, sha1);
- if (freshen_loose_object(sha1) || freshen_packed_object(sha1))
+ if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
return 0;
return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
}
--
2.4.0.rc2.384.g7297a4a
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 2/2] sha1_file: only freshen packs once per run
2015-04-20 19:53 ` Jeff King
2015-04-20 19:54 ` [PATCH 1/2] sha1_file: freshen pack objects before loose Jeff King
@ 2015-04-20 19:55 ` Jeff King
2015-04-21 0:45 ` Stefan Saasen
2015-04-20 20:04 ` [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects Junio C Hamano
2 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2015-04-20 19:55 UTC (permalink / raw)
To: Stefan Saasen; +Cc: Junio C Hamano, Git Mailing List
Since 33d4221 (write_sha1_file: freshen existing objects,
2014-10-15), we update the mtime of existing objects that we
would have written out (had they not existed). For the
common case in which many objects are packed, we may update
the mtime on a single packfile repeatedly. This can result
in a noticeable performance problem if calling utime() is
expensive (e.g., because your storage is on NFS).
We can fix this by keeping a per-pack flag that lets us
freshen only once per program invocation.
An alternative would be to keep the packed_git.mtime flag up
to date as we freshen, and freshen only once every N
seconds. In practice, it's not worth the complexity. We are
racing against prune expiration times here, which inherently
must be set to accomodate reasonable program running times
(because they really care about the time between an object
being written and it becoming referenced, and the latter is
typically the last step a program takes).
Signed-off-by: Jeff King <peff@peff.net>
---
Hopefully I didn't botch the flag logic again. :) I tested with "strace
-c" myself this time, so I think it is good.
cache.h | 1 +
sha1_file.c | 9 ++++++++-
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/cache.h b/cache.h
index 3d3244b..72c6888 100644
--- a/cache.h
+++ b/cache.h
@@ -1174,6 +1174,7 @@ extern struct packed_git {
int pack_fd;
unsigned pack_local:1,
pack_keep:1,
+ freshened:1,
do_not_close:1;
unsigned char sha1[20];
/* something like ".git/objects/pack/xxxxx.pack" */
diff --git a/sha1_file.c b/sha1_file.c
index 822aaef..26b9b2b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2999,7 +2999,14 @@ static int freshen_loose_object(const unsigned char *sha1)
static int freshen_packed_object(const unsigned char *sha1)
{
struct pack_entry e;
- return find_pack_entry(sha1, &e) && freshen_file(e.p->pack_name);
+ if (!find_pack_entry(sha1, &e))
+ return 0;
+ if (e.p->freshened)
+ return 1;
+ if (!freshen_file(e.p->pack_name))
+ return 0;
+ e.p->freshened = 1;
+ return 1;
}
int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *returnsha1)
--
2.4.0.rc2.384.g7297a4a
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
2015-04-20 19:53 ` Jeff King
2015-04-20 19:54 ` [PATCH 1/2] sha1_file: freshen pack objects before loose Jeff King
2015-04-20 19:55 ` [PATCH 2/2] sha1_file: only freshen packs once per run Jeff King
@ 2015-04-20 20:04 ` Junio C Hamano
2015-04-20 20:09 ` Jeff King
2 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2015-04-20 20:04 UTC (permalink / raw)
To: Jeff King; +Cc: Stefan Saasen, Git Mailing List
Jeff King <peff@peff.net> writes:
> ... But I don't know
> if this counts as critical (it is for you, certainly, but I don't think
> that many people are affected, as the crucial factor here is really the
> slow NFS filesystem operations).
If it is critical to some people, they can downmerge to their custom
old installations of Git they maintain with ease, of course, and
that "with ease" part is the reason why I try to apply fixes to tip
of the original topic branch even though they were merged to the
mainline eons ago ;-).
Thanks. The patches look good from cursory reading.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
2015-04-20 20:04 ` [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects Junio C Hamano
@ 2015-04-20 20:09 ` Jeff King
2015-04-20 20:12 ` Junio C Hamano
2015-04-21 1:49 ` Stefan Saasen
0 siblings, 2 replies; 19+ messages in thread
From: Jeff King @ 2015-04-20 20:09 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Stefan Saasen, Git Mailing List
On Mon, Apr 20, 2015 at 01:04:11PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > ... But I don't know
> > if this counts as critical (it is for you, certainly, but I don't think
> > that many people are affected, as the crucial factor here is really the
> > slow NFS filesystem operations).
>
> If it is critical to some people, they can downmerge to their custom
> old installations of Git they maintain with ease, of course, and
> that "with ease" part is the reason why I try to apply fixes to tip
> of the original topic branch even though they were merged to the
> mainline eons ago ;-).
I think it is a bigger deal for folks who do not ship a custom
installation, but expect to ship a third-party system that interacts
with whatever version of git their customers happen to have (in which
case they can only recommend their customers to upgrade).
I don't know how Stash or GitLab installations work. GitHub ships our
own custom git (which I maintain), though we are already on 2.3.x.
Either way, though, I do not think it is the upstream Git project's
problem.
-Peff
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
2015-04-20 20:09 ` Jeff King
@ 2015-04-20 20:12 ` Junio C Hamano
2015-04-20 20:28 ` Jeff King
2015-04-21 1:49 ` Stefan Saasen
1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2015-04-20 20:12 UTC (permalink / raw)
To: Jeff King; +Cc: Stefan Saasen, Git Mailing List
Jeff King <peff@peff.net> writes:
> Either way, though, I do not think it is the upstream Git project's
> problem.
The commit to pick where to queue the fixes actually is my problem,
as I have this illusion that I'd be helping these derived works by
making it easier for them to merge, not cherry-pick.
But I would imagine that they may go the cherry-pick route anyway,
in which case I may be wasting my time worrying about them X-<.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
2015-04-20 20:12 ` Junio C Hamano
@ 2015-04-20 20:28 ` Jeff King
0 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2015-04-20 20:28 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Stefan Saasen, Git Mailing List
On Mon, Apr 20, 2015 at 01:12:54PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > Either way, though, I do not think it is the upstream Git project's
> > problem.
>
> The commit to pick where to queue the fixes actually is my problem,
> as I have this illusion that I'd be helping these derived works by
> making it easier for them to merge, not cherry-pick.
True, I had just meant the actual rolling of the releases.
> But I would imagine that they may go the cherry-pick route anyway,
> in which case I may be wasting my time worrying about them X-<.
FWIW, I typically cherry-pick rather than merge. The resulting history
is not as nice, but it means I don't have to think as hard about the
history when doing so. It also means that topics may not be as well
tested (e.g., they may have been implicitly relying on some other thing
that happened upstream that I did _not_ cherry-pick). But we treat even
cherry-picked upstream topics as their own feature branches, and do our
normal internal testing and review.
-Peff
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] sha1_file: only freshen packs once per run
2015-04-20 19:55 ` [PATCH 2/2] sha1_file: only freshen packs once per run Jeff King
@ 2015-04-21 0:45 ` Stefan Saasen
0 siblings, 0 replies; 19+ messages in thread
From: Stefan Saasen @ 2015-04-21 0:45 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, Git Mailing List
I can confirm that this patch is equivalent to the previous one.
https://bitbucket.org/snippets/ssaasen/9AXg shows both the timing and
the NFS stats showing the effect of applying this patch.
Thanks for the fix Jeff!
Cheers,
Stefan
On 21 April 2015 at 05:55, Jeff King <peff@peff.net> wrote:
> Since 33d4221 (write_sha1_file: freshen existing objects,
> 2014-10-15), we update the mtime of existing objects that we
> would have written out (had they not existed). For the
> common case in which many objects are packed, we may update
> the mtime on a single packfile repeatedly. This can result
> in a noticeable performance problem if calling utime() is
> expensive (e.g., because your storage is on NFS).
>
> We can fix this by keeping a per-pack flag that lets us
> freshen only once per program invocation.
>
> An alternative would be to keep the packed_git.mtime flag up
> to date as we freshen, and freshen only once every N
> seconds. In practice, it's not worth the complexity. We are
> racing against prune expiration times here, which inherently
> must be set to accomodate reasonable program running times
> (because they really care about the time between an object
> being written and it becoming referenced, and the latter is
> typically the last step a program takes).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> Hopefully I didn't botch the flag logic again. :) I tested with "strace
> -c" myself this time, so I think it is good.
>
> cache.h | 1 +
> sha1_file.c | 9 ++++++++-
> 2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/cache.h b/cache.h
> index 3d3244b..72c6888 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1174,6 +1174,7 @@ extern struct packed_git {
> int pack_fd;
> unsigned pack_local:1,
> pack_keep:1,
> + freshened:1,
> do_not_close:1;
> unsigned char sha1[20];
> /* something like ".git/objects/pack/xxxxx.pack" */
> diff --git a/sha1_file.c b/sha1_file.c
> index 822aaef..26b9b2b 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2999,7 +2999,14 @@ static int freshen_loose_object(const unsigned char *sha1)
> static int freshen_packed_object(const unsigned char *sha1)
> {
> struct pack_entry e;
> - return find_pack_entry(sha1, &e) && freshen_file(e.p->pack_name);
> + if (!find_pack_entry(sha1, &e))
> + return 0;
> + if (e.p->freshened)
> + return 1;
> + if (!freshen_file(e.p->pack_name))
> + return 0;
> + e.p->freshened = 1;
> + return 1;
> }
>
> int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *returnsha1)
> --
> 2.4.0.rc2.384.g7297a4a
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] sha1_file: freshen pack objects before loose
2015-04-20 19:54 ` [PATCH 1/2] sha1_file: freshen pack objects before loose Jeff King
@ 2015-04-21 0:46 ` Stefan Saasen
0 siblings, 0 replies; 19+ messages in thread
From: Stefan Saasen @ 2015-04-21 0:46 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, Git Mailing List
I didn't expect anything else (as the patch is the same as the
previous one) but I verified that applying this patch has the desired
effect (https://bitbucket.org/snippets/ssaasen/9AXg).
Thanks for the fix Jeff.
On 21 April 2015 at 05:54, Jeff King <peff@peff.net> wrote:
> When writing out an object file, we first check whether it
> already exists and if so optimize out the write. Prior to
> 33d4221, we did this by calling has_sha1_file(), which will
> check for packed objects followed by loose. Since that
> commit, we check loose objects first.
>
> For the common case of a repository whose objects are mostly
> packed, this means we will make a lot of extra access()
> system calls checking for loose objects. We should follow
> the same packed-then-loose order that all of our other
> lookups use.
>
> Reported-by: Stefan Saasen <ssaasen@atlassian.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> sha1_file.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sha1_file.c b/sha1_file.c
> index 88f06ba..822aaef 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -3014,7 +3014,7 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign
> write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
> if (returnsha1)
> hashcpy(returnsha1, sha1);
> - if (freshen_loose_object(sha1) || freshen_packed_object(sha1))
> + if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
> return 0;
> return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
> }
> --
> 2.4.0.rc2.384.g7297a4a
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
2015-04-20 20:09 ` Jeff King
2015-04-20 20:12 ` Junio C Hamano
@ 2015-04-21 1:49 ` Stefan Saasen
2015-04-21 17:05 ` Junio C Hamano
1 sibling, 1 reply; 19+ messages in thread
From: Stefan Saasen @ 2015-04-21 1:49 UTC (permalink / raw)
To: Jeff King, Junio C Hamano; +Cc: Git Mailing List
>> If it is critical to some people, they can downmerge to their custom
>> old installations of Git they maintain with ease, of course, and
>> that "with ease" part is the reason why I try to apply fixes to tip
>> of the original topic branch even though they were merged to the
>> mainline eons ago ;-).
>
> I think it is a bigger deal for folks who do not ship a custom
> installation, but expect to ship a third-party system that interacts
> with whatever version of git their customers happen to have (in which
> case they can only recommend their customers to upgrade).
Yes, this is the situation we are facing. We allow our customers to
use the git version that is supported/available on their OS (within a
certain range of supported versions) so our customers usually don't
compile from source.
> Either way, though, I do not think it is the upstream Git project's
> problem.
That's fair enough, I was mostly enquiring about the official git
versions this will land in so that we can advise customers what git
version to use (or not to use).
I've noticed Peff's patches on pu which suggest they will be available
in git 2.5?
Do you Junio, have plans to merge them to maint (2.3.x) and/or next (2.4)?
While I certainly agree that this is specific to Git on NFS and not a
more widespread git performance problem, I'd love to be able to
message something other than "skip all the git version between and
including git 2.2 - 2.4".
I appreciate your consideration and thanks again for the swift response on this.
Cheers,
Stefan
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
2015-04-21 1:49 ` Stefan Saasen
@ 2015-04-21 17:05 ` Junio C Hamano
2015-04-21 21:45 ` Junio C Hamano
2015-04-22 0:04 ` Stefan Saasen
0 siblings, 2 replies; 19+ messages in thread
From: Junio C Hamano @ 2015-04-21 17:05 UTC (permalink / raw)
To: Stefan Saasen; +Cc: Jeff King, Git Mailing List
Stefan Saasen <ssaasen@atlassian.com> writes:
> I've noticed Peff's patches on pu which suggest they will be available
> in git 2.5?
Being on 'pu' (or 'next' for that matter) is not a suggestion for a
change to appear in any future version at all, even though it often
means that it would soon be merged to 'master' and will be in the
upcoming release to be on 'next' in early part of a development
cycle. Some larger topics would stay on 'next' for a few cycles.
> Do you Junio, have plans to merge them to maint (2.3.x) and/or next (2.4)?
The topic will hopefully be merged to 'master' after 2.4 final is
released end of this month, down to 'maint' early May and will ship
with 2.4.1, unless there is unforeseen issues discovered in the
change while people try it out while it is in 'next' (which will
happen today, hopefully).
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
2015-04-21 17:05 ` Junio C Hamano
@ 2015-04-21 21:45 ` Junio C Hamano
2015-04-22 1:46 ` Stefan Saasen
2015-04-22 0:04 ` Stefan Saasen
1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2015-04-21 21:45 UTC (permalink / raw)
To: Stefan Saasen; +Cc: Jeff King, Git Mailing List, Kyle J. McKay
Junio C Hamano <gitster@pobox.com> writes:
> Stefan Saasen <ssaasen@atlassian.com> writes:
>
>> I've noticed Peff's patches on pu which suggest they will be available
>> in git 2.5?
>
> Being on 'pu' (or 'next' for that matter) is not a suggestion for a
> change to appear in any future version at all, even though it often
> means that it would soon be merged to 'master' and will be in the
> upcoming release to be on 'next' in early part of a development
> cycle. Some larger topics would stay on 'next' for a few cycles.
>
>> Do you Junio, have plans to merge them to maint (2.3.x) and/or next (2.4)?
>
> The topic will hopefully be merged to 'master' after 2.4 final is
> released end of this month, down to 'maint' early May and will ship
> with 2.4.1, unless there is unforeseen issues discovered in the
> change while people try it out while it is in 'next' (which will
> happen today, hopefully).
... and then if I do not forget and if the topic is really important
for real-world users, I am OK to merge it down to 2.3 and even 2.2
maintenance tracks later. But that will happen only after the topic
hits 'maint', which will happen only after the topic hits 'master'.
What you _can_ help is the "if I do not forget" part ;-) Also see a
similar discussion we had recently
http://thread.gmane.org/gmane.comp.version-control.git/264365
The key sentence from my part in the thread is
> When I say "the tip of 'master' is meant to be more stable than
> any tagged versions", I do mean it.
and the reasoning behind it that is given in the paragraph before
that, though.
Perhaps companies like Atlassian that rely on the stability of the
open source Git can spare some resources and join forces with like
minded folks on LTS of older maintenance tracks, if they are truly
interested in.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
2015-04-21 17:05 ` Junio C Hamano
2015-04-21 21:45 ` Junio C Hamano
@ 2015-04-22 0:04 ` Stefan Saasen
1 sibling, 0 replies; 19+ messages in thread
From: Stefan Saasen @ 2015-04-22 0:04 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jeff King, Git Mailing List
>> I've noticed Peff's patches on pu which suggest they will be available
>> in git 2.5?
>
> Being on 'pu' (or 'next' for that matter) is not a suggestion for a
> change to appear in any future version at all, even though it often
> means that it would soon be merged to 'master' and will be in the
> upcoming release to be on 'next' in early part of a development
> cycle. Some larger topics would stay on 'next' for a few cycles.
>
>> Do you Junio, have plans to merge them to maint (2.3.x) and/or next (2.4)?
>
> The topic will hopefully be merged to 'master' after 2.4 final is
> released end of this month, down to 'maint' early May and will ship
> with 2.4.1, unless there is unforeseen issues discovered in the
> change while people try it out while it is in 'next' (which will
> happen today, hopefully).
Thanks for the clarification Junio.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
2015-04-21 21:45 ` Junio C Hamano
@ 2015-04-22 1:46 ` Stefan Saasen
2015-04-22 22:00 ` Junio C Hamano
0 siblings, 1 reply; 19+ messages in thread
From: Stefan Saasen @ 2015-04-22 1:46 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jeff King, Git Mailing List, Kyle J. McKay
> Perhaps companies like Atlassian that rely on the stability of the
> open source Git can spare some resources and join forces with like
> minded folks on LTS of older maintenance tracks, if they are truly
> interested in.
We certainly can and would like to. I'm not entirely sure what that
would entail though?
>From reading through $gmane/264365 I've identified the following
responsibilities/opportunities to help:
> - Monitor "git log --first-parent maint-lts..master" and find
> the tip of topic branches that need to be down-merged;
>
> - Down-merge such topics to maint-lts; this might involve
> cherry-picking instead of merge, as the bugfix topics may
> originally be done on the codebase newer than maint-lts;
and more importantly testing the maint-lts version to ensure
backported changes don't introduce regressions and the maint-lts
branch is stable.
This suggests specific, spaced LTS versions but in the same thread you
mention maint-2.1or maint-2.2.
So a different model could be maintaining old versions in a sliding
window fashion (e.g. critical issues would be backported to the last 6
months worth of git releases).
Maybe I'm getting ahead of myself here :)
Anyway, long story short. We're interested to help but I'm not
entirely sure what that would look like at the moment. Are there
formed ideas floating around or would you be looking for some form of
proposal instead?
Cheers,
Stefan
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
2015-04-22 1:46 ` Stefan Saasen
@ 2015-04-22 22:00 ` Junio C Hamano
0 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2015-04-22 22:00 UTC (permalink / raw)
To: Stefan Saasen; +Cc: Jeff King, Git Mailing List, Kyle J. McKay
Stefan Saasen <ssaasen@atlassian.com> writes:
> Anyway, long story short. We're interested to help but I'm not
> entirely sure what that would look like at the moment. Are there
> formed ideas floating around or would you be looking for some form of
> proposal instead?
I am not proposing anything or looking for proposals myself,
actually. It is just somebody expressed interest in having tested
older maintenance track that is kept alive in the past, so I was
merely trying to help connect you with that old thread.
If those who are interested in having such LTS track(s) need
something specific from me, and if it will not be unrealistic
maintenance burden, I am willing to help. That's all.
For example, LTS group for whatever reason may nominate 2.2.x track
as a base that they want to keep alive longer than other maintenance
tracks and promise to test changes to them to keep it stable. Then
I can help the effort by making sure people's bugfix patches would
apply down to 2.2.x track (often people make mistake of using newer
facility to fix or test the fix for an ancient bug, and bugfix topic
branch ends up forked at a point much newer than where it should
be).
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2015-04-22 22:00 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-17 7:30 [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects Stefan Saasen
2015-04-17 14:03 ` Jeff King
2015-04-17 15:50 ` Junio C Hamano
2015-04-18 3:35 ` Stefan Saasen
2015-04-20 19:53 ` Jeff King
2015-04-20 19:54 ` [PATCH 1/2] sha1_file: freshen pack objects before loose Jeff King
2015-04-21 0:46 ` Stefan Saasen
2015-04-20 19:55 ` [PATCH 2/2] sha1_file: only freshen packs once per run Jeff King
2015-04-21 0:45 ` Stefan Saasen
2015-04-20 20:04 ` [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects Junio C Hamano
2015-04-20 20:09 ` Jeff King
2015-04-20 20:12 ` Junio C Hamano
2015-04-20 20:28 ` Jeff King
2015-04-21 1:49 ` Stefan Saasen
2015-04-21 17:05 ` Junio C Hamano
2015-04-21 21:45 ` Junio C Hamano
2015-04-22 1:46 ` Stefan Saasen
2015-04-22 22:00 ` Junio C Hamano
2015-04-22 0:04 ` Stefan Saasen
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.