* [PATCH] hash: reduce size of algo member of object ID
@ 2021-10-03  5:51 René Scharfe
  2021-10-04  7:31 ` Eric Wong
  2021-10-04  8:13 ` Jeff King
  0 siblings, 2 replies; 7+ messages in thread
From: René Scharfe @ 2021-10-03  5:51 UTC (permalink / raw)
  To: Git List
  Cc: Andrzej Hunt, Eric Wong, Jeff King, Junio C Hamano,
	Carlo Marcelo Arenas Belón, brian m. carlson

cf0983213c (hash: add an algo member to struct object_id, 2021-04-26)
introduced the algo member as an int.  This increased the size of struct
object_id by 4 bytes (12.5%) and imposed a 4-byte alignment.  Currently
we only need to store the values 0, 1 and 2 in it.  Let's use an
unsigned char instead to reduce the size overhead and lift the alignment
requirement.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
Not sure how to measure the performance impact of this change.  The perf
tests mentioned by cf0983213c don't show much of a difference with
GIT_PERF_REPEAT_COUNT=10 for me:

Test                                             origin/master       HEAD
--------------------------------------------------------------------------------------------
0001.1: rev-list --all                           0.11(0.08+0.02)     0.11(0.08+0.02) +0.0%
0001.2: rev-list --all --objects                 3.04(2.98+0.05)     3.04(2.98+0.05) +0.0%
0001.3: rev-list --parents                       0.05(0.04+0.01)     0.05(0.03+0.01) +0.0%
0001.5: rev-list -- dummy                        0.21(0.20+0.01)     0.21(0.19+0.01) +0.0%
0001.6: rev-list --parents -- dummy              0.22(0.20+0.01)     0.22(0.20+0.01) +0.0%
0001.8: rev-list $commit --not --all             0.06(0.05+0.00)     0.06(0.05+0.00) +0.0%
0001.9: rev-list --objects $commit --not --all   0.06(0.05+0.00)     0.06(0.05+0.00) +0.0%
1450.1: fsck                                     20.20(19.71+0.47)   20.18(19.70+0.48) -0.1%

 hash.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hash.h b/hash.h
index 9e25c40e9a..24d8f7cd21 100644
--- a/hash.h
+++ b/hash.h
@@ -115,7 +115,7 @@ static inline void git_SHA256_Clone(git_SHA256_CTX *dst, const git_SHA256_CTX *s

 struct object_id {
 	unsigned char hash[GIT_MAX_RAWSZ];
-	int algo;	/* XXX requires 4-byte alignment */
+	unsigned char algo;
 };

 /* A suitably aligned type for stack allocations of hash contexts. */
--
2.33.0


* Re: [PATCH] hash: reduce size of algo member of object ID
  2021-10-03  5:51 [PATCH] hash: reduce size of algo member of object ID René Scharfe
@ 2021-10-04  7:31 ` Eric Wong
  2021-10-04  8:13 ` Jeff King
  1 sibling, 0 replies; 7+ messages in thread
From: Eric Wong @ 2021-10-04  7:31 UTC (permalink / raw)
  To: René Scharfe
  Cc: git, Andrzej Hunt, Jeff King, Junio C Hamano,
	Carlo Marcelo Arenas Belón, brian m. carlson

René Scharfe <l.s.r@web.de> wrote:
> cf0983213c (hash: add an algo member to struct object_id, 2021-04-26)
> introduced the algo member as an int.  This increased the size of struct
> object_id by 4 bytes (12.5%) and imposed a 4-byte alignment.  Currently
> we only need to store the values 0, 1 and 2 in it.  Let's use an
> unsigned char instead to reduce the size overhead and lift the alignment
> requirement.

I like it.  Btw, would a bitfield enum be portable enough for us?

	enum git_hash_algo algo:8;  /* or algo:2 */

I've used those in other projects, but AFAIK they were
gcc||clang-only.

> Not sure how to measure the performance impact of this change.  The perf
> tests mentioned by cf0983213c don't show much of a difference with
> GIT_PERF_REPEAT_COUNT=10 for me:

Thanks for that info.  I'm not familiar enough with most git
internals to know if it ought to make a difference as-is.

In my experience, changes like this open the door to further
struct-packing opportunities (possibly combined with conditional
use of __attribute__((packed)) ).


* Re: [PATCH] hash: reduce size of algo member of object ID
  2021-10-03  5:51 [PATCH] hash: reduce size of algo member of object ID René Scharfe
  2021-10-04  7:31 ` Eric Wong
@ 2021-10-04  8:13 ` Jeff King
  2021-10-04  8:20   ` Jeff King
  2021-10-04  8:47   ` Ævar Arnfjörð Bjarmason
  1 sibling, 2 replies; 7+ messages in thread
From: Jeff King @ 2021-10-04  8:13 UTC (permalink / raw)
  To: René Scharfe
  Cc: Git List, Andrzej Hunt, Eric Wong, Junio C Hamano,
	Carlo Marcelo Arenas Belón, brian m. carlson

On Sun, Oct 03, 2021 at 07:51:40AM +0200, René Scharfe wrote:

> cf0983213c (hash: add an algo member to struct object_id, 2021-04-26)
> introduced the algo member as an int.  This increased the size of struct
> object_id by 4 bytes (12.5%) and imposed a 4-byte alignment.  Currently
> we only need to store the values 0, 1 and 2 in it.  Let's use an
> unsigned char instead to reduce the size overhead and lift the alignment
> requirement.
> 
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
> Not sure how to measure the performance impact of this change.  The perf
> tests mentioned by cf0983213c don't show much of a difference with
> GIT_PERF_REPEAT_COUNT=10 for me:

Not surprising. If going from nothing to an int didn't have an impact on
those tests, then going from an int to char isn't likely to, either.

> 0001.1: rev-list --all                           0.11(0.08+0.02)     0.11(0.08+0.02) +0.0%
> 0001.2: rev-list --all --objects                 3.04(2.98+0.05)     3.04(2.98+0.05) +0.0%
> 0001.3: rev-list --parents                       0.05(0.04+0.01)     0.05(0.03+0.01) +0.0%
> 0001.5: rev-list -- dummy                        0.21(0.20+0.01)     0.21(0.19+0.01) +0.0%
> 0001.6: rev-list --parents -- dummy              0.22(0.20+0.01)     0.22(0.20+0.01) +0.0%
> 0001.8: rev-list $commit --not --all             0.06(0.05+0.00)     0.06(0.05+0.00) +0.0%
> 0001.9: rev-list --objects $commit --not --all   0.06(0.05+0.00)     0.06(0.05+0.00) +0.0%

I do think these probably aren't very good tests. They're going to
mostly be dealing with actual object structs. Your patch changes the
size of "struct object_id", but the object structs themselves still need
4-byte alignment. And they have a bunch of padding in the first place.

If we instrument "git version --build-options" like this:

diff --git a/help.c b/help.c
index 3c3bdec213..718e32cadf 100644
--- a/help.c
+++ b/help.c
@@ -11,6 +11,8 @@
 #include "version.h"
 #include "refs.h"
 #include "parse-options.h"
+#include "blob.h"
+#include "tag.h"
 
 struct category_description {
 	uint32_t category;
@@ -663,6 +665,11 @@ void get_version_info(struct strbuf *buf, int show_build_options)
 		strbuf_addf(buf, "sizeof-long: %d\n", (int)sizeof(long));
 		strbuf_addf(buf, "sizeof-size_t: %d\n", (int)sizeof(size_t));
 		strbuf_addf(buf, "shell-path: %s\n", SHELL_PATH);
+		strbuf_addf(buf, "sizeof-object_id: %d\n", (int)sizeof(struct object_id));
+		strbuf_addf(buf, "sizeof-commit: %d\n", (int)sizeof(struct commit));
+		strbuf_addf(buf, "sizeof-blob: %d\n", (int)sizeof(struct blob));
+		strbuf_addf(buf, "sizeof-tree: %d\n", (int)sizeof(struct tree));
+		strbuf_addf(buf, "sizeof-tag: %d\n", (int)sizeof(struct tag));
 		/* NEEDSWORK: also save and output GIT-BUILD_OPTIONS? */
 	}
 }

then we can see that your patch doesn't change the size of any of
the structs at all! Even the original cf0983213c going from nothing to
an int changed only "struct blob" (it went from 32 to 36 bytes).

A more interesting test is something that stores a bunch of oids. A good
candidate is "cat-file --batch-all-objects". In the default mode, it
uses an oid_array to sort the set of objects. In unordered mode, it uses
an oidset to mark seen ones.

Here are hyperfine results. The three versions are:

  - none: cf0983213c^
  - int: cf0983213c
  - char: cf0983213c with the equivalent of your patch on top

All compiled with "gcc -O2", and run in a fully packed clone of
linux.git.

It looks like adding the "algo" field did make a big difference for the
oid_array case, but changing it to a char doesn't seem to help at all:

  $ hyperfine -L v none,int,char './git.{v} cat-file --batch-all-objects --batch-check="%(objectname)"'
  Benchmark #1: ./git.none cat-file --batch-all-objects --batch-check="%(objectname)"
    Time (mean ± σ):      1.653 s ±  0.009 s    [User: 1.607 s, System: 0.046 s]
    Range (min … max):    1.640 s …  1.670 s    10 runs
   
  Benchmark #2: ./git.int cat-file --batch-all-objects --batch-check="%(objectname)"
    Time (mean ± σ):      1.067 s ±  0.012 s    [User: 1.017 s, System: 0.050 s]
    Range (min … max):    1.053 s …  1.089 s    10 runs
   
  Benchmark #3: ./git.char cat-file --batch-all-objects --batch-check="%(objectname)"
    Time (mean ± σ):      1.092 s ±  0.013 s    [User: 1.046 s, System: 0.046 s]
    Range (min … max):    1.080 s …  1.116 s    10 runs
   
  Summary
    './git.int cat-file --batch-all-objects --batch-check="%(objectname)"' ran
      1.02 ± 0.02 times faster than './git.char cat-file --batch-all-objects --batch-check="%(objectname)"'
      1.55 ± 0.02 times faster than './git.none cat-file --batch-all-objects --batch-check="%(objectname)"'

I'm actually surprised it had this much of an impact. But I guess this
benchmark really is mostly just memcpy-ing oids into a big array,
sorting it, and printing the result. If that array is 12% bigger, we'd
expect at least a 12% speedup. But adding in non-linear elements like
growing the array (though I guess that is amortized linear) and sorting
(which picks up an extra log(n) term) make the difference.

It's _kind of_ silly in a sense, since usually you'd ask for other parts
of the object, which will make the speed difference relatively smaller.
But just dumping a bunch of oids is actually not an unreasonable thing
to do. I suspect it got a lot slower with 32-byte GIT_MAX_RAWSZ, too
(even when you're using 20-byte sha1), but I don't think there's an easy
way to get out of that.

Using --unordered switches us to using a khash-backed oidset. This
doesn't seem to show any meaningful difference in the three cases (the
averages follow the pattern I'd expect, but we're within the noise).

  $ hyperfine -L v none,int,char './git.{v} cat-file --batch-all-objects --unordered --batch-check="%(objectname)"'
  Benchmark #1: ./git.none cat-file --batch-all-objects --unordered --batch-check="%(objectname)"
    Time (mean ± σ):      2.125 s ±  0.024 s    [User: 2.071 s, System: 0.054 s]
    Range (min … max):    2.092 s …  2.182 s    10 runs
   
  Benchmark #2: ./git.int cat-file --batch-all-objects --unordered --batch-check="%(objectname)"
    Time (mean ± σ):      2.152 s ±  0.016 s    [User: 2.083 s, System: 0.068 s]
    Range (min … max):    2.123 s …  2.175 s    10 runs
   
  Benchmark #3: ./git.char cat-file --batch-all-objects --unordered --batch-check="%(objectname)"
    Time (mean ± σ):      2.137 s ±  0.013 s    [User: 2.079 s, System: 0.058 s]
    Range (min … max):    2.127 s …  2.167 s    10 runs
   
  Summary
    './git.none cat-file --batch-all-objects --unordered --batch-check="%(objectname)"' ran
      1.01 ± 0.01 times faster than './git.char cat-file --batch-all-objects --unordered --batch-check="%(objectname)"'
      1.01 ± 0.01 times faster than './git.int cat-file --batch-all-objects --unordered --batch-check="%(objectname)"'

What surprises me most here is that --unordered is so much slower than
its sorted cousin. Its purpose is to make things faster (though it
likely still does so for queries that actually look at the object
properties, since hitting them in pack order is beneficial there).

-Peff


* Re: [PATCH] hash: reduce size of algo member of object ID
  2021-10-04  8:13 ` Jeff King
@ 2021-10-04  8:20   ` Jeff King
  2021-10-05  0:04     ` brian m. carlson
  2021-10-05 17:28     ` René Scharfe
  2021-10-04  8:47   ` Ævar Arnfjörð Bjarmason
  1 sibling, 2 replies; 7+ messages in thread
From: Jeff King @ 2021-10-04  8:20 UTC (permalink / raw)
  To: René Scharfe
  Cc: Git List, Andrzej Hunt, Eric Wong, Junio C Hamano,
	Carlo Marcelo Arenas Belón, brian m. carlson

On Mon, Oct 04, 2021 at 04:13:34AM -0400, Jeff King wrote:

> It looks like adding the "algo" field did make a big difference for the
> oid_array case, but changing it to a char doesn't seem to help at all:
> 
>   $ hyperfine -L v none,int,char './git.{v} cat-file --batch-all-objects --batch-check="%(objectname)"'
>   Benchmark #1: ./git.none cat-file --batch-all-objects --batch-check="%(objectname)"
>     Time (mean ± σ):      1.653 s ±  0.009 s    [User: 1.607 s, System: 0.046 s]
>     Range (min … max):    1.640 s …  1.670 s    10 runs
>    
>   Benchmark #2: ./git.int cat-file --batch-all-objects --batch-check="%(objectname)"
>     Time (mean ± σ):      1.067 s ±  0.012 s    [User: 1.017 s, System: 0.050 s]
>     Range (min … max):    1.053 s …  1.089 s    10 runs
>    
>   Benchmark #3: ./git.char cat-file --batch-all-objects --batch-check="%(objectname)"
>     Time (mean ± σ):      1.092 s ±  0.013 s    [User: 1.046 s, System: 0.046 s]
>     Range (min … max):    1.080 s …  1.116 s    10 runs
>    
>   Summary
>     './git.int cat-file --batch-all-objects --batch-check="%(objectname)"' ran
>       1.02 ± 0.02 times faster than './git.char cat-file --batch-all-objects --batch-check="%(objectname)"'
>       1.55 ± 0.02 times faster than './git.none cat-file --batch-all-objects --batch-check="%(objectname)"'
> 
> I'm actually surprised it had this much of an impact. But I guess this
> benchmark really is mostly just memcpy-ing oids into a big array,
> sorting it, and printing the result. If that array is 12% bigger, we'd
> expect at least a 12% speedup. But adding in non-linear elements like
> growing the array (though I guess that is amortized linear) and sorting
> (which picks up an extra log(n) term) make the difference.
> 
> It's _kind of_ silly in a sense, since usually you'd ask for other parts
> of the object, which will make the speed difference relatively smaller.
> But just dumping a bunch of oids is actually not an unreasonable thing
> to do. I suspect it got a lot slower with 32-byte GIT_MAX_RAWSZ, too
> (even when you're using 20-byte sha1), but I don't think there's an easy
> way to get out of that.

Oh wait, I'm reading it totally wrong. Adding in the extra 4 bytes
actually made it _faster_ than not having an algo field. Now I'm
super-confused. I could believe that it gave us some better alignment,
but the original struct was 32 bytes. 36 seems like a strictly worse
number.

-Peff


* Re: [PATCH] hash: reduce size of algo member of object ID
  2021-10-04  8:13 ` Jeff King
  2021-10-04  8:20   ` Jeff King
@ 2021-10-04  8:47   ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 7+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-04  8:47 UTC (permalink / raw)
  To: Jeff King
  Cc: René Scharfe, Git List, Andrzej Hunt, Eric Wong,
	Junio C Hamano, Carlo Marcelo Arenas Belón,
	brian m. carlson


On Mon, Oct 04 2021, Jeff King wrote:

> On Sun, Oct 03, 2021 at 07:51:40AM +0200, René Scharfe wrote:
>
>> cf0983213c (hash: add an algo member to struct object_id, 2021-04-26)
>> introduced the algo member as an int.  This increased the size of struct
>> object_id by 4 bytes (12.5%) and imposed a 4-byte alignment.  Currently
>> we only need to store the values 0, 1 and 2 in it.  Let's use an
>> unsigned char instead to reduce the size overhead and lift the alignment
>> requirement.
>> 
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>> Not sure how to measure the performance impact of this change.  The perf
>> tests mentioned by cf0983213c don't show much of a difference with
>> GIT_PERF_REPEAT_COUNT=10 for me:
>
> Not surprising. If going from nothing to an int didn't have an impact on
> those tests, then going from an int to char isn't likely to, either.
>
>> 0001.1: rev-list --all                           0.11(0.08+0.02)     0.11(0.08+0.02) +0.0%
>> 0001.2: rev-list --all --objects                 3.04(2.98+0.05)     3.04(2.98+0.05) +0.0%
>> 0001.3: rev-list --parents                       0.05(0.04+0.01)     0.05(0.03+0.01) +0.0%
>> 0001.5: rev-list -- dummy                        0.21(0.20+0.01)     0.21(0.19+0.01) +0.0%
>> 0001.6: rev-list --parents -- dummy              0.22(0.20+0.01)     0.22(0.20+0.01) +0.0%
>> 0001.8: rev-list $commit --not --all             0.06(0.05+0.00)     0.06(0.05+0.00) +0.0%
>> 0001.9: rev-list --objects $commit --not --all   0.06(0.05+0.00)     0.06(0.05+0.00) +0.0%
>
> I do think these probably aren't very good tests. They're going to
> mostly be dealing with actual object structs. Your patch changes the
> size of "struct object_id", but the object structs themselves still need
> 4-byte alignment. And they have a bunch of padding in the first place.
>
> If we instrument "git version --build-options" like this:
>
> diff --git a/help.c b/help.c
> index 3c3bdec213..718e32cadf 100644
> --- a/help.c
> +++ b/help.c
> @@ -11,6 +11,8 @@
>  #include "version.h"
>  #include "refs.h"
>  #include "parse-options.h"
> +#include "blob.h"
> +#include "tag.h"
>  
>  struct category_description {
>  	uint32_t category;
> @@ -663,6 +665,11 @@ void get_version_info(struct strbuf *buf, int show_build_options)
>  		strbuf_addf(buf, "sizeof-long: %d\n", (int)sizeof(long));
>  		strbuf_addf(buf, "sizeof-size_t: %d\n", (int)sizeof(size_t));
>  		strbuf_addf(buf, "shell-path: %s\n", SHELL_PATH);
> +		strbuf_addf(buf, "sizeof-object_id: %d\n", (int)sizeof(struct object_id));
> +		strbuf_addf(buf, "sizeof-commit: %d\n", (int)sizeof(struct commit));
> +		strbuf_addf(buf, "sizeof-blob: %d\n", (int)sizeof(struct blob));
> +		strbuf_addf(buf, "sizeof-tree: %d\n", (int)sizeof(struct tree));
> +		strbuf_addf(buf, "sizeof-tag: %d\n", (int)sizeof(struct tag));
>  		/* NEEDSWORK: also save and output GIT-BUILD_OPTIONS? */
>  	}
>  }
>
> then we can see that your patch doesn't change the size of any of
> the structs at all! Even the original cf0983213c going from nothing to
> an int changed only "struct blob" (it went from 32 to 36 bytes).
>
> A more interesting test is something that stores a bunch of oids. A good
> candidate is "cat-file --batch-all-objects". In the default mode, it
> uses an oid_array to sort the set of objects. In unordered mode, it uses
> an oidset to mark seen ones.
>
> Here are hyperfine results. The three versions are:
>
>   - none: cf0983213c^
>   - int: cf0983213c
>   - char: cf0983213c with the equivalent of your patch on top
>
> All compiled with "gcc -O2", and run in a fully packed clone of
> linux.git.
>
> It looks like adding the "algo" field did make a big difference for the
> oid_array case, but changing it to a char doesn't seem to help at all:
>
>   $ hyperfine -L v none,int,char './git.{v} cat-file --batch-all-objects --batch-check="%(objectname)"'
>   Benchmark #1: ./git.none cat-file --batch-all-objects --batch-check="%(objectname)"
>     Time (mean ± σ):      1.653 s ±  0.009 s    [User: 1.607 s, System: 0.046 s]
>     Range (min … max):    1.640 s …  1.670 s    10 runs
>    
>   Benchmark #2: ./git.int cat-file --batch-all-objects --batch-check="%(objectname)"
>     Time (mean ± σ):      1.067 s ±  0.012 s    [User: 1.017 s, System: 0.050 s]
>     Range (min … max):    1.053 s …  1.089 s    10 runs
>    
>   Benchmark #3: ./git.char cat-file --batch-all-objects --batch-check="%(objectname)"
>     Time (mean ± σ):      1.092 s ±  0.013 s    [User: 1.046 s, System: 0.046 s]
>     Range (min … max):    1.080 s …  1.116 s    10 runs
>    
>   Summary
>     './git.int cat-file --batch-all-objects --batch-check="%(objectname)"' ran
>       1.02 ± 0.02 times faster than './git.char cat-file --batch-all-objects --batch-check="%(objectname)"'
>       1.55 ± 0.02 times faster than './git.none cat-file --batch-all-objects --batch-check="%(objectname)"'

*Rubs eyes*

Maybe I'm being really stupid here, but doesn't this say the opposite of
your summary? I.e. that adding the "int" field in cf0983213c made it
faster. It was ~1.6s before, now it's ~1.1s? I haven't used "hyperfine"
before, but this output is consistent with that:
    
    $ hyperfine --warmup 2 -L v 1.6,1.0 'sleep {v}'
    Benchmark #1: sleep 1.6
      Time (mean ± σ):      1.604 s ±  0.000 s    [User: 2.2 ms, System: 1.1 ms]
      Range (min … max):    1.603 s …  1.604 s    10 runs
     
    Benchmark #2: sleep 1.0
      Time (mean ± σ):      1.004 s ±  0.000 s    [User: 2.5 ms, System: 0.9 ms]
      Range (min … max):    1.004 s …  1.004 s    10 runs
     
    Summary
      'sleep 1.0' ran
        1.60 ± 0.00 times faster than 'sleep 1.6'

[Goes and reads the downthread
<YVq5XCyLDr0SPBzx@coredump.intra.peff.net> where I see you made the same
discovery, should have probably done that first :)]

Anyway, just for experimenting I played with removing "int algo"
entirely on top of master. Diff on top at the end. That gives the same
sorts of results:
    
    $ hyperfine --warmup 2 -L v algo-02,algo-03,no-algo-02,no-algo-03 '~/g/git/git.{v} cat-file --batch-all-objects --batch-check="%(objectname)"'
    Benchmark #1: ~/g/git/git.algo-02 cat-file --batch-all-objects --batch-check="%(objectname)"
      Time (mean ± σ):      1.283 s ±  0.022 s    [User: 1.217 s, System: 0.066 s]
      Range (min … max):    1.265 s …  1.336 s    10 runs
    
    Benchmark #2: ~/g/git/git.algo-03 cat-file --batch-all-objects --batch-check="%(objectname)"
      Time (mean ± σ):      1.261 s ±  0.032 s    [User: 1.195 s, System: 0.066 s]
      Range (min … max):    1.240 s …  1.343 s    10 runs
    
    Benchmark #3: ~/g/git/git.no-algo-02 cat-file --batch-all-objects --batch-check="%(objectname)"
      Time (mean ± σ):      1.911 s ±  0.034 s    [User: 1.853 s, System: 0.058 s]
      Range (min … max):    1.878 s …  1.977 s    10 runs
    
    Benchmark #4: ~/g/git/git.no-algo-03 cat-file --batch-all-objects --batch-check="%(objectname)"
      Time (mean ± σ):      1.880 s ±  0.011 s    [User: 1.824 s, System: 0.056 s]
      Range (min … max):    1.864 s …  1.905 s    10 runs
    
    Summary
      '~/g/git/git.algo-03 cat-file --batch-all-objects --batch-check="%(objectname)"' ran
        1.02 ± 0.03 times faster than '~/g/git/git.algo-02 cat-file --batch-all-objects --batch-check="%(objectname)"'
        1.49 ± 0.04 times faster than '~/g/git/git.no-algo-03 cat-file --batch-all-objects --batch-check="%(objectname)"'
        1.52 ± 0.05 times faster than '~/g/git/git.no-algo-02 cat-file --batch-all-objects --batch-check="%(objectname)"'

Of course I did that thinking it might be worthwhile to provide say a
compile-time option for SHA-1-only deployments of git, but that's when I
went along with your misreading of the hyperfine output, seems it
actually makes it fasterer :)

Aside: I recently started testing git on HP/UX's aCC, which is quite
whiny about things like this:

    "blame.c", line 2677: warning #4232-D: conversion from "struct object *" to a more strictly aligned type "struct commit *" may cause misaligned access
                found = (struct commit *)obj;

Which I initially thought might be changed by something like René's
patch, but is (I think) due to the timestamp_t in commit.h, presumably
it's 32bit on HP/UX still? And int is 64 (I haven't actually been able
to fully compile git yet, so...).

Just an aside, but I'm also wondering whether that & other alignment
issues might have anything to do with these results on
platforms/compilers that are less whiny about it. Or maybe it's all
aligned on x86_64 (again, didn't check).

> I'm actually surprised it had this much of an impact. But I guess this
> benchmark really is mostly just memcpy-ing oids into a big array,
> sorting it, and printing the result. If that array is 12% bigger, we'd
> expect at least a 12% speedup. But adding in non-linear elements like
> growing the array (though I guess that is amortized linear) and sorting
> (which picks up an extra log(n) term) make the difference.
>
> It's _kind of_ silly in a sense, since usually you'd ask for other parts
> of the object, which will make the speed difference relatively smaller.
> But just dumping a bunch of oids is actually not an unreasonable thing
> to do. I suspect it got a lot slower with 32-byte GIT_MAX_RAWSZ, too
> (even when you're using 20-byte sha1), but I don't think there's an easy
> way to get out of that.

[I wrote the below before seeing that you misread the "hyperfine"
output, but most of it applies still]:

Not easy, but it might not be super-duper-hard either, I think it might
be a worthwhile thing to try, and these results seem to back that up.

I had a mostly-working patch for that that I never submitted and just
hacked up one afternoon. I was experimenting with it for something
different: To have Git support a SHA-160 (completely non-standard, the
lowest is SHA-224), i.e. a SHA-256 truncated to SHA-1 size.

The advantage being that for say a system that has various
outside-of-git (e.g. DB columns) hardcoded to SHA-1 size you could
convert the repo to SHA-256, but use SHA-1-sized hashes publicly. Less
secure than SHA-256 obviously, but only proportionally to the reduction
in bits, and would still be less hairy than a SHA-1 derivative.

I didn't think it was worthwhile & threw it away after playing with it,
but anyway...

Once you've got that (which requires disconnecting the hard assumption
in hash.h and friends that hash type == hash length, i.e. oid_to_hex()
and friends need variants that take 160, 224 etc.) you can also do it
the other way around: say, have a format like the commit-graph reference
80 bit objects, but have the object store keep the full 256 bits.

That obviously creates all sorts of hairy edge cases where none exist
today. I.e. if you set the storage hash of a SHA-1 repo to be 1/4 the size
that it is now you can't push/pull anything, or if you could we'd need
to piggy-back on the planned on-the-fly SHA-1<->SHA-256 rewriting.

But it might be interesting & useful for these performance
reasons. I.e. it's pretty easy (and I had some throwaway version of this
working with only the "occasional" segfault) to get a copy of "git"
running that say stores all OIDs as 1/4 their size.

As long as you know you've got no collisions you can use that as a quick
side-index. All your problems with interoperability are ones we'll need
to deal with for SHA-1<->SHA-256, although you'll have the extra
potential problem of collisions (which is X% likely depending on your
object numbers & size of truncation you choose).
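For a rough feel of that "X%", the usual birthday bound applies (a
sketch, with n objects and b bits of truncated hash):

```latex
P(\text{collision}) \approx 1 - \exp\!\left(-\frac{n^2}{2^{\,b+1}}\right)
```

E.g. with the 80-bit truncation mentioned above and on the order of 10^7
objects, n^2 / 2^81 is around 4e-11, so a collision is vanishingly
unlikely; truncate much further (say to 32 bits) and it becomes near
certain for any large repo.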

Diff at end:

diff --git a/builtin/show-index.c b/builtin/show-index.c
index 0e0b9fb95bc..b63a3bf60b9 100644
--- a/builtin/show-index.c
+++ b/builtin/show-index.c
@@ -74,7 +74,6 @@ int cmd_show_index(int argc, const char **argv, const char *prefix)
 		for (i = 0; i < nr; i++) {
 			if (fread(entries[i].oid.hash, hashsz, 1, stdin) != 1)
 				die("unable to read sha1 %u/%u", i, nr);
-			entries[i].oid.algo = hash_algo_by_ptr(the_hash_algo);
 		}
 		for (i = 0; i < nr; i++)
 			if (fread(&entries[i].crc, 4, 1, stdin) != 1)
diff --git a/hash.h b/hash.h
index 9e25c40e9ac..bd1855b65ec 100644
--- a/hash.h
+++ b/hash.h
@@ -115,7 +115,6 @@ static inline void git_SHA256_Clone(git_SHA256_CTX *dst, const git_SHA256_CTX *s
 
 struct object_id {
 	unsigned char hash[GIT_MAX_RAWSZ];
-	int algo;	/* XXX requires 4-byte alignment */
 };
 
 /* A suitably aligned type for stack allocations of hash contexts. */
@@ -213,12 +212,7 @@ static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
 {
-	const struct git_hash_algo *algop;
-	if (!oid1->algo)
-		algop = the_hash_algo;
-	else
-		algop = &hash_algos[oid1->algo];
-	return hashcmp_algop(oid1->hash, oid2->hash, algop);
+	return hashcmp_algop(oid1->hash, oid2->hash, the_hash_algo);
 }
 
 static inline int hasheq_algop(const unsigned char *sha1, const unsigned char *sha2, const struct git_hash_algo *algop)
@@ -239,12 +233,7 @@ static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
 
 static inline int oideq(const struct object_id *oid1, const struct object_id *oid2)
 {
-	const struct git_hash_algo *algop;
-	if (!oid1->algo)
-		algop = the_hash_algo;
-	else
-		algop = &hash_algos[oid1->algo];
-	return hasheq_algop(oid1->hash, oid2->hash, algop);
+	return hasheq_algop(oid1->hash, oid2->hash, the_hash_algo);
 }
 
 static inline int is_null_oid(const struct object_id *oid)
@@ -260,23 +249,16 @@ static inline void hashcpy(unsigned char *sha_dst, const unsigned char *sha_src)
 static inline void oidcpy(struct object_id *dst, const struct object_id *src)
 {
 	memcpy(dst->hash, src->hash, GIT_MAX_RAWSZ);
-	dst->algo = src->algo;
 }
 
 /* Like oidcpy() but zero-pads the unused bytes in dst's hash array. */
 static inline void oidcpy_with_padding(struct object_id *dst,
 				       const struct object_id *src)
 {
-	size_t hashsz;
-
-	if (!src->algo)
-		hashsz = the_hash_algo->rawsz;
-	else
-		hashsz = hash_algos[src->algo].rawsz;
+	size_t hashsz = the_hash_algo->rawsz;
 
 	memcpy(dst->hash, src->hash, hashsz);
 	memset(dst->hash + hashsz, 0, GIT_MAX_RAWSZ - hashsz);
-	dst->algo = src->algo;
 }
 
 static inline struct object_id *oiddup(const struct object_id *src)
@@ -294,13 +276,11 @@ static inline void hashclr(unsigned char *hash)
 static inline void oidclr(struct object_id *oid)
 {
 	memset(oid->hash, 0, GIT_MAX_RAWSZ);
-	oid->algo = hash_algo_by_ptr(the_hash_algo);
 }
 
 static inline void oidread(struct object_id *oid, const unsigned char *hash)
 {
 	memcpy(oid->hash, hash, the_hash_algo->rawsz);
-	oid->algo = hash_algo_by_ptr(the_hash_algo);
 }
 
 static inline int is_empty_blob_sha1(const unsigned char *sha1)
@@ -325,7 +305,7 @@ static inline int is_empty_tree_oid(const struct object_id *oid)
 
 static inline void oid_set_algo(struct object_id *oid, const struct git_hash_algo *algop)
 {
-	oid->algo = hash_algo_by_ptr(algop);
+	return;
 }
 
 const char *empty_tree_oid_hex(void);
diff --git a/hex.c b/hex.c
index 4f64d346963..6538e415a37 100644
--- a/hex.c
+++ b/hex.c
@@ -143,7 +143,7 @@ char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash,
 
 char *oid_to_hex_r(char *buffer, const struct object_id *oid)
 {
-	return hash_to_hex_algop_r(buffer, oid->hash, &hash_algos[oid->algo]);
+	return hash_to_hex_algop_r(buffer, oid->hash, &hash_algos[GIT_HASH_SHA1]);
 }
 
 char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *algop)
@@ -161,5 +161,5 @@ char *hash_to_hex(const unsigned char *hash)
 
 char *oid_to_hex(const struct object_id *oid)
 {
-	return hash_to_hex_algop(oid->hash, &hash_algos[oid->algo]);
+	return hash_to_hex_algop(oid->hash, &hash_algos[GIT_HASH_SHA1]);
 }
diff --git a/http-push.c b/http-push.c
index 3309aaf004a..3ce453e14a4 100644
--- a/http-push.c
+++ b/http-push.c
@@ -1011,8 +1011,6 @@ static void remote_ls(const char *path, int flags,
 /* extract hex from sharded "xx/x{38}" filename */
 static int get_oid_hex_from_objpath(const char *path, struct object_id *oid)
 {
-	oid->algo = hash_algo_by_ptr(the_hash_algo);
-
 	if (strlen(path) != the_hash_algo->hexsz + 1)
 		return -1;
 
diff --git a/object-file.c b/object-file.c
index be4f94ecf3b..be1385c5e72 100644
--- a/object-file.c
+++ b/object-file.c
@@ -58,27 +58,21 @@
 
 static const struct object_id empty_tree_oid = {
 	.hash = EMPTY_TREE_SHA1_BIN_LITERAL,
-	.algo = GIT_HASH_SHA1,
 };
 static const struct object_id empty_blob_oid = {
 	.hash = EMPTY_BLOB_SHA1_BIN_LITERAL,
-	.algo = GIT_HASH_SHA1,
 };
 static const struct object_id null_oid_sha1 = {
 	.hash = {0},
-	.algo = GIT_HASH_SHA1,
 };
 static const struct object_id empty_tree_oid_sha256 = {
 	.hash = EMPTY_TREE_SHA256_BIN_LITERAL,
-	.algo = GIT_HASH_SHA256,
 };
 static const struct object_id empty_blob_oid_sha256 = {
 	.hash = EMPTY_BLOB_SHA256_BIN_LITERAL,
-	.algo = GIT_HASH_SHA256,
 };
 static const struct object_id null_oid_sha256 = {
 	.hash = {0},
-	.algo = GIT_HASH_SHA256,
 };
 
 static void git_hash_sha1_init(git_hash_ctx *ctx)
@@ -105,7 +99,6 @@ static void git_hash_sha1_final_oid(struct object_id *oid, git_hash_ctx *ctx)
 {
 	git_SHA1_Final(oid->hash, &ctx->sha1);
 	memset(oid->hash + GIT_SHA1_RAWSZ, 0, GIT_MAX_RAWSZ - GIT_SHA1_RAWSZ);
-	oid->algo = GIT_HASH_SHA1;
 }
 
 
@@ -137,7 +130,6 @@ static void git_hash_sha256_final_oid(struct object_id *oid, git_hash_ctx *ctx)
 	 * but keep it in case we extend the hash size again.
 	 */
 	memset(oid->hash + GIT_SHA256_RAWSZ, 0, GIT_MAX_RAWSZ - GIT_SHA256_RAWSZ);
-	oid->algo = GIT_HASH_SHA256;
 }
 
 static void git_hash_unknown_init(git_hash_ctx *ctx)
diff --git a/oidtree.c b/oidtree.c
index 0d39389bee2..61f4d9515b5 100644
--- a/oidtree.c
+++ b/oidtree.c
@@ -33,9 +33,6 @@ void oidtree_insert(struct oidtree *ot, const struct object_id *oid)
 	struct cb_node *on;
 	struct object_id k;
 
-	if (!oid->algo)
-		BUG("oidtree_insert requires oid->algo");
-
 	on = mem_pool_alloc(&ot->mem_pool, sizeof(*on) + sizeof(*oid));
 
 	/*
@@ -62,13 +59,6 @@ int oidtree_contains(struct oidtree *ot, const struct object_id *oid)
 
 	oidcpy_with_padding(&k, oid);
 
-	if (oid->algo == GIT_HASH_UNKNOWN)
-		klen -= sizeof(oid->algo);
-
-	/* cb_lookup relies on memcmp on the struct, so order matters: */
-	klen += BUILD_ASSERT_OR_ZERO(offsetof(struct object_id, hash) <
-				offsetof(struct object_id, algo));
-
 	return cb_lookup(&ot->tree, (const uint8_t *)&k, klen) ? 1 : 0;
 }
 
@@ -80,9 +70,6 @@ static enum cb_next iter(struct cb_node *n, void *arg)
 	/* Copy to provide 4-byte alignment needed by struct object_id. */
 	memcpy(&k, n->k, sizeof(k));
 
-	if (x->algo != GIT_HASH_UNKNOWN && x->algo != k.algo)
-		return CB_CONTINUE;
-
 	if (x->last_nibble_at) {
 		if ((k.hash[*x->last_nibble_at] ^ x->last_byte) & 0xf0)
 			return CB_CONTINUE;
@@ -100,7 +87,6 @@ void oidtree_each(struct oidtree *ot, const struct object_id *oid,
 
 	x.fn = fn;
 	x.arg = arg;
-	x.algo = oid->algo;
 	if (oidhexsz & 1) {
 		x.last_byte = oid->hash[klen];
 		x.last_nibble_at = &klen;
diff --git a/t/helper/test-oidtree.c b/t/helper/test-oidtree.c
index 180ee28dd93..6e22b422ddc 100644
--- a/t/helper/test-oidtree.c
+++ b/t/helper/test-oidtree.c
@@ -13,7 +13,7 @@ int cmd__oidtree(int argc, const char **argv)
 	struct oidtree ot;
 	struct strbuf line = STRBUF_INIT;
 	int nongit_ok;
-	int algo = GIT_HASH_UNKNOWN;
+	int algo = GIT_HASH_SHA1;
 
 	oidtree_init(&ot);
 	setup_git_directory_gently(&nongit_ok);
@@ -25,7 +25,7 @@
 		if (skip_prefix(line.buf, "insert ", &arg)) {
-			if (get_oid_hex_any(arg, &oid) == GIT_HASH_UNKNOWN)
+			algo = get_oid_hex_any(arg, &oid);
+			if (algo == GIT_HASH_UNKNOWN)
 				die("insert not a hexadecimal oid: %s", arg);
-			algo = oid.algo;
 			oidtree_insert(&ot, &oid);
 		} else if (skip_prefix(line.buf, "contains ", &arg)) {
 			if (get_oid_hex(arg, &oid))
@@ -37,7 +36,6 @@ int cmd__oidtree(int argc, const char **argv)
 			memcpy(buf, arg, strlen(arg));
 			buf[hash_algos[algo].hexsz] = '\0';
 			get_oid_hex_any(buf, &oid);
-			oid.algo = algo;
 			oidtree_each(&ot, &oid, strlen(arg), print_oid, NULL);
 		} else if (!strcmp(line.buf, "clear")) {
 			oidtree_clear(&ot);

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] hash: reduce size of algo member of object ID
  2021-10-04  8:20   ` Jeff King
@ 2021-10-05  0:04     ` brian m. carlson
  2021-10-05 17:28     ` René Scharfe
  1 sibling, 0 replies; 7+ messages in thread
From: brian m. carlson @ 2021-10-05  0:04 UTC (permalink / raw)
  To: Jeff King
  Cc: René Scharfe, Git List, Andrzej Hunt, Eric Wong,
	Junio C Hamano, Carlo Marcelo Arenas Belón

[-- Attachment #1: Type: text/plain, Size: 1027 bytes --]

On 2021-10-04 at 08:20:44, Jeff King wrote:
> Oh wait, I'm reading it totally wrong. Adding in the extra 4 bytes
> actually made it _faster_ than not having an algo field. Now I'm
> super-confused. I could believe that it gave us some better alignment,
> but the original struct was 32 bytes. 36 seems like a strictly worse
> number.

My guess is that the increased alignment means that memcpy can perform
much better.  Just because x86 has "fast" unaligned access doesn't mean
it's free; there remains a penalty for that, although other
architectures which support unaligned access have much worse ones.
memcpy and memcmp will perform better when they can use 32-bit chunks to
read without having to process the potentially unaligned pieces at the
beginning and end.

For the record, I have no particular stylistic opinion about whether we
should adopt the proposed patch, but of course if it's faster as it is,
we should probably leave it.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] hash: reduce size of algo member of object ID
  2021-10-04  8:20   ` Jeff King
  2021-10-05  0:04     ` brian m. carlson
@ 2021-10-05 17:28     ` René Scharfe
  1 sibling, 0 replies; 7+ messages in thread
From: René Scharfe @ 2021-10-05 17:28 UTC (permalink / raw)
  To: Jeff King
  Cc: Git List, Andrzej Hunt, Eric Wong, Junio C Hamano,
	Carlo Marcelo Arenas Belón, brian m. carlson

Am 04.10.21 um 10:20 schrieb Jeff King:
> On Mon, Oct 04, 2021 at 04:13:34AM -0400, Jeff King wrote:
>
>> It looks like adding the "algo" field did make a big difference for the
>> oid_array case, but changing it to a char doesn't seem to help at all:
>>
>>   $ hyperfine -L v none,int,char './git.{v} cat-file --batch-all-objects --batch-check="%(objectname)"'
>>   Benchmark #1: ./git.none cat-file --batch-all-objects --batch-check="%(objectname)"
>>     Time (mean ± σ):      1.653 s ±  0.009 s    [User: 1.607 s, System: 0.046 s]
>>     Range (min … max):    1.640 s …  1.670 s    10 runs
>>
>>   Benchmark #2: ./git.int cat-file --batch-all-objects --batch-check="%(objectname)"
>>     Time (mean ± σ):      1.067 s ±  0.012 s    [User: 1.017 s, System: 0.050 s]
>>     Range (min … max):    1.053 s …  1.089 s    10 runs
>>
>>   Benchmark #3: ./git.char cat-file --batch-all-objects --batch-check="%(objectname)"
>>     Time (mean ± σ):      1.092 s ±  0.013 s    [User: 1.046 s, System: 0.046 s]
>>     Range (min … max):    1.080 s …  1.116 s    10 runs
>>
>>   Summary
>>     './git.int cat-file --batch-all-objects --batch-check="%(objectname)"' ran
>>       1.02 ± 0.02 times faster than './git.char cat-file --batch-all-objects --batch-check="%(objectname)"'
>>       1.55 ± 0.02 times faster than './git.none cat-file --batch-all-objects --batch-check="%(objectname)"'
>>
>> I'm actually surprised it had this much of an impact. But I guess this
>> benchmark really is mostly just memcpy-ing oids into a big array,
>> sorting it, and printing the result. If that array is 12% bigger, we'd
>> expect at least a 12% speedup from dropping the field. But adding in
>> non-linear elements like growing the array (though I guess that is
>> amortized linear) and sorting (which picks up an extra log(n) term)
>> makes up the rest of the difference.
>>
>> It's _kind of_ silly in a sense, since usually you'd ask for other parts
>> of the object, which will make the speed difference relatively smaller.
>> But just dumping a bunch of oids is actually not an unreasonable thing
>> to do. I suspect it got a lot slower with 32-byte GIT_MAX_RAWSZ, too
>> (even when you're using 20-byte sha1), but I don't think there's an easy
>> way to get out of that.
>
> Oh wait, I'm reading it totally wrong. Adding in the extra 4 bytes
> actually made it _faster_ than not having an algo field. Now I'm
> super-confused. I could believe that it gave us some better alignment,
> but the original struct was 32 bytes. 36 seems like a strictly worse
> number.

Apple clang 13 on arm64-apple-darwin20.6.0:

cf0983213c^ (no algo):
Benchmark #1: ./git -C ../linux cat-file --batch-all-objects --batch-check="%(objectname)"
  Time (mean ± σ):      1.105 s ±  0.017 s    [User: 1.058 s, System: 0.045 s]
  Range (min … max):    1.086 s …  1.126 s    10 runs

cf0983213c (int algo):
Benchmark #1: ./git -C ../linux cat-file --batch-all-objects --batch-check="%(objectname)"
  Time (mean ± σ):      1.186 s ±  0.015 s    [User: 1.134 s, System: 0.051 s]
  Range (min … max):    1.162 s …  1.201 s    10 runs

cf0983213c + s/int/unsigned char/ (unsigned char algo):
Benchmark #1: ./git -C ../linux cat-file --batch-all-objects --batch-check="%(objectname)"
  Time (mean ± σ):      1.409 s ±  0.013 s    [User: 1.367 s, System: 0.041 s]
  Range (min … max):    1.397 s …  1.429 s    10 runs


current master (int algo):
Benchmark #1: ./git -C ../linux cat-file --batch-all-objects --batch-check="%(objectname)"
  Time (mean ± σ):      1.217 s ±  0.018 s    [User: 1.170 s, System: 0.046 s]
  Range (min … max):    1.202 s …  1.246 s    10 runs

current master with the patch (unsigned char algo):
Benchmark #1: ./git -C ../linux cat-file --batch-all-objects --batch-check="%(objectname)"
  Time (mean ± σ):      1.465 s ±  0.012 s    [User: 1.417 s, System: 0.046 s]
  Range (min … max):    1.449 s …  1.480 s    10 runs

OK, I don't like my patch anymore.  Weird, though.

René

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-10-05 17:29 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-03  5:51 [PATCH] hash: reduce size of algo member of object ID René Scharfe
2021-10-04  7:31 ` Eric Wong
2021-10-04  8:13 ` Jeff King
2021-10-04  8:20   ` Jeff King
2021-10-05  0:04     ` brian m. carlson
2021-10-05 17:28     ` René Scharfe
2021-10-04  8:47   ` Ævar Arnfjörð Bjarmason
