All of lore.kernel.org
 help / color / mirror / Atom feed
* struct hashmap_entry packing
@ 2014-07-28 17:17 Jeff King
  2014-07-29 20:40 ` Karsten Blees
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2014-07-28 17:17 UTC (permalink / raw)
  To: Karsten Blees; +Cc: git

Hi Karsten,

The hashmap_entry documentation claims:

  `struct hashmap_entry`::

	An opaque structure representing an entry in the hash table,
	which must be used as first member of user data structures.
	Ideally it should be followed by an int-sized member to prevent
	unused memory on 64-bit systems due to alignment.

I'm not sure if the statement about alignment is true. If I have a
struct like:

    struct magic {
	    struct hashmap_entry map;
	    int x;
    };

the statement above implies that I should be able to fit this into only
16 bytes on an LP64 system. But I can't convince gcc to do it. And I
think that makes sense, if you consider code like:

   memset(&magic.map, 0, sizeof(struct hashmap_entry));

The sizeof() has to be the same regardless of whether the hashmap_entry
is standalone or in another struct, and therefore must be padded up to
16 bytes. If we stored "x" in that padding in the combined struct, it
would be overwritten by our memset.

Am I missing anything? If this is the case, we should probably drop that
bit from the documentation. It's possible that we could get around it by
embedding the hashmap_entry elements directly into the parent struct,
but we would be counting on a reader dereferencing it as a hashmap_entry
seeing the members at the exact same offset. I'd imagine that's one of
those things that holds most of the time, but is violating the standard.
It's probably not worth it to save a few bytes.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: struct hashmap_entry packing
  2014-07-28 17:17 struct hashmap_entry packing Jeff King
@ 2014-07-29 20:40 ` Karsten Blees
  2014-08-01 22:37   ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Karsten Blees @ 2014-07-29 20:40 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Am 28.07.2014 19:17, schrieb Jeff King:
> Hi Karsten,
> 
> The hashmap_entry documentation claims:
> 
>   `struct hashmap_entry`::
> 
> 	An opaque structure representing an entry in the hash table,
> 	which must be used as first member of user data structures.
> 	Ideally it should be followed by an int-sized member to prevent
> 	unused memory on 64-bit systems due to alignment.
> 
> I'm not sure if the statement about alignment is true. If I have a
> struct like:
> 
>     struct magic {
> 	    struct hashmap_entry map;
> 	    int x;
>     };
> 
> the statement above implies that I should be able to fit this into only
> 16 bytes on an LP64 system. But I can't convince gcc to do it. And I
> think that makes sense, if you consider code like:
> 
>    memset(&magic.map, 0, sizeof(struct hashmap_entry));
> 
> The sizeof() has to be the same regardless of whether the hashmap_entry
> is standalone or in another struct, and therefore must be padded up to
> 16 bytes. If we stored "x" in that padding in the combined struct, it
> would be overwritten by our memset.
> 

The struct-packing patch was ultimately dropped because there was no way
to reliably make it work on all platforms. See [1] for discussion, [2] for
the final, 'most compatible' version.

> Am I missing anything? If this is the case, we should probably drop that
> bit from the documentation.

Hmmm. Now that we have "__attribute__((packed))" in pack-bitmap.h, perhaps
we should do the same for stuct hashmap_entry? (Which was the original
proposal anyway...). Only works for GCC, but that should cover most builds
/ platforms.

Btw.: Using struct-packing on 'struct bitmap_disk_entry' means that the
binary format of .bitmap files is incompatible between GCC and other
builds, correct?

> It's possible that we could get around it by
> embedding the hashmap_entry elements directly into the parent struct,

Already tried that, see [3].

[1] http://article.gmane.org/gmane.comp.version-control.git/239069
[2] http://article.gmane.org/gmane.comp.version-control.git/241865
[3] http://article.gmane.org/gmane.comp.version-control.git/239435

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: struct hashmap_entry packing
  2014-07-29 20:40 ` Karsten Blees
@ 2014-08-01 22:37   ` Jeff King
  2014-08-01 23:10     ` [PATCH] pack-bitmap: do not use gcc packed attribute Jeff King
  2014-08-04 19:20     ` struct hashmap_entry packing Karsten Blees
  0 siblings, 2 replies; 12+ messages in thread
From: Jeff King @ 2014-08-01 22:37 UTC (permalink / raw)
  To: Karsten Blees; +Cc: git

On Tue, Jul 29, 2014 at 10:40:12PM +0200, Karsten Blees wrote:

> > The sizeof() has to be the same regardless of whether the hashmap_entry
> > is standalone or in another struct, and therefore must be padded up to
> > 16 bytes. If we stored "x" in that padding in the combined struct, it
> > would be overwritten by our memset.
> > 
> 
> The struct-packing patch was ultimately dropped because there was no way
> to reliably make it work on all platforms. See [1] for discussion, [2] for
> the final, 'most compatible' version.

Thanks for the pointers; I should have guessed there was more to it and
searched the archive myself.

> Hmmm. Now that we have "__attribute__((packed))" in pack-bitmap.h, perhaps
> we should do the same for stuct hashmap_entry? (Which was the original
> proposal anyway...). Only works for GCC, but that should cover most builds
> / platforms.

I don't see any reason to avoid the packed attribute, if it helps us. As
you noted, anything using __attribute__ probably supports it, and if
not, we can conditionally #define PACKED_STRUCT or something, like we do
for NORETURN. Since it's purely an optimization, if another compiler
doesn't use it, no big deal.

That being said, I don't know if those padding bytes are actually
causing a measurable slowdown. It may not even be worth the trouble.

> Btw.: Using struct-packing on 'struct bitmap_disk_entry' means that the
> binary format of .bitmap files is incompatible between GCC and other
> builds, correct?

The on-disk format is defined by JGit; if there are differences between
the builds, that's a bug (and I would not be too surprised if there is
one, as bitmaps have gotten very extensive testing on 32- and 64-bit
gcc, but probably not much elsewhere).

We do use structs to represent disk structures in other bits of the
packfile code (e.g., struct pack_idx_header), but the struct is vanilla
enough that we assume every compiler gives us two tightly-packed 32-bit
integers without having to bother with the "packed" attribute (and it
seems to have worked in practice).

We should probably be more careful with that bitmap code. It looks like
it wouldn't be too bad to drop it. I'll see if I can come up with a
patch.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] pack-bitmap: do not use gcc packed attribute
  2014-08-01 22:37   ` Jeff King
@ 2014-08-01 23:10     ` Jeff King
  2014-08-01 23:12       ` Jeff King
  2014-08-04 19:19       ` Karsten Blees
  2014-08-04 19:20     ` struct hashmap_entry packing Karsten Blees
  1 sibling, 2 replies; 12+ messages in thread
From: Jeff King @ 2014-08-01 23:10 UTC (permalink / raw)
  To: Karsten Blees; +Cc: Junio C Hamano, git

On Fri, Aug 01, 2014 at 06:37:39PM -0400, Jeff King wrote:

> > Btw.: Using struct-packing on 'struct bitmap_disk_entry' means that the
> > binary format of .bitmap files is incompatible between GCC and other
> > builds, correct?
> 
> The on-disk format is defined by JGit; if there are differences between
> the builds, that's a bug (and I would not be too surprised if there is
> one, as bitmaps have gotten very extensive testing on 32- and 64-bit
> gcc, but probably not much elsewhere).
> 
> We do use structs to represent disk structures in other bits of the
> packfile code (e.g., struct pack_idx_header), but the struct is vanilla
> enough that we assume every compiler gives us two tightly-packed 32-bit
> integers without having to bother with the "packed" attribute (and it
> seems to have worked in practice).
> 
> We should probably be more careful with that bitmap code. It looks like
> it wouldn't be too bad to drop it. I'll see if I can come up with a
> patch.

I confirmed that this does break horribly without the packed attribute
(as you'd expect; it's asking for 48-bit alignment!). p5310 notices it,
_if_ you have jgit installed to check against.

Here's a fix.

-- >8 --
Subject: pack-bitmap: do not use gcc packed attribute

The "__attribute__" flag may be a noop on some compilers.
That's OK as long as the code is correct without the
attribute, but in this case it is not. We would typically
end up with a struct that is 2 bytes too long due to struct
padding, breaking both reading and writing of bitmaps.

We can work around this by using an array of unsigned char
to represent the data, and relying on get/put_be32 to handle
alignment issues as we interact with the array.

Signed-off-by: Jeff King <peff@peff.net>
---
The accessors may be overkill; each function is called only a single
time in the whole codebase. But doing it this way rather than accessing
entry[4] inline at least puts the magic constants all in one place.

 pack-bitmap-write.c | 10 ++++------
 pack-bitmap.c       | 12 ++++++------
 pack-bitmap.h       | 42 +++++++++++++++++++++++++++++++++++++-----
 3 files changed, 47 insertions(+), 17 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 5f1791a..f885a7a 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -473,7 +473,7 @@ static void write_selected_commits_v1(struct sha1file *f,
 
 	for (i = 0; i < writer.selected_nr; ++i) {
 		struct bitmapped_commit *stored = &writer.selected[i];
-		struct bitmap_disk_entry on_disk;
+		unsigned char on_disk[BITMAP_DISK_ENTRY_LEN];
 
 		int commit_pos =
 			sha1_pos(stored->commit->object.sha1, index, index_nr, sha1_access);
@@ -481,11 +481,9 @@ static void write_selected_commits_v1(struct sha1file *f,
 		if (commit_pos < 0)
 			die("BUG: trying to write commit not in index");
 
-		on_disk.object_pos = htonl(commit_pos);
-		on_disk.xor_offset = stored->xor_offset;
-		on_disk.flags = stored->flags;
-
-		sha1write(f, &on_disk, sizeof(on_disk));
+		bitmap_disk_entry_create(on_disk, commit_pos,
+					 stored->xor_offset, stored->flags);
+		sha1write(f, on_disk, sizeof(on_disk));
 		dump_bitmap(f, stored->write_as);
 	}
 }
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 91e4101..1b2a473 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -203,7 +203,7 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 
 	uint32_t i;
 	struct stored_bitmap **recent_bitmaps;
-	struct bitmap_disk_entry *entry;
+	unsigned char *entry;
 
 	recent_bitmaps = xcalloc(MAX_XOR_OFFSET, sizeof(struct stored_bitmap));
 
@@ -214,14 +214,14 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 		uint32_t commit_idx_pos;
 		const unsigned char *sha1;
 
-		entry = (struct bitmap_disk_entry *)(index->map + index->map_pos);
-		index->map_pos += sizeof(struct bitmap_disk_entry);
+		entry = index->map + index->map_pos;
+		index->map_pos += BITMAP_DISK_ENTRY_LEN;
 
-		commit_idx_pos = ntohl(entry->object_pos);
+		commit_idx_pos = bitmap_disk_entry_object_pos(entry);
 		sha1 = nth_packed_object_sha1(index->pack, commit_idx_pos);
 
-		xor_offset = (int)entry->xor_offset;
-		flags = (int)entry->flags;
+		xor_offset = (int)bitmap_disk_entry_xor_offset(entry);
+		flags = (int)bitmap_disk_entry_flags(entry);
 
 		bitmap = read_bitmap_1(index);
 		if (!bitmap)
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 8b7f4e9..0d57706 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -5,11 +5,43 @@
 #include "khash.h"
 #include "pack-objects.h"
 
-struct bitmap_disk_entry {
-	uint32_t object_pos;
-	uint8_t xor_offset;
-	uint8_t flags;
-} __attribute__((packed));
+/*
+ * This is the equivalent of:
+ *
+ *	uint32_t object_pos;
+ *	uint8_t xor_offset;
+ *	uint8_t flags;
+ *
+ * but due to the funny sizing, we cannot rely on the compiler to give us the
+ * exact struct packing we want. So let's treat it as an array and just provide
+ * a few helpers for accessing the components.
+ */
+#define BITMAP_DISK_ENTRY_LEN 6
+
+static inline void bitmap_disk_entry_create(unsigned char *on_disk,
+					    uint32_t object_pos,
+					    uint8_t xor_offset,
+					    uint8_t flags)
+{
+	put_be32(on_disk, object_pos);
+	on_disk[4] = xor_offset;
+	on_disk[5] = flags;
+}
+
+static inline uint32_t bitmap_disk_entry_object_pos(unsigned char *on_disk)
+{
+	return get_be32(on_disk);
+}
+
+static inline uint8_t bitmap_disk_entry_xor_offset(unsigned char *on_disk)
+{
+	return on_disk[4];
+}
+
+static inline uint8_t bitmap_disk_entry_flags(unsigned char *on_disk)
+{
+	return on_disk[5];
+}
 
 struct bitmap_disk_header {
 	char magic[4];
-- 
2.1.0.rc0.286.g5c67d74

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] pack-bitmap: do not use gcc packed attribute
  2014-08-01 23:10     ` [PATCH] pack-bitmap: do not use gcc packed attribute Jeff King
@ 2014-08-01 23:12       ` Jeff King
  2014-08-04 19:19       ` Karsten Blees
  1 sibling, 0 replies; 12+ messages in thread
From: Jeff King @ 2014-08-01 23:12 UTC (permalink / raw)
  To: Karsten Blees; +Cc: Junio C Hamano, git

On Fri, Aug 01, 2014 at 07:10:44PM -0400, Jeff King wrote:

> I confirmed that this does break horribly without the packed attribute
> (as you'd expect; it's asking for 48-bit alignment!). p5310 notices it,
> _if_ you have jgit installed to check against.

Er, that should be t5310, of course. p5310 is the perf test, which does
not notice the problem. :)

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] pack-bitmap: do not use gcc packed attribute
  2014-08-01 23:10     ` [PATCH] pack-bitmap: do not use gcc packed attribute Jeff King
  2014-08-01 23:12       ` Jeff King
@ 2014-08-04 19:19       ` Karsten Blees
  2014-08-05 18:38         ` Vicent Martí
  2014-08-05 18:47         ` Jeff King
  1 sibling, 2 replies; 12+ messages in thread
From: Karsten Blees @ 2014-08-04 19:19 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

Am 02.08.2014 01:10, schrieb Jeff King:
> On Fri, Aug 01, 2014 at 06:37:39PM -0400, Jeff King wrote:
> 
>>> Btw.: Using struct-packing on 'struct bitmap_disk_entry' means that the
>>> binary format of .bitmap files is incompatible between GCC and other
>>> builds, correct?
>>
>> The on-disk format is defined by JGit; if there are differences between
>> the builds, that's a bug (and I would not be too surprised if there is
>> one, as bitmaps have gotten very extensive testing on 32- and 64-bit
>> gcc, but probably not much elsewhere).
>>
>> We do use structs to represent disk structures in other bits of the
>> packfile code (e.g., struct pack_idx_header), but the struct is vanilla
>> enough that we assume every compiler gives us two tightly-packed 32-bit
>> integers without having to bother with the "packed" attribute (and it
>> seems to have worked in practice).
>>
>> We should probably be more careful with that bitmap code. It looks like
>> it wouldn't be too bad to drop it. I'll see if I can come up with a
>> patch.
> 
> I confirmed that this does break horribly without the packed attribute
> (as you'd expect; it's asking for 48-bit alignment!). p5310 notices it,
> _if_ you have jgit installed to check against.
> 
> Here's a fix.
> 
> Subject: pack-bitmap: do not use gcc packed attribute
> 
> The "__attribute__" flag may be a noop on some compilers.
> That's OK as long as the code is correct without the
> attribute, but in this case it is not. We would typically
> end up with a struct that is 2 bytes too long due to struct
> padding, breaking both reading and writing of bitmaps.
> 
> We can work around this by using an array of unsigned char
> to represent the data, and relying on get/put_be32 to handle
> alignment issues as we interact with the array.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> The accessors may be overkill; each function is called only a single
> time in the whole codebase. But doing it this way rather than accessing
> entry[4] inline at least puts the magic constants all in one place.
> 
[...]

Hmm, I wonder if it wouldn't be simpler to read / write the desired on-disk
structure directly, without copying to a uchar[6] first.

When writing, sha1write already buffers the data, so calling this with 4/1/1
bytes of payload shouldn't affect performance.

Similarly for reading - we already have a function to read a bitmap and
advance the 'file' position, why not have similar functions to read uint8/32
in a stream-based fashion?

This raises the question why we read via mmap at all (without munmap()ing the
file when done...). We copy all data into internal data structures anyway. Is
an fopen/fread-based solution (with fread_u8/_u32 helpers) that much slower?


Here's what I came up with (just a sketch, commit message is lacky and the
helper functions deserve a better place / name):

----8<-----
Subject: [PATCH] pack-bitmap: do not use packed structs to read / write bitmap
 files

Signed-off-by: Karsten Blees <blees@dcon.de>
---
 pack-bitmap-write.c | 18 +++++++++++++-----
 pack-bitmap.c       | 21 ++++++++++++++-------
 pack-bitmap.h       |  6 ------
 3 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 5f1791a..01995cb 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -465,6 +465,16 @@ static const unsigned char *sha1_access(size_t pos, void *table)
 	return index[pos]->sha1;
 }
 
+static inline void sha1write_u8(struct sha1file *f, uint8_t data)
+{
+	sha1write(f, &data, sizeof(data));
+}
+static inline void sha1write_u32(struct sha1file *f, uint32_t data)
+{
+	data = htonl(data);
+	sha1write(f, &data, sizeof(data));
+}
+
 static void write_selected_commits_v1(struct sha1file *f,
 				      struct pack_idx_entry **index,
 				      uint32_t index_nr)
@@ -473,7 +483,6 @@ static void write_selected_commits_v1(struct sha1file *f,
 
 	for (i = 0; i < writer.selected_nr; ++i) {
 		struct bitmapped_commit *stored = &writer.selected[i];
-		struct bitmap_disk_entry on_disk;
 
 		int commit_pos =
 			sha1_pos(stored->commit->object.sha1, index, index_nr, sha1_access);
@@ -481,11 +490,10 @@ static void write_selected_commits_v1(struct sha1file *f,
 		if (commit_pos < 0)
 			die("BUG: trying to write commit not in index");
 
-		on_disk.object_pos = htonl(commit_pos);
-		on_disk.xor_offset = stored->xor_offset;
-		on_disk.flags = stored->flags;
+		sha1write_u32(f, commit_pos);
+		sha1write_u8(f, stored->xor_offset);
+		sha1write_u8(f, stored->flags);
 
-		sha1write(f, &on_disk, sizeof(on_disk));
 		dump_bitmap(f, stored->write_as);
 	}
 }
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 91e4101..cb1b2dd 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -197,13 +197,23 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
 	return stored;
 }
 
+static inline uint32_t read_u32(const unsigned char *buffer, size_t *pos)
+{
+	uint32_t result = get_be32(buffer + *pos);
+	(*pos) += sizeof(result);
+	return result;
+}
+static inline uint8_t read_u8(const unsigned char *buffer, size_t *pos)
+{
+	return buffer[(*pos)++];
+}
+
 static int load_bitmap_entries_v1(struct bitmap_index *index)
 {
 	static const size_t MAX_XOR_OFFSET = 160;
 
 	uint32_t i;
 	struct stored_bitmap **recent_bitmaps;
-	struct bitmap_disk_entry *entry;
 
 	recent_bitmaps = xcalloc(MAX_XOR_OFFSET, sizeof(struct stored_bitmap));
 
@@ -214,15 +224,12 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 		uint32_t commit_idx_pos;
 		const unsigned char *sha1;
 
-		entry = (struct bitmap_disk_entry *)(index->map + index->map_pos);
-		index->map_pos += sizeof(struct bitmap_disk_entry);
+		commit_idx_pos = read_u32(index->map, &index->map_pos);
+		xor_offset = (int) read_u8(index->map, &index->map_pos);
+		flags = (int) read_u8(index->map, &index->map_pos);
 
-		commit_idx_pos = ntohl(entry->object_pos);
 		sha1 = nth_packed_object_sha1(index->pack, commit_idx_pos);
 
-		xor_offset = (int)entry->xor_offset;
-		flags = (int)entry->flags;
-
 		bitmap = read_bitmap_1(index);
 		if (!bitmap)
 			return -1;
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 8b7f4e9..487600b 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -5,12 +5,6 @@
 #include "khash.h"
 #include "pack-objects.h"
 
-struct bitmap_disk_entry {
-	uint32_t object_pos;
-	uint8_t xor_offset;
-	uint8_t flags;
-} __attribute__((packed));
-
 struct bitmap_disk_header {
 	char magic[4];
 	uint16_t version;
-- 
2.0.3.920.g16a4828.dirty

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: struct hashmap_entry packing
  2014-08-01 22:37   ` Jeff King
  2014-08-01 23:10     ` [PATCH] pack-bitmap: do not use gcc packed attribute Jeff King
@ 2014-08-04 19:20     ` Karsten Blees
  2014-08-05 18:51       ` Jeff King
  1 sibling, 1 reply; 12+ messages in thread
From: Karsten Blees @ 2014-08-04 19:20 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Am 02.08.2014 00:37, schrieb Jeff King:
> On Tue, Jul 29, 2014 at 10:40:12PM +0200, Karsten Blees wrote:
> 
>>> The sizeof() has to be the same regardless of whether the hashmap_entry
>>> is standalone or in another struct, and therefore must be padded up to
>>> 16 bytes. If we stored "x" in that padding in the combined struct, it
>>> would be overwritten by our memset.
>>>
>>
>> The struct-packing patch was ultimately dropped because there was no way
>> to reliably make it work on all platforms. See [1] for discussion, [2] for
>> the final, 'most compatible' version.
> 
> Thanks for the pointers; I should have guessed there was more to it and
> searched the archive myself.
> 
>> Hmmm. Now that we have "__attribute__((packed))" in pack-bitmap.h, perhaps
>> we should do the same for stuct hashmap_entry? (Which was the original
>> proposal anyway...). Only works for GCC, but that should cover most builds
>> / platforms.
> 
> I don't see any reason to avoid the packed attribute, if it helps us. As
> you noted, anything using __attribute__ probably supports it, and if
> not, we can conditionally #define PACKED_STRUCT or something, like we do
> for NORETURN. Since it's purely an optimization, if another compiler
> doesn't use it, no big deal.
> 
> That being said, I don't know if those padding bytes are actually
> causing a measurable slowdown. It may not even be worth the trouble.
> 

Its not about performance (or correctness, in case of platforms that don't
support unaligned read), just about saving memory (e.g. mapping int to int
requires 24 bytes per entry, vs. 16 with packed structs).

The padding at the end of a structure is only needed for proper alignment in
arrays. Struct hashmap_entry is always used as prefix of some other structure,
never as an array, so there are no alignment issues here.

Typical memory layouts on 64-bit platforms are as follows (note that even in
the 'followed by int64' case, all members are properly aligned):


Unpacked struct followed by int32 - wastes 1/3 of memory:

      struct {
        struct hashmap_entry {
00-07     struct hashmap_entry *next;
08-11     int hash;
12-15     // padding
        } ent;
16-19   int32_t value;
20-23   // padding
      }


Packed struct followed by int32:

      struct {
        struct hashmap_entry {
00-07     struct hashmap_entry *next;
08-11     int hash;
        } ent;
12-15   int32_t value;
      }


Packed struct followed by int64:

      struct {
        struct hashmap_entry {
00-07     struct hashmap_entry *next;
08-11     int hash;
        } ent;
12-15   // padding
16-23   int64_t value;
      }


Array of packed struct (not used):

      struct hashmap_entry {
00-07   struct hashmap_entry *next;
08-11   int hash;
      }; // [0]
      struct hashmap_entry {
12-19   struct hashmap_entry *next; // !!!unaligned!!!
20-23   int hash;
      }; // [1]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] pack-bitmap: do not use gcc packed attribute
  2014-08-04 19:19       ` Karsten Blees
@ 2014-08-05 18:38         ` Vicent Martí
  2014-08-05 18:47         ` Jeff King
  1 sibling, 0 replies; 12+ messages in thread
From: Vicent Martí @ 2014-08-05 18:38 UTC (permalink / raw)
  To: Karsten Blees; +Cc: Jeff King, Junio C Hamano, git

On Mon, Aug 4, 2014 at 9:19 PM, Karsten Blees <karsten.blees@gmail.com> wrote:
> This raises the question why we read via mmap at all

The first version of the pack bitmap format I wrote for GitHub was 50%
faster to load than this one because it was designed to be mmapable.
Eventually we moved to the JGit-compatible bitmap format (because I
get paid a lot of money to do as I'm told -- not because of some
inherent benefit of the JGit format), which needs to be read
sequentially, but I never bothered to change the mmap reading code.

I believe your patch makes a lot of sense -- at this point we could as
well remove the mmaping altogether and read the file sequentially.

Cheers,
vmg

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] pack-bitmap: do not use gcc packed attribute
  2014-08-04 19:19       ` Karsten Blees
  2014-08-05 18:38         ` Vicent Martí
@ 2014-08-05 18:47         ` Jeff King
  2014-08-06 18:58           ` Karsten Blees
  1 sibling, 1 reply; 12+ messages in thread
From: Jeff King @ 2014-08-05 18:47 UTC (permalink / raw)
  To: Karsten Blees; +Cc: Junio C Hamano, git

On Mon, Aug 04, 2014 at 09:19:46PM +0200, Karsten Blees wrote:

> Hmm, I wonder if it wouldn't be simpler to read / write the desired on-disk
> structure directly, without copying to a uchar[6] first.

Probably. My initial attempt was to keep together the read/write logic
about which sizes each item is, but I think the result ended up
unnecessarily verbose and harder to follow.

> Here's what I came up with (just a sketch, commit message is lacky and the
> helper functions deserve a better place / name):

I like it. Want to clean it up and submit in place of mine?

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: struct hashmap_entry packing
  2014-08-04 19:20     ` struct hashmap_entry packing Karsten Blees
@ 2014-08-05 18:51       ` Jeff King
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff King @ 2014-08-05 18:51 UTC (permalink / raw)
  To: Karsten Blees; +Cc: git

On Mon, Aug 04, 2014 at 09:20:02PM +0200, Karsten Blees wrote:

> > I don't see any reason to avoid the packed attribute, if it helps us. As
> > you noted, anything using __attribute__ probably supports it, and if
> > not, we can conditionally #define PACKED_STRUCT or something, like we do
> > for NORETURN. Since it's purely an optimization, if another compiler
> > doesn't use it, no big deal.
> > 
> > That being said, I don't know if those padding bytes are actually
> > causing a measurable slowdown. It may not even be worth the trouble.
> > 
> 
> Its not about performance (or correctness, in case of platforms that don't
> support unaligned read), just about saving memory (e.g. mapping int to int
> requires 24 bytes per entry, vs. 16 with packed structs).

The biggest things we might map are probably one entry per-object. So in
a repository like linux.git, we're talking about 32MB in the worst case.
That's not nothing, but it's also not the end of the world. I'd be more
concerned with how that trashes the cache (and consequently causes
slowdown) than somebody running out of memory.

So my general opinion is that if it's easy to get the space back, great.
But if it creates a maintenance hassle, it's not worth the effort.

That said, I really don't think it would be much maintenance hassle to
mark the hashmap_entry as packed, and compilers can either handle it or
not.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] pack-bitmap: do not use gcc packed attribute
  2014-08-05 18:47         ` Jeff King
@ 2014-08-06 18:58           ` Karsten Blees
  2014-08-06 19:32             ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Karsten Blees @ 2014-08-06 18:58 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

Am 05.08.2014 20:47, schrieb Jeff King:
> On Mon, Aug 04, 2014 at 09:19:46PM +0200, Karsten Blees wrote:
> 
>> Hmm, I wonder if it wouldn't be simpler to read / write the desired on-disk
>> structure directly, without copying to a uchar[6] first.
> 
> Probably. My initial attempt was to keep together the read/write logic
> about which sizes each item is, but I think the result ended up
> unnecessarily verbose and harder to follow.
> 

Yeah, having read / write logic in different files is confusing, esp. when
not using structs to define the file format.

>> Here's what I came up with (just a sketch, commit message is lacky and the
>> helper functions deserve a better place / name):
> 
> I like it. Want to clean it up and submit in place of mine?
> 

Will do, but it will have to wait till next week.

> -Peff
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] pack-bitmap: do not use gcc packed attribute
  2014-08-06 18:58           ` Karsten Blees
@ 2014-08-06 19:32             ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2014-08-06 19:32 UTC (permalink / raw)
  To: Karsten Blees; +Cc: Jeff King, git

Karsten Blees <karsten.blees@gmail.com> writes:

> Am 05.08.2014 20:47, schrieb Jeff King:
>> On Mon, Aug 04, 2014 at 09:19:46PM +0200, Karsten Blees wrote:
>> 
>>> Hmm, I wonder if it wouldn't be simpler to read / write the desired on-disk
>>> structure directly, without copying to a uchar[6] first.
>> 
>> Probably. My initial attempt was to keep together the read/write logic
>> about which sizes each item is, but I think the result ended up
>> unnecessarily verbose and harder to follow.
>> 
>
> Yeah, having read / write logic in different files is confusing, esp. when
> not using structs to define the file format.
>
>>> Here's what I came up with (just a sketch, commit message is lacky and the
>>> helper functions deserve a better place / name):
>> 
>> I like it. Want to clean it up and submit in place of mine?
>> 
>
> Will do, but it will have to wait till next week.

Thanks, both.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-08-06 19:32 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-28 17:17 struct hashmap_entry packing Jeff King
2014-07-29 20:40 ` Karsten Blees
2014-08-01 22:37   ` Jeff King
2014-08-01 23:10     ` [PATCH] pack-bitmap: do not use gcc packed attribute Jeff King
2014-08-01 23:12       ` Jeff King
2014-08-04 19:19       ` Karsten Blees
2014-08-05 18:38         ` Vicent Martí
2014-08-05 18:47         ` Jeff King
2014-08-06 18:58           ` Karsten Blees
2014-08-06 19:32             ` Junio C Hamano
2014-08-04 19:20     ` struct hashmap_entry packing Karsten Blees
2014-08-05 18:51       ` Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.