All of lore.kernel.org
 help / color / mirror / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: Eric Wong <e@80x24.org>
Cc: Andrzej Hunt <andrzej@ahunt.org>,
	git@vger.kernel.org, Jeff King <peff@peff.net>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v2 5/5] oidtree: a crit-bit tree for odb_loose_cache
Date: Tue, 10 Aug 2021 21:40:45 +0200	[thread overview]
Message-ID: <d7e11b65-cd20-6a0f-7440-6003578aa680@web.de> (raw)
In-Reply-To: <20210807224957.GA5068@dcvr>

Am 08.08.21 um 00:49 schrieb Eric Wong:
> René Scharfe <l.s.r@web.de> wrote:
>> Am 06.08.21 um 17:31 schrieb Andrzej Hunt:
>>> On 29/06/2021 22:53, Eric Wong wrote:
>>>> [...snip...]
>>>> diff --git a/oidtree.c b/oidtree.c
>>>> new file mode 100644
>>>> index 0000000000..c1188d8f48
>>>> --- /dev/null
>>>> +++ b/oidtree.c
>
>>>> +struct oidtree_node {
>>>> +    /* n.k[] is used to store "struct object_id" */
>>>> +    struct cb_node n;
>>>> +};
>>>> +
>>>> [... snip ...]
>>>> +
>>>> +void oidtree_insert(struct oidtree *ot, const struct object_id *oid)
>>>> +{
>>>> +    struct oidtree_node *on;
>>>> +
>>>> +    if (!ot->mempool)
>>>> +        ot->mempool = allocate_alloc_state();
>>>> +    if (!oid->algo)
>>>> +        BUG("oidtree_insert requires oid->algo");
>>>> +
>>>> +    on = alloc_from_state(ot->mempool, sizeof(*on) + sizeof(*oid));
>>>> +    oidcpy_with_padding((struct object_id *)on->n.k, oid);
>>>
>>> I think this object_id cast introduced undefined behaviour - here's
>>> my layperson's interepretation of what's going on (full UBSAN output
>>> is pasted below):
>>>
>>> cb_node.k is a uint8_t[], and hence can be 1-byte aligned (on my
>>> machine: offsetof(struct cb_node, k) == 21). We're casting its
>>> pointer to "struct object_id *", and later try to access
>>> object_id.hash within oidcpy_with_padding. My compiler assumes that
>>> an object_id pointer needs to be 4-byte aligned, and reading from a
>>> misaligned pointer means we hit undefined behaviour. (I think the
>>> 4-byte alignment requirement comes from the fact that object_id's
>>> largest member is an int?)
>
> I seem to recall struct alignment requirements being
> architecture-dependent; and x86/x86-64 are the most liberal
> w.r.t alignment requirements.
>
>>> I'm not sure what an elegant and idiomatic fix might be - IIUC it's
>>> hard to guarantee misaligned access can't happen with a flex array
>>> that's being used for arbitrary data (you would presumably have to
>>> declare it as an array of whatever the largest supported type is, so
>>> that you can guarantee correct alignment even when cbtree is used
>>> with that type) - which might imply that k needs to be declared as a
>>> void pointer? That in turn would make cbtree.c harder to read.
>>
>> C11 has alignas.  We could also make the member before the flex array,
>> otherbits, wider, e.g. promote it to uint32_t.
>
> Ugh, no.  cb_node should be as small as possible and (for our
> current purposes) ->byte could be uint8_t.

Well, we can make both byte and otherbits uint16_t.  That would require
a good comment explaining the reasoning and probably some rework later,
but might be the least intrusive solution for now.

>> A more parsimonious solution would be to turn the int member of struct
>> object_id, algo, into an unsigned char for now and reconsider the issue
>> once we support our 200th algorithm or so.
>
> Yes, making struct object_id smaller would benefit all git users
> (at least for the next few centuries :P).

True, we're currently using 4 bytes to distinguish between SHA-1 and
SHA-256, i.e. to represent a single bit.  Reducing the size of struct
object_id from 36 bytes to 33 bytes seems quite significant.

I don't know how important the 4-byte alignment is, though.  cf0983213c
(hash: add an algo member to struct object_id, 2021-04-26) doesn't
mention it, but the notes code seems to rely on it -- strange.

Overall this seems to be a good way to go -- after the next release.

René

  parent reply	other threads:[~2021-08-10 19:41 UTC|newest]

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-24  0:58 [PATCH] speed up alt_odb_usable() with many alternates Eric Wong
2021-06-27  2:47 ` [PATCH 0/5] optimizations for many odb alternates Eric Wong
2021-06-27  2:47   ` [PATCH 1/5] speed up alt_odb_usable() with many alternates Eric Wong
2021-06-27  2:47   ` [PATCH 2/5] avoid strlen via strbuf_addstr in link_alt_odb_entry Eric Wong
2021-06-27  2:47   ` [PATCH 3/5] make object_directory.loose_objects_subdir_seen a bitmap Eric Wong
2021-06-27 10:23     ` René Scharfe
2021-06-28 23:09       ` Eric Wong
2021-06-27  2:47   ` [PATCH 4/5] oidcpy_with_padding: constify `src' arg Eric Wong
2021-06-27  2:47   ` [PATCH 5/5] oidtree: a crit-bit tree for odb_loose_cache Eric Wong
2021-06-29 14:42     ` Junio C Hamano
2021-06-29 20:17       ` Eric Wong
2021-06-29 20:53   ` [PATCH v2 0/5] optimizations for many alternates Eric Wong
2021-07-07 23:10     ` [PATCH v3 " Eric Wong
2021-07-07 23:10     ` [PATCH v3 1/5] speed up alt_odb_usable() with " Eric Wong
2021-07-08  0:20       ` Junio C Hamano
2021-07-08  1:14         ` Eric Wong
2021-07-08  4:30           ` Junio C Hamano
2021-07-07 23:10     ` [PATCH v3 2/5] avoid strlen via strbuf_addstr in link_alt_odb_entry Eric Wong
2021-07-08  4:57       ` Junio C Hamano
2021-07-07 23:10     ` [PATCH v3 3/5] make object_directory.loose_objects_subdir_seen a bitmap Eric Wong
2021-07-07 23:10     ` [PATCH v3 4/5] oidcpy_with_padding: constify `src' arg Eric Wong
2021-07-07 23:10     ` [PATCH v3 5/5] oidtree: a crit-bit tree for odb_loose_cache Eric Wong
2021-06-29 20:53   ` [PATCH v2 1/5] speed up alt_odb_usable() with many alternates Eric Wong
2021-07-03 10:05     ` René Scharfe
2021-07-04  9:02       ` René Scharfe
2021-07-06 23:01       ` Eric Wong
2021-06-29 20:53   ` [PATCH v2 2/5] avoid strlen via strbuf_addstr in link_alt_odb_entry Eric Wong
2021-06-29 20:53   ` [PATCH v2 3/5] make object_directory.loose_objects_subdir_seen a bitmap Eric Wong
2021-06-29 20:53   ` [PATCH v2 4/5] oidcpy_with_padding: constify `src' arg Eric Wong
2021-06-29 20:53   ` [PATCH v2 5/5] oidtree: a crit-bit tree for odb_loose_cache Eric Wong
2021-07-04  9:02     ` René Scharfe
2021-07-06 23:21       ` Eric Wong
2021-07-04  9:32     ` Ævar Arnfjörð Bjarmason
2021-07-07 23:12       ` Eric Wong
2021-08-06 15:31     ` Andrzej Hunt
2021-08-06 17:53       ` René Scharfe
2021-08-07 22:49         ` Eric Wong
2021-08-09  1:35           ` Carlo Arenas
2021-08-09  1:38             ` [PATCH/RFC 0/3] pedantic errors in next Carlo Marcelo Arenas Belón
2021-08-09  1:38               ` [PATCH/RFC 1/3] oidtree: avoid nested struct oidtree_node Carlo Marcelo Arenas Belón
2021-08-09  1:38               ` [PATCH/RFC 2/3] object-store: avoid extra ';' from KHASH_INIT Carlo Marcelo Arenas Belón
2021-08-09 15:53                 ` Junio C Hamano
2021-08-09  1:38               ` [PATCH/RFC 3/3] ci: run a pedantic build as part of the GitHub workflow Carlo Marcelo Arenas Belón
2021-08-09 10:50                 ` Bagas Sanjaya
2021-08-09 22:03                   ` Carlo Arenas
2021-08-09 14:56                 ` Phillip Wood
2021-08-09 22:48                   ` Carlo Arenas
2021-08-10 15:24                     ` Phillip Wood
2021-08-10 18:25                       ` Junio C Hamano
2021-08-30 11:36                   ` Ævar Arnfjörð Bjarmason
2021-08-31 20:28                     ` Carlo Arenas
2021-08-31 20:51                       ` Ævar Arnfjörð Bjarmason
2021-08-31 23:54                         ` Carlo Arenas
2021-09-01  1:52                           ` Jeff King
2021-09-01 17:55                             ` Junio C Hamano
2021-08-30 11:40                 ` Ævar Arnfjörð Bjarmason
2021-09-01  9:19                 ` [RFC PATCH v2 0/4] developer: support pedantic Carlo Marcelo Arenas Belón
2021-09-01  9:19                   ` [RFC PATCH v2 1/4] developer: retire USE_PARENS_AROUND_GETTEXT_N support Carlo Marcelo Arenas Belón
2021-09-01  9:19                   ` [RFC PATCH v2 2/4] developer: enable pedantic by default Carlo Marcelo Arenas Belón
2021-09-01  9:19                   ` [RFC PATCH v2 3/4] developer: add an alternative script for detecting broken N_() Carlo Marcelo Arenas Belón
2021-09-01  9:19                   ` [RFC PATCH v2 4/4] developer: move detect-compiler out of the main directory Carlo Marcelo Arenas Belón
2021-09-01 10:10                   ` [RFC PATCH v2 0/4] developer: support pedantic Jeff King
2021-09-01 11:25                     ` [PATCH] gettext: remove optional non-standard parens in N_() definition Ævar Arnfjörð Bjarmason
2021-09-01 17:31                       ` Eric Sunshine
2021-09-02  9:13                       ` Jeff King
2021-09-02 19:19                       ` Junio C Hamano
2021-09-01 11:27                     ` [RFC PATCH v2 0/4] developer: support pedantic Ævar Arnfjörð Bjarmason
2021-09-01 18:03                       ` Carlo Arenas
2021-09-03 17:02                   ` [PATCH v3 0/3] support pedantic in developer mode Carlo Marcelo Arenas Belón
2021-09-03 17:02                     ` [PATCH v3 1/3] gettext: remove optional non-standard parens in N_() definition Carlo Marcelo Arenas Belón
2021-09-10 15:39                       ` Ævar Arnfjörð Bjarmason
2021-09-03 17:02                     ` [PATCH v3 2/3] win32: allow building with pedantic mode enabled Carlo Marcelo Arenas Belón
2021-09-03 18:47                       ` René Scharfe
2021-09-03 20:13                         ` Carlo Marcelo Arenas Belón
2021-09-03 20:32                           ` Junio C Hamano
2021-09-03 20:38                           ` René Scharfe
2021-09-04  9:37                             ` René Scharfe
2021-09-04 14:42                               ` Carlo Arenas
2021-09-27 23:04                       ` Jonathan Tan
2021-09-28  0:30                         ` Carlo Arenas
2021-09-28 16:50                           ` Jonathan Tan
2021-09-28 17:37                           ` Junio C Hamano
2021-09-28 20:16                             ` Jonathan Tan
2021-09-29  1:00                               ` Carlo Arenas
2021-09-29 15:55                                 ` Junio C Hamano
2021-09-03 17:02                     ` [PATCH v3 3/3] developer: enable pedantic by default Carlo Marcelo Arenas Belón
2021-09-05  7:54                     ` [PATCH v3 0/3] support pedantic in developer mode Ævar Arnfjörð Bjarmason
2021-08-09 16:44               ` [PATCH/RFC 0/3] pedantic errors in next Junio C Hamano
2021-08-09 20:10                 ` Eric Wong
2021-08-10  6:16                 ` Carlo Marcelo Arenas Belón
2021-08-10 19:30                   ` René Scharfe
2021-08-10 23:49                     ` Carlo Arenas
2021-08-11  0:57                       ` Carlo Arenas
2021-08-11 14:57                       ` René Scharfe
2021-08-11 17:20                         ` Junio C Hamano
2021-08-10 18:59             ` [PATCH v2 5/5] oidtree: a crit-bit tree for odb_loose_cache René Scharfe
2021-08-10 19:40           ` René Scharfe [this message]
2021-08-14 20:00       ` [PATCH] oidtree: avoid unaligned access to crit-bit tree René Scharfe
2021-08-16 19:11         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d7e11b65-cd20-6a0f-7440-6003578aa680@web.de \
    --to=l.s.r@web.de \
    --cc=andrzej@ahunt.org \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.