From: Phillip Wood <phillip.wood123@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
Jeff King <peff@peff.net>
Subject: Re: [PATCH 7/7] xdiff: remove xdl_free(), use free() instead
Date: Wed, 13 Jul 2022 14:00:07 +0100 [thread overview]
Message-ID: <bbfa1340-9cab-35d7-2245-f6f8369d5cd4@gmail.com> (raw)
In-Reply-To: <220711.865yk47x54.gmgdl@evledraar.gmail.com>
On 11/07/2022 11:02, Ævar Arnfjörð Bjarmason wrote:
>
> On Mon, Jul 11 2022, Phillip Wood wrote:
>
>> Hi Ævar
>>
>> On 08/07/2022 22:26, Ævar Arnfjörð Bjarmason wrote:
>>> On Fri, Jul 08 2022, Phillip Wood wrote:
>>>
>>>> Hi Ævar
>>>>
>>>> On 08/07/2022 15:20, Ævar Arnfjörð Bjarmason wrote:
>>>>> Remove the xdl_free() wrapper macro in favor of using free()
>>>>> directly. The wrapper macro was brought in with the initial import of
>>>>> xdiff/ in 3443546f6ef (Use a *real* built-in diff generator,
>>>>> 2006-03-24).
>>>>> As subsequent discussions on the topic[1] have made clear there's no
>>>>> reason to use this wrapper.
>>>>
>>>> That link is to a message where you assert that there is no reason for
>>>> the wrapper, you seem to be the only person in that thread making that
>>>> argument. The libgit2 maintainer and others are arguing that it is
>>>> worth keeping to make it easy to swap the allocator.
>>> Arguing that it's not needed for a technical reason, but an
>>> "aesthetic
>>> and ergonomic one", per:
>>> https://lore.kernel.org/git/20220217225804.GC7@edef91d97c94/;
>>> Perhaps I misread it, but I took this as Junio concurring with what
>>> I
>>> was pointing out there:
>>> https://lore.kernel.org/git/xmqqfsohbdre.fsf@gitster.g/
>>>
>>>>> Both git itself as well as any external
>>>>> users such as libgit2 compile the xdiff/* code as part of their own
>>>>> compilation, and can thus find the right malloc() and free() at
>>>>> link-time.
>>>>
>>>> I'm struggling to see how that is simpler than the current situation
>>>> with xdl_malloc().
>>> It's simpler because when maintaining that code in git.git it's less
>>> of
>>> a special case, e.g. we have coccinelle rules that match free(), now
>>> every such rule needs to account for the xdiff special-case.
>>> And it's less buggy because if you're relying on us always using a
>>> wrapper bugs will creep in, current master has this:
>>>
>>> $ git -P grep '\bfree\(' -- xdiff
>>> xdiff/xdiff.h:#define xdl_free(ptr) free(ptr)
>>> xdiff/xmerge.c: free(c);
>>> xdiff/xmerge.c: free(next_m);
>>>
>>>> Perhaps you could show how you think libgit2 could
>>>> easily make xdiff use git__malloc() instead of malloc() without
>>>> changing the malloc() calls (the message you linked to essentially
>>>> suggests they do a search and replace). If people cannot swap in
>>>> another allocator without changing the code then you are imposing a
>>>> barrier to them contributing patches back to git's xdiff.
>>> I was suggesting that if libgit2 wanted to maintain a copy of xdiff
>>> that
>>> catered to the asthetic desires of the maintainer adjusting whatever
>>> import script you use to apply a trivial coccinelle transformation would
>>> give you what you wanted. Except without a maintenance burden on
>>> git.git, or the bugs you'd get now since you're not catching those two
>>> free() cases (or any future such cases).
>>> But just having the code use malloc() and free() directly and
>>> getting
>>> you what you get now is easy, and doesn't require any such
>>> search-replacement.
>>> Here's how:
>>> I imported the xdiff/*.[ch] files at the tip of my branch into
>>> current
>>> libgit2.git's src/libgit2/xdiff/, and then applied this on top, which is
>>> partially re-applying libgit2's own monkeypatches, and partially
>>> adjusting for upstream changes (including this one):
>>>
>>> diff --git a/src/libgit2/xdiff/git-xdiff.h b/src/libgit2/xdiff/git-xdiff.h
>>> index b75dba819..2e00764d4 100644
>>> --- a/src/libgit2/xdiff/git-xdiff.h
>>> +++ b/src/libgit2/xdiff/git-xdiff.h
>>> @@ -14,6 +14,7 @@
>>> #ifndef INCLUDE_git_xdiff_h__
>>> #define INCLUDE_git_xdiff_h__
>>>
>>> +#include <regex.h>
>>> #include "regexp.h"
>>>
>>> /* Work around C90-conformance issues */
>>> @@ -27,11 +28,10 @@
>>> # endif
>>> #endif
>>>
>>> -#define xdl_malloc(x) git__malloc(x)
>>> -#define xdl_free(ptr) git__free(ptr)
>>> -#define xdl_realloc(ptr, x) git__realloc(ptr, x)
>>> +#define malloc(x) git__malloc(x)
>>> +#define free(ptr) git__free(ptr)
>>>
>>> -#define XDL_BUG(msg) GIT_ASSERT(msg)
>>> +#define BUG(msg) GIT_ASSERT(msg)
>>>
>>> #define xdl_regex_t git_regexp
>>> #define xdl_regmatch_t git_regmatch
>>> @@ -50,4 +50,17 @@ GIT_INLINE(int) xdl_regexec_buf(
>>> return -1;
>>> }
>>>
>>> +static inline size_t st_mult(size_t a, size_t b)
>>> +{
>>> + return a * b;
>>> +}
>>> +
>>> +static inline int regexec_buf(const regex_t *preg, const char *buf, size_t size,
>>> + size_t nmatch, regmatch_t pmatch[], int eflags)
>>> +{
>>> + assert(nmatch > 0 && pmatch);
>>> + pmatch[0].rm_so = 0;
>>> + pmatch[0].rm_eo = size;
>>> + return regexec(preg, buf, nmatch, pmatch, eflags | REG_STARTEND);
>>> +}
>>> #endif
>>> diff --git a/src/libgit2/xdiff/xdiff.h b/src/libgit2/xdiff/xdiff.h
>>> index a37d89fcd..5e5b525c2 100644
>>> --- a/src/libgit2/xdiff/xdiff.h
>>> +++ b/src/libgit2/xdiff/xdiff.h
>>> @@ -23,6 +23,8 @@
>>> #if !defined(XDIFF_H)
>>> #define XDIFF_H
>>>
>>> +#include "git-xdiff.h"
>>> +
>>> #ifdef __cplusplus
>>> extern "C" {
>>> #endif /* #ifdef __cplusplus */
>>> diff --git a/src/libgit2/xdiff/xinclude.h b/src/libgit2/xdiff/xinclude.h
>>> index a4285ac0e..79812ad8a 100644
>>> --- a/src/libgit2/xdiff/xinclude.h
>>> +++ b/src/libgit2/xdiff/xinclude.h
>>> @@ -23,7 +23,8 @@
>>> #if !defined(XINCLUDE_H)
>>> #define XINCLUDE_H
>>>
>>> -#include "git-compat-util.h"
>>> +#include "git-xdiff.h"
>>> +#include "git-shared-util.h"
>>> #include "xmacros.h"
>>> #include "xdiff.h"
>>> #include "xtypes.h"
>>> If you then build it and run e.g.:
>>> gdb -args ./libgit2_tests -smerge::files
>>> You'll get stack traces like this if you break on stdalloc__malloc
>>> (which it uses for me in its default configuration):
>>>
>>> (gdb) bt
>>> #0 stdalloc__malloc (len=144, file=0x87478d "/home/avar/g/libgit2/src/libgit2/xdiff/xutils.c", line=101) at /home/avar/g/libgit2/src/util/allocators/stdalloc.c:14
>>> #1 0x00000000006ec15c in xdl_cha_alloc (cha=0x7fffffffcaa8) at /home/avar/g/libgit2/src/libgit2/xdiff/xutils.c:101
>>> #2 0x00000000006eaee9 in xdl_prepare_ctx (pass=1, mf=0x7fffffffcc98, narec=13, xpp=0x7fffffffcca8, cf=0x7fffffffc7d0, xdf=0x7fffffffcaa8)
>>> at /home/avar/g/libgit2/src/libgit2/xdiff/xprepare.c:194
>>> IOW this was all seamlessly routed to your git__malloc() without us
>>> having to maintain an xdl_{malloc,free}().
>>
>> Thanks for showing this, it is really helpful to see a concrete
>> example. I was especially interested to see how you went about the
>> conversion without redefining standard library functions or
>> introducing non-namespaced identifiers in files that included
>> xdiff.h. Unfortunately you have not done that and so I don't think the
>> approach above is viable for a well behaved library.
>
> Why? Because there's some header where doing e.g.:
>
> #include "git2/something.h"
>
> Will directly include git-xdiff.h by proxy into the user's program?
No because you're redefining malloc() and introducing ALLOC_GROW() etc
in
src/libgit2/{Blame_git.c,Diff_xdiff.c,Merge_file.c,Patch_generate.c,Checkout.c}
and
Test/libgit2/merge/files.c. It is not expected that including xdiff.h
will change a bunch of symbols in the file it is included in.
Best Wishes
Phillip
>
> I admit I didn't check that, and assumed that these were only included
> by other *.c files in libgit2 itself. I.e. it would compile xdiff for
> its own use, but then expose its own API.
>
> Anyway, if it is directly included in some user-exposed *.h file then
> yes, you don't want to overwrite their "malloc", but that's a small
> matter of doing an "#undef malloc" at the end (which as the below
> linked-to patch shows we'd support by having macros like
> SHA1DC_CUSTOM_TRAILING_INCLUDE_UBC_CHECK_C).
>
> Aside from what I'm proposing here doing such "undef at the end" seems
> like a good idea in any case, because if there is any macro leakage here
> you're e.g. re-defining "XDL_BUG" and other things that aren't clearly
> in the libgit2 namespace now, no?
>
>>> Now, I think that's obviously worth adjusting, e.g. the series I've got
>>> here could make this easier for libgit2 by including st_mult() at least,
>>> I'm not sure what you want to do about regexec_buf().
>>> For the monkeypatching you do now of creating a "git-xdiff.h" I
>>> think it
>>> would be very good to get a patch like what I managed to get
>>> sha1collisiondetection.git to accept for our own use-case, which allows
>>> us to use their upstream code as-is from a submodule:
>>> https://github.com/cr-marcstevens/sha1collisiondetection/commit/b45fcef
>>
>> Thanks for the link, That's a really good example where all the
>> identifiers are namespaced and the public include file does not
>> pollute the code that is including it. If you come up with something
>> like that where the client code can set up same #defines for malloc,
>> calloc, realloc and free that would be a good way forward.
>
> I don't need to come up with that, you've got the linker to do that.
>
> I.e. for almost any "normal" use of xdiff you'd simply compile it with
> its reference to malloc(), and then you either link that library that
> already links to libc into your program.
>
> So if you use a custom malloc the xdiff code might still use libc
> malloc, or you'd link the two together and link the resulting program
> with your own custom malloc.
>
> As explained in the previous & linked-to threads this is how almost
> everyone would include & use such a library, and indeed that's how git
> itself can use malloc() and free() in its sources, but have options to
> link to libc malloc, nedmalloc etc.
>
> But instead of using the linker to resolve "malloc" to git__malloc you'd
> like to effectively include the upstream *.[ch] files directly, and
> pretend as though the upstream source used git__malloc() or whatever to
> begin with.
>
> I don't really understand why you can't just do that at the linker
> level, especially as doing so guards you better against namespace
> pollution. But whatever, the suggestion(s) above assume you've got a
> good reason, but show that we don't need to prefix every standard
> function just to accommodate that.
>
> Instead we can just provide a knob to include a header of yours after
> all others have been included (which the libgit2 monkeypatches to xdiff
> already include), and have that header define "malloc" as "git__malloc",
> "BUG" as "GIT_ASSERT" etc.
>
> And yes, if you're in turn providing an interface where others will then
> include your header that's doing that you'll need to apply some
> namespacing paranoia, namely to "#undef malloc" etc.
>
> But none of that requires git to carry prefixed versions of standard
> functions you'd wish to replace, which the patches here show.
>
>> I do not think we should require projects to be defining there own
>> versions of [C]ALLOC_*() and BUG() just to use xdiff.
>
> No, I don't think so either. I.e. the idea with these patches is that we
> could bundle up xdiff/* and git-shared-util.h into one distributed
> libgit, which down the road we could even roll independent releases for
> (as upstream seems to have forever gone away).
>
> Whereas the proposals coming out of libgit2[1] for generalizing xdiff/
> for general use seem to stop just short of the specific hacks we need to
> get it running for libgit2, but e.g. leave "xdl_malloc" defined as
> "xmalloc".
>
> That isn't a standard library function, and therefore the first thing
> you'd need to do when using it as a library is to find out how git.git
> uses that, and copy/paste it or define your own replacement.
>
> Whereas I think it should be the other way around, we should e.g. ship a
> shimmy BUG() that calls fprintf(stderr, ...) and abort(), but for
> advanced users such as libgit2 guard those with "ifndef" or whatever, so
> you can have it call GIT_ASSERT() and the like instead.
>
> 1. https://lore.kernel.org/git/20220209013354.GB7@abe733c6e288/
next prev parent reply other threads:[~2022-07-13 13:00 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-29 15:25 [PATCH 0/3] xdiff: introduce memory allocation macros Phillip Wood via GitGitGadget
2022-06-29 15:25 ` [PATCH 1/3] xdiff: introduce XDL_ALLOC_ARRAY() Phillip Wood via GitGitGadget
2022-06-29 15:25 ` [PATCH 2/3] xdiff: introduce XDL_CALLOC_ARRAY() Phillip Wood via GitGitGadget
2022-06-30 18:17 ` Junio C Hamano
2022-07-06 13:17 ` Phillip Wood
2022-06-29 15:25 ` [PATCH 3/3] xdiff: introduce XDL_ALLOC_GROW() Phillip Wood via GitGitGadget
2022-06-30 10:54 ` Ævar Arnfjörð Bjarmason
2022-06-30 12:03 ` Phillip Wood
2022-06-30 12:38 ` Phillip Wood
2022-06-30 13:25 ` Ævar Arnfjörð Bjarmason
2022-07-06 13:23 ` Phillip Wood
2022-07-07 11:17 ` Ævar Arnfjörð Bjarmason
2022-07-08 9:35 ` Phillip Wood
2022-07-08 14:20 ` [PATCH 0/7] xdiff: use standard alloc macros, share them via git-shared-util.h Ævar Arnfjörð Bjarmason
2022-07-08 14:20 ` [PATCH 1/7] xdiff: simplify freeing patterns around xdl_free_env() Ævar Arnfjörð Bjarmason
2022-07-08 14:20 ` [PATCH 2/7] git-shared-util.h: move "shared" allocation utilities here Ævar Arnfjörð Bjarmason
2022-07-08 14:20 ` [PATCH 3/7] git-shared-util.h: add G*() versions of *ALLOC_*() Ævar Arnfjörð Bjarmason
2022-07-11 10:06 ` Phillip Wood
2022-07-08 14:20 ` [PATCH 4/7] xdiff: use G[C]ALLOC_ARRAY(), not XDL_CALLOC_ARRAY() Ævar Arnfjörð Bjarmason
2022-07-11 10:10 ` Phillip Wood
2022-07-08 14:20 ` [PATCH 5/7] xdiff: use GALLOC_GROW(), not XDL_ALLOC_GROW() Ævar Arnfjörð Bjarmason
2022-07-11 10:13 ` Phillip Wood
2022-07-11 10:48 ` Ævar Arnfjörð Bjarmason
2022-07-13 9:09 ` Phillip Wood
2022-07-13 10:48 ` Ævar Arnfjörð Bjarmason
2022-07-13 13:21 ` Phillip Wood
2022-07-08 14:20 ` [PATCH 6/7] xdiff: remove xdl_malloc() wrapper, use malloc(), not xmalloc() Ævar Arnfjörð Bjarmason
2022-07-08 17:42 ` Phillip Wood
2022-07-08 21:44 ` Ævar Arnfjörð Bjarmason
2022-07-08 19:35 ` Jeff King
2022-07-08 21:47 ` Ævar Arnfjörð Bjarmason
2022-07-11 9:33 ` Jeff King
2022-07-08 14:20 ` [PATCH 7/7] xdiff: remove xdl_free(), use free() instead Ævar Arnfjörð Bjarmason
2022-07-08 17:51 ` Phillip Wood
2022-07-08 21:26 ` Ævar Arnfjörð Bjarmason
2022-07-11 9:26 ` Phillip Wood
2022-07-11 9:54 ` Phillip Wood
2022-07-11 10:02 ` Ævar Arnfjörð Bjarmason
2022-07-13 13:00 ` Phillip Wood [this message]
2022-07-13 13:18 ` Ævar Arnfjörð Bjarmason
2022-06-30 18:32 ` [PATCH 3/3] xdiff: introduce XDL_ALLOC_GROW() Junio C Hamano
2022-07-06 13:14 ` Phillip Wood
2022-06-30 10:46 ` [PATCH 0/3] xdiff: introduce memory allocation macros Ævar Arnfjörð Bjarmason
2022-07-08 16:25 ` [PATCH v2 0/4] " Phillip Wood via GitGitGadget
2022-07-08 16:25 ` [PATCH v2 1/4] xdiff: introduce XDL_ALLOC_ARRAY() Phillip Wood via GitGitGadget
2022-07-08 16:25 ` [PATCH v2 2/4] xdiff: introduce xdl_calloc Phillip Wood via GitGitGadget
2022-07-08 16:25 ` [PATCH v2 3/4] xdiff: introduce XDL_CALLOC_ARRAY() Phillip Wood via GitGitGadget
2022-07-08 16:25 ` [PATCH v2 4/4] xdiff: introduce XDL_ALLOC_GROW() Phillip Wood via GitGitGadget
2022-07-08 22:17 ` Ævar Arnfjörð Bjarmason
2022-07-11 10:00 ` Phillip Wood
2022-07-12 7:19 ` Jeff King
2022-07-13 9:38 ` Phillip Wood
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=bbfa1340-9cab-35d7-2245-f6f8369d5cd4@gmail.com \
--to=phillip.wood123@gmail.com \
--cc=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
--cc=phillip.wood@dunelm.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).