git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Phillip Wood <phillip.wood123@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Jeff King <peff@peff.net>
Subject: Re: [PATCH 7/7] xdiff: remove xdl_free(), use free() instead
Date: Wed, 13 Jul 2022 14:00:07 +0100	[thread overview]
Message-ID: <bbfa1340-9cab-35d7-2245-f6f8369d5cd4@gmail.com> (raw)
In-Reply-To: <220711.865yk47x54.gmgdl@evledraar.gmail.com>

On 11/07/2022 11:02, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Jul 11 2022, Phillip Wood wrote:
> 
>> Hi Ævar
>>
>> On 08/07/2022 22:26, Ævar Arnfjörð Bjarmason wrote:
>>> On Fri, Jul 08 2022, Phillip Wood wrote:
>>>
>>>> Hi Ævar
>>>>
>>>> On 08/07/2022 15:20, Ævar Arnfjörð Bjarmason wrote:
>>>>> Remove the xdl_free() wrapper macro in favor of using free()
>>>>> directly. The wrapper macro was brought in with the initial import of
>>>>> xdiff/ in 3443546f6ef (Use a *real* built-in diff generator,
>>>>> 2006-03-24).
>>>>> As subsequent discussions on the topic[1] have made clear there's no
>>>>> reason to use this wrapper.
>>>>
>>>> That link is to a message where you assert that there is no reason for
>>>> the wrapper, you seem to be the only person in that thread making that
>>>> argument. The libgit2 maintainer and others are arguing that it is
>>>> worth keeping to make it easy to swap the allocator.
>>> Arguing that it's not needed for a technical reason, but an
>>> "aesthetic
>>> and ergonomic one", per:
>>> https://lore.kernel.org/git/20220217225804.GC7@edef91d97c94/;
>>> Perhaps I misread it, but I took this as Junio concurring with what
>>> I
>>> was pointing out there:
>>> https://lore.kernel.org/git/xmqqfsohbdre.fsf@gitster.g/
>>>
>>>>> Both git itself as well as any external
>>>>> users such as libgit2 compile the xdiff/* code as part of their own
>>>>> compilation, and can thus find the right malloc() and free() at
>>>>> link-time.
>>>>
>>>> I'm struggling to see how that is simpler than the current situation
>>>> with xdl_malloc().
>>> It's simpler because when maintaining that code in git.git it's less
>>> of
>>> a special case, e.g. we have coccinelle rules that match free(), now
>>> every such rule needs to account for the xdiff special-case.
>>> And it's less buggy because if you're relying on us always using a
>>> wrapper bugs will creep in, current master has this:
>>> 	
>>> 	$ git -P grep '\bfree\(' -- xdiff
>>> 	xdiff/xdiff.h:#define xdl_free(ptr) free(ptr)
>>> 	xdiff/xmerge.c:         free(c);
>>> 	xdiff/xmerge.c: free(next_m);
>>>
>>>> Perhaps you could show how you think libgit2 could
>>>> easily make xdiff use git__malloc() instead of malloc() without
>>>> changing the malloc() calls (the message you linked to essentially
>>>> suggests they do a search and replace). If people cannot swap in
>>>> another allocator without changing the code then you are imposing a
>>>> barrier to them contributing patches back to git's xdiff.
>>> I was suggesting that if libgit2 wanted to maintain a copy of xdiff
>>> that
>>> catered to the asthetic desires of the maintainer adjusting whatever
>>> import script you use to apply a trivial coccinelle transformation would
>>> give you what you wanted. Except without a maintenance burden on
>>> git.git, or the bugs you'd get now since you're not catching those two
>>> free() cases (or any future such cases).
>>> But just having the code use malloc() and free() directly and
>>> getting
>>> you what you get now is easy, and doesn't require any such
>>> search-replacement.
>>> Here's how:
>>> I imported the xdiff/*.[ch] files at the tip of my branch into
>>> current
>>> libgit2.git's src/libgit2/xdiff/, and then applied this on top, which is
>>> partially re-applying libgit2's own monkeypatches, and partially
>>> adjusting for upstream changes (including this one):
>>> 	
>>> 	diff --git a/src/libgit2/xdiff/git-xdiff.h b/src/libgit2/xdiff/git-xdiff.h
>>> 	index b75dba819..2e00764d4 100644
>>> 	--- a/src/libgit2/xdiff/git-xdiff.h
>>> 	+++ b/src/libgit2/xdiff/git-xdiff.h
>>> 	@@ -14,6 +14,7 @@
>>> 	 #ifndef INCLUDE_git_xdiff_h__
>>> 	 #define INCLUDE_git_xdiff_h__
>>> 	
>>> 	+#include <regex.h>
>>> 	 #include "regexp.h"
>>> 	
>>> 	 /* Work around C90-conformance issues */
>>> 	@@ -27,11 +28,10 @@
>>> 	 # endif
>>> 	 #endif
>>> 	
>>> 	-#define xdl_malloc(x) git__malloc(x)
>>> 	-#define xdl_free(ptr) git__free(ptr)
>>> 	-#define xdl_realloc(ptr, x) git__realloc(ptr, x)
>>> 	+#define malloc(x) git__malloc(x)
>>> 	+#define free(ptr) git__free(ptr)
>>> 	
>>> 	-#define XDL_BUG(msg) GIT_ASSERT(msg)
>>> 	+#define BUG(msg) GIT_ASSERT(msg)
>>> 	
>>> 	 #define xdl_regex_t git_regexp
>>> 	 #define xdl_regmatch_t git_regmatch
>>> 	@@ -50,4 +50,17 @@ GIT_INLINE(int) xdl_regexec_buf(
>>> 	 	return -1;
>>> 	 }
>>> 	
>>> 	+static inline size_t st_mult(size_t a, size_t b)
>>> 	+{
>>> 	+	return a * b;
>>> 	+}
>>> 	+
>>> 	+static inline int regexec_buf(const regex_t *preg, const char *buf, size_t size,
>>> 	+			      size_t nmatch, regmatch_t pmatch[], int eflags)
>>> 	+{
>>> 	+	assert(nmatch > 0 && pmatch);
>>> 	+	pmatch[0].rm_so = 0;
>>> 	+	pmatch[0].rm_eo = size;
>>> 	+	return regexec(preg, buf, nmatch, pmatch, eflags | REG_STARTEND);
>>> 	+}
>>> 	 #endif
>>> 	diff --git a/src/libgit2/xdiff/xdiff.h b/src/libgit2/xdiff/xdiff.h
>>> 	index a37d89fcd..5e5b525c2 100644
>>> 	--- a/src/libgit2/xdiff/xdiff.h
>>> 	+++ b/src/libgit2/xdiff/xdiff.h
>>> 	@@ -23,6 +23,8 @@
>>> 	 #if !defined(XDIFF_H)
>>> 	 #define XDIFF_H
>>> 	
>>> 	+#include "git-xdiff.h"
>>> 	+
>>> 	 #ifdef __cplusplus
>>> 	 extern "C" {
>>> 	 #endif /* #ifdef __cplusplus */
>>> 	diff --git a/src/libgit2/xdiff/xinclude.h b/src/libgit2/xdiff/xinclude.h
>>> 	index a4285ac0e..79812ad8a 100644
>>> 	--- a/src/libgit2/xdiff/xinclude.h
>>> 	+++ b/src/libgit2/xdiff/xinclude.h
>>> 	@@ -23,7 +23,8 @@
>>> 	 #if !defined(XINCLUDE_H)
>>> 	 #define XINCLUDE_H
>>> 	
>>> 	-#include "git-compat-util.h"
>>> 	+#include "git-xdiff.h"
>>> 	+#include "git-shared-util.h"
>>> 	 #include "xmacros.h"
>>> 	 #include "xdiff.h"
>>> 	 #include "xtypes.h"
>>> If you then build it and run e.g.:
>>> 	gdb -args ./libgit2_tests -smerge::files
>>> You'll get stack traces like this if you break on stdalloc__malloc
>>> (which it uses for me in its default configuration):
>>> 	
>>> 	(gdb) bt
>>> 	#0  stdalloc__malloc (len=144, file=0x87478d "/home/avar/g/libgit2/src/libgit2/xdiff/xutils.c", line=101) at /home/avar/g/libgit2/src/util/allocators/stdalloc.c:14
>>> 	#1  0x00000000006ec15c in xdl_cha_alloc (cha=0x7fffffffcaa8) at /home/avar/g/libgit2/src/libgit2/xdiff/xutils.c:101
>>> 	#2  0x00000000006eaee9 in xdl_prepare_ctx (pass=1, mf=0x7fffffffcc98, narec=13, xpp=0x7fffffffcca8, cf=0x7fffffffc7d0, xdf=0x7fffffffcaa8)
>>> 	    at /home/avar/g/libgit2/src/libgit2/xdiff/xprepare.c:194
>>> IOW this was all seamlessly routed to your git__malloc() without us
>>> having to maintain an xdl_{malloc,free}().
>>
>> Thanks for showing this, it is really helpful to see a concrete
>> example. I was especially interested to see how you went about the
>> conversion without redefining standard library functions or
>> introducing non-namespaced identifiers in files that included
>> xdiff.h. Unfortunately you have not done that and so I don't think the
>> approach above is viable   for a well behaved library.
> 
> Why? Because there's some header where doing e.g.:
> 
> 	#include "git2/something.h"
> 
> Will directly include git-xdiff.h by proxy into the user's program?

No because you're redefining malloc() and introducing ALLOC_GROW() etc 
in 
src/libgit2/{Blame_git.c,Diff_xdiff.c,Merge_file.c,Patch_generate.c,Checkout.c} 
and
Test/libgit2/merge/files.c. It is not expected that including xdiff.h 
will change a bunch of symbols in the file it is included in.

Best Wishes

Phillip

> 
> I admit I didn't check that, and assumed that these were only included
> by other *.c files in libgit2 itself. I.e. it would compile xdiff for
> its own use, but then expose its own API.
> 
> Anyway, if it is directly included in some user-exposed *.h file then
> yes, you don't want to overwrite their "malloc", but that's a small
> matter of doing an "#undef malloc" at the end (which as the below
> linked-to patch shows we'd support by having macros like
> SHA1DC_CUSTOM_TRAILING_INCLUDE_UBC_CHECK_C).
> 
> Aside from what I'm proposing here doing such "undef at the end" seems
> like a good idea in any case, because if there is any macro leakage here
> you're e.g. re-defining "XDL_BUG" and other things that aren't clearly
> in the libgit2 namespace now, no?
> 
>>> Now, I think that's obviously worth adjusting, e.g. the series I've got
>>> here could make this easier for libgit2 by including st_mult() at least,
>>> I'm not sure what you want to do about regexec_buf().
>>> For the monkeypatching you do now of creating a "git-xdiff.h" I
>>> think it
>>> would be very good to get a patch like what I managed to get
>>> sha1collisiondetection.git to accept for our own use-case, which allows
>>> us to use their upstream code as-is from a submodule:
>>> 	https://github.com/cr-marcstevens/sha1collisiondetection/commit/b45fcef
>>
>> Thanks for the link, That's a really good example where all the
>> identifiers are namespaced and the public include file does not
>> pollute the code that is including it. If you come up with something
>> like that where the client code can set up same #defines for malloc,
>> calloc, realloc and free that would be a good way forward.
> 
> I don't need to come up with that, you've got the linker to do that.
> 
> I.e. for almost any "normal" use of xdiff you'd simply compile it with
> its reference to malloc(), and then you either link that library that
> already links to libc into your program.
> 
> So if you use a custom malloc the xdiff code might still use libc
> malloc, or you'd link the two together and link the resulting program
> with your own custom malloc.
> 
> As explained in the previous & linked-to threads this is how almost
> everyone would include & use such a library, and indeed that's how git
> itself can use malloc() and free() in its sources, but have options to
> link to libc malloc, nedmalloc etc.
> 
> But instead of using the linker to resolve "malloc" to git__malloc you'd
> like to effectively include the upstream *.[ch] files directly, and
> pretend as though the upstream source used git__malloc() or whatever to
> begin with.
> 
> I don't really understand why you can't just do that at the linker
> level, especially as doing so guards you better against namespace
> pollution. But whatever, the suggestion(s) above assume you've got a
> good reason, but show that we don't need to prefix every standard
> function just to accommodate that.
> 
> Instead we can just provide a knob to include a header of yours after
> all others have been included (which the libgit2 monkeypatches to xdiff
> already include), and have that header define "malloc" as "git__malloc",
> "BUG" as "GIT_ASSERT" etc.
> 
> And yes, if you're in turn providing an interface where others will then
> include your header that's doing that you'll need to apply some
> namespacing paranoia, namely to "#undef malloc" etc.
> 
> But none of that requires git to carry prefixed versions of standard
> functions you'd wish to replace, which the patches here show.
> 
>> I do not think we should require projects to be defining there own
>> versions of [C]ALLOC_*() and BUG() just to use xdiff.
> 
> No, I don't think so either. I.e. the idea with these patches is that we
> could bundle up xdiff/* and git-shared-util.h into one distributed
> libgit, which down the road we could even roll independent releases for
> (as upstream seems to have forever gone away).
> 
> Whereas the proposals coming out of libgit2[1] for generalizing xdiff/
> for general use seem to stop just short of the specific hacks we need to
> get it running for libgit2, but e.g. leave "xdl_malloc" defined as
> "xmalloc".
> 
> That isn't a standard library function, and therefore the first thing
> you'd need to do when using it as a library is to find out how git.git
> uses that, and copy/paste it or define your own replacement.
> 
> Whereas I think it should be the other way around, we should e.g. ship a
> shimmy BUG() that calls fprintf(stderr, ...) and abort(), but for
> advanced users such as libgit2 guard those with "ifndef" or whatever, so
> you can have it call GIT_ASSERT() and the like instead.
> 
> 1. https://lore.kernel.org/git/20220209013354.GB7@abe733c6e288/


  reply	other threads:[~2022-07-13 13:00 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-29 15:25 [PATCH 0/3] xdiff: introduce memory allocation macros Phillip Wood via GitGitGadget
2022-06-29 15:25 ` [PATCH 1/3] xdiff: introduce XDL_ALLOC_ARRAY() Phillip Wood via GitGitGadget
2022-06-29 15:25 ` [PATCH 2/3] xdiff: introduce XDL_CALLOC_ARRAY() Phillip Wood via GitGitGadget
2022-06-30 18:17   ` Junio C Hamano
2022-07-06 13:17     ` Phillip Wood
2022-06-29 15:25 ` [PATCH 3/3] xdiff: introduce XDL_ALLOC_GROW() Phillip Wood via GitGitGadget
2022-06-30 10:54   ` Ævar Arnfjörð Bjarmason
2022-06-30 12:03     ` Phillip Wood
2022-06-30 12:38       ` Phillip Wood
2022-06-30 13:25         ` Ævar Arnfjörð Bjarmason
2022-07-06 13:23           ` Phillip Wood
2022-07-07 11:17             ` Ævar Arnfjörð Bjarmason
2022-07-08  9:35               ` Phillip Wood
2022-07-08 14:20                 ` [PATCH 0/7] xdiff: use standard alloc macros, share them via git-shared-util.h Ævar Arnfjörð Bjarmason
2022-07-08 14:20                   ` [PATCH 1/7] xdiff: simplify freeing patterns around xdl_free_env() Ævar Arnfjörð Bjarmason
2022-07-08 14:20                   ` [PATCH 2/7] git-shared-util.h: move "shared" allocation utilities here Ævar Arnfjörð Bjarmason
2022-07-08 14:20                   ` [PATCH 3/7] git-shared-util.h: add G*() versions of *ALLOC_*() Ævar Arnfjörð Bjarmason
2022-07-11 10:06                     ` Phillip Wood
2022-07-08 14:20                   ` [PATCH 4/7] xdiff: use G[C]ALLOC_ARRAY(), not XDL_CALLOC_ARRAY() Ævar Arnfjörð Bjarmason
2022-07-11 10:10                     ` Phillip Wood
2022-07-08 14:20                   ` [PATCH 5/7] xdiff: use GALLOC_GROW(), not XDL_ALLOC_GROW() Ævar Arnfjörð Bjarmason
2022-07-11 10:13                     ` Phillip Wood
2022-07-11 10:48                       ` Ævar Arnfjörð Bjarmason
2022-07-13  9:09                         ` Phillip Wood
2022-07-13 10:48                           ` Ævar Arnfjörð Bjarmason
2022-07-13 13:21                             ` Phillip Wood
2022-07-08 14:20                   ` [PATCH 6/7] xdiff: remove xdl_malloc() wrapper, use malloc(), not xmalloc() Ævar Arnfjörð Bjarmason
2022-07-08 17:42                     ` Phillip Wood
2022-07-08 21:44                       ` Ævar Arnfjörð Bjarmason
2022-07-08 19:35                     ` Jeff King
2022-07-08 21:47                       ` Ævar Arnfjörð Bjarmason
2022-07-11  9:33                         ` Jeff King
2022-07-08 14:20                   ` [PATCH 7/7] xdiff: remove xdl_free(), use free() instead Ævar Arnfjörð Bjarmason
2022-07-08 17:51                     ` Phillip Wood
2022-07-08 21:26                       ` Ævar Arnfjörð Bjarmason
2022-07-11  9:26                         ` Phillip Wood
2022-07-11  9:54                           ` Phillip Wood
2022-07-11 10:02                           ` Ævar Arnfjörð Bjarmason
2022-07-13 13:00                             ` Phillip Wood [this message]
2022-07-13 13:18                               ` Ævar Arnfjörð Bjarmason
2022-06-30 18:32   ` [PATCH 3/3] xdiff: introduce XDL_ALLOC_GROW() Junio C Hamano
2022-07-06 13:14     ` Phillip Wood
2022-06-30 10:46 ` [PATCH 0/3] xdiff: introduce memory allocation macros Ævar Arnfjörð Bjarmason
2022-07-08 16:25 ` [PATCH v2 0/4] " Phillip Wood via GitGitGadget
2022-07-08 16:25   ` [PATCH v2 1/4] xdiff: introduce XDL_ALLOC_ARRAY() Phillip Wood via GitGitGadget
2022-07-08 16:25   ` [PATCH v2 2/4] xdiff: introduce xdl_calloc Phillip Wood via GitGitGadget
2022-07-08 16:25   ` [PATCH v2 3/4] xdiff: introduce XDL_CALLOC_ARRAY() Phillip Wood via GitGitGadget
2022-07-08 16:25   ` [PATCH v2 4/4] xdiff: introduce XDL_ALLOC_GROW() Phillip Wood via GitGitGadget
2022-07-08 22:17     ` Ævar Arnfjörð Bjarmason
2022-07-11 10:00       ` Phillip Wood
2022-07-12  7:19         ` Jeff King
2022-07-13  9:38           ` Phillip Wood

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bbfa1340-9cab-35d7-2245-f6f8369d5cd4@gmail.com \
    --to=phillip.wood123@gmail.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=phillip.wood@dunelm.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).