git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] reftable: use xmalloc() and xrealloc()
@ 2024-04-06 20:37 René Scharfe
  2024-04-08  5:44 ` Patrick Steinhardt
  0 siblings, 1 reply; 7+ messages in thread
From: René Scharfe @ 2024-04-06 20:37 UTC (permalink / raw)
  To: Git List; +Cc: Han-Wen Nienhuys, Patrick Steinhardt

malloc(3) and realloc(3) can fail and return NULL.  None of the reftable
code checks for that possibility and would happily dereference NULL
pointers.  Use xmalloc() and xrealloc() instead like in the rest of Git
to report allocation errors and exit cleanly, and to also honor the
environment variable GIT_ALLOC_LIMIT.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 reftable/publicbasics.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/reftable/publicbasics.c b/reftable/publicbasics.c
index 44b84a125e..f33a65df34 100644
--- a/reftable/publicbasics.c
+++ b/reftable/publicbasics.c
@@ -19,14 +19,14 @@ void *reftable_malloc(size_t sz)
 {
 	if (reftable_malloc_ptr)
 		return (*reftable_malloc_ptr)(sz);
-	return malloc(sz);
+	return xmalloc(sz);
 }

 void *reftable_realloc(void *p, size_t sz)
 {
 	if (reftable_realloc_ptr)
 		return (*reftable_realloc_ptr)(p, sz);
-	return realloc(p, sz);
+	return xrealloc(p, sz);
 }

 void reftable_free(void *p)
--
2.44.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] reftable: use xmalloc() and xrealloc()
  2024-04-06 20:37 [PATCH] reftable: use xmalloc() and xrealloc() René Scharfe
@ 2024-04-08  5:44 ` Patrick Steinhardt
  2024-04-08 15:42   ` Junio C Hamano
  2024-04-08 17:50   ` Han-Wen Nienhuys
  0 siblings, 2 replies; 7+ messages in thread
From: Patrick Steinhardt @ 2024-04-08  5:44 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List, Han-Wen Nienhuys

[-- Attachment #1: Type: text/plain, Size: 2088 bytes --]

On Sat, Apr 06, 2024 at 10:37:55PM +0200, René Scharfe wrote:
> malloc(3) and realloc(3) can fail and return NULL.  None of the reftable
> code checks for that possibility and would happily dereference NULL
> pointers.  Use xmalloc() and xrealloc() instead like in the rest of Git
> to report allocation errors and exit cleanly, and to also honor the
> environment variable GIT_ALLOC_LIMIT.
> 
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  reftable/publicbasics.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/reftable/publicbasics.c b/reftable/publicbasics.c
> index 44b84a125e..f33a65df34 100644
> --- a/reftable/publicbasics.c
> +++ b/reftable/publicbasics.c
> @@ -19,14 +19,14 @@ void *reftable_malloc(size_t sz)
>  {
>  	if (reftable_malloc_ptr)
>  		return (*reftable_malloc_ptr)(sz);
> -	return malloc(sz);
> +	return xmalloc(sz);
>  }
> 
>  void *reftable_realloc(void *p, size_t sz)
>  {
>  	if (reftable_realloc_ptr)
>  		return (*reftable_realloc_ptr)(p, sz);
> -	return realloc(p, sz);
> +	return xrealloc(p, sz);
>  }
> 
>  void reftable_free(void *p)

These are part of the library interfaces that should ideally not be tied
to the Git code base at all so that they can theoretically be reused by
another project like libgit2. So I think that instead of rewriting the
generic fallbacks we should call `reftable_set_alloc()` somewhen early
in Git's startup code.

It does raise the question what to do about the generic fallbacks. We
could start calling `abort()` when we observe allocation failures. It's
not exactly nice behaviour in a library though, where the caller may in
fact want to handle this case. But it may at least be better than
failing on a `NULL` pointer exception somewhere down the road. So it
might be the best alternative for now. We could then conver the reftable
library over time to handle allocation failures and, once that's done,
we can eventually drop such a call to `abort()`.

Cc'ing Han-Wen's new mail address as he no longer works at Google.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] reftable: use xmalloc() and xrealloc()
  2024-04-08  5:44 ` Patrick Steinhardt
@ 2024-04-08 15:42   ` Junio C Hamano
  2024-04-08 16:33     ` Patrick Steinhardt
  2024-04-08 17:50   ` Han-Wen Nienhuys
  1 sibling, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2024-04-08 15:42 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: René Scharfe, Git List, Han-Wen Nienhuys

Patrick Steinhardt <ps@pks.im> writes:

> These are part of the library interfaces that should ideally not be tied
> to the Git code base at all so that they can theoretically be reused by
> another project like libgit2. So I think that instead of rewriting the
> generic fallbacks we should call `reftable_set_alloc()` somewhen early
> in Git's startup code.

It sounds like a sensible approach to me on the surface.

The reftable_subsystem_init() function, which would be the interface
into "reftable library" from Git side, can call such customization
functions supplied by the library.

> It does raise the question what to do about the generic fallbacks.

Generic fallbacks would be a plain vanilla malloc(), realloc(), and
friends, right?

> We could start calling `abort()` when we observe allocation
> failures. It's not exactly nice behaviour in a library though,
> where the caller may in fact want to handle this case.

But they would not be able to "handle" it, unless their substitute
alloc() function magically finds more memory and never runs out.  If
you want to allow them to "handle" the situation, the library itself
needs be prepared to see NULL returned from the allocator, and fail
the operation it was doing, and return an error.  If the caller asks
reftable_write_foo(), which may need to allocate some memory to
finish its work, it would see NULL and say "sorry, I cannot
continue", and return an error to its caller, I would imagine.

I think there are two levels of "handling" allocation and its
failure.  Substituting allocation functions would be a way to solve
only one of them (i.e. somehow allow the library client to specify a
way to supply you an unbounded amount of memory).  As long as the
library is not willing to check allocation failures and propagate
the error to the caller, you would have to "abort" the operation no
matter what before returning the control back to your client, and at
that point you would start wanting to make it customizable how to
"abort".  Their custom "abort" function might do longjmp() to try to
"recover", or simply call die() in our case where Git is the library
client, I guess.  So reftable_set_alloc() and reftable_set_abort() may
need to be there.  If you make it mandatory to call them, you can punt
and make it the responsibility of the library clients to worry about
error handling, I guess?

Thanks.




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] reftable: use xmalloc() and xrealloc()
  2024-04-08 15:42   ` Junio C Hamano
@ 2024-04-08 16:33     ` Patrick Steinhardt
  0 siblings, 0 replies; 7+ messages in thread
From: Patrick Steinhardt @ 2024-04-08 16:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, Git List, Han-Wen Nienhuys

[-- Attachment #1: Type: text/plain, Size: 3542 bytes --]

On Mon, Apr 08, 2024 at 08:42:19AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > These are part of the library interfaces that should ideally not be tied
> > to the Git code base at all so that they can theoretically be reused by
> > another project like libgit2. So I think that instead of rewriting the
> > generic fallbacks we should call `reftable_set_alloc()` somewhen early
> > in Git's startup code.
> 
> It sounds like a sensible approach to me on the surface.
> 
> The reftable_subsystem_init() function, which would be the interface
> into "reftable library" from Git side, can call such customization
> functions supplied by the library.
> 
> > It does raise the question what to do about the generic fallbacks.
> 
> Generic fallbacks would be a plain vanilla malloc(), realloc(), and
> friends, right?

Yeah.

> > We could start calling `abort()` when we observe allocation
> > failures. It's not exactly nice behaviour in a library though,
> > where the caller may in fact want to handle this case.
> 
> But they would not be able to "handle" it, unless their substitute
> alloc() function magically finds more memory and never runs out.  If
> you want to allow them to "handle" the situation, the library itself
> needs be prepared to see NULL returned from the allocator, and fail
> the operation it was doing, and return an error.  If the caller asks
> reftable_write_foo(), which may need to allocate some memory to
> finish its work, it would see NULL and say "sorry, I cannot
> continue", and return an error to its caller, I would imagine.
> 
> I think there are two levels of "handling" allocation and its
> failure.  Substituting allocation functions would be a way to solve
> only one of them (i.e. somehow allow the library client to specify a
> way to supply you an unbounded amount of memory).  As long as the
> library is not willing to check allocation failures and propagate
> the error to the caller, you would have to "abort" the operation no
> matter what before returning the control back to your client, and at
> that point you would start wanting to make it customizable how to
> "abort".

I actually think that the reftable library _should_ be willing to check
for allocation failures and return proper error codes to the caller.
That would be quite an undertaking, but there is no need to do it all in
a single go. We can refactor the code over time to start handling such
failures.

> Their custom "abort" function might do longjmp() to try to "recover",
> or simply call die() in our case where Git is the library client, I
> guess.  So reftable_set_alloc() and reftable_set_abort() may need to
> be there.  If you make it mandatory to call them, you can punt and
> make it the responsibility of the library clients to worry about error
> handling, I guess?

That would be a possibility indeed. A custom "failure" function may try
to e.g. release caches such that the allocation can be retried. And if
everything fails then in theory, the caller could do a longjmp(3P).

In practice this could cause all kinds of problems though. Imagine for
example that we have acquired a lockfile and then subsequently an
allocation fails. If the application were to longjmp(3P) then all the
cleanup code would not be invoked at all, thus leaving behind a stale
lockfile.

Overall I think that handling allocation failures is the more flexible
approach in the long run, even though it requires more work.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] reftable: use xmalloc() and xrealloc()
  2024-04-08  5:44 ` Patrick Steinhardt
  2024-04-08 15:42   ` Junio C Hamano
@ 2024-04-08 17:50   ` Han-Wen Nienhuys
  2024-04-08 20:36     ` Junio C Hamano
  2024-04-09  3:24     ` Patrick Steinhardt
  1 sibling, 2 replies; 7+ messages in thread
From: Han-Wen Nienhuys @ 2024-04-08 17:50 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: René Scharfe, Git List, nasamuffin

Op ma 8 apr 2024 07:44 schreef Patrick Steinhardt <ps@pks.im>:
>
> It does raise the question what to do about the generic fallbacks. We
> could start calling `abort()` when we observe allocation failures. It's
> not exactly nice behaviour in a library though, where the caller may in
> fact want to handle this case. But it may at least be better than
> failing on a `NULL` pointer exception somewhere down the road. So it
> might be the best alternative for now. We could then conver the reftable
> library over time to handle allocation failures and, once that's done,
> we can eventually drop such a call to `abort()`.


I must admit that I didn't think this part through very much; I
believe someone told me that libgit2 has pluggable memory allocation
routines, so I tried to make the malloc pluggable here too. Handling
OOM better for the malloc calls themselves doesn't seem too difficult,

  hanwen@fedora:~/vc/git/reftable$ grep [cme]alloc *c | wc
     57     276    3469

However, it is probably pointless as long as strbuf_* functions do not
signal OOM gracefully. There was some talk of libifying strbuf. Did
that work include returning OOM error codes in case malloc returns
null? A quick look at strbuf.h suggests not.

I would just call xmalloc as default, rather than calling
reftable_set_alloc, because it might be tricky to ensure it is called
early enough.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] reftable: use xmalloc() and xrealloc()
  2024-04-08 17:50   ` Han-Wen Nienhuys
@ 2024-04-08 20:36     ` Junio C Hamano
  2024-04-09  3:24     ` Patrick Steinhardt
  1 sibling, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2024-04-08 20:36 UTC (permalink / raw)
  To: Han-Wen Nienhuys
  Cc: Patrick Steinhardt, René Scharfe, Git List, nasamuffin

Han-Wen Nienhuys <hanwenn@gmail.com> writes:

> However, it is probably pointless as long as strbuf_* functions do not
> signal OOM gracefully. There was some talk of libifying strbuf. Did
> that work include returning OOM error codes in case malloc returns
> null? A quick look at strbuf.h suggests not.

I would expect not.

The "libified" strbuf (aka "strbuf API in the Git std lib") will
have to be different from what we internally use from <strbuf.h>.
<gitstdlib/strbuf.h> will export gitstdlib_strbuf_addstr(), which is
"properly" libified and signals an allocation failure to its caller.

When that happens, I would expect that strbuf_addstr() would be a
thin wrapper around gitstdlib_strbuf_addstr(), and still just dies
with "we ran out of memory", i.e.

	/* strbuf.h */
	#include <strbuf.h>
	#include <gitstdlib/strbuf.h>

	void strbuf_addstr(struct strbuf *sb, const char *s)
	{
		int err = gitstdlib_strbuf_addstr(sb, s);
                if (!err)
			return; /* happy */
		switch (err) {
		case GITLIB_OOM: /* there may be others */
			die("Out of memory");
			...
		}
	}

which would keep the damage to Git codebase to the minimum when we
become the first client of the "Git std lib".

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] reftable: use xmalloc() and xrealloc()
  2024-04-08 17:50   ` Han-Wen Nienhuys
  2024-04-08 20:36     ` Junio C Hamano
@ 2024-04-09  3:24     ` Patrick Steinhardt
  1 sibling, 0 replies; 7+ messages in thread
From: Patrick Steinhardt @ 2024-04-09  3:24 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: René Scharfe, Git List, nasamuffin

[-- Attachment #1: Type: text/plain, Size: 2067 bytes --]

On Mon, Apr 08, 2024 at 07:50:47PM +0200, Han-Wen Nienhuys wrote:
> Op ma 8 apr 2024 07:44 schreef Patrick Steinhardt <ps@pks.im>:
> >
> > It does raise the question what to do about the generic fallbacks. We
> > could start calling `abort()` when we observe allocation failures. It's
> > not exactly nice behaviour in a library though, where the caller may in
> > fact want to handle this case. But it may at least be better than
> > failing on a `NULL` pointer exception somewhere down the road. So it
> > might be the best alternative for now. We could then conver the reftable
> > library over time to handle allocation failures and, once that's done,
> > we can eventually drop such a call to `abort()`.
> 
> 
> I must admit that I didn't think this part through very much; I
> believe someone told me that libgit2 has pluggable memory allocation
> routines, so I tried to make the malloc pluggable here too.

That was me probably back when I was writing the libgit2 backend.

> Handling OOM better for the malloc calls themselves doesn't seem too
> difficult,
> 
>   hanwen@fedora:~/vc/git/reftable$ grep [cme]alloc *c | wc
>      57     276    3469
> 
> However, it is probably pointless as long as strbuf_* functions do not
> signal OOM gracefully. There was some talk of libifying strbuf. Did
> that work include returning OOM error codes in case malloc returns
> null? A quick look at strbuf.h suggests not.

Yeah, `strbuf` also crossed my mind. And there are some other systems
that the reftable library calls into, like the tempfiles framework, that
would continue to use `xmalloc()` and related functions.

> I would just call xmalloc as default, rather than calling
> reftable_set_alloc, because it might be tricky to ensure it is called
> early enough.

I don't think it should be particularly tricky to call
`reftable_set_alloc()` early enough. The reftable code won't do any
allocations before we set up the refdb. So calling this in our `main()`
function in "common-main.c" should be sufficient.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-04-09  3:24 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-06 20:37 [PATCH] reftable: use xmalloc() and xrealloc() René Scharfe
2024-04-08  5:44 ` Patrick Steinhardt
2024-04-08 15:42   ` Junio C Hamano
2024-04-08 16:33     ` Patrick Steinhardt
2024-04-08 17:50   ` Han-Wen Nienhuys
2024-04-08 20:36     ` Junio C Hamano
2024-04-09  3:24     ` Patrick Steinhardt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).