git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] Thread safety in some low-level functions
@ 2020-06-24 22:29 Matheus Tavares Bernardino
  2020-06-24 22:52 ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 25+ messages in thread
From: Matheus Tavares Bernardino @ 2020-06-24 22:29 UTC (permalink / raw)
  To: git; +Cc: Johannes Sixt, Jeff King, Jonathan Tan

Hi,

I'm working with threads in unpack-trees and noticed that
[warning|error]_errno() uses strerror(), which is not thread-safe. We
could try to avoid calling these functions in threaded code, but they
are sometimes too deep in the call stack to be noticed... (or even
avoided). The same happens with oid_to_hex(), which writes to a static
buffer.

I don't think I've ever seen a bug report involving these functions
being called racily, but this possibility is not currently excluded in
our codebase. For example, see grep_source_load_file(), which is
called by multiple threads concurrently and might call the
thread-unsafe error_errno(). (Although, I admit, the chance of a race
here must be very low...)

I still haven't been able to come up with a simple / easy change that
could make these functions thread safe, but here are my thoughts so
far:

- For strerror(), there is a thread-safe variant: strerror_r().
However IIUC, this variant is not present on Windows (although there
is strerror_s() which *seems* to be somewhat similar). Also, there are
two versions of strerror_r() on Linux: one is XSI-compliant and the
other is GNU-specific. I don't know what the situation is in other
OSes...

- Regarding, oid_to_hex(), a patch from 2010 [1] proposed a solution
using thread-local storage and pthread_once(). But as Hannes pointed
out in this other thread [2] , implementing a Windows equivalence for
pthread_once() could be tricky and voluminous. Since this thread dates
from 7 years ago, I was wondering if we would be able to implement it
nowadays with InitOnceExecuteOnce() [3].

Finally, leaving these functions thread-unsafe is also a
possibility... As I mentioned earlier, they don't seem to be causing
problems out there for now (at least not reported). But if we can find
a feasible solution to introduce thread-safety, I think it would be
great. The codebase would be more robust and we would be able to work
with threads with much more confidence.

Any thoughts?

Thanks,
Matheus

[1] https://lore.kernel.org/git/20100323173130.GC4218@fredrik-laptop/
[2] https://lore.kernel.org/git/516D5CA4.7000500@viscovery.net/
[3]: https://docs.microsoft.com/en-us/windows/win32/sync/using-one-time-initialization

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] Thread safety in some low-level functions
  2020-06-24 22:29 [RFC] Thread safety in some low-level functions Matheus Tavares Bernardino
@ 2020-06-24 22:52 ` Matheus Tavares Bernardino
  2020-06-25  1:38   ` brian m. carlson
  0 siblings, 1 reply; 25+ messages in thread
From: Matheus Tavares Bernardino @ 2020-06-24 22:52 UTC (permalink / raw)
  To: git; +Cc: Johannes Sixt, Jeff King, Jonathan Tan

On Wed, Jun 24, 2020 at 7:29 PM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
>
> - Regarding, oid_to_hex(), a patch from 2010 [1] proposed a solution
> using thread-local storage and pthread_once(). But as Hannes pointed
> out in this other thread [2] , implementing a Windows equivalence for
> pthread_once() could be tricky and voluminous. Since this thread dates
> from 7 years ago, I was wondering if we would be able to implement it
> nowadays with InitOnceExecuteOnce() [3].

I forgot to link it before, but here is a commit in libav.git which
emulates pthread_once():
https://git.libav.org/?p=libav.git;a=commitdiff;h=b22693b06d1e5d73454a65c203b4d31c1ca5b69a

The patch is not so big, and if we consider only Vista+, it seems very
straightforward. I'm not very familiar with Windows, though, and I'm
not sure if this solution would work for us.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] Thread safety in some low-level functions
  2020-06-24 22:52 ` Matheus Tavares Bernardino
@ 2020-06-25  1:38   ` brian m. carlson
  2020-06-25 20:32     ` [PATCH 0/2] Make oid_to_hex() thread-safe Matheus Tavares
  0 siblings, 1 reply; 25+ messages in thread
From: brian m. carlson @ 2020-06-25  1:38 UTC (permalink / raw)
  To: Matheus Tavares Bernardino; +Cc: git, Johannes Sixt, Jeff King, Jonathan Tan

[-- Attachment #1: Type: text/plain, Size: 1606 bytes --]

On 2020-06-24 at 22:52:58, Matheus Tavares Bernardino wrote:
> On Wed, Jun 24, 2020 at 7:29 PM Matheus Tavares Bernardino
> <matheus.bernardino@usp.br> wrote:
> >
> > - Regarding, oid_to_hex(), a patch from 2010 [1] proposed a solution
> > using thread-local storage and pthread_once(). But as Hannes pointed
> > out in this other thread [2] , implementing a Windows equivalence for
> > pthread_once() could be tricky and voluminous. Since this thread dates
> > from 7 years ago, I was wondering if we would be able to implement it
> > nowadays with InitOnceExecuteOnce() [3].
> 
> I forgot to link it before, but here is a commit in libav.git which
> emulates pthread_once():
> https://git.libav.org/?p=libav.git;a=commitdiff;h=b22693b06d1e5d73454a65c203b4d31c1ca5b69a
> 
> The patch is not so big, and if we consider only Vista+, it seems very
> straightforward. I'm not very familiar with Windows, though, and I'm
> not sure if this solution would work for us.

I believe Git for Windows dropped support for Windows XP some time back.
Regardless, since pretty much nobody using Windows XP (with the possible
exception of ATMs) is getting security support, I'm fine with ignoring
it either way.  I am also not a Windows person, so others may have more
insight.

As for the general idea, I'm mildly in favor of it.  While I don't plan
to work on the project myself (outside of cleaning up places I'm already
working), I'm definitely not opposed to seeing patches come in to
improve thread safety.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 0/2] Make oid_to_hex() thread-safe
  2020-06-25  1:38   ` brian m. carlson
@ 2020-06-25 20:32     ` Matheus Tavares
  2020-06-25 20:32       ` [PATCH 1/2] compat/win32/pthread: add pthread_once() Matheus Tavares
                         ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Matheus Tavares @ 2020-06-25 20:32 UTC (permalink / raw)
  To: git; +Cc: sandals, j6t, jonathantanmy, peff, Johannes.Schindelin

Some thread-unsafe functions of our codebase appear very down in the
call stack, which can be hard to notice (or avoid). Thus they are
sometimes used in threaded code unsafely. In this series we add
pthread_once() to compat/win32/ and use it in conjunction with
pthread_key to make a subset of the said functions thread-safe.

As a next step, I would love to make [warning|error|die]_errno()
thread-safe as well. strerror() is not safe on *nix, and there are some
thread functions today that call these (although the actual risk of a
race condition must be very small...)

My idea was to implement a xstrerror() wrapper which calls the
appropriate thread-safe function (dependant on the OS), or even call
strerror() itself but holding a lock to copy the result for a local
buffer (which should be OK as we don't expect contention in strerror).
We could also set a thread local buffer array, as in the second patch of
this series, to excuse callers from allocating/freeing memory.

One concern with this idea is the risk of an infinite recursion if
xstrerror() or any of its childs call [warning|error|die]_errno().
However, if we are only using strerror() and pthread_*() within the
wrapper, there should be no problem, right? Has anyone thought of
other problems with this approach?

Finally, should such change also come with a coccinelle patch to replace
usages of strerror() with xstrerror()? Or better not, as the change
would be too big?

Matheus Tavares (2):
  compat/win32/pthread: add pthread_once()
  hex: make hash_to_hex_algop() and friends thread-safe

 compat/win32/pthread.c | 22 +++++++++++++++++++++
 compat/win32/pthread.h |  5 +++++
 hex.c                  | 45 ++++++++++++++++++++++++++++++++++++++----
 thread-utils.c         | 11 +++++++++++
 thread-utils.h         |  6 ++++++
 5 files changed, 85 insertions(+), 4 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/2] compat/win32/pthread: add pthread_once()
  2020-06-25 20:32     ` [PATCH 0/2] Make oid_to_hex() thread-safe Matheus Tavares
@ 2020-06-25 20:32       ` Matheus Tavares
  2020-06-26  5:45         ` Christian Couder
  2020-06-25 20:32       ` [PATCH 2/2] hex: make hash_to_hex_algop() and friends thread-safe Matheus Tavares
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: Matheus Tavares @ 2020-06-25 20:32 UTC (permalink / raw)
  To: git; +Cc: sandals, j6t, jonathantanmy, peff, Johannes.Schindelin

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---

Note: the pthread_once() function is adapted from:
https://git.libav.org/?p=libav.git;a=commitdiff;h=b22693b06d1e5d73454a65c203b4d31c1ca5b69a

Which is LGPLv2.1. Should I add any notice/acknowledgment somewhere,
besides the comment I added above the function?

 compat/win32/pthread.c | 22 ++++++++++++++++++++++
 compat/win32/pthread.h |  5 +++++
 thread-utils.c         | 11 +++++++++++
 thread-utils.h         |  6 ++++++
 4 files changed, 44 insertions(+)

diff --git a/compat/win32/pthread.c b/compat/win32/pthread.c
index 2e7eead42c..5a7ecbd999 100644
--- a/compat/win32/pthread.c
+++ b/compat/win32/pthread.c
@@ -56,3 +56,25 @@ pthread_t pthread_self(void)
 	t.tid = GetCurrentThreadId();
 	return t;
 }
+
+/* Adapted from libav's compat/w32pthreads.h. */
+int pthread_once(pthread_once_t *once_control, void (*init_routine)(void))
+{
+	BOOL pending = FALSE;
+	int ret = 0;
+
+	if(!InitOnceBeginInitialize(once_control, 0, &pending, NULL)) {
+		ret = err_win_to_posix(GetLastError());
+		goto out;
+	}
+
+	if (pending)
+		init_routine();
+
+	if(!InitOnceComplete(once_control, 0, NULL))
+		ret = err_win_to_posix(GetLastError());
+
+out:
+	/* POSIX doesn't allow pthread_once() to return EINTR */
+	return ret == EINTR ? EIO : ret;
+}
diff --git a/compat/win32/pthread.h b/compat/win32/pthread.h
index 737983d00b..c50f1e89c7 100644
--- a/compat/win32/pthread.h
+++ b/compat/win32/pthread.h
@@ -40,6 +40,11 @@ typedef int pthread_mutexattr_t;
 #define pthread_cond_signal WakeConditionVariable
 #define pthread_cond_broadcast WakeAllConditionVariable
 
+#define pthread_once_t INIT_ONCE
+
+#define PTHREAD_ONCE_INIT INIT_ONCE_STATIC_INIT
+int pthread_once(pthread_once_t *once_control, void (*init_routine)(void));
+
 /*
  * Simple thread creation implementation using pthread API
  */
diff --git a/thread-utils.c b/thread-utils.c
index 5329845691..937deb3f2e 100644
--- a/thread-utils.c
+++ b/thread-utils.c
@@ -122,4 +122,15 @@ int dummy_pthread_join(pthread_t pthread, void **retval)
 	return ENOSYS;
 }
 
+int nothreads_pthread_once(pthread_once_t *once_control,
+			   void (*init_routine)(void))
+{
+	if (*once_control == 1)
+		return 0;
+
+	init_routine();
+	*once_control = 1;
+	return 0;
+}
+
 #endif
diff --git a/thread-utils.h b/thread-utils.h
index 4961487ed9..bab9dc5e4d 100644
--- a/thread-utils.h
+++ b/thread-utils.h
@@ -19,6 +19,7 @@
 #define pthread_mutex_t int
 #define pthread_cond_t int
 #define pthread_key_t int
+#define pthread_once_t int
 
 #define pthread_mutex_init(mutex, attr) dummy_pthread_init(mutex)
 #define pthread_mutex_lock(mutex)
@@ -48,6 +49,11 @@ int dummy_pthread_join(pthread_t pthread, void **retval);
 
 int dummy_pthread_init(void *);
 
+#define PTHREAD_ONCE_INIT 0
+int nothreads_pthread_once(pthread_once_t *once_control,
+			   void (*init_routine)(void));
+#define pthread_once(once, routine) nothreads_pthread_once(once, routine)
+
 #endif
 
 int online_cpus(void);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/2] hex: make hash_to_hex_algop() and friends thread-safe
  2020-06-25 20:32     ` [PATCH 0/2] Make oid_to_hex() thread-safe Matheus Tavares
  2020-06-25 20:32       ` [PATCH 1/2] compat/win32/pthread: add pthread_once() Matheus Tavares
@ 2020-06-25 20:32       ` Matheus Tavares
  2020-06-25 23:07         ` brian m. carlson
  2020-06-26  6:02         ` Christian Couder
  2020-06-26  8:22       ` [PATCH 0/2] Make oid_to_hex() thread-safe Christian Couder
  2020-06-26 21:54       ` [PATCH v2 " Matheus Tavares
  3 siblings, 2 replies; 25+ messages in thread
From: Matheus Tavares @ 2020-06-25 20:32 UTC (permalink / raw)
  To: git
  Cc: sandals, j6t, jonathantanmy, peff, Johannes.Schindelin, Fredrik Kuivinen

hash_to_hex_algop() returns a static buffer, relieving callers from the
responsibility of freeing memory after use. But the current
implementation uses the same static data for all threads and, thus, is
not thread-safe. We could avoid using this function and its wrappers
in threaded code, but they are sometimes too deep in the call stack to
be noticed or even avoided.

For example, we can take a look at the number of oid_to_hex() calls,
which calls hash_to_hex_algop():

$ git grep 'oid_to_hex(' | wc -l
818

Although these functions don't seem to be causing problems out there for
now (at least not reported), making them thread-safe makes the codebase
more robust against race conditions. We can easily do that replicating
the static buffer in each thread's local storage.

Original-patch-by: Fredrik Kuivinen <frekui@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 hex.c | 45 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/hex.c b/hex.c
index da51e64929..1094ed25bd 100644
--- a/hex.c
+++ b/hex.c
@@ -136,12 +136,49 @@ char *oid_to_hex_r(char *buffer, const struct object_id *oid)
 	return hash_to_hex_algop_r(buffer, oid->hash, the_hash_algo);
 }
 
+struct hexbuf_array {
+	int idx;
+	char bufs[4][GIT_MAX_HEXSZ + 1];
+};
+
+#ifdef HAVE_THREADS
+static pthread_key_t hexbuf_array_key;
+static pthread_once_t hexbuf_array_once = PTHREAD_ONCE_INIT;
+
+void init_hexbuf_array_key(void)
+{
+	if (pthread_key_create(&hexbuf_array_key, free))
+		die(_("failed to initialize threads' key for hash to hex conversion"));
+}
+
+#else
+static struct hexbuf_array default_hexbuf_array;
+#endif
+
 char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *algop)
 {
-	static int bufno;
-	static char hexbuffer[4][GIT_MAX_HEXSZ + 1];
-	bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer);
-	return hash_to_hex_algop_r(hexbuffer[bufno], hash, algop);
+	struct hexbuf_array *ha;
+
+#ifdef HAVE_THREADS
+	void *value;
+
+	if (pthread_once(&hexbuf_array_once, init_hexbuf_array_key))
+		die(_("failed to initialize threads' key for hash to hex conversion"));
+
+	value = pthread_key_getspecific(&hexbuf_array_key);
+	if (value) {
+		ha = (struct hexbuf_array *) value;
+	} else {
+		ha = xmalloc(sizeof(*ha));
+		if (pthread_key_setspecific(&hexbuf_array_key, (void *)ha))
+			die(_("failed to set thread buffer for hash to hex conversion"));
+	}
+#else
+	ha = &default_hexbuf_array;
+#endif
+
+	ha->idx = (ha->idx + 1) % ARRAY_SIZE(ha->bufs);
+	return hash_to_hex_algop_r(ha->bufs[ha->idx], hash, algop);
 }
 
 char *hash_to_hex(const unsigned char *hash)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] hex: make hash_to_hex_algop() and friends thread-safe
  2020-06-25 20:32       ` [PATCH 2/2] hex: make hash_to_hex_algop() and friends thread-safe Matheus Tavares
@ 2020-06-25 23:07         ` brian m. carlson
  2020-06-25 23:54           ` Matheus Tavares Bernardino
  2020-06-26  6:02         ` Christian Couder
  1 sibling, 1 reply; 25+ messages in thread
From: brian m. carlson @ 2020-06-25 23:07 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: git, j6t, jonathantanmy, peff, Johannes.Schindelin, Fredrik Kuivinen

[-- Attachment #1: Type: text/plain, Size: 1051 bytes --]

On 2020-06-25 at 20:32:57, Matheus Tavares wrote:
> +#ifdef HAVE_THREADS
> +	void *value;
> +
> +	if (pthread_once(&hexbuf_array_once, init_hexbuf_array_key))
> +		die(_("failed to initialize threads' key for hash to hex conversion"));
> +
> +	value = pthread_key_getspecific(&hexbuf_array_key);

I think the portable name for this is "pthread_getspecific".  That
appears to be what POSIX specifies and what our Windows pthread compat
code uses.

> +	if (value) {
> +		ha = (struct hexbuf_array *) value;
> +	} else {
> +		ha = xmalloc(sizeof(*ha));
> +		if (pthread_key_setspecific(&hexbuf_array_key, (void *)ha))

Same thing here.

> +			die(_("failed to set thread buffer for hash to hex conversion"));
> +	}
> +#else
> +	ha = &default_hexbuf_array;
> +#endif
> +
> +	ha->idx = (ha->idx + 1) % ARRAY_SIZE(ha->bufs);
> +	return hash_to_hex_algop_r(ha->bufs[ha->idx], hash, algop);
>  }
>  
>  char *hash_to_hex(const unsigned char *hash)
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] hex: make hash_to_hex_algop() and friends thread-safe
  2020-06-25 23:07         ` brian m. carlson
@ 2020-06-25 23:54           ` Matheus Tavares Bernardino
  2020-06-26  0:00             ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 25+ messages in thread
From: Matheus Tavares Bernardino @ 2020-06-25 23:54 UTC (permalink / raw)
  To: brian m. carlson
  Cc: git, Johannes Sixt, Jonathan Tan, Jeff King, Johannes Schindelin,
	Fredrik Kuivinen

On Thu, Jun 25, 2020 at 8:08 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2020-06-25 at 20:32:57, Matheus Tavares wrote:
> > +#ifdef HAVE_THREADS
> > +     void *value;
> > +
> > +     if (pthread_once(&hexbuf_array_once, init_hexbuf_array_key))
> > +             die(_("failed to initialize threads' key for hash to hex conversion"));
> > +
> > +     value = pthread_key_getspecific(&hexbuf_array_key);
>
> I think the portable name for this is "pthread_getspecific".  That
> appears to be what POSIX specifies and what our Windows pthread compat
> code uses.

Right, thanks for noticing that!

(I wonder how the Windows build passed successfully in our CI [1]...
Shouldn't the compiler have failed due to a "undefined reference"
error?)

[1]: https://github.com/matheustavares/git/runs/808649609

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] hex: make hash_to_hex_algop() and friends thread-safe
  2020-06-25 23:54           ` Matheus Tavares Bernardino
@ 2020-06-26  0:00             ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 25+ messages in thread
From: Matheus Tavares Bernardino @ 2020-06-26  0:00 UTC (permalink / raw)
  To: brian m. carlson
  Cc: git, Johannes Sixt, Jonathan Tan, Jeff King, Johannes Schindelin,
	Fredrik Kuivinen

On Thu, Jun 25, 2020 at 8:54 PM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
>
> On Thu, Jun 25, 2020 at 8:08 PM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> >
> > On 2020-06-25 at 20:32:57, Matheus Tavares wrote:
> > > +#ifdef HAVE_THREADS
> > > +     void *value;
> > > +
> > > +     if (pthread_once(&hexbuf_array_once, init_hexbuf_array_key))
> > > +             die(_("failed to initialize threads' key for hash to hex conversion"));
> > > +
> > > +     value = pthread_key_getspecific(&hexbuf_array_key);
> >
> > I think the portable name for this is "pthread_getspecific".  That
> > appears to be what POSIX specifies and what our Windows pthread compat
> > code uses.
>
> Right, thanks for noticing that!
>
> (I wonder how the Windows build passed successfully in our CI [1]...
> Shouldn't the compiler have failed due to a "undefined reference"
> error?)

Oh, I know why... I forgot to include "thread-utils.h", so
HAVE_THREADS is never defined and this code is always ignored (on both
Windows and Linux).  That's embarrassing... I'll correct that for v2.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] compat/win32/pthread: add pthread_once()
  2020-06-25 20:32       ` [PATCH 1/2] compat/win32/pthread: add pthread_once() Matheus Tavares
@ 2020-06-26  5:45         ` Christian Couder
  2020-06-26 14:13           ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 25+ messages in thread
From: Christian Couder @ 2020-06-26  5:45 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: git, brian m. carlson, Johannes Sixt, Jonathan Tan, Jeff King,
	Johannes Schindelin

On Fri, Jun 26, 2020 at 3:00 AM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---

The commit message might want to explain a bit the purpose of adding
these features.

> Note: the pthread_once() function is adapted from:
> https://git.libav.org/?p=libav.git;a=commitdiff;h=b22693b06d1e5d73454a65c203b4d31c1ca5b69a
>
> Which is LGPLv2.1. Should I add any notice/acknowledgment somewhere,
> besides the comment I added above the function?

Yeah, I think you should also tell in the commit message where the
code comes from (along with the hash of the commit) and that libav is
LGPLv2.1 which is compatible with GPLv2 as explained in section 3 of
the LGPLv2.1.

>  compat/win32/pthread.c | 22 ++++++++++++++++++++++
>  compat/win32/pthread.h |  5 +++++
>  thread-utils.c         | 11 +++++++++++
>  thread-utils.h         |  6 ++++++
>  4 files changed, 44 insertions(+)
>
> diff --git a/compat/win32/pthread.c b/compat/win32/pthread.c
> index 2e7eead42c..5a7ecbd999 100644
> --- a/compat/win32/pthread.c
> +++ b/compat/win32/pthread.c
> @@ -56,3 +56,25 @@ pthread_t pthread_self(void)
>         t.tid = GetCurrentThreadId();
>         return t;
>  }
> +
> +/* Adapted from libav's compat/w32pthreads.h. */
> +int pthread_once(pthread_once_t *once_control, void (*init_routine)(void))
> +{
> +       BOOL pending = FALSE;
> +       int ret = 0;
> +
> +       if(!InitOnceBeginInitialize(once_control, 0, &pending, NULL)) {

We put a space between "if" and the following "(". It might also be
interesting to know perhaps in the commit message how much you adapted
the code.

For example perhaps a good strategy would be in the commit that
imports the code to do the minimal amount of change so that it builds
and passes the test, and then to have another commit that adapts the
style of the code.

> +               ret = err_win_to_posix(GetLastError());
> +               goto out;
> +       }
> +
> +       if (pending)
> +               init_routine();
> +
> +       if(!InitOnceComplete(once_control, 0, NULL))

Space missing between "if" and the following "(".

> +               ret = err_win_to_posix(GetLastError());
> +
> +out:
> +       /* POSIX doesn't allow pthread_once() to return EINTR */
> +       return ret == EINTR ? EIO : ret;
> +}

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] hex: make hash_to_hex_algop() and friends thread-safe
  2020-06-25 20:32       ` [PATCH 2/2] hex: make hash_to_hex_algop() and friends thread-safe Matheus Tavares
  2020-06-25 23:07         ` brian m. carlson
@ 2020-06-26  6:02         ` Christian Couder
  1 sibling, 0 replies; 25+ messages in thread
From: Christian Couder @ 2020-06-26  6:02 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: git, brian m. carlson, Johannes Sixt, Jonathan Tan, Jeff King,
	Johannes Schindelin, Fredrik Kuivinen

On Fri, Jun 26, 2020 at 1:35 AM Matheus Tavares
<matheus.bernardino@usp.br> wrote:

[...]

> Although these functions don't seem to be causing problems out there for
> now (at least not reported), making them thread-safe makes the codebase
> more robust against race conditions. We can easily do that replicating

Maybe s/that replicating/that by replicating/.

> the static buffer in each thread's local storage.
>
> Original-patch-by: Fredrik Kuivinen <frekui@gmail.com>

If Fredrik gave his "Signed-off-by:", maybe it's better to keep it too.

> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/2] Make oid_to_hex() thread-safe
  2020-06-25 20:32     ` [PATCH 0/2] Make oid_to_hex() thread-safe Matheus Tavares
  2020-06-25 20:32       ` [PATCH 1/2] compat/win32/pthread: add pthread_once() Matheus Tavares
  2020-06-25 20:32       ` [PATCH 2/2] hex: make hash_to_hex_algop() and friends thread-safe Matheus Tavares
@ 2020-06-26  8:22       ` Christian Couder
  2020-06-26 16:22         ` Matheus Tavares Bernardino
  2020-06-26 21:54       ` [PATCH v2 " Matheus Tavares
  3 siblings, 1 reply; 25+ messages in thread
From: Christian Couder @ 2020-06-26  8:22 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: git, brian m. carlson, Johannes Sixt, Jonathan Tan, Jeff King,
	Johannes Schindelin

On Fri, Jun 26, 2020 at 1:35 AM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> Some thread-unsafe functions of our codebase appear very down in the
> call stack, which can be hard to notice (or avoid). Thus they are
> sometimes used in threaded code unsafely. In this series we add
> pthread_once() to compat/win32/ and use it in conjunction with
> pthread_key to make a subset of the said functions thread-safe.

Great!

> As a next step, I would love to make [warning|error|die]_errno()
> thread-safe as well. strerror() is not safe on *nix, and there are some
> thread functions today that call these (although the actual risk of a
> race condition must be very small...)
>
> My idea was to implement a xstrerror() wrapper which calls the
> appropriate thread-safe function (dependant on the OS),

Yeah, that works if there are appropriate thread-safe functions for
all the OS we are interested in, or if we can fallback to strerror()
or calling it holding a lock.

> or even call
> strerror() itself but holding a lock to copy the result for a local
> buffer (which should be OK as we don't expect contention in strerror).

I agree that it should be ok.

> We could also set a thread local buffer array, as in the second patch of
> this series, to excuse callers from allocating/freeing memory.

I don't think caller allocating/freeing memory for error strings is a
performance or code simplification issue.

> One concern with this idea is the risk of an infinite recursion if
> xstrerror() or any of its childs call [warning|error|die]_errno().
> However, if we are only using strerror() and pthread_*() within the
> wrapper, there should be no problem, right?

Yeah, I agree.

> Has anyone thought of
> other problems with this approach?
>
> Finally, should such change also come with a coccinelle patch to replace
> usages of strerror() with xstrerror()? Or better not, as the change
> would be too big?

I would agree that it's better to avoid too big changes. I am not sure
how much we want to automate and check that though.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] compat/win32/pthread: add pthread_once()
  2020-06-26  5:45         ` Christian Couder
@ 2020-06-26 14:13           ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 25+ messages in thread
From: Matheus Tavares Bernardino @ 2020-06-26 14:13 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, brian m. carlson, Johannes Sixt, Jonathan Tan, Jeff King,
	Johannes Schindelin

On Fri, Jun 26, 2020 at 2:45 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Fri, Jun 26, 2020 at 3:00 AM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
> >
> > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> > ---
>
> The commit message might want to explain a bit the purpose of adding
> these features.

Will do!

> > Note: the pthread_once() function is adapted from:
> > https://git.libav.org/?p=libav.git;a=commitdiff;h=b22693b06d1e5d73454a65c203b4d31c1ca5b69a
> >
> > Which is LGPLv2.1. Should I add any notice/acknowledgment somewhere,
> > besides the comment I added above the function?
>
> Yeah, I think you should also tell in the commit message where the
> code comes from (along with the hash of the commit) and that libav is
> LGPLv2.1 which is compatible with GPLv2 as explained in section 3 of
> the LGPLv2.1.

Sounds good, I'll add that information.

> >  compat/win32/pthread.c | 22 ++++++++++++++++++++++
> >  compat/win32/pthread.h |  5 +++++
> >  thread-utils.c         | 11 +++++++++++
> >  thread-utils.h         |  6 ++++++
> >  4 files changed, 44 insertions(+)
> >
> > diff --git a/compat/win32/pthread.c b/compat/win32/pthread.c
> > index 2e7eead42c..5a7ecbd999 100644
> > --- a/compat/win32/pthread.c
> > +++ b/compat/win32/pthread.c
> > @@ -56,3 +56,25 @@ pthread_t pthread_self(void)
> >         t.tid = GetCurrentThreadId();
> >         return t;
> >  }
> > +
> > +/* Adapted from libav's compat/w32pthreads.h. */
> > +int pthread_once(pthread_once_t *once_control, void (*init_routine)(void))
> > +{
> > +       BOOL pending = FALSE;
> > +       int ret = 0;
> > +
> > +       if(!InitOnceBeginInitialize(once_control, 0, &pending, NULL)) {
>
> We put a space between "if" and the following "(". It might also be
> interesting to know perhaps in the commit message how much you adapted
> the code.
>
> For example perhaps a good strategy would be in the commit that
> imports the code to do the minimal amount of change so that it builds
> and passes the test, and then to have another commit that adapts the
> style of the code.

Makes sense. Alternatively, since the code ported from libav is quite
small, I think we might as well already adapt the style in this same
commit. I'll fix that for v2, thanks!

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/2] Make oid_to_hex() thread-safe
  2020-06-26  8:22       ` [PATCH 0/2] Make oid_to_hex() thread-safe Christian Couder
@ 2020-06-26 16:22         ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 25+ messages in thread
From: Matheus Tavares Bernardino @ 2020-06-26 16:22 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, brian m. carlson, Johannes Sixt, Jonathan Tan, Jeff King,
	Johannes Schindelin

Hi, Christian

Thanks for the feedback!

On Fri, Jun 26, 2020 at 5:22 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Fri, Jun 26, 2020 at 1:35 AM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
> >
> > My idea was to implement a xstrerror() wrapper which calls the
> > appropriate thread-safe function (dependant on the OS),
[...]
> > We could also set a thread local buffer array, as in the second patch of
> > this series, to excuse callers from allocating/freeing memory.
>
> I don't think caller allocating/freeing memory for error strings is a
> performance or code simplification issue.

Yeah, I agree that passing a buffer to xstrerror() should be fine. The
problem is that we already have many uses of strerror() (98, according
to git-grep), which expect a static buffer in return. So the change
would be too big :(

> > Finally, should such change also come with a coccinelle patch to replace
> > usages of strerror() with xstrerror()? Or better not, as the change
> > would be too big?
>
> I would agree that it's better to avoid too big changes. I am not sure
> how much we want to automate and check that though.

Another alternative would be to replace the definition of strerror() with a:

#define strerror(errnum) xstrerror(errnum)

This way, all the 98 current strerror() callers would start using the
thread-safe version without the need to change them.... However, I
don't think I've ever seen such a replacement in our codebase before
(besides the banned functions in banned.h). Also, people might expect
strerror() to refer to the system version, and perhaps make wrong
assumptions about it? I'm not sure which is the best alternative, it's
a tricky issue :(

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 0/2] Make oid_to_hex() thread-safe
  2020-06-25 20:32     ` [PATCH 0/2] Make oid_to_hex() thread-safe Matheus Tavares
                         ` (2 preceding siblings ...)
  2020-06-26  8:22       ` [PATCH 0/2] Make oid_to_hex() thread-safe Christian Couder
@ 2020-06-26 21:54       ` Matheus Tavares
  2020-06-26 21:54         ` [PATCH v2 1/2] compat/win32/pthread: add pthread_once() Matheus Tavares
  2020-06-26 21:54         ` [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe Matheus Tavares
  3 siblings, 2 replies; 25+ messages in thread
From: Matheus Tavares @ 2020-06-26 21:54 UTC (permalink / raw)
  To: git
  Cc: sandals, j6t, jonathantanmy, peff, Johannes.Schindelin, christian.couder

Some thread-unsafe functions of our codebase appear very down in the
call stack, which can be hard to notice (or avoid). Thus they are
sometimes used in threaded code unsafely. In this series we add
pthread_once() to compat/win32/ and use it in conjunction with
pthread_key to make a subset of the said functions thread-safe.

Changes since v1:

Patch 1:
- Fixed style problems.
- Wrote a more descriptive commit message, with motivation and info
  about where the code was ported from (including commit, license and
  modifications made).
- Corrected usage of InitOnceComplete() in pthread_once(). It was
  previously being called unconditionally, which lead to errors in
  subsequent calls to pthread_once(), after the initialization had
  already succeded.

Patch 2:
- Used the correct pthread_[get|set]specific() functions in hex.c, as
  Brian pointed out.
- Imported thread_utils.h to hex.c (providing the HAVE_THREADS macro)
- Added Fredrik's S-o-B
- Fixed spelling error
- Added the static identifier to init_hexbuf_array_key() in hex.c
- Provided an example in the commit message where oid_to_hex() was being
  used unsafelly.

Matheus Tavares (2):
  compat/win32/pthread: add pthread_once()
  hex: make hash_to_hex_algop() and friends thread-safe

 compat/win32/pthread.c | 18 +++++++++++++++++
 compat/win32/pthread.h |  5 +++++
 hex.c                  | 46 ++++++++++++++++++++++++++++++++++++++----
 thread-utils.c         | 11 ++++++++++
 thread-utils.h         |  7 +++++++
 5 files changed, 83 insertions(+), 4 deletions(-)

Range-diff against v1:
1:  e5a10d3f21 ! 1:  783fcddf8d compat/win32/pthread: add pthread_once()
    @@ Metadata
      ## Commit message ##
         compat/win32/pthread: add pthread_once()
     
    +    Add pthread_once() emulation for Windows. This function provides an easy
    +    way to do thread-safe one-time initializations in multithreaded code. It
    +    will be used in the next commit to make hash_to_hex_algop()
    +    thread-safe.
    +
    +    The pthread_once() implementation added comes from libav
    +    (https://git.libav.org/?p=libav.git), commit b22693b ("w32pthreads: Add
    +    pthread_once emulation", 2015-10-07). The code is licensed under
    +    LGPLv2.1 which is compatible with GPLv2. Only the section for support on
    +    Windows Vista+ has been ported, as that's the minimum version required
    +    to build Git for Windows.  Also, the code was modified to (1) check and
    +    return error codes; and (2) do not call InitOnceComplete() again after a
    +    successful initialization, as that results in an error.
    +
         Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
     
      ## compat/win32/pthread.c ##
    @@ compat/win32/pthread.c: pthread_t pthread_self(void)
     +	BOOL pending = FALSE;
     +	int ret = 0;
     +
    -+	if(!InitOnceBeginInitialize(once_control, 0, &pending, NULL)) {
    ++	if (!InitOnceBeginInitialize(once_control, 0, &pending, NULL)) {
     +		ret = err_win_to_posix(GetLastError());
    -+		goto out;
    -+	}
    -+
    -+	if (pending)
    ++	} else if (pending) {
     +		init_routine();
    ++		if (!InitOnceComplete(once_control, 0, NULL))
    ++			ret = err_win_to_posix(GetLastError());
    ++	}
     +
    -+	if(!InitOnceComplete(once_control, 0, NULL))
    -+		ret = err_win_to_posix(GetLastError());
    -+
    -+out:
     +	/* POSIX doesn't allow pthread_once() to return EINTR */
     +	return ret == EINTR ? EIO : ret;
     +}
    @@ thread-utils.h: int dummy_pthread_join(pthread_t pthread, void **retval);
      int dummy_pthread_init(void *);
      
     +#define PTHREAD_ONCE_INIT 0
    ++#define pthread_once(once, routine) nothreads_pthread_once(once, routine)
    ++
     +int nothreads_pthread_once(pthread_once_t *once_control,
     +			   void (*init_routine)(void));
    -+#define pthread_once(once, routine) nothreads_pthread_once(once, routine)
     +
      #endif
      
2:  0104cd9c76 ! 2:  b47445fa1c hex: make hash_to_hex_algop() and friends thread-safe
    @@ Commit message
         in threaded code, but they are sometimes too deep in the call stack to
         be noticed or even avoided.
     
    -    For example, we can take a look at the number of oid_to_hex() calls,
    -    which calls hash_to_hex_algop():
    +    grep.c:grep_source_load_oid(), for example, uses the thread-unsafe
    +    oid_to_hex() (on errors) despite being called in threaded code. And
    +    oid_to_hex() -- which calls hash_to_hex_algop() -- is used in many other
    +    places, as well:
     
         $ git grep 'oid_to_hex(' | wc -l
         818
     
    -    Although these functions don't seem to be causing problems out there for
    -    now (at least not reported), making them thread-safe makes the codebase
    -    more robust against race conditions. We can easily do that replicating
    -    the static buffer in each thread's local storage.
    +    Although hash_to_hex_algop() and its wrappers don't seem to be causing
    +    problems out there for now (at least not reported), making them
    +    thread-safe makes the codebase more robust against race conditions. We
    +    can easily do that by replicating the static buffer in each thread's
    +    local storage.
     
         Original-patch-by: Fredrik Kuivinen <frekui@gmail.com>
    +    Signed-off-by: Fredrik Kuivinen <frekui@gmail.com>
         Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
     
      ## hex.c ##
    +@@
    + #include "cache.h"
    ++#include "thread-utils.h"
    + 
    + const signed char hexval_table[256] = {
    + 	 -1, -1, -1, -1, -1, -1, -1, -1,		/* 00-07 */
     @@ hex.c: char *oid_to_hex_r(char *buffer, const struct object_id *oid)
      	return hash_to_hex_algop_r(buffer, oid->hash, the_hash_algo);
      }
    @@ hex.c: char *oid_to_hex_r(char *buffer, const struct object_id *oid)
     +static pthread_key_t hexbuf_array_key;
     +static pthread_once_t hexbuf_array_once = PTHREAD_ONCE_INIT;
     +
    -+void init_hexbuf_array_key(void)
    ++static void init_hexbuf_array_key(void)
     +{
     +	if (pthread_key_create(&hexbuf_array_key, free))
     +		die(_("failed to initialize threads' key for hash to hex conversion"));
    @@ hex.c: char *oid_to_hex_r(char *buffer, const struct object_id *oid)
     +	if (pthread_once(&hexbuf_array_once, init_hexbuf_array_key))
     +		die(_("failed to initialize threads' key for hash to hex conversion"));
     +
    -+	value = pthread_key_getspecific(&hexbuf_array_key);
    ++	value = pthread_getspecific(hexbuf_array_key);
     +	if (value) {
     +		ha = (struct hexbuf_array *) value;
     +	} else {
     +		ha = xmalloc(sizeof(*ha));
    -+		if (pthread_key_setspecific(&hexbuf_array_key, (void *)ha))
    ++		if (pthread_setspecific(hexbuf_array_key, (void *)ha))
     +			die(_("failed to set thread buffer for hash to hex conversion"));
     +	}
     +#else
-- 
2.26.2


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 1/2] compat/win32/pthread: add pthread_once()
  2020-06-26 21:54       ` [PATCH v2 " Matheus Tavares
@ 2020-06-26 21:54         ` Matheus Tavares
  2020-06-26 21:54         ` [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe Matheus Tavares
  1 sibling, 0 replies; 25+ messages in thread
From: Matheus Tavares @ 2020-06-26 21:54 UTC (permalink / raw)
  To: git
  Cc: sandals, j6t, jonathantanmy, peff, Johannes.Schindelin, christian.couder

Add pthread_once() emulation for Windows. This function provides an easy
way to do thread-safe one-time initializations in multithreaded code. It
will be used in the next commit to make hash_to_hex_algop()
thread-safe.

The pthread_once() implementation added comes from libav
(https://git.libav.org/?p=libav.git), commit b22693b ("w32pthreads: Add
pthread_once emulation", 2015-10-07). The code is licensed under
LGPLv2.1 which is compatible with GPLv2. Only the section for support on
Windows Vista+ has been ported, as that's the minimum version required
to build Git for Windows.  Also, the code was modified to (1) check and
return error codes; and (2) do not call InitOnceComplete() again after a
successful initialization, as that results in an error.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 compat/win32/pthread.c | 18 ++++++++++++++++++
 compat/win32/pthread.h |  5 +++++
 thread-utils.c         | 11 +++++++++++
 thread-utils.h         |  7 +++++++
 4 files changed, 41 insertions(+)

diff --git a/compat/win32/pthread.c b/compat/win32/pthread.c
index 2e7eead42c..cb32f8c504 100644
--- a/compat/win32/pthread.c
+++ b/compat/win32/pthread.c
@@ -56,3 +56,21 @@ pthread_t pthread_self(void)
 	t.tid = GetCurrentThreadId();
 	return t;
 }
+
+/* Adapted from libav's compat/w32pthreads.h. */
+int pthread_once(pthread_once_t *once_control, void (*init_routine)(void))
+{
+	BOOL pending = FALSE;
+	int ret = 0;
+
+	if (!InitOnceBeginInitialize(once_control, 0, &pending, NULL)) {
+		ret = err_win_to_posix(GetLastError());
+	} else if (pending) {
+		init_routine();
+		if (!InitOnceComplete(once_control, 0, NULL))
+			ret = err_win_to_posix(GetLastError());
+	}
+
+	/* POSIX doesn't allow pthread_once() to return EINTR */
+	return ret == EINTR ? EIO : ret;
+}
diff --git a/compat/win32/pthread.h b/compat/win32/pthread.h
index 737983d00b..c50f1e89c7 100644
--- a/compat/win32/pthread.h
+++ b/compat/win32/pthread.h
@@ -40,6 +40,11 @@ typedef int pthread_mutexattr_t;
 #define pthread_cond_signal WakeConditionVariable
 #define pthread_cond_broadcast WakeAllConditionVariable
 
+#define pthread_once_t INIT_ONCE
+
+#define PTHREAD_ONCE_INIT INIT_ONCE_STATIC_INIT
+int pthread_once(pthread_once_t *once_control, void (*init_routine)(void));
+
 /*
  * Simple thread creation implementation using pthread API
  */
diff --git a/thread-utils.c b/thread-utils.c
index 5329845691..937deb3f2e 100644
--- a/thread-utils.c
+++ b/thread-utils.c
@@ -122,4 +122,15 @@ int dummy_pthread_join(pthread_t pthread, void **retval)
 	return ENOSYS;
 }
 
+int nothreads_pthread_once(pthread_once_t *once_control,
+			   void (*init_routine)(void))
+{
+	if (*once_control == 1)
+		return 0;
+
+	init_routine();
+	*once_control = 1;
+	return 0;
+}
+
 #endif
diff --git a/thread-utils.h b/thread-utils.h
index 4961487ed9..8f9786217a 100644
--- a/thread-utils.h
+++ b/thread-utils.h
@@ -19,6 +19,7 @@
 #define pthread_mutex_t int
 #define pthread_cond_t int
 #define pthread_key_t int
+#define pthread_once_t int
 
 #define pthread_mutex_init(mutex, attr) dummy_pthread_init(mutex)
 #define pthread_mutex_lock(mutex)
@@ -48,6 +49,12 @@ int dummy_pthread_join(pthread_t pthread, void **retval);
 
 int dummy_pthread_init(void *);
 
+#define PTHREAD_ONCE_INIT 0
+#define pthread_once(once, routine) nothreads_pthread_once(once, routine)
+
+int nothreads_pthread_once(pthread_once_t *once_control,
+			   void (*init_routine)(void));
+
 #endif
 
 int online_cpus(void);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe
  2020-06-26 21:54       ` [PATCH v2 " Matheus Tavares
  2020-06-26 21:54         ` [PATCH v2 1/2] compat/win32/pthread: add pthread_once() Matheus Tavares
@ 2020-06-26 21:54         ` Matheus Tavares
  2020-06-29 15:11           ` Johannes Schindelin
  2020-07-16 11:29           ` Johannes Schindelin
  1 sibling, 2 replies; 25+ messages in thread
From: Matheus Tavares @ 2020-06-26 21:54 UTC (permalink / raw)
  To: git
  Cc: sandals, j6t, jonathantanmy, peff, Johannes.Schindelin,
	christian.couder, Fredrik Kuivinen

hash_to_hex_algop() returns a static buffer, relieving callers from the
responsibility of freeing memory after use. But the current
implementation uses the same static data for all threads and, thus, is
not thread-safe. We could avoid using this function and its wrappers
in threaded code, but they are sometimes too deep in the call stack to
be noticed or even avoided.

grep.c:grep_source_load_oid(), for example, uses the thread-unsafe
oid_to_hex() (on errors) despite being called in threaded code. And
oid_to_hex() -- which calls hash_to_hex_algop() -- is used in many other
places, as well:

$ git grep 'oid_to_hex(' | wc -l
818

Although hash_to_hex_algop() and its wrappers don't seem to be causing
problems out there for now (at least not reported), making them
thread-safe makes the codebase more robust against race conditions. We
can easily do that by replicating the static buffer in each thread's
local storage.

Original-patch-by: Fredrik Kuivinen <frekui@gmail.com>
Signed-off-by: Fredrik Kuivinen <frekui@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 hex.c | 46 ++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 42 insertions(+), 4 deletions(-)

diff --git a/hex.c b/hex.c
index da51e64929..4f2f163d5e 100644
--- a/hex.c
+++ b/hex.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "thread-utils.h"
 
 const signed char hexval_table[256] = {
 	 -1, -1, -1, -1, -1, -1, -1, -1,		/* 00-07 */
@@ -136,12 +137,49 @@ char *oid_to_hex_r(char *buffer, const struct object_id *oid)
 	return hash_to_hex_algop_r(buffer, oid->hash, the_hash_algo);
 }
 
+struct hexbuf_array {
+	int idx;
+	char bufs[4][GIT_MAX_HEXSZ + 1];
+};
+
+#ifdef HAVE_THREADS
+static pthread_key_t hexbuf_array_key;
+static pthread_once_t hexbuf_array_once = PTHREAD_ONCE_INIT;
+
+static void init_hexbuf_array_key(void)
+{
+	if (pthread_key_create(&hexbuf_array_key, free))
+		die(_("failed to initialize threads' key for hash to hex conversion"));
+}
+
+#else
+static struct hexbuf_array default_hexbuf_array;
+#endif
+
 char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *algop)
 {
-	static int bufno;
-	static char hexbuffer[4][GIT_MAX_HEXSZ + 1];
-	bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer);
-	return hash_to_hex_algop_r(hexbuffer[bufno], hash, algop);
+	struct hexbuf_array *ha;
+
+#ifdef HAVE_THREADS
+	void *value;
+
+	if (pthread_once(&hexbuf_array_once, init_hexbuf_array_key))
+		die(_("failed to initialize threads' key for hash to hex conversion"));
+
+	value = pthread_getspecific(hexbuf_array_key);
+	if (value) {
+		ha = (struct hexbuf_array *) value;
+	} else {
+		ha = xmalloc(sizeof(*ha));
+		if (pthread_setspecific(hexbuf_array_key, (void *)ha))
+			die(_("failed to set thread buffer for hash to hex conversion"));
+	}
+#else
+	ha = &default_hexbuf_array;
+#endif
+
+	ha->idx = (ha->idx + 1) % ARRAY_SIZE(ha->bufs);
+	return hash_to_hex_algop_r(ha->bufs[ha->idx], hash, algop);
 }
 
 char *hash_to_hex(const unsigned char *hash)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe
  2020-06-26 21:54         ` [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe Matheus Tavares
@ 2020-06-29 15:11           ` Johannes Schindelin
  2020-06-30 20:37             ` Matheus Tavares Bernardino
  2020-07-16 11:29           ` Johannes Schindelin
  1 sibling, 1 reply; 25+ messages in thread
From: Johannes Schindelin @ 2020-06-29 15:11 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: git, sandals, j6t, jonathantanmy, peff, christian.couder,
	Fredrik Kuivinen

Hi Matheus,

I am fine with the Windows changes (although I have to admit that I did
not find time to test things yet).

There is one problem in that I do not necessarily know that the memory is
released correctly when threads end; You will notice that the
`pthread_key_create()` shim in `compat/win32/pthread.h` does not use the
`destructor` parameter at all. The documentation at

	https://docs.microsoft.com/en-us/windows/win32/procthread/using-thread-local-storage

is also not terribly clear _how_ the memory is released that was assigned
via `TlsSetValue()`. I notice that the example uses `LocalAlloc()`, but we
override `malloc()` via the `compat/nedmalloc/` functions, so that should
cause problems unless I am wrong.

Maybe there is an expert reading this who could jump in and help
understand the ramifications?

A couple more things:

On Fri, 26 Jun 2020, Matheus Tavares wrote:

> hash_to_hex_algop() returns a static buffer, relieving callers from the
> responsibility of freeing memory after use. But the current
> implementation uses the same static data for all threads and, thus, is
> not thread-safe. We could avoid using this function and its wrappers
> in threaded code, but they are sometimes too deep in the call stack to
> be noticed or even avoided.
>
> grep.c:grep_source_load_oid(), for example, uses the thread-unsafe
> oid_to_hex() (on errors) despite being called in threaded code. And
> oid_to_hex() -- which calls hash_to_hex_algop() -- is used in many other
> places, as well:
>
> $ git grep 'oid_to_hex(' | wc -l
> 818
>
> Although hash_to_hex_algop() and its wrappers don't seem to be causing
> problems out there for now (at least not reported), making them
> thread-safe makes the codebase more robust against race conditions. We
> can easily do that by replicating the static buffer in each thread's
> local storage.
>
> Original-patch-by: Fredrik Kuivinen <frekui@gmail.com>
> Signed-off-by: Fredrik Kuivinen <frekui@gmail.com>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
>  hex.c | 46 ++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 42 insertions(+), 4 deletions(-)
>
> diff --git a/hex.c b/hex.c
> index da51e64929..4f2f163d5e 100644
> --- a/hex.c
> +++ b/hex.c
> @@ -1,4 +1,5 @@
>  #include "cache.h"
> +#include "thread-utils.h"
>
>  const signed char hexval_table[256] = {
>  	 -1, -1, -1, -1, -1, -1, -1, -1,		/* 00-07 */
> @@ -136,12 +137,49 @@ char *oid_to_hex_r(char *buffer, const struct object_id *oid)
>  	return hash_to_hex_algop_r(buffer, oid->hash, the_hash_algo);
>  }
>
> +struct hexbuf_array {
> +	int idx;

Is there a specific reason why you renamed `bufno` to `idx`? If not, I'd
rather keep the old name.

> +	char bufs[4][GIT_MAX_HEXSZ + 1];
> +};
> +
> +#ifdef HAVE_THREADS
> +static pthread_key_t hexbuf_array_key;
> +static pthread_once_t hexbuf_array_once = PTHREAD_ONCE_INIT;
> +
> +static void init_hexbuf_array_key(void)
> +{
> +	if (pthread_key_create(&hexbuf_array_key, free))
> +		die(_("failed to initialize threads' key for hash to hex conversion"));
> +}
> +
> +#else
> +static struct hexbuf_array default_hexbuf_array;
> +#endif
> +
>  char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *algop)
>  {
> -	static int bufno;
> -	static char hexbuffer[4][GIT_MAX_HEXSZ + 1];
> -	bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer);
> -	return hash_to_hex_algop_r(hexbuffer[bufno], hash, algop);
> +	struct hexbuf_array *ha;
> +
> +#ifdef HAVE_THREADS
> +	void *value;
> +
> +	if (pthread_once(&hexbuf_array_once, init_hexbuf_array_key))
> +		die(_("failed to initialize threads' key for hash to hex conversion"));
> +
> +	value = pthread_getspecific(hexbuf_array_key);
> +	if (value) {
> +		ha = (struct hexbuf_array *) value;
> +	} else {
> +		ha = xmalloc(sizeof(*ha));
> +		if (pthread_setspecific(hexbuf_array_key, (void *)ha))
> +			die(_("failed to set thread buffer for hash to hex conversion"));
> +	}
> +#else
> +	ha = &default_hexbuf_array;
> +#endif

This introduces two ugly `#ifdef HAVE_THREADS` constructs which are
problematic because they are the most likely places to introduce compile
errors.

I wonder whether you considered introducing a function (and probably a
macro) that transparently gives you a thread-specific instance of a given
data type? The caller would look something like

	struct hexbuf_array *hex_array;

	GET_THREADSPECIFIC(hex_array);

where the macro would look somewhat like this:

	#define GET_THREADSPECIFIC(var) \
		if (get_thread_specific(&var, sizeof(var)) < 0)
			die(_("Failed to get thread-specific %s"), #var);

and the function would allocate and assign the variable. I guess this
scheme won't work, though, as `pthread_once()` does not take a callback
parameter, right? And we would need that parameter to be able to create
the `pthread_key_t`. Hmm.

Ciao,
Dscho

> +
> +	ha->idx = (ha->idx + 1) % ARRAY_SIZE(ha->bufs);
> +	return hash_to_hex_algop_r(ha->bufs[ha->idx], hash, algop);
>  }
>
>  char *hash_to_hex(const unsigned char *hash)
> --
> 2.26.2
>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe
  2020-06-29 15:11           ` Johannes Schindelin
@ 2020-06-30 20:37             ` Matheus Tavares Bernardino
  2020-07-01 11:32               ` Johannes Schindelin
  0 siblings, 1 reply; 25+ messages in thread
From: Matheus Tavares Bernardino @ 2020-06-30 20:37 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, brian m . carlson, Johannes Sixt, Jonathan Tan, Jeff King,
	Christian Couder, Fredrik Kuivinen

On Tue, Jun 30, 2020 at 12:01 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Matheus,
>
> I am fine with the Windows changes (although I have to admit that I did
> not find time to test things yet).
>
> There is one problem in that I do not necessarily know that the memory is
> released correctly when threads end; You will notice that the
> `pthread_key_create()` shim in `compat/win32/pthread.h` does not use the
> `destructor` parameter at all. The documentation at
>
>       https://docs.microsoft.com/en-us/windows/win32/procthread/using-thread-local-storage
>
> is also not terribly clear _how_ the memory is released that was assigned
> via `TlsSetValue()`.

Yes, I wasn't sure about that either... It would be great if someone
familiar with TLS memory on Windows could help us with this.

> A couple more things:
>
> On Fri, 26 Jun 2020, Matheus Tavares wrote:
> >
[...]
> > +struct hexbuf_array {
> > +     int idx;
>
> Is there a specific reason why you renamed `bufno` to `idx`? If not, I'd
> rather keep the old name.

OK, I'll change it back in v3.

> > +     char bufs[4][GIT_MAX_HEXSZ + 1];
> > +};
> > +
> > +#ifdef HAVE_THREADS
> > +static pthread_key_t hexbuf_array_key;
> > +static pthread_once_t hexbuf_array_once = PTHREAD_ONCE_INIT;
> > +
> > +static void init_hexbuf_array_key(void)
> > +{
> > +     if (pthread_key_create(&hexbuf_array_key, free))
> > +             die(_("failed to initialize threads' key for hash to hex conversion"));
> > +}
> > +
> > +#else
> > +static struct hexbuf_array default_hexbuf_array;
> > +#endif
> > +
> >  char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *algop)
> >  {
> > -     static int bufno;
> > -     static char hexbuffer[4][GIT_MAX_HEXSZ + 1];
> > -     bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer);
> > -     return hash_to_hex_algop_r(hexbuffer[bufno], hash, algop);
> > +     struct hexbuf_array *ha;
> > +
> > +#ifdef HAVE_THREADS
> > +     void *value;
> > +
> > +     if (pthread_once(&hexbuf_array_once, init_hexbuf_array_key))
> > +             die(_("failed to initialize threads' key for hash to hex conversion"));
> > +
> > +     value = pthread_getspecific(hexbuf_array_key);
> > +     if (value) {
> > +             ha = (struct hexbuf_array *) value;
> > +     } else {
> > +             ha = xmalloc(sizeof(*ha));
> > +             if (pthread_setspecific(hexbuf_array_key, (void *)ha))
> > +                     die(_("failed to set thread buffer for hash to hex conversion"));
> > +     }
> > +#else
> > +     ha = &default_hexbuf_array;
> > +#endif
>
> This introduces two ugly `#ifdef HAVE_THREADS` constructs which are
> problematic because they are the most likely places to introduce compile
> errors.
>
> I wonder whether you considered introducing a function (and probably a
> macro) that transparently gives you a thread-specific instance of a given
> data type? The caller would look something like
>
>         struct hexbuf_array *hex_array;
>
>         GET_THREADSPECIFIC(hex_array);
>
> where the macro would look somewhat like this:
>
>         #define GET_THREADSPECIFIC(var) \
>                 if (get_thread_specific(&var, sizeof(var)) < 0)
>                         die(_("Failed to get thread-specific %s"), #var);
>
> and the function would allocate and assign the variable.

Hmm, we would need two macros, wouldn't we? GET_THREADSPECIFIC(var)
and a DECLARE_THREADSPECIFIC(var, destructor), to declare the
pthread_once_t and pthread_key_t variables, as well as define a
initialization function for the key (i.e. the callback to
pthread_once()). Or we could provide these declarations ourselves, but
then we would need the "ifdef HAVE_THREADS" guard to avoid calling
pthread_key_create() when there is no multithreading.

I think that would work, yes. Alternatively, I think we could adjust
the dummy pthread_key_* functions in thread-utils.h to emulate the
real ones when HAVE_THREADS == false. Then we could eliminate the
`#ifdef HAVE_THREADS` guards and use the same code for both cases here
(and everywhere else pthread_key is used). I haven't thought about it
carefully enough yet, but I think this *might* be as simple as adding
the following defines in the "#ifdef NO_PTHREADS" section of
thread-utils.h:

#define pthread_key_t void *
/*
 * The destructor is not used in this case as the main thread will only
 * exit when the program terminates.
 */
#define pthread_key_create(key_ptr, unused) return_0((*key_ptr) = NULL)
#define pthread_setspecific(key, value) return_0((key) = (value))
#define pthread_getspecific(key) (key)
#define pthread_key_delete(key) return_0(NULL)

static inline int return_0(void *unused)
{
        return 0;
}

That's the general idea, but we might as well define a `struct
dummy_pthread_key_t` instead of defining the key directly as `void *`
(and have functions instead of macros). This way we could store, e.g.,
an "initialized" flag that could be used to return an error code on
double-initializations.

What do you think?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe
  2020-06-30 20:37             ` Matheus Tavares Bernardino
@ 2020-07-01 11:32               ` Johannes Schindelin
  0 siblings, 0 replies; 25+ messages in thread
From: Johannes Schindelin @ 2020-07-01 11:32 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: git, brian m . carlson, Johannes Sixt, Jonathan Tan, Jeff King,
	Christian Couder, Fredrik Kuivinen

Hi Matheus,

On Tue, 30 Jun 2020, Matheus Tavares Bernardino wrote:

> On Tue, Jun 30, 2020 at 12:01 PM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > Hi Matheus,
> >
> > I am fine with the Windows changes (although I have to admit that I did
> > not find time to test things yet).
> >
> > There is one problem in that I do not necessarily know that the memory is
> > released correctly when threads end; You will notice that the
> > `pthread_key_create()` shim in `compat/win32/pthread.h` does not use the
> > `destructor` parameter at all. The documentation at
> >
> >       https://docs.microsoft.com/en-us/windows/win32/procthread/using-thread-local-storage
> >
> > is also not terribly clear _how_ the memory is released that was assigned
> > via `TlsSetValue()`.
>
> Yes, I wasn't sure about that either... It would be great if someone
> familiar with TLS memory on Windows could help us with this.

From reading
https://docs.microsoft.com/en-us/windows/win32/procthread/using-thread-local-storage,
I get the impression that it is the duty of the thread function to take
care of releasing the memory.

Looking at our code base, I do not see any obvious single point where we
could take care of that.

And it does not look like there is a function you can register to be
called just before threads shut down.

Hmm.

A very involved solution would be to override our `pthread_create()`
emulation to override the function being called, passing as function
parameter a pointer to a struct containing the originally-specified thread
function and parameter, and the overriding function would then call that
function, save its return value, take care of releasing the memory, and
return the saved return value.

Likewise, our `pthread_setspecific()` shim in `compat/win32/pthread.h`
would need to learn to append the passed pointer to a list, and we should
probably also edit the `pthread_key_create()` shim to record the function
to call when releasing the memory at the end of a thread.

This strikes me as pretty ugly and complex.

Yet when I read the LGPLv2'ed code of
https://sourceware.org/pthreads-win32/ that is exactly what they use.
Since their source code is still in CVS, and their anonymous CVS viewer no
longer works, I had to find a copy to point to:
https://github.com/jingyu/pthreads-w32/blob/master/pthread_key_create.c

So we could now start to require a non-minimal pthreads emulation on
Windows. I am somewhat opposed to that idea.

Now I wonder just how involved it _really_ would be to convert all of
those `oid_to_hex()` calls into what is effectively an `_r()` variant,
where you _have_ to pass a buffer to store the result. That would make it
a ton easier to keep Git portable.

I do understand that 800+ hits for `oid_to_hex()` make this somewhat of a
pain to implement...

Maybe there is a way to convince Coccinelle to do all that work for us?

> > A couple more things:
> >
> > On Fri, 26 Jun 2020, Matheus Tavares wrote:
> > >
> [...]
> > > +struct hexbuf_array {
> > > +     int idx;
> >
> > Is there a specific reason why you renamed `bufno` to `idx`? If not, I'd
> > rather keep the old name.
>
> OK, I'll change it back in v3.
>
> > > +     char bufs[4][GIT_MAX_HEXSZ + 1];
> > > +};
> > > +
> > > +#ifdef HAVE_THREADS
> > > +static pthread_key_t hexbuf_array_key;
> > > +static pthread_once_t hexbuf_array_once = PTHREAD_ONCE_INIT;
> > > +
> > > +static void init_hexbuf_array_key(void)
> > > +{
> > > +     if (pthread_key_create(&hexbuf_array_key, free))
> > > +             die(_("failed to initialize threads' key for hash to hex conversion"));
> > > +}
> > > +
> > > +#else
> > > +static struct hexbuf_array default_hexbuf_array;
> > > +#endif
> > > +
> > >  char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *algop)
> > >  {
> > > -     static int bufno;
> > > -     static char hexbuffer[4][GIT_MAX_HEXSZ + 1];
> > > -     bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer);
> > > -     return hash_to_hex_algop_r(hexbuffer[bufno], hash, algop);
> > > +     struct hexbuf_array *ha;
> > > +
> > > +#ifdef HAVE_THREADS
> > > +     void *value;
> > > +
> > > +     if (pthread_once(&hexbuf_array_once, init_hexbuf_array_key))
> > > +             die(_("failed to initialize threads' key for hash to hex conversion"));
> > > +
> > > +     value = pthread_getspecific(hexbuf_array_key);
> > > +     if (value) {
> > > +             ha = (struct hexbuf_array *) value;
> > > +     } else {
> > > +             ha = xmalloc(sizeof(*ha));
> > > +             if (pthread_setspecific(hexbuf_array_key, (void *)ha))
> > > +                     die(_("failed to set thread buffer for hash to hex conversion"));
> > > +     }
> > > +#else
> > > +     ha = &default_hexbuf_array;
> > > +#endif
> >
> > This introduces two ugly `#ifdef HAVE_THREADS` constructs which are
> > problematic because they are the most likely places to introduce compile
> > errors.
> >
> > I wonder whether you considered introducing a function (and probably a
> > macro) that transparently gives you a thread-specific instance of a given
> > data type? The caller would look something like
> >
> >         struct hexbuf_array *hex_array;
> >
> >         GET_THREADSPECIFIC(hex_array);
> >
> > where the macro would look somewhat like this:
> >
> >         #define GET_THREADSPECIFIC(var) \
> >                 if (get_thread_specific(&var, sizeof(var)) < 0)
> >                         die(_("Failed to get thread-specific %s"), #var);
> >
> > and the function would allocate and assign the variable.
>
> Hmm, we would need two macros, wouldn't we? GET_THREADSPECIFIC(var)
> and a DECLARE_THREADSPECIFIC(var, destructor), to declare the
> pthread_once_t and pthread_key_t variables, as well as define a
> initialization function for the key (i.e. the callback to
> pthread_once()). Or we could provide these declarations ourselves, but
> then we would need the "ifdef HAVE_THREADS" guard to avoid calling
> pthread_key_create() when there is no multithreading.

Right. I wanted to avoid the need to call `DECLARE_THREADSPECIFIC()`, in
particular because that would have to be _outside_ of any function
(because it has to define a function, and nested functions are not really
supported in C, at least not widely).

Ciao,
Dscho

>
> I think that would work, yes. Alternatively, I think we could adjust
> the dummy pthread_key_* functions in thread-utils.h to emulate the
> real ones when HAVE_THREADS == false. Then we could eliminate the
> `#ifdef HAVE_THREADS` guards and use the same code for both cases here
> (and everywhere else pthread_key is used). I haven't thought about it
> carefully enough yet, but I think this *might* be as simple as adding
> the following defines in the "#ifdef NO_PTHREADS" section of
> thread-utils.h:
>
> #define pthread_key_t void *
> /*
>  * The destructor is not used in this case as the main thread will only
>  * exit when the program terminates.
>  */
> #define pthread_key_create(key_ptr, unused) return_0((*key_ptr) = NULL)
> #define pthread_setspecific(key, value) return_0((key) = (value))
> #define pthread_getspecific(key) (key)
> #define pthread_key_delete(key) return_0(NULL)
>
> static inline int return_0(void *unused)
> {
>         return 0;
> }
>
> That's the general idea, but we might as well define a `struct
> dummy_pthread_key_t` instead of defining the key directly as `void *`
> (and have functions instead of macros). This way we could store, e.g.,
> an "initialized" flag that could be used to return an error code on
> double-initializations.
>
> What do you think?
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe
  2020-06-26 21:54         ` [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe Matheus Tavares
  2020-06-29 15:11           ` Johannes Schindelin
@ 2020-07-16 11:29           ` Johannes Schindelin
  2020-07-18  3:09             ` Matheus Tavares Bernardino
  2020-07-18  3:52             ` Matheus Tavares
  1 sibling, 2 replies; 25+ messages in thread
From: Johannes Schindelin @ 2020-07-16 11:29 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: git, sandals, j6t, jonathantanmy, peff, christian.couder,
	Fredrik Kuivinen

Hi Matheus,

On Fri, 26 Jun 2020, Matheus Tavares wrote:

> @@ -136,12 +137,49 @@ char *oid_to_hex_r(char *buffer, const struct object_id *oid)
>  	return hash_to_hex_algop_r(buffer, oid->hash, the_hash_algo);
>  }
>
> +struct hexbuf_array {
> +	int idx;
> +	char bufs[4][GIT_MAX_HEXSZ + 1];
> +};
> +
> +#ifdef HAVE_THREADS
> +static pthread_key_t hexbuf_array_key;
> +static pthread_once_t hexbuf_array_once = PTHREAD_ONCE_INIT;
> +
> +static void init_hexbuf_array_key(void)
> +{
> +	if (pthread_key_create(&hexbuf_array_key, free))
> +		die(_("failed to initialize threads' key for hash to hex conversion"));
> +}
> +
> +#else
> +static struct hexbuf_array default_hexbuf_array;
> +#endif
> +
>  char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *algop)
>  {
> -	static int bufno;
> -	static char hexbuffer[4][GIT_MAX_HEXSZ + 1];
> -	bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer);
> -	return hash_to_hex_algop_r(hexbuffer[bufno], hash, algop);
> +	struct hexbuf_array *ha;
> +
> +#ifdef HAVE_THREADS
> +	void *value;
> +
> +	if (pthread_once(&hexbuf_array_once, init_hexbuf_array_key))
> +		die(_("failed to initialize threads' key for hash to hex conversion"));
> +
> +	value = pthread_getspecific(hexbuf_array_key);
> +	if (value) {
> +		ha = (struct hexbuf_array *) value;
> +	} else {
> +		ha = xmalloc(sizeof(*ha));

I just realized (while trying to debug something independent) that this
leaves `ha->idx` uninitialized. So you will need at least this patch to
fix a bug that currently haunts `seen`'s CI builds (you can use
`--valgrind`, like I did, to identify the problem):

-- snip --
diff --git a/hex.c b/hex.c
index 4f2f163d5e7..365ba94ab11 100644
--- a/hex.c
+++ b/hex.c
@@ -171,6 +171,7 @@ char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *a
 		ha = (struct hexbuf_array *) value;
 	} else {
 		ha = xmalloc(sizeof(*ha));
+		ha->idx = 0;
 		if (pthread_setspecific(hexbuf_array_key, (void *)ha))
 			die(_("failed to set thread buffer for hash to hex conversion"));
 	}
-- snap --

But as I mentioned before, I would be much more in favor of abandoning
this thread-local idea (because it is _still_ fragile, as the same thread
could try to make use of more than four hex values in the same `printf()`,
for example) and instead using Coccinelle to convert all those
`oid_to_hex()` calls to `oid_to_hex_r()` calls.

Now, I am _far_ from knowing what I'm doing with Coccinelle, but I think
this here semantic patch should get you going:

-- snipsnap --
@@
expression E;
@@
  {
++   char hex[GIT_MAX_HEXSZ + 1];
     ...
-    oid_to_hex(E)
+    oid_to_hex_r(hex, E)
     ...
  }

@@
expression E1, E2;
@@
  {
++   char hex1[GIT_MAX_HEXSZ + 1], hex2[GIT_MAX_HEXSZ + 1];
     ...
-    oid_to_hex(E1)
+    oid_to_hex_r(hex1, E1)
     ...
-    oid_to_hex(E2)
+    oid_to_hex_r(hex2, E2)
     ...
  }

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe
  2020-07-16 11:29           ` Johannes Schindelin
@ 2020-07-18  3:09             ` Matheus Tavares Bernardino
  2020-08-10 14:15               ` Johannes Schindelin
  2020-07-18  3:52             ` Matheus Tavares
  1 sibling, 1 reply; 25+ messages in thread
From: Matheus Tavares Bernardino @ 2020-07-18  3:09 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, brian m . carlson, Johannes Sixt, Jonathan Tan, Jeff King,
	Christian Couder, Fredrik Kuivinen

Hi, Dscho

On Thu, Jul 16, 2020 at 9:56 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> On Fri, 26 Jun 2020, Matheus Tavares wrote:
>
> > +     value = pthread_getspecific(hexbuf_array_key);
> > +     if (value) {
> > +             ha = (struct hexbuf_array *) value;
> > +     } else {
> > +             ha = xmalloc(sizeof(*ha));
>
> I just realized (while trying to debug something independent) that this
> leaves `ha->idx` uninitialized.

Thanks for catching that! I fixed it in my local branch.

> But as I mentioned before, I would be much more in favor of abandoning
> this thread-local idea (because it is _still_ fragile, as the same thread
> could try to make use of more than four hex values in the same `printf()`,
> for example) and instead using Coccinelle to convert all those
> `oid_to_hex()` calls to `oid_to_hex_r()` calls.

Yeah, I agree that removing oid_to_hex() in favor of oid_to_hex_r()
would be great. Unfortunately, I only used Coccinelle for basic
things, such as function renaming. And I won't have the time to study
it further at the moment :( Therefore, I think I'll ask Junio to drop
this series for now, until I or someone else finds some time to work
on the semantic patch.

Alternatively, if using thread-local storage is still an option, I
think I might have solved the problems we had in the previous
iteration with memory leaks on Windows. I changed our
pthread_key_create() emulation to start using the destructor callback
on Windows, through the Fiber Local Storage (FLS) API. As the
documentation says [1] "If no fiber switching occurs, FLS acts exactly
the same as thread local storage". The advantage over TLS is that
FLSAlloc() does take a callback parameter.

I also removed the ugly `#ifdef HAVE_THREADS` guards on the last
patch, as you suggested, and added some tests for our pthread_key
emulation. In case you want to take a look to see if it might be worth
pursuing this route, the patches are here:
https://github.com/matheustavares/git/commits/safe_oid_to_hex_v3

[1]: https://docs.microsoft.com/en-us/windows/win32/procthread/fibers#fiber-local-storage

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe
  2020-07-16 11:29           ` Johannes Schindelin
  2020-07-18  3:09             ` Matheus Tavares Bernardino
@ 2020-07-18  3:52             ` Matheus Tavares
  2020-07-26 17:41               ` René Scharfe
  1 sibling, 1 reply; 25+ messages in thread
From: Matheus Tavares @ 2020-07-18  3:52 UTC (permalink / raw)
  To: johannes.schindelin
  Cc: christian.couder, frekui, git, j6t, jonathantanmy,
	matheus.bernardino, peff, sandals

On Thu, Jul 16, 2020 at 9:56 AM Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>
> Now, I am _far_ from knowing what I'm doing with Coccinelle, but I think
> this here semantic patch should get you going:
>
> -- snipsnap --
> @@
> expression E;
> @@
>   {
> ++   char hex[GIT_MAX_HEXSZ + 1];
>      ...
> -    oid_to_hex(E)
> +    oid_to_hex_r(hex, E)
>      ...
>   }
>
> @@
> expression E1, E2;
> @@
>   {
> ++   char hex1[GIT_MAX_HEXSZ + 1], hex2[GIT_MAX_HEXSZ + 1];
>      ...
> -    oid_to_hex(E1)
> +    oid_to_hex_r(hex1, E1)
>      ...
> -    oid_to_hex(E2)
> +    oid_to_hex_r(hex2, E2)
>      ...
>   }

Thanks for this nice example! This already worked very well in some of
my tests :)

However, with my _very_ limited notion of Coccinelle, I didn't
understand why some code snippets didn't match the above rules. For
example, the structure below:

func(...)
{
	if (cond)
		func2("%s", oid_to_hex(a));
}

I thought it could be because the `if` statement is missing the curly
brackets (and it does work if I add the brackets), but to my surprise,
adding another oid_to_hex() call in an `else` case also made the code
match the rule:

func(...)
{
	if (cond)
		func2("%s", oid_to_hex(a));
	else
		func2("%s", oid_to_hex(a));
}

The following snippet also correctly matches, but spatch introduces only
one `hex` variable:

	if (cond)
		func2("%s, %s", oid_to_hex(a), oid_to_hex(b));
	else
		func2("%s", oid_to_hex(a));

We will probably want our semantic rules to handle an arbitrary number
of `oid_to_hex()` calls in each function, but in scenarios like the
above one, we only really need 2 hex buffers despite having 3 calls...
That might be a little tricky, I guess.

Another thing that might be tricky in this conversion is checking for
name conflicts with the added `hex` variable (but maybe Coccinelle
already has a facilitator mechanism for such cases? IDK).

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe
  2020-07-18  3:52             ` Matheus Tavares
@ 2020-07-26 17:41               ` René Scharfe
  0 siblings, 0 replies; 25+ messages in thread
From: René Scharfe @ 2020-07-26 17:41 UTC (permalink / raw)
  To: Matheus Tavares, johannes.schindelin
  Cc: christian.couder, frekui, git, j6t, jonathantanmy, peff, sandals

Am 18.07.20 um 05:52 schrieb Matheus Tavares:
> On Thu, Jul 16, 2020 at 9:56 AM Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>>
>> Now, I am _far_ from knowing what I'm doing with Coccinelle, but I think
>> this here semantic patch should get you going:
>>
>> -- snipsnap --
>> @@
>> expression E;
>> @@
>>   {
>> ++   char hex[GIT_MAX_HEXSZ + 1];
>>      ...
>> -    oid_to_hex(E)
>> +    oid_to_hex_r(hex, E)
>>      ...
>>   }
>>
>> @@
>> expression E1, E2;
>> @@
>>   {
>> ++   char hex1[GIT_MAX_HEXSZ + 1], hex2[GIT_MAX_HEXSZ + 1];
>>      ...
>> -    oid_to_hex(E1)
>> +    oid_to_hex_r(hex1, E1)
>>      ...
>> -    oid_to_hex(E2)
>> +    oid_to_hex_r(hex2, E2)
>>      ...
>>   }
>
> Thanks for this nice example! This already worked very well in some of
> my tests :)
>
> However, with my _very_ limited notion of Coccinelle, I didn't
> understand why some code snippets didn't match the above rules. For
> example, the structure below:
>
> func(...)
> {
> 	if (cond)
> 		func2("%s", oid_to_hex(a));
> }
>
> I thought it could be because the `if` statement is missing the curly
> brackets (and it does work if I add the brackets), but to my surprise,
> adding another oid_to_hex() call in an `else` case also made the code
> match the rule:
>
> func(...)
> {
> 	if (cond)
> 		func2("%s", oid_to_hex(a));
> 	else
> 		func2("%s", oid_to_hex(a));
> }
>
> The following snippet also correctly matches, but spatch introduces only
> one `hex` variable:
>
> 	if (cond)
> 		func2("%s, %s", oid_to_hex(a), oid_to_hex(b));
> 	else
> 		func2("%s", oid_to_hex(a));
>

The produced diff looks like this:

+	char hex[GIT_MAX_HEXSZ + 1];
 	if (cond)
-		func2("%s, %s", oid_to_hex(a), oid_to_hex(b));
+		func2("%s, %s", oid_to_hex_r(hex, a), oid_to_hex_r(hex, b));
 	else
-		func2("%s", oid_to_hex(a));
+		func2("%s", oid_to_hex_r(hex, a));

So the first rule was applied (the one for a single oid_to_hex call),
but we need the second one (for two oid_to_hex calls).  Using the same
hex buffer for two different values won't work.

> We will probably want our semantic rules to handle an arbitrary number
> of `oid_to_hex()` calls in each function, but in scenarios like the
> above one, we only really need 2 hex buffers despite having 3 calls...

oid_to_hex() has two interesting features, and we need to make sure they
are preserved for callers that use them: It has a ring of four buffers,
so you can use it for four different values, and those buffers are
static, so its results can be passed around arbitrarily.

> That might be a little tricky, I guess.

It may be impossible to cover all cases.  E.g. callers of oid_to_hex()
could return that buffer (like e.g. diff_aligned_abbrev()) or save
them in some global variable (like e.g. string_list_append() with a
non-DUP string_list).  But safe conversion rules can got us surprisingly
far.

> Another thing that might be tricky in this conversion is checking for
> name conflicts with the added `hex` variable (but maybe Coccinelle
> already has a facilitator mechanism for such cases? IDK).

That's easy.  There exists no hex_buffer, yet.  We can just claim that
name for automatic conversions.  It's a bit too long for people to
type out (we have a few variables named hexbuffer, though), but not
crazy long.

So what is safe?  Calls of oid_to_hex() in the argument list of many
functions is.  E.g. puts() and printf() just consume the string that
is passed to them, and they don't store it anywhere.  That means no
static storage is needed for those.

string_list_append() is only safe if it's the duplicating variant.
Since this is a runtime option of the underlying string_list this is
hard to prove in a Coccinelle rule.  The time is better spent
converting them manually, I think.

And function calls with more than four oid_to_hex() calls are broken
already, so we only need to have rules for up to four of them.  Here
is the simplest rule I can come up with for handling up to four
oid_to_hex() calls:

@@
identifier fn =~ "^(.*printf.*|puts)$";
@@
(
  fn(...,
-   oid_to_hex
+   oid_to_hex_r
    (
+     hex_buffer,
      ...
    ), ...,
-   oid_to_hex
+   oid_to_hex_r
    (
+     hex_buffer2,
      ...
    ), ...,
-   oid_to_hex
+   oid_to_hex_r
    (
+     hex_buffer3,
      ...
    ), ...,
-   oid_to_hex
+   oid_to_hex_r
    (
+     hex_buffer4,
      ...
    ), ...
  )
|
  fn(...,
-   oid_to_hex
+   oid_to_hex_r
    (
+     hex_buffer,
      ...
    ), ...,
-   oid_to_hex
+   oid_to_hex_r
    (
+     hex_buffer2,
      ...
    ), ...,
-   oid_to_hex
+   oid_to_hex_r
    (
+     hex_buffer3,
      ...
    ), ...
  )
|
  fn(...,
-   oid_to_hex
+   oid_to_hex_r
    (
+     hex_buffer,
      ...
    ), ...,
-   oid_to_hex
+   oid_to_hex_r
    (
+     hex_buffer2,
      ...
    ), ...
  )
|
  fn(...,
-   oid_to_hex
+   oid_to_hex_r
    (
+     hex_buffer,
      ...
    ), ...
  )
)

So the sub-rules are ordered from matching all four possible
oid_to_hex() calls down to a single one.  Only safe consumers are
matched.  That regex can and should be extended.

Having a list of good consumers has the nice side-effect of speeding up
the diff generation.  It still takes a few minutes on my system, though.

We still need to declare of the local buffers.  We can add them on
demand to each function with rules like this:

@avoid_duplicates@
identifier buf =~ "^hex_buffer[2-4]?$";
@@
- char buf[GIT_MAX_HEXSZ + 1];

@hex_buffer4_on_demand exists@
identifier fn;
@@
  fn(...) {
+   char hex_buffer4[GIT_MAX_HEXSZ + 1];
    <+... hex_buffer4 ...+>
  }

@hex_buffer3_on_demand exists@
identifier fn;
@@
  fn(...) {
+   char hex_buffer3[GIT_MAX_HEXSZ + 1];
    <+... hex_buffer3 ...+>
  }

@hex_buffer2_on_demand exists@
identifier fn;
@@
  fn(...) {
+   char hex_buffer2[GIT_MAX_HEXSZ + 1];
    <+... hex_buffer2 ...+>
  }

@hex_buffer_on_demand exists@
identifier fn;
@@
  fn(...) {
+   char hex_buffer[GIT_MAX_HEXSZ + 1];
    <+... hex_buffer ...+>
  }

Why remove them first?  To avoid duplicates when other convertible
oid_to_hex() calls are added later and the semantic patch applied
again.

Why declare the buffers with function scope?  To avoid shadowing.

Why are the rules reversed?  They add declarations at the top, so
hex_buffer to hex_buffer4 end up being declared in that order in
the resulting file.

Why not use the "fresh identifier" feature of Coccinelle to generate
an unused name each time?  I don't know how to integrate that into
the 4/3/2/1 rule above without having to repeat the list of safe
consumers or declaring unused buffers.  And reusing buffers between
safe consumers is fine.

René

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe
  2020-07-18  3:09             ` Matheus Tavares Bernardino
@ 2020-08-10 14:15               ` Johannes Schindelin
  0 siblings, 0 replies; 25+ messages in thread
From: Johannes Schindelin @ 2020-08-10 14:15 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: git, brian m . carlson, Johannes Sixt, Jonathan Tan, Jeff King,
	Christian Couder, Fredrik Kuivinen

Hi Matheus,

On Sat, 18 Jul 2020, Matheus Tavares Bernardino wrote:

> On Thu, Jul 16, 2020 at 9:56 AM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>
> Alternatively, if using thread-local storage is still an option, I
> think I might have solved the problems we had in the previous
> iteration with memory leaks on Windows. I changed our
> pthread_key_create() emulation to start using the destructor callback
> on Windows, through the Fiber Local Storage (FLS) API. As the
> documentation says [1] "If no fiber switching occurs, FLS acts exactly
> the same as thread local storage". The advantage over TLS is that
> FLSAlloc() does take a callback parameter.

Okay, but I am still not so enthusiastic. We can fix this in a much
simpler way, I believe, than introducing the first thread-local storage
user. Let's leave TLS until the time we actually need it?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2020-08-10 14:15 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-24 22:29 [RFC] Thread safety in some low-level functions Matheus Tavares Bernardino
2020-06-24 22:52 ` Matheus Tavares Bernardino
2020-06-25  1:38   ` brian m. carlson
2020-06-25 20:32     ` [PATCH 0/2] Make oid_to_hex() thread-safe Matheus Tavares
2020-06-25 20:32       ` [PATCH 1/2] compat/win32/pthread: add pthread_once() Matheus Tavares
2020-06-26  5:45         ` Christian Couder
2020-06-26 14:13           ` Matheus Tavares Bernardino
2020-06-25 20:32       ` [PATCH 2/2] hex: make hash_to_hex_algop() and friends thread-safe Matheus Tavares
2020-06-25 23:07         ` brian m. carlson
2020-06-25 23:54           ` Matheus Tavares Bernardino
2020-06-26  0:00             ` Matheus Tavares Bernardino
2020-06-26  6:02         ` Christian Couder
2020-06-26  8:22       ` [PATCH 0/2] Make oid_to_hex() thread-safe Christian Couder
2020-06-26 16:22         ` Matheus Tavares Bernardino
2020-06-26 21:54       ` [PATCH v2 " Matheus Tavares
2020-06-26 21:54         ` [PATCH v2 1/2] compat/win32/pthread: add pthread_once() Matheus Tavares
2020-06-26 21:54         ` [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe Matheus Tavares
2020-06-29 15:11           ` Johannes Schindelin
2020-06-30 20:37             ` Matheus Tavares Bernardino
2020-07-01 11:32               ` Johannes Schindelin
2020-07-16 11:29           ` Johannes Schindelin
2020-07-18  3:09             ` Matheus Tavares Bernardino
2020-08-10 14:15               ` Johannes Schindelin
2020-07-18  3:52             ` Matheus Tavares
2020-07-26 17:41               ` René Scharfe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).