git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] commit-graph: handle file descriptor exhaustion
@ 2020-04-23 21:40 Taylor Blau
  2020-04-23 21:41 ` [PATCH 1/4] commit-graph.c: don't use discarded graph_name in error Taylor Blau
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Taylor Blau @ 2020-04-23 21:40 UTC (permalink / raw)
  To: git; +Cc: peff, szeder.dev, dstolee, gitster

Hi,

This is another series to handle an issue we started seeing at GitHub on
repositories that have too many commit-graph layers.

This issue evaded me for several days, but the gist is that having too
many descriptors open caused us to fail 'git_mkstemp_mode', which in
turn left some values in 'ctx->commit_graph_filenames_after'
uninitialized. This in turn caused a general protection fault when
trying to free these uninitialized entries. Frustratingly, it causes
stray 'commit-graph-chain.lock' files to stick around even after the
process has gone away, causing all subsequent commit-graph write
operations to fail.

The first two patches are semi-related cleanups and other preparation.
The third patch fixes the main issue described here. Peff contributed
the fourth patch which teaches the commit-graph machinery to be more
willing to close file descriptors.

These patches are very much owed to the help that Stolee and Peff
provided while writing them and for their willingness to let me bounce
some ideas off of them.

Thanks in advance for your review.

Jeff King (1):
  commit-graph: close descriptors after mmap

Taylor Blau (3):
  commit-graph.c: don't use discarded graph_name in error
  t/test-lib.sh: make ULIMIT_FILE_DESCRIPTORS available to tests
  commit-graph.c: gracefully handle file descriptor exhaustion

 commit-graph.c                | 21 ++++++++-------------
 commit-graph.h                |  3 +--
 fuzz-commit-graph.c           |  5 ++---
 t/t1400-update-ref.sh         |  9 ---------
 t/t5324-split-commit-graph.sh | 13 +++++++++++++
 t/test-lib.sh                 |  9 +++++++++
 6 files changed, 33 insertions(+), 27 deletions(-)

--
2.26.0.113.ge9739cdccc

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/4] commit-graph.c: don't use discarded graph_name in error
  2020-04-23 21:40 [PATCH 0/4] commit-graph: handle file descriptor exhaustion Taylor Blau
@ 2020-04-23 21:41 ` Taylor Blau
  2020-04-23 21:41 ` [PATCH 2/4] t/test-lib.sh: make ULIMIT_FILE_DESCRIPTORS available to tests Taylor Blau
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Taylor Blau @ 2020-04-23 21:41 UTC (permalink / raw)
  To: git; +Cc: peff, szeder.dev, dstolee, gitster

When writing a commit-graph layer, we do so in a temporary file which is
renamed into place. If we fail to create a temporary file, for e.g.,
because we have too many open files, then 'git_mkstemp_mode' sets the
pattern to the empty string, in which case we get an error something
along the lines of:

  error: unable to create ''

It's not useful to show the pattern here at all, since we (1) know the
pattern is well-formed, and (2) would have already shown the dirname
when trying to create the leading directories. So, replace this error
with something friendlier.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index b8737f0ce9..afde075962 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1397,7 +1397,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 
 		fd = git_mkstemp_mode(ctx->graph_name, 0444);
 		if (fd < 0) {
-			error(_("unable to create '%s'"), ctx->graph_name);
+			error(_("unable to create temporary graph layer"));
 			return -1;
 		}
 
-- 
2.26.0.113.ge9739cdccc


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/4] t/test-lib.sh: make ULIMIT_FILE_DESCRIPTORS available to tests
  2020-04-23 21:40 [PATCH 0/4] commit-graph: handle file descriptor exhaustion Taylor Blau
  2020-04-23 21:41 ` [PATCH 1/4] commit-graph.c: don't use discarded graph_name in error Taylor Blau
@ 2020-04-23 21:41 ` Taylor Blau
  2020-04-23 21:41 ` [PATCH 3/4] commit-graph.c: gracefully handle file descriptor exhaustion Taylor Blau
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Taylor Blau @ 2020-04-23 21:41 UTC (permalink / raw)
  To: git; +Cc: peff, szeder.dev, dstolee, gitster

In t1400 the prerequisite 'ULIMIT_FILE_DESCRIPTORS' is defined and used
to effectively guard the helper function 'run_with_limited_open_files'
from being used on systems that do not satisfy this prerequisite.

In the subsequent patch, we will introduce another test outside of t1400
that would benefit from using this prerequisite. So, move it to
'test-lib.sh' instead so that it can be used by multiple tests.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t1400-update-ref.sh | 9 ---------
 t/test-lib.sh         | 9 +++++++++
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
index a6224ef65f..1e7428a379 100755
--- a/t/t1400-update-ref.sh
+++ b/t/t1400-update-ref.sh
@@ -1354,15 +1354,6 @@ test_expect_success 'fails with duplicate ref update via symref' '
 	test_cmp expect actual
 '
 
-run_with_limited_open_files () {
-	(ulimit -n 32 && "$@")
-}
-
-test_lazy_prereq ULIMIT_FILE_DESCRIPTORS '
-	test_have_prereq !MINGW,!CYGWIN &&
-	run_with_limited_open_files true
-'
-
 test_expect_success ULIMIT_FILE_DESCRIPTORS 'large transaction creating branches does not burst open file limit' '
 (
 	for i in $(test_seq 33)
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 9fe390bd5a..03bf2a4b93 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1618,6 +1618,15 @@ test_lazy_prereq ULIMIT_STACK_SIZE '
 	run_with_limited_stack true
 '
 
+run_with_limited_open_files () {
+	(ulimit -n 32 && "$@")
+}
+
+test_lazy_prereq ULIMIT_FILE_DESCRIPTORS '
+	test_have_prereq !MINGW,!CYGWIN &&
+	run_with_limited_open_files true
+'
+
 build_option () {
 	git version --build-options |
 	sed -ne "s/^$1: //p"
-- 
2.26.0.113.ge9739cdccc


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/4] commit-graph.c: gracefully handle file descriptor exhaustion
  2020-04-23 21:40 [PATCH 0/4] commit-graph: handle file descriptor exhaustion Taylor Blau
  2020-04-23 21:41 ` [PATCH 1/4] commit-graph.c: don't use discarded graph_name in error Taylor Blau
  2020-04-23 21:41 ` [PATCH 2/4] t/test-lib.sh: make ULIMIT_FILE_DESCRIPTORS available to tests Taylor Blau
@ 2020-04-23 21:41 ` Taylor Blau
  2021-06-24  9:51   ` t5324-split-commit-graph.sh flaky due to assumptions about ulimit behavior Ævar Arnfjörð Bjarmason
  2020-04-23 21:41 ` [PATCH 4/4] commit-graph: close descriptors after mmap Taylor Blau
  2020-04-23 21:43 ` [PATCH 0/4] commit-graph: handle file descriptor exhaustion Taylor Blau
  4 siblings, 1 reply; 14+ messages in thread
From: Taylor Blau @ 2020-04-23 21:41 UTC (permalink / raw)
  To: git; +Cc: peff, szeder.dev, dstolee, gitster

When writing a layered commit-graph, the commit-graph machinery uses
'commit_graph_filenames_after' and 'commit_graph_hash_after' to keep
track of the layers in the chain that we are in the process of writing.

When the number of commit-graph layers shrinks, we initialize all
entries in the aforementioned arrays, because we know the structure of
the new commit-graph chain immediately (since there are no new layers,
there are no unknown hash values).

But when the number of commit-graph layers grows (i.e., that
'num_commit_graphs_after > num_commit_graphs_before'), then we leave
some entries in the filenames and hashes arrays as uninitialized,
because we will fill them in later as those values become available.

For instance, we rely on 'write_commit_graph_file's to store the
filename and hash of the last layer in the new chain, which is the one
that it is responsible for writing. But, it's possible that
'write_commit_graph_file' may fail, e.g., from file descriptor
exhaustion. In this case it is possible that 'git_mkstemp_mode' will
fail, and that function will return early *before* setting the values
for the last commit-graph layer's filename and hash.

This causes a number of upleasant side-effects. For instance, trying to
'free()' each entry in 'ctx->commit_graph_filenames_after' (and
similarly for the hashes array) causes us to 'free()' uninitialized
memory, since the area is allocated with 'malloc()' and is therefore
subject to contain garbage (which is left alone when
'write_commit_graph_file' returns early).

This can manifest in other issues, like a general protection fault,
and/or leaving a stray 'commit-graph-chain.lock' around after the
process dies. (The reasoning for this is still a mystery to me, since
we'd otherwise usually expect the kernel to run tempfile.c's 'atexit()'
handlers in the case of a normal death...)

To resolve this, initialize the memory with 'CALLOC_ARRAY' so that
uninitialized entries are filled with zeros, and can thus be 'free()'d
as a noop instead of causing a fault.

Helped-by: Jeff King <peff@peff.net>
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c                |  4 ++--
 t/t5324-split-commit-graph.sh | 13 +++++++++++++
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index afde075962..b2d2fdfe3d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1602,8 +1602,8 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx)
 		free(old_graph_name);
 	}
 
-	ALLOC_ARRAY(ctx->commit_graph_filenames_after, ctx->num_commit_graphs_after);
-	ALLOC_ARRAY(ctx->commit_graph_hash_after, ctx->num_commit_graphs_after);
+	CALLOC_ARRAY(ctx->commit_graph_filenames_after, ctx->num_commit_graphs_after);
+	CALLOC_ARRAY(ctx->commit_graph_hash_after, ctx->num_commit_graphs_after);
 
 	for (i = 0; i < ctx->num_commit_graphs_after &&
 		    i < ctx->num_commit_graphs_before; i++)
diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh
index e5d8d64170..0d1db31b0a 100755
--- a/t/t5324-split-commit-graph.sh
+++ b/t/t5324-split-commit-graph.sh
@@ -381,4 +381,17 @@ test_expect_success '--split=replace replaces the chain' '
 	graph_read_expect 2
 '
 
+test_expect_success ULIMIT_FILE_DESCRIPTORS 'handles file descriptor exhaustion' '
+	git init ulimit &&
+	(
+		cd ulimit &&
+		for i in $(test_seq 64)
+		do
+			test_commit $i &&
+			test_might_fail run_with_limited_open_files git commit-graph write \
+				--split=no-merge --reachable || return 1
+		done
+	)
+'
+
 test_done
-- 
2.26.0.113.ge9739cdccc


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/4] commit-graph: close descriptors after mmap
  2020-04-23 21:40 [PATCH 0/4] commit-graph: handle file descriptor exhaustion Taylor Blau
                   ` (2 preceding siblings ...)
  2020-04-23 21:41 ` [PATCH 3/4] commit-graph.c: gracefully handle file descriptor exhaustion Taylor Blau
@ 2020-04-23 21:41 ` Taylor Blau
  2020-04-23 22:04   ` Junio C Hamano
  2020-04-23 21:43 ` [PATCH 0/4] commit-graph: handle file descriptor exhaustion Taylor Blau
  4 siblings, 1 reply; 14+ messages in thread
From: Taylor Blau @ 2020-04-23 21:41 UTC (permalink / raw)
  To: git; +Cc: peff, szeder.dev, dstolee, gitster

From: Jeff King <peff@peff.net>

We don't ever refer to the descriptor after mmap-ing it. And keeping it
open means we can run out of descriptors in degenerate cases (e.g.,
thousands of split chain files). Let's close it as soon as possible.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c      | 15 +++++----------
 commit-graph.h      |  3 +--
 fuzz-commit-graph.c |  5 ++---
 3 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index b2d2fdfe3d..e9b458539f 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -69,7 +69,6 @@ static uint8_t oid_version(void)
 static struct commit_graph *alloc_commit_graph(void)
 {
 	struct commit_graph *g = xcalloc(1, sizeof(*g));
-	g->graph_fd = -1;
 
 	return g;
 }
@@ -123,14 +122,13 @@ struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st,
 		return NULL;
 	}
 	graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
-	ret = parse_commit_graph(graph_map, fd, graph_size);
+	close(fd);
+	ret = parse_commit_graph(graph_map, graph_size);
 
 	if (ret)
 		ret->odb = odb;
-	else {
+	else
 		munmap(graph_map, graph_size);
-		close(fd);
-	}
 
 	return ret;
 }
@@ -165,8 +163,7 @@ static int verify_commit_graph_lite(struct commit_graph *g)
 	return 0;
 }
 
-struct commit_graph *parse_commit_graph(void *graph_map, int fd,
-					size_t graph_size)
+struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size)
 {
 	const unsigned char *data, *chunk_lookup;
 	uint32_t i;
@@ -209,7 +206,6 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 
 	graph->hash_len = the_hash_algo->rawsz;
 	graph->num_chunks = *(unsigned char*)(data + 6);
-	graph->graph_fd = fd;
 	graph->data = graph_map;
 	graph->data_len = graph_size;
 
@@ -2125,10 +2121,9 @@ void free_commit_graph(struct commit_graph *g)
 {
 	if (!g)
 		return;
-	if (g->graph_fd >= 0) {
+	if (g->data) {
 		munmap((void *)g->data, g->data_len);
 		g->data = NULL;
-		close(g->graph_fd);
 	}
 	free(g->filename);
 	free(g);
diff --git a/commit-graph.h b/commit-graph.h
index 98ef121924..a0a2c4a1e5 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -66,8 +66,7 @@ struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st,
 						 struct object_directory *odb);
 struct commit_graph *read_commit_graph_one(struct repository *r,
 					   struct object_directory *odb);
-struct commit_graph *parse_commit_graph(void *graph_map, int fd,
-					size_t graph_size);
+struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size);
 
 /*
  * Return 1 if and only if the repository has a commit-graph
diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c
index 0157acbf2e..9fd1c04edd 100644
--- a/fuzz-commit-graph.c
+++ b/fuzz-commit-graph.c
@@ -1,8 +1,7 @@
 #include "commit-graph.h"
 #include "repository.h"
 
-struct commit_graph *parse_commit_graph(void *graph_map, int fd,
-					size_t graph_size);
+struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size);
 
 int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);
 
@@ -11,7 +10,7 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
 	struct commit_graph *g;
 
 	initialize_the_repository();
-	g = parse_commit_graph((void *)data, -1, size);
+	g = parse_commit_graph((void *)data, size);
 	repo_clear(the_repository);
 	free(g);
 
-- 
2.26.0.113.ge9739cdccc

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/4] commit-graph: handle file descriptor exhaustion
  2020-04-23 21:40 [PATCH 0/4] commit-graph: handle file descriptor exhaustion Taylor Blau
                   ` (3 preceding siblings ...)
  2020-04-23 21:41 ` [PATCH 4/4] commit-graph: close descriptors after mmap Taylor Blau
@ 2020-04-23 21:43 ` Taylor Blau
  4 siblings, 0 replies; 14+ messages in thread
From: Taylor Blau @ 2020-04-23 21:43 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, szeder.dev, dstolee, gitster

...I should mention that this is based on my series in [1], since it's
much easier to express the new test in t5324 with '--split=no-merge'.

Thanks,
Taylor

[1]: https://lore.kernel.org/git/cover.1586836700.git.me@ttaylorr.com/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] commit-graph: close descriptors after mmap
  2020-04-23 21:41 ` [PATCH 4/4] commit-graph: close descriptors after mmap Taylor Blau
@ 2020-04-23 22:04   ` Junio C Hamano
  2020-04-24  3:56     ` Jeff King
  2020-04-24 13:17     ` Derrick Stolee
  0 siblings, 2 replies; 14+ messages in thread
From: Junio C Hamano @ 2020-04-23 22:04 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, szeder.dev, dstolee

Taylor Blau <me@ttaylorr.com> writes:

> From: Jeff King <peff@peff.net>
>
> We don't ever refer to the descriptor after mmap-ing it. And keeping it
> open means we can run out of descriptors in degenerate cases (e.g.,
> thousands of split chain files). Let's close it as soon as possible.

Yikes.  

Sorry, I should have looked at the use of mmap in this topioc more
carefully when we queued the series.  It is an easy mistake to make
by anybody new to the API to leave it open while the region is in
use.

With this fix, with or without the other topics still in flight, I
do not think no code touches graph_fd.  Should we remove the
graph_fd field from the structure as well?




 commit-graph.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/commit-graph.h b/commit-graph.h
index a0a2c4a1e5..1254eae948 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -40,8 +40,6 @@ struct tree *get_commit_tree_in_graph(struct repository *r,
 				      const struct commit *c);
 
 struct commit_graph {
-	int graph_fd;
-
 	const unsigned char *data;
 	size_t data_len;
 

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] commit-graph: close descriptors after mmap
  2020-04-23 22:04   ` Junio C Hamano
@ 2020-04-24  3:56     ` Jeff King
  2020-04-24 13:17     ` Derrick Stolee
  1 sibling, 0 replies; 14+ messages in thread
From: Jeff King @ 2020-04-24  3:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, szeder.dev, dstolee

On Thu, Apr 23, 2020 at 03:04:19PM -0700, Junio C Hamano wrote:

> With this fix, with or without the other topics still in flight, I
> do not think no code touches graph_fd.  Should we remove the
> graph_fd field from the structure as well?

Oops, I thought I did.

> diff --git a/commit-graph.h b/commit-graph.h
> index a0a2c4a1e5..1254eae948 100644
> --- a/commit-graph.h
> +++ b/commit-graph.h
> @@ -40,8 +40,6 @@ struct tree *get_commit_tree_in_graph(struct repository *r,
>  				      const struct commit *c);
>  
>  struct commit_graph {
> -	int graph_fd;
> -

Yes, this should definitely be squashed in.

-Peff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] commit-graph: close descriptors after mmap
  2020-04-23 22:04   ` Junio C Hamano
  2020-04-24  3:56     ` Jeff King
@ 2020-04-24 13:17     ` Derrick Stolee
  2020-04-24 16:35       ` Taylor Blau
  2020-04-24 20:02       ` Junio C Hamano
  1 sibling, 2 replies; 14+ messages in thread
From: Derrick Stolee @ 2020-04-24 13:17 UTC (permalink / raw)
  To: Junio C Hamano, Taylor Blau; +Cc: git, peff, szeder.dev, dstolee

On 4/23/2020 6:04 PM, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> 
>> From: Jeff King <peff@peff.net>
>>
>> We don't ever refer to the descriptor after mmap-ing it. And keeping it
>> open means we can run out of descriptors in degenerate cases (e.g.,
>> thousands of split chain files). Let's close it as soon as possible.
> 
> Yikes.  
> 
> Sorry, I should have looked at the use of mmap in this topioc more
> carefully when we queued the series.  It is an easy mistake to make
> by anybody new to the API to leave it open while the region is in
> use.

You are right. I was new when first contributing the commit-graph. It
was also easier to miss because we only had one commit-graph open at
the time. Adding in the incremental file format led to multiple file
descriptors being open.

However, this idea of closing a descriptor after an mmap is new to
me. So I thought about other situations where I made the same mistake.
Please see the patch below.

> With this fix, with or without the other topics still in flight, I
> do not think no code touches graph_fd.  Should we remove the
> graph_fd field from the structure as well?

I agree that this should be done.

Thanks,
-Stolee

-->8--

From: Derrick Stolee <dstolee@microsoft.com>
Date: Fri, 24 Apr 2020 13:11:13 +0000
Subject: [PATCH] multi-pack-index: close file descriptor after mmap

We recently discovered that the commit-graph was not closing its
file descriptor after memory-mapping the file contents. After this
mmap() succeeds, there is no need to keep the file descriptor open.
In fact, there is signficant reason to close it so we do not run
out of descriptors.

This is entirely my fault for not knowing that we can have an open
mmap while also closing the descriptor. Some could blame this on
being a new contributor when the series was first submitted, but
even years later this is still new information to me.

That made me realize that I used the same pattern when opening a
multi-pack-index. Since this file is not (yet) incremental, there
will be at most one descriptor open for this reason. It is still
worth fixing, especially if we extend the format to be incremental
like the commit-graph.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 midx.c | 4 +---
 midx.h | 2 --
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/midx.c b/midx.c
index 1527e464a7..60d30e873b 100644
--- a/midx.c
+++ b/midx.c
@@ -72,9 +72,9 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 	FREE_AND_NULL(midx_name);
 
 	midx_map = xmmap(NULL, midx_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	close(fd);
 
 	FLEX_ALLOC_STR(m, object_dir, object_dir);
-	m->fd = fd;
 	m->data = midx_map;
 	m->data_len = midx_size;
 	m->local = local;
@@ -190,8 +190,6 @@ void close_midx(struct multi_pack_index *m)
 		return;
 
 	munmap((unsigned char *)m->data, m->data_len);
-	close(m->fd);
-	m->fd = -1;
 
 	for (i = 0; i < m->num_packs; i++) {
 		if (m->packs[i])
diff --git a/midx.h b/midx.h
index e6fa356b5c..b18cf53bc4 100644
--- a/midx.h
+++ b/midx.h
@@ -12,8 +12,6 @@ struct repository;
 struct multi_pack_index {
 	struct multi_pack_index *next;
 
-	int fd;
-
 	const unsigned char *data;
 	size_t data_len;
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] commit-graph: close descriptors after mmap
  2020-04-24 13:17     ` Derrick Stolee
@ 2020-04-24 16:35       ` Taylor Blau
  2020-04-24 20:02       ` Junio C Hamano
  1 sibling, 0 replies; 14+ messages in thread
From: Taylor Blau @ 2020-04-24 16:35 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Junio C Hamano, Taylor Blau, git, peff, szeder.dev, dstolee

On Fri, Apr 24, 2020 at 09:17:16AM -0400, Derrick Stolee wrote:
> On 4/23/2020 6:04 PM, Junio C Hamano wrote:
> > Taylor Blau <me@ttaylorr.com> writes:
> >
> >> From: Jeff King <peff@peff.net>
> >>
> >> We don't ever refer to the descriptor after mmap-ing it. And keeping it
> >> open means we can run out of descriptors in degenerate cases (e.g.,
> >> thousands of split chain files). Let's close it as soon as possible.
> >
> > Yikes.
> >
> > Sorry, I should have looked at the use of mmap in this topioc more
> > carefully when we queued the series.  It is an easy mistake to make
> > by anybody new to the API to leave it open while the region is in
> > use.
>
> You are right. I was new when first contributing the commit-graph. It
> was also easier to miss because we only had one commit-graph open at
> the time. Adding in the incremental file format led to multiple file
> descriptors being open.
>
> However, this idea of closing a descriptor after an mmap is new to
> me. So I thought about other situations where I made the same mistake.
> Please see the patch below.

It's new to me, too :). If I had known it beforehand, then I would have
written the fourth patch here myself. But, I didn't, so I am grateful to
Peff for teaching me something new here.

> > With this fix, with or without the other topics still in flight, I
> > do not think no code touches graph_fd.  Should we remove the
> > graph_fd field from the structure as well?
>
> I agree that this should be done.
>
> Thanks,
> -Stolee
>
> -->8--

For what it's worth, this didn't apply quite right with 'git am -3 -c',
since it didn't seem to recognize that this was your scissors line. If I
edit your mail myself by replacing this line with '-- >8 --', then 'git
am' applies it just fine.

> From: Derrick Stolee <dstolee@microsoft.com>
> Date: Fri, 24 Apr 2020 13:11:13 +0000
> Subject: [PATCH] multi-pack-index: close file descriptor after mmap
>
> We recently discovered that the commit-graph was not closing its
> file descriptor after memory-mapping the file contents. After this
> mmap() succeeds, there is no need to keep the file descriptor open.
> In fact, there is signficant reason to close it so we do not run
> out of descriptors.
>
> This is entirely my fault for not knowing that we can have an open
> mmap while also closing the descriptor. Some could blame this on
> being a new contributor when the series was first submitted, but
> even years later this is still new information to me.
>
> That made me realize that I used the same pattern when opening a
> multi-pack-index. Since this file is not (yet) incremental, there
> will be at most one descriptor open for this reason. It is still
> worth fixing, especially if we extend the format to be incremental
> like the commit-graph.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  midx.c | 4 +---
>  midx.h | 2 --
>  2 files changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/midx.c b/midx.c
> index 1527e464a7..60d30e873b 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -72,9 +72,9 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
>  	FREE_AND_NULL(midx_name);
>
>  	midx_map = xmmap(NULL, midx_size, PROT_READ, MAP_PRIVATE, fd, 0);
> +	close(fd);

Right, we want to close as soon as we have mmaped the file.

>
>  	FLEX_ALLOC_STR(m, object_dir, object_dir);
> -	m->fd = fd;
>  	m->data = midx_map;
>  	m->data_len = midx_size;
>  	m->local = local;
> @@ -190,8 +190,6 @@ void close_midx(struct multi_pack_index *m)
>  		return;
>
>  	munmap((unsigned char *)m->data, m->data_len);
> -	close(m->fd);
> -	m->fd = -1;

...and not down here. Thanks.

>  	for (i = 0; i < m->num_packs; i++) {
>  		if (m->packs[i])
> diff --git a/midx.h b/midx.h
> index e6fa356b5c..b18cf53bc4 100644
> --- a/midx.h
> +++ b/midx.h
> @@ -12,8 +12,6 @@ struct repository;
>  struct multi_pack_index {
>  	struct multi_pack_index *next;
>
> -	int fd;
> -

:). Even better!

>  	const unsigned char *data;
>  	size_t data_len;
>
> --
> 2.26.2

This looks great to me, and thanks for being proactive about the fix.

  Reviewed-by: Taylor Blau <me@ttaylorr.com>

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] commit-graph: close descriptors after mmap
  2020-04-24 13:17     ` Derrick Stolee
  2020-04-24 16:35       ` Taylor Blau
@ 2020-04-24 20:02       ` Junio C Hamano
  2020-04-27 10:57         ` Derrick Stolee
  1 sibling, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2020-04-24 20:02 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, git, peff, szeder.dev, dstolee

Derrick Stolee <stolee@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
> Date: Fri, 24 Apr 2020 13:11:13 +0000
> Subject: [PATCH] multi-pack-index: close file descriptor after mmap
>
> We recently discovered that the commit-graph was not closing its
> file descriptor after memory-mapping the file contents. After this
> mmap() succeeds, there is no need to keep the file descriptor open.
> In fact, there is signficant reason to close it so we do not run
> out of descriptors.

The above is sufficient a justification.  Let's leave the remaining
two paragraphs under three-dashes.

> This is entirely my fault for not knowing that we can have an open
> mmap while also closing the descriptor. Some could blame this on
> being a new contributor when the series was first submitted, but
> even years later this is still new information to me.
>
> That made me realize that I used the same pattern when opening a
> multi-pack-index. Since this file is not (yet) incremental, there
> will be at most one descriptor open for this reason. It is still
> worth fixing, especially if we extend the format to be incremental
> like the commit-graph.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  midx.c | 4 +---
>  midx.h | 2 --
>  2 files changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/midx.c b/midx.c
> index 1527e464a7..60d30e873b 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -72,9 +72,9 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
>  	FREE_AND_NULL(midx_name);
>  
>  	midx_map = xmmap(NULL, midx_size, PROT_READ, MAP_PRIVATE, fd, 0);
> +	close(fd);
>  
>  	FLEX_ALLOC_STR(m, object_dir, object_dir);
> -	m->fd = fd;
>  	m->data = midx_map;
>  	m->data_len = midx_size;
>  	m->local = local;
> @@ -190,8 +190,6 @@ void close_midx(struct multi_pack_index *m)
>  		return;
>  
>  	munmap((unsigned char *)m->data, m->data_len);
> -	close(m->fd);
> -	m->fd = -1;
>  
>  	for (i = 0; i < m->num_packs; i++) {
>  		if (m->packs[i])
> diff --git a/midx.h b/midx.h
> index e6fa356b5c..b18cf53bc4 100644
> --- a/midx.h
> +++ b/midx.h
> @@ -12,8 +12,6 @@ struct repository;
>  struct multi_pack_index {
>  	struct multi_pack_index *next;
>  
> -	int fd;
> -
>  	const unsigned char *data;
>  	size_t data_len;

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] commit-graph: close descriptors after mmap
  2020-04-24 20:02       ` Junio C Hamano
@ 2020-04-27 10:57         ` Derrick Stolee
  0 siblings, 0 replies; 14+ messages in thread
From: Derrick Stolee @ 2020-04-27 10:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, peff, szeder.dev, dstolee

On 4/24/2020 4:02 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
> 
>> From: Derrick Stolee <dstolee@microsoft.com>
>> Date: Fri, 24 Apr 2020 13:11:13 +0000
>> Subject: [PATCH] multi-pack-index: close file descriptor after mmap
>>
>> We recently discovered that the commit-graph was not closing its
>> file descriptor after memory-mapping the file contents. After this
>> mmap() succeeds, there is no need to keep the file descriptor open.
>> In fact, there is signficant reason to close it so we do not run
>> out of descriptors.
> 
> The above is sufficient a justification.  Let's leave the remaining
> two paragraphs under three-dashes.

Works for me! I also thought there were too many first-person pronouns,
but erred on the side of reporting "how did we figure out this was an
issue?" in the message.

-Stolee

^ permalink raw reply	[flat|nested] 14+ messages in thread

* t5324-split-commit-graph.sh flaky due to assumptions about ulimit behavior
  2020-04-23 21:41 ` [PATCH 3/4] commit-graph.c: gracefully handle file descriptor exhaustion Taylor Blau
@ 2021-06-24  9:51   ` Ævar Arnfjörð Bjarmason
  2021-06-24 15:52     ` Jeff King
  0 siblings, 1 reply; 14+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24  9:51 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, szeder.dev, dstolee, gitster


On Thu, Apr 23 2020, Taylor Blau wrote:

> When writing a layered commit-graph, the commit-graph machinery uses
> 'commit_graph_filenames_after' and 'commit_graph_hash_after' to keep
> track of the layers in the chain that we are in the process of writing.
>
> When the number of commit-graph layers shrinks, we initialize all
> entries in the aforementioned arrays, because we know the structure of
> the new commit-graph chain immediately (since there are no new layers,
> there are no unknown hash values).
>
> But when the number of commit-graph layers grows (i.e., that
> 'num_commit_graphs_after > num_commit_graphs_before'), then we leave
> some entries in the filenames and hashes arrays as uninitialized,
> because we will fill them in later as those values become available.
>
> For instance, we rely on 'write_commit_graph_file's to store the
> filename and hash of the last layer in the new chain, which is the one
> that it is responsible for writing. But, it's possible that
> 'write_commit_graph_file' may fail, e.g., from file descriptor
> exhaustion. In this case it is possible that 'git_mkstemp_mode' will
> fail, and that function will return early *before* setting the values
> for the last commit-graph layer's filename and hash.
>
> This causes a number of upleasant side-effects. For instance, trying to
> 'free()' each entry in 'ctx->commit_graph_filenames_after' (and
> similarly for the hashes array) causes us to 'free()' uninitialized
> memory, since the area is allocated with 'malloc()' and is therefore
> subject to contain garbage (which is left alone when
> 'write_commit_graph_file' returns early).
>
> This can manifest in other issues, like a general protection fault,
> and/or leaving a stray 'commit-graph-chain.lock' around after the
> process dies. (The reasoning for this is still a mystery to me, since
> we'd otherwise usually expect the kernel to run tempfile.c's 'atexit()'
> handlers in the case of a normal death...)
>
> To resolve this, initialize the memory with 'CALLOC_ARRAY' so that
> uninitialized entries are filled with zeros, and can thus be 'free()'d
> as a noop instead of causing a fault.
> [...]
> diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh
> index e5d8d64170..0d1db31b0a 100755
> --- a/t/t5324-split-commit-graph.sh
> +++ b/t/t5324-split-commit-graph.sh
> @@ -381,4 +381,17 @@ test_expect_success '--split=replace replaces the chain' '
>  	graph_read_expect 2
>  '
>  
> +test_expect_success ULIMIT_FILE_DESCRIPTORS 'handles file descriptor exhaustion' '
> +	git init ulimit &&
> +	(
> +		cd ulimit &&
> +		for i in $(test_seq 64)
> +		do
> +			test_commit $i &&
> +			test_might_fail run_with_limited_open_files git commit-graph write \
> +				--split=no-merge --reachable || return 1
> +		done
> +	)
> +'
> +
>  test_done

This test started failing for me, and now I can't reproduce it anymore,
but I reproduced enough of it to think it was odd that it hadn't failed
before.

I.e. here we do an "ulimit -n 32" and then run a command, which makes a
lot of assumptions about how git is compiled, starts up etc, a lot of
which are outside of our control and up to the OS. It's not 32 files we
open, but 32 everything. When I could reproduce this it was failing on
opening some libpcre file or other, so maybe I linked to one too many
libraries.

The one other test that uses this pattern seems like it could be
similarly affected, but I haven't had that fail: d415ad022d8
(update-ref: test handling large transactions properly, 2015-04-14)

Since I can't reproduce this anymore maybe I'm reporting a
nothingburger. I just wonder if this can really work reliably in the
general case, and whether a reliably version of this pattern doesn't
need to be one/some of:

 1. Some GIT_TEST_* mode that sets the (unportable) ulimit itself in the
    code, after we reach some point. This is how e.g. web-based REPLs
    often work, load all your known libraries, forbid any file openings
    (or just N number) after that.

 2. Ditto, but have the GIT_TEST_* print to stderr if we reach some
    "checkpoint", have the test only run under limit N if we can reach
    that point (i.e. binary search or brute force to find the exact N
    limit).

 3. Maybe we can assume this would work reliably in cases of a really
    high limit of N, i.e. the d415ad022d8 test doesn't do this, but my
    understanding of it is that we're trying to guard against having all
    loose refs opened at once. So if we create e.g. 2k refs and operate
    on them we can set the limit to "1999".

    That's still assuming the same things about ulimit that make/made
    this test flaky, but we can probably safely assume that just getting
    to "git <something>" being invoked won't open >1k files, but maybe
    not >32.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: t5324-split-commit-graph.sh flaky due to assumptions about ulimit behavior
  2021-06-24  9:51   ` t5324-split-commit-graph.sh flaky due to assumptions about ulimit behavior Ævar Arnfjörð Bjarmason
@ 2021-06-24 15:52     ` Jeff King
  0 siblings, 0 replies; 14+ messages in thread
From: Jeff King @ 2021-06-24 15:52 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Taylor Blau, git, szeder.dev, dstolee, gitster

On Thu, Jun 24, 2021 at 11:51:09AM +0200, Ævar Arnfjörð Bjarmason wrote:

> I.e. here we do an "ulimit -n 32" and then run a command, which makes a
> lot of assumptions about how git is compiled, starts up etc, a lot of
> which are outside of our control and up to the OS. It's not 32 files we
> open, but 32 everything. When I could reproduce this it was failing on
> opening some libpcre file or other, so maybe I linked to one too many
> libraries.
> 
> The one other test that uses this pattern seems like it could be
> similarly affected, but I haven't had that fail: d415ad022d8
> (update-ref: test handling large transactions properly, 2015-04-14)
> 
> Since I can't reproduce this anymore maybe I'm reporting a
> nothingburger. I just wonder if this can really work reliably in the
> general case, and whether a reliably version of this pattern doesn't
> need to be one/some of:
> 
>  1. Some GIT_TEST_* mode that sets the (unportable) ulimit itself in the
>     code, after we reach some point. This is how e.g. web-based REPLs
>     often work, load all your known libraries, forbid any file openings
>     (or just N number) after that.
> 
>  2. Ditto, but have the GIT_TEST_* print to stderr if we reach some
>     "checkpoint", have the test only run under limit N if we can reach
>     that point (i.e. binary search or brute force to find the exact N
>     limit).
> 
>  3. Maybe we can assume this would work reliably in cases of a really
>     high limit of N, i.e. the d415ad022d8 test doesn't do this, but my
>     understanding of it is that we're trying to guard against having all
>     loose refs opened at once. So if we create e.g. 2k refs and operate
>     on them we can set the limit to "1999".
> 
>     That's still assuming the same things about ulimit that make/made
>     this test flaky, but we can probably safely assume that just getting
>     to "git <something>" being invoked won't open >1k files, but maybe
>     not >32.

Yes, we could probably just set the "32" a bit higher. Something like
"128" may be more conservative. I'd be hesitant to go as high as
something like 1999; system defaults are often much lower than that
anyway and may get rejected. We may have to adjust the tests to match
the idea of what's "a lot of descriptors". (Likewise, the "ulimit -s"
stuff has to make the same guess).

The prereq also tries running with the lower ulimit, but it only runs
"true". Perhaps running something like "git version" would give us a
more realistic idea of the baseline for running a git command (in which
case the test would be auto-skipped if somehow 32 descriptors isn't
enough on some system). That's not foolproof, though (it might take 31
to run "git version", but 33 to run "update-ref" or something).

I'm willing to let it lie unless you have a current problem. This is all
known to be guesswork/heuristics, and the hope was that it simply
wouldn't come up. If it's causing even occasional pain we might need to
deal with it. But if it's an ephemeral thing that maybe went away, I'm
content to stick my fingers back in my ears. :)

-Peff

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-06-24 15:53 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-23 21:40 [PATCH 0/4] commit-graph: handle file descriptor exhaustion Taylor Blau
2020-04-23 21:41 ` [PATCH 1/4] commit-graph.c: don't use discarded graph_name in error Taylor Blau
2020-04-23 21:41 ` [PATCH 2/4] t/test-lib.sh: make ULIMIT_FILE_DESCRIPTORS available to tests Taylor Blau
2020-04-23 21:41 ` [PATCH 3/4] commit-graph.c: gracefully handle file descriptor exhaustion Taylor Blau
2021-06-24  9:51   ` t5324-split-commit-graph.sh flaky due to assumptions about ulimit behavior Ævar Arnfjörð Bjarmason
2021-06-24 15:52     ` Jeff King
2020-04-23 21:41 ` [PATCH 4/4] commit-graph: close descriptors after mmap Taylor Blau
2020-04-23 22:04   ` Junio C Hamano
2020-04-24  3:56     ` Jeff King
2020-04-24 13:17     ` Derrick Stolee
2020-04-24 16:35       ` Taylor Blau
2020-04-24 20:02       ` Junio C Hamano
2020-04-27 10:57         ` Derrick Stolee
2020-04-23 21:43 ` [PATCH 0/4] commit-graph: handle file descriptor exhaustion Taylor Blau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).