From: Matheus Tavares <matheus.bernardino@usp.br> To: git@vger.kernel.org Cc: phil.hord@gmail.com, dstolee@microsoft.com, jonathantanmy@google.com, stefanbeller@gmail.com Subject: [PATCH 1/2] packfile: fix race condition on unpack_entry() Date: Mon, 28 Sep 2020 13:50:34 -0300 [thread overview] Message-ID: <42a7948f94cb57ebd9c37c3850b46b1ac9813ec6.1601311803.git.matheus.bernardino@usp.br> (raw) In-Reply-To: <cover.1601311803.git.matheus.bernardino@usp.br> The third phase of unpack_entry() performs the following sequence in a loop, until all the deltas enumerated in phase one are applied and the entry is fully reconstructed: 1. Add the current base entry to the delta base cache 2. Unpack the next delta 3. Patch the unpacked delta on top of the base When the optional object reading lock is enabled, the above steps will be performed while holding the lock. However, step 2. momentarily releases it so that inflation can be performed in parallel for increased performance. Because the `base` buffer inserted in the cache at 1. is not duplicated, another thread can potentially free() it while the lock is released at 2. (e.g. when there is no space left in the cache to insert another entry). In this case, the later attempt to dereference `base` at 3. will cause a segmentation fault. This problem was observed during a multithreaded git-grep execution on a repository with large objects. To fix the race condition (and later segmentation fault), let's reorder the aforementioned steps so that the lock is not released between 1. and 3. An alternative solution which would not require the reordering would be to duplicate `base` before inserting it in the cache. However, as Phil Hord mentioned, memcpy()'ing large bases can negatively affect performance: in his experiments, this alternative approach slowed git-grep down by 10% to 20%. Reported-by: Phil Hord <phil.hord@gmail.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- packfile.c | 41 ++++++++++++++++++++++++----------------- 1 file changed, 24 insertions(+), 17 deletions(-) diff --git a/packfile.c b/packfile.c index 9ef27508f2..0319418d88 100644 --- a/packfile.c +++ b/packfile.c @@ -1775,12 +1775,10 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset, void *external_base = NULL; unsigned long delta_size, base_size = size; int i; + off_t base_obj_offset = obj_offset; data = NULL; - if (base) - add_delta_base_cache(p, obj_offset, base, base_size, type); - if (!base) { /* * We're probably in deep shit, but let's try to fetch @@ -1818,24 +1816,33 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset, "at offset %"PRIuMAX" from %s", (uintmax_t)curpos, p->pack_name); data = NULL; - free(external_base); - continue; - } + } else { + data = patch_delta(base, base_size, delta_data, + delta_size, &size); - data = patch_delta(base, base_size, - delta_data, delta_size, - &size); + /* + * We could not apply the delta; warn the user, but + * keep going. Our failure will be noticed either in + * the next iteration of the loop, or if this is the + * final delta, in the caller when we return NULL. + * Those code paths will take care of making a more + * explicit warning and retrying with another copy of + * the object. + */ + if (!data) + error("failed to apply delta"); + } /* - * We could not apply the delta; warn the user, but keep going. - * Our failure will be noticed either in the next iteration of - * the loop, or if this is the final delta, in the caller when - * we return NULL. Those code paths will take care of making - * a more explicit warning and retrying with another copy of - * the object. + * We delay adding `base` to the cache until the end of the loop + * because unpack_compressed_entry() momentarily releases the + * obj_read_mutex, giving another thread the chance to access + * the cache. Therefore, if `base` was already there, this other + * thread could free() it (e.g. to make space for another entry) + * before we are done using it. */ - if (!data) - error("failed to apply delta"); + if (!external_base) + add_delta_base_cache(p, base_obj_offset, base, base_size, type); free(delta_data); free(external_base); -- 2.28.0
next prev parent reply other threads:[~2020-09-28 16:51 UTC|newest] Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-09-25 2:36 RFC - concurrency causes segfault in git grep since 2.26.0 Phil Hord 2020-09-25 5:52 ` Matheus Tavares 2020-09-25 19:53 ` Phil Hord 2020-09-28 16:50 ` [PATCH 0/2] Fix race condition and memory leak in delta base cache Matheus Tavares 2020-09-28 16:50 ` Matheus Tavares [this message] 2020-09-28 18:05 ` [PATCH 1/2] packfile: fix race condition on unpack_entry() Junio C Hamano 2020-09-28 16:50 ` [PATCH 2/2] packfile: fix memory leak in add_delta_base_cache() Matheus Tavares 2020-09-28 18:22 ` Junio C Hamano 2020-09-29 0:01 ` [PATCH v2 0/2] Fix race condition and memory leak in delta base cache Matheus Tavares 2020-09-29 0:01 ` [PATCH v2 1/2] packfile: fix race condition on unpack_entry() Matheus Tavares 2020-10-02 20:06 ` Phil Hord 2020-09-29 0:01 ` [PATCH v2 2/2] packfile: fix memory leak in add_delta_base_cache() Matheus Tavares
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=42a7948f94cb57ebd9c37c3850b46b1ac9813ec6.1601311803.git.matheus.bernardino@usp.br \ --to=matheus.bernardino@usp.br \ --cc=dstolee@microsoft.com \ --cc=git@vger.kernel.org \ --cc=jonathantanmy@google.com \ --cc=phil.hord@gmail.com \ --cc=stefanbeller@gmail.com \ --subject='Re: [PATCH 1/2] packfile: fix race condition on unpack_entry()' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).