* [PATCH v3 01/19] convert: make convert_attrs() and convert structs public
2020-10-29 2:14 ` [PATCH v3 00/19] Parallel Checkout (part I) Matheus Tavares
@ 2020-10-29 2:14 ` Matheus Tavares
2020-10-29 23:40 ` Junio C Hamano
2020-10-29 2:14 ` [PATCH v3 02/19] convert: add [async_]convert_to_working_tree_ca() variants Matheus Tavares
` (20 subsequent siblings)
21 siblings, 1 reply; 154+ messages in thread
From: Matheus Tavares @ 2020-10-29 2:14 UTC (permalink / raw)
To: git
Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren,
Jeff Hostetler
From: Jeff Hostetler <jeffhost@microsoft.com>
Move convert_attrs() declaration from convert.c to convert.h, together
with the conv_attrs struct and the crlf_action enum. This function and
the data structures will be used outside convert.c in the upcoming
parallel checkout implementation.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
[matheus.bernardino: squash and reword msg]
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
convert.c | 23 ++---------------------
convert.h | 24 ++++++++++++++++++++++++
2 files changed, 26 insertions(+), 21 deletions(-)
diff --git a/convert.c b/convert.c
index ee360c2f07..eb14714979 100644
--- a/convert.c
+++ b/convert.c
@@ -24,17 +24,6 @@
#define CONVERT_STAT_BITS_TXT_CRLF 0x2
#define CONVERT_STAT_BITS_BIN 0x4
-enum crlf_action {
- CRLF_UNDEFINED,
- CRLF_BINARY,
- CRLF_TEXT,
- CRLF_TEXT_INPUT,
- CRLF_TEXT_CRLF,
- CRLF_AUTO,
- CRLF_AUTO_INPUT,
- CRLF_AUTO_CRLF
-};
-
struct text_stat {
/* NUL, CR, LF and CRLF counts */
unsigned nul, lonecr, lonelf, crlf;
@@ -1297,18 +1286,10 @@ static int git_path_check_ident(struct attr_check_item *check)
return !!ATTR_TRUE(value);
}
-struct conv_attrs {
- struct convert_driver *drv;
- enum crlf_action attr_action; /* What attr says */
- enum crlf_action crlf_action; /* When no attr is set, use core.autocrlf */
- int ident;
- const char *working_tree_encoding; /* Supported encoding or default encoding if NULL */
-};
-
static struct attr_check *check;
-static void convert_attrs(const struct index_state *istate,
- struct conv_attrs *ca, const char *path)
+void convert_attrs(const struct index_state *istate,
+ struct conv_attrs *ca, const char *path)
{
struct attr_check_item *ccheck = NULL;
diff --git a/convert.h b/convert.h
index e29d1026a6..aeb4a1be9a 100644
--- a/convert.h
+++ b/convert.h
@@ -37,6 +37,27 @@ enum eol {
#endif
};
+enum crlf_action {
+ CRLF_UNDEFINED,
+ CRLF_BINARY,
+ CRLF_TEXT,
+ CRLF_TEXT_INPUT,
+ CRLF_TEXT_CRLF,
+ CRLF_AUTO,
+ CRLF_AUTO_INPUT,
+ CRLF_AUTO_CRLF
+};
+
+struct convert_driver;
+
+struct conv_attrs {
+ struct convert_driver *drv;
+ enum crlf_action attr_action; /* What attr says */
+ enum crlf_action crlf_action; /* When no attr is set, use core.autocrlf */
+ int ident;
+ const char *working_tree_encoding; /* Supported encoding or default encoding if NULL */
+};
+
enum ce_delay_state {
CE_NO_DELAY = 0,
CE_CAN_DELAY = 1,
@@ -102,6 +123,9 @@ void convert_to_git_filter_fd(const struct index_state *istate,
int would_convert_to_git_filter_fd(const struct index_state *istate,
const char *path);
+void convert_attrs(const struct index_state *istate,
+ struct conv_attrs *ca, const char *path);
+
/*
* Initialize the checkout metadata with the given values. Any argument may be
* NULL if it is not applicable. The treeish should be a commit if that is
--
2.28.0
^ permalink raw reply related [flat|nested] 154+ messages in thread
* Re: [PATCH v3 01/19] convert: make convert_attrs() and convert structs public
2020-10-29 2:14 ` [PATCH v3 01/19] convert: make convert_attrs() and convert structs public Matheus Tavares
@ 2020-10-29 23:40 ` Junio C Hamano
2020-10-30 17:01 ` Matheus Tavares Bernardino
0 siblings, 1 reply; 154+ messages in thread
From: Junio C Hamano @ 2020-10-29 23:40 UTC (permalink / raw)
To: Matheus Tavares
Cc: git, git, chriscool, peff, newren, jrnieder, martin.agren,
Jeff Hostetler
Matheus Tavares <matheus.bernardino@usp.br> writes:
> diff --git a/convert.h b/convert.h
> index e29d1026a6..aeb4a1be9a 100644
> --- a/convert.h
> +++ b/convert.h
> @@ -37,6 +37,27 @@ enum eol {
> #endif
> };
>
> +enum crlf_action {
> + CRLF_UNDEFINED,
> + CRLF_BINARY,
> + CRLF_TEXT,
> + CRLF_TEXT_INPUT,
> + CRLF_TEXT_CRLF,
> + CRLF_AUTO,
> + CRLF_AUTO_INPUT,
> + CRLF_AUTO_CRLF
> +};
> +
> +struct convert_driver;
> +
> +struct conv_attrs {
> + struct convert_driver *drv;
> + enum crlf_action attr_action; /* What attr says */
> + enum crlf_action crlf_action; /* When no attr is set, use core.autocrlf */
> + int ident;
> + const char *working_tree_encoding; /* Supported encoding or default encoding if NULL */
> +};
> +
> enum ce_delay_state {
> CE_NO_DELAY = 0,
> CE_CAN_DELAY = 1,
> @@ -102,6 +123,9 @@ void convert_to_git_filter_fd(const struct index_state *istate,
> int would_convert_to_git_filter_fd(const struct index_state *istate,
> const char *path);
>
> +void convert_attrs(const struct index_state *istate,
> + struct conv_attrs *ca, const char *path);
> +
> /*
> * Initialize the checkout metadata with the given values. Any argument may be
> * NULL if it is not applicable. The treeish should be a commit if that is
The new global symbols are reasonable, I would think, with a
possible exception of "crlf_action", which may want to also have
"conv" or "convert" somewhere in its name.
* Re: [PATCH v3 01/19] convert: make convert_attrs() and convert structs public
2020-10-29 23:40 ` Junio C Hamano
@ 2020-10-30 17:01 ` Matheus Tavares Bernardino
2020-10-30 17:38 ` Junio C Hamano
0 siblings, 1 reply; 154+ messages in thread
From: Matheus Tavares Bernardino @ 2020-10-30 17:01 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Jeff Hostetler, Christian Couder, Jeff King, Elijah Newren,
Jonathan Nieder, Martin Ågren, Jeff Hostetler
On Thu, Oct 29, 2020 at 8:40 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matheus Tavares <matheus.bernardino@usp.br> writes:
>
> > diff --git a/convert.h b/convert.h
> > index e29d1026a6..aeb4a1be9a 100644
> > --- a/convert.h
> > +++ b/convert.h
> > @@ -37,6 +37,27 @@ enum eol {
> > #endif
> > };
> >
> > +enum crlf_action {
> > + CRLF_UNDEFINED,
> > + CRLF_BINARY,
> > + CRLF_TEXT,
> > + CRLF_TEXT_INPUT,
> > + CRLF_TEXT_CRLF,
> > + CRLF_AUTO,
> > + CRLF_AUTO_INPUT,
> > + CRLF_AUTO_CRLF
> > +};
> > +
> > +struct convert_driver;
> > +
> > +struct conv_attrs {
> > + struct convert_driver *drv;
> > + enum crlf_action attr_action; /* What attr says */
> > + enum crlf_action crlf_action; /* When no attr is set, use core.autocrlf */
> > + int ident;
> > + const char *working_tree_encoding; /* Supported encoding or default encoding if NULL */
> > +};
> > +
> > enum ce_delay_state {
> > CE_NO_DELAY = 0,
> > CE_CAN_DELAY = 1,
> > @@ -102,6 +123,9 @@ void convert_to_git_filter_fd(const struct index_state *istate,
> > int would_convert_to_git_filter_fd(const struct index_state *istate,
> > const char *path);
> >
> > +void convert_attrs(const struct index_state *istate,
> > + struct conv_attrs *ca, const char *path);
> > +
> > /*
> > * Initialize the checkout metadata with the given values. Any argument may be
> > * NULL if it is not applicable. The treeish should be a commit if that is
>
> The new global symbols are reasonable, I would think, with a
> possible exception of "crlf_action", which may want to also have
> "conv" or "convert" somewhere in its name.
OK. Maybe `enum crlf_conv_action`? In this case, should I also change
the prefix of the enum values? I'm not sure if it's worth it, though,
since there are about 52 occurrences of them.
* Re: [PATCH v3 01/19] convert: make convert_attrs() and convert structs public
2020-10-30 17:01 ` Matheus Tavares Bernardino
@ 2020-10-30 17:38 ` Junio C Hamano
0 siblings, 0 replies; 154+ messages in thread
From: Junio C Hamano @ 2020-10-30 17:38 UTC (permalink / raw)
To: Matheus Tavares Bernardino
Cc: git, Jeff Hostetler, Christian Couder, Jeff King, Elijah Newren,
Jonathan Nieder, Martin Ågren, Jeff Hostetler
Matheus Tavares Bernardino <matheus.bernardino@usp.br> writes:
>> > +enum crlf_action {
>> > + CRLF_UNDEFINED,
>> > + CRLF_BINARY,
>> > + CRLF_TEXT,
>> > + CRLF_TEXT_INPUT,
>> > + CRLF_TEXT_CRLF,
>> > + CRLF_AUTO,
>> > + CRLF_AUTO_INPUT,
>> > + CRLF_AUTO_CRLF
>> > +};
>> > +
>> > +struct convert_driver;
>> > +
>> > +struct conv_attrs {
>> > + struct convert_driver *drv;
>> ...
>> > +void convert_attrs(const struct index_state *istate,
>> > + struct conv_attrs *ca, const char *path);
>> > +
>> > /*
>> > * Initialize the checkout metadata with the given values. Any argument may be
>> > * NULL if it is not applicable. The treeish should be a commit if that is
>>
>> The new global symbols are reasonable, I would think, with a
>> possible exception of "crlf_action", which may want to also have
>> "conv" or "convert" somewhere in its name.
>
> OK. Maybe `enum crlf_conv_action`? In this case, should I also change
Either that, or "conv_crlf_action" (or even use the fully spelled
"convert_" as the prefix common to the global symbols from the
subsystem).
> In this case, should I also change
> the prefix of the enum values? I'm not sure if it's worth it, though,
> since there are about 52 occurrences of them.
At the use sites, these constants will be passed to, or compared
with values returned by, the API functions whose names make it clear
that they are from the "convert_" family, so I think it is OK to
leave the values as-is, as long as there is no unrelated symbol
whose name starts with "CRLF_" (and "git grep '^CRLF_'" tells me
that there is not any).
Thanks.
* [PATCH v3 02/19] convert: add [async_]convert_to_working_tree_ca() variants
2020-10-29 2:14 ` [PATCH v3 00/19] Parallel Checkout (part I) Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 01/19] convert: make convert_attrs() and convert structs public Matheus Tavares
@ 2020-10-29 2:14 ` Matheus Tavares
2020-10-29 23:48 ` Junio C Hamano
2020-10-29 2:14 ` [PATCH v3 03/19] convert: add get_stream_filter_ca() variant Matheus Tavares
` (19 subsequent siblings)
21 siblings, 1 reply; 154+ messages in thread
From: Matheus Tavares @ 2020-10-29 2:14 UTC (permalink / raw)
To: git
Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren,
Jeff Hostetler
From: Jeff Hostetler <jeffhost@microsoft.com>
Separate the attribute gathering from the actual conversion by adding
_ca() variants of the conversion functions. These variants receive a
precomputed 'struct conv_attrs', and thus do not rely on an index state.
They will be used in a future patch adding parallel checkout support,
for two reasons:
- We will already load the conversion attributes in checkout_entry(),
before conversion, to decide whether a path is eligible for parallel
checkout. Therefore, it would be wasteful to load them again later,
for the actual conversion.
- The parallel workers will be responsible for reading, converting and
writing blobs to the working tree. They won't have access to the main
process' index state, so they cannot load the attributes. Instead,
they will receive the preloaded ones and call the _ca() variant of
the conversion functions. Furthermore, the attributes machinery is
optimized to handle paths in sequential order, so it's better to leave
it for the main process, anyway.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
[matheus.bernardino: squash, remove one function definition and reword]
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
convert.c | 50 ++++++++++++++++++++++++++++++++++++--------------
convert.h | 9 +++++++++
2 files changed, 45 insertions(+), 14 deletions(-)
diff --git a/convert.c b/convert.c
index eb14714979..191a42a0ae 100644
--- a/convert.c
+++ b/convert.c
@@ -1447,7 +1447,7 @@ void convert_to_git_filter_fd(const struct index_state *istate,
ident_to_git(dst->buf, dst->len, dst, ca.ident);
}
-static int convert_to_working_tree_internal(const struct index_state *istate,
+static int convert_to_working_tree_internal(const struct conv_attrs *ca,
const char *path, const char *src,
size_t len, struct strbuf *dst,
int normalizing,
@@ -1455,11 +1455,8 @@ static int convert_to_working_tree_internal(const struct index_state *istate,
struct delayed_checkout *dco)
{
int ret = 0, ret_filter = 0;
- struct conv_attrs ca;
-
- convert_attrs(istate, &ca, path);
- ret |= ident_to_worktree(src, len, dst, ca.ident);
+ ret |= ident_to_worktree(src, len, dst, ca->ident);
if (ret) {
src = dst->buf;
len = dst->len;
@@ -1469,24 +1466,24 @@ static int convert_to_working_tree_internal(const struct index_state *istate,
* is a smudge or process filter (even if the process filter doesn't
* support smudge). The filters might expect CRLFs.
*/
- if ((ca.drv && (ca.drv->smudge || ca.drv->process)) || !normalizing) {
- ret |= crlf_to_worktree(src, len, dst, ca.crlf_action);
+ if ((ca->drv && (ca->drv->smudge || ca->drv->process)) || !normalizing) {
+ ret |= crlf_to_worktree(src, len, dst, ca->crlf_action);
if (ret) {
src = dst->buf;
len = dst->len;
}
}
- ret |= encode_to_worktree(path, src, len, dst, ca.working_tree_encoding);
+ ret |= encode_to_worktree(path, src, len, dst, ca->working_tree_encoding);
if (ret) {
src = dst->buf;
len = dst->len;
}
ret_filter = apply_filter(
- path, src, len, -1, dst, ca.drv, CAP_SMUDGE, meta, dco);
- if (!ret_filter && ca.drv && ca.drv->required)
- die(_("%s: smudge filter %s failed"), path, ca.drv->name);
+ path, src, len, -1, dst, ca->drv, CAP_SMUDGE, meta, dco);
+ if (!ret_filter && ca->drv && ca->drv->required)
+ die(_("%s: smudge filter %s failed"), path, ca->drv->name);
return ret | ret_filter;
}
@@ -1497,7 +1494,9 @@ int async_convert_to_working_tree(const struct index_state *istate,
const struct checkout_metadata *meta,
void *dco)
{
- return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, dco);
+ struct conv_attrs ca;
+ convert_attrs(istate, &ca, path);
+ return convert_to_working_tree_internal(&ca, path, src, len, dst, 0, meta, dco);
}
int convert_to_working_tree(const struct index_state *istate,
@@ -1505,13 +1504,36 @@ int convert_to_working_tree(const struct index_state *istate,
size_t len, struct strbuf *dst,
const struct checkout_metadata *meta)
{
- return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, NULL);
+ struct conv_attrs ca;
+ convert_attrs(istate, &ca, path);
+ return convert_to_working_tree_internal(&ca, path, src, len, dst, 0, meta, NULL);
+}
+
+int async_convert_to_working_tree_ca(const struct conv_attrs *ca,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta,
+ void *dco)
+{
+ return convert_to_working_tree_internal(ca, path, src, len, dst, 0, meta, dco);
+}
+
+int convert_to_working_tree_ca(const struct conv_attrs *ca,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta)
+{
+ return convert_to_working_tree_internal(ca, path, src, len, dst, 0, meta, NULL);
}
int renormalize_buffer(const struct index_state *istate, const char *path,
const char *src, size_t len, struct strbuf *dst)
{
- int ret = convert_to_working_tree_internal(istate, path, src, len, dst, 1, NULL, NULL);
+ struct conv_attrs ca;
+ int ret;
+
+ convert_attrs(istate, &ca, path);
+ ret = convert_to_working_tree_internal(&ca, path, src, len, dst, 1, NULL, NULL);
if (ret) {
src = dst->buf;
len = dst->len;
diff --git a/convert.h b/convert.h
index aeb4a1be9a..46d537d1ae 100644
--- a/convert.h
+++ b/convert.h
@@ -100,11 +100,20 @@ int convert_to_working_tree(const struct index_state *istate,
const char *path, const char *src,
size_t len, struct strbuf *dst,
const struct checkout_metadata *meta);
+int convert_to_working_tree_ca(const struct conv_attrs *ca,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta);
int async_convert_to_working_tree(const struct index_state *istate,
const char *path, const char *src,
size_t len, struct strbuf *dst,
const struct checkout_metadata *meta,
void *dco);
+int async_convert_to_working_tree_ca(const struct conv_attrs *ca,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta,
+ void *dco);
int async_query_available_blobs(const char *cmd,
struct string_list *available_paths);
int renormalize_buffer(const struct index_state *istate,
--
2.28.0
* Re: [PATCH v3 02/19] convert: add [async_]convert_to_working_tree_ca() variants
2020-10-29 2:14 ` [PATCH v3 02/19] convert: add [async_]convert_to_working_tree_ca() variants Matheus Tavares
@ 2020-10-29 23:48 ` Junio C Hamano
0 siblings, 0 replies; 154+ messages in thread
From: Junio C Hamano @ 2020-10-29 23:48 UTC (permalink / raw)
To: Matheus Tavares
Cc: git, git, chriscool, peff, newren, jrnieder, martin.agren,
Jeff Hostetler
Matheus Tavares <matheus.bernardino@usp.br> writes:
> -static int convert_to_working_tree_internal(const struct index_state *istate,
> +static int convert_to_working_tree_internal(const struct conv_attrs *ca,
Makes sense. Once we know conv_attrs, we do not need the istate to
convert the contents.
> @@ -1497,7 +1494,9 @@ int async_convert_to_working_tree(const struct index_state *istate,
> const struct checkout_metadata *meta,
> void *dco)
> {
> - return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, dco);
> + struct conv_attrs ca;
> + convert_attrs(istate, &ca, path);
> + return convert_to_working_tree_internal(&ca, path, src, len, dst, 0, meta, dco);
> }
>
> @@ -1505,13 +1504,36 @@ int convert_to_working_tree(const struct index_state *istate,
> size_t len, struct strbuf *dst,
> const struct checkout_metadata *meta)
> {
> - return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, NULL);
> + struct conv_attrs ca;
> + convert_attrs(istate, &ca, path);
> + return convert_to_working_tree_internal(&ca, path, src, len, dst, 0, meta, NULL);
> +}
OK, these naturally implement "let's lift convert_attrs() out of the
callee and move it to the callers". However...
> +int async_convert_to_working_tree_ca(const struct conv_attrs *ca,
> + const char *path, const char *src,
> + size_t len, struct strbuf *dst,
> + const struct checkout_metadata *meta,
> + void *dco)
> +{
> + return convert_to_working_tree_internal(ca, path, src, len, dst, 0, meta, dco);
> +}
> +
> +int convert_to_working_tree_ca(const struct conv_attrs *ca,
> + const char *path, const char *src,
> + size_t len, struct strbuf *dst,
> + const struct checkout_metadata *meta)
> +{
> + return convert_to_working_tree_internal(ca, path, src, len, dst, 0, meta, NULL);
> }
... shouldn't they be implemented as thin wrappers around these new
*_ca() variants of the API functions? Otherwise, the *_ca()
variants are not used by anybody yet at this step, are they?
* [PATCH v3 03/19] convert: add get_stream_filter_ca() variant
2020-10-29 2:14 ` [PATCH v3 00/19] Parallel Checkout (part I) Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 01/19] convert: make convert_attrs() and convert structs public Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 02/19] convert: add [async_]convert_to_working_tree_ca() variants Matheus Tavares
@ 2020-10-29 2:14 ` Matheus Tavares
2020-10-29 23:51 ` Junio C Hamano
2020-10-29 2:14 ` [PATCH v3 04/19] convert: add conv_attrs classification Matheus Tavares
` (18 subsequent siblings)
21 siblings, 1 reply; 154+ messages in thread
From: Matheus Tavares @ 2020-10-29 2:14 UTC (permalink / raw)
To: git
Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren,
Jeff Hostetler
From: Jeff Hostetler <jeffhost@microsoft.com>
As in the previous patch, we will also need to call get_stream_filter()
with a precomputed `struct conv_attrs`, when we add support for parallel
checkout workers. So add the _ca() variant which takes the conversion
attributes struct as a parameter.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
[matheus.bernardino: move header comment to ca() variant and reword msg]
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
convert.c | 28 +++++++++++++++++-----------
convert.h | 2 ++
2 files changed, 19 insertions(+), 11 deletions(-)
diff --git a/convert.c b/convert.c
index 191a42a0ae..bd4d3f01cd 100644
--- a/convert.c
+++ b/convert.c
@@ -1960,34 +1960,31 @@ static struct stream_filter *ident_filter(const struct object_id *oid)
}
/*
- * Return an appropriately constructed filter for the path, or NULL if
+ * Return an appropriately constructed filter for the given ca, or NULL if
* the contents cannot be filtered without reading the whole thing
* in-core.
*
* Note that you would be crazy to set CRLF, smudge/clean or ident to a
* large binary blob you would want us not to slurp into the memory!
*/
-struct stream_filter *get_stream_filter(const struct index_state *istate,
- const char *path,
- const struct object_id *oid)
+struct stream_filter *get_stream_filter_ca(const struct conv_attrs *ca,
+ const struct object_id *oid)
{
- struct conv_attrs ca;
struct stream_filter *filter = NULL;
- convert_attrs(istate, &ca, path);
- if (ca.drv && (ca.drv->process || ca.drv->smudge || ca.drv->clean))
+ if (ca->drv && (ca->drv->process || ca->drv->smudge || ca->drv->clean))
return NULL;
- if (ca.working_tree_encoding)
+ if (ca->working_tree_encoding)
return NULL;
- if (ca.crlf_action == CRLF_AUTO || ca.crlf_action == CRLF_AUTO_CRLF)
+ if (ca->crlf_action == CRLF_AUTO || ca->crlf_action == CRLF_AUTO_CRLF)
return NULL;
- if (ca.ident)
+ if (ca->ident)
filter = ident_filter(oid);
- if (output_eol(ca.crlf_action) == EOL_CRLF)
+ if (output_eol(ca->crlf_action) == EOL_CRLF)
filter = cascade_filter(filter, lf_to_crlf_filter());
else
filter = cascade_filter(filter, &null_filter_singleton);
@@ -1995,6 +1992,15 @@ struct stream_filter *get_stream_filter(const struct index_state *istate,
return filter;
}
+struct stream_filter *get_stream_filter(const struct index_state *istate,
+ const char *path,
+ const struct object_id *oid)
+{
+ struct conv_attrs ca;
+ convert_attrs(istate, &ca, path);
+ return get_stream_filter_ca(&ca, oid);
+}
+
void free_stream_filter(struct stream_filter *filter)
{
filter->vtbl->free(filter);
diff --git a/convert.h b/convert.h
index 46d537d1ae..262c1a1d46 100644
--- a/convert.h
+++ b/convert.h
@@ -169,6 +169,8 @@ struct stream_filter; /* opaque */
struct stream_filter *get_stream_filter(const struct index_state *istate,
const char *path,
const struct object_id *);
+struct stream_filter *get_stream_filter_ca(const struct conv_attrs *ca,
+ const struct object_id *oid);
void free_stream_filter(struct stream_filter *);
int is_null_stream_filter(struct stream_filter *);
--
2.28.0
* Re: [PATCH v3 03/19] convert: add get_stream_filter_ca() variant
2020-10-29 2:14 ` [PATCH v3 03/19] convert: add get_stream_filter_ca() variant Matheus Tavares
@ 2020-10-29 23:51 ` Junio C Hamano
0 siblings, 0 replies; 154+ messages in thread
From: Junio C Hamano @ 2020-10-29 23:51 UTC (permalink / raw)
To: Matheus Tavares
Cc: git, git, chriscool, peff, newren, jrnieder, martin.agren,
Jeff Hostetler
Matheus Tavares <matheus.bernardino@usp.br> writes:
> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> As in the previous patch, we will also need to call get_stream_filter()
> with a precomputed `struct conv_attrs`, when we add support for parallel
> checkout workers. So add the _ca() variant which takes the conversion
> attributes struct as a parameter.
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> [matheus.bernardino: move header comment to ca() variant and reword msg]
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
> convert.c | 28 +++++++++++++++++-----------
> convert.h | 2 ++
> 2 files changed, 19 insertions(+), 11 deletions(-)
Same idea as 02/19, which is sound.
It makes readers wonder, though, why this one is separate while
convert_to_working_tree(), async_convert_to_working_tree(), and
renormalize_buffer() were all handled in a single step in one patch.
* [PATCH v3 04/19] convert: add conv_attrs classification
2020-10-29 2:14 ` [PATCH v3 00/19] Parallel Checkout (part I) Matheus Tavares
` (2 preceding siblings ...)
2020-10-29 2:14 ` [PATCH v3 03/19] convert: add get_stream_filter_ca() variant Matheus Tavares
@ 2020-10-29 2:14 ` Matheus Tavares
2020-10-29 23:53 ` Junio C Hamano
2020-10-29 2:14 ` [PATCH v3 05/19] entry: extract a header file for entry.c functions Matheus Tavares
` (17 subsequent siblings)
21 siblings, 1 reply; 154+ messages in thread
From: Matheus Tavares @ 2020-10-29 2:14 UTC (permalink / raw)
To: git
Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren,
Jeff Hostetler
From: Jeff Hostetler <jeffhost@microsoft.com>
Create `enum conv_attrs_classification` to express the different ways
that attributes are handled for a blob during checkout.
This will be used in a later commit when deciding whether to add a file
to the parallel or delayed queue during checkout. For now, we can also
use it in get_stream_filter_ca() to simplify the function (as the
classifying logic is the same).
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
[matheus.bernardino: use classification in get_stream_filter_ca()]
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
convert.c | 26 +++++++++++++++++++-------
convert.h | 33 +++++++++++++++++++++++++++++++++
2 files changed, 52 insertions(+), 7 deletions(-)
diff --git a/convert.c b/convert.c
index bd4d3f01cd..c0b45149b5 100644
--- a/convert.c
+++ b/convert.c
@@ -1972,13 +1972,7 @@ struct stream_filter *get_stream_filter_ca(const struct conv_attrs *ca,
{
struct stream_filter *filter = NULL;
- if (ca->drv && (ca->drv->process || ca->drv->smudge || ca->drv->clean))
- return NULL;
-
- if (ca->working_tree_encoding)
- return NULL;
-
- if (ca->crlf_action == CRLF_AUTO || ca->crlf_action == CRLF_AUTO_CRLF)
+ if (classify_conv_attrs(ca) != CA_CLASS_STREAMABLE)
return NULL;
if (ca->ident)
@@ -2034,3 +2028,21 @@ void clone_checkout_metadata(struct checkout_metadata *dst,
if (blob)
oidcpy(&dst->blob, blob);
}
+
+enum conv_attrs_classification classify_conv_attrs(const struct conv_attrs *ca)
+{
+ if (ca->drv) {
+ if (ca->drv->process)
+ return CA_CLASS_INCORE_PROCESS;
+ if (ca->drv->smudge || ca->drv->clean)
+ return CA_CLASS_INCORE_FILTER;
+ }
+
+ if (ca->working_tree_encoding)
+ return CA_CLASS_INCORE;
+
+ if (ca->crlf_action == CRLF_AUTO || ca->crlf_action == CRLF_AUTO_CRLF)
+ return CA_CLASS_INCORE;
+
+ return CA_CLASS_STREAMABLE;
+}
diff --git a/convert.h b/convert.h
index 262c1a1d46..523ba9b140 100644
--- a/convert.h
+++ b/convert.h
@@ -190,4 +190,37 @@ int stream_filter(struct stream_filter *,
const char *input, size_t *isize_p,
char *output, size_t *osize_p);
+enum conv_attrs_classification {
+ /*
+ * The blob must be loaded into a buffer before it can be
+ * smudged. All smudging is done in-proc.
+ */
+ CA_CLASS_INCORE,
+
+ /*
+ * The blob must be loaded into a buffer, but uses a
+ * single-file driver filter, such as rot13.
+ */
+ CA_CLASS_INCORE_FILTER,
+
+ /*
+ * The blob must be loaded into a buffer, but uses a
+ * long-running driver process, such as LFS. This might or
+ * might not use delayed operations. (The important thing is
+ * that there is a single subordinate long-running process
+ * handling all associated blobs and in case of delayed
+ * operations, may hold per-blob state.)
+ */
+ CA_CLASS_INCORE_PROCESS,
+
+ /*
+ * The blob can be streamed and smudged without needing to
+ * completely read it into a buffer.
+ */
+ CA_CLASS_STREAMABLE,
+};
+
+enum conv_attrs_classification classify_conv_attrs(
+ const struct conv_attrs *ca);
+
#endif /* CONVERT_H */
--
2.28.0
* Re: [PATCH v3 04/19] convert: add conv_attrs classification
2020-10-29 2:14 ` [PATCH v3 04/19] convert: add conv_attrs classification Matheus Tavares
@ 2020-10-29 23:53 ` Junio C Hamano
0 siblings, 0 replies; 154+ messages in thread
From: Junio C Hamano @ 2020-10-29 23:53 UTC (permalink / raw)
To: Matheus Tavares
Cc: git, git, chriscool, peff, newren, jrnieder, martin.agren,
Jeff Hostetler
Matheus Tavares <matheus.bernardino@usp.br> writes:
> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Create `enum conv_attrs_classification` to express the different ways
> that attributes are handled for a blob during checkout.
>
> This will be used in a later commit when deciding whether to add a file
> to the parallel or delayed queue during checkout. For now, we can also
> use it in get_stream_filter_ca() to simplify the function (as the
> classifying logic is the same).
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> [matheus.bernardino: use classification in get_stream_filter_ca()]
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
> convert.c | 26 +++++++++++++++++++-------
> convert.h | 33 +++++++++++++++++++++++++++++++++
> 2 files changed, 52 insertions(+), 7 deletions(-)
Yup, having an actual user of the new layer of abstraction in the
same patch makes it more easily understandable. If only the new
function and enum were presented without anybody using them, it would
have been much harder to swallow, without visible and immediate
benefit.
Looking good.
* [PATCH v3 05/19] entry: extract a header file for entry.c functions
2020-10-29 2:14 ` [PATCH v3 00/19] Parallel Checkout (part I) Matheus Tavares
` (3 preceding siblings ...)
2020-10-29 2:14 ` [PATCH v3 04/19] convert: add conv_attrs classification Matheus Tavares
@ 2020-10-29 2:14 ` Matheus Tavares
2020-10-30 21:36 ` Junio C Hamano
2020-10-29 2:14 ` [PATCH v3 06/19] entry: make fstat_output() and read_blob_entry() public Matheus Tavares
` (16 subsequent siblings)
21 siblings, 1 reply; 154+ messages in thread
From: Matheus Tavares @ 2020-10-29 2:14 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
The declarations of entry.c's public functions and structures currently
reside in cache.h. Although not many, they contribute to the size of
cache.h and, when changed, cause the unnecessary recompilation of
modules that don't really use these functions. So let's move them to a
new entry.h header.
Original-patch-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
apply.c | 1 +
builtin/checkout-index.c | 1 +
builtin/checkout.c | 1 +
builtin/difftool.c | 1 +
cache.h | 24 -----------------------
entry.c | 9 +--------
entry.h | 41 ++++++++++++++++++++++++++++++++++++++++
unpack-trees.c | 1 +
8 files changed, 47 insertions(+), 32 deletions(-)
create mode 100644 entry.h
diff --git a/apply.c b/apply.c
index 76dba93c97..ddec80b4b0 100644
--- a/apply.c
+++ b/apply.c
@@ -21,6 +21,7 @@
#include "quote.h"
#include "rerere.h"
#include "apply.h"
+#include "entry.h"
struct gitdiff_data {
struct strbuf *root;
diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c
index 4bbfc92dce..9276ed0258 100644
--- a/builtin/checkout-index.c
+++ b/builtin/checkout-index.c
@@ -11,6 +11,7 @@
#include "quote.h"
#include "cache-tree.h"
#include "parse-options.h"
+#include "entry.h"
#define CHECKOUT_ALL 4
static int nul_term_line;
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 0951f8fee5..b18b9d6f3c 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -26,6 +26,7 @@
#include "unpack-trees.h"
#include "wt-status.h"
#include "xdiff-interface.h"
+#include "entry.h"
static const char * const checkout_usage[] = {
N_("git checkout [<options>] <branch>"),
diff --git a/builtin/difftool.c b/builtin/difftool.c
index 7ac432b881..dfa22b67eb 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -23,6 +23,7 @@
#include "lockfile.h"
#include "object-store.h"
#include "dir.h"
+#include "entry.h"
static int trust_exit_code;
diff --git a/cache.h b/cache.h
index c0072d43b1..ccfeb9ba2b 100644
--- a/cache.h
+++ b/cache.h
@@ -1706,30 +1706,6 @@ const char *show_ident_date(const struct ident_split *id,
*/
int ident_cmp(const struct ident_split *, const struct ident_split *);
-struct checkout {
- struct index_state *istate;
- const char *base_dir;
- int base_dir_len;
- struct delayed_checkout *delayed_checkout;
- struct checkout_metadata meta;
- unsigned force:1,
- quiet:1,
- not_new:1,
- clone:1,
- refresh_cache:1;
-};
-#define CHECKOUT_INIT { NULL, "" }
-
-#define TEMPORARY_FILENAME_LENGTH 25
-int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath, int *nr_checkouts);
-void enable_delayed_checkout(struct checkout *state);
-int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
-/*
- * Unlink the last component and schedule the leading directories for
- * removal, such that empty directories get removed.
- */
-void unlink_entry(const struct cache_entry *ce);
-
struct cache_def {
struct strbuf path;
int flags;
diff --git a/entry.c b/entry.c
index a0532f1f00..b0b8099699 100644
--- a/entry.c
+++ b/entry.c
@@ -6,6 +6,7 @@
#include "submodule.h"
#include "progress.h"
#include "fsmonitor.h"
+#include "entry.h"
static void create_directories(const char *path, int path_len,
const struct checkout *state)
@@ -429,14 +430,6 @@ static void mark_colliding_entries(const struct checkout *state,
}
}
-/*
- * Write the contents from ce out to the working tree.
- *
- * When topath[] is not NULL, instead of writing to the working tree
- * file named by ce, a temporary file is created by this function and
- * its name is returned in topath[], which must be able to hold at
- * least TEMPORARY_FILENAME_LENGTH bytes long.
- */
int checkout_entry(struct cache_entry *ce, const struct checkout *state,
char *topath, int *nr_checkouts)
{
diff --git a/entry.h b/entry.h
new file mode 100644
index 0000000000..2d69185448
--- /dev/null
+++ b/entry.h
@@ -0,0 +1,41 @@
+#ifndef ENTRY_H
+#define ENTRY_H
+
+#include "cache.h"
+#include "convert.h"
+
+struct checkout {
+ struct index_state *istate;
+ const char *base_dir;
+ int base_dir_len;
+ struct delayed_checkout *delayed_checkout;
+ struct checkout_metadata meta;
+ unsigned force:1,
+ quiet:1,
+ not_new:1,
+ clone:1,
+ refresh_cache:1;
+};
+#define CHECKOUT_INIT { NULL, "" }
+
+#define TEMPORARY_FILENAME_LENGTH 25
+
+/*
+ * Write the contents from ce out to the working tree.
+ *
+ * When topath[] is not NULL, instead of writing to the working tree
+ * file named by ce, a temporary file is created by this function and
+ * its name is returned in topath[], which must be able to hold at
+ * least TEMPORARY_FILENAME_LENGTH bytes long.
+ */
+int checkout_entry(struct cache_entry *ce, const struct checkout *state,
+ char *topath, int *nr_checkouts);
+void enable_delayed_checkout(struct checkout *state);
+int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
+/*
+ * Unlink the last component and schedule the leading directories for
+ * removal, such that empty directories get removed.
+ */
+void unlink_entry(const struct cache_entry *ce);
+
+#endif /* ENTRY_H */
diff --git a/unpack-trees.c b/unpack-trees.c
index 323280dd48..a511fadd89 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -16,6 +16,7 @@
#include "fsmonitor.h"
#include "object-store.h"
#include "promisor-remote.h"
+#include "entry.h"
/*
* Error messages expected by scripts out of plumbing commands such as
--
2.28.0
^ permalink raw reply related [flat|nested] 154+ messages in thread
* Re: [PATCH v3 05/19] entry: extract a header file for entry.c functions
2020-10-29 2:14 ` [PATCH v3 05/19] entry: extract a header file for entry.c functions Matheus Tavares
@ 2020-10-30 21:36 ` Junio C Hamano
0 siblings, 0 replies; 154+ messages in thread
From: Junio C Hamano @ 2020-10-30 21:36 UTC (permalink / raw)
To: Matheus Tavares; +Cc: git, git, chriscool, peff, newren, jrnieder, martin.agren
Matheus Tavares <matheus.bernardino@usp.br> writes:
> The declarations of entry.c's public functions and structures currently
> reside in cache.h. Although not many, they contribute to the size of
> cache.h and, when changed, cause the unnecessary recompilation of
> modules that don't really use these functions. So let's move them to a
> new entry.h header.
Good idea. This is mostly moving things around, so there are only a
few minor nits.
> diff --git a/entry.h b/entry.h
> new file mode 100644
> index 0000000000..2d69185448
> --- /dev/null
> +++ b/entry.h
> @@ -0,0 +1,41 @@
> +#ifndef ENTRY_H
> +#define ENTRY_H
> +
> +#include "cache.h"
> +#include "convert.h"
> +
> +struct checkout {
> + struct index_state *istate;
> + const char *base_dir;
> + int base_dir_len;
> + struct delayed_checkout *delayed_checkout;
> + struct checkout_metadata meta;
> + unsigned force:1,
> + quiet:1,
> + not_new:1,
> + clone:1,
> + refresh_cache:1;
> +};
> +#define CHECKOUT_INIT { NULL, "" }
> +
It makes sense to have a blank here, like you did, as we just
completed the definition of "struct checkout" and things directly
related to it.
> +#define TEMPORARY_FILENAME_LENGTH 25
> +
> +/*
> + * Write the contents from ce out to the working tree.
> + *
> + * When topath[] is not NULL, instead of writing to the working tree
> + * file named by ce, a temporary file is created by this function and
> + * its name is returned in topath[], which must be able to hold at
> + * least TEMPORARY_FILENAME_LENGTH bytes long.
> + */
> +int checkout_entry(struct cache_entry *ce, const struct checkout *state,
> + char *topath, int *nr_checkouts);
The comment before the above block applies to both the function and
to the TEMPORARY_FILENAME_LENGTH preprocessor macro. And this is
where we conclude the definition related to the function so it is a
good idea to have a blank line here....
> +void enable_delayed_checkout(struct checkout *state);
> +int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
....and here, as we have finished talking about the "delayed" stuff.
> +/*
> + * Unlink the last component and schedule the leading directories for
> + * removal, such that empty directories get removed.
> + */
> +void unlink_entry(const struct cache_entry *ce);
> +
> +#endif /* ENTRY_H */
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 323280dd48..a511fadd89 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -16,6 +16,7 @@
> #include "fsmonitor.h"
> #include "object-store.h"
> #include "promisor-remote.h"
> +#include "entry.h"
>
> /*
> * Error messages expected by scripts out of plumbing commands such as
* [PATCH v3 06/19] entry: make fstat_output() and read_blob_entry() public
2020-10-29 2:14 ` [PATCH v3 00/19] Parallel Checkout (part I) Matheus Tavares
` (4 preceding siblings ...)
2020-10-29 2:14 ` [PATCH v3 05/19] entry: extract a header file for entry.c functions Matheus Tavares
@ 2020-10-29 2:14 ` Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 07/19] entry: extract cache_entry update from write_entry() Matheus Tavares
` (15 subsequent siblings)
21 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-10-29 2:14 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
These two functions will be used by the parallel checkout code, so let's
make them public. Note: now that it has become public, fstat_output() is
renamed to fstat_checkout_output() to avoid future name collisions.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
entry.c | 8 ++++----
entry.h | 2 ++
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/entry.c b/entry.c
index b0b8099699..b36071a610 100644
--- a/entry.c
+++ b/entry.c
@@ -84,7 +84,7 @@ static int create_file(const char *path, unsigned int mode)
return open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
}
-static void *read_blob_entry(const struct cache_entry *ce, unsigned long *size)
+void *read_blob_entry(const struct cache_entry *ce, unsigned long *size)
{
enum object_type type;
void *blob_data = read_object_file(&ce->oid, &type, size);
@@ -109,7 +109,7 @@ static int open_output_fd(char *path, const struct cache_entry *ce, int to_tempf
}
}
-static int fstat_output(int fd, const struct checkout *state, struct stat *st)
+int fstat_checkout_output(int fd, const struct checkout *state, struct stat *st)
{
/* use fstat() only when path == ce->name */
if (fstat_is_reliable() &&
@@ -132,7 +132,7 @@ static int streaming_write_entry(const struct cache_entry *ce, char *path,
return -1;
result |= stream_blob_to_fd(fd, &ce->oid, filter, 1);
- *fstat_done = fstat_output(fd, state, statbuf);
+ *fstat_done = fstat_checkout_output(fd, state, statbuf);
result |= close(fd);
if (result)
@@ -346,7 +346,7 @@ static int write_entry(struct cache_entry *ce,
wrote = write_in_full(fd, new_blob, size);
if (!to_tempfile)
- fstat_done = fstat_output(fd, state, &st);
+ fstat_done = fstat_checkout_output(fd, state, &st);
close(fd);
free(new_blob);
if (wrote < 0)
diff --git a/entry.h b/entry.h
index 2d69185448..f860e60846 100644
--- a/entry.h
+++ b/entry.h
@@ -37,5 +37,7 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
* removal, such that empty directories get removed.
*/
void unlink_entry(const struct cache_entry *ce);
+void *read_blob_entry(const struct cache_entry *ce, unsigned long *size);
+int fstat_checkout_output(int fd, const struct checkout *state, struct stat *st);
#endif /* ENTRY_H */
--
2.28.0
* [PATCH v3 07/19] entry: extract cache_entry update from write_entry()
2020-10-29 2:14 ` [PATCH v3 00/19] Parallel Checkout (part I) Matheus Tavares
` (5 preceding siblings ...)
2020-10-29 2:14 ` [PATCH v3 06/19] entry: make fstat_output() and read_blob_entry() public Matheus Tavares
@ 2020-10-29 2:14 ` Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 08/19] entry: move conv_attrs lookup up to checkout_entry() Matheus Tavares
` (14 subsequent siblings)
21 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-10-29 2:14 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
This code will be used by the parallel checkout functions, outside
entry.c, so extract it to a public function.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
entry.c | 25 ++++++++++++++++---------
entry.h | 2 ++
2 files changed, 18 insertions(+), 9 deletions(-)
diff --git a/entry.c b/entry.c
index b36071a610..1d2df188e5 100644
--- a/entry.c
+++ b/entry.c
@@ -251,6 +251,18 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts)
return errs;
}
+void update_ce_after_write(const struct checkout *state, struct cache_entry *ce,
+ struct stat *st)
+{
+ if (state->refresh_cache) {
+ assert(state->istate);
+ fill_stat_cache_info(state->istate, ce, st);
+ ce->ce_flags |= CE_UPDATE_IN_BASE;
+ mark_fsmonitor_invalid(state->istate, ce);
+ state->istate->cache_changed |= CE_ENTRY_CHANGED;
+ }
+}
+
static int write_entry(struct cache_entry *ce,
char *path, const struct checkout *state, int to_tempfile)
{
@@ -371,15 +383,10 @@ static int write_entry(struct cache_entry *ce,
finish:
if (state->refresh_cache) {
- assert(state->istate);
- if (!fstat_done)
- if (lstat(ce->name, &st) < 0)
- return error_errno("unable to stat just-written file %s",
- ce->name);
- fill_stat_cache_info(state->istate, ce, &st);
- ce->ce_flags |= CE_UPDATE_IN_BASE;
- mark_fsmonitor_invalid(state->istate, ce);
- state->istate->cache_changed |= CE_ENTRY_CHANGED;
+ if (!fstat_done && lstat(ce->name, &st) < 0)
+ return error_errno("unable to stat just-written file %s",
+ ce->name);
+ update_ce_after_write(state, ce , &st);
}
delayed:
return 0;
diff --git a/entry.h b/entry.h
index f860e60846..664aed1576 100644
--- a/entry.h
+++ b/entry.h
@@ -39,5 +39,7 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
void unlink_entry(const struct cache_entry *ce);
void *read_blob_entry(const struct cache_entry *ce, unsigned long *size);
int fstat_checkout_output(int fd, const struct checkout *state, struct stat *st);
+void update_ce_after_write(const struct checkout *state, struct cache_entry *ce,
+ struct stat *st);
#endif /* ENTRY_H */
--
2.28.0
* [PATCH v3 08/19] entry: move conv_attrs lookup up to checkout_entry()
2020-10-29 2:14 ` [PATCH v3 00/19] Parallel Checkout (part I) Matheus Tavares
` (6 preceding siblings ...)
2020-10-29 2:14 ` [PATCH v3 07/19] entry: extract cache_entry update from write_entry() Matheus Tavares
@ 2020-10-29 2:14 ` Matheus Tavares
2020-10-30 21:58 ` Junio C Hamano
2020-10-29 2:14 ` [PATCH v3 09/19] entry: add checkout_entry_ca() which takes preloaded conv_attrs Matheus Tavares
` (13 subsequent siblings)
21 siblings, 1 reply; 154+ messages in thread
From: Matheus Tavares @ 2020-10-29 2:14 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
In an upcoming patch, checkout_entry() will use conv_attrs to decide
whether an entry should be enqueued for parallel checkout or not. But
the attributes lookup only happens lower in this call stack. To avoid
the unnecessary work of loading the attributes twice, let's move it up
to checkout_entry(), and pass the loaded struct down to write_entry().
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
entry.c | 38 +++++++++++++++++++++++++++-----------
1 file changed, 27 insertions(+), 11 deletions(-)
diff --git a/entry.c b/entry.c
index 1d2df188e5..8237859b12 100644
--- a/entry.c
+++ b/entry.c
@@ -263,8 +263,9 @@ void update_ce_after_write(const struct checkout *state, struct cache_entry *ce,
}
}
-static int write_entry(struct cache_entry *ce,
- char *path, const struct checkout *state, int to_tempfile)
+/* Note: ca is used (and required) iff the entry refers to a regular file. */
+static int write_entry(struct cache_entry *ce, char *path, struct conv_attrs *ca,
+ const struct checkout *state, int to_tempfile)
{
unsigned int ce_mode_s_ifmt = ce->ce_mode & S_IFMT;
struct delayed_checkout *dco = state->delayed_checkout;
@@ -281,8 +282,7 @@ static int write_entry(struct cache_entry *ce,
clone_checkout_metadata(&meta, &state->meta, &ce->oid);
if (ce_mode_s_ifmt == S_IFREG) {
- struct stream_filter *filter = get_stream_filter(state->istate, ce->name,
- &ce->oid);
+ struct stream_filter *filter = get_stream_filter_ca(ca, &ce->oid);
if (filter &&
!streaming_write_entry(ce, path, filter,
state, to_tempfile,
@@ -329,14 +329,17 @@ static int write_entry(struct cache_entry *ce,
* Convert from git internal format to working tree format
*/
if (dco && dco->state != CE_NO_DELAY) {
- ret = async_convert_to_working_tree(state->istate, ce->name, new_blob,
- size, &buf, &meta, dco);
+ ret = async_convert_to_working_tree_ca(ca, ce->name,
+ new_blob, size,
+ &buf, &meta, dco);
if (ret && string_list_has_string(&dco->paths, ce->name)) {
free(new_blob);
goto delayed;
}
- } else
- ret = convert_to_working_tree(state->istate, ce->name, new_blob, size, &buf, &meta);
+ } else {
+ ret = convert_to_working_tree_ca(ca, ce->name, new_blob,
+ size, &buf, &meta);
+ }
if (ret) {
free(new_blob);
@@ -442,6 +445,7 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
{
static struct strbuf path = STRBUF_INIT;
struct stat st;
+ struct conv_attrs ca;
if (ce->ce_flags & CE_WT_REMOVE) {
if (topath)
@@ -454,8 +458,13 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
return 0;
}
- if (topath)
- return write_entry(ce, topath, state, 1);
+ if (topath) {
+ if (S_ISREG(ce->ce_mode)) {
+ convert_attrs(state->istate, &ca, ce->name);
+ return write_entry(ce, topath, &ca, state, 1);
+ }
+ return write_entry(ce, topath, NULL, state, 1);
+ }
strbuf_reset(&path);
strbuf_add(&path, state->base_dir, state->base_dir_len);
@@ -517,9 +526,16 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
return 0;
create_directories(path.buf, path.len, state);
+
if (nr_checkouts)
(*nr_checkouts)++;
- return write_entry(ce, path.buf, state, 0);
+
+ if (S_ISREG(ce->ce_mode)) {
+ convert_attrs(state->istate, &ca, ce->name);
+ return write_entry(ce, path.buf, &ca, state, 0);
+ }
+
+ return write_entry(ce, path.buf, NULL, state, 0);
}
void unlink_entry(const struct cache_entry *ce)
--
2.28.0
* Re: [PATCH v3 08/19] entry: move conv_attrs lookup up to checkout_entry()
2020-10-29 2:14 ` [PATCH v3 08/19] entry: move conv_attrs lookup up to checkout_entry() Matheus Tavares
@ 2020-10-30 21:58 ` Junio C Hamano
0 siblings, 0 replies; 154+ messages in thread
From: Junio C Hamano @ 2020-10-30 21:58 UTC (permalink / raw)
To: Matheus Tavares; +Cc: git, git, chriscool, peff, newren, jrnieder, martin.agren
Matheus Tavares <matheus.bernardino@usp.br> writes:
> +/* Note: ca is used (and required) iff the entry refers to a regular file. */
This reflects how the current code happens to work, and it is
unlikely to change (in other words, offhand I cannot think of a
reason why attributes might affect checking out a symlink or a
submodule), so that's probably OK. I mention this specifically
because ...
> +static int write_entry(struct cache_entry *ce, char *path, struct conv_attrs *ca,
> + const struct checkout *state, int to_tempfile)
> {
> unsigned int ce_mode_s_ifmt = ce->ce_mode & S_IFMT;
> struct delayed_checkout *dco = state->delayed_checkout;
> @@ -281,8 +282,7 @@ static int write_entry(struct cache_entry *ce,
> clone_checkout_metadata(&meta, &state->meta, &ce->oid);
>
> if (ce_mode_s_ifmt == S_IFREG) {
> - struct stream_filter *filter = get_stream_filter(state->istate, ce->name,
> - &ce->oid);
> + struct stream_filter *filter = get_stream_filter_ca(ca, &ce->oid);
> if (filter &&
> !streaming_write_entry(ce, path, filter,
> state, to_tempfile,
> @@ -329,14 +329,17 @@ static int write_entry(struct cache_entry *ce,
> * Convert from git internal format to working tree format
> */
> if (dco && dco->state != CE_NO_DELAY) {
> - ret = async_convert_to_working_tree(state->istate, ce->name, new_blob,
> - size, &buf, &meta, dco);
> + ret = async_convert_to_working_tree_ca(ca, ce->name,
> + new_blob, size,
> + &buf, &meta, dco);
> if (ret && string_list_has_string(&dco->paths, ce->name)) {
> free(new_blob);
> goto delayed;
> }
> - } else
> - ret = convert_to_working_tree(state->istate, ce->name, new_blob, size, &buf, &meta);
> + } else {
> + ret = convert_to_working_tree_ca(ca, ce->name, new_blob,
> + size, &buf, &meta);
> + }
>
> if (ret) {
> free(new_blob);
> @@ -442,6 +445,7 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
> {
> static struct strbuf path = STRBUF_INIT;
> struct stat st;
> + struct conv_attrs ca;
>
> if (ce->ce_flags & CE_WT_REMOVE) {
> if (topath)
> @@ -454,8 +458,13 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
> return 0;
> }
>
> - if (topath)
> - return write_entry(ce, topath, state, 1);
> + if (topath) {
> + if (S_ISREG(ce->ce_mode)) {
> + convert_attrs(state->istate, &ca, ce->name);
> + return write_entry(ce, topath, &ca, state, 1);
> + }
> + return write_entry(ce, topath, NULL, state, 1);
> + }
... it looked somewhat upside-down at first glance that we decide,
this high in the callchain, whether the lower level routines are
allowed to use the ca. But the whole point of this change is to
lift the decision to use attributes higher in the callchain, so it
would be OK (or "unavoidable").
I wonder if it is worth avoiding the early return from the inner
block, like this:
	struct conv_attrs *use_ca = NULL;
	...
	if (topath) {
		struct conv_attrs ca;

		if (S_ISREG(...)) {
			convert_attrs(... &ca ...);
			use_ca = &ca;
		}
		return write_entry(ce, topath, use_ca, state, 1);
	}
which would make it easier to further add code that is common to
both regular file and other things before we call write_entry().
The same comment applies to the codepath where a new file gets
created in the next hunk.
> @@ -517,9 +526,16 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
> return 0;
>
> create_directories(path.buf, path.len, state);
> +
> if (nr_checkouts)
> (*nr_checkouts)++;
> - return write_entry(ce, path.buf, state, 0);
> +
> + if (S_ISREG(ce->ce_mode)) {
> + convert_attrs(state->istate, &ca, ce->name);
> + return write_entry(ce, path.buf, &ca, state, 0);
> + }
> +
> + return write_entry(ce, path.buf, NULL, state, 0);
> }
>
> void unlink_entry(const struct cache_entry *ce)
* [PATCH v3 09/19] entry: add checkout_entry_ca() which takes preloaded conv_attrs
2020-10-29 2:14 ` [PATCH v3 00/19] Parallel Checkout (part I) Matheus Tavares
` (7 preceding siblings ...)
2020-10-29 2:14 ` [PATCH v3 08/19] entry: move conv_attrs lookup up to checkout_entry() Matheus Tavares
@ 2020-10-29 2:14 ` Matheus Tavares
2020-10-30 22:02 ` Junio C Hamano
2020-10-29 2:14 ` [PATCH v3 10/19] unpack-trees: add basic support for parallel checkout Matheus Tavares
` (12 subsequent siblings)
21 siblings, 1 reply; 154+ messages in thread
From: Matheus Tavares @ 2020-10-29 2:14 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
The parallel checkout machinery will call checkout_entry() for entries
that could not be written in parallel due to path collisions. At this
point, we will already be holding the conversion attributes for each
entry, and it would be wasteful to let checkout_entry() load these
again. Instead, let's add the checkout_entry_ca() variant, which
optionally takes a preloaded conv_attrs struct.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
entry.c | 23 ++++++++++++-----------
entry.h | 13 +++++++++++--
2 files changed, 23 insertions(+), 13 deletions(-)
diff --git a/entry.c b/entry.c
index 8237859b12..9d79a5671f 100644
--- a/entry.c
+++ b/entry.c
@@ -440,12 +440,13 @@ static void mark_colliding_entries(const struct checkout *state,
}
}
-int checkout_entry(struct cache_entry *ce, const struct checkout *state,
- char *topath, int *nr_checkouts)
+int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca,
+ const struct checkout *state, char *topath,
+ int *nr_checkouts)
{
static struct strbuf path = STRBUF_INIT;
struct stat st;
- struct conv_attrs ca;
+ struct conv_attrs ca_buf;
if (ce->ce_flags & CE_WT_REMOVE) {
if (topath)
@@ -459,11 +460,11 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
}
if (topath) {
- if (S_ISREG(ce->ce_mode)) {
- convert_attrs(state->istate, &ca, ce->name);
- return write_entry(ce, topath, &ca, state, 1);
+ if (S_ISREG(ce->ce_mode) && !ca) {
+ convert_attrs(state->istate, &ca_buf, ce->name);
+ ca = &ca_buf;
}
- return write_entry(ce, topath, NULL, state, 1);
+ return write_entry(ce, topath, ca, state, 1);
}
strbuf_reset(&path);
@@ -530,12 +531,12 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
if (nr_checkouts)
(*nr_checkouts)++;
- if (S_ISREG(ce->ce_mode)) {
- convert_attrs(state->istate, &ca, ce->name);
- return write_entry(ce, path.buf, &ca, state, 0);
+ if (S_ISREG(ce->ce_mode) && !ca) {
+ convert_attrs(state->istate, &ca_buf, ce->name);
+ ca = &ca_buf;
}
- return write_entry(ce, path.buf, NULL, state, 0);
+ return write_entry(ce, path.buf, ca, state, 0);
}
void unlink_entry(const struct cache_entry *ce)
diff --git a/entry.h b/entry.h
index 664aed1576..2081fbbbab 100644
--- a/entry.h
+++ b/entry.h
@@ -27,9 +27,18 @@ struct checkout {
* file named by ce, a temporary file is created by this function and
* its name is returned in topath[], which must be able to hold at
* least TEMPORARY_FILENAME_LENGTH bytes long.
+ *
+ * With checkout_entry_ca(), callers can optionally pass a preloaded
+ * conv_attrs struct (to avoid reloading it), when ce refers to a
+ * regular file. If ca is NULL, the attributes will be loaded
+ * internally when (and if) needed.
*/
-int checkout_entry(struct cache_entry *ce, const struct checkout *state,
- char *topath, int *nr_checkouts);
+#define checkout_entry(ce, state, topath, nr_checkouts) \
+ checkout_entry_ca(ce, NULL, state, topath, nr_checkouts)
+int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca,
+ const struct checkout *state, char *topath,
+ int *nr_checkouts);
+
void enable_delayed_checkout(struct checkout *state);
int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
/*
--
2.28.0
* Re: [PATCH v3 09/19] entry: add checkout_entry_ca() which takes preloaded conv_attrs
2020-10-29 2:14 ` [PATCH v3 09/19] entry: add checkout_entry_ca() which takes preloaded conv_attrs Matheus Tavares
@ 2020-10-30 22:02 ` Junio C Hamano
0 siblings, 0 replies; 154+ messages in thread
From: Junio C Hamano @ 2020-10-30 22:02 UTC (permalink / raw)
To: Matheus Tavares; +Cc: git, git, chriscool, peff, newren, jrnieder, martin.agren
Matheus Tavares <matheus.bernardino@usp.br> writes:
> The parallel checkout machinery will call checkout_entry() for entries
> that could not be written in parallel due to path collisions. At this
> point, we will already be holding the conversion attributes for each
> entry, and it would be wasteful to let checkout_entry() load these
> again. Instead, let's add the checkout_entry_ca() variant, which
> optionally takes a preloaded conv_attrs struct.
>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
I think my review comment to 08/19 is partly taken care of with this
step. Perhaps the progression will become simpler to understand if
we add this new helper first? I dunno. In either case, the end
result of applying both 08 and 09 looks much nicer than the state
after only the patches up to 07 are applied.
* [PATCH v3 10/19] unpack-trees: add basic support for parallel checkout
2020-10-29 2:14 ` [PATCH v3 00/19] Parallel Checkout (part I) Matheus Tavares
` (8 preceding siblings ...)
2020-10-29 2:14 ` [PATCH v3 09/19] entry: add checkout_entry_ca() which takes preloaded conv_attrs Matheus Tavares
@ 2020-10-29 2:14 ` Matheus Tavares
2020-11-02 19:35 ` Junio C Hamano
2020-10-29 2:14 ` [PATCH v3 11/19] parallel-checkout: make it truly parallel Matheus Tavares
` (11 subsequent siblings)
21 siblings, 1 reply; 154+ messages in thread
From: Matheus Tavares @ 2020-10-29 2:14 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
This new interface allows us to enqueue some of the entries being
checked out to later call write_entry() for them in parallel. For now,
the parallel checkout machinery is enabled by default and there is no
user configuration, but run_parallel_checkout() just writes the queued
entries in sequence (without spawning additional workers). The next
patch will actually implement the parallelism and, later, we will make
it configurable.
When there are path collisions among the entries being written (which
can happen e.g. with case-sensitive files in case-insensitive file
systems), the parallel checkout code detects the problem and marks the
item with PC_ITEM_COLLIDED. Later, these items are sequentially fed to
checkout_entry() again. This is similar to the way the sequential code
deals with collisions, overwriting the previously checked out entries
with the subsequent ones. The only difference is that, when we start
writing the entries in parallel, we won't be able to determine which of
the colliding entries will survive on disk (for the sequential
algorithm, it is always the last one).
I also experimented with the idea of not overwriting colliding entries,
and it seemed to work well in my simple tests. However, because just one
entry of each colliding group would be actually written, the others
would have null lstat() fields on the index. This might not be a problem
by itself, but it could cause performance penalties for subsequent
commands that need to refresh the index: when the st_size value cached
is 0, read-cache.c:ie_modified() will go to the filesystem to see if the
contents match. As mentioned in the function:
* Immediately after read-tree or update-index --cacheinfo,
* the length field is zero, as we have never even read the
* lstat(2) information once, and we cannot trust DATA_CHANGED
* returned by ie_match_stat() which in turn was returned by
* ce_match_stat_basic() to signal that the filesize of the
* blob changed. We have to actually go to the filesystem to
* see if the contents match, and if so, should answer "unchanged".
So, if we have N entries in a colliding group and we decide to write and
lstat() only one of them, every subsequent git-status will have to read,
convert, and hash the written file N - 1 times, to check that the N - 1
unwritten entries are dirty. By checking out all colliding entries (like
the sequential code does), we only pay the overhead once.
Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
Makefile | 1 +
entry.c | 17 +-
parallel-checkout.c | 368 ++++++++++++++++++++++++++++++++++++++++++++
parallel-checkout.h | 27 ++++
unpack-trees.c | 6 +-
5 files changed, 416 insertions(+), 3 deletions(-)
create mode 100644 parallel-checkout.c
create mode 100644 parallel-checkout.h
diff --git a/Makefile b/Makefile
index 1fb0ec1705..10ee5e709b 100644
--- a/Makefile
+++ b/Makefile
@@ -945,6 +945,7 @@ LIB_OBJS += pack-revindex.o
LIB_OBJS += pack-write.o
LIB_OBJS += packfile.o
LIB_OBJS += pager.o
+LIB_OBJS += parallel-checkout.o
LIB_OBJS += parse-options-cb.o
LIB_OBJS += parse-options.o
LIB_OBJS += patch-delta.o
diff --git a/entry.c b/entry.c
index 9d79a5671f..6676954431 100644
--- a/entry.c
+++ b/entry.c
@@ -7,6 +7,7 @@
#include "progress.h"
#include "fsmonitor.h"
#include "entry.h"
+#include "parallel-checkout.h"
static void create_directories(const char *path, int path_len,
const struct checkout *state)
@@ -426,8 +427,17 @@ static void mark_colliding_entries(const struct checkout *state,
for (i = 0; i < state->istate->cache_nr; i++) {
struct cache_entry *dup = state->istate->cache[i];
- if (dup == ce)
- break;
+ if (dup == ce) {
+ /*
+ * Parallel checkout creates the files in no particular
+ * order. So the other side of the collision may appear
+ * after the given cache_entry in the array.
+ */
+ if (parallel_checkout_status() == PC_RUNNING)
+ continue;
+ else
+ break;
+ }
if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
continue;
@@ -536,6 +546,9 @@ int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca,
ca = &ca_buf;
}
+ if (!enqueue_checkout(ce, ca))
+ return 0;
+
return write_entry(ce, path.buf, ca, state, 0);
}
diff --git a/parallel-checkout.c b/parallel-checkout.c
new file mode 100644
index 0000000000..981dbe6ff3
--- /dev/null
+++ b/parallel-checkout.c
@@ -0,0 +1,368 @@
+#include "cache.h"
+#include "entry.h"
+#include "parallel-checkout.h"
+#include "streaming.h"
+
+enum pc_item_status {
+ PC_ITEM_PENDING = 0,
+ PC_ITEM_WRITTEN,
+ /*
+ * The entry could not be written because there was another file
+ * already present in its path or leading directories. Since
+ * checkout_entry_ca() removes such files from the working tree before
+ * enqueueing the entry for parallel checkout, it means that there was
+ * a path collision among the entries being written.
+ */
+ PC_ITEM_COLLIDED,
+ PC_ITEM_FAILED,
+};
+
+struct parallel_checkout_item {
+ /* pointer to a istate->cache[] entry. Not owned by us. */
+ struct cache_entry *ce;
+ struct conv_attrs ca;
+ struct stat st;
+ enum pc_item_status status;
+};
+
+struct parallel_checkout {
+ enum pc_status status;
+ struct parallel_checkout_item *items;
+ size_t nr, alloc;
+};
+
+static struct parallel_checkout parallel_checkout = { 0 };
+
+enum pc_status parallel_checkout_status(void)
+{
+ return parallel_checkout.status;
+}
+
+void init_parallel_checkout(void)
+{
+ if (parallel_checkout.status != PC_UNINITIALIZED)
+ BUG("parallel checkout already initialized");
+
+ parallel_checkout.status = PC_ACCEPTING_ENTRIES;
+}
+
+static void finish_parallel_checkout(void)
+{
+ if (parallel_checkout.status == PC_UNINITIALIZED)
+ BUG("cannot finish parallel checkout: not initialized yet");
+
+ free(parallel_checkout.items);
+ memset(&parallel_checkout, 0, sizeof(parallel_checkout));
+}
+
+static int is_eligible_for_parallel_checkout(const struct cache_entry *ce,
+ const struct conv_attrs *ca)
+{
+ enum conv_attrs_classification c;
+
+ if (!S_ISREG(ce->ce_mode))
+ return 0;
+
+ c = classify_conv_attrs(ca);
+ switch (c) {
+ case CA_CLASS_INCORE:
+ return 1;
+
+ case CA_CLASS_INCORE_FILTER:
+ /*
+ * It would be safe to allow concurrent instances of
+ * single-file smudge filters, like rot13, but we should not
+ * assume that all filters are parallel-process safe. So we
+ * don't allow this.
+ */
+ return 0;
+
+ case CA_CLASS_INCORE_PROCESS:
+ /*
+ * The parallel queue and the delayed queue are not compatible,
+ * so they must be kept completely separated. And we can't tell
+ * if a long-running process will delay its response without
+ * actually asking it to perform the filtering. Therefore, this
+ * type of filter is not allowed in parallel checkout.
+ *
+ * Furthermore, there should only be one instance of the
+ * long-running process filter as we don't know how it is
+ * managing its own concurrency. So, spreading the entries that
+ * require such a filter among the parallel workers would
+ * require a lot more inter-process communication. We would
+ * probably have to designate a single process to interact with
+ * the filter and send all the necessary data to it, for each
+ * entry.
+ */
+ return 0;
+
+ case CA_CLASS_STREAMABLE:
+ return 1;
+
+ default:
+ BUG("unsupported conv_attrs classification '%d'", c);
+ }
+}
+
+int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca)
+{
+ struct parallel_checkout_item *pc_item;
+
+ if (parallel_checkout.status != PC_ACCEPTING_ENTRIES ||
+ !is_eligible_for_parallel_checkout(ce, ca))
+ return -1;
+
+ ALLOC_GROW(parallel_checkout.items, parallel_checkout.nr + 1,
+ parallel_checkout.alloc);
+
+ pc_item = &parallel_checkout.items[parallel_checkout.nr++];
+ pc_item->ce = ce;
+ memcpy(&pc_item->ca, ca, sizeof(pc_item->ca));
+ pc_item->status = PC_ITEM_PENDING;
+
+ return 0;
+}
+
+static int handle_results(struct checkout *state)
+{
+ int ret = 0;
+ size_t i;
+ int have_pending = 0;
+
+ /*
+ * We first update the successfully written entries with the collected
+ * stat() data, so that they can be found by mark_colliding_entries(),
+ * in the next loop, when necessary.
+ */
+ for (i = 0; i < parallel_checkout.nr; ++i) {
+ struct parallel_checkout_item *pc_item = &parallel_checkout.items[i];
+ if (pc_item->status == PC_ITEM_WRITTEN)
+ update_ce_after_write(state, pc_item->ce, &pc_item->st);
+ }
+
+ for (i = 0; i < parallel_checkout.nr; ++i) {
+ struct parallel_checkout_item *pc_item = &parallel_checkout.items[i];
+
+ switch(pc_item->status) {
+ case PC_ITEM_WRITTEN:
+ /* Already handled */
+ break;
+ case PC_ITEM_COLLIDED:
+ /*
+ * The entry could not be checked out due to a path
+ * collision with another entry. Since there can only
+ * be one entry of each colliding group on the disk, we
+ * could skip trying to check out this one and move on.
+ * However, this would leave the unwritten entries with
+ * null stat() fields on the index, which could
+ * potentially slow down subsequent operations that
+ * require refreshing it: git would not be able to
+ * trust st_size and would have to go to the filesystem
+ * to see if the contents match (see ie_modified()).
+ *
+ * Instead, let's pay the overhead only once, now, and
+ * call checkout_entry_ca() again for this file, to
+ * have its stat() data stored in the index. This also
+ * has the benefit of adding this entry and its
+ * colliding pair to the collision report message.
+ * Additionally, this overwriting behavior is consistent
+ * with what the sequential checkout does, so it doesn't
+ * add any extra overhead.
+ */
+ ret |= checkout_entry_ca(pc_item->ce, &pc_item->ca,
+ state, NULL, NULL);
+ break;
+ case PC_ITEM_PENDING:
+ have_pending = 1;
+ /* fall through */
+ case PC_ITEM_FAILED:
+ ret = -1;
+ break;
+ default:
+ BUG("unknown checkout item status in parallel checkout");
+ }
+ }
+
+ if (have_pending)
+ error(_("parallel checkout finished with pending entries"));
+
+ return ret;
+}
+
+static int reset_fd(int fd, const char *path)
+{
+ if (lseek(fd, 0, SEEK_SET) != 0)
+ return error_errno("failed to rewind descriptor of %s", path);
+ if (ftruncate(fd, 0))
+ return error_errno("failed to truncate file %s", path);
+ return 0;
+}
+
+static int write_pc_item_to_fd(struct parallel_checkout_item *pc_item, int fd,
+ const char *path)
+{
+ int ret;
+ struct stream_filter *filter;
+ struct strbuf buf = STRBUF_INIT;
+ char *new_blob;
+ unsigned long size;
+ size_t newsize = 0;
+ ssize_t wrote;
+
+ /* Sanity check */
+ assert(is_eligible_for_parallel_checkout(pc_item->ce, &pc_item->ca));
+
+ filter = get_stream_filter_ca(&pc_item->ca, &pc_item->ce->oid);
+ if (filter) {
+ if (stream_blob_to_fd(fd, &pc_item->ce->oid, filter, 1)) {
+ /* On error, reset fd to try writing without streaming */
+ if (reset_fd(fd, path))
+ return -1;
+ } else {
+ return 0;
+ }
+ }
+
+ new_blob = read_blob_entry(pc_item->ce, &size);
+ if (!new_blob)
+ return error("unable to read sha1 file of %s (%s)", path,
+ oid_to_hex(&pc_item->ce->oid));
+
+ /*
+ * checkout metadata is used to give context for external process
+ * filters. Files requiring such filters are not eligible for parallel
+ * checkout, so pass NULL.
+ */
+ ret = convert_to_working_tree_ca(&pc_item->ca, pc_item->ce->name,
+ new_blob, size, &buf, NULL);
+
+ if (ret) {
+ free(new_blob);
+ new_blob = strbuf_detach(&buf, &newsize);
+ size = newsize;
+ }
+
+ wrote = write_in_full(fd, new_blob, size);
+ free(new_blob);
+ if (wrote < 0)
+ return error("unable to write file %s", path);
+
+ return 0;
+}
+
+static int close_and_clear(int *fd)
+{
+ int ret = 0;
+
+ if (*fd >= 0) {
+ ret = close(*fd);
+ *fd = -1;
+ }
+
+ return ret;
+}
+
+static int check_leading_dirs(const char *path, int len, int prefix_len)
+{
+ const char *slash = path + len;
+
+ while (slash > path && *slash != '/')
+ slash--;
+
+ return has_dirs_only_path(path, slash - path, prefix_len);
+}
+
+static void write_pc_item(struct parallel_checkout_item *pc_item,
+ struct checkout *state)
+{
+ unsigned int mode = (pc_item->ce->ce_mode & 0100) ? 0777 : 0666;
+ int fd = -1, fstat_done = 0;
+ struct strbuf path = STRBUF_INIT;
+
+ strbuf_add(&path, state->base_dir, state->base_dir_len);
+ strbuf_add(&path, pc_item->ce->name, pc_item->ce->ce_namelen);
+
+ /*
+ * At this point, leading dirs should have already been created. But if
+ * a symlink being checked out has collided with one of the dirs, due to
+ * file system folding rules, it's possible that the dirs are no longer
+ * present. So we have to check again, and report any path collisions.
+ */
+ if (!check_leading_dirs(path.buf, path.len, state->base_dir_len)) {
+ pc_item->status = PC_ITEM_COLLIDED;
+ goto out;
+ }
+
+ fd = open(path.buf, O_WRONLY | O_CREAT | O_EXCL, mode);
+
+ if (fd < 0) {
+ if (errno == EEXIST || errno == EISDIR) {
+ /*
+ * Errors which probably represent a path collision.
+ * Suppress the error message and mark the item to be
+ * retried later, sequentially. ENOTDIR and ENOENT are
+ * also interesting, but check_leading_dirs() should
+ * have already caught these cases.
+ */
+ pc_item->status = PC_ITEM_COLLIDED;
+ } else {
+ error_errno("failed to open file %s", path.buf);
+ pc_item->status = PC_ITEM_FAILED;
+ }
+ goto out;
+ }
+
+ if (write_pc_item_to_fd(pc_item, fd, path.buf)) {
+ /* Error was already reported. */
+ pc_item->status = PC_ITEM_FAILED;
+ goto out;
+ }
+
+ fstat_done = fstat_checkout_output(fd, state, &pc_item->st);
+
+ if (close_and_clear(&fd)) {
+ error_errno("unable to close file %s", path.buf);
+ pc_item->status = PC_ITEM_FAILED;
+ goto out;
+ }
+
+ if (state->refresh_cache && !fstat_done && lstat(path.buf, &pc_item->st) < 0) {
+ error_errno("unable to stat just-written file %s", path.buf);
+ pc_item->status = PC_ITEM_FAILED;
+ goto out;
+ }
+
+ pc_item->status = PC_ITEM_WRITTEN;
+
+out:
+ /*
+ * No need to check close() return. At this point, either fd is already
+ * closed, or we are on an error path, that has already been reported.
+ */
+ close_and_clear(&fd);
+ strbuf_release(&path);
+}
+
+static void write_items_sequentially(struct checkout *state)
+{
+ size_t i;
+
+ for (i = 0; i < parallel_checkout.nr; ++i)
+ write_pc_item(&parallel_checkout.items[i], state);
+}
+
+int run_parallel_checkout(struct checkout *state)
+{
+ int ret;
+
+ if (parallel_checkout.status != PC_ACCEPTING_ENTRIES)
+ BUG("cannot run parallel checkout: uninitialized or already running");
+
+ parallel_checkout.status = PC_RUNNING;
+
+ write_items_sequentially(state);
+ ret = handle_results(state);
+
+ finish_parallel_checkout();
+ return ret;
+}
diff --git a/parallel-checkout.h b/parallel-checkout.h
new file mode 100644
index 0000000000..e6d6fc01ea
--- /dev/null
+++ b/parallel-checkout.h
@@ -0,0 +1,27 @@
+#ifndef PARALLEL_CHECKOUT_H
+#define PARALLEL_CHECKOUT_H
+
+struct cache_entry;
+struct checkout;
+struct conv_attrs;
+
+enum pc_status {
+ PC_UNINITIALIZED = 0,
+ PC_ACCEPTING_ENTRIES,
+ PC_RUNNING,
+};
+
+enum pc_status parallel_checkout_status(void);
+void init_parallel_checkout(void);
+
+/*
+ * Return -1 if parallel checkout is currently not enabled or if the entry is
+ * not eligible for parallel checkout. Otherwise, enqueue the entry for later
+ * write and return 0.
+ */
+int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
+
+/* Write all the queued entries, returning 0 on success. */
+int run_parallel_checkout(struct checkout *state);
+
+#endif /* PARALLEL_CHECKOUT_H */
diff --git a/unpack-trees.c b/unpack-trees.c
index a511fadd89..1b1da7485a 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -17,6 +17,7 @@
#include "object-store.h"
#include "promisor-remote.h"
#include "entry.h"
+#include "parallel-checkout.h"
/*
* Error messages expected by scripts out of plumbing commands such as
@@ -438,7 +439,6 @@ static int check_updates(struct unpack_trees_options *o,
if (should_update_submodules())
load_gitmodules_file(index, &state);
- enable_delayed_checkout(&state);
if (has_promisor_remote()) {
/*
* Prefetch the objects that are to be checked out in the loop
@@ -461,6 +461,9 @@ static int check_updates(struct unpack_trees_options *o,
to_fetch.oid, to_fetch.nr);
oid_array_clear(&to_fetch);
}
+
+ enable_delayed_checkout(&state);
+ init_parallel_checkout();
for (i = 0; i < index->cache_nr; i++) {
struct cache_entry *ce = index->cache[i];
@@ -474,6 +477,7 @@ static int check_updates(struct unpack_trees_options *o,
}
}
stop_progress(&progress);
+ errs |= run_parallel_checkout(&state);
errs |= finish_delayed_checkout(&state, NULL);
git_attr_set_direction(GIT_ATTR_CHECKIN);
--
2.28.0
* Re: [PATCH v3 10/19] unpack-trees: add basic support for parallel checkout
2020-10-29 2:14 ` [PATCH v3 10/19] unpack-trees: add basic support for parallel checkout Matheus Tavares
@ 2020-11-02 19:35 ` Junio C Hamano
2020-11-03 3:48 ` Matheus Tavares Bernardino
0 siblings, 1 reply; 154+ messages in thread
From: Junio C Hamano @ 2020-11-02 19:35 UTC (permalink / raw)
To: Matheus Tavares; +Cc: git, git, chriscool, peff, newren, jrnieder, martin.agren
Matheus Tavares <matheus.bernardino@usp.br> writes:
> This new interface allows us to enqueue some of the entries being
> checked out to later call write_entry() for them in parallel. For now,
> the parallel checkout machinery is enabled by default and there is no
> user configuration, but run_parallel_checkout() just writes the queued
> entries in sequence (without spawning additional workers).
In other words, this would show the worst case overhead caused by
the framework to allow parallel checkout, relative to the current
code. Which is quite a sensible and separate step to have in the
series. I like it.
> The next
> patch will actually implement the parallelism and, later, we will make
> it configurable.
OK.
> When there are path collisions among the entries being written (which
> can happen e.g. with case-sensitive files in case-insensitive file
> systems), the parallel checkout code detects the problem and marks the
> item with PC_ITEM_COLLIDED. Later, these items are sequentially fed to
> checkout_entry() again. This is similar to the way the sequential code
> deals with collisions, overwriting the previously checked out entries
> with the subsequent ones. The only difference is that, when we start
> writing the entries in parallel, we won't be able to determine which of
> the colliding entries will survive on disk (for the sequential
> algorithm, it is always the last one).
Sure. "The last one" determinism does not buy us very much, but it
is prudent to keep such a behavioural difference in mind.
> I also experimented with the idea of not overwriting colliding entries,
> and it seemed to work well in my simple tests. However, because just one
> entry of each colliding group would be actually written, the others
> would have null lstat() fields on the index. This might not be a problem
> by itself, but it could cause performance penalties for subsequent
> commands that need to refresh the index: when the st_size value cached
> is 0, read-cache.c:ie_modified() will go to the filesystem to see if the
> contents match. As mentioned in the function:
>
> * Immediately after read-tree or update-index --cacheinfo,
> * the length field is zero, as we have never even read the
> * lstat(2) information once, and we cannot trust DATA_CHANGED
> * returned by ie_match_stat() which in turn was returned by
> * ce_match_stat_basic() to signal that the filesize of the
> * blob changed. We have to actually go to the filesystem to
> * see if the contents match, and if so, should answer "unchanged".
>
> So, if we have N entries in a colliding group and we decide to write and
> lstat() only one of them, every subsequent git-status will have to read,
> convert, and hash the written file N - 1 times, to check that the N - 1
> unwritten entries are dirty. By checking out all colliding entries (like
> the sequential code does), we only pay the overhead once.
And the cost of writing them out N times is not free, either, I
presume?
But I do not see the point of wasting engineering effort by trying
to make it more efficient to create a corrupt working tree that is
unusable because some paths that ought to exist are missing, so I
think it is OK.
> Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
> Makefile | 1 +
> entry.c | 17 +-
> parallel-checkout.c | 368 ++++++++++++++++++++++++++++++++++++++++++++
> parallel-checkout.h | 27 ++++
> unpack-trees.c | 6 +-
> 5 files changed, 416 insertions(+), 3 deletions(-)
> create mode 100644 parallel-checkout.c
> create mode 100644 parallel-checkout.h
>
> diff --git a/Makefile b/Makefile
> index 1fb0ec1705..10ee5e709b 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -945,6 +945,7 @@ LIB_OBJS += pack-revindex.o
> LIB_OBJS += pack-write.o
> LIB_OBJS += packfile.o
> LIB_OBJS += pager.o
> +LIB_OBJS += parallel-checkout.o
> LIB_OBJS += parse-options-cb.o
> LIB_OBJS += parse-options.o
> LIB_OBJS += patch-delta.o
> diff --git a/entry.c b/entry.c
> index 9d79a5671f..6676954431 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -7,6 +7,7 @@
> #include "progress.h"
> #include "fsmonitor.h"
> #include "entry.h"
> +#include "parallel-checkout.h"
>
> static void create_directories(const char *path, int path_len,
> const struct checkout *state)
> @@ -426,8 +427,17 @@ static void mark_colliding_entries(const struct checkout *state,
> for (i = 0; i < state->istate->cache_nr; i++) {
> struct cache_entry *dup = state->istate->cache[i];
>
> - if (dup == ce)
> - break;
> + if (dup == ce) {
> + /*
> + * Parallel checkout creates the files in no particular
> + * order. So the other side of the collision may appear
> + * after the given cache_entry in the array.
> + */
> + if (parallel_checkout_status() == PC_RUNNING)
> + continue;
> + else
> + break;
> + }
>
> if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
> continue;
> @@ -536,6 +546,9 @@ int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca,
> ca = &ca_buf;
> }
>
> + if (!enqueue_checkout(ce, ca))
> + return 0;
> +
> return write_entry(ce, path.buf, ca, state, 0);
It is not wrong but feels strange that paths that cannot be
handled by parallel codepath for whatever reason are written using
the fallback code, but the fallback actually touches the disk before
the queued paths for parallel writeout ;-) What's the reason why
some paths cannot be handled by the new codepath again? Also, can a
path that is handled by the fallback code collide with other paths
that are handled by the parallel codepath, and what happens for
these paths?
> }
>
> diff --git a/parallel-checkout.c b/parallel-checkout.c
> new file mode 100644
> index 0000000000..981dbe6ff3
> --- /dev/null
> +++ b/parallel-checkout.c
> @@ -0,0 +1,368 @@
> +#include "cache.h"
> +#include "entry.h"
> +#include "parallel-checkout.h"
> +#include "streaming.h"
> +
> +enum pc_item_status {
> + PC_ITEM_PENDING = 0,
> + PC_ITEM_WRITTEN,
> + /*
> + * The entry could not be written because there was another file
> + * already present in its path or leading directories. Since
> + * checkout_entry_ca() removes such files from the working tree before
> + * enqueueing the entry for parallel checkout, it means that there was
> + * a path collision among the entries being written.
> + */
> + PC_ITEM_COLLIDED,
> + PC_ITEM_FAILED,
> +};
> +
> +struct parallel_checkout_item {
> > + /* pointer to an istate->cache[] entry. Not owned by us. */
> + struct cache_entry *ce;
> + struct conv_attrs ca;
> + struct stat st;
> + enum pc_item_status status;
> +};
> +
> +struct parallel_checkout {
> + enum pc_status status;
> + struct parallel_checkout_item *items;
> + size_t nr, alloc;
> +};
> +
> +static struct parallel_checkout parallel_checkout = { 0 };
Can't we let this be handled by BSS by not explicitly giving an initial
value?
> +enum pc_status parallel_checkout_status(void)
> +{
> + return parallel_checkout.status;
> +}
> +
> +void init_parallel_checkout(void)
> +{
> + if (parallel_checkout.status != PC_UNINITIALIZED)
> + BUG("parallel checkout already initialized");
> +
> + parallel_checkout.status = PC_ACCEPTING_ENTRIES;
> +}
> +
> +static void finish_parallel_checkout(void)
> +{
> + if (parallel_checkout.status == PC_UNINITIALIZED)
> + BUG("cannot finish parallel checkout: not initialized yet");
> +
> + free(parallel_checkout.items);
> + memset(&parallel_checkout, 0, sizeof(parallel_checkout));
> +}
> +
> +static int is_eligible_for_parallel_checkout(const struct cache_entry *ce,
> + const struct conv_attrs *ca)
> +{
> + enum conv_attrs_classification c;
> +
> + if (!S_ISREG(ce->ce_mode))
> + return 0;
> +
> + c = classify_conv_attrs(ca);
> + switch (c) {
> + case CA_CLASS_INCORE:
> + return 1;
> +
> + case CA_CLASS_INCORE_FILTER:
> + /*
> + * It would be safe to allow concurrent instances of
> + * single-file smudge filters, like rot13, but we should not
> + * assume that all filters are parallel-process safe. So we
> + * don't allow this.
> + */
> + return 0;
> +
> + case CA_CLASS_INCORE_PROCESS:
> + /*
> + * The parallel queue and the delayed queue are not compatible,
> + * so they must be kept completely separated. And we can't tell
> + * if a long-running process will delay its response without
> + * actually asking it to perform the filtering. Therefore, this
> + * type of filter is not allowed in parallel checkout.
> + *
> + * Furthermore, there should only be one instance of the
> + * long-running process filter as we don't know how it is
> + * managing its own concurrency. So, spreading the entries that
> > + * require such a filter among the parallel workers would
> + * require a lot more inter-process communication. We would
> + * probably have to designate a single process to interact with
> + * the filter and send all the necessary data to it, for each
> + * entry.
> + */
> + return 0;
> +
> + case CA_CLASS_STREAMABLE:
> + return 1;
> +
> + default:
> + BUG("unsupported conv_attrs classification '%d'", c);
> + }
> +}
OK, the comments fairly clearly explain the reason for each case.
Good.
> +static int handle_results(struct checkout *state)
> +{
> + int ret = 0;
> + size_t i;
> + int have_pending = 0;
> +
> + /*
> + * We first update the successfully written entries with the collected
> + * stat() data, so that they can be found by mark_colliding_entries(),
> + * in the next loop, when necessary.
> + */
> + for (i = 0; i < parallel_checkout.nr; ++i) {
We encourage post_increment++ when there is no particular reason to
do otherwise in this codebase (I won't repeat in the remainder of
this review).
> +static int reset_fd(int fd, const char *path)
> +{
> + if (lseek(fd, 0, SEEK_SET) != 0)
> + return error_errno("failed to rewind descriptor of %s", path);
> + if (ftruncate(fd, 0))
> + return error_errno("failed to truncate file %s", path);
> + return 0;
> +}
This is in the error codepath when streaming fails, and we'll later
attempt the normal "read object in-core, write it out" codepath, but
is it enough to just ftruncate() it? I am wondering why it is OK
not to unlink() the failed one---is it the caller who is responsible
for opening the file descriptor to write to, and at the layer of the
caller of this helper there is no way to re-open it, or something
like that?
... /me looks ahead and it seems the answer is "yes".
> +static int write_pc_item_to_fd(struct parallel_checkout_item *pc_item, int fd,
> + const char *path)
> ...
> + if (filter) {
> + if (stream_blob_to_fd(fd, &pc_item->ce->oid, filter, 1)) {
> + /* On error, reset fd to try writing without streaming */
> + if (reset_fd(fd, path))
> + return -1;
> + } else {
> + return 0;
> + }
> + }
> +
> + new_blob = read_blob_entry(pc_item->ce, &size);
> ...
> + wrote = write_in_full(fd, new_blob, size);
> +static int check_leading_dirs(const char *path, int len, int prefix_len)
> +{
> + const char *slash = path + len;
> +
> + while (slash > path && *slash != '/')
> + slash--;
It is kind of surprising that we do not have an easy-to-use
helper to find the separator between dirname and basename. If there
were one, we would not even need this helper function with an unclear name
(i.e. "check" does not mean much to those who are trying to
understand the caller---"leading directories are checked for
what???" will be their question).
Perhaps create or find such a helper to remove this function and use
has_dirs_only_path() directly in the caller?
> + return has_dirs_only_path(path, slash - path, prefix_len);
> +}
> +static void write_pc_item(struct parallel_checkout_item *pc_item,
> + struct checkout *state)
> +{
> + unsigned int mode = (pc_item->ce->ce_mode & 0100) ? 0777 : 0666;
> + int fd = -1, fstat_done = 0;
> + struct strbuf path = STRBUF_INIT;
> +
> + strbuf_add(&path, state->base_dir, state->base_dir_len);
> + strbuf_add(&path, pc_item->ce->name, pc_item->ce->ce_namelen);
> +
> + /*
> + * At this point, leading dirs should have already been created. But if
> + * a symlink being checked out has collided with one of the dirs, due to
> + * file system folding rules, it's possible that the dirs are no longer
Is "file system folding rule" clear to readers of the code after
this patch lands? It isn't at least to me.
> + * present. So we have to check again, and report any path collisions.
> + */
> + if (!check_leading_dirs(path.buf, path.len, state->base_dir_len)) {
> + pc_item->status = PC_ITEM_COLLIDED;
> + goto out;
> + }
Thanks.
* Re: [PATCH v3 10/19] unpack-trees: add basic support for parallel checkout
2020-11-02 19:35 ` Junio C Hamano
@ 2020-11-03 3:48 ` Matheus Tavares Bernardino
0 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares Bernardino @ 2020-11-03 3:48 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Jeff Hostetler, Christian Couder, Jeff King, Elijah Newren,
Jonathan Nieder, Martin Ågren
On Mon, Nov 2, 2020 at 4:35 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matheus Tavares <matheus.bernardino@usp.br> writes:
[...]
> >
> > @@ -536,6 +546,9 @@ int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca,
> > ca = &ca_buf;
> > }
> >
> > + if (!enqueue_checkout(ce, ca))
> > + return 0;
> > +
> > return write_entry(ce, path.buf, ca, state, 0);
>
> It is not wrong but feels strange that paths that cannot be
> handled by parallel codepath for whatever reason are written using
> the fallback code, but the fallback actually touches the disk before
> the queued paths for parallel writeout ;-)
Yeah... I also considered having a second "sequential_checkout_item"
queue, and iterating it after the parallel-eligible entries. But I
thought that it might be better to write the ineligible entries right
away and save a little memory (especially for the regular files, for
which we would also have to hold the conversion attributes).
With that said, I ended up adding a second queue in part 2, just for
symlinks. By postponing the checkout of symlinks we can avoid the
check_leading_dirs() function and the additional lstat() calls in the
workers. This also makes it possible to create the leading directories
in parallel (in part 3) with raceproof_create_file(), which is quite
nice as it only calls stat() when open() fails. And since symlinks
probably appear in smaller numbers than regular files, this second
queue should never get too long.
> What's the reason why
> some paths cannot be handled by the new codepath again?
Submodules and symlinks are not eligible for parallel checkout mainly
because it would be hard to detect collisions when they are involved.
For symlinks, one worker could create the symlink a/b => d right
before another worker tries to open() and write() a/b/c, which would
then produce the wrong a/d/c file. And for submodules, we could have a
worker checking out a submodule S while another worker writes the
colliding regular file s/f.
As for regular files, we don't parallelize the checkout of entries
which require external filters, mainly because we cannot guarantee
that such filters are parallel-process safe. But also, the
delayed-checkout queue is incompatible with the parallel-checkout
queue (in the sense that each entry should only be present in one of
the two queues).
> Also, can a
> path that is handled by the fallback code collide with other paths
> that are handled by the parallel codepath, and what happens for
> these paths?
Yes, it can happen. But the parallel-checkout machinery should be
ready for it. There are two cases:
1. Both paths collide in the basename (e.g. a/b and a/B)
2. One path collides in the dirname (e.g. a/b and a/B/c)
For both cases, the collision will happen when trying to write the
parallel-eligible path. This happens because, for now, all paths that
are ineligible for parallel-checkout are checked out first. So, in the
first case, we will detect the collision when open() fails in
write_pc_item().
The second case is a little trickier, since [in part 1] we create the
leading directories right before enqueueing an entry for
parallel-checkout. An ineligible entry could then collide with the
dirname of an already enqueued parallel-eligible entry, removing (and
replacing) the created dirs. Also, the ineligible entry could be a
symlink, and we want to avoid the case of workers writing the entry
a/b/c at a/d/c due to a symlink in b. These collisions with the
dirname are detected when has_dirs_only_path() fails in
check_leading_dirs().
Furthermore, there is no risk that has_dirs_only_path() succeeds, but
then another entry collides with the leading directories before the
actual checkout. Because, when we start the workers, no file or
directory is ever removed.
> > }
> >
> > diff --git a/parallel-checkout.c b/parallel-checkout.c
> > new file mode 100644
> > index 0000000000..981dbe6ff3
> > --- /dev/null
> > +++ b/parallel-checkout.c
> > @@ -0,0 +1,368 @@
> > +#include "cache.h"
> > +#include "entry.h"
> > +#include "parallel-checkout.h"
> > +#include "streaming.h"
> > +
> > +enum pc_item_status {
> > + PC_ITEM_PENDING = 0,
> > + PC_ITEM_WRITTEN,
> > + /*
> > + * The entry could not be written because there was another file
> > + * already present in its path or leading directories. Since
> > + * checkout_entry_ca() removes such files from the working tree before
> > + * enqueueing the entry for parallel checkout, it means that there was
> > + * a path collision among the entries being written.
> > + */
> > + PC_ITEM_COLLIDED,
> > + PC_ITEM_FAILED,
> > +};
> > +
> > +struct parallel_checkout_item {
> > + /* pointer to an istate->cache[] entry. Not owned by us. */
> > + struct cache_entry *ce;
> > + struct conv_attrs ca;
> > + struct stat st;
> > + enum pc_item_status status;
> > +};
> > +
> > +struct parallel_checkout {
> > + enum pc_status status;
> > + struct parallel_checkout_item *items;
> > + size_t nr, alloc;
> > +};
> > +
> > +static struct parallel_checkout parallel_checkout = { 0 };
>
> Can't we let this be handled by BSS by not explicitly giving an initial
> value?
Good catch, thanks.
> > +enum pc_status parallel_checkout_status(void)
> > +{
> > + return parallel_checkout.status;
> > +}
> > +
> > +void init_parallel_checkout(void)
> > +{
> > + if (parallel_checkout.status != PC_UNINITIALIZED)
> > + BUG("parallel checkout already initialized");
> > +
> > + parallel_checkout.status = PC_ACCEPTING_ENTRIES;
> > +}
> > +
> > +static void finish_parallel_checkout(void)
> > +{
> > + if (parallel_checkout.status == PC_UNINITIALIZED)
> > + BUG("cannot finish parallel checkout: not initialized yet");
> > +
> > + free(parallel_checkout.items);
> > + memset(&parallel_checkout, 0, sizeof(parallel_checkout));
> > +}
> > +
> > +static int is_eligible_for_parallel_checkout(const struct cache_entry *ce,
> > + const struct conv_attrs *ca)
> > +{
> > + enum conv_attrs_classification c;
> > +
> > + if (!S_ISREG(ce->ce_mode))
> > + return 0;
> > +
> > + c = classify_conv_attrs(ca);
> > + switch (c) {
> > + case CA_CLASS_INCORE:
> > + return 1;
> > +
> > + case CA_CLASS_INCORE_FILTER:
> > + /*
> > + * It would be safe to allow concurrent instances of
> > + * single-file smudge filters, like rot13, but we should not
> > + * assume that all filters are parallel-process safe. So we
> > + * don't allow this.
> > + */
> > + return 0;
> > +
> > + case CA_CLASS_INCORE_PROCESS:
> > + /*
> > + * The parallel queue and the delayed queue are not compatible,
> > + * so they must be kept completely separated. And we can't tell
> > + * if a long-running process will delay its response without
> > + * actually asking it to perform the filtering. Therefore, this
> > + * type of filter is not allowed in parallel checkout.
> > + *
> > + * Furthermore, there should only be one instance of the
> > + * long-running process filter as we don't know how it is
> > + * managing its own concurrency. So, spreading the entries that
> > + * require such a filter among the parallel workers would
> > + * require a lot more inter-process communication. We would
> > + * probably have to designate a single process to interact with
> > + * the filter and send all the necessary data to it, for each
> > + * entry.
> > + */
> > + return 0;
> > +
> > + case CA_CLASS_STREAMABLE:
> > + return 1;
> > +
> > + default:
> > + BUG("unsupported conv_attrs classification '%d'", c);
> > + }
> > +}
>
> OK, the comments fairly clearly explain the reason for each case.
> Good.
>
> > +static int handle_results(struct checkout *state)
> > +{
> > + int ret = 0;
> > + size_t i;
> > + int have_pending = 0;
> > +
> > + /*
> > + * We first update the successfully written entries with the collected
> > + * stat() data, so that they can be found by mark_colliding_entries(),
> > + * in the next loop, when necessary.
> > + */
> > + for (i = 0; i < parallel_checkout.nr; ++i) {
>
> We encourage post_increment++ when there is no particular reason to
> do otherwise in this codebase (I won't repeat in the remainder of
> this review).
OK, I will fix the pre-increments, thanks.
> > +static int reset_fd(int fd, const char *path)
> > +{
> > + if (lseek(fd, 0, SEEK_SET) != 0)
> > + return error_errno("failed to rewind descriptor of %s", path);
> > + if (ftruncate(fd, 0))
> > + return error_errno("failed to truncate file %s", path);
> > + return 0;
> > +}
>
> This is in the error codepath when streaming fails, and we'll later
> attempt the normal "read object in-core, write it out" codepath, but
> is it enough to just ftruncate() it? I am wondering why it is OK
> not to unlink() the failed one---is it the caller who is responsible
> for opening the file descriptor to write to, and at the layer of the
> caller of this helper there is no way to re-open it, or something
> like that?
Right. We also avoid unlinking the failed one to keep the invariant
that the first worker to successfully open(O_CREAT | O_EXCL) a file
has the "ownership" for that path. So other workers that try to open
the same path will know that there is a collision and can immediately
abort checking out their entry.
> ... /me looks ahead and it seems the answer is "yes".
>
> > +static int write_pc_item_to_fd(struct parallel_checkout_item *pc_item, int fd,
> > + const char *path)
> > ...
> > + if (filter) {
> > + if (stream_blob_to_fd(fd, &pc_item->ce->oid, filter, 1)) {
> > + /* On error, reset fd to try writing without streaming */
> > + if (reset_fd(fd, path))
> > + return -1;
> > + } else {
> > + return 0;
> > + }
> > + }
> > +
> > + new_blob = read_blob_entry(pc_item->ce, &size);
> > ...
> > + wrote = write_in_full(fd, new_blob, size);
>
> > +static int check_leading_dirs(const char *path, int len, int prefix_len)
> > +{
> > + const char *slash = path + len;
> > +
> > + while (slash > path && *slash != '/')
> > + slash--;
>
> It is kind of surprising that we do not give us an easy-to-use
> helper to find the separator between dirname and basename. If there
> were, we do not even need this helper function with an unclear name
> (i.e. "check" does not mean much to those who are trying to
> understand the caller---"leading directories are checked for
> what???" will be their question).
>
> Perhaps create or find such a helper to remove this function and use
> has_dirs_only_path() directly in the caller?
OK, I'll look into it. It would be better if we can reuse an already
present helper, since this call to has_dirs_only_path() will be
removed in part 2.
> > + return has_dirs_only_path(path, slash - path, prefix_len);
> > +}
>
> > +static void write_pc_item(struct parallel_checkout_item *pc_item,
> > + struct checkout *state)
> > +{
> > + unsigned int mode = (pc_item->ce->ce_mode & 0100) ? 0777 : 0666;
> > + int fd = -1, fstat_done = 0;
> > + struct strbuf path = STRBUF_INIT;
> > +
> > + strbuf_add(&path, state->base_dir, state->base_dir_len);
> > + strbuf_add(&path, pc_item->ce->name, pc_item->ce->ce_namelen);
> > +
> > + /*
> > + * At this point, leading dirs should have already been created. But if
> > + * a symlink being checked out has collided with one of the dirs, due to
> > + * file system folding rules, it's possible that the dirs are no longer
>
> Is "file system folding rule" clear to readers of the code after
> this patch lands? It isn't at least to me.
OK, I will rephrase this paragraph to make it clearer.
* [PATCH v3 11/19] parallel-checkout: make it truly parallel
2020-10-29 2:14 ` [PATCH v3 00/19] Parallel Checkout (part I) Matheus Tavares
` (9 preceding siblings ...)
2020-10-29 2:14 ` [PATCH v3 10/19] unpack-trees: add basic support for parallel checkout Matheus Tavares
@ 2020-10-29 2:14 ` Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 12/19] parallel-checkout: support progress displaying Matheus Tavares
` (10 subsequent siblings)
21 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-10-29 2:14 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
Use multiple worker processes to distribute the queued entries and call
write_checkout_item() in parallel for them. The items are distributed
uniformly in contiguous chunks. This minimizes the chances of two
workers writing to the same directory simultaneously, which could
affect performance due to lock contention in the kernel. Work stealing
(or any other form of re-distribution) is not implemented yet.
The parallel version was benchmarked during three operations in the
linux repo, with cold cache: cloning v5.8, checking out v5.8 from
v2.6.15 (checkout I) and checking out v5.8 from v5.7 (checkout II). The
four tables below show the mean run times and standard deviations for
5 runs in: a local file system with SSD, a local file system with HDD, a
Linux NFS server, and Amazon EFS. The numbers of workers were chosen
based on what produces the best result for each case.
Local SSD:
Clone Checkout I Checkout II
Sequential 8.171 s ± 0.206 s 8.735 s ± 0.230 s 4.166 s ± 0.246 s
10 workers 3.277 s ± 0.138 s 3.774 s ± 0.188 s 2.561 s ± 0.120 s
Speedup 2.49 ± 0.12 2.31 ± 0.13 1.63 ± 0.12
Local HDD:
Clone Checkout I Checkout II
Sequential 35.157 s ± 0.205 s 48.835 s ± 0.407 s 47.302 s ± 1.435 s
8 workers 35.538 s ± 0.325 s 49.353 s ± 0.826 s 48.919 s ± 0.416 s
Speedup 0.99 ± 0.01 0.99 ± 0.02 0.97 ± 0.03
Linux NFS server (v4.1, on EBS, single availability zone):
Clone Checkout I Checkout II
Sequential 216.070 s ± 3.611 s 211.169 s ± 3.147 s 57.446 s ± 1.301 s
32 workers 67.997 s ± 0.740 s 66.563 s ± 0.457 s 23.708 s ± 0.622 s
Speedup 3.18 ± 0.06 3.17 ± 0.05 2.42 ± 0.08
EFS (v4.1, replicated over multiple availability zones):
Clone Checkout I Checkout II
Sequential 1249.329 s ± 13.857 s 1438.979 s ± 78.792 s 543.919 s ± 18.745 s
64 workers 225.864 s ± 12.433 s 316.345 s ± 1.887 s 183.648 s ± 10.095 s
Speedup 5.53 ± 0.31 4.55 ± 0.25 2.96 ± 0.19
The above benchmarks show that parallel checkout is most effective on
repositories located on an SSD or over a distributed file system. For
local file systems on spinning disks, and/or older machines, the
parallelism does not always bring good performance. In fact, it can
even increase the run time. For this reason, the sequential code is
still the default. Two settings are added to optionally enable and
configure the new parallel version as desired.
Local SSD tests were executed in an i7-7700HQ (4 cores with
hyper-threading) running Manjaro Linux. Local HDD tests were executed in
an i7-2600 (also 4 cores with hyper-threading), HDD Seagate Barracuda
7200 rpm SATA 3.0, running Debian 9.13. NFS and EFS tests were
executed in an Amazon EC2 c5n.large instance, with 2 vCPUs. The Linux
NFS server was running on a m6g.large instance with 1 TB, EBS GP2
volume. Before each timing, the linux repository was removed (or checked
out back), and `sync && sysctl vm.drop_caches=3` was executed.
Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
.gitignore | 1 +
Documentation/config/checkout.txt | 21 +++
Makefile | 1 +
builtin.h | 1 +
builtin/checkout--helper.c | 142 +++++++++++++++
git.c | 2 +
parallel-checkout.c | 280 +++++++++++++++++++++++++++---
parallel-checkout.h | 84 ++++++++-
unpack-trees.c | 10 +-
9 files changed, 508 insertions(+), 34 deletions(-)
create mode 100644 builtin/checkout--helper.c
diff --git a/.gitignore b/.gitignore
index 6232d33924..1a341ea184 100644
--- a/.gitignore
+++ b/.gitignore
@@ -33,6 +33,7 @@
/git-check-mailmap
/git-check-ref-format
/git-checkout
+/git-checkout--helper
/git-checkout-index
/git-cherry
/git-cherry-pick
diff --git a/Documentation/config/checkout.txt b/Documentation/config/checkout.txt
index 6b646813ab..23e8f7cde0 100644
--- a/Documentation/config/checkout.txt
+++ b/Documentation/config/checkout.txt
@@ -16,3 +16,24 @@ will checkout the '<something>' branch on another remote,
and by linkgit:git-worktree[1] when 'git worktree add' refers to a
remote branch. This setting might be used for other checkout-like
commands or functionality in the future.
+
+checkout.workers::
+ The number of parallel workers to use when updating the working tree.
+ The default is one, i.e. sequential execution. If set to a value less
+ than one, Git will use as many workers as the number of logical cores
+ available. This setting and `checkout.thresholdForParallelism` affect
+ all commands that perform checkout. E.g. checkout, clone, reset,
+ sparse-checkout, etc.
++
+Note: parallel checkout usually delivers better performance for repositories
+located on SSDs or over NFS. For repositories on spinning disks and/or machines
+with a small number of cores, the default sequential checkout often performs
+better. The size and compression level of a repository might also influence how
+well the parallel version performs.
+
+checkout.thresholdForParallelism::
+ When running parallel checkout with a small number of files, the cost
+ of subprocess spawning and inter-process communication might outweigh
+ the parallelization gains. This setting allows defining the minimum
+ number of files for which parallel checkout should be attempted. The
+ default is 100.
diff --git a/Makefile b/Makefile
index 10ee5e709b..535e6e94aa 100644
--- a/Makefile
+++ b/Makefile
@@ -1063,6 +1063,7 @@ BUILTIN_OBJS += builtin/check-attr.o
BUILTIN_OBJS += builtin/check-ignore.o
BUILTIN_OBJS += builtin/check-mailmap.o
BUILTIN_OBJS += builtin/check-ref-format.o
+BUILTIN_OBJS += builtin/checkout--helper.o
BUILTIN_OBJS += builtin/checkout-index.o
BUILTIN_OBJS += builtin/checkout.o
BUILTIN_OBJS += builtin/clean.o
diff --git a/builtin.h b/builtin.h
index 53fb290963..2abbe14b0b 100644
--- a/builtin.h
+++ b/builtin.h
@@ -123,6 +123,7 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix);
int cmd_bundle(int argc, const char **argv, const char *prefix);
int cmd_cat_file(int argc, const char **argv, const char *prefix);
int cmd_checkout(int argc, const char **argv, const char *prefix);
+int cmd_checkout__helper(int argc, const char **argv, const char *prefix);
int cmd_checkout_index(int argc, const char **argv, const char *prefix);
int cmd_check_attr(int argc, const char **argv, const char *prefix);
int cmd_check_ignore(int argc, const char **argv, const char *prefix);
diff --git a/builtin/checkout--helper.c b/builtin/checkout--helper.c
new file mode 100644
index 0000000000..67fe37cf11
--- /dev/null
+++ b/builtin/checkout--helper.c
@@ -0,0 +1,142 @@
+#include "builtin.h"
+#include "config.h"
+#include "entry.h"
+#include "parallel-checkout.h"
+#include "parse-options.h"
+#include "pkt-line.h"
+
+static void packet_to_pc_item(char *line, int len,
+ struct parallel_checkout_item *pc_item)
+{
+ struct pc_item_fixed_portion *fixed_portion;
+ char *encoding, *variant;
+
+ if (len < sizeof(struct pc_item_fixed_portion))
+ BUG("checkout worker received too short item (got %dB, exp %dB)",
+ len, (int)sizeof(struct pc_item_fixed_portion));
+
+ fixed_portion = (struct pc_item_fixed_portion *)line;
+
+ if (len - sizeof(struct pc_item_fixed_portion) !=
+ fixed_portion->name_len + fixed_portion->working_tree_encoding_len)
+ BUG("checkout worker received corrupted item");
+
+ variant = line + sizeof(struct pc_item_fixed_portion);
+
+ /*
+ * Note: the main process uses zero length to communicate that the
+ * encoding is NULL. There is no use case in actually sending an empty
+ * string since it's considered as NULL when ca.working_tree_encoding
+ * is set at git_path_check_encoding().
+ */
+ if (fixed_portion->working_tree_encoding_len) {
+ encoding = xmemdupz(variant,
+ fixed_portion->working_tree_encoding_len);
+ variant += fixed_portion->working_tree_encoding_len;
+ } else {
+ encoding = NULL;
+ }
+
+ memset(pc_item, 0, sizeof(*pc_item));
+ pc_item->ce = make_empty_transient_cache_entry(fixed_portion->name_len);
+ pc_item->ce->ce_namelen = fixed_portion->name_len;
+ pc_item->ce->ce_mode = fixed_portion->ce_mode;
+ memcpy(pc_item->ce->name, variant, pc_item->ce->ce_namelen);
+ oidcpy(&pc_item->ce->oid, &fixed_portion->oid);
+
+ pc_item->id = fixed_portion->id;
+ pc_item->ca.crlf_action = fixed_portion->crlf_action;
+ pc_item->ca.ident = fixed_portion->ident;
+ pc_item->ca.working_tree_encoding = encoding;
+}
+
+static void report_result(struct parallel_checkout_item *pc_item)
+{
+ struct pc_item_result res = { 0 };
+ size_t size;
+
+ res.id = pc_item->id;
+ res.status = pc_item->status;
+
+ if (pc_item->status == PC_ITEM_WRITTEN) {
+ res.st = pc_item->st;
+ size = sizeof(res);
+ } else {
+ size = PC_ITEM_RESULT_BASE_SIZE;
+ }
+
+ packet_write(1, (const char *)&res, size);
+}
+
+/* Free the worker-side malloced data, but not pc_item itself. */
+static void release_pc_item_data(struct parallel_checkout_item *pc_item)
+{
+ free((char *)pc_item->ca.working_tree_encoding);
+ discard_cache_entry(pc_item->ce);
+}
+
+static void worker_loop(struct checkout *state)
+{
+ struct parallel_checkout_item *items = NULL;
+ size_t i, nr = 0, alloc = 0;
+
+ while (1) {
+ int len;
+ char *line = packet_read_line(0, &len);
+
+ if (!line)
+ break;
+
+ ALLOC_GROW(items, nr + 1, alloc);
+ packet_to_pc_item(line, len, &items[nr++]);
+ }
+
+ for (i = 0; i < nr; ++i) {
+ struct parallel_checkout_item *pc_item = &items[i];
+ write_pc_item(pc_item, state);
+ report_result(pc_item);
+ release_pc_item_data(pc_item);
+ }
+
+ packet_flush(1);
+
+ free(items);
+}
+
+static const char * const checkout_helper_usage[] = {
+ N_("git checkout--helper [<options>]"),
+ NULL
+};
+
+int cmd_checkout__helper(int argc, const char **argv, const char *prefix)
+{
+ struct checkout state = CHECKOUT_INIT;
+ struct option checkout_helper_options[] = {
+ OPT_STRING(0, "prefix", &state.base_dir, N_("string"),
+ N_("when creating files, prepend <string>")),
+ OPT_END()
+ };
+
+ if (argc == 2 && !strcmp(argv[1], "-h"))
+ usage_with_options(checkout_helper_usage,
+ checkout_helper_options);
+
+ git_config(git_default_config, NULL);
+ argc = parse_options(argc, argv, prefix, checkout_helper_options,
+ checkout_helper_usage, 0);
+ if (argc > 0)
+ usage_with_options(checkout_helper_usage, checkout_helper_options);
+
+ if (state.base_dir)
+ state.base_dir_len = strlen(state.base_dir);
+
+ /*
+ * Setting this on a worker won't actually update the index. We just
+ * need to pretend it will, to induce the checkout machinery to stat()
+ * the written entries.
+ */
+ state.refresh_cache = 1;
+
+ worker_loop(&state);
+ return 0;
+}
diff --git a/git.c b/git.c
index 4bdcdad2cc..384f144593 100644
--- a/git.c
+++ b/git.c
@@ -487,6 +487,8 @@ static struct cmd_struct commands[] = {
{ "check-mailmap", cmd_check_mailmap, RUN_SETUP },
{ "check-ref-format", cmd_check_ref_format, NO_PARSEOPT },
{ "checkout", cmd_checkout, RUN_SETUP | NEED_WORK_TREE },
+ { "checkout--helper", cmd_checkout__helper,
+ RUN_SETUP | NEED_WORK_TREE | SUPPORT_SUPER_PREFIX },
{ "checkout-index", cmd_checkout_index,
RUN_SETUP | NEED_WORK_TREE},
{ "cherry", cmd_cherry, RUN_SETUP },
diff --git a/parallel-checkout.c b/parallel-checkout.c
index 981dbe6ff3..a5508e27c2 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -1,28 +1,15 @@
#include "cache.h"
#include "entry.h"
#include "parallel-checkout.h"
+#include "pkt-line.h"
+#include "run-command.h"
#include "streaming.h"
+#include "thread-utils.h"
+#include "config.h"
-enum pc_item_status {
- PC_ITEM_PENDING = 0,
- PC_ITEM_WRITTEN,
- /*
- * The entry could not be written because there was another file
- * already present in its path or leading directories. Since
- * checkout_entry_ca() removes such files from the working tree before
- * enqueueing the entry for parallel checkout, it means that there was
- * a path collision among the entries being written.
- */
- PC_ITEM_COLLIDED,
- PC_ITEM_FAILED,
-};
-
-struct parallel_checkout_item {
- /* pointer to a istate->cache[] entry. Not owned by us. */
- struct cache_entry *ce;
- struct conv_attrs ca;
- struct stat st;
- enum pc_item_status status;
+struct pc_worker {
+ struct child_process cp;
+ size_t next_to_complete, nr_to_complete;
};
struct parallel_checkout {
@@ -38,6 +25,19 @@ enum pc_status parallel_checkout_status(void)
return parallel_checkout.status;
}
+#define DEFAULT_THRESHOLD_FOR_PARALLELISM 100
+
+void get_parallel_checkout_configs(int *num_workers, int *threshold)
+{
+ if (git_config_get_int("checkout.workers", num_workers))
+ *num_workers = 1;
+ else if (*num_workers < 1)
+ *num_workers = online_cpus();
+
+ if (git_config_get_int("checkout.thresholdForParallelism", threshold))
+ *threshold = DEFAULT_THRESHOLD_FOR_PARALLELISM;
+}
+
void init_parallel_checkout(void)
{
if (parallel_checkout.status != PC_UNINITIALIZED)
@@ -115,10 +115,12 @@ int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca)
ALLOC_GROW(parallel_checkout.items, parallel_checkout.nr + 1,
parallel_checkout.alloc);
- pc_item = &parallel_checkout.items[parallel_checkout.nr++];
+ pc_item = &parallel_checkout.items[parallel_checkout.nr];
pc_item->ce = ce;
memcpy(&pc_item->ca, ca, sizeof(pc_item->ca));
pc_item->status = PC_ITEM_PENDING;
+ pc_item->id = parallel_checkout.nr;
+ parallel_checkout.nr++;
return 0;
}
@@ -231,7 +233,8 @@ static int write_pc_item_to_fd(struct parallel_checkout_item *pc_item, int fd,
/*
* checkout metadata is used to give context for external process
* filters. Files requiring such filters are not eligible for parallel
- * checkout, so pass NULL.
+ * checkout, so pass NULL. Note: if that changes, the metadata must also
+ * be passed from the main process to the workers.
*/
ret = convert_to_working_tree_ca(&pc_item->ca, pc_item->ce->name,
new_blob, size, &buf, NULL);
@@ -272,8 +275,8 @@ static int check_leading_dirs(const char *path, int len, int prefix_len)
return has_dirs_only_path(path, slash - path, prefix_len);
}
-static void write_pc_item(struct parallel_checkout_item *pc_item,
- struct checkout *state)
+void write_pc_item(struct parallel_checkout_item *pc_item,
+ struct checkout *state)
{
unsigned int mode = (pc_item->ce->ce_mode & 0100) ? 0777 : 0666;
int fd = -1, fstat_done = 0;
@@ -343,6 +346,221 @@ static void write_pc_item(struct parallel_checkout_item *pc_item,
strbuf_release(&path);
}
+static void send_one_item(int fd, struct parallel_checkout_item *pc_item)
+{
+ size_t len_data;
+ char *data, *variant;
+ struct pc_item_fixed_portion *fixed_portion;
+ const char *working_tree_encoding = pc_item->ca.working_tree_encoding;
+ size_t name_len = pc_item->ce->ce_namelen;
+ size_t working_tree_encoding_len = working_tree_encoding ?
+ strlen(working_tree_encoding) : 0;
+
+ len_data = sizeof(struct pc_item_fixed_portion) + name_len +
+ working_tree_encoding_len;
+
+ data = xcalloc(1, len_data);
+
+ fixed_portion = (struct pc_item_fixed_portion *)data;
+ fixed_portion->id = pc_item->id;
+ fixed_portion->ce_mode = pc_item->ce->ce_mode;
+ fixed_portion->crlf_action = pc_item->ca.crlf_action;
+ fixed_portion->ident = pc_item->ca.ident;
+ fixed_portion->name_len = name_len;
+ fixed_portion->working_tree_encoding_len = working_tree_encoding_len;
+ /*
+ * We use hashcpy() instead of oidcpy() because the hash[] positions
+ * after `the_hash_algo->rawsz` might not be initialized. And Valgrind
+ * would complain about passing uninitialized bytes to a syscall
+ * (write(2)). There is no real harm in this case, but the warning could
+ * hinder the detection of actual errors.
+ */
+ hashcpy(fixed_portion->oid.hash, pc_item->ce->oid.hash);
+
+ variant = data + sizeof(*fixed_portion);
+ if (working_tree_encoding_len) {
+ memcpy(variant, working_tree_encoding, working_tree_encoding_len);
+ variant += working_tree_encoding_len;
+ }
+ memcpy(variant, pc_item->ce->name, name_len);
+
+ packet_write(fd, data, len_data);
+
+ free(data);
+}
+
+static void send_batch(int fd, size_t start, size_t nr)
+{
+ size_t i;
+ for (i = 0; i < nr; ++i)
+ send_one_item(fd, &parallel_checkout.items[start + i]);
+ packet_flush(fd);
+}
+
+static struct pc_worker *setup_workers(struct checkout *state, int num_workers)
+{
+ struct pc_worker *workers;
+ int i, workers_with_one_extra_item;
+ size_t base_batch_size, next_to_assign = 0;
+
+ ALLOC_ARRAY(workers, num_workers);
+
+ for (i = 0; i < num_workers; ++i) {
+ struct child_process *cp = &workers[i].cp;
+
+ child_process_init(cp);
+ cp->git_cmd = 1;
+ cp->in = -1;
+ cp->out = -1;
+ cp->clean_on_exit = 1;
+ strvec_push(&cp->args, "checkout--helper");
+ if (state->base_dir_len)
+ strvec_pushf(&cp->args, "--prefix=%s", state->base_dir);
+ if (start_command(cp))
+ die(_("failed to spawn checkout worker"));
+ }
+
+ base_batch_size = parallel_checkout.nr / num_workers;
+ workers_with_one_extra_item = parallel_checkout.nr % num_workers;
+
+ for (i = 0; i < num_workers; ++i) {
+ struct pc_worker *worker = &workers[i];
+ size_t batch_size = base_batch_size;
+
+ /* distribute the extra work evenly */
+ if (i < workers_with_one_extra_item)
+ batch_size++;
+
+ send_batch(worker->cp.in, next_to_assign, batch_size);
+ worker->next_to_complete = next_to_assign;
+ worker->nr_to_complete = batch_size;
+
+ next_to_assign += batch_size;
+ }
+
+ return workers;
+}
+
+static void finish_workers(struct pc_worker *workers, int num_workers)
+{
+ int i;
+
+ /*
+ * Close pipes before calling finish_command() to let the workers
+ * exit asynchronously and avoid spending extra time on wait().
+ */
+ for (i = 0; i < num_workers; ++i) {
+ struct child_process *cp = &workers[i].cp;
+ if (cp->in >= 0)
+ close(cp->in);
+ if (cp->out >= 0)
+ close(cp->out);
+ }
+
+ for (i = 0; i < num_workers; ++i) {
+ if (finish_command(&workers[i].cp))
+ error(_("checkout worker %d finished with error"), i);
+ }
+
+ free(workers);
+}
+
+#define ASSERT_PC_ITEM_RESULT_SIZE(got, exp) \
+do { \
+ if (got != exp) \
+ BUG("corrupted result from checkout worker (got %dB, exp %dB)", \
+ got, exp); \
+} while(0)
+
+static void parse_and_save_result(const char *line, int len,
+ struct pc_worker *worker)
+{
+ struct pc_item_result *res;
+ struct parallel_checkout_item *pc_item;
+ struct stat *st = NULL;
+
+ if (len < PC_ITEM_RESULT_BASE_SIZE)
+ BUG("too short result from checkout worker (got %dB, exp %dB)",
+ len, (int)PC_ITEM_RESULT_BASE_SIZE);
+
+ res = (struct pc_item_result *)line;
+
+ /*
+ * Worker should send either the full result struct on success, or
+ * just the base (i.e. no stat data), otherwise.
+ */
+ if (res->status == PC_ITEM_WRITTEN) {
+ ASSERT_PC_ITEM_RESULT_SIZE(len, (int)sizeof(struct pc_item_result));
+ st = &res->st;
+ } else {
+ ASSERT_PC_ITEM_RESULT_SIZE(len, (int)PC_ITEM_RESULT_BASE_SIZE);
+ }
+
+ if (!worker->nr_to_complete || res->id != worker->next_to_complete)
+ BUG("checkout worker sent unexpected item id");
+
+ worker->next_to_complete++;
+ worker->nr_to_complete--;
+
+ pc_item = &parallel_checkout.items[res->id];
+ pc_item->status = res->status;
+ if (st)
+ pc_item->st = *st;
+}
+
+
+static void gather_results_from_workers(struct pc_worker *workers,
+ int num_workers)
+{
+ int i, active_workers = num_workers;
+ struct pollfd *pfds;
+
+ CALLOC_ARRAY(pfds, num_workers);
+ for (i = 0; i < num_workers; ++i) {
+ pfds[i].fd = workers[i].cp.out;
+ pfds[i].events = POLLIN;
+ }
+
+ while (active_workers) {
+ int nr = poll(pfds, num_workers, -1);
+
+ if (nr < 0) {
+ if (errno == EINTR)
+ continue;
+ die_errno("failed to poll checkout workers");
+ }
+
+ for (i = 0; i < num_workers && nr > 0; ++i) {
+ struct pc_worker *worker = &workers[i];
+ struct pollfd *pfd = &pfds[i];
+
+ if (!pfd->revents)
+ continue;
+
+ if (pfd->revents & POLLIN) {
+ int len;
+ const char *line = packet_read_line(pfd->fd, &len);
+
+ if (!line) {
+ pfd->fd = -1;
+ active_workers--;
+ } else {
+ parse_and_save_result(line, len, worker);
+ }
+ } else if (pfd->revents & POLLHUP) {
+ pfd->fd = -1;
+ active_workers--;
+ } else if (pfd->revents & (POLLNVAL | POLLERR)) {
+ die(_("error polling from checkout worker"));
+ }
+
+ nr--;
+ }
+ }
+
+ free(pfds);
+}
+
static void write_items_sequentially(struct checkout *state)
{
size_t i;
@@ -351,7 +569,7 @@ static void write_items_sequentially(struct checkout *state)
write_pc_item(&parallel_checkout.items[i], state);
}
-int run_parallel_checkout(struct checkout *state)
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold)
{
int ret;
@@ -360,7 +578,17 @@ int run_parallel_checkout(struct checkout *state)
parallel_checkout.status = PC_RUNNING;
- write_items_sequentially(state);
+ if (parallel_checkout.nr < num_workers)
+ num_workers = parallel_checkout.nr;
+
+ if (num_workers <= 1 || parallel_checkout.nr < threshold) {
+ write_items_sequentially(state);
+ } else {
+ struct pc_worker *workers = setup_workers(state, num_workers);
+ gather_results_from_workers(workers, num_workers);
+ finish_workers(workers, num_workers);
+ }
+
ret = handle_results(state);
finish_parallel_checkout();
diff --git a/parallel-checkout.h b/parallel-checkout.h
index e6d6fc01ea..0c9984584e 100644
--- a/parallel-checkout.h
+++ b/parallel-checkout.h
@@ -1,9 +1,12 @@
#ifndef PARALLEL_CHECKOUT_H
#define PARALLEL_CHECKOUT_H
-struct cache_entry;
-struct checkout;
-struct conv_attrs;
+#include "entry.h"
+#include "convert.h"
+
+/****************************************************************
+ * Users of parallel checkout
+ ****************************************************************/
enum pc_status {
PC_UNINITIALIZED = 0,
@@ -12,6 +15,7 @@ enum pc_status {
};
enum pc_status parallel_checkout_status(void);
+void get_parallel_checkout_configs(int *num_workers, int *threshold);
void init_parallel_checkout(void);
/*
@@ -21,7 +25,77 @@ void init_parallel_checkout(void);
*/
int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
-/* Write all the queued entries, returning 0 on success.*/
-int run_parallel_checkout(struct checkout *state);
+/*
+ * Write all the queued entries, returning 0 on success. If the number of
+ * entries is smaller than the specified threshold, the operation is performed
+ * sequentially.
+ */
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold);
+
+/****************************************************************
+ * Interface with checkout--helper
+ ****************************************************************/
+
+enum pc_item_status {
+ PC_ITEM_PENDING = 0,
+ PC_ITEM_WRITTEN,
+ /*
+ * The entry could not be written because there was another file
+ * already present in its path or leading directories. Since
+ * checkout_entry_ca() removes such files from the working tree before
+ * enqueueing the entry for parallel checkout, it means that there was
+ * a path collision among the entries being written.
+ */
+ PC_ITEM_COLLIDED,
+ PC_ITEM_FAILED,
+};
+
+struct parallel_checkout_item {
+ /*
+ * In the main process, ce points to an istate->cache[] entry, so it is
+ * not owned by us. In workers, ce is owned and *must be* released.
+ */
+ struct cache_entry *ce;
+ struct conv_attrs ca;
+ size_t id; /* position in parallel_checkout.items[] of main process */
+
+ /* Output fields, sent from workers. */
+ enum pc_item_status status;
+ struct stat st;
+};
+
+/*
+ * The fixed-size portion of `struct parallel_checkout_item` that is sent to the
+ * workers. Following this will be 2 strings: ca.working_tree_encoding and
+ * ce.name. These are NOT null-terminated, since we have the size in the fixed
+ * portion.
+ *
+ * Note that not all fields of conv_attrs and cache_entry are passed, only the
+ * ones that will be required by the workers to smudge and write the entry.
+ */
+struct pc_item_fixed_portion {
+ size_t id;
+ struct object_id oid;
+ unsigned int ce_mode;
+ enum crlf_action crlf_action;
+ int ident;
+ size_t working_tree_encoding_len;
+ size_t name_len;
+};
+
+/*
+ * The fields of `struct parallel_checkout_item` that are returned by the
+ * workers. Note: `st` must be the last one, as it is omitted on error.
+ */
+struct pc_item_result {
+ size_t id;
+ enum pc_item_status status;
+ struct stat st;
+};
+
+#define PC_ITEM_RESULT_BASE_SIZE offsetof(struct pc_item_result, st)
+
+void write_pc_item(struct parallel_checkout_item *pc_item,
+ struct checkout *state);
#endif /* PARALLEL_CHECKOUT_H */
diff --git a/unpack-trees.c b/unpack-trees.c
index 1b1da7485a..117ed42370 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -399,7 +399,7 @@ static int check_updates(struct unpack_trees_options *o,
int errs = 0;
struct progress *progress;
struct checkout state = CHECKOUT_INIT;
- int i;
+ int i, pc_workers, pc_threshold;
trace_performance_enter();
state.force = 1;
@@ -462,8 +462,11 @@ static int check_updates(struct unpack_trees_options *o,
oid_array_clear(&to_fetch);
}
+ get_parallel_checkout_configs(&pc_workers, &pc_threshold);
+
enable_delayed_checkout(&state);
- init_parallel_checkout();
+ if (pc_workers > 1)
+ init_parallel_checkout();
for (i = 0; i < index->cache_nr; i++) {
struct cache_entry *ce = index->cache[i];
@@ -477,7 +480,8 @@ static int check_updates(struct unpack_trees_options *o,
}
}
stop_progress(&progress);
- errs |= run_parallel_checkout(&state);
+ if (pc_workers > 1)
+ errs |= run_parallel_checkout(&state, pc_workers, pc_threshold);
errs |= finish_delayed_checkout(&state, NULL);
git_attr_set_direction(GIT_ATTR_CHECKIN);
--
2.28.0
* [PATCH v3 12/19] parallel-checkout: support progress displaying
2020-10-29 2:14 ` [PATCH v3 00/19] Parallel Checkout (part I) Matheus Tavares
` (10 preceding siblings ...)
2020-10-29 2:14 ` [PATCH v3 11/19] parallel-checkout: make it truly parallel Matheus Tavares
@ 2020-10-29 2:14 ` Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 13/19] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
` (9 subsequent siblings)
21 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-10-29 2:14 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
Original-patch-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
parallel-checkout.c | 34 +++++++++++++++++++++++++++++++---
parallel-checkout.h | 4 +++-
unpack-trees.c | 11 ++++++++---
3 files changed, 42 insertions(+), 7 deletions(-)
diff --git a/parallel-checkout.c b/parallel-checkout.c
index a5508e27c2..c5c449d224 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -2,6 +2,7 @@
#include "entry.h"
#include "parallel-checkout.h"
#include "pkt-line.h"
+#include "progress.h"
#include "run-command.h"
#include "streaming.h"
#include "thread-utils.h"
@@ -16,6 +17,8 @@ struct parallel_checkout {
enum pc_status status;
struct parallel_checkout_item *items;
size_t nr, alloc;
+ struct progress *progress;
+ unsigned int *progress_cnt;
};
static struct parallel_checkout parallel_checkout = { 0 };
@@ -125,6 +128,20 @@ int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca)
return 0;
}
+size_t pc_queue_size(void)
+{
+ return parallel_checkout.nr;
+}
+
+static void advance_progress_meter(void)
+{
+ if (parallel_checkout.progress) {
+ (*parallel_checkout.progress_cnt)++;
+ display_progress(parallel_checkout.progress,
+ *parallel_checkout.progress_cnt);
+ }
+}
+
static int handle_results(struct checkout *state)
{
int ret = 0;
@@ -173,6 +190,7 @@ static int handle_results(struct checkout *state)
*/
ret |= checkout_entry_ca(pc_item->ce, &pc_item->ca,
state, NULL, NULL);
+ advance_progress_meter();
break;
case PC_ITEM_PENDING:
have_pending = 1;
@@ -506,6 +524,9 @@ static void parse_and_save_result(const char *line, int len,
pc_item->status = res->status;
if (st)
pc_item->st = *st;
+
+ if (res->status != PC_ITEM_COLLIDED)
+ advance_progress_meter();
}
@@ -565,11 +586,16 @@ static void write_items_sequentially(struct checkout *state)
{
size_t i;
- for (i = 0; i < parallel_checkout.nr; ++i)
- write_pc_item(&parallel_checkout.items[i], state);
+ for (i = 0; i < parallel_checkout.nr; ++i) {
+ struct parallel_checkout_item *pc_item = &parallel_checkout.items[i];
+ write_pc_item(pc_item, state);
+ if (pc_item->status != PC_ITEM_COLLIDED)
+ advance_progress_meter();
+ }
}
-int run_parallel_checkout(struct checkout *state, int num_workers, int threshold)
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold,
+ struct progress *progress, unsigned int *progress_cnt)
{
int ret;
@@ -577,6 +603,8 @@ int run_parallel_checkout(struct checkout *state, int num_workers, int threshold
BUG("cannot run parallel checkout: uninitialized or already running");
parallel_checkout.status = PC_RUNNING;
+ parallel_checkout.progress = progress;
+ parallel_checkout.progress_cnt = progress_cnt;
if (parallel_checkout.nr < num_workers)
num_workers = parallel_checkout.nr;
diff --git a/parallel-checkout.h b/parallel-checkout.h
index 0c9984584e..6c3a016c0b 100644
--- a/parallel-checkout.h
+++ b/parallel-checkout.h
@@ -24,13 +24,15 @@ void init_parallel_checkout(void);
* write and return 0.
*/
int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
+size_t pc_queue_size(void);
/*
* Write all the queued entries, returning 0 on success. If the number of
* entries is smaller than the specified threshold, the operation is performed
* sequentially.
*/
-int run_parallel_checkout(struct checkout *state, int num_workers, int threshold);
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold,
+ struct progress *progress, unsigned int *progress_cnt);
/****************************************************************
* Interface with checkout--helper
diff --git a/unpack-trees.c b/unpack-trees.c
index 117ed42370..e05e6ceff2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -471,17 +471,22 @@ static int check_updates(struct unpack_trees_options *o,
struct cache_entry *ce = index->cache[i];
if (ce->ce_flags & CE_UPDATE) {
+ size_t last_pc_queue_size = pc_queue_size();
+
if (ce->ce_flags & CE_WT_REMOVE)
BUG("both update and delete flags are set on %s",
ce->name);
- display_progress(progress, ++cnt);
ce->ce_flags &= ~CE_UPDATE;
errs |= checkout_entry(ce, &state, NULL, NULL);
+
+ if (last_pc_queue_size == pc_queue_size())
+ display_progress(progress, ++cnt);
}
}
- stop_progress(&progress);
if (pc_workers > 1)
- errs |= run_parallel_checkout(&state, pc_workers, pc_threshold);
+ errs |= run_parallel_checkout(&state, pc_workers, pc_threshold,
+ progress, &cnt);
+ stop_progress(&progress);
errs |= finish_delayed_checkout(&state, NULL);
git_attr_set_direction(GIT_ATTR_CHECKIN);
--
2.28.0
^ permalink raw reply related [flat|nested] 154+ messages in thread
* [PATCH v3 13/19] make_transient_cache_entry(): optionally alloc from mem_pool
2020-10-29 2:14 ` [PATCH v3 00/19] Parallel Checkout (part I) Matheus Tavares
` (11 preceding siblings ...)
2020-10-29 2:14 ` [PATCH v3 12/19] parallel-checkout: support progress displaying Matheus Tavares
@ 2020-10-29 2:14 ` Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 14/19] builtin/checkout.c: complete parallel checkout support Matheus Tavares
` (8 subsequent siblings)
21 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-10-29 2:14 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
Allow make_transient_cache_entry() to optionally receive a mem_pool
struct in which it should allocate the entry. This will be used in the
following patch, to store some transient entries which should persist
until parallel checkout finishes.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
builtin/checkout--helper.c | 2 +-
builtin/checkout.c | 2 +-
builtin/difftool.c | 2 +-
cache.h | 10 +++++-----
read-cache.c | 12 ++++++++----
unpack-trees.c | 2 +-
6 files changed, 17 insertions(+), 13 deletions(-)
diff --git a/builtin/checkout--helper.c b/builtin/checkout--helper.c
index 67fe37cf11..9646ed9eeb 100644
--- a/builtin/checkout--helper.c
+++ b/builtin/checkout--helper.c
@@ -38,7 +38,7 @@ static void packet_to_pc_item(char *line, int len,
}
memset(pc_item, 0, sizeof(*pc_item));
- pc_item->ce = make_empty_transient_cache_entry(fixed_portion->name_len);
+ pc_item->ce = make_empty_transient_cache_entry(fixed_portion->name_len, NULL);
pc_item->ce->ce_namelen = fixed_portion->name_len;
pc_item->ce->ce_mode = fixed_portion->ce_mode;
memcpy(pc_item->ce->name, variant, pc_item->ce->ce_namelen);
diff --git a/builtin/checkout.c b/builtin/checkout.c
index b18b9d6f3c..c0bf5e6711 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -291,7 +291,7 @@ static int checkout_merged(int pos, const struct checkout *state, int *nr_checko
if (write_object_file(result_buf.ptr, result_buf.size, blob_type, &oid))
die(_("Unable to add merge result for '%s'"), path);
free(result_buf.ptr);
- ce = make_transient_cache_entry(mode, &oid, path, 2);
+ ce = make_transient_cache_entry(mode, &oid, path, 2, NULL);
if (!ce)
die(_("make_cache_entry failed for path '%s'"), path);
status = checkout_entry(ce, state, NULL, nr_checkouts);
diff --git a/builtin/difftool.c b/builtin/difftool.c
index dfa22b67eb..5e7a57c8c2 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -323,7 +323,7 @@ static int checkout_path(unsigned mode, struct object_id *oid,
struct cache_entry *ce;
int ret;
- ce = make_transient_cache_entry(mode, oid, path, 0);
+ ce = make_transient_cache_entry(mode, oid, path, 0, NULL);
ret = checkout_entry(ce, state, NULL, NULL);
discard_cache_entry(ce);
diff --git a/cache.h b/cache.h
index ccfeb9ba2b..b5074b2cb2 100644
--- a/cache.h
+++ b/cache.h
@@ -355,16 +355,16 @@ struct cache_entry *make_empty_cache_entry(struct index_state *istate,
size_t name_len);
/*
- * Create a cache_entry that is not intended to be added to an index.
- * Caller is responsible for discarding the cache_entry
- * with `discard_cache_entry`.
+ * Create a cache_entry that is not intended to be added to an index. If mp is
+ * not NULL, the entry is allocated within the given memory pool. Caller is
+ * responsible for discarding the cache_entry with `discard_cache_entry`.
*/
struct cache_entry *make_transient_cache_entry(unsigned int mode,
const struct object_id *oid,
const char *path,
- int stage);
+ int stage, struct mem_pool *mp);
-struct cache_entry *make_empty_transient_cache_entry(size_t name_len);
+struct cache_entry *make_empty_transient_cache_entry(size_t len, struct mem_pool *mp);
/*
* Discard cache entry.
diff --git a/read-cache.c b/read-cache.c
index ecf6f68994..f9bac760af 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -813,8 +813,10 @@ struct cache_entry *make_empty_cache_entry(struct index_state *istate, size_t le
return mem_pool__ce_calloc(find_mem_pool(istate), len);
}
-struct cache_entry *make_empty_transient_cache_entry(size_t len)
+struct cache_entry *make_empty_transient_cache_entry(size_t len, struct mem_pool *mp)
{
+ if (mp)
+ return mem_pool__ce_calloc(mp, len);
return xcalloc(1, cache_entry_size(len));
}
@@ -848,8 +850,10 @@ struct cache_entry *make_cache_entry(struct index_state *istate,
return ret;
}
-struct cache_entry *make_transient_cache_entry(unsigned int mode, const struct object_id *oid,
- const char *path, int stage)
+struct cache_entry *make_transient_cache_entry(unsigned int mode,
+ const struct object_id *oid,
+ const char *path, int stage,
+ struct mem_pool *mp)
{
struct cache_entry *ce;
int len;
@@ -860,7 +864,7 @@ struct cache_entry *make_transient_cache_entry(unsigned int mode, const struct o
}
len = strlen(path);
- ce = make_empty_transient_cache_entry(len);
+ ce = make_empty_transient_cache_entry(len, mp);
oidcpy(&ce->oid, oid);
memcpy(ce->name, path, len);
diff --git a/unpack-trees.c b/unpack-trees.c
index e05e6ceff2..dcb40dc8fa 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1031,7 +1031,7 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
size_t len = traverse_path_len(info, tree_entry_len(n));
struct cache_entry *ce =
is_transient ?
- make_empty_transient_cache_entry(len) :
+ make_empty_transient_cache_entry(len, NULL) :
make_empty_cache_entry(istate, len);
ce->ce_mode = create_ce_mode(n->mode);
--
2.28.0
^ permalink raw reply related [flat|nested] 154+ messages in thread
* [PATCH v3 14/19] builtin/checkout.c: complete parallel checkout support
2020-10-29 2:14 ` [PATCH v3 00/19] Parallel Checkout (part I) Matheus Tavares
` (12 preceding siblings ...)
2020-10-29 2:14 ` [PATCH v3 13/19] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
@ 2020-10-29 2:14 ` Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 15/19] checkout-index: add " Matheus Tavares
` (7 subsequent siblings)
21 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-10-29 2:14 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
There is one code path in builtin/checkout.c which still doesn't benefit
from parallel checkout because it calls checkout_entry() directly,
instead of unpack_trees(). Let's add parallel support for this missing
spot as well. Note: the transient cache entries allocated in
checkout_merged() are now allocated in a mem_pool which is only
discarded after parallel checkout finishes. This is done because the
entries need to be valid when run_parallel_checkout() is called.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
builtin/checkout.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/builtin/checkout.c b/builtin/checkout.c
index c0bf5e6711..ddc4079b85 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -27,6 +27,7 @@
#include "wt-status.h"
#include "xdiff-interface.h"
#include "entry.h"
+#include "parallel-checkout.h"
static const char * const checkout_usage[] = {
N_("git checkout [<options>] <branch>"),
@@ -230,7 +231,8 @@ static int checkout_stage(int stage, const struct cache_entry *ce, int pos,
return error(_("path '%s' does not have their version"), ce->name);
}
-static int checkout_merged(int pos, const struct checkout *state, int *nr_checkouts)
+static int checkout_merged(int pos, const struct checkout *state,
+ int *nr_checkouts, struct mem_pool *ce_mem_pool)
{
struct cache_entry *ce = active_cache[pos];
const char *path = ce->name;
@@ -291,11 +293,10 @@ static int checkout_merged(int pos, const struct checkout *state, int *nr_checko
if (write_object_file(result_buf.ptr, result_buf.size, blob_type, &oid))
die(_("Unable to add merge result for '%s'"), path);
free(result_buf.ptr);
- ce = make_transient_cache_entry(mode, &oid, path, 2, NULL);
+ ce = make_transient_cache_entry(mode, &oid, path, 2, ce_mem_pool);
if (!ce)
die(_("make_cache_entry failed for path '%s'"), path);
status = checkout_entry(ce, state, NULL, nr_checkouts);
- discard_cache_entry(ce);
return status;
}
@@ -359,16 +360,22 @@ static int checkout_worktree(const struct checkout_opts *opts,
int nr_checkouts = 0, nr_unmerged = 0;
int errs = 0;
int pos;
+ int pc_workers, pc_threshold;
+ struct mem_pool ce_mem_pool;
state.force = 1;
state.refresh_cache = 1;
state.istate = &the_index;
+ mem_pool_init(&ce_mem_pool, 0);
+ get_parallel_checkout_configs(&pc_workers, &pc_threshold);
init_checkout_metadata(&state.meta, info->refname,
info->commit ? &info->commit->object.oid : &info->oid,
NULL);
enable_delayed_checkout(&state);
+ if (pc_workers > 1)
+ init_parallel_checkout();
for (pos = 0; pos < active_nr; pos++) {
struct cache_entry *ce = active_cache[pos];
if (ce->ce_flags & CE_MATCHED) {
@@ -384,10 +391,15 @@ static int checkout_worktree(const struct checkout_opts *opts,
&nr_checkouts, opts->overlay_mode);
else if (opts->merge)
errs |= checkout_merged(pos, &state,
- &nr_unmerged);
+ &nr_unmerged,
+ &ce_mem_pool);
pos = skip_same_name(ce, pos) - 1;
}
}
+ if (pc_workers > 1)
+ errs |= run_parallel_checkout(&state, pc_workers, pc_threshold,
+ NULL, NULL);
+ mem_pool_discard(&ce_mem_pool, should_validate_cache_entries());
remove_marked_cache_entries(&the_index, 1);
remove_scheduled_dirs();
errs |= finish_delayed_checkout(&state, &nr_checkouts);
--
2.28.0
^ permalink raw reply related [flat|nested] 154+ messages in thread
* [PATCH v3 15/19] checkout-index: add parallel checkout support
2020-10-29 2:14 ` [PATCH v3 00/19] Parallel Checkout (part I) Matheus Tavares
` (13 preceding siblings ...)
2020-10-29 2:14 ` [PATCH v3 14/19] builtin/checkout.c: complete parallel checkout support Matheus Tavares
@ 2020-10-29 2:14 ` Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 16/19] parallel-checkout: add tests for basic operations Matheus Tavares
` (6 subsequent siblings)
21 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-10-29 2:14 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
builtin/checkout-index.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c
index 9276ed0258..9a2e255f58 100644
--- a/builtin/checkout-index.c
+++ b/builtin/checkout-index.c
@@ -12,6 +12,7 @@
#include "cache-tree.h"
#include "parse-options.h"
#include "entry.h"
+#include "parallel-checkout.h"
#define CHECKOUT_ALL 4
static int nul_term_line;
@@ -169,6 +170,7 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
int force = 0, quiet = 0, not_new = 0;
int index_opt = 0;
int err = 0;
+ int pc_workers, pc_threshold;
struct option builtin_checkout_index_options[] = {
OPT_BOOL('a', "all", &all,
N_("check out all files in the index")),
@@ -223,6 +225,14 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
}
+ if (!to_tempfile)
+ get_parallel_checkout_configs(&pc_workers, &pc_threshold);
+ else
+ pc_workers = 1;
+
+ if (pc_workers > 1)
+ init_parallel_checkout();
+
/* Check out named files first */
for (i = 0; i < argc; i++) {
const char *arg = argv[i];
@@ -262,12 +272,17 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
strbuf_release(&buf);
}
- if (err)
- return 1;
-
if (all)
checkout_all(prefix, prefix_length);
+ if (pc_workers > 1) {
+ err |= run_parallel_checkout(&state, pc_workers, pc_threshold,
+ NULL, NULL);
+ }
+
+ if (err)
+ return 1;
+
if (is_lock_file_locked(&lock_file) &&
write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
die("Unable to write new index file");
--
2.28.0
^ permalink raw reply related [flat|nested] 154+ messages in thread
* [PATCH v3 16/19] parallel-checkout: add tests for basic operations
2020-10-29 2:14 ` [PATCH v3 00/19] Parallel Checkout (part I) Matheus Tavares
` (14 preceding siblings ...)
2020-10-29 2:14 ` [PATCH v3 15/19] checkout-index: add " Matheus Tavares
@ 2020-10-29 2:14 ` Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 17/19] parallel-checkout: add tests related to clone collisions Matheus Tavares
` (5 subsequent siblings)
21 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-10-29 2:14 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
Add tests to populate the working tree during clone and checkout using
the sequential and parallel modes, to confirm that they produce
identical results. Also test basic checkout mechanics, such as checking
for symlinks in the leading directories and adherence to --force.
Note: some helper functions are added to a common lib file which is only
included by t2080 for now. But it will also be used by other
parallel-checkout tests in the following patches.
Original-patch-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
t/lib-parallel-checkout.sh | 40 +++++++
t/t2080-parallel-checkout-basics.sh | 170 ++++++++++++++++++++++++++++
2 files changed, 210 insertions(+)
create mode 100644 t/lib-parallel-checkout.sh
create mode 100755 t/t2080-parallel-checkout-basics.sh
diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh
new file mode 100644
index 0000000000..4dad9043fb
--- /dev/null
+++ b/t/lib-parallel-checkout.sh
@@ -0,0 +1,40 @@
+# Helpers for t208* tests
+
+# Runs `git -c checkout.workers=$1 -c checkout.thresholdForParallelism=$2 ${@:4}`
+# and checks that the number of workers spawned is equal to $3.
+#
+git_pc()
+{
+ if test $# -lt 4
+ then
+ BUG "too few arguments to git_pc()"
+ fi &&
+
+ workers=$1 threshold=$2 expected_workers=$3 &&
+ shift 3 &&
+
+ rm -f trace &&
+ GIT_TRACE2="$(pwd)/trace" git \
+ -c checkout.workers=$workers \
+ -c checkout.thresholdForParallelism=$threshold \
+ -c advice.detachedHead=0 \
+ "$@" &&
+
+ # Check that the expected number of workers has been used. Note that it
+ # can be different from the requested number in two cases: when the
+ # threshold is not reached; and when there are not enough
+ # parallel-eligible entries for all workers.
+ #
+ local workers_in_trace=$(grep "child_start\[..*\] git checkout--helper" trace | wc -l) &&
+ test $workers_in_trace -eq $expected_workers &&
+ rm -f trace
+}
+
+# Verify that both the working tree and the index were created correctly
+verify_checkout()
+{
+ git -C "$1" diff-index --quiet HEAD -- &&
+ git -C "$1" diff-index --quiet --cached HEAD -- &&
+ git -C "$1" status --porcelain >"$1".status &&
+ test_must_be_empty "$1".status
+}
diff --git a/t/t2080-parallel-checkout-basics.sh b/t/t2080-parallel-checkout-basics.sh
new file mode 100755
index 0000000000..edea88f14f
--- /dev/null
+++ b/t/t2080-parallel-checkout-basics.sh
@@ -0,0 +1,170 @@
+#!/bin/sh
+
+test_description='parallel-checkout basics
+
+Ensure that parallel-checkout basically works on clone and checkout, spawning
+the required number of workers and correctly populating both the index and
+working tree.
+'
+
+TEST_NO_CREATE_REPO=1
+. ./test-lib.sh
+. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
+
+# Test parallel-checkout with different operations (creation, deletion,
+# modification) and entry types. A branch switch from B1 to B2 will contain:
+#
+# - a (file): modified
+# - e/x (file): deleted
+# - b (symlink): deleted
+# - b/f (file): created
+# - e (symlink): created
+# - d (submodule): created
+#
+test_expect_success SYMLINKS 'setup repo for checkout with various operations' '
+ git init various &&
+ (
+ cd various &&
+ git checkout -b B1 &&
+ echo a>a &&
+ mkdir e &&
+ echo e/x >e/x &&
+ ln -s e b &&
+ git add -A &&
+ git commit -m B1 &&
+
+ git checkout -b B2 &&
+ echo modified >a &&
+ rm -rf e &&
+ rm b &&
+ mkdir b &&
+ echo b/f >b/f &&
+ ln -s b e &&
+ git init d &&
+ test_commit -C d f &&
+ git submodule add ./d &&
+ git add -A &&
+ git commit -m B2 &&
+
+ git checkout --recurse-submodules B1
+ )
+'
+
+test_expect_success SYMLINKS 'sequential checkout' '
+ cp -R various various_sequential &&
+ git_pc 1 0 0 -C various_sequential checkout --recurse-submodules B2 &&
+ verify_checkout various_sequential
+'
+
+test_expect_success SYMLINKS 'parallel checkout' '
+ cp -R various various_parallel &&
+ git_pc 2 0 2 -C various_parallel checkout --recurse-submodules B2 &&
+ verify_checkout various_parallel
+'
+
+test_expect_success SYMLINKS 'fallback to sequential checkout (threshold)' '
+ cp -R various various_sequential_fallback &&
+ git_pc 2 100 0 -C various_sequential_fallback checkout --recurse-submodules B2 &&
+ verify_checkout various_sequential_fallback
+'
+
+test_expect_success SYMLINKS 'parallel checkout on clone' '
+ git -C various checkout --recurse-submodules B2 &&
+ git_pc 2 0 2 clone --recurse-submodules various various_parallel_clone &&
+ verify_checkout various_parallel_clone
+'
+
+test_expect_success SYMLINKS 'fallback to sequential checkout on clone (threshold)' '
+ git -C various checkout --recurse-submodules B2 &&
+ git_pc 2 100 0 clone --recurse-submodules various various_sequential_fallback_clone &&
+ verify_checkout various_sequential_fallback_clone
+'
+
+# Just to be paranoid, actually compare the working trees' contents directly.
+test_expect_success SYMLINKS 'compare the working trees' '
+ rm -rf various_*/.git &&
+ rm -rf various_*/d/.git &&
+
+ diff -r various_sequential various_parallel &&
+ diff -r various_sequential various_sequential_fallback &&
+ diff -r various_sequential various_parallel_clone &&
+ diff -r various_sequential various_sequential_fallback_clone
+'
+
+test_cmp_str()
+{
+ echo "$1" >tmp &&
+ test_cmp tmp "$2"
+}
+
+test_expect_success 'parallel checkout respects --[no]-force' '
+ git init dirty &&
+ (
+ cd dirty &&
+ mkdir D &&
+ test_commit D/F &&
+ test_commit F &&
+
+ echo changed >F.t &&
+ rm -rf D &&
+ echo changed >D &&
+
+ # We expect 0 workers because there is nothing to be updated
+ git_pc 2 0 0 checkout HEAD &&
+ test_path_is_file D &&
+ test_cmp_str changed D &&
+ test_cmp_str changed F.t &&
+
+ git_pc 2 0 2 checkout --force HEAD &&
+ test_path_is_dir D &&
+ test_cmp_str D/F D/F.t &&
+ test_cmp_str F F.t
+ )
+'
+
+test_expect_success SYMLINKS 'parallel checkout checks for symlinks in leading dirs' '
+ git init symlinks &&
+ (
+ cd symlinks &&
+ mkdir D E &&
+
+ # Create two entries in D to have enough work for 2 parallel
+ # workers
+ test_commit D/A &&
+ test_commit D/B &&
+ test_commit E/C &&
+ rm -rf D &&
+ ln -s E D &&
+
+ git_pc 2 0 2 checkout --force HEAD &&
+ ! test -L D &&
+ test_cmp_str D/A D/A.t &&
+ test_cmp_str D/B D/B.t
+ )
+'
+
+test_expect_success SYMLINKS,CASE_INSENSITIVE_FS 'symlink colliding with leading dir' '
+ git init colliding-symlink &&
+ (
+ cd colliding-symlink &&
+ file_hex=$(git hash-object -w --stdin </dev/null) &&
+ file_oct=$(echo $file_hex | hex2oct) &&
+
+ sym_hex=$(echo "./D" | git hash-object -w --stdin) &&
+ sym_oct=$(echo $sym_hex | hex2oct) &&
+
+ printf "100644 D/A\0${file_oct}" >tree &&
+ printf "100644 E/B\0${file_oct}" >>tree &&
+ printf "120000 e\0${sym_oct}" >>tree &&
+
+ tree_hex=$(git hash-object -w -t tree --stdin <tree) &&
+ commit_hex=$(git commit-tree -m collisions $tree_hex) &&
+ git update-ref refs/heads/colliding-symlink $commit_hex &&
+
+ git_pc 2 0 2 checkout colliding-symlink &&
+ test_path_is_dir D &&
+ test_path_is_missing D/B
+ )
+'
+
+test_done
--
2.28.0
^ permalink raw reply related [flat|nested] 154+ messages in thread
* [PATCH v3 17/19] parallel-checkout: add tests related to clone collisions
2020-10-29 2:14 ` [PATCH v3 00/19] Parallel Checkout (part I) Matheus Tavares
` (15 preceding siblings ...)
2020-10-29 2:14 ` [PATCH v3 16/19] parallel-checkout: add tests for basic operations Matheus Tavares
@ 2020-10-29 2:14 ` Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 18/19] parallel-checkout: add tests related to .gitattributes Matheus Tavares
` (4 subsequent siblings)
21 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-10-29 2:14 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
Add tests to confirm that path collisions are properly reported during a
clone operation using parallel-checkout.
Original-patch-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
t/lib-parallel-checkout.sh | 4 +-
t/t2081-parallel-checkout-collisions.sh | 98 +++++++++++++++++++++++++
2 files changed, 100 insertions(+), 2 deletions(-)
create mode 100755 t/t2081-parallel-checkout-collisions.sh
diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh
index 4dad9043fb..e62a433eb1 100644
--- a/t/lib-parallel-checkout.sh
+++ b/t/lib-parallel-checkout.sh
@@ -18,7 +18,7 @@ git_pc()
-c checkout.workers=$workers \
-c checkout.thresholdForParallelism=$threshold \
-c advice.detachedHead=0 \
- "$@" &&
+ "$@" 2>&8 &&
# Check that the expected number of workers has been used. Note that it
# can be different from the requested number in two cases: when the
@@ -28,7 +28,7 @@ git_pc()
local workers_in_trace=$(grep "child_start\[..*\] git checkout--helper" trace | wc -l) &&
test $workers_in_trace -eq $expected_workers &&
rm -f trace
-}
+} 8>&2 2>&4
# Verify that both the working tree and the index were created correctly
verify_checkout()
diff --git a/t/t2081-parallel-checkout-collisions.sh b/t/t2081-parallel-checkout-collisions.sh
new file mode 100755
index 0000000000..5cab2dcd2c
--- /dev/null
+++ b/t/t2081-parallel-checkout-collisions.sh
@@ -0,0 +1,98 @@
+#!/bin/sh
+
+test_description='parallel-checkout collisions
+
+When there are path collisions during a clone, Git should report a warning
+listing all of the colliding entries. The sequential code detects a collision
+by calling lstat() before trying to open(O_CREAT) the file. Then, to find the
+colliding pair of an item k, it searches cache_entry[0, k-1].
+
+This is not sufficient in parallel checkout since:
+
+- A colliding file may be created between the lstat() and open() calls;
+- A colliding entry might appear in the second half of the cache_entry array.
+
+The tests in this file make sure that the collision detection code is extended
+for parallel checkout.
+'
+
+. ./test-lib.sh
+. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
+
+TEST_ROOT="$PWD"
+
+test_expect_success CASE_INSENSITIVE_FS 'setup' '
+ file_x_hex=$(git hash-object -w --stdin </dev/null) &&
+ file_x_oct=$(echo $file_x_hex | hex2oct) &&
+
+ attr_hex=$(echo "file_x filter=logger" | git hash-object -w --stdin) &&
+ attr_oct=$(echo $attr_hex | hex2oct) &&
+
+ printf "100644 FILE_X\0${file_x_oct}" >tree &&
+ printf "100644 FILE_x\0${file_x_oct}" >>tree &&
+ printf "100644 file_X\0${file_x_oct}" >>tree &&
+ printf "100644 file_x\0${file_x_oct}" >>tree &&
+ printf "100644 .gitattributes\0${attr_oct}" >>tree &&
+
+ tree_hex=$(git hash-object -w -t tree --stdin <tree) &&
+ commit_hex=$(git commit-tree -m collisions $tree_hex) &&
+ git update-ref refs/heads/collisions $commit_hex &&
+
+ write_script "$TEST_ROOT"/logger_script <<-\EOF
+ echo "$@" >>filter.log
+ EOF
+'
+
+for mode in parallel sequential-fallback
+do
+
+ case $mode in
+ parallel) workers=2 threshold=0 expected_workers=2 ;;
+ sequential-fallback) workers=2 threshold=100 expected_workers=0 ;;
+ esac
+
+ test_expect_success CASE_INSENSITIVE_FS "collision detection on $mode clone" '
+ git_pc $workers $threshold $expected_workers \
+ clone --branch=collisions . $mode 2>$mode.stderr &&
+
+ grep FILE_X $mode.stderr &&
+ grep FILE_x $mode.stderr &&
+ grep file_X $mode.stderr &&
+ grep file_x $mode.stderr &&
+ test_i18ngrep "the following paths have collided" $mode.stderr
+ '
+
+ # The following test ensures that the collision detection code is
+ # correctly looking for colliding peers in the second half of the
+ # cache_entry array. This is done by defining a smudge command for the
+ # *last* array entry, which makes it non-eligible for parallel-checkout.
+ # The last entry is then checked out *before* any worker is spawned,
+ # making it succeed and the workers' entries collide.
+ #
+ # Note: this test doesn't work on Windows because, on that system,
+ # collision detection uses strcmp() when core.ignoreCase=false. And we
+ # have to set core.ignoreCase=false so that only 'file_x' matches the
+ # pattern of the filter attribute. But it works on OSX, where collision
+ # detection uses inode.
+ #
+ test_expect_success CASE_INSENSITIVE_FS,!MINGW,!CYGWIN "collision detection on $mode clone w/ filter" '
+ git_pc $workers $threshold $expected_workers \
+ -c core.ignoreCase=false \
+ -c filter.logger.smudge="\"$TEST_ROOT/logger_script\" %f" \
+ clone --branch=collisions . ${mode}_with_filter \
+ 2>${mode}_with_filter.stderr &&
+
+ grep FILE_X ${mode}_with_filter.stderr &&
+ grep FILE_x ${mode}_with_filter.stderr &&
+ grep file_X ${mode}_with_filter.stderr &&
+ grep file_x ${mode}_with_filter.stderr &&
+ test_i18ngrep "the following paths have collided" ${mode}_with_filter.stderr &&
+
+ # Make sure only "file_x" was filtered
+ test_path_is_file ${mode}_with_filter/filter.log &&
+ echo file_x >expected.filter.log &&
+ test_cmp ${mode}_with_filter/filter.log expected.filter.log
+ '
+done
+
+test_done
--
2.28.0
^ permalink raw reply related [flat|nested] 154+ messages in thread
* [PATCH v3 18/19] parallel-checkout: add tests related to .gitattributes
2020-10-29 2:14 ` [PATCH v3 00/19] Parallel Checkout (part I) Matheus Tavares
` (16 preceding siblings ...)
2020-10-29 2:14 ` [PATCH v3 17/19] parallel-checkout: add tests related to clone collisions Matheus Tavares
@ 2020-10-29 2:14 ` Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 19/19] ci: run test round with parallel-checkout enabled Matheus Tavares
` (3 subsequent siblings)
21 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-10-29 2:14 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
Add tests to confirm that `struct conv_attrs` data is correctly passed
from the main process to the workers, and that they properly smudge
files before writing to the working tree. Also check that
non-parallel-eligible entries, such as regular files that require
external filters, are correctly smudged and written when
parallel-checkout is enabled.
Note: to avoid repeating code, some helper functions are extracted from
t0028 into a common lib file.
Original-patch-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
t/lib-encoding.sh | 25 ++++
t/t0028-working-tree-encoding.sh | 25 +---
t/t2082-parallel-checkout-attributes.sh | 174 ++++++++++++++++++++++++
3 files changed, 200 insertions(+), 24 deletions(-)
create mode 100644 t/lib-encoding.sh
create mode 100755 t/t2082-parallel-checkout-attributes.sh
diff --git a/t/lib-encoding.sh b/t/lib-encoding.sh
new file mode 100644
index 0000000000..c52ffbbed5
--- /dev/null
+++ b/t/lib-encoding.sh
@@ -0,0 +1,25 @@
+# Encoding helpers used by t0028 and t2082
+
+test_lazy_prereq NO_UTF16_BOM '
+ test $(printf abc | iconv -f UTF-8 -t UTF-16 | wc -c) = 6
+'
+
+test_lazy_prereq NO_UTF32_BOM '
+ test $(printf abc | iconv -f UTF-8 -t UTF-32 | wc -c) = 12
+'
+
+write_utf16 () {
+ if test_have_prereq NO_UTF16_BOM
+ then
+ printf '\376\377'
+ fi &&
+ iconv -f UTF-8 -t UTF-16
+}
+
+write_utf32 () {
+ if test_have_prereq NO_UTF32_BOM
+ then
+ printf '\0\0\376\377'
+ fi &&
+ iconv -f UTF-8 -t UTF-32
+}
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index bfc4fb9af5..4fffc3a639 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -3,33 +3,10 @@
test_description='working-tree-encoding conversion via gitattributes'
. ./test-lib.sh
+. "$TEST_DIRECTORY/lib-encoding.sh"
GIT_TRACE_WORKING_TREE_ENCODING=1 && export GIT_TRACE_WORKING_TREE_ENCODING
-test_lazy_prereq NO_UTF16_BOM '
- test $(printf abc | iconv -f UTF-8 -t UTF-16 | wc -c) = 6
-'
-
-test_lazy_prereq NO_UTF32_BOM '
- test $(printf abc | iconv -f UTF-8 -t UTF-32 | wc -c) = 12
-'
-
-write_utf16 () {
- if test_have_prereq NO_UTF16_BOM
- then
- printf '\376\377'
- fi &&
- iconv -f UTF-8 -t UTF-16
-}
-
-write_utf32 () {
- if test_have_prereq NO_UTF32_BOM
- then
- printf '\0\0\376\377'
- fi &&
- iconv -f UTF-8 -t UTF-32
-}
-
test_expect_success 'setup test files' '
git config core.eol lf &&
diff --git a/t/t2082-parallel-checkout-attributes.sh b/t/t2082-parallel-checkout-attributes.sh
new file mode 100755
index 0000000000..6800574588
--- /dev/null
+++ b/t/t2082-parallel-checkout-attributes.sh
@@ -0,0 +1,174 @@
+#!/bin/sh
+
+test_description='parallel-checkout: attributes
+
+Verify that parallel-checkout correctly creates files that require
+conversions, as specified in .gitattributes. The main point here is
+to check that the conv_attr data is correctly sent to the workers
+and that it contains sufficient information to smudge files
+properly (without access to the index or attribute stack).
+'
+
+TEST_NO_CREATE_REPO=1
+. ./test-lib.sh
+. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
+. "$TEST_DIRECTORY/lib-encoding.sh"
+
+test_expect_success 'parallel-checkout with ident' '
+ git init ident &&
+ (
+ cd ident &&
+ echo "A ident" >.gitattributes &&
+ echo "\$Id\$" >A &&
+ echo "\$Id\$" >B &&
+ git add -A &&
+ git commit -m id &&
+
+ rm A B &&
+ git_pc 2 0 2 reset --hard &&
+ hexsz=$(test_oid hexsz) &&
+ grep -E "\\\$Id: [0-9a-f]{$hexsz} \\\$" A &&
+ grep "\\\$Id\\\$" B
+ )
+'
+
+test_expect_success 'parallel-checkout with re-encoding' '
+ git init encoding &&
+ (
+ cd encoding &&
+ echo text >utf8-text &&
+ cat utf8-text | write_utf16 >utf16-text &&
+
+ echo "A working-tree-encoding=UTF-16" >.gitattributes &&
+ cp utf16-text A &&
+ cp utf16-text B &&
+ git add A B .gitattributes &&
+ git commit -m encoding &&
+
+ # Check that A (and only A) is stored in UTF-8
+ git cat-file -p :A >A.internal &&
+ test_cmp_bin utf8-text A.internal &&
+ git cat-file -p :B >B.internal &&
+ test_cmp_bin utf16-text B.internal &&
+
+ # Check that A is re-encoded during checkout
+ rm A B &&
+ git_pc 2 0 2 checkout A B &&
+ test_cmp_bin utf16-text A
+ )
+'
+
+test_expect_success 'parallel-checkout with eol conversions' '
+ git init eol &&
+ (
+ cd eol &&
+ git config core.autocrlf false &&
+ printf "multi\r\nline\r\ntext" >crlf-text &&
+ printf "multi\nline\ntext" >lf-text &&
+
+ echo "A text eol=crlf" >.gitattributes &&
+ echo "B -text" >>.gitattributes &&
+ cp crlf-text A &&
+ cp crlf-text B &&
+ git add A B .gitattributes &&
+ git commit -m eol &&
+
+ # Check that A (and only A) is stored with LF format
+ git cat-file -p :A >A.internal &&
+ test_cmp_bin lf-text A.internal &&
+ git cat-file -p :B >B.internal &&
+ test_cmp_bin crlf-text B.internal &&
+
+ # Check that A is converted to CRLF during checkout
+ rm A B &&
+ git_pc 2 0 2 checkout A B &&
+ test_cmp_bin crlf-text A
+ )
+'
+
+test_cmp_str()
+{
+ echo "$1" >tmp &&
+ test_cmp tmp "$2"
+}
+
+# Entries that require an external filter are not eligible for parallel
+# checkout. Check that both the parallel-eligible and non-eligible entries are
+# properly written in a single checkout process.
+#
+test_expect_success 'parallel-checkout and external filter' '
+ git init filter &&
+ (
+ cd filter &&
+ git config filter.x2y.clean "tr x y" &&
+ git config filter.x2y.smudge "tr y x" &&
+ git config filter.x2y.required true &&
+
+ echo "A filter=x2y" >.gitattributes &&
+ echo x >A &&
+ echo x >B &&
+ echo x >C &&
+ git add -A &&
+ git commit -m filter &&
+
+ # Check that A (and only A) was cleaned
+ git cat-file -p :A >A.internal &&
+ test_cmp_str y A.internal &&
+ git cat-file -p :B >B.internal &&
+ test_cmp_str x B.internal &&
+ git cat-file -p :C >C.internal &&
+ test_cmp_str x C.internal &&
+
+ rm A B C *.internal &&
+ git_pc 2 0 2 checkout A B C &&
+ test_cmp_str x A &&
+ test_cmp_str x B &&
+ test_cmp_str x C
+ )
+'
+
+# The delayed queue is independent from the parallel queue, and they should be
+# able to work together in the same checkout process.
+#
+test_expect_success PERL 'parallel-checkout and delayed checkout' '
+ write_script rot13-filter.pl "$PERL_PATH" \
+ <"$TEST_DIRECTORY"/t0021/rot13-filter.pl &&
+ test_config_global filter.delay.process \
+ "\"$(pwd)/rot13-filter.pl\" \"$(pwd)/delayed.log\" clean smudge delay" &&
+ test_config_global filter.delay.required true &&
+
+ echo "a b c" >delay-content &&
+ echo "n o p" >delay-rot13-content &&
+
+ git init delayed &&
+ (
+ cd delayed &&
+ echo "*.a filter=delay" >.gitattributes &&
+ cp ../delay-content test-delay10.a &&
+ cp ../delay-content test-delay11.a &&
+ echo parallel >parallel1.b &&
+ echo parallel >parallel2.b &&
+ git add -A &&
+ git commit -m delayed &&
+
+ # Check that the stored data was cleaned
+ git cat-file -p :test-delay10.a > delay10.internal &&
+ test_cmp delay10.internal ../delay-rot13-content &&
+ git cat-file -p :test-delay11.a > delay11.internal &&
+ test_cmp delay11.internal ../delay-rot13-content &&
+ rm *.internal &&
+
+ rm *.a *.b
+ ) &&
+
+ git_pc 2 0 2 -C delayed checkout -f &&
+ verify_checkout delayed &&
+
+ # Check that the *.a files got to the delay queue and were filtered
+ grep "smudge test-delay10.a .* \[DELAYED\]" delayed.log &&
+ grep "smudge test-delay11.a .* \[DELAYED\]" delayed.log &&
+ test_cmp delayed/test-delay10.a delay-content &&
+ test_cmp delayed/test-delay11.a delay-content
+'
+
+test_done
--
2.28.0
^ permalink raw reply related [flat|nested] 154+ messages in thread
* [PATCH v3 19/19] ci: run test round with parallel-checkout enabled
2020-10-29 2:14 ` [PATCH v3 00/19] Parallel Checkout (part I) Matheus Tavares
` (17 preceding siblings ...)
2020-10-29 2:14 ` [PATCH v3 18/19] parallel-checkout: add tests related to .gitattributes Matheus Tavares
@ 2020-10-29 2:14 ` Matheus Tavares
2020-10-29 19:48 ` [PATCH v3 00/19] Parallel Checkout (part I) Junio C Hamano
` (2 subsequent siblings)
21 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-10-29 2:14 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
We already have tests for the basic parallel-checkout operations. But
this code can also run in other commands, such as git-read-tree and
git-sparse-checkout, which are currently not tested with multiple
workers. To promote wider test coverage without duplicating tests:
1. Add the GIT_TEST_CHECKOUT_WORKERS environment variable, to optionally
force parallel-checkout execution during the whole test suite.
2. Include this variable in the second test round of the linux-gcc job
of our ci scripts. This round runs `make test` again with some
optional GIT_TEST_* variables enabled, so there is no additional
overhead in exercising the parallel-checkout code here.
Note: the specific parallel-checkout tests t208* cannot be used in
combination with GIT_TEST_CHECKOUT_WORKERS as they need to set and check
the number of workers by themselves. So skip those tests when this flag
is set.
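The override logic added to get_parallel_checkout_configs() can be
sketched in shell (a loose approximation only; `resolve_workers` and the
CPU-count argument are illustrative stand-ins for the C code and
online_cpus(), and only non-negative input is handled):

```shell
# Stand-in for the env override: an explicit positive value wins, zero
# falls back to the CPU count (online_cpus() in the C code), and
# anything non-numeric is rejected, mirroring the strtol_i() check.
resolve_workers () {
	env_workers=$1 num_cpus=$2
	case "$env_workers" in
	''|*[!0-9]*)
		echo "invalid value: '$env_workers'" >&2
		return 1
		;;
	esac
	if test "$env_workers" -lt 1
	then
		echo "$num_cpus"
	else
		echo "$env_workers"
	fi
}

resolve_workers 4 8   # explicit value is honored
resolve_workers 0 8   # falls back to the CPU count
```

In the real patch, the fallback case also sets the parallelism
threshold to 0 so that even tiny checkouts exercise the workers.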
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
ci/run-build-and-tests.sh | 1 +
parallel-checkout.c | 14 ++++++++++++++
t/README | 4 ++++
t/lib-parallel-checkout.sh | 6 ++++++
4 files changed, 25 insertions(+)
diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index 6c27b886b8..aa32ddc361 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -22,6 +22,7 @@ linux-gcc)
export GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1
export GIT_TEST_MULTI_PACK_INDEX=1
export GIT_TEST_ADD_I_USE_BUILTIN=1
+ export GIT_TEST_CHECKOUT_WORKERS=2
make test
;;
linux-clang)
diff --git a/parallel-checkout.c b/parallel-checkout.c
index c5c449d224..7482447f2d 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -32,6 +32,20 @@ enum pc_status parallel_checkout_status(void)
void get_parallel_checkout_configs(int *num_workers, int *threshold)
{
+ char *env_workers = getenv("GIT_TEST_CHECKOUT_WORKERS");
+
+ if (env_workers && *env_workers) {
+ if (strtol_i(env_workers, 10, num_workers)) {
+ die("invalid value for GIT_TEST_CHECKOUT_WORKERS: '%s'",
+ env_workers);
+ }
+ if (*num_workers < 1)
+ *num_workers = online_cpus();
+
+ *threshold = 0;
+ return;
+ }
+
if (git_config_get_int("checkout.workers", num_workers))
*num_workers = 1;
else if (*num_workers < 1)
diff --git a/t/README b/t/README
index 2adaf7c2d2..cd1b15c55a 100644
--- a/t/README
+++ b/t/README
@@ -425,6 +425,10 @@ GIT_TEST_DEFAULT_HASH=<hash-algo> specifies which hash algorithm to
use in the test scripts. Recognized values for <hash-algo> are "sha1"
and "sha256".
+GIT_TEST_CHECKOUT_WORKERS=<n> overrides the 'checkout.workers' setting
+to <n> and 'checkout.thresholdForParallelism' to 0, forcing the
+execution of the parallel-checkout code.
+
Naming Tests
------------
diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh
index e62a433eb1..7b454da375 100644
--- a/t/lib-parallel-checkout.sh
+++ b/t/lib-parallel-checkout.sh
@@ -1,5 +1,11 @@
# Helpers for t208* tests
+if ! test -z "$GIT_TEST_CHECKOUT_WORKERS"
+then
+ skip_all="skipping test, GIT_TEST_CHECKOUT_WORKERS is set"
+ test_done
+fi
+
# Runs `git -c checkout.workers=$1 -c checkout.thesholdForParallelism=$2 ${@:4}`
# and checks that the number of workers spawned is equal to $3.
#
--
2.28.0
* Re: [PATCH v3 00/19] Parallel Checkout (part I)
2020-10-29 2:14 ` [PATCH v3 00/19] Parallel Checkout (part I) Matheus Tavares
` (18 preceding siblings ...)
2020-10-29 2:14 ` [PATCH v3 19/19] ci: run test round with parallel-checkout enabled Matheus Tavares
@ 2020-10-29 19:48 ` Junio C Hamano
2020-10-30 15:58 ` Jeff Hostetler
2020-11-04 20:32 ` [PATCH v4 " Matheus Tavares
21 siblings, 0 replies; 154+ messages in thread
From: Junio C Hamano @ 2020-10-29 19:48 UTC (permalink / raw)
To: Matheus Tavares; +Cc: git, git, chriscool, peff, newren, jrnieder, martin.agren
Matheus Tavares <matheus.bernardino@usp.br> writes:
> There were some semantic conflicts between this series and
> jk/checkout-index-errors, so I rebased my series on top of that.
That is sensible, as you'd want to be able to rely on the exit
status from the command while testing.
Will replace what has been queued.
* Re: [PATCH v3 00/19] Parallel Checkout (part I)
2020-10-29 2:14 ` [PATCH v3 00/19] Parallel Checkout (part I) Matheus Tavares
` (19 preceding siblings ...)
2020-10-29 19:48 ` [PATCH v3 00/19] Parallel Checkout (part I) Junio C Hamano
@ 2020-10-30 15:58 ` Jeff Hostetler
2020-11-04 20:32 ` [PATCH v4 " Matheus Tavares
21 siblings, 0 replies; 154+ messages in thread
From: Jeff Hostetler @ 2020-10-30 15:58 UTC (permalink / raw)
To: Matheus Tavares, git
Cc: gitster, chriscool, peff, newren, jrnieder, martin.agren
On 10/28/20 10:14 PM, Matheus Tavares wrote:
> ...
Looks good to me.
Thanks for pushing this forward.
Jeff
* [PATCH v4 00/19] Parallel Checkout (part I)
2020-10-29 2:14 ` [PATCH v3 00/19] Parallel Checkout (part I) Matheus Tavares
` (20 preceding siblings ...)
2020-10-30 15:58 ` Jeff Hostetler
@ 2020-11-04 20:32 ` Matheus Tavares
2020-11-04 20:33 ` [PATCH v4 01/19] convert: make convert_attrs() and convert structs public Matheus Tavares
` (19 more replies)
21 siblings, 20 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-11-04 20:32 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
Changes since v3:
Patch 1:
- Renamed 'enum crlf_action' to 'enum convert_crlf_action', since it's
now public and the latter is better suited to the global namespace.
Patch 2:
- Implemented the regular [async_]convert_to_working_tree() functions
as thin wrappers around the new _ca() variants.
Patches 5 and 6:
- Properly added blank lines to separate declaration blocks in entry.h.
Patch 8:
- Used a `struct conv_attrs ca_buf` (together with the `ca` pointer)
to avoid the early return in checkout_entry() when
S_ISREG(ce->ce_mode). I think this makes the patch a little easier
to parse and also simplifies the next patch.
Patch 10:
- Removed explicit zero initialization of the static struct parallel_checkout.
- Removed the check_leading_dirs() function, which had a quite generic
name, and integrated the code into write_pc_item().
Note: for this change, I used the find_last_dir_sep() helper, which
is slightly slower since it doesn't take the path's length, and
thus has to iterate the whole string. Alternatively, we could
add an strbuf_find_last_dir_sep() variant which takes an strbuf and
starts the search from the end, to save some iterations per path.
But this snippet will be removed in part 2, so I thought it wouldn't
be worth adding a new helper now.
- Rephrased comment about the has_dirs_only_path() call in workers, for
better clarity.
- Changed all unnecessary uses of ++pre_increment in the series to
post_increment++ (patches 10 and 11).
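The leading-directory recheck mentioned for patch 10 can be
approximated in shell (a rough sketch only; `leading_dirs_ok` is a
made-up stand-in for the find_last_dir_sep() + has_dirs_only_path()
pair, not git's actual API):

```shell
# Before writing a file, confirm its parent path still consists of real
# directories; a colliding symlink checked out in the meantime would
# make the -d test fail, and the entry would be retried sequentially.
leading_dirs_ok () {
	dir=${1%/*}
	# No directory separator at all: nothing to verify.
	test "$dir" = "$1" || test -d "$dir"
}

mkdir -p repo/sub
ln -s /nonexistent repo/link
leading_dirs_ok repo/sub/file && echo "repo/sub/file: ok"
leading_dirs_ok repo/link/file || echo "repo/link/file: collided"
```

This mirrors why the check must be repeated in the workers: the
directories existed when the entry was enqueued, but a case-folding or
symlink collision may have replaced one before the write happens.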
Jeff Hostetler (4):
convert: make convert_attrs() and convert structs public
convert: add [async_]convert_to_working_tree_ca() variants
convert: add get_stream_filter_ca() variant
convert: add conv_attrs classification
Matheus Tavares (15):
entry: extract a header file for entry.c functions
entry: make fstat_output() and read_blob_entry() public
entry: extract cache_entry update from write_entry()
entry: move conv_attrs lookup up to checkout_entry()
entry: add checkout_entry_ca() which takes preloaded conv_attrs
unpack-trees: add basic support for parallel checkout
parallel-checkout: make it truly parallel
parallel-checkout: support progress displaying
make_transient_cache_entry(): optionally alloc from mem_pool
builtin/checkout.c: complete parallel checkout support
checkout-index: add parallel checkout support
parallel-checkout: add tests for basic operations
parallel-checkout: add tests related to clone collisions
parallel-checkout: add tests related to .gitattributes
ci: run test round with parallel-checkout enabled
.gitignore | 1 +
Documentation/config/checkout.txt | 21 +
Makefile | 2 +
apply.c | 1 +
builtin.h | 1 +
builtin/checkout--helper.c | 142 ++++++
builtin/checkout-index.c | 22 +-
builtin/checkout.c | 21 +-
builtin/difftool.c | 3 +-
cache.h | 34 +-
ci/run-build-and-tests.sh | 1 +
convert.c | 130 ++---
convert.h | 96 +++-
entry.c | 102 ++--
entry.h | 55 +++
git.c | 2 +
parallel-checkout.c | 632 ++++++++++++++++++++++++
parallel-checkout.h | 103 ++++
read-cache.c | 12 +-
t/README | 4 +
t/lib-encoding.sh | 25 +
t/lib-parallel-checkout.sh | 46 ++
t/t0028-working-tree-encoding.sh | 25 +-
t/t2080-parallel-checkout-basics.sh | 170 +++++++
t/t2081-parallel-checkout-collisions.sh | 98 ++++
t/t2082-parallel-checkout-attributes.sh | 174 +++++++
unpack-trees.c | 22 +-
27 files changed, 1766 insertions(+), 179 deletions(-)
create mode 100644 builtin/checkout--helper.c
create mode 100644 entry.h
create mode 100644 parallel-checkout.c
create mode 100644 parallel-checkout.h
create mode 100644 t/lib-encoding.sh
create mode 100644 t/lib-parallel-checkout.sh
create mode 100755 t/t2080-parallel-checkout-basics.sh
create mode 100755 t/t2081-parallel-checkout-collisions.sh
create mode 100755 t/t2082-parallel-checkout-attributes.sh
Range-diff against v3:
1: dfc3e0fd62 ! 1: 2726f6dc05 convert: make convert_attrs() and convert structs public
@@ Commit message
Move convert_attrs() declaration from convert.c to convert.h, together
with the conv_attrs struct and the crlf_action enum. This function and
the data structures will be used outside convert.c in the upcoming
- parallel checkout implementation.
+ parallel checkout implementation. Note that crlf_action is renamed to
+ convert_crlf_action, which is more appropriate for the global namespace.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
[matheus.bernardino: squash and reword msg]
@@ convert.c
struct text_stat {
/* NUL, CR, LF and CRLF counts */
unsigned nul, lonecr, lonelf, crlf;
+@@ convert.c: static int text_eol_is_crlf(void)
+ return 0;
+ }
+
+-static enum eol output_eol(enum crlf_action crlf_action)
++static enum eol output_eol(enum convert_crlf_action crlf_action)
+ {
+ switch (crlf_action) {
+ case CRLF_BINARY:
+@@ convert.c: static int has_crlf_in_index(const struct index_state *istate, const char *path)
+ }
+
+ static int will_convert_lf_to_crlf(struct text_stat *stats,
+- enum crlf_action crlf_action)
++ enum convert_crlf_action crlf_action)
+ {
+ if (output_eol(crlf_action) != EOL_CRLF)
+ return 0;
+@@ convert.c: static int encode_to_worktree(const char *path, const char *src, size_t src_len,
+ static int crlf_to_git(const struct index_state *istate,
+ const char *path, const char *src, size_t len,
+ struct strbuf *buf,
+- enum crlf_action crlf_action, int conv_flags)
++ enum convert_crlf_action crlf_action, int conv_flags)
+ {
+ struct text_stat stats;
+ char *dst;
+@@ convert.c: static int crlf_to_git(const struct index_state *istate,
+ return 1;
+ }
+
+-static int crlf_to_worktree(const char *src, size_t len,
+- struct strbuf *buf, enum crlf_action crlf_action)
++static int crlf_to_worktree(const char *src, size_t len, struct strbuf *buf,
++ enum convert_crlf_action crlf_action)
+ {
+ char *to_free = NULL;
+ struct text_stat stats;
+@@ convert.c: static const char *git_path_check_encoding(struct attr_check_item *check)
+ return value;
+ }
+
+-static enum crlf_action git_path_check_crlf(struct attr_check_item *check)
++static enum convert_crlf_action git_path_check_crlf(struct attr_check_item *check)
+ {
+ const char *value = check->value;
+
@@ convert.c: static int git_path_check_ident(struct attr_check_item *check)
return !!ATTR_TRUE(value);
}
@@ convert.c: static int git_path_check_ident(struct attr_check_item *check)
## convert.h ##
-@@ convert.h: enum eol {
- #endif
+@@ convert.h: struct checkout_metadata {
+ struct object_id blob;
};
-+enum crlf_action {
++enum convert_crlf_action {
+ CRLF_UNDEFINED,
+ CRLF_BINARY,
+ CRLF_TEXT,
@@ convert.h: enum eol {
+
+struct conv_attrs {
+ struct convert_driver *drv;
-+ enum crlf_action attr_action; /* What attr says */
-+ enum crlf_action crlf_action; /* When no attr is set, use core.autocrlf */
++ enum convert_crlf_action attr_action; /* What attr says */
++ enum convert_crlf_action crlf_action; /* When no attr is set, use core.autocrlf */
+ int ident;
+ const char *working_tree_encoding; /* Supported encoding or default encoding if NULL */
+};
+
- enum ce_delay_state {
- CE_NO_DELAY = 0,
- CE_CAN_DELAY = 1,
-@@ convert.h: void convert_to_git_filter_fd(const struct index_state *istate,
- int would_convert_to_git_filter_fd(const struct index_state *istate,
- const char *path);
-
+void convert_attrs(const struct index_state *istate,
+ struct conv_attrs *ca, const char *path);
+
- /*
- * Initialize the checkout metadata with the given values. Any argument may be
- * NULL if it is not applicable. The treeish should be a commit if that is
+ extern enum eol core_eol;
+ extern char *check_roundtrip_encoding;
+ const char *get_cached_convert_stats_ascii(const struct index_state *istate,
2: c5fbd1e16d ! 2: fc03417592 convert: add [async_]convert_to_working_tree_ca() variants
@@ convert.c: static int convert_to_working_tree_internal(const struct index_state
return ret | ret_filter;
}
-@@ convert.c: int async_convert_to_working_tree(const struct index_state *istate,
- const struct checkout_metadata *meta,
- void *dco)
- {
-- return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, dco);
-+ struct conv_attrs ca;
-+ convert_attrs(istate, &ca, path);
-+ return convert_to_working_tree_internal(&ca, path, src, len, dst, 0, meta, dco);
- }
- int convert_to_working_tree(const struct index_state *istate,
-@@ convert.c: int convert_to_working_tree(const struct index_state *istate,
- size_t len, struct strbuf *dst,
- const struct checkout_metadata *meta)
- {
-- return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, NULL);
-+ struct conv_attrs ca;
-+ convert_attrs(istate, &ca, path);
-+ return convert_to_working_tree_internal(&ca, path, src, len, dst, 0, meta, NULL);
-+}
-+
+-int async_convert_to_working_tree(const struct index_state *istate,
+- const char *path, const char *src,
+- size_t len, struct strbuf *dst,
+- const struct checkout_metadata *meta,
+- void *dco)
+int async_convert_to_working_tree_ca(const struct conv_attrs *ca,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta,
+ void *dco)
-+{
+ {
+- return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, dco);
+ return convert_to_working_tree_internal(ca, path, src, len, dst, 0, meta, dco);
-+}
-+
+ }
+
+-int convert_to_working_tree(const struct index_state *istate,
+- const char *path, const char *src,
+- size_t len, struct strbuf *dst,
+- const struct checkout_metadata *meta)
+int convert_to_working_tree_ca(const struct conv_attrs *ca,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta)
-+{
+ {
+- return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, NULL);
+ return convert_to_working_tree_internal(ca, path, src, len, dst, 0, meta, NULL);
}
@@ convert.c: int convert_to_working_tree(const struct index_state *istate,
len = dst->len;
## convert.h ##
-@@ convert.h: int convert_to_working_tree(const struct index_state *istate,
- const char *path, const char *src,
- size_t len, struct strbuf *dst,
- const struct checkout_metadata *meta);
+@@ convert.h: const char *get_convert_attr_ascii(const struct index_state *istate,
+ int convert_to_git(const struct index_state *istate,
+ const char *path, const char *src, size_t len,
+ struct strbuf *dst, int conv_flags);
+-int convert_to_working_tree(const struct index_state *istate,
+- const char *path, const char *src,
+- size_t len, struct strbuf *dst,
+- const struct checkout_metadata *meta);
+-int async_convert_to_working_tree(const struct index_state *istate,
+- const char *path, const char *src,
+- size_t len, struct strbuf *dst,
+- const struct checkout_metadata *meta,
+- void *dco);
+int convert_to_working_tree_ca(const struct conv_attrs *ca,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta);
- int async_convert_to_working_tree(const struct index_state *istate,
- const char *path, const char *src,
- size_t len, struct strbuf *dst,
- const struct checkout_metadata *meta,
- void *dco);
+int async_convert_to_working_tree_ca(const struct conv_attrs *ca,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta,
+ void *dco);
++static inline int convert_to_working_tree(const struct index_state *istate,
++ const char *path, const char *src,
++ size_t len, struct strbuf *dst,
++ const struct checkout_metadata *meta)
++{
++ struct conv_attrs ca;
++ convert_attrs(istate, &ca, path);
++ return convert_to_working_tree_ca(&ca, path, src, len, dst, meta);
++}
++static inline int async_convert_to_working_tree(const struct index_state *istate,
++ const char *path, const char *src,
++ size_t len, struct strbuf *dst,
++ const struct checkout_metadata *meta,
++ void *dco)
++{
++ struct conv_attrs ca;
++ convert_attrs(istate, &ca, path);
++ return async_convert_to_working_tree_ca(&ca, path, src, len, dst, meta, dco);
++}
int async_query_available_blobs(const char *cmd,
struct string_list *available_paths);
int renormalize_buffer(const struct index_state *istate,
3: c77b16f694 = 3: 8ce20f1031 convert: add get_stream_filter_ca() variant
4: 18c3f4247e = 4: aa1eb461f4 convert: add conv_attrs classification
5: 2caa2c4345 ! 5: cb3dea224b entry: extract a header file for entry.c functions
@@ entry.h (new)
+#define CHECKOUT_INIT { NULL, "" }
+
+#define TEMPORARY_FILENAME_LENGTH 25
-+
+/*
+ * Write the contents from ce out to the working tree.
+ *
@@ entry.h (new)
+ */
+int checkout_entry(struct cache_entry *ce, const struct checkout *state,
+ char *topath, int *nr_checkouts);
++
+void enable_delayed_checkout(struct checkout *state);
+int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
++
+/*
+ * Unlink the last component and schedule the leading directories for
+ * removal, such that empty directories get removed.
6: bfa52df9e2 ! 6: 46ed6274d7 entry: make fstat_output() and read_blob_entry() public
@@ entry.c: static int write_entry(struct cache_entry *ce,
## entry.h ##
@@ entry.h: int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
- * removal, such that empty directories get removed.
*/
void unlink_entry(const struct cache_entry *ce);
+
+void *read_blob_entry(const struct cache_entry *ce, unsigned long *size);
+int fstat_checkout_output(int fd, const struct checkout *state, struct stat *st);
-
++
#endif /* ENTRY_H */
7: 91ef17f533 ! 7: a0479d02ff entry: extract cache_entry update from write_entry()
@@ entry.c: static int write_entry(struct cache_entry *ce,
return 0;
## entry.h ##
-@@ entry.h: int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
- void unlink_entry(const struct cache_entry *ce);
+@@ entry.h: void unlink_entry(const struct cache_entry *ce);
+
void *read_blob_entry(const struct cache_entry *ce, unsigned long *size);
int fstat_checkout_output(int fd, const struct checkout *state, struct stat *st);
+void update_ce_after_write(const struct checkout *state, struct cache_entry *ce,
8: 81e03baab1 ! 8: 5c993cc27f entry: move conv_attrs lookup up to checkout_entry()
@@ entry.c: int checkout_entry(struct cache_entry *ce, const struct checkout *state
{
static struct strbuf path = STRBUF_INIT;
struct stat st;
-+ struct conv_attrs ca;
++ struct conv_attrs ca_buf, *ca = NULL;
if (ce->ce_flags & CE_WT_REMOVE) {
if (topath)
@@ entry.c: int checkout_entry(struct cache_entry *ce, const struct checkout *state
- return write_entry(ce, topath, state, 1);
+ if (topath) {
+ if (S_ISREG(ce->ce_mode)) {
-+ convert_attrs(state->istate, &ca, ce->name);
-+ return write_entry(ce, topath, &ca, state, 1);
++ convert_attrs(state->istate, &ca_buf, ce->name);
++ ca = &ca_buf;
+ }
-+ return write_entry(ce, topath, NULL, state, 1);
++ return write_entry(ce, topath, ca, state, 1);
+ }
strbuf_reset(&path);
@@ entry.c: int checkout_entry(struct cache_entry *ce, const struct checkout *state
- return write_entry(ce, path.buf, state, 0);
+
+ if (S_ISREG(ce->ce_mode)) {
-+ convert_attrs(state->istate, &ca, ce->name);
-+ return write_entry(ce, path.buf, &ca, state, 0);
++ convert_attrs(state->istate, &ca_buf, ce->name);
++ ca = &ca_buf;
+ }
+
+ return write_entry(ce, path.buf, NULL, state, 0);
9: e1b886f823 ! 9: aa635bda21 entry: add checkout_entry_ca() which takes preloaded conv_attrs
@@ entry.c: static void mark_colliding_entries(const struct checkout *state,
{
static struct strbuf path = STRBUF_INIT;
struct stat st;
-- struct conv_attrs ca;
+- struct conv_attrs ca_buf, *ca = NULL;
+ struct conv_attrs ca_buf;
if (ce->ce_flags & CE_WT_REMOVE) {
@@ entry.c: int checkout_entry(struct cache_entry *ce, const struct checkout *state
if (topath) {
- if (S_ISREG(ce->ce_mode)) {
-- convert_attrs(state->istate, &ca, ce->name);
-- return write_entry(ce, topath, &ca, state, 1);
+ if (S_ISREG(ce->ce_mode) && !ca) {
-+ convert_attrs(state->istate, &ca_buf, ce->name);
-+ ca = &ca_buf;
+ convert_attrs(state->istate, &ca_buf, ce->name);
+ ca = &ca_buf;
}
-- return write_entry(ce, topath, NULL, state, 1);
-+ return write_entry(ce, topath, ca, state, 1);
- }
-
- strbuf_reset(&path);
@@ entry.c: int checkout_entry(struct cache_entry *ce, const struct checkout *state,
if (nr_checkouts)
(*nr_checkouts)++;
- if (S_ISREG(ce->ce_mode)) {
-- convert_attrs(state->istate, &ca, ce->name);
-- return write_entry(ce, path.buf, &ca, state, 0);
+ if (S_ISREG(ce->ce_mode) && !ca) {
-+ convert_attrs(state->istate, &ca_buf, ce->name);
-+ ca = &ca_buf;
+ convert_attrs(state->istate, &ca_buf, ce->name);
+ ca = &ca_buf;
}
- return write_entry(ce, path.buf, NULL, state, 0);
@@ entry.h: struct checkout {
+int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca,
+ const struct checkout *state, char *topath,
+ int *nr_checkouts);
-+
+
void enable_delayed_checkout(struct checkout *state);
int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
- /*
10: 2bdc13664e ! 10: bc8447cd9c unpack-trees: add basic support for parallel checkout
@@ parallel-checkout.c (new)
+ size_t nr, alloc;
+};
+
-+static struct parallel_checkout parallel_checkout = { 0 };
++static struct parallel_checkout parallel_checkout;
+
+enum pc_status parallel_checkout_status(void)
+{
@@ parallel-checkout.c (new)
+ * stat() data, so that they can be found by mark_colliding_entries(),
+ * in the next loop, when necessary.
+ */
-+ for (i = 0; i < parallel_checkout.nr; ++i) {
++ for (i = 0; i < parallel_checkout.nr; i++) {
+ struct parallel_checkout_item *pc_item = ¶llel_checkout.items[i];
+ if (pc_item->status == PC_ITEM_WRITTEN)
+ update_ce_after_write(state, pc_item->ce, &pc_item->st);
+ }
+
-+ for (i = 0; i < parallel_checkout.nr; ++i) {
++ for (i = 0; i < parallel_checkout.nr; i++) {
+ struct parallel_checkout_item *pc_item = ¶llel_checkout.items[i];
+
+ switch(pc_item->status) {
@@ parallel-checkout.c (new)
+ return ret;
+}
+
-+static int check_leading_dirs(const char *path, int len, int prefix_len)
-+{
-+ const char *slash = path + len;
-+
-+ while (slash > path && *slash != '/')
-+ slash--;
-+
-+ return has_dirs_only_path(path, slash - path, prefix_len);
-+}
-+
+static void write_pc_item(struct parallel_checkout_item *pc_item,
+ struct checkout *state)
+{
+ unsigned int mode = (pc_item->ce->ce_mode & 0100) ? 0777 : 0666;
+ int fd = -1, fstat_done = 0;
+ struct strbuf path = STRBUF_INIT;
++ const char *dir_sep;
+
+ strbuf_add(&path, state->base_dir, state->base_dir_len);
+ strbuf_add(&path, pc_item->ce->name, pc_item->ce->ce_namelen);
+
++ dir_sep = find_last_dir_sep(path.buf);
++
+ /*
-+ * At this point, leading dirs should have already been created. But if
-+ * a symlink being checked out has collided with one of the dirs, due to
-+ * file system folding rules, it's possible that the dirs are no longer
-+ * present. So we have to check again, and report any path collisions.
++ * The leading dirs should have been already created by now. But, in
++ * case of path collisions, one of the dirs could have been replaced by
++ * a symlink (checked out after we enqueued this entry for parallel
++ * checkout). Thus, we must check the leading dirs again.
+ */
-+ if (!check_leading_dirs(path.buf, path.len, state->base_dir_len)) {
++ if (dir_sep && !has_dirs_only_path(path.buf, dir_sep - path.buf,
++ state->base_dir_len)) {
+ pc_item->status = PC_ITEM_COLLIDED;
+ goto out;
+ }
@@ parallel-checkout.c (new)
+ * Errors which probably represent a path collision.
+ * Suppress the error message and mark the item to be
+ * retried later, sequentially. ENOTDIR and ENOENT are
-+ * also interesting, but check_leading_dirs() should
-+ * have already caught these cases.
++ * also interesting, but the above has_dirs_only_path()
++ * call should have already caught these cases.
+ */
+ pc_item->status = PC_ITEM_COLLIDED;
+ } else {
@@ parallel-checkout.c (new)
+{
+ size_t i;
+
-+ for (i = 0; i < parallel_checkout.nr; ++i)
++ for (i = 0; i < parallel_checkout.nr; i++)
+ write_pc_item(¶llel_checkout.items[i], state);
+}
+
11: 096e543fd2 ! 11: 815137685a parallel-checkout: make it truly parallel
@@ builtin/checkout--helper.c (new)
+ packet_to_pc_item(line, len, &items[nr++]);
+ }
+
-+ for (i = 0; i < nr; ++i) {
++ for (i = 0; i < nr; i++) {
+ struct parallel_checkout_item *pc_item = &items[i];
+ write_pc_item(pc_item, state);
+ report_result(pc_item);
@@ parallel-checkout.c: static int write_pc_item_to_fd(struct parallel_checkout_ite
*/
ret = convert_to_working_tree_ca(&pc_item->ca, pc_item->ce->name,
new_blob, size, &buf, NULL);
-@@ parallel-checkout.c: static int check_leading_dirs(const char *path, int len, int prefix_len)
- return has_dirs_only_path(path, slash - path, prefix_len);
+@@ parallel-checkout.c: static int close_and_clear(int *fd)
+ return ret;
}
-static void write_pc_item(struct parallel_checkout_item *pc_item,
@@ parallel-checkout.c: static void write_pc_item(struct parallel_checkout_item *pc
+static void send_batch(int fd, size_t start, size_t nr)
+{
+ size_t i;
-+ for (i = 0; i < nr; ++i)
++ for (i = 0; i < nr; i++)
+ send_one_item(fd, ¶llel_checkout.items[start + i]);
+ packet_flush(fd);
+}
@@ parallel-checkout.c: static void write_pc_item(struct parallel_checkout_item *pc
+
+ ALLOC_ARRAY(workers, num_workers);
+
-+ for (i = 0; i < num_workers; ++i) {
++ for (i = 0; i < num_workers; i++) {
+ struct child_process *cp = &workers[i].cp;
+
+ child_process_init(cp);
@@ parallel-checkout.c: static void write_pc_item(struct parallel_checkout_item *pc
+ base_batch_size = parallel_checkout.nr / num_workers;
+ workers_with_one_extra_item = parallel_checkout.nr % num_workers;
+
-+ for (i = 0; i < num_workers; ++i) {
++ for (i = 0; i < num_workers; i++) {
+ struct pc_worker *worker = &workers[i];
+ size_t batch_size = base_batch_size;
+
@@ parallel-checkout.c: static void write_pc_item(struct parallel_checkout_item *pc
+ * Close pipes before calling finish_command() to let the workers
+ * exit asynchronously and avoid spending extra time on wait().
+ */
-+ for (i = 0; i < num_workers; ++i) {
++ for (i = 0; i < num_workers; i++) {
+ struct child_process *cp = &workers[i].cp;
+ if (cp->in >= 0)
+ close(cp->in);
@@ parallel-checkout.c: static void write_pc_item(struct parallel_checkout_item *pc
+ close(cp->out);
+ }
+
-+ for (i = 0; i < num_workers; ++i) {
++ for (i = 0; i < num_workers; i++) {
+ if (finish_command(&workers[i].cp))
+ error(_("checkout worker %d finished with error"), i);
+ }
@@ parallel-checkout.c: static void write_pc_item(struct parallel_checkout_item *pc
+ struct pollfd *pfds;
+
+ CALLOC_ARRAY(pfds, num_workers);
-+ for (i = 0; i < num_workers; ++i) {
++ for (i = 0; i < num_workers; i++) {
+ pfds[i].fd = workers[i].cp.out;
+ pfds[i].events = POLLIN;
+ }
@@ parallel-checkout.c: static void write_pc_item(struct parallel_checkout_item *pc
+ die_errno("failed to poll checkout workers");
+ }
+
-+ for (i = 0; i < num_workers && nr > 0; ++i) {
++ for (i = 0; i < num_workers && nr > 0; i++) {
+ struct pc_worker *worker = &workers[i];
+ struct pollfd *pfd = &pfds[i];
+
@@ parallel-checkout.h: void init_parallel_checkout(void);
+ size_t id;
+ struct object_id oid;
+ unsigned int ce_mode;
-+ enum crlf_action crlf_action;
++ enum convert_crlf_action crlf_action;
+ int ident;
+ size_t working_tree_encoding_len;
+ size_t name_len;
12: 9cfeb4821c ! 12: 2b42621582 parallel-checkout: support progress displaying
@@ parallel-checkout.c: struct parallel_checkout {
+ unsigned int *progress_cnt;
};
- static struct parallel_checkout parallel_checkout = { 0 };
+ static struct parallel_checkout parallel_checkout;
@@ parallel-checkout.c: int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca)
return 0;
}
@@ parallel-checkout.c: static void write_items_sequentially(struct checkout *state
{
size_t i;
-- for (i = 0; i < parallel_checkout.nr; ++i)
+- for (i = 0; i < parallel_checkout.nr; i++)
- write_pc_item(&parallel_checkout.items[i], state);
-+ for (i = 0; i < parallel_checkout.nr; ++i) {
++ for (i = 0; i < parallel_checkout.nr; i++) {
+ struct parallel_checkout_item *pc_item = &parallel_checkout.items[i];
+ write_pc_item(pc_item, state);
+ if (pc_item->status != PC_ITEM_COLLIDED)
13: da99b671e6 = 13: 960116579a make_transient_cache_entry(): optionally alloc from mem_pool
14: d3d561754a = 14: fb9f2f580c builtin/checkout.c: complete parallel checkout support
15: ee34c6e149 = 15: a844451e58 checkout-index: add parallel checkout support
16: 05299a3cc0 = 16: 3733857ffa parallel-checkout: add tests for basic operations
17: 3d140dcacb = 17: c8a2974f81 parallel-checkout: add tests related to clone collisions
18: b26f676cae = 18: 86fccd57d5 parallel-checkout: add tests related to .gitattributes
19: 641c61f9b6 = 19: 7f3e23cc38 ci: run test round with parallel-checkout enabled
--
2.28.0
^ permalink raw reply [flat|nested] 154+ messages in thread
* [PATCH v4 01/19] convert: make convert_attrs() and convert structs public
2020-11-04 20:32 ` [PATCH v4 " Matheus Tavares
@ 2020-11-04 20:33 ` Matheus Tavares
2020-12-05 10:40 ` Christian Couder
2020-11-04 20:33 ` [PATCH v4 02/19] convert: add [async_]convert_to_working_tree_ca() variants Matheus Tavares
` (18 subsequent siblings)
19 siblings, 1 reply; 154+ messages in thread
From: Matheus Tavares @ 2020-11-04 20:33 UTC (permalink / raw)
To: git
Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren,
Jeff Hostetler
From: Jeff Hostetler <jeffhost@microsoft.com>
Move convert_attrs() declaration from convert.c to convert.h, together
with the conv_attrs struct and the crlf_action enum. This function and
the data structures will be used outside convert.c in the upcoming
parallel checkout implementation. Note that crlf_action is renamed to
convert_crlf_action, which is more appropriate for the global namespace.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
[matheus.bernardino: squash and reword msg]
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
convert.c | 35 ++++++++---------------------------
convert.h | 24 ++++++++++++++++++++++++
2 files changed, 32 insertions(+), 27 deletions(-)
diff --git a/convert.c b/convert.c
index ee360c2f07..f13b001273 100644
--- a/convert.c
+++ b/convert.c
@@ -24,17 +24,6 @@
#define CONVERT_STAT_BITS_TXT_CRLF 0x2
#define CONVERT_STAT_BITS_BIN 0x4
-enum crlf_action {
- CRLF_UNDEFINED,
- CRLF_BINARY,
- CRLF_TEXT,
- CRLF_TEXT_INPUT,
- CRLF_TEXT_CRLF,
- CRLF_AUTO,
- CRLF_AUTO_INPUT,
- CRLF_AUTO_CRLF
-};
-
struct text_stat {
/* NUL, CR, LF and CRLF counts */
unsigned nul, lonecr, lonelf, crlf;
@@ -172,7 +161,7 @@ static int text_eol_is_crlf(void)
return 0;
}
-static enum eol output_eol(enum crlf_action crlf_action)
+static enum eol output_eol(enum convert_crlf_action crlf_action)
{
switch (crlf_action) {
case CRLF_BINARY:
@@ -246,7 +235,7 @@ static int has_crlf_in_index(const struct index_state *istate, const char *path)
}
static int will_convert_lf_to_crlf(struct text_stat *stats,
- enum crlf_action crlf_action)
+ enum convert_crlf_action crlf_action)
{
if (output_eol(crlf_action) != EOL_CRLF)
return 0;
@@ -499,7 +488,7 @@ static int encode_to_worktree(const char *path, const char *src, size_t src_len,
static int crlf_to_git(const struct index_state *istate,
const char *path, const char *src, size_t len,
struct strbuf *buf,
- enum crlf_action crlf_action, int conv_flags)
+ enum convert_crlf_action crlf_action, int conv_flags)
{
struct text_stat stats;
char *dst;
@@ -585,8 +574,8 @@ static int crlf_to_git(const struct index_state *istate,
return 1;
}
-static int crlf_to_worktree(const char *src, size_t len,
- struct strbuf *buf, enum crlf_action crlf_action)
+static int crlf_to_worktree(const char *src, size_t len, struct strbuf *buf,
+ enum convert_crlf_action crlf_action)
{
char *to_free = NULL;
struct text_stat stats;
@@ -1247,7 +1236,7 @@ static const char *git_path_check_encoding(struct attr_check_item *check)
return value;
}
-static enum crlf_action git_path_check_crlf(struct attr_check_item *check)
+static enum convert_crlf_action git_path_check_crlf(struct attr_check_item *check)
{
const char *value = check->value;
@@ -1297,18 +1286,10 @@ static int git_path_check_ident(struct attr_check_item *check)
return !!ATTR_TRUE(value);
}
-struct conv_attrs {
- struct convert_driver *drv;
- enum crlf_action attr_action; /* What attr says */
- enum crlf_action crlf_action; /* When no attr is set, use core.autocrlf */
- int ident;
- const char *working_tree_encoding; /* Supported encoding or default encoding if NULL */
-};
-
static struct attr_check *check;
-static void convert_attrs(const struct index_state *istate,
- struct conv_attrs *ca, const char *path)
+void convert_attrs(const struct index_state *istate,
+ struct conv_attrs *ca, const char *path)
{
struct attr_check_item *ccheck = NULL;
diff --git a/convert.h b/convert.h
index e29d1026a6..5678e99922 100644
--- a/convert.h
+++ b/convert.h
@@ -63,6 +63,30 @@ struct checkout_metadata {
struct object_id blob;
};
+enum convert_crlf_action {
+ CRLF_UNDEFINED,
+ CRLF_BINARY,
+ CRLF_TEXT,
+ CRLF_TEXT_INPUT,
+ CRLF_TEXT_CRLF,
+ CRLF_AUTO,
+ CRLF_AUTO_INPUT,
+ CRLF_AUTO_CRLF
+};
+
+struct convert_driver;
+
+struct conv_attrs {
+ struct convert_driver *drv;
+ enum convert_crlf_action attr_action; /* What attr says */
+ enum convert_crlf_action crlf_action; /* When no attr is set, use core.autocrlf */
+ int ident;
+ const char *working_tree_encoding; /* Supported encoding or default encoding if NULL */
+};
+
+void convert_attrs(const struct index_state *istate,
+ struct conv_attrs *ca, const char *path);
+
extern enum eol core_eol;
extern char *check_roundtrip_encoding;
const char *get_cached_convert_stats_ascii(const struct index_state *istate,
--
2.28.0
* Re: [PATCH v4 01/19] convert: make convert_attrs() and convert structs public
2020-11-04 20:33 ` [PATCH v4 01/19] convert: make convert_attrs() and convert structs public Matheus Tavares
@ 2020-12-05 10:40 ` Christian Couder
2020-12-05 21:53 ` Matheus Tavares Bernardino
0 siblings, 1 reply; 154+ messages in thread
From: Christian Couder @ 2020-12-05 10:40 UTC (permalink / raw)
To: Matheus Tavares
Cc: git, Junio C Hamano, Jeff Hostetler, Christian Couder, Jeff King,
Elijah Newren, Jonathan Nieder, Martin Ågren,
Jeff Hostetler
On Wed, Nov 4, 2020 at 9:33 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Move convert_attrs() declaration from convert.c to convert.h, together
> with the conv_attrs struct and the crlf_action enum. This function and
> the data structures will be used outside convert.c in the upcoming
> parallel checkout implementation. Note that crlf_action is renamed to
> convert_crlf_action, which is more appropriate for the global namespace.
It annoys me a bit that some things are called "conv_*" and others
"convert_*". Maybe we could standardize everything, but it could be a
separate patch series.
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> [matheus.bernardino: squash and reword msg]
Not sure we want the above line, which could actually not be
completely true if you rework the patch before it gets merged.
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> --- a/convert.h
> +++ b/convert.h
> @@ -63,6 +63,30 @@ struct checkout_metadata {
> struct object_id blob;
> };
>
> +enum convert_crlf_action {
> + CRLF_UNDEFINED,
> + CRLF_BINARY,
> + CRLF_TEXT,
> + CRLF_TEXT_INPUT,
> + CRLF_TEXT_CRLF,
> + CRLF_AUTO,
> + CRLF_AUTO_INPUT,
> + CRLF_AUTO_CRLF
> +};
Maybe we should also prepend "CONVERT_" to the values?
* Re: [PATCH v4 01/19] convert: make convert_attrs() and convert structs public
2020-12-05 10:40 ` Christian Couder
@ 2020-12-05 21:53 ` Matheus Tavares Bernardino
0 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares Bernardino @ 2020-12-05 21:53 UTC (permalink / raw)
To: Christian Couder
Cc: git, Junio C Hamano, Jeff Hostetler, Christian Couder, Jeff King,
Elijah Newren, Jonathan Nieder, Martin Ågren,
Jeff Hostetler
On Sat, Dec 5, 2020 at 7:40 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Wed, Nov 4, 2020 at 9:33 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
> >
> > From: Jeff Hostetler <jeffhost@microsoft.com>
> >
> > Move convert_attrs() declaration from convert.c to convert.h, together
> > with the conv_attrs struct and the crlf_action enum. This function and
> > the data structures will be used outside convert.c in the upcoming
> > parallel checkout implementation. Note that crlf_action is renamed to
> > convert_crlf_action, which is more appropriate for the global namespace.
>
> It annoys me a bit that some things are called "conv_*" and others
> "convert_*". Maybe we could standardize everything, but it could be a
> separate patch series.
>
> > Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> > [matheus.bernardino: squash and reword msg]
>
> Not sure we want the above line, which could actually not be
> completely true if you rework the patch before it gets merged.
Right, I'll remove this line from this and the next patches. Thanks.
> > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
>
> > --- a/convert.h
> > +++ b/convert.h
> > @@ -63,6 +63,30 @@ struct checkout_metadata {
> > struct object_id blob;
> > };
> >
> > +enum convert_crlf_action {
> > + CRLF_UNDEFINED,
> > + CRLF_BINARY,
> > + CRLF_TEXT,
> > + CRLF_TEXT_INPUT,
> > + CRLF_TEXT_CRLF,
> > + CRLF_AUTO,
> > + CRLF_AUTO_INPUT,
> > + CRLF_AUTO_CRLF
> > +};
>
> Maybe we should also prepend "CONVERT_" to the values?
Yeah, I also wondered about that. But I wasn't sure if it was worth
the change since there are about 52 occurrences of them. Junio later
mentioned [1] that it might be OK to leave them as-is since the use
sites will always pass these values to the API functions, which would
make it clear that they are from the "convert_" family.
[1]: https://lore.kernel.org/git/xmqqd00z397m.fsf@gitster.c.googlers.com/
* [PATCH v4 02/19] convert: add [async_]convert_to_working_tree_ca() variants
2020-11-04 20:32 ` [PATCH v4 " Matheus Tavares
2020-11-04 20:33 ` [PATCH v4 01/19] convert: make convert_attrs() and convert structs public Matheus Tavares
@ 2020-11-04 20:33 ` Matheus Tavares
2020-12-05 11:10 ` Christian Couder
2020-11-04 20:33 ` [PATCH v4 03/19] convert: add get_stream_filter_ca() variant Matheus Tavares
` (17 subsequent siblings)
19 siblings, 1 reply; 154+ messages in thread
From: Matheus Tavares @ 2020-11-04 20:33 UTC (permalink / raw)
To: git
Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren,
Jeff Hostetler
From: Jeff Hostetler <jeffhost@microsoft.com>
Separate the attribute gathering from the actual conversion by adding
_ca() variants of the conversion functions. These variants receive a
precomputed 'struct conv_attrs', not relying, thus, on a index state.
They will be used in a future patch adding parallel checkout support,
for two reasons:
- We will already load the conversion attributes in checkout_entry(),
before conversion, to decide whether a path is eligible for parallel
checkout. Therefore, it would be wasteful to load them again later,
for the actual conversion.
- The parallel workers will be responsible for reading, converting and
writing blobs to the working tree. They won't have access to the main
process' index state, so they cannot load the attributes. Instead,
they will receive the preloaded ones and call the _ca() variant of
the conversion functions. Furthermore, the attributes machinery is
optimized to handle paths in sequential order, so it's better to leave
it for the main process, anyway.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
[matheus.bernardino: squash, remove one function definition and reword]
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
convert.c | 47 ++++++++++++++++++++++++-----------------------
convert.h | 37 ++++++++++++++++++++++++++++---------
2 files changed, 52 insertions(+), 32 deletions(-)
diff --git a/convert.c b/convert.c
index f13b001273..ab3d517233 100644
--- a/convert.c
+++ b/convert.c
@@ -1447,7 +1447,7 @@ void convert_to_git_filter_fd(const struct index_state *istate,
ident_to_git(dst->buf, dst->len, dst, ca.ident);
}
-static int convert_to_working_tree_internal(const struct index_state *istate,
+static int convert_to_working_tree_internal(const struct conv_attrs *ca,
const char *path, const char *src,
size_t len, struct strbuf *dst,
int normalizing,
@@ -1455,11 +1455,8 @@ static int convert_to_working_tree_internal(const struct index_state *istate,
struct delayed_checkout *dco)
{
int ret = 0, ret_filter = 0;
- struct conv_attrs ca;
-
- convert_attrs(istate, &ca, path);
- ret |= ident_to_worktree(src, len, dst, ca.ident);
+ ret |= ident_to_worktree(src, len, dst, ca->ident);
if (ret) {
src = dst->buf;
len = dst->len;
@@ -1469,49 +1466,53 @@ static int convert_to_working_tree_internal(const struct index_state *istate,
* is a smudge or process filter (even if the process filter doesn't
* support smudge). The filters might expect CRLFs.
*/
- if ((ca.drv && (ca.drv->smudge || ca.drv->process)) || !normalizing) {
- ret |= crlf_to_worktree(src, len, dst, ca.crlf_action);
+ if ((ca->drv && (ca->drv->smudge || ca->drv->process)) || !normalizing) {
+ ret |= crlf_to_worktree(src, len, dst, ca->crlf_action);
if (ret) {
src = dst->buf;
len = dst->len;
}
}
- ret |= encode_to_worktree(path, src, len, dst, ca.working_tree_encoding);
+ ret |= encode_to_worktree(path, src, len, dst, ca->working_tree_encoding);
if (ret) {
src = dst->buf;
len = dst->len;
}
ret_filter = apply_filter(
- path, src, len, -1, dst, ca.drv, CAP_SMUDGE, meta, dco);
- if (!ret_filter && ca.drv && ca.drv->required)
- die(_("%s: smudge filter %s failed"), path, ca.drv->name);
+ path, src, len, -1, dst, ca->drv, CAP_SMUDGE, meta, dco);
+ if (!ret_filter && ca->drv && ca->drv->required)
+ die(_("%s: smudge filter %s failed"), path, ca->drv->name);
return ret | ret_filter;
}
-int async_convert_to_working_tree(const struct index_state *istate,
- const char *path, const char *src,
- size_t len, struct strbuf *dst,
- const struct checkout_metadata *meta,
- void *dco)
+int async_convert_to_working_tree_ca(const struct conv_attrs *ca,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta,
+ void *dco)
{
- return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, dco);
+ return convert_to_working_tree_internal(ca, path, src, len, dst, 0, meta, dco);
}
-int convert_to_working_tree(const struct index_state *istate,
- const char *path, const char *src,
- size_t len, struct strbuf *dst,
- const struct checkout_metadata *meta)
+int convert_to_working_tree_ca(const struct conv_attrs *ca,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta)
{
- return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, NULL);
+ return convert_to_working_tree_internal(ca, path, src, len, dst, 0, meta, NULL);
}
int renormalize_buffer(const struct index_state *istate, const char *path,
const char *src, size_t len, struct strbuf *dst)
{
- int ret = convert_to_working_tree_internal(istate, path, src, len, dst, 1, NULL, NULL);
+ struct conv_attrs ca;
+ int ret;
+
+ convert_attrs(istate, &ca, path);
+ ret = convert_to_working_tree_internal(&ca, path, src, len, dst, 1, NULL, NULL);
if (ret) {
src = dst->buf;
len = dst->len;
diff --git a/convert.h b/convert.h
index 5678e99922..a4838b5e5c 100644
--- a/convert.h
+++ b/convert.h
@@ -99,15 +99,34 @@ const char *get_convert_attr_ascii(const struct index_state *istate,
int convert_to_git(const struct index_state *istate,
const char *path, const char *src, size_t len,
struct strbuf *dst, int conv_flags);
-int convert_to_working_tree(const struct index_state *istate,
- const char *path, const char *src,
- size_t len, struct strbuf *dst,
- const struct checkout_metadata *meta);
-int async_convert_to_working_tree(const struct index_state *istate,
- const char *path, const char *src,
- size_t len, struct strbuf *dst,
- const struct checkout_metadata *meta,
- void *dco);
+int convert_to_working_tree_ca(const struct conv_attrs *ca,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta);
+int async_convert_to_working_tree_ca(const struct conv_attrs *ca,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta,
+ void *dco);
+static inline int convert_to_working_tree(const struct index_state *istate,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta)
+{
+ struct conv_attrs ca;
+ convert_attrs(istate, &ca, path);
+ return convert_to_working_tree_ca(&ca, path, src, len, dst, meta);
+}
+static inline int async_convert_to_working_tree(const struct index_state *istate,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta,
+ void *dco)
+{
+ struct conv_attrs ca;
+ convert_attrs(istate, &ca, path);
+ return async_convert_to_working_tree_ca(&ca, path, src, len, dst, meta, dco);
+}
int async_query_available_blobs(const char *cmd,
struct string_list *available_paths);
int renormalize_buffer(const struct index_state *istate,
--
2.28.0
* Re: [PATCH v4 02/19] convert: add [async_]convert_to_working_tree_ca() variants
2020-11-04 20:33 ` [PATCH v4 02/19] convert: add [async_]convert_to_working_tree_ca() variants Matheus Tavares
@ 2020-12-05 11:10 ` Christian Couder
2020-12-05 22:20 ` Matheus Tavares Bernardino
0 siblings, 1 reply; 154+ messages in thread
From: Christian Couder @ 2020-12-05 11:10 UTC (permalink / raw)
To: Matheus Tavares
Cc: git, Junio C Hamano, Jeff Hostetler, Christian Couder, Jeff King,
Elijah Newren, Jonathan Nieder, Martin Ågren,
Jeff Hostetler
On Wed, Nov 4, 2020 at 9:33 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Separate the attribute gathering from the actual conversion by adding
> _ca() variants of the conversion functions. These variants receive a
> precomputed 'struct conv_attrs', not relying, thus, on a index state.
s/a/an/
> They will be used in a future patch adding parallel checkout support,
> for two reasons:
>
> - We will already load the conversion attributes in checkout_entry(),
> before conversion, to decide whether a path is eligible for parallel
> checkout. Therefore, it would be wasteful to load them again later,
> for the actual conversion.
>
> - The parallel workers will be responsible for reading, converting and
> writing blobs to the working tree. They won't have access to the main
> process' index state, so they cannot load the attributes. Instead,
> they will receive the preloaded ones and call the _ca() variant of
> the conversion functions. Furthermore, the attributes machinery is
> optimized to handle paths in sequential order, so it's better to leave
> it for the main process, anyway.
Well explained.
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> [matheus.bernardino: squash, remove one function definition and reword]
<rant++>I'd rather have "Reworked-by: Matheus Tavares
<matheus.bernardino@usp.br>" or "Improved-by: Matheus Tavares
<matheus.bernardino@usp.br>" than lines such as the above
one.</rant++>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> @@ -1447,7 +1447,7 @@ void convert_to_git_filter_fd(const struct index_state *istate,
> ident_to_git(dst->buf, dst->len, dst, ca.ident);
> }
>
> -static int convert_to_working_tree_internal(const struct index_state *istate,
> +static int convert_to_working_tree_internal(const struct conv_attrs *ca,
It is still internal, but for consistency it might be better to add
"_ca" to the name of this function too, as we now pass it a "ca"
instead of an "istate".
* Re: [PATCH v4 02/19] convert: add [async_]convert_to_working_tree_ca() variants
2020-12-05 11:10 ` Christian Couder
@ 2020-12-05 22:20 ` Matheus Tavares Bernardino
0 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares Bernardino @ 2020-12-05 22:20 UTC (permalink / raw)
To: Christian Couder
Cc: git, Junio C Hamano, Jeff Hostetler, Christian Couder, Jeff King,
Elijah Newren, Jonathan Nieder, Martin Ågren,
Jeff Hostetler
On Sat, Dec 5, 2020 at 8:10 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Wed, Nov 4, 2020 at 9:33 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
> >
> > From: Jeff Hostetler <jeffhost@microsoft.com>
> >
> > Separate the attribute gathering from the actual conversion by adding
> > _ca() variants of the conversion functions. These variants receive a
> > precomputed 'struct conv_attrs', not relying, thus, on a index state.
>
> s/a/an/
Good catch, thanks.
> > They will be used in a future patch adding parallel checkout support,
> > for two reasons:
> >
> > - We will already load the conversion attributes in checkout_entry(),
> > before conversion, to decide whether a path is eligible for parallel
> > checkout. Therefore, it would be wasteful to load them again later,
> > for the actual conversion.
> >
> > - The parallel workers will be responsible for reading, converting and
> > writing blobs to the working tree. They won't have access to the main
> > process' index state, so they cannot load the attributes. Instead,
> > they will receive the preloaded ones and call the _ca() variant of
> > the conversion functions. Furthermore, the attributes machinery is
> > optimized to handle paths in sequential order, so it's better to leave
> > it for the main process, anyway.
>
> Well explained.
>
> > Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> > [matheus.bernardino: squash, remove one function definition and reword]
>
> <rant++>I'd rather have "Reworked-by: Matheus Tavares
> <matheus.bernardino@usp.br>" or "Improved-by: Matheus Tavares
> <matheus.bernardino@usp.br>" than lines such as the above
> one.</rant++>
>
> > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
>
> > @@ -1447,7 +1447,7 @@ void convert_to_git_filter_fd(const struct index_state *istate,
> > ident_to_git(dst->buf, dst->len, dst, ca.ident);
> > }
> >
> > -static int convert_to_working_tree_internal(const struct index_state *istate,
> > +static int convert_to_working_tree_internal(const struct conv_attrs *ca,
>
> It is still internal, but for consistency it might be better to add
> "_ca" to the name of this function too, as we now pass it a "ca"
> instead of an "istate".
Right, will do.
* [PATCH v4 03/19] convert: add get_stream_filter_ca() variant
2020-11-04 20:32 ` [PATCH v4 " Matheus Tavares
2020-11-04 20:33 ` [PATCH v4 01/19] convert: make convert_attrs() and convert structs public Matheus Tavares
2020-11-04 20:33 ` [PATCH v4 02/19] convert: add [async_]convert_to_working_tree_ca() variants Matheus Tavares
@ 2020-11-04 20:33 ` Matheus Tavares
2020-12-05 11:45 ` Christian Couder
2020-11-04 20:33 ` [PATCH v4 04/19] convert: add conv_attrs classification Matheus Tavares
` (16 subsequent siblings)
19 siblings, 1 reply; 154+ messages in thread
From: Matheus Tavares @ 2020-11-04 20:33 UTC (permalink / raw)
To: git
Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren,
Jeff Hostetler
From: Jeff Hostetler <jeffhost@microsoft.com>
Like the previous patch, we will also need to call get_stream_filter()
with a precomputed `struct conv_attrs`, when we add support for parallel
checkout workers. So add the _ca() variant which takes the conversion
attributes struct as a parameter.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
[matheus.bernardino: move header comment to ca() variant and reword msg]
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
convert.c | 28 +++++++++++++++++-----------
convert.h | 2 ++
2 files changed, 19 insertions(+), 11 deletions(-)
diff --git a/convert.c b/convert.c
index ab3d517233..0a61e4e9bf 100644
--- a/convert.c
+++ b/convert.c
@@ -1939,34 +1939,31 @@ static struct stream_filter *ident_filter(const struct object_id *oid)
}
/*
- * Return an appropriately constructed filter for the path, or NULL if
+ * Return an appropriately constructed filter for the given ca, or NULL if
* the contents cannot be filtered without reading the whole thing
* in-core.
*
* Note that you would be crazy to set CRLF, smudge/clean or ident to a
* large binary blob you would want us not to slurp into the memory!
*/
-struct stream_filter *get_stream_filter(const struct index_state *istate,
- const char *path,
- const struct object_id *oid)
+struct stream_filter *get_stream_filter_ca(const struct conv_attrs *ca,
+ const struct object_id *oid)
{
- struct conv_attrs ca;
struct stream_filter *filter = NULL;
- convert_attrs(istate, &ca, path);
- if (ca.drv && (ca.drv->process || ca.drv->smudge || ca.drv->clean))
+ if (ca->drv && (ca->drv->process || ca->drv->smudge || ca->drv->clean))
return NULL;
- if (ca.working_tree_encoding)
+ if (ca->working_tree_encoding)
return NULL;
- if (ca.crlf_action == CRLF_AUTO || ca.crlf_action == CRLF_AUTO_CRLF)
+ if (ca->crlf_action == CRLF_AUTO || ca->crlf_action == CRLF_AUTO_CRLF)
return NULL;
- if (ca.ident)
+ if (ca->ident)
filter = ident_filter(oid);
- if (output_eol(ca.crlf_action) == EOL_CRLF)
+ if (output_eol(ca->crlf_action) == EOL_CRLF)
filter = cascade_filter(filter, lf_to_crlf_filter());
else
filter = cascade_filter(filter, &null_filter_singleton);
@@ -1974,6 +1971,15 @@ struct stream_filter *get_stream_filter(const struct index_state *istate,
return filter;
}
+struct stream_filter *get_stream_filter(const struct index_state *istate,
+ const char *path,
+ const struct object_id *oid)
+{
+ struct conv_attrs ca;
+ convert_attrs(istate, &ca, path);
+ return get_stream_filter_ca(&ca, oid);
+}
+
void free_stream_filter(struct stream_filter *filter)
{
filter->vtbl->free(filter);
diff --git a/convert.h b/convert.h
index a4838b5e5c..484b50965d 100644
--- a/convert.h
+++ b/convert.h
@@ -179,6 +179,8 @@ struct stream_filter; /* opaque */
struct stream_filter *get_stream_filter(const struct index_state *istate,
const char *path,
const struct object_id *);
+struct stream_filter *get_stream_filter_ca(const struct conv_attrs *ca,
+ const struct object_id *oid);
void free_stream_filter(struct stream_filter *);
int is_null_stream_filter(struct stream_filter *);
--
2.28.0
* Re: [PATCH v4 03/19] convert: add get_stream_filter_ca() variant
2020-11-04 20:33 ` [PATCH v4 03/19] convert: add get_stream_filter_ca() variant Matheus Tavares
@ 2020-12-05 11:45 ` Christian Couder
0 siblings, 0 replies; 154+ messages in thread
From: Christian Couder @ 2020-12-05 11:45 UTC (permalink / raw)
To: Matheus Tavares
Cc: git, Junio C Hamano, Jeff Hostetler, Christian Couder, Jeff King,
Elijah Newren, Jonathan Nieder, Martin Ågren,
Jeff Hostetler
On Wed, Nov 4, 2020 at 9:33 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
> -struct stream_filter *get_stream_filter(const struct index_state *istate,
> - const char *path,
> - const struct object_id *oid)
> +struct stream_filter *get_stream_filter_ca(const struct conv_attrs *ca,
> + const struct object_id *oid)
Nice that the `path` argument could be removed here.
* [PATCH v4 04/19] convert: add conv_attrs classification
2020-11-04 20:32 ` [PATCH v4 " Matheus Tavares
` (2 preceding siblings ...)
2020-11-04 20:33 ` [PATCH v4 03/19] convert: add get_stream_filter_ca() variant Matheus Tavares
@ 2020-11-04 20:33 ` Matheus Tavares
2020-12-05 12:07 ` Christian Couder
2020-11-04 20:33 ` [PATCH v4 05/19] entry: extract a header file for entry.c functions Matheus Tavares
` (15 subsequent siblings)
19 siblings, 1 reply; 154+ messages in thread
From: Matheus Tavares @ 2020-11-04 20:33 UTC (permalink / raw)
To: git
Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren,
Jeff Hostetler
From: Jeff Hostetler <jeffhost@microsoft.com>
Create `enum conv_attrs_classification` to express the different ways
that attributes are handled for a blob during checkout.
This will be used in a later commit when deciding whether to add a file
to the parallel or delayed queue during checkout. For now, we can also
use it in get_stream_filter_ca() to simplify the function (as the
classifying logic is the same).
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
[matheus.bernardino: use classification in get_stream_filter_ca()]
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
convert.c | 26 +++++++++++++++++++-------
convert.h | 33 +++++++++++++++++++++++++++++++++
2 files changed, 52 insertions(+), 7 deletions(-)
diff --git a/convert.c b/convert.c
index 0a61e4e9bf..3b2d626268 100644
--- a/convert.c
+++ b/convert.c
@@ -1951,13 +1951,7 @@ struct stream_filter *get_stream_filter_ca(const struct conv_attrs *ca,
{
struct stream_filter *filter = NULL;
- if (ca->drv && (ca->drv->process || ca->drv->smudge || ca->drv->clean))
- return NULL;
-
- if (ca->working_tree_encoding)
- return NULL;
-
- if (ca->crlf_action == CRLF_AUTO || ca->crlf_action == CRLF_AUTO_CRLF)
+ if (classify_conv_attrs(ca) != CA_CLASS_STREAMABLE)
return NULL;
if (ca->ident)
@@ -2013,3 +2007,21 @@ void clone_checkout_metadata(struct checkout_metadata *dst,
if (blob)
oidcpy(&dst->blob, blob);
}
+
+enum conv_attrs_classification classify_conv_attrs(const struct conv_attrs *ca)
+{
+ if (ca->drv) {
+ if (ca->drv->process)
+ return CA_CLASS_INCORE_PROCESS;
+ if (ca->drv->smudge || ca->drv->clean)
+ return CA_CLASS_INCORE_FILTER;
+ }
+
+ if (ca->working_tree_encoding)
+ return CA_CLASS_INCORE;
+
+ if (ca->crlf_action == CRLF_AUTO || ca->crlf_action == CRLF_AUTO_CRLF)
+ return CA_CLASS_INCORE;
+
+ return CA_CLASS_STREAMABLE;
+}
diff --git a/convert.h b/convert.h
index 484b50965d..43e567a59b 100644
--- a/convert.h
+++ b/convert.h
@@ -200,4 +200,37 @@ int stream_filter(struct stream_filter *,
const char *input, size_t *isize_p,
char *output, size_t *osize_p);
+enum conv_attrs_classification {
+ /*
+ * The blob must be loaded into a buffer before it can be
+ * smudged. All smudging is done in-proc.
+ */
+ CA_CLASS_INCORE,
+
+ /*
+ * The blob must be loaded into a buffer, but uses a
+ * single-file driver filter, such as rot13.
+ */
+ CA_CLASS_INCORE_FILTER,
+
+ /*
+ * The blob must be loaded into a buffer, but uses a
+ * long-running driver process, such as LFS. This might or
+ * might not use delayed operations. (The important thing is
+ * that there is a single subordinate long-running process
+ * handling all associated blobs and in case of delayed
+ * operations, may hold per-blob state.)
+ */
+ CA_CLASS_INCORE_PROCESS,
+
+ /*
+ * The blob can be streamed and smudged without needing to
+ * completely read it into a buffer.
+ */
+ CA_CLASS_STREAMABLE,
+};
+
+enum conv_attrs_classification classify_conv_attrs(
+ const struct conv_attrs *ca);
+
#endif /* CONVERT_H */
--
2.28.0
* Re: [PATCH v4 04/19] convert: add conv_attrs classification
2020-11-04 20:33 ` [PATCH v4 04/19] convert: add conv_attrs classification Matheus Tavares
@ 2020-12-05 12:07 ` Christian Couder
2020-12-05 22:08 ` Matheus Tavares Bernardino
0 siblings, 1 reply; 154+ messages in thread
From: Christian Couder @ 2020-12-05 12:07 UTC (permalink / raw)
To: Matheus Tavares
Cc: git, Junio C Hamano, Jeff Hostetler, Christian Couder, Jeff King,
Elijah Newren, Jonathan Nieder, Martin Ågren,
Jeff Hostetler
On Wed, Nov 4, 2020 at 9:33 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Create `enum conv_attrs_classification` to express the different ways
> that attributes are handled for a blob during checkout.
Micronit: the subject of the patch might want to be a bit more
explicit like "convert: add conv_attrs_classification enum". Otherwise
it could make one wonder if it is missing an underscore between
"conv_attrs" and "classification".
> This will be used in a later commit when deciding whether to add a file
> to the parallel or delayed queue during checkout. For now, we can also
> use it in get_stream_filter_ca() to simplify the function (as the
> classifying logic is the same).
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> [matheus.bernardino: use classification in get_stream_filter_ca()]
Maybe "Co-authored-by: Matheus Tavares <matheus.bernardino@usp.br>" instead?
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
* Re: [PATCH v4 04/19] convert: add conv_attrs classification
2020-12-05 12:07 ` Christian Couder
@ 2020-12-05 22:08 ` Matheus Tavares Bernardino
0 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares Bernardino @ 2020-12-05 22:08 UTC (permalink / raw)
To: Christian Couder
Cc: git, Junio C Hamano, Jeff Hostetler, Christian Couder, Jeff King,
Elijah Newren, Jonathan Nieder, Martin Ågren,
Jeff Hostetler
On Sat, Dec 5, 2020 at 9:07 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Wed, Nov 4, 2020 at 9:33 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
> >
> > From: Jeff Hostetler <jeffhost@microsoft.com>
> >
> > Create `enum conv_attrs_classification` to express the different ways
> > that attributes are handled for a blob during checkout.
>
> Micronit: the subject of the patch might want to be a bit more
> explicit like "convert: add conv_attrs_classification enum". Otherwise
> it could make one wonder if it is missing an underscore between
> "conv_attrs" and "classification".
Makes sense, thanks!
* [PATCH v4 05/19] entry: extract a header file for entry.c functions
2020-11-04 20:32 ` [PATCH v4 " Matheus Tavares
` (3 preceding siblings ...)
2020-11-04 20:33 ` [PATCH v4 04/19] convert: add conv_attrs classification Matheus Tavares
@ 2020-11-04 20:33 ` Matheus Tavares
2020-12-06 8:31 ` Christian Couder
2020-11-04 20:33 ` [PATCH v4 06/19] entry: make fstat_output() and read_blob_entry() public Matheus Tavares
` (14 subsequent siblings)
19 siblings, 1 reply; 154+ messages in thread
From: Matheus Tavares @ 2020-11-04 20:33 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
The declarations of entry.c's public functions and structures currently
reside in cache.h. Although not many, they contribute to the size of
cache.h and, when changed, cause the unnecessary recompilation of
modules that don't really use these functions. So let's move them to a
new entry.h header.
Original-patch-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
apply.c | 1 +
builtin/checkout-index.c | 1 +
builtin/checkout.c | 1 +
builtin/difftool.c | 1 +
cache.h | 24 -----------------------
entry.c | 9 +--------
entry.h | 42 ++++++++++++++++++++++++++++++++++++++++
unpack-trees.c | 1 +
8 files changed, 48 insertions(+), 32 deletions(-)
create mode 100644 entry.h
diff --git a/apply.c b/apply.c
index 76dba93c97..ddec80b4b0 100644
--- a/apply.c
+++ b/apply.c
@@ -21,6 +21,7 @@
#include "quote.h"
#include "rerere.h"
#include "apply.h"
+#include "entry.h"
struct gitdiff_data {
struct strbuf *root;
diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c
index 4bbfc92dce..9276ed0258 100644
--- a/builtin/checkout-index.c
+++ b/builtin/checkout-index.c
@@ -11,6 +11,7 @@
#include "quote.h"
#include "cache-tree.h"
#include "parse-options.h"
+#include "entry.h"
#define CHECKOUT_ALL 4
static int nul_term_line;
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 0951f8fee5..b18b9d6f3c 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -26,6 +26,7 @@
#include "unpack-trees.h"
#include "wt-status.h"
#include "xdiff-interface.h"
+#include "entry.h"
static const char * const checkout_usage[] = {
N_("git checkout [<options>] <branch>"),
diff --git a/builtin/difftool.c b/builtin/difftool.c
index 7ac432b881..dfa22b67eb 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -23,6 +23,7 @@
#include "lockfile.h"
#include "object-store.h"
#include "dir.h"
+#include "entry.h"
static int trust_exit_code;
diff --git a/cache.h b/cache.h
index c0072d43b1..ccfeb9ba2b 100644
--- a/cache.h
+++ b/cache.h
@@ -1706,30 +1706,6 @@ const char *show_ident_date(const struct ident_split *id,
*/
int ident_cmp(const struct ident_split *, const struct ident_split *);
-struct checkout {
- struct index_state *istate;
- const char *base_dir;
- int base_dir_len;
- struct delayed_checkout *delayed_checkout;
- struct checkout_metadata meta;
- unsigned force:1,
- quiet:1,
- not_new:1,
- clone:1,
- refresh_cache:1;
-};
-#define CHECKOUT_INIT { NULL, "" }
-
-#define TEMPORARY_FILENAME_LENGTH 25
-int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath, int *nr_checkouts);
-void enable_delayed_checkout(struct checkout *state);
-int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
-/*
- * Unlink the last component and schedule the leading directories for
- * removal, such that empty directories get removed.
- */
-void unlink_entry(const struct cache_entry *ce);
-
struct cache_def {
struct strbuf path;
int flags;
diff --git a/entry.c b/entry.c
index a0532f1f00..b0b8099699 100644
--- a/entry.c
+++ b/entry.c
@@ -6,6 +6,7 @@
#include "submodule.h"
#include "progress.h"
#include "fsmonitor.h"
+#include "entry.h"
static void create_directories(const char *path, int path_len,
const struct checkout *state)
@@ -429,14 +430,6 @@ static void mark_colliding_entries(const struct checkout *state,
}
}
-/*
- * Write the contents from ce out to the working tree.
- *
- * When topath[] is not NULL, instead of writing to the working tree
- * file named by ce, a temporary file is created by this function and
- * its name is returned in topath[], which must be able to hold at
- * least TEMPORARY_FILENAME_LENGTH bytes long.
- */
int checkout_entry(struct cache_entry *ce, const struct checkout *state,
char *topath, int *nr_checkouts)
{
diff --git a/entry.h b/entry.h
new file mode 100644
index 0000000000..acbbb90220
--- /dev/null
+++ b/entry.h
@@ -0,0 +1,42 @@
+#ifndef ENTRY_H
+#define ENTRY_H
+
+#include "cache.h"
+#include "convert.h"
+
+struct checkout {
+ struct index_state *istate;
+ const char *base_dir;
+ int base_dir_len;
+ struct delayed_checkout *delayed_checkout;
+ struct checkout_metadata meta;
+ unsigned force:1,
+ quiet:1,
+ not_new:1,
+ clone:1,
+ refresh_cache:1;
+};
+#define CHECKOUT_INIT { NULL, "" }
+
+#define TEMPORARY_FILENAME_LENGTH 25
+/*
+ * Write the contents from ce out to the working tree.
+ *
+ * When topath[] is not NULL, instead of writing to the working tree
+ * file named by ce, a temporary file is created by this function and
+ * its name is returned in topath[], which must be able to hold at
+ * least TEMPORARY_FILENAME_LENGTH bytes long.
+ */
+int checkout_entry(struct cache_entry *ce, const struct checkout *state,
+ char *topath, int *nr_checkouts);
+
+void enable_delayed_checkout(struct checkout *state);
+int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
+
+/*
+ * Unlink the last component and schedule the leading directories for
+ * removal, such that empty directories get removed.
+ */
+void unlink_entry(const struct cache_entry *ce);
+
+#endif /* ENTRY_H */
diff --git a/unpack-trees.c b/unpack-trees.c
index 323280dd48..a511fadd89 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -16,6 +16,7 @@
#include "fsmonitor.h"
#include "object-store.h"
#include "promisor-remote.h"
+#include "entry.h"
/*
* Error messages expected by scripts out of plumbing commands such as
--
2.28.0
* Re: [PATCH v4 05/19] entry: extract a header file for entry.c functions
2020-11-04 20:33 ` [PATCH v4 05/19] entry: extract a header file for entry.c functions Matheus Tavares
@ 2020-12-06 8:31 ` Christian Couder
0 siblings, 0 replies; 154+ messages in thread
From: Christian Couder @ 2020-12-06 8:31 UTC (permalink / raw)
To: Matheus Tavares
Cc: git, Junio C Hamano, Jeff Hostetler, Christian Couder, Jeff King,
Elijah Newren, Jonathan Nieder, Martin Ågren
On Wed, Nov 4, 2020 at 9:33 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
> --- a/entry.c
> +++ b/entry.c
> @@ -6,6 +6,7 @@
> #include "submodule.h"
> #include "progress.h"
> #include "fsmonitor.h"
> +#include "entry.h"
>
> static void create_directories(const char *path, int path_len,
> const struct checkout *state)
> @@ -429,14 +430,6 @@ static void mark_colliding_entries(const struct checkout *state,
> }
> }
>
> -/*
> - * Write the contents from ce out to the working tree.
> - *
> - * When topath[] is not NULL, instead of writing to the working tree
> - * file named by ce, a temporary file is created by this function and
> - * its name is returned in topath[], which must be able to hold at
> - * least TEMPORARY_FILENAME_LENGTH bytes long.
> - */
About the above change, the commit message might want to say something like:
"While at it let's also move a comment related to checkout_entry()
from entry.c to entry.h as it's more useful to describe the function
there."
* [PATCH v4 06/19] entry: make fstat_output() and read_blob_entry() public
2020-11-04 20:32 ` [PATCH v4 " Matheus Tavares
` (4 preceding siblings ...)
2020-11-04 20:33 ` [PATCH v4 05/19] entry: extract a header file for entry.c functions Matheus Tavares
@ 2020-11-04 20:33 ` Matheus Tavares
2020-11-04 20:33 ` [PATCH v4 07/19] entry: extract cache_entry update from write_entry() Matheus Tavares
` (13 subsequent siblings)
19 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-11-04 20:33 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
These two functions will be used by the parallel checkout code, so let's
make them public. Note: fstat_output() is renamed to
fstat_checkout_output(), now that it has become public, seeking to avoid
future name collisions.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
entry.c | 8 ++++----
entry.h | 3 +++
2 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/entry.c b/entry.c
index b0b8099699..b36071a610 100644
--- a/entry.c
+++ b/entry.c
@@ -84,7 +84,7 @@ static int create_file(const char *path, unsigned int mode)
return open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
}
-static void *read_blob_entry(const struct cache_entry *ce, unsigned long *size)
+void *read_blob_entry(const struct cache_entry *ce, unsigned long *size)
{
enum object_type type;
void *blob_data = read_object_file(&ce->oid, &type, size);
@@ -109,7 +109,7 @@ static int open_output_fd(char *path, const struct cache_entry *ce, int to_tempf
}
}
-static int fstat_output(int fd, const struct checkout *state, struct stat *st)
+int fstat_checkout_output(int fd, const struct checkout *state, struct stat *st)
{
/* use fstat() only when path == ce->name */
if (fstat_is_reliable() &&
@@ -132,7 +132,7 @@ static int streaming_write_entry(const struct cache_entry *ce, char *path,
return -1;
result |= stream_blob_to_fd(fd, &ce->oid, filter, 1);
- *fstat_done = fstat_output(fd, state, statbuf);
+ *fstat_done = fstat_checkout_output(fd, state, statbuf);
result |= close(fd);
if (result)
@@ -346,7 +346,7 @@ static int write_entry(struct cache_entry *ce,
wrote = write_in_full(fd, new_blob, size);
if (!to_tempfile)
- fstat_done = fstat_output(fd, state, &st);
+ fstat_done = fstat_checkout_output(fd, state, &st);
close(fd);
free(new_blob);
if (wrote < 0)
diff --git a/entry.h b/entry.h
index acbbb90220..60df93ca78 100644
--- a/entry.h
+++ b/entry.h
@@ -39,4 +39,7 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
*/
void unlink_entry(const struct cache_entry *ce);
+void *read_blob_entry(const struct cache_entry *ce, unsigned long *size);
+int fstat_checkout_output(int fd, const struct checkout *state, struct stat *st);
+
#endif /* ENTRY_H */
--
2.28.0
* [PATCH v4 07/19] entry: extract cache_entry update from write_entry()
2020-11-04 20:32 ` [PATCH v4 " Matheus Tavares
` (5 preceding siblings ...)
2020-11-04 20:33 ` [PATCH v4 06/19] entry: make fstat_output() and read_blob_entry() public Matheus Tavares
@ 2020-11-04 20:33 ` Matheus Tavares
2020-12-06 8:53 ` Christian Couder
2020-11-04 20:33 ` [PATCH v4 08/19] entry: move conv_attrs lookup up to checkout_entry() Matheus Tavares
` (12 subsequent siblings)
19 siblings, 1 reply; 154+ messages in thread
From: Matheus Tavares @ 2020-11-04 20:33 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
This code will be used by the parallel checkout functions, outside
entry.c, so extract it to a public function.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
entry.c | 25 ++++++++++++++++---------
entry.h | 2 ++
2 files changed, 18 insertions(+), 9 deletions(-)
diff --git a/entry.c b/entry.c
index b36071a610..1d2df188e5 100644
--- a/entry.c
+++ b/entry.c
@@ -251,6 +251,18 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts)
return errs;
}
+void update_ce_after_write(const struct checkout *state, struct cache_entry *ce,
+ struct stat *st)
+{
+ if (state->refresh_cache) {
+ assert(state->istate);
+ fill_stat_cache_info(state->istate, ce, st);
+ ce->ce_flags |= CE_UPDATE_IN_BASE;
+ mark_fsmonitor_invalid(state->istate, ce);
+ state->istate->cache_changed |= CE_ENTRY_CHANGED;
+ }
+}
+
static int write_entry(struct cache_entry *ce,
char *path, const struct checkout *state, int to_tempfile)
{
@@ -371,15 +383,10 @@ static int write_entry(struct cache_entry *ce,
finish:
if (state->refresh_cache) {
- assert(state->istate);
- if (!fstat_done)
- if (lstat(ce->name, &st) < 0)
- return error_errno("unable to stat just-written file %s",
- ce->name);
- fill_stat_cache_info(state->istate, ce, &st);
- ce->ce_flags |= CE_UPDATE_IN_BASE;
- mark_fsmonitor_invalid(state->istate, ce);
- state->istate->cache_changed |= CE_ENTRY_CHANGED;
+ if (!fstat_done && lstat(ce->name, &st) < 0)
+ return error_errno("unable to stat just-written file %s",
+ ce->name);
+ update_ce_after_write(state, ce , &st);
}
delayed:
return 0;
diff --git a/entry.h b/entry.h
index 60df93ca78..ea7290bcd5 100644
--- a/entry.h
+++ b/entry.h
@@ -41,5 +41,7 @@ void unlink_entry(const struct cache_entry *ce);
void *read_blob_entry(const struct cache_entry *ce, unsigned long *size);
int fstat_checkout_output(int fd, const struct checkout *state, struct stat *st);
+void update_ce_after_write(const struct checkout *state, struct cache_entry *ce,
+ struct stat *st);
#endif /* ENTRY_H */
--
2.28.0
* Re: [PATCH v4 07/19] entry: extract cache_entry update from write_entry()
2020-11-04 20:33 ` [PATCH v4 07/19] entry: extract cache_entry update from write_entry() Matheus Tavares
@ 2020-12-06 8:53 ` Christian Couder
0 siblings, 0 replies; 154+ messages in thread
From: Christian Couder @ 2020-12-06 8:53 UTC (permalink / raw)
To: Matheus Tavares
Cc: git, Junio C Hamano, Jeff Hostetler, Christian Couder, Jeff King,
Elijah Newren, Jonathan Nieder, Martin Ågren
On Wed, Nov 4, 2020 at 9:33 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> This code will be used by the parallel checkout functions, outside
> entry.c, so extract it to a public function.
Maybe the title could be a bit more explicit like: "entry: extract
update_ce_after_write() from write_entry()"
And: s/This code/This new function/
> @@ -371,15 +383,10 @@ static int write_entry(struct cache_entry *ce,
>
> finish:
> if (state->refresh_cache) {
Here we check state->refresh_cache ...
> - assert(state->istate);
> - if (!fstat_done)
> - if (lstat(ce->name, &st) < 0)
> - return error_errno("unable to stat just-written file %s",
> - ce->name);
> - fill_stat_cache_info(state->istate, ce, &st);
> - ce->ce_flags |= CE_UPDATE_IN_BASE;
> - mark_fsmonitor_invalid(state->istate, ce);
> - state->istate->cache_changed |= CE_ENTRY_CHANGED;
> + if (!fstat_done && lstat(ce->name, &st) < 0)
> + return error_errno("unable to stat just-written file %s",
> + ce->name);
> + update_ce_after_write(state, ce , &st);
... and inside update_ce_after_write() we check state->refresh_cache
again. So the check is now done twice.
Not a big issue though.
* [PATCH v4 08/19] entry: move conv_attrs lookup up to checkout_entry()
2020-11-04 20:32 ` [PATCH v4 " Matheus Tavares
` (6 preceding siblings ...)
2020-11-04 20:33 ` [PATCH v4 07/19] entry: extract cache_entry update from write_entry() Matheus Tavares
@ 2020-11-04 20:33 ` Matheus Tavares
2020-12-06 9:35 ` Christian Couder
2020-11-04 20:33 ` [PATCH v4 09/19] entry: add checkout_entry_ca() which takes preloaded conv_attrs Matheus Tavares
` (11 subsequent siblings)
19 siblings, 1 reply; 154+ messages in thread
From: Matheus Tavares @ 2020-11-04 20:33 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
In a following patch, checkout_entry() will use conv_attrs to decide
whether an entry should be enqueued for parallel checkout or not. But
the attributes lookup only happens lower in this call stack. To avoid
the unnecessary work of loading the attributes twice, let's move it up
to checkout_entry(), and pass the loaded struct down to write_entry().
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
entry.c | 38 +++++++++++++++++++++++++++-----------
1 file changed, 27 insertions(+), 11 deletions(-)
diff --git a/entry.c b/entry.c
index 1d2df188e5..486712c3a9 100644
--- a/entry.c
+++ b/entry.c
@@ -263,8 +263,9 @@ void update_ce_after_write(const struct checkout *state, struct cache_entry *ce,
}
}
-static int write_entry(struct cache_entry *ce,
- char *path, const struct checkout *state, int to_tempfile)
+/* Note: ca is used (and required) iff the entry refers to a regular file. */
+static int write_entry(struct cache_entry *ce, char *path, struct conv_attrs *ca,
+ const struct checkout *state, int to_tempfile)
{
unsigned int ce_mode_s_ifmt = ce->ce_mode & S_IFMT;
struct delayed_checkout *dco = state->delayed_checkout;
@@ -281,8 +282,7 @@ static int write_entry(struct cache_entry *ce,
clone_checkout_metadata(&meta, &state->meta, &ce->oid);
if (ce_mode_s_ifmt == S_IFREG) {
- struct stream_filter *filter = get_stream_filter(state->istate, ce->name,
- &ce->oid);
+ struct stream_filter *filter = get_stream_filter_ca(ca, &ce->oid);
if (filter &&
!streaming_write_entry(ce, path, filter,
state, to_tempfile,
@@ -329,14 +329,17 @@ static int write_entry(struct cache_entry *ce,
* Convert from git internal format to working tree format
*/
if (dco && dco->state != CE_NO_DELAY) {
- ret = async_convert_to_working_tree(state->istate, ce->name, new_blob,
- size, &buf, &meta, dco);
+ ret = async_convert_to_working_tree_ca(ca, ce->name,
+ new_blob, size,
+ &buf, &meta, dco);
if (ret && string_list_has_string(&dco->paths, ce->name)) {
free(new_blob);
goto delayed;
}
- } else
- ret = convert_to_working_tree(state->istate, ce->name, new_blob, size, &buf, &meta);
+ } else {
+ ret = convert_to_working_tree_ca(ca, ce->name, new_blob,
+ size, &buf, &meta);
+ }
if (ret) {
free(new_blob);
@@ -442,6 +445,7 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
{
static struct strbuf path = STRBUF_INIT;
struct stat st;
+ struct conv_attrs ca_buf, *ca = NULL;
if (ce->ce_flags & CE_WT_REMOVE) {
if (topath)
@@ -454,8 +458,13 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
return 0;
}
- if (topath)
- return write_entry(ce, topath, state, 1);
+ if (topath) {
+ if (S_ISREG(ce->ce_mode)) {
+ convert_attrs(state->istate, &ca_buf, ce->name);
+ ca = &ca_buf;
+ }
+ return write_entry(ce, topath, ca, state, 1);
+ }
strbuf_reset(&path);
strbuf_add(&path, state->base_dir, state->base_dir_len);
@@ -517,9 +526,16 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
return 0;
create_directories(path.buf, path.len, state);
+
if (nr_checkouts)
(*nr_checkouts)++;
- return write_entry(ce, path.buf, state, 0);
+
+ if (S_ISREG(ce->ce_mode)) {
+ convert_attrs(state->istate, &ca_buf, ce->name);
+ ca = &ca_buf;
+ }
+
+ return write_entry(ce, path.buf, NULL, state, 0);
}
void unlink_entry(const struct cache_entry *ce)
--
2.28.0
* Re: [PATCH v4 08/19] entry: move conv_attrs lookup up to checkout_entry()
2020-11-04 20:33 ` [PATCH v4 08/19] entry: move conv_attrs lookup up to checkout_entry() Matheus Tavares
@ 2020-12-06 9:35 ` Christian Couder
2020-12-07 13:52 ` Matheus Tavares Bernardino
0 siblings, 1 reply; 154+ messages in thread
From: Christian Couder @ 2020-12-06 9:35 UTC (permalink / raw)
To: Matheus Tavares
Cc: git, Junio C Hamano, Jeff Hostetler, Christian Couder, Jeff King,
Elijah Newren, Jonathan Nieder, Martin Ågren
On Wed, Nov 4, 2020 at 9:34 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
> + if (topath) {
> + if (S_ISREG(ce->ce_mode)) {
> + convert_attrs(state->istate, &ca_buf, ce->name);
> + ca = &ca_buf;
> + }
> + return write_entry(ce, topath, ca, state, 1);
We pass `ca` here instead of `&ca_buf` because ca is NULL if we are
not dealing with a regular file. Ok, I think it's indeed better to
pass NULL in this case.
> @@ -517,9 +526,16 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
> return 0;
>
> create_directories(path.buf, path.len, state);
> +
> if (nr_checkouts)
> (*nr_checkouts)++;
> - return write_entry(ce, path.buf, state, 0);
> +
> + if (S_ISREG(ce->ce_mode)) {
> + convert_attrs(state->istate, &ca_buf, ce->name);
> + ca = &ca_buf;
> + }
> +
> + return write_entry(ce, path.buf, NULL, state, 0);
I am not sure why NULL is passed here though instead of `ca`.
The following comment is added in front of write_entry():
+/* Note: ca is used (and required) iff the entry refers to a regular file. */
So I would think that `ca` should be passed.
* Re: [PATCH v4 08/19] entry: move conv_attrs lookup up to checkout_entry()
2020-12-06 9:35 ` Christian Couder
@ 2020-12-07 13:52 ` Matheus Tavares Bernardino
0 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares Bernardino @ 2020-12-07 13:52 UTC (permalink / raw)
To: Christian Couder
Cc: git, Junio C Hamano, Jeff Hostetler, Christian Couder, Jeff King,
Elijah Newren, Jonathan Nieder, Martin Ågren
On Sun, Dec 6, 2020 at 6:36 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Wed, Nov 4, 2020 at 9:34 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
>
> > + if (topath) {
> > + if (S_ISREG(ce->ce_mode)) {
> > + convert_attrs(state->istate, &ca_buf, ce->name);
> > + ca = &ca_buf;
> > + }
> > + return write_entry(ce, topath, ca, state, 1);
>
> We pass `ca` here instead of `&ca_buf` because ca is NULL if we are
> not dealing with a regular file. Ok, I think it's indeed better to
> pass NULL in this case.
>
> > @@ -517,9 +526,16 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
> > return 0;
> >
> > create_directories(path.buf, path.len, state);
> > +
> > if (nr_checkouts)
> > (*nr_checkouts)++;
> > - return write_entry(ce, path.buf, state, 0);
> > +
> > + if (S_ISREG(ce->ce_mode)) {
> > + convert_attrs(state->istate, &ca_buf, ce->name);
> > + ca = &ca_buf;
> > + }
> > +
> > + return write_entry(ce, path.buf, NULL, state, 0);
>
> I am not sure why NULL is passed here though instead of `ca`.
Oops, this is indeed wrong. I think I forgot to modify this line while
applying the changes from the last review round. Thanks for catching
it!
* [PATCH v4 09/19] entry: add checkout_entry_ca() which takes preloaded conv_attrs
2020-11-04 20:32 ` [PATCH v4 " Matheus Tavares
` (7 preceding siblings ...)
2020-11-04 20:33 ` [PATCH v4 08/19] entry: move conv_attrs lookup up to checkout_entry() Matheus Tavares
@ 2020-11-04 20:33 ` Matheus Tavares
2020-12-06 10:02 ` Christian Couder
2020-11-04 20:33 ` [PATCH v4 10/19] unpack-trees: add basic support for parallel checkout Matheus Tavares
` (10 subsequent siblings)
19 siblings, 1 reply; 154+ messages in thread
From: Matheus Tavares @ 2020-11-04 20:33 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
The parallel checkout machinery will call checkout_entry() for entries
that could not be written in parallel due to path collisions. At this
point, we will already be holding the conversion attributes for each
entry, and it would be wasteful to let checkout_entry() load these
again. Instead, let's add the checkout_entry_ca() variant, which
optionally takes a preloaded conv_attrs struct.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
entry.c | 13 +++++++------
entry.h | 12 ++++++++++--
2 files changed, 17 insertions(+), 8 deletions(-)
diff --git a/entry.c b/entry.c
index 486712c3a9..9d79a5671f 100644
--- a/entry.c
+++ b/entry.c
@@ -440,12 +440,13 @@ static void mark_colliding_entries(const struct checkout *state,
}
}
-int checkout_entry(struct cache_entry *ce, const struct checkout *state,
- char *topath, int *nr_checkouts)
+int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca,
+ const struct checkout *state, char *topath,
+ int *nr_checkouts)
{
static struct strbuf path = STRBUF_INIT;
struct stat st;
- struct conv_attrs ca_buf, *ca = NULL;
+ struct conv_attrs ca_buf;
if (ce->ce_flags & CE_WT_REMOVE) {
if (topath)
@@ -459,7 +460,7 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
}
if (topath) {
- if (S_ISREG(ce->ce_mode)) {
+ if (S_ISREG(ce->ce_mode) && !ca) {
convert_attrs(state->istate, &ca_buf, ce->name);
ca = &ca_buf;
}
@@ -530,12 +531,12 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
if (nr_checkouts)
(*nr_checkouts)++;
- if (S_ISREG(ce->ce_mode)) {
+ if (S_ISREG(ce->ce_mode) && !ca) {
convert_attrs(state->istate, &ca_buf, ce->name);
ca = &ca_buf;
}
- return write_entry(ce, path.buf, NULL, state, 0);
+ return write_entry(ce, path.buf, ca, state, 0);
}
void unlink_entry(const struct cache_entry *ce)
diff --git a/entry.h b/entry.h
index ea7290bcd5..d8244c5db2 100644
--- a/entry.h
+++ b/entry.h
@@ -26,9 +26,17 @@ struct checkout {
* file named by ce, a temporary file is created by this function and
* its name is returned in topath[], which must be able to hold at
* least TEMPORARY_FILENAME_LENGTH bytes long.
+ *
+ * With checkout_entry_ca(), callers can optionally pass a preloaded
+ * conv_attrs struct (to avoid reloading it), when ce refers to a
+ * regular file. If ca is NULL, the attributes will be loaded
+ * internally when (and if) needed.
*/
-int checkout_entry(struct cache_entry *ce, const struct checkout *state,
- char *topath, int *nr_checkouts);
+#define checkout_entry(ce, state, topath, nr_checkouts) \
+ checkout_entry_ca(ce, NULL, state, topath, nr_checkouts)
+int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca,
+ const struct checkout *state, char *topath,
+ int *nr_checkouts);
void enable_delayed_checkout(struct checkout *state);
int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
--
2.28.0
* Re: [PATCH v4 09/19] entry: add checkout_entry_ca() which takes preloaded conv_attrs
2020-11-04 20:33 ` [PATCH v4 09/19] entry: add checkout_entry_ca() which takes preloaded conv_attrs Matheus Tavares
@ 2020-12-06 10:02 ` Christian Couder
2020-12-07 16:47 ` Matheus Tavares Bernardino
0 siblings, 1 reply; 154+ messages in thread
From: Christian Couder @ 2020-12-06 10:02 UTC (permalink / raw)
To: Matheus Tavares
Cc: git, Junio C Hamano, Jeff Hostetler, Christian Couder, Jeff King,
Elijah Newren, Jonathan Nieder, Martin Ågren
On Wed, Nov 4, 2020 at 9:34 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
The title might be a bit shorter with: s/which takes/taking/ or
s/which takes/using/
> @@ -530,12 +531,12 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
> if (nr_checkouts)
> (*nr_checkouts)++;
>
> - if (S_ISREG(ce->ce_mode)) {
> + if (S_ISREG(ce->ce_mode) && !ca) {
> convert_attrs(state->istate, &ca_buf, ce->name);
> ca = &ca_buf;
> }
I wonder if it's possible to avoid repeating the above 4 lines twice
in this function.
> - return write_entry(ce, path.buf, NULL, state, 0);
> + return write_entry(ce, path.buf, ca, state, 0);
It's good that ca is eventually passed here, but it might belong to
the previous patch.
> -int checkout_entry(struct cache_entry *ce, const struct checkout *state,
> - char *topath, int *nr_checkouts);
> +#define checkout_entry(ce, state, topath, nr_checkouts) \
> + checkout_entry_ca(ce, NULL, state, topath, nr_checkouts)
I thought that we prefer inline functions over macros when possible.
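The inline-function alternative Christian alludes to could look like the sketch below. The git types are stubbed out here with hypothetical stand-ins so the idea compiles on its own; in the real tree, the wrapper would sit in the header next to the `checkout_entry_ca()` declaration. Unlike the macro, the `static inline` version type-checks its arguments and evaluates each exactly once:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for git's types, only so this sketch is self-contained. */
struct cache_entry { int unused; };
struct checkout { int unused; };
struct conv_attrs { int unused; };

static struct conv_attrs sentinel;
static struct conv_attrs *last_ca = &sentinel; /* records what the wrapper passed */

static int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca,
			     const struct checkout *state, char *topath,
			     int *nr_checkouts)
{
	(void)ce; (void)state; (void)topath; (void)nr_checkouts;
	last_ca = ca; /* stub body: just remember the conv_attrs argument */
	return 0;
}

/*
 * Type-checked replacement for the macro: callers keep the old
 * checkout_entry() signature and get NULL conv_attrs implicitly.
 */
static inline int checkout_entry(struct cache_entry *ce,
				 const struct checkout *state,
				 char *topath, int *nr_checkouts)
{
	return checkout_entry_ca(ce, NULL, state, topath, nr_checkouts);
}
```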
* Re: [PATCH v4 09/19] entry: add checkout_entry_ca() which takes preloaded conv_attrs
2020-12-06 10:02 ` Christian Couder
@ 2020-12-07 16:47 ` Matheus Tavares Bernardino
0 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares Bernardino @ 2020-12-07 16:47 UTC (permalink / raw)
To: Christian Couder
Cc: git, Junio C Hamano, Jeff Hostetler, Christian Couder, Jeff King,
Elijah Newren, Jonathan Nieder, Martin Ågren
On Sun, Dec 6, 2020 at 7:03 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Wed, Nov 4, 2020 at 9:34 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
>
> The title might be a bit shorter with: s/which takes/taking/ or
> s/which takes/using/
>
> > @@ -530,12 +531,12 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
> > if (nr_checkouts)
> > (*nr_checkouts)++;
> >
> > - if (S_ISREG(ce->ce_mode)) {
> > + if (S_ISREG(ce->ce_mode) && !ca) {
> > convert_attrs(state->istate, &ca_buf, ce->name);
> > ca = &ca_buf;
> > }
>
> I wonder if it's possible to avoid repeating the above 4 lines twice
> in this function.
Yeah, I was thinking about that as well. But the only way I found to
avoid the repetition would be to place this block before check_path().
The problem is that we would end up calling convert_attrs() even for
the cases where we later decide not to checkout the entry (because,
for example, state->not_new is true or the file is already up to
date).
> > - return write_entry(ce, path.buf, NULL, state, 0);
> > + return write_entry(ce, path.buf, ca, state, 0);
>
> It's good that ca is eventually passed here, but it might belong to
> the previous patch.
Right, I'll fix the previous patch.
> > -int checkout_entry(struct cache_entry *ce, const struct checkout *state,
> > - char *topath, int *nr_checkouts);
> > +#define checkout_entry(ce, state, topath, nr_checkouts) \
> > + checkout_entry_ca(ce, NULL, state, topath, nr_checkouts)
>
> I thought that we prefer inline functions over macros when possible.
Thanks, I didn't know about the preference. I'll change that.
* [PATCH v4 10/19] unpack-trees: add basic support for parallel checkout
2020-11-04 20:32 ` [PATCH v4 " Matheus Tavares
` (8 preceding siblings ...)
2020-11-04 20:33 ` [PATCH v4 09/19] entry: add checkout_entry_ca() which takes preloaded conv_attrs Matheus Tavares
@ 2020-11-04 20:33 ` Matheus Tavares
2020-12-06 11:36 ` Christian Couder
2020-11-04 20:33 ` [PATCH v4 11/19] parallel-checkout: make it truly parallel Matheus Tavares
` (9 subsequent siblings)
19 siblings, 1 reply; 154+ messages in thread
From: Matheus Tavares @ 2020-11-04 20:33 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
This new interface allows us to enqueue some of the entries being
checked out to later call write_entry() for them in parallel. For now,
the parallel checkout machinery is enabled by default and there is no
user configuration, but run_parallel_checkout() just writes the queued
entries in sequence (without spawning additional workers). The next
patch will actually implement the parallelism and, later, we will make
it configurable.
When there are path collisions among the entries being written (which
can happen e.g. with case-sensitive files in case-insensitive file
systems), the parallel checkout code detects the problem and marks the
item with PC_ITEM_COLLIDED. Later, these items are sequentially fed to
checkout_entry() again. This is similar to the way the sequential code
deals with collisions, overwriting the previously checked out entries
with the subsequent ones. The only difference is that, when we start
writing the entries in parallel, we won't be able to determine which of
the colliding entries will survive on disk (for the sequential
algorithm, it is always the last one).
I also experimented with the idea of not overwriting colliding entries,
and it seemed to work well in my simple tests. However, because just one
entry of each colliding group would be actually written, the others
would have null lstat() fields on the index. This might not be a problem
by itself, but it could cause performance penalties for subsequent
commands that need to refresh the index: when the st_size value cached
is 0, read-cache.c:ie_modified() will go to the filesystem to see if the
contents match. As mentioned in the function:
* Immediately after read-tree or update-index --cacheinfo,
* the length field is zero, as we have never even read the
* lstat(2) information once, and we cannot trust DATA_CHANGED
* returned by ie_match_stat() which in turn was returned by
* ce_match_stat_basic() to signal that the filesize of the
* blob changed. We have to actually go to the filesystem to
* see if the contents match, and if so, should answer "unchanged".
So, if we have N entries in a colliding group and we decide to write and
lstat() only one of them, every subsequent git-status will have to read,
convert, and hash the written file N - 1 times, to check that the N - 1
unwritten entries are dirty. By checking out all colliding entries (like
the sequential code does), we only pay the overhead once.
Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
Makefile | 1 +
entry.c | 17 ++-
parallel-checkout.c | 362 ++++++++++++++++++++++++++++++++++++++++++++
parallel-checkout.h | 27 ++++
unpack-trees.c | 6 +-
5 files changed, 410 insertions(+), 3 deletions(-)
create mode 100644 parallel-checkout.c
create mode 100644 parallel-checkout.h
diff --git a/Makefile b/Makefile
index 1fb0ec1705..10ee5e709b 100644
--- a/Makefile
+++ b/Makefile
@@ -945,6 +945,7 @@ LIB_OBJS += pack-revindex.o
LIB_OBJS += pack-write.o
LIB_OBJS += packfile.o
LIB_OBJS += pager.o
+LIB_OBJS += parallel-checkout.o
LIB_OBJS += parse-options-cb.o
LIB_OBJS += parse-options.o
LIB_OBJS += patch-delta.o
diff --git a/entry.c b/entry.c
index 9d79a5671f..6676954431 100644
--- a/entry.c
+++ b/entry.c
@@ -7,6 +7,7 @@
#include "progress.h"
#include "fsmonitor.h"
#include "entry.h"
+#include "parallel-checkout.h"
static void create_directories(const char *path, int path_len,
const struct checkout *state)
@@ -426,8 +427,17 @@ static void mark_colliding_entries(const struct checkout *state,
for (i = 0; i < state->istate->cache_nr; i++) {
struct cache_entry *dup = state->istate->cache[i];
- if (dup == ce)
- break;
+ if (dup == ce) {
+ /*
+ * Parallel checkout creates the files in no particular
+ * order. So the other side of the collision may appear
+ * after the given cache_entry in the array.
+ */
+ if (parallel_checkout_status() == PC_RUNNING)
+ continue;
+ else
+ break;
+ }
if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
continue;
@@ -536,6 +546,9 @@ int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca,
ca = &ca_buf;
}
+ if (!enqueue_checkout(ce, ca))
+ return 0;
+
return write_entry(ce, path.buf, ca, state, 0);
}
diff --git a/parallel-checkout.c b/parallel-checkout.c
new file mode 100644
index 0000000000..fd871b09d3
--- /dev/null
+++ b/parallel-checkout.c
@@ -0,0 +1,362 @@
+#include "cache.h"
+#include "entry.h"
+#include "parallel-checkout.h"
+#include "streaming.h"
+
+enum pc_item_status {
+ PC_ITEM_PENDING = 0,
+ PC_ITEM_WRITTEN,
+ /*
+ * The entry could not be written because there was another file
+ * already present in its path or leading directories. Since
+ * checkout_entry_ca() removes such files from the working tree before
+ * enqueueing the entry for parallel checkout, it means that there was
+ * a path collision among the entries being written.
+ */
+ PC_ITEM_COLLIDED,
+ PC_ITEM_FAILED,
+};
+
+struct parallel_checkout_item {
+ /* pointer to an istate->cache[] entry. Not owned by us. */
+ struct cache_entry *ce;
+ struct conv_attrs ca;
+ struct stat st;
+ enum pc_item_status status;
+};
+
+struct parallel_checkout {
+ enum pc_status status;
+ struct parallel_checkout_item *items;
+ size_t nr, alloc;
+};
+
+static struct parallel_checkout parallel_checkout;
+
+enum pc_status parallel_checkout_status(void)
+{
+ return parallel_checkout.status;
+}
+
+void init_parallel_checkout(void)
+{
+ if (parallel_checkout.status != PC_UNINITIALIZED)
+ BUG("parallel checkout already initialized");
+
+ parallel_checkout.status = PC_ACCEPTING_ENTRIES;
+}
+
+static void finish_parallel_checkout(void)
+{
+ if (parallel_checkout.status == PC_UNINITIALIZED)
+ BUG("cannot finish parallel checkout: not initialized yet");
+
+ free(parallel_checkout.items);
+ memset(&parallel_checkout, 0, sizeof(parallel_checkout));
+}
+
+static int is_eligible_for_parallel_checkout(const struct cache_entry *ce,
+ const struct conv_attrs *ca)
+{
+ enum conv_attrs_classification c;
+
+ if (!S_ISREG(ce->ce_mode))
+ return 0;
+
+ c = classify_conv_attrs(ca);
+ switch (c) {
+ case CA_CLASS_INCORE:
+ return 1;
+
+ case CA_CLASS_INCORE_FILTER:
+ /*
+ * It would be safe to allow concurrent instances of
+ * single-file smudge filters, like rot13, but we should not
+ * assume that all filters are parallel-process safe. So we
+ * don't allow this.
+ */
+ return 0;
+
+ case CA_CLASS_INCORE_PROCESS:
+ /*
+ * The parallel queue and the delayed queue are not compatible,
+ * so they must be kept completely separated. And we can't tell
+ * if a long-running process will delay its response without
+ * actually asking it to perform the filtering. Therefore, this
+ * type of filter is not allowed in parallel checkout.
+ *
+ * Furthermore, there should only be one instance of the
+ * long-running process filter as we don't know how it is
+ * managing its own concurrency. So, spreading the entries that
+ * require such a filter among the parallel workers would
+ * require a lot more inter-process communication. We would
+ * probably have to designate a single process to interact with
+ * the filter and send all the necessary data to it, for each
+ * entry.
+ */
+ return 0;
+
+ case CA_CLASS_STREAMABLE:
+ return 1;
+
+ default:
+ BUG("unsupported conv_attrs classification '%d'", c);
+ }
+}
+
+int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca)
+{
+ struct parallel_checkout_item *pc_item;
+
+ if (parallel_checkout.status != PC_ACCEPTING_ENTRIES ||
+ !is_eligible_for_parallel_checkout(ce, ca))
+ return -1;
+
+ ALLOC_GROW(parallel_checkout.items, parallel_checkout.nr + 1,
+ parallel_checkout.alloc);
+
+ pc_item = &parallel_checkout.items[parallel_checkout.nr++];
+ pc_item->ce = ce;
+ memcpy(&pc_item->ca, ca, sizeof(pc_item->ca));
+ pc_item->status = PC_ITEM_PENDING;
+
+ return 0;
+}
+
+static int handle_results(struct checkout *state)
+{
+ int ret = 0;
+ size_t i;
+ int have_pending = 0;
+
+ /*
+ * We first update the successfully written entries with the collected
+ * stat() data, so that they can be found by mark_colliding_entries(),
+ * in the next loop, when necessary.
+ */
+ for (i = 0; i < parallel_checkout.nr; i++) {
+ struct parallel_checkout_item *pc_item = &parallel_checkout.items[i];
+ if (pc_item->status == PC_ITEM_WRITTEN)
+ update_ce_after_write(state, pc_item->ce, &pc_item->st);
+ }
+
+ for (i = 0; i < parallel_checkout.nr; i++) {
+ struct parallel_checkout_item *pc_item = &parallel_checkout.items[i];
+
+ switch(pc_item->status) {
+ case PC_ITEM_WRITTEN:
+ /* Already handled */
+ break;
+ case PC_ITEM_COLLIDED:
+ /*
+ * The entry could not be checked out due to a path
+ * collision with another entry. Since there can only
+ * be one entry of each colliding group on the disk, we
+ * could skip trying to check out this one and move on.
+ * However, this would leave the unwritten entries with
+ * null stat() fields on the index, which could
+ * potentially slow down subsequent operations that
+ * require refreshing it: git would not be able to
+ * trust st_size and would have to go to the filesystem
+ * to see if the contents match (see ie_modified()).
+ *
+ * Instead, let's pay the overhead only once, now, and
+ * call checkout_entry_ca() again for this file, to
+ * have its stat() data stored in the index. This also
+ * has the benefit of adding this entry and its
+ * colliding pair to the collision report message.
+ * Additionally, this overwriting behavior is consistent
+ * with what the sequential checkout does, so it doesn't
+ * add any extra overhead.
+ */
+ ret |= checkout_entry_ca(pc_item->ce, &pc_item->ca,
+ state, NULL, NULL);
+ break;
+ case PC_ITEM_PENDING:
+ have_pending = 1;
+ /* fall through */
+ case PC_ITEM_FAILED:
+ ret = -1;
+ break;
+ default:
+ BUG("unknown checkout item status in parallel checkout");
+ }
+ }
+
+ if (have_pending)
+ error(_("parallel checkout finished with pending entries"));
+
+ return ret;
+}
+
+static int reset_fd(int fd, const char *path)
+{
+ if (lseek(fd, 0, SEEK_SET) != 0)
+ return error_errno("failed to rewind descriptor of %s", path);
+ if (ftruncate(fd, 0))
+ return error_errno("failed to truncate file %s", path);
+ return 0;
+}
+
+static int write_pc_item_to_fd(struct parallel_checkout_item *pc_item, int fd,
+ const char *path)
+{
+ int ret;
+ struct stream_filter *filter;
+ struct strbuf buf = STRBUF_INIT;
+ char *new_blob;
+ unsigned long size;
+ size_t newsize = 0;
+ ssize_t wrote;
+
+ /* Sanity check */
+ assert(is_eligible_for_parallel_checkout(pc_item->ce, &pc_item->ca));
+
+ filter = get_stream_filter_ca(&pc_item->ca, &pc_item->ce->oid);
+ if (filter) {
+ if (stream_blob_to_fd(fd, &pc_item->ce->oid, filter, 1)) {
+ /* On error, reset fd to try writing without streaming */
+ if (reset_fd(fd, path))
+ return -1;
+ } else {
+ return 0;
+ }
+ }
+
+ new_blob = read_blob_entry(pc_item->ce, &size);
+ if (!new_blob)
+ return error("unable to read sha1 file of %s (%s)", path,
+ oid_to_hex(&pc_item->ce->oid));
+
+ /*
+ * checkout metadata is used to give context for external process
+ * filters. Files requiring such filters are not eligible for parallel
+ * checkout, so pass NULL.
+ */
+ ret = convert_to_working_tree_ca(&pc_item->ca, pc_item->ce->name,
+ new_blob, size, &buf, NULL);
+
+ if (ret) {
+ free(new_blob);
+ new_blob = strbuf_detach(&buf, &newsize);
+ size = newsize;
+ }
+
+ wrote = write_in_full(fd, new_blob, size);
+ free(new_blob);
+ if (wrote < 0)
+ return error("unable to write file %s", path);
+
+ return 0;
+}
+
+static int close_and_clear(int *fd)
+{
+ int ret = 0;
+
+ if (*fd >= 0) {
+ ret = close(*fd);
+ *fd = -1;
+ }
+
+ return ret;
+}
+
+static void write_pc_item(struct parallel_checkout_item *pc_item,
+ struct checkout *state)
+{
+ unsigned int mode = (pc_item->ce->ce_mode & 0100) ? 0777 : 0666;
+ int fd = -1, fstat_done = 0;
+ struct strbuf path = STRBUF_INIT;
+ const char *dir_sep;
+
+ strbuf_add(&path, state->base_dir, state->base_dir_len);
+ strbuf_add(&path, pc_item->ce->name, pc_item->ce->ce_namelen);
+
+ dir_sep = find_last_dir_sep(path.buf);
+
+ /*
+ * The leading dirs should have been already created by now. But, in
+ * case of path collisions, one of the dirs could have been replaced by
+ * a symlink (checked out after we enqueued this entry for parallel
+ * checkout). Thus, we must check the leading dirs again.
+ */
+ if (dir_sep && !has_dirs_only_path(path.buf, dir_sep - path.buf,
+ state->base_dir_len)) {
+ pc_item->status = PC_ITEM_COLLIDED;
+ goto out;
+ }
+
+ fd = open(path.buf, O_WRONLY | O_CREAT | O_EXCL, mode);
+
+ if (fd < 0) {
+ if (errno == EEXIST || errno == EISDIR) {
+ /*
+ * Errors which probably represent a path collision.
+ * Suppress the error message and mark the item to be
+ * retried later, sequentially. ENOTDIR and ENOENT are
+ * also interesting, but the above has_dirs_only_path()
+ * call should have already caught these cases.
+ */
+ pc_item->status = PC_ITEM_COLLIDED;
+ } else {
+ error_errno("failed to open file %s", path.buf);
+ pc_item->status = PC_ITEM_FAILED;
+ }
+ goto out;
+ }
+
+ if (write_pc_item_to_fd(pc_item, fd, path.buf)) {
+ /* Error was already reported. */
+ pc_item->status = PC_ITEM_FAILED;
+ goto out;
+ }
+
+ fstat_done = fstat_checkout_output(fd, state, &pc_item->st);
+
+ if (close_and_clear(&fd)) {
+ error_errno("unable to close file %s", path.buf);
+ pc_item->status = PC_ITEM_FAILED;
+ goto out;
+ }
+
+ if (state->refresh_cache && !fstat_done && lstat(path.buf, &pc_item->st) < 0) {
+ error_errno("unable to stat just-written file %s", path.buf);
+ pc_item->status = PC_ITEM_FAILED;
+ goto out;
+ }
+
+ pc_item->status = PC_ITEM_WRITTEN;
+
+out:
+ /*
+ * No need to check close() return. At this point, either fd is already
+ * closed, or we are on an error path, that has already been reported.
+ */
+ close_and_clear(&fd);
+ strbuf_release(&path);
+}
+
+static void write_items_sequentially(struct checkout *state)
+{
+ size_t i;
+
+ for (i = 0; i < parallel_checkout.nr; i++)
+ write_pc_item(&parallel_checkout.items[i], state);
+}
+
+int run_parallel_checkout(struct checkout *state)
+{
+ int ret;
+
+ if (parallel_checkout.status != PC_ACCEPTING_ENTRIES)
+ BUG("cannot run parallel checkout: uninitialized or already running");
+
+ parallel_checkout.status = PC_RUNNING;
+
+ write_items_sequentially(state);
+ ret = handle_results(state);
+
+ finish_parallel_checkout();
+ return ret;
+}
diff --git a/parallel-checkout.h b/parallel-checkout.h
new file mode 100644
index 0000000000..e6d6fc01ea
--- /dev/null
+++ b/parallel-checkout.h
@@ -0,0 +1,27 @@
+#ifndef PARALLEL_CHECKOUT_H
+#define PARALLEL_CHECKOUT_H
+
+struct cache_entry;
+struct checkout;
+struct conv_attrs;
+
+enum pc_status {
+ PC_UNINITIALIZED = 0,
+ PC_ACCEPTING_ENTRIES,
+ PC_RUNNING,
+};
+
+enum pc_status parallel_checkout_status(void);
+void init_parallel_checkout(void);
+
+/*
+ * Return -1 if parallel checkout is currently not enabled or if the entry is
+ * not eligible for parallel checkout. Otherwise, enqueue the entry for later
+ * write and return 0.
+ */
+int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
+
+/* Write all the queued entries, returning 0 on success. */
+int run_parallel_checkout(struct checkout *state);
+
+#endif /* PARALLEL_CHECKOUT_H */
diff --git a/unpack-trees.c b/unpack-trees.c
index a511fadd89..1b1da7485a 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -17,6 +17,7 @@
#include "object-store.h"
#include "promisor-remote.h"
#include "entry.h"
+#include "parallel-checkout.h"
/*
* Error messages expected by scripts out of plumbing commands such as
@@ -438,7 +439,6 @@ static int check_updates(struct unpack_trees_options *o,
if (should_update_submodules())
load_gitmodules_file(index, &state);
- enable_delayed_checkout(&state);
if (has_promisor_remote()) {
/*
* Prefetch the objects that are to be checked out in the loop
@@ -461,6 +461,9 @@ static int check_updates(struct unpack_trees_options *o,
to_fetch.oid, to_fetch.nr);
oid_array_clear(&to_fetch);
}
+
+ enable_delayed_checkout(&state);
+ init_parallel_checkout();
for (i = 0; i < index->cache_nr; i++) {
struct cache_entry *ce = index->cache[i];
@@ -474,6 +477,7 @@ static int check_updates(struct unpack_trees_options *o,
}
}
stop_progress(&progress);
+ errs |= run_parallel_checkout(&state);
errs |= finish_delayed_checkout(&state, NULL);
git_attr_set_direction(GIT_ATTR_CHECKIN);
--
2.28.0
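The lifecycle this patch establishes in check_updates() — initialize, enqueue eligible entries, then run — can be sketched as a tiny state machine. The names below mirror the patch's API, but the bodies are stand-in stubs (no real I/O), written only to show the PC_UNINITIALIZED → PC_ACCEPTING_ENTRIES → PC_RUNNING transitions and why enqueue_checkout() returns -1 outside the accepting window:

```c
#include <assert.h>

enum pc_status { PC_UNINITIALIZED = 0, PC_ACCEPTING_ENTRIES, PC_RUNNING };

static enum pc_status status = PC_UNINITIALIZED;
static int queued;

/* Mirrors init_parallel_checkout(): only legal from the uninitialized state. */
static void init_parallel_checkout_stub(void)
{
	assert(status == PC_UNINITIALIZED);
	status = PC_ACCEPTING_ENTRIES;
}

/* Mirrors enqueue_checkout(): -1 when not accepting, 0 after enqueueing. */
static int enqueue_checkout_stub(void)
{
	if (status != PC_ACCEPTING_ENTRIES)
		return -1;
	queued++;
	return 0;
}

/* Mirrors run_parallel_checkout() + finish_parallel_checkout(). */
static int run_parallel_checkout_stub(void)
{
	assert(status == PC_ACCEPTING_ENTRIES);
	status = PC_RUNNING;
	/* ... queued entries written here; sequentially in this patch ... */
	queued = 0;
	status = PC_UNINITIALIZED; /* finish resets the state for reuse */
	return 0;
}
```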
* Re: [PATCH v4 10/19] unpack-trees: add basic support for parallel checkout
2020-11-04 20:33 ` [PATCH v4 10/19] unpack-trees: add basic support for parallel checkout Matheus Tavares
@ 2020-12-06 11:36 ` Christian Couder
2020-12-07 19:06 ` Matheus Tavares Bernardino
0 siblings, 1 reply; 154+ messages in thread
From: Christian Couder @ 2020-12-06 11:36 UTC (permalink / raw)
To: Matheus Tavares
Cc: git, Junio C Hamano, Jeff Hostetler, Christian Couder, Jeff King,
Elijah Newren, Jonathan Nieder, Martin Ågren
On Wed, Nov 4, 2020 at 9:34 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> This new interface allows us to enqueue some of the entries being
> checked out to later call write_entry() for them in parallel. For now,
> the parallel checkout machinery is enabled by default and there is no
> user configuration, but run_parallel_checkout() just writes the queued
> entries in sequence (without spawning additional workers). The next
> patch will actually implement the parallelism and, later, we will make
> it configurable.
I would think that it might be more logical to first add a
configuration that does nothing, then add writing the queued entries
in sequence without parallelism, and then add actual parallelism.
> When there are path collisions among the entries being written (which
> can happen e.g. with case-sensitive files in case-insensitive file
> systems), the parallel checkout code detects the problem and marks the
> item with PC_ITEM_COLLIDED.
Is this needed in this step that only writes the queued entries in
sequence without parallelism, or could this be added later, before the
step that adds actual parallelism?
> Later, these items are sequentially fed to
> checkout_entry() again. This is similar to the way the sequential code
> deals with collisions, overwriting the previously checked out entries
> with the subsequent ones. The only difference is that, when we start
> writing the entries in parallel, we won't be able to determine which of
> the colliding entries will survive on disk (for the sequential
> algorithm, it is always the last one).
So I guess that PC_ITEM_COLLIDED will then be used to decide which
entries will not be checked out in parallel?
> I also experimented with the idea of not overwriting colliding entries,
> and it seemed to work well in my simple tests.
There are a number of co-authors of this patch, so it's not very clear
who "I" is. Maybe:
"The idea of not overwriting colliding entries seemed to work well in
simple tests, however ..."
> However, because just one
> entry of each colliding group would be actually written, the others
> would have null lstat() fields on the index. This might not be a problem
> by itself, but it could cause performance penalties for subsequent
> commands that need to refresh the index: when the st_size value cached
> is 0, read-cache.c:ie_modified() will go to the filesystem to see if the
> contents match. As mentioned in the function:
>
> * Immediately after read-tree or update-index --cacheinfo,
> * the length field is zero, as we have never even read the
> * lstat(2) information once, and we cannot trust DATA_CHANGED
> * returned by ie_match_stat() which in turn was returned by
> * ce_match_stat_basic() to signal that the filesize of the
> * blob changed. We have to actually go to the filesystem to
> * see if the contents match, and if so, should answer "unchanged".
>
> So, if we have N entries in a colliding group and we decide to write and
> lstat() only one of them, every subsequent git-status will have to read,
> convert, and hash the written file N - 1 times, to check that the N - 1
> unwritten entries are dirty. By checking out all colliding entries (like
> the sequential code does), we only pay the overhead once.
Ok.
> 5 files changed, 410 insertions(+), 3 deletions(-)
It looks like a lot of new code in one patch/commit, which is why it
might be interesting to split it.
> @@ -7,6 +7,7 @@
> #include "progress.h"
> #include "fsmonitor.h"
> #include "entry.h"
> +#include "parallel-checkout.h"
>
> static void create_directories(const char *path, int path_len,
> const struct checkout *state)
> @@ -426,8 +427,17 @@ static void mark_colliding_entries(const struct checkout *state,
> for (i = 0; i < state->istate->cache_nr; i++) {
> struct cache_entry *dup = state->istate->cache[i];
>
> - if (dup == ce)
> - break;
> + if (dup == ce) {
> + /*
> + * Parallel checkout creates the files in no particular
> + * order. So the other side of the collision may appear
> + * after the given cache_entry in the array.
> + */
Is it really the case right now that the code creates files in no
particular order or will that be the case later when actual
parallelism is implemented?
> + if (parallel_checkout_status() == PC_RUNNING)
> + continue;
> + else
> + break;
> + }
> +struct parallel_checkout_item {
> + /* pointer to a istate->cache[] entry. Not owned by us. */
> + struct cache_entry *ce;
> + struct conv_attrs ca;
> + struct stat st;
> + enum pc_item_status status;
> +};
"item" seems not very clear to me. If there is only one
parallel_checkout_item for each cache_entry then it might be better to
use "parallel_checkout_entry" instead of "parallel_checkout_item".
> +enum pc_status {
> + PC_UNINITIALIZED = 0,
> + PC_ACCEPTING_ENTRIES,
> + PC_RUNNING,
> +};
> +
> +enum pc_status parallel_checkout_status(void);
> +void init_parallel_checkout(void);
Maybe a comment to tell what the above function does could be helpful.
If I had to guess, I would write something like:
/*
* Put parallel checkout into the PC_ACCEPTING_ENTRIES state.
* Should be used only when in the PC_UNINITIALIZED state.
*/
> +/*
> + * Return -1 if parallel checkout is currently not enabled
Is it "enabled" or "initialized" or "configured" here? Does it refer
to `enum pc_status` or a config option or something else? Looking at
the code, it is testing if the status PC_ACCEPTING_ENTRIES, so
perhaps: s/not enabled/not accepting entries/
> or if the entry is
> + * not eligible for parallel checkout. Otherwise, enqueue the entry for later
> + * write and return 0.
> + */
> +int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
* Re: [PATCH v4 10/19] unpack-trees: add basic support for parallel checkout
2020-12-06 11:36 ` Christian Couder
@ 2020-12-07 19:06 ` Matheus Tavares Bernardino
0 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares Bernardino @ 2020-12-07 19:06 UTC (permalink / raw)
To: Christian Couder
Cc: git, Junio C Hamano, Jeff Hostetler, Christian Couder, Jeff King,
Elijah Newren, Jonathan Nieder, Martin Ågren
Hi, Christian
On Sun, Dec 6, 2020 at 8:36 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Wed, Nov 4, 2020 at 9:34 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
> >
> > This new interface allows us to enqueue some of the entries being
> > checked out to later call write_entry() for them in parallel. For now,
> > the parallel checkout machinery is enabled by default and there is no
> > user configuration, but run_parallel_checkout() just writes the queued
> > entries in sequence (without spawning additional workers). The next
> > patch will actually implement the parallelism and, later, we will make
> > it configurable.
I just noticed that this part of the commit message is a little
outdated. I'll fix it for v5. Currently, the parallelism and
configuration are added in the same patch (the next one). This way,
the patch that adds parallelism can already include runtime numbers
for different configuration values (which shows when the change is
beneficial).
> I would think that it might be more logical to first add a
> configuration that does nothing, then add writing the queued entries
> in sequence without parallelism, and then add actual parallelism.
I'm not sure I get the idea. Would the first patch add just the
documentation for `checkout.workers` and
`checkout.thresholdForParallelism` in
`Documentation/config/checkout.txt`, without the support for it in the
code? In that case, wouldn't the patch become somewhat incomplete on
its own?
> > When there are path collisions among the entries being written (which
> > can happen e.g. with case-sensitive files in case-insensitive file
> > systems), the parallel checkout code detects the problem and marks the
> > item with PC_ITEM_COLLIDED.
>
> Is this needed in this step that only writes the queued entries in
> sequence without parallelism, or could this be added later, before the
> step that adds actual parallelism?
This is already used in this patch. Even though the parallel checkout
machinery only learns to spawn additional workers in the next patch,
it can already encounter path collisions when writing the entries
sequentially. PC_ITEM_COLLIDED is then used to mark the colliding
entries, so that they can be properly handled later.
> > Later, these items are sequentially fed to
> > checkout_entry() again. This is similar to the way the sequential code
> > deals with collisions, overwriting the previously checked out entries
> > with the subsequent ones. The only difference is that, when we start
> > writing the entries in parallel, we won't be able to determine which of
> > the colliding entries will survive on disk (for the sequential
> > algorithm, it is always the last one).
>
> So I guess that PC_ITEM_COLLIDED will then be used to decide which
> entries will not be checked out in parallel?
Yes, in the next patch, the parallel workers will detect collisions
when `open(path, O_CREAT | O_EXCL)` fails with EEXIST or EISDIR. The
workers then mark such items with PC_ITEM_COLLIDED and let the main
process sequentially write them later.
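The detection idiom described here — relying on `O_EXCL` to make `open()` fail when a colliding entry already created the path — can be isolated into a small self-contained sketch (the helper name and enum are illustrative, not from the patch):

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

enum open_result { OPENED, COLLIDED, FAILED };

/*
 * O_CREAT | O_EXCL makes open() fail with EEXIST if another (possibly
 * colliding) entry already created the file, so the caller can mark the
 * item PC_ITEM_COLLIDED and retry it sequentially later. EISDIR covers
 * the case where the colliding entry materialized as a directory.
 */
static enum open_result open_excl(const char *path, int *fd_out)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
	if (fd >= 0) {
		*fd_out = fd;
		return OPENED;
	}
	if (errno == EEXIST || errno == EISDIR)
		return COLLIDED;
	return FAILED;
}
```

Opening the same path twice demonstrates the collision branch: the first call creates the file, the second fails with EEXIST.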
> > I also experimented with the idea of not overwriting colliding entries,
> > and it seemed to work well in my simple tests.
>
> There are a number of co-authors of this patch, so it's not very clear
> who "I" is. Maybe:
>
> "The idea of not overwriting colliding entries seemed to work well in
> simple tests, however ..."
Makes sense, thanks.
> > However, because just one
> > entry of each colliding group would be actually written, the others
> > would have null lstat() fields on the index. This might not be a problem
> > by itself, but it could cause performance penalties for subsequent
> > commands that need to refresh the index: when the st_size value cached
> > is 0, read-cache.c:ie_modified() will go to the filesystem to see if the
> > contents match. As mentioned in the function:
> >
> > * Immediately after read-tree or update-index --cacheinfo,
> > * the length field is zero, as we have never even read the
> > * lstat(2) information once, and we cannot trust DATA_CHANGED
> > * returned by ie_match_stat() which in turn was returned by
> > * ce_match_stat_basic() to signal that the filesize of the
> > * blob changed. We have to actually go to the filesystem to
> > * see if the contents match, and if so, should answer "unchanged".
> >
> > So, if we have N entries in a colliding group and we decide to write and
> > lstat() only one of them, every subsequent git-status will have to read,
> > convert, and hash the written file N - 1 times, to check that the N - 1
> > unwritten entries are dirty. By checking out all colliding entries (like
> > the sequential code does), we only pay the overhead once.
>
> Ok.
>
> > 5 files changed, 410 insertions(+), 3 deletions(-)
>
> It looks like a lot of new code in one patch/commit, which is why it
> might be interesting to split it.
Yeah, this and the following patch ended up quite big... But I wasn't
sure how to further split them while still keeping each part buildable
and self-contained :(
> > @@ -7,6 +7,7 @@
> > #include "progress.h"
> > #include "fsmonitor.h"
> > #include "entry.h"
> > +#include "parallel-checkout.h"
> >
> > static void create_directories(const char *path, int path_len,
> > const struct checkout *state)
> > @@ -426,8 +427,17 @@ static void mark_colliding_entries(const struct checkout *state,
> > for (i = 0; i < state->istate->cache_nr; i++) {
> > struct cache_entry *dup = state->istate->cache[i];
> >
> > - if (dup == ce)
> > - break;
> > + if (dup == ce) {
> > + /*
> > + * Parallel checkout creates the files in no particular
> > + * order. So the other side of the collision may appear
> > + * after the given cache_entry in the array.
> > + */
>
> Is it really the case right now that the code creates files in no
> particular order or will that be the case later when actual
> parallelism is implemented?
In this patch, the code already creates files in no particular order.
Since not all entries are eligible for parallel checkout, and because
ineligible entries are written first, the files are not created in the
same order that they appear in istate->cache[]. (Even though
everything is still written sequentially in this patch).
> > + if (parallel_checkout_status() == PC_RUNNING)
> > + continue;
> > + else
> > + break;
> > + }
>
> > +struct parallel_checkout_item {
> > + /* pointer to a istate->cache[] entry. Not owned by us. */
> > + struct cache_entry *ce;
> > + struct conv_attrs ca;
> > + struct stat st;
> > + enum pc_item_status status;
> > +};
>
> "item" seems not very clear to me. If there is only one
> parallel_checkout_item for each cache_entry then it might be better to
> use "parallel_checkout_entry" instead of "parallel_checkout_item".
Hmm, I'm a little uncertain about this one. I usually use "item" and
"entry" interchangeably when talking about elements on a list, as in
this case. Could perhaps the 'struct parallel_checkout_item'
definition be unclear because it's far from the 'struct
parallel_checkout', where the list is actually defined?
> > +enum pc_status {
> > + PC_UNINITIALIZED = 0,
> > + PC_ACCEPTING_ENTRIES,
> > + PC_RUNNING,
> > +};
> > +
> > +enum pc_status parallel_checkout_status(void);
> > +void init_parallel_checkout(void);
>
> Maybe a comment to tell what the above function does could be helpful.
> If I had to guess, I would write something like:
>
> /*
> * Put parallel checkout into the PC_ACCEPTING_ENTRIES state.
> * Should be used only when in the PC_UNINITIALIZED state.
> */
OK, will do, thanks!
> > +/*
> > + * Return -1 if parallel checkout is currently not enabled
>
> Is it "enabled" or "initialized" or "configured" here? Does it refer
> to `enum pc_status` or a config option or something else? Looking at
> the code, it is testing if the status PC_ACCEPTING_ENTRIES, so
> perhaps: s/not enabled/not accepting entries/
Yes, that's better, thanks!
^ permalink raw reply [flat|nested] 154+ messages in thread
* [PATCH v4 11/19] parallel-checkout: make it truly parallel
2020-11-04 20:32 ` [PATCH v4 " Matheus Tavares
` (9 preceding siblings ...)
2020-11-04 20:33 ` [PATCH v4 10/19] unpack-trees: add basic support for parallel checkout Matheus Tavares
@ 2020-11-04 20:33 ` Matheus Tavares
2020-12-16 22:31 ` Emily Shaffer
2020-11-04 20:33 ` [PATCH v4 12/19] parallel-checkout: support progress displaying Matheus Tavares
` (8 subsequent siblings)
19 siblings, 1 reply; 154+ messages in thread
From: Matheus Tavares @ 2020-11-04 20:33 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
Use multiple worker processes to distribute the queued entries and call
write_checkout_item() in parallel for them. The items are distributed
uniformly in contiguous chunks. This minimizes the chances of two
workers writing to the same directory simultaneously, which could
affect performance due to lock contention in the kernel. Work stealing
(or any other form of re-distribution) is not implemented yet.
The parallel version was benchmarked during three operations in the
linux repo, with cold cache: cloning v5.8, checking out v5.8 from
v2.6.15 (checkout I) and checking out v5.8 from v5.7 (checkout II). The
four tables below show the mean run times and standard deviations for
5 runs in: a local file system with SSD, a local file system with HDD, a
Linux NFS server, and Amazon EFS. The numbers of workers were chosen
based on what produces the best result for each case.
Local SSD:
             Clone                  Checkout I             Checkout II
Sequential   8.171 s ± 0.206 s      8.735 s ± 0.230 s      4.166 s ± 0.246 s
10 workers   3.277 s ± 0.138 s      3.774 s ± 0.188 s      2.561 s ± 0.120 s
Speedup      2.49 ± 0.12            2.31 ± 0.13            1.63 ± 0.12

Local HDD:
             Clone                  Checkout I             Checkout II
Sequential   35.157 s ± 0.205 s     48.835 s ± 0.407 s     47.302 s ± 1.435 s
8 workers    35.538 s ± 0.325 s     49.353 s ± 0.826 s     48.919 s ± 0.416 s
Speedup      0.99 ± 0.01            0.99 ± 0.02            0.97 ± 0.03

Linux NFS server (v4.1, on EBS, single availability zone):
             Clone                  Checkout I             Checkout II
Sequential   216.070 s ± 3.611 s    211.169 s ± 3.147 s    57.446 s ± 1.301 s
32 workers   67.997 s ± 0.740 s     66.563 s ± 0.457 s     23.708 s ± 0.622 s
Speedup      3.18 ± 0.06            3.17 ± 0.05            2.42 ± 0.08

EFS (v4.1, replicated over multiple availability zones):
             Clone                    Checkout I               Checkout II
Sequential   1249.329 s ± 13.857 s    1438.979 s ± 78.792 s    543.919 s ± 18.745 s
64 workers   225.864 s ± 12.433 s     316.345 s ± 1.887 s      183.648 s ± 10.095 s
Speedup      5.53 ± 0.31              4.55 ± 0.25              2.96 ± 0.19
The above benchmarks show that parallel checkout is most effective on
repositories located on an SSD or over a distributed file system. For
local file systems on spinning disks, and/or older machines, the
parallelism does not always bring good performance. In fact, it can
even increase the run time. For this reason, the sequential code is
still the default. Two settings are added to optionally enable and
configure the new parallel version as desired.
Local SSD tests were executed on an i7-7700HQ (4 cores with
hyper-threading) running Manjaro Linux. Local HDD tests were executed on
an i7-2600 (also 4 cores with hyper-threading), with a Seagate Barracuda
7200 rpm SATA 3.0 HDD, running Debian 9.13. NFS and EFS tests were
executed on an Amazon EC2 c5n.large instance, with 2 vCPUs. The Linux
NFS server was running on an m6g.large instance with a 1 TB EBS GP2
volume. Before each timing, the linux repository was removed (or checked
back out), and `sync && sysctl vm.drop_caches=3` was executed.
Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
.gitignore | 1 +
Documentation/config/checkout.txt | 21 +++
Makefile | 1 +
builtin.h | 1 +
builtin/checkout--helper.c | 142 +++++++++++++++
git.c | 2 +
parallel-checkout.c | 280 +++++++++++++++++++++++++++---
parallel-checkout.h | 84 ++++++++-
unpack-trees.c | 10 +-
9 files changed, 508 insertions(+), 34 deletions(-)
create mode 100644 builtin/checkout--helper.c
diff --git a/.gitignore b/.gitignore
index 6232d33924..1a341ea184 100644
--- a/.gitignore
+++ b/.gitignore
@@ -33,6 +33,7 @@
/git-check-mailmap
/git-check-ref-format
/git-checkout
+/git-checkout--helper
/git-checkout-index
/git-cherry
/git-cherry-pick
diff --git a/Documentation/config/checkout.txt b/Documentation/config/checkout.txt
index 6b646813ab..23e8f7cde0 100644
--- a/Documentation/config/checkout.txt
+++ b/Documentation/config/checkout.txt
@@ -16,3 +16,24 @@ will checkout the '<something>' branch on another remote,
and by linkgit:git-worktree[1] when 'git worktree add' refers to a
remote branch. This setting might be used for other checkout-like
commands or functionality in the future.
+
+checkout.workers::
+ The number of parallel workers to use when updating the working tree.
+ The default is one, i.e. sequential execution. If set to a value less
+ than one, Git will use as many workers as the number of logical cores
+ available. This setting and `checkout.thresholdForParallelism` affect
+ all commands that perform checkout. E.g. checkout, clone, reset,
+ sparse-checkout, etc.
++
+Note: parallel checkout usually delivers better performance for repositories
+located on SSDs or over NFS. For repositories on spinning disks and/or machines
+with a small number of cores, the default sequential checkout often performs
+better. The size and compression level of a repository might also influence how
+well the parallel version performs.
+
+checkout.thresholdForParallelism::
+ When running parallel checkout with a small number of files, the cost
+ of subprocess spawning and inter-process communication might outweigh
+ the parallelization gains. This setting allows you to define the minimum
+ number of files for which parallel checkout should be attempted. The
+ default is 100.
diff --git a/Makefile b/Makefile
index 10ee5e709b..535e6e94aa 100644
--- a/Makefile
+++ b/Makefile
@@ -1063,6 +1063,7 @@ BUILTIN_OBJS += builtin/check-attr.o
BUILTIN_OBJS += builtin/check-ignore.o
BUILTIN_OBJS += builtin/check-mailmap.o
BUILTIN_OBJS += builtin/check-ref-format.o
+BUILTIN_OBJS += builtin/checkout--helper.o
BUILTIN_OBJS += builtin/checkout-index.o
BUILTIN_OBJS += builtin/checkout.o
BUILTIN_OBJS += builtin/clean.o
diff --git a/builtin.h b/builtin.h
index 53fb290963..2abbe14b0b 100644
--- a/builtin.h
+++ b/builtin.h
@@ -123,6 +123,7 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix);
int cmd_bundle(int argc, const char **argv, const char *prefix);
int cmd_cat_file(int argc, const char **argv, const char *prefix);
int cmd_checkout(int argc, const char **argv, const char *prefix);
+int cmd_checkout__helper(int argc, const char **argv, const char *prefix);
int cmd_checkout_index(int argc, const char **argv, const char *prefix);
int cmd_check_attr(int argc, const char **argv, const char *prefix);
int cmd_check_ignore(int argc, const char **argv, const char *prefix);
diff --git a/builtin/checkout--helper.c b/builtin/checkout--helper.c
new file mode 100644
index 0000000000..a61ed76f0d
--- /dev/null
+++ b/builtin/checkout--helper.c
@@ -0,0 +1,142 @@
+#include "builtin.h"
+#include "config.h"
+#include "entry.h"
+#include "parallel-checkout.h"
+#include "parse-options.h"
+#include "pkt-line.h"
+
+static void packet_to_pc_item(char *line, int len,
+ struct parallel_checkout_item *pc_item)
+{
+ struct pc_item_fixed_portion *fixed_portion;
+ char *encoding, *variant;
+
+ if (len < sizeof(struct pc_item_fixed_portion))
+ BUG("checkout worker received too short item (got %dB, exp %dB)",
+ len, (int)sizeof(struct pc_item_fixed_portion));
+
+ fixed_portion = (struct pc_item_fixed_portion *)line;
+
+ if (len - sizeof(struct pc_item_fixed_portion) !=
+ fixed_portion->name_len + fixed_portion->working_tree_encoding_len)
+ BUG("checkout worker received corrupted item");
+
+ variant = line + sizeof(struct pc_item_fixed_portion);
+
+ /*
+ * Note: the main process uses zero length to communicate that the
+ * encoding is NULL. There is no use case in actually sending an empty
+ * string since it's considered as NULL when ca.working_tree_encoding
+ * is set at git_path_check_encoding().
+ */
+ if (fixed_portion->working_tree_encoding_len) {
+ encoding = xmemdupz(variant,
+ fixed_portion->working_tree_encoding_len);
+ variant += fixed_portion->working_tree_encoding_len;
+ } else {
+ encoding = NULL;
+ }
+
+ memset(pc_item, 0, sizeof(*pc_item));
+ pc_item->ce = make_empty_transient_cache_entry(fixed_portion->name_len);
+ pc_item->ce->ce_namelen = fixed_portion->name_len;
+ pc_item->ce->ce_mode = fixed_portion->ce_mode;
+ memcpy(pc_item->ce->name, variant, pc_item->ce->ce_namelen);
+ oidcpy(&pc_item->ce->oid, &fixed_portion->oid);
+
+ pc_item->id = fixed_portion->id;
+ pc_item->ca.crlf_action = fixed_portion->crlf_action;
+ pc_item->ca.ident = fixed_portion->ident;
+ pc_item->ca.working_tree_encoding = encoding;
+}
+
+static void report_result(struct parallel_checkout_item *pc_item)
+{
+ struct pc_item_result res = { 0 };
+ size_t size;
+
+ res.id = pc_item->id;
+ res.status = pc_item->status;
+
+ if (pc_item->status == PC_ITEM_WRITTEN) {
+ res.st = pc_item->st;
+ size = sizeof(res);
+ } else {
+ size = PC_ITEM_RESULT_BASE_SIZE;
+ }
+
+ packet_write(1, (const char *)&res, size);
+}
+
+/* Free the worker-side malloced data, but not pc_item itself. */
+static void release_pc_item_data(struct parallel_checkout_item *pc_item)
+{
+ free((char *)pc_item->ca.working_tree_encoding);
+ discard_cache_entry(pc_item->ce);
+}
+
+static void worker_loop(struct checkout *state)
+{
+ struct parallel_checkout_item *items = NULL;
+ size_t i, nr = 0, alloc = 0;
+
+ while (1) {
+ int len;
+ char *line = packet_read_line(0, &len);
+
+ if (!line)
+ break;
+
+ ALLOC_GROW(items, nr + 1, alloc);
+ packet_to_pc_item(line, len, &items[nr++]);
+ }
+
+ for (i = 0; i < nr; i++) {
+ struct parallel_checkout_item *pc_item = &items[i];
+ write_pc_item(pc_item, state);
+ report_result(pc_item);
+ release_pc_item_data(pc_item);
+ }
+
+ packet_flush(1);
+
+ free(items);
+}
+
+static const char * const checkout_helper_usage[] = {
+ N_("git checkout--helper [<options>]"),
+ NULL
+};
+
+int cmd_checkout__helper(int argc, const char **argv, const char *prefix)
+{
+ struct checkout state = CHECKOUT_INIT;
+ struct option checkout_helper_options[] = {
+ OPT_STRING(0, "prefix", &state.base_dir, N_("string"),
+ N_("when creating files, prepend <string>")),
+ OPT_END()
+ };
+
+ if (argc == 2 && !strcmp(argv[1], "-h"))
+ usage_with_options(checkout_helper_usage,
+ checkout_helper_options);
+
+ git_config(git_default_config, NULL);
+ argc = parse_options(argc, argv, prefix, checkout_helper_options,
+ checkout_helper_usage, 0);
+ if (argc > 0)
+ usage_with_options(checkout_helper_usage, checkout_helper_options);
+
+ if (state.base_dir)
+ state.base_dir_len = strlen(state.base_dir);
+
+ /*
+ * Setting this on worker won't actually update the index. We just need
+ * to pretend so to induce the checkout machinery to stat() the written
+ * entries.
+ */
+ state.refresh_cache = 1;
+
+ worker_loop(&state);
+ return 0;
+}
diff --git a/git.c b/git.c
index 4bdcdad2cc..384f144593 100644
--- a/git.c
+++ b/git.c
@@ -487,6 +487,8 @@ static struct cmd_struct commands[] = {
{ "check-mailmap", cmd_check_mailmap, RUN_SETUP },
{ "check-ref-format", cmd_check_ref_format, NO_PARSEOPT },
{ "checkout", cmd_checkout, RUN_SETUP | NEED_WORK_TREE },
+ { "checkout--helper", cmd_checkout__helper,
+ RUN_SETUP | NEED_WORK_TREE | SUPPORT_SUPER_PREFIX },
{ "checkout-index", cmd_checkout_index,
RUN_SETUP | NEED_WORK_TREE},
{ "cherry", cmd_cherry, RUN_SETUP },
diff --git a/parallel-checkout.c b/parallel-checkout.c
index fd871b09d3..2d77998f46 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -1,28 +1,15 @@
#include "cache.h"
#include "entry.h"
#include "parallel-checkout.h"
+#include "pkt-line.h"
+#include "run-command.h"
#include "streaming.h"
+#include "thread-utils.h"
+#include "config.h"
-enum pc_item_status {
- PC_ITEM_PENDING = 0,
- PC_ITEM_WRITTEN,
- /*
- * The entry could not be written because there was another file
- * already present in its path or leading directories. Since
- * checkout_entry_ca() removes such files from the working tree before
- * enqueueing the entry for parallel checkout, it means that there was
- * a path collision among the entries being written.
- */
- PC_ITEM_COLLIDED,
- PC_ITEM_FAILED,
-};
-
-struct parallel_checkout_item {
- /* pointer to a istate->cache[] entry. Not owned by us. */
- struct cache_entry *ce;
- struct conv_attrs ca;
- struct stat st;
- enum pc_item_status status;
+struct pc_worker {
+ struct child_process cp;
+ size_t next_to_complete, nr_to_complete;
};
struct parallel_checkout {
@@ -38,6 +25,19 @@ enum pc_status parallel_checkout_status(void)
return parallel_checkout.status;
}
+#define DEFAULT_THRESHOLD_FOR_PARALLELISM 100
+
+void get_parallel_checkout_configs(int *num_workers, int *threshold)
+{
+ if (git_config_get_int("checkout.workers", num_workers))
+ *num_workers = 1;
+ else if (*num_workers < 1)
+ *num_workers = online_cpus();
+
+ if (git_config_get_int("checkout.thresholdForParallelism", threshold))
+ *threshold = DEFAULT_THRESHOLD_FOR_PARALLELISM;
+}
+
void init_parallel_checkout(void)
{
if (parallel_checkout.status != PC_UNINITIALIZED)
@@ -115,10 +115,12 @@ int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca)
ALLOC_GROW(parallel_checkout.items, parallel_checkout.nr + 1,
parallel_checkout.alloc);
- pc_item = &parallel_checkout.items[parallel_checkout.nr++];
+ pc_item = &parallel_checkout.items[parallel_checkout.nr];
pc_item->ce = ce;
memcpy(&pc_item->ca, ca, sizeof(pc_item->ca));
pc_item->status = PC_ITEM_PENDING;
+ pc_item->id = parallel_checkout.nr;
+ parallel_checkout.nr++;
return 0;
}
@@ -231,7 +233,8 @@ static int write_pc_item_to_fd(struct parallel_checkout_item *pc_item, int fd,
/*
* checkout metadata is used to give context for external process
* filters. Files requiring such filters are not eligible for parallel
- * checkout, so pass NULL.
+ * checkout, so pass NULL. Note: if that changes, the metadata must also
+ * be passed from the main process to the workers.
*/
ret = convert_to_working_tree_ca(&pc_item->ca, pc_item->ce->name,
new_blob, size, &buf, NULL);
@@ -262,8 +265,8 @@ static int close_and_clear(int *fd)
return ret;
}
-static void write_pc_item(struct parallel_checkout_item *pc_item,
- struct checkout *state)
+void write_pc_item(struct parallel_checkout_item *pc_item,
+ struct checkout *state)
{
unsigned int mode = (pc_item->ce->ce_mode & 0100) ? 0777 : 0666;
int fd = -1, fstat_done = 0;
@@ -337,6 +340,221 @@ static void write_pc_item(struct parallel_checkout_item *pc_item,
strbuf_release(&path);
}
+static void send_one_item(int fd, struct parallel_checkout_item *pc_item)
+{
+ size_t len_data;
+ char *data, *variant;
+ struct pc_item_fixed_portion *fixed_portion;
+ const char *working_tree_encoding = pc_item->ca.working_tree_encoding;
+ size_t name_len = pc_item->ce->ce_namelen;
+ size_t working_tree_encoding_len = working_tree_encoding ?
+ strlen(working_tree_encoding) : 0;
+
+ len_data = sizeof(struct pc_item_fixed_portion) + name_len +
+ working_tree_encoding_len;
+
+ data = xcalloc(1, len_data);
+
+ fixed_portion = (struct pc_item_fixed_portion *)data;
+ fixed_portion->id = pc_item->id;
+ fixed_portion->ce_mode = pc_item->ce->ce_mode;
+ fixed_portion->crlf_action = pc_item->ca.crlf_action;
+ fixed_portion->ident = pc_item->ca.ident;
+ fixed_portion->name_len = name_len;
+ fixed_portion->working_tree_encoding_len = working_tree_encoding_len;
+ /*
+ * We use hashcpy() instead of oidcpy() because the hash[] positions
+ * after `the_hash_algo->rawsz` might not be initialized. And Valgrind
+ * would complain about passing uninitialized bytes to a syscall
+ * (write(2)). There is no real harm in this case, but the warning could
+ * hinder the detection of actual errors.
+ */
+ hashcpy(fixed_portion->oid.hash, pc_item->ce->oid.hash);
+
+ variant = data + sizeof(*fixed_portion);
+ if (working_tree_encoding_len) {
+ memcpy(variant, working_tree_encoding, working_tree_encoding_len);
+ variant += working_tree_encoding_len;
+ }
+ memcpy(variant, pc_item->ce->name, name_len);
+
+ packet_write(fd, data, len_data);
+
+ free(data);
+}
+
+static void send_batch(int fd, size_t start, size_t nr)
+{
+ size_t i;
+ for (i = 0; i < nr; i++)
+ send_one_item(fd, &parallel_checkout.items[start + i]);
+ packet_flush(fd);
+}
+
+static struct pc_worker *setup_workers(struct checkout *state, int num_workers)
+{
+ struct pc_worker *workers;
+ int i, workers_with_one_extra_item;
+ size_t base_batch_size, next_to_assign = 0;
+
+ ALLOC_ARRAY(workers, num_workers);
+
+ for (i = 0; i < num_workers; i++) {
+ struct child_process *cp = &workers[i].cp;
+
+ child_process_init(cp);
+ cp->git_cmd = 1;
+ cp->in = -1;
+ cp->out = -1;
+ cp->clean_on_exit = 1;
+ strvec_push(&cp->args, "checkout--helper");
+ if (state->base_dir_len)
+ strvec_pushf(&cp->args, "--prefix=%s", state->base_dir);
+ if (start_command(cp))
+ die(_("failed to spawn checkout worker"));
+ }
+
+ base_batch_size = parallel_checkout.nr / num_workers;
+ workers_with_one_extra_item = parallel_checkout.nr % num_workers;
+
+ for (i = 0; i < num_workers; i++) {
+ struct pc_worker *worker = &workers[i];
+ size_t batch_size = base_batch_size;
+
+ /* distribute the extra work evenly */
+ if (i < workers_with_one_extra_item)
+ batch_size++;
+
+ send_batch(worker->cp.in, next_to_assign, batch_size);
+ worker->next_to_complete = next_to_assign;
+ worker->nr_to_complete = batch_size;
+
+ next_to_assign += batch_size;
+ }
+
+ return workers;
+}
+
+static void finish_workers(struct pc_worker *workers, int num_workers)
+{
+ int i;
+
+ /*
+ * Close pipes before calling finish_command() to let the workers
+ * exit asynchronously and avoid spending extra time on wait().
+ */
+ for (i = 0; i < num_workers; i++) {
+ struct child_process *cp = &workers[i].cp;
+ if (cp->in >= 0)
+ close(cp->in);
+ if (cp->out >= 0)
+ close(cp->out);
+ }
+
+ for (i = 0; i < num_workers; i++) {
+ if (finish_command(&workers[i].cp))
+ error(_("checkout worker %d finished with error"), i);
+ }
+
+ free(workers);
+}
+
+#define ASSERT_PC_ITEM_RESULT_SIZE(got, exp) \
+do { \
+ if (got != exp) \
+ BUG("corrupted result from checkout worker (got %dB, exp %dB)", \
+ got, exp); \
+} while(0)
+
+static void parse_and_save_result(const char *line, int len,
+ struct pc_worker *worker)
+{
+ struct pc_item_result *res;
+ struct parallel_checkout_item *pc_item;
+ struct stat *st = NULL;
+
+ if (len < PC_ITEM_RESULT_BASE_SIZE)
+ BUG("too short result from checkout worker (got %dB, exp %dB)",
+ len, (int)PC_ITEM_RESULT_BASE_SIZE);
+
+ res = (struct pc_item_result *)line;
+
+ /*
+ * Worker should send either the full result struct on success, or
+ * just the base (i.e. no stat data), otherwise.
+ */
+ if (res->status == PC_ITEM_WRITTEN) {
+ ASSERT_PC_ITEM_RESULT_SIZE(len, (int)sizeof(struct pc_item_result));
+ st = &res->st;
+ } else {
+ ASSERT_PC_ITEM_RESULT_SIZE(len, (int)PC_ITEM_RESULT_BASE_SIZE);
+ }
+
+ if (!worker->nr_to_complete || res->id != worker->next_to_complete)
+ BUG("checkout worker sent unexpected item id");
+
+ worker->next_to_complete++;
+ worker->nr_to_complete--;
+
+ pc_item = &parallel_checkout.items[res->id];
+ pc_item->status = res->status;
+ if (st)
+ pc_item->st = *st;
+}
+
+
+static void gather_results_from_workers(struct pc_worker *workers,
+ int num_workers)
+{
+ int i, active_workers = num_workers;
+ struct pollfd *pfds;
+
+ CALLOC_ARRAY(pfds, num_workers);
+ for (i = 0; i < num_workers; i++) {
+ pfds[i].fd = workers[i].cp.out;
+ pfds[i].events = POLLIN;
+ }
+
+ while (active_workers) {
+ int nr = poll(pfds, num_workers, -1);
+
+ if (nr < 0) {
+ if (errno == EINTR)
+ continue;
+ die_errno("failed to poll checkout workers");
+ }
+
+ for (i = 0; i < num_workers && nr > 0; i++) {
+ struct pc_worker *worker = &workers[i];
+ struct pollfd *pfd = &pfds[i];
+
+ if (!pfd->revents)
+ continue;
+
+ if (pfd->revents & POLLIN) {
+ int len;
+ const char *line = packet_read_line(pfd->fd, &len);
+
+ if (!line) {
+ pfd->fd = -1;
+ active_workers--;
+ } else {
+ parse_and_save_result(line, len, worker);
+ }
+ } else if (pfd->revents & POLLHUP) {
+ pfd->fd = -1;
+ active_workers--;
+ } else if (pfd->revents & (POLLNVAL | POLLERR)) {
+ die(_("error polling from checkout worker"));
+ }
+
+ nr--;
+ }
+ }
+
+ free(pfds);
+}
+
static void write_items_sequentially(struct checkout *state)
{
size_t i;
@@ -345,7 +563,7 @@ static void write_items_sequentially(struct checkout *state)
write_pc_item(&parallel_checkout.items[i], state);
}
-int run_parallel_checkout(struct checkout *state)
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold)
{
int ret;
@@ -354,7 +572,17 @@ int run_parallel_checkout(struct checkout *state)
parallel_checkout.status = PC_RUNNING;
- write_items_sequentially(state);
+ if (parallel_checkout.nr < num_workers)
+ num_workers = parallel_checkout.nr;
+
+ if (num_workers <= 1 || parallel_checkout.nr < threshold) {
+ write_items_sequentially(state);
+ } else {
+ struct pc_worker *workers = setup_workers(state, num_workers);
+ gather_results_from_workers(workers, num_workers);
+ finish_workers(workers, num_workers);
+ }
+
ret = handle_results(state);
finish_parallel_checkout();
diff --git a/parallel-checkout.h b/parallel-checkout.h
index e6d6fc01ea..54314ccdc5 100644
--- a/parallel-checkout.h
+++ b/parallel-checkout.h
@@ -1,9 +1,12 @@
#ifndef PARALLEL_CHECKOUT_H
#define PARALLEL_CHECKOUT_H
-struct cache_entry;
-struct checkout;
-struct conv_attrs;
+#include "entry.h"
+#include "convert.h"
+
+/****************************************************************
+ * Users of parallel checkout
+ ****************************************************************/
enum pc_status {
PC_UNINITIALIZED = 0,
@@ -12,6 +15,7 @@ enum pc_status {
};
enum pc_status parallel_checkout_status(void);
+void get_parallel_checkout_configs(int *num_workers, int *threshold);
void init_parallel_checkout(void);
/*
@@ -21,7 +25,77 @@ void init_parallel_checkout(void);
*/
int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
-/* Write all the queued entries, returning 0 on success.*/
-int run_parallel_checkout(struct checkout *state);
+/*
+ * Write all the queued entries, returning 0 on success. If the number of
+ * entries is smaller than the specified threshold, the operation is performed
+ * sequentially.
+ */
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold);
+
+/****************************************************************
+ * Interface with checkout--helper
+ ****************************************************************/
+
+enum pc_item_status {
+ PC_ITEM_PENDING = 0,
+ PC_ITEM_WRITTEN,
+ /*
+ * The entry could not be written because there was another file
+ * already present in its path or leading directories. Since
+ * checkout_entry_ca() removes such files from the working tree before
+ * enqueueing the entry for parallel checkout, it means that there was
+ * a path collision among the entries being written.
+ */
+ PC_ITEM_COLLIDED,
+ PC_ITEM_FAILED,
+};
+
+struct parallel_checkout_item {
+ /*
+ * In the main process, ce points to an istate->cache[] entry. Thus, it's
+ * not owned by us. In the workers, the memory is owned and *must be* released.
+ */
+ struct cache_entry *ce;
+ struct conv_attrs ca;
+ size_t id; /* position in parallel_checkout.items[] of main process */
+
+ /* Output fields, sent from workers. */
+ enum pc_item_status status;
+ struct stat st;
+};
+
+/*
+ * The fixed-size portion of `struct parallel_checkout_item` that is sent to the
+ * workers. Following this will be 2 strings: ca.working_tree_encoding and
+ * ce.name. These are NOT null terminated, since we have the size in the fixed
+ * portion.
+ *
+ * Note that not all fields of conv_attrs and cache_entry are passed, only the
+ * ones that will be required by the workers to smudge and write the entry.
+ */
+struct pc_item_fixed_portion {
+ size_t id;
+ struct object_id oid;
+ unsigned int ce_mode;
+ enum convert_crlf_action crlf_action;
+ int ident;
+ size_t working_tree_encoding_len;
+ size_t name_len;
+};
+
+/*
+ * The fields of `struct parallel_checkout_item` that are returned by the
+ * workers. Note: `st` must be the last one, as it is omitted on error.
+ */
+struct pc_item_result {
+ size_t id;
+ enum pc_item_status status;
+ struct stat st;
+};
+
+#define PC_ITEM_RESULT_BASE_SIZE offsetof(struct pc_item_result, st)
+
+void write_pc_item(struct parallel_checkout_item *pc_item,
+ struct checkout *state);
#endif /* PARALLEL_CHECKOUT_H */
diff --git a/unpack-trees.c b/unpack-trees.c
index 1b1da7485a..117ed42370 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -399,7 +399,7 @@ static int check_updates(struct unpack_trees_options *o,
int errs = 0;
struct progress *progress;
struct checkout state = CHECKOUT_INIT;
- int i;
+ int i, pc_workers, pc_threshold;
trace_performance_enter();
state.force = 1;
@@ -462,8 +462,11 @@ static int check_updates(struct unpack_trees_options *o,
oid_array_clear(&to_fetch);
}
+ get_parallel_checkout_configs(&pc_workers, &pc_threshold);
+
enable_delayed_checkout(&state);
- init_parallel_checkout();
+ if (pc_workers > 1)
+ init_parallel_checkout();
for (i = 0; i < index->cache_nr; i++) {
struct cache_entry *ce = index->cache[i];
@@ -477,7 +480,8 @@ static int check_updates(struct unpack_trees_options *o,
}
}
stop_progress(&progress);
- errs |= run_parallel_checkout(&state);
+ if (pc_workers > 1)
+ errs |= run_parallel_checkout(&state, pc_workers, pc_threshold);
errs |= finish_delayed_checkout(&state, NULL);
git_attr_set_direction(GIT_ATTR_CHECKIN);
--
2.28.0
* Re: [PATCH v4 11/19] parallel-checkout: make it truly parallel
2020-11-04 20:33 ` [PATCH v4 11/19] parallel-checkout: make it truly parallel Matheus Tavares
@ 2020-12-16 22:31 ` Emily Shaffer
2020-12-17 15:00 ` Matheus Tavares Bernardino
0 siblings, 1 reply; 154+ messages in thread
From: Emily Shaffer @ 2020-12-16 22:31 UTC (permalink / raw)
To: Matheus Tavares
Cc: git, gitster, git, chriscool, peff, newren, jrnieder, martin.agren
On Wed, Nov 04, 2020 at 05:33:10PM -0300, Matheus Tavares wrote:
>
> Use multiple worker processes to distribute the queued entries and call
> write_checkout_item() in parallel for them. The items are distributed
> uniformly in contiguous chunks. This minimizes the chances of two
> workers writing to the same directory simultaneously, which could
> affect performance due to lock contention in the kernel. Work stealing
> (or any other format of re-distribution) is not implemented yet.
>
> The parallel version was benchmarked during three operations in the
> linux repo, with cold cache: cloning v5.8, checking out v5.8 from
> v2.6.15 (checkout I) and checking out v5.8 from v5.7 (checkout II). The
> four tables below show the mean run times and standard deviations for
> 5 runs in: a local file system with SSD, a local file system with HDD, a
> Linux NFS server, and Amazon EFS. The numbers of workers were chosen
> based on what produces the best result for each case.
>
> Local SSD:
>
> Clone Checkout I Checkout II
> Sequential 8.171 s ± 0.206 s 8.735 s ± 0.230 s 4.166 s ± 0.246 s
> 10 workers 3.277 s ± 0.138 s 3.774 s ± 0.188 s 2.561 s ± 0.120 s
> Speedup 2.49 ± 0.12 2.31 ± 0.13 1.63 ± 0.12
>
> Local HDD:
>
> Clone Checkout I Checkout II
> Sequential 35.157 s ± 0.205 s 48.835 s ± 0.407 s 47.302 s ± 1.435 s
> 8 workers 35.538 s ± 0.325 s 49.353 s ± 0.826 s 48.919 s ± 0.416 s
> Speedup 0.99 ± 0.01 0.99 ± 0.02 0.97 ± 0.03
>
> Linux NFS server (v4.1, on EBS, single availability zone):
>
> Clone Checkout I Checkout II
> Sequential 216.070 s ± 3.611 s 211.169 s ± 3.147 s 57.446 s ± 1.301 s
> 32 workers 67.997 s ± 0.740 s 66.563 s ± 0.457 s 23.708 s ± 0.622 s
> Speedup 3.18 ± 0.06 3.17 ± 0.05 2.42 ± 0.08
>
> EFS (v4.1, replicated over multiple availability zones):
>
> Clone Checkout I Checkout II
> Sequential 1249.329 s ± 13.857 s 1438.979 s ± 78.792 s 543.919 s ± 18.745 s
> 64 workers 225.864 s ± 12.433 s 316.345 s ± 1.887 s 183.648 s ± 10.095 s
> Speedup 5.53 ± 0.31 4.55 ± 0.25 2.96 ± 0.19
>
> The above benchmarks show that parallel checkout is most effective on
> repositories located on an SSD or over a distributed file system. For
> local file systems on spinning disks, and/or older machines, the
> parallelism does not always bring a good performance. In fact, it can
> even increase the run time. For this reason, the sequential code is
> still the default. Two settings are added to optionally enable and
> configure the new parallel version as desired.
>
> Local SSD tests were executed in an i7-7700HQ (4 cores with
> hyper-threading) running Manjaro Linux. Local HDD tests were executed in
> an i7-2600 (also 4 cores with hyper-threading), HDD Seagate Barracuda
> 7200 rpm SATA 3.0, running Debian 9.13. NFS and EFS tests were
> executed in an Amazon EC2 c5n.large instance, with 2 vCPUs. The Linux
> NFS server was running on a m6g.large instance with 1 TB, EBS GP2
> volume. Before each timing, the linux repository was removed (or checked
> out back), and `sync && sysctl vm.drop_caches=3` was executed.
>
> Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Only having done a quick skim, is there a reason that you are doing the
workqueue handling from scratch rather than using
run-command.h:run_processes_parallel()? The implementation you use and
the implementation in run-command.c seem really similar to me.
- Emily
* Re: [PATCH v4 11/19] parallel-checkout: make it truly parallel
2020-12-16 22:31 ` Emily Shaffer
@ 2020-12-17 15:00 ` Matheus Tavares Bernardino
0 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares Bernardino @ 2020-12-17 15:00 UTC (permalink / raw)
To: Emily Shaffer
Cc: git, Junio C Hamano, Jeff Hostetler, Christian Couder, Jeff King,
Elijah Newren, Jonathan Nieder, Martin Ågren
Hi, Emily
On Wed, Dec 16, 2020 at 7:31 PM Emily Shaffer <emilyshaffer@google.com> wrote:
>
> On Wed, Nov 04, 2020 at 05:33:10PM -0300, Matheus Tavares wrote:
> >
> > Use multiple worker processes to distribute the queued entries and call
> > write_checkout_item() in parallel for them. The items are distributed
> > uniformly in contiguous chunks. This minimizes the chances of two
> > workers writing to the same directory simultaneously, which could
> > affect performance due to lock contention in the kernel. Work stealing
> > (or any other format of re-distribution) is not implemented yet.
> >
>
> Only having done a quick skim, is there a reason that you are doing the
> workqueue handling from scratch rather than using
> run-command.h:run_processes_parallel()? The implementation you use and
> the implementation in run-command.c seem really similar to me.
TBH, I wasn't aware of run_processes_parallel(). Thanks for bringing it
to my attention! :)
Hmm, I still have to look further into it, but I think parallel
checkout wouldn't be compatible with the run_processes_parallel()
framework in its current form.
The first obstacle would be that, IIUC, the framework only allows
callers to pass information to the child processes through argv,
right? I saw that you already have a patch lifting this limitation
[1], but the feed_pipe callback requires a strbuf, whereas parallel
checkout communicates through pkt-lines. This is important because,
otherwise, the workers wouldn't know where each parallel_checkout_item
finishes. (The size of each item is variable and, since we send binary
data, it might contain anything, even bytes normally used as delimiter
such as '\0' or '\n'.)
Another difficulty is that the framework combines the child processes'
stdout and stderr, dumping them to the foreground process's stderr.
Parallel checkout expects workers to print errors to stderr, but it
needs the stdout of each worker to get the results back. This is used
both to get stat() data from the workers and to advance the checkout
progress bar. I see that you've also sent a patch adding more
flexibility in this area [2], but I'm not sure if parallel checkout
could use it without pkt-lines and without separating stdout from
stderr (although, of course, the latter could be implemented in the
framework as an optional feature).
Also, we might want to later improve the main-process<=>worker
protocol by adding work stealing. This might help with workload
balancing while still minimizing the chances of having multiple
workers writing to the same part of the working tree (as would happen
with round-robin, for example). I already have a rough patch for this
[3], but it needs to be timed and tested further. With work stealing,
the protocol becomes a little more complex, so it might not be
suitable for the callback-style interface of run_processes_parallel().
I'm not really sure... I guess parallel checkout could use
run_processes_parallel(), but it seems to me that it would require a
good amount of work on the framework and parallel checkout itself. I
don't know if it would be worth it.
[1]: https://lore.kernel.org/git/20201205014607.1464119-15-emilyshaffer@google.com/
[2]: https://lore.kernel.org/git/20201205014607.1464119-17-emilyshaffer@google.com/
[3]: https://github.com/matheustavares/git/commit/a483df570a3cdd1165354388ea3c9fcc0ec66aaf
^ permalink raw reply [flat|nested] 154+ messages in thread
* [PATCH v4 12/19] parallel-checkout: support progress displaying
2020-11-04 20:32 ` [PATCH v4 " Matheus Tavares
` (10 preceding siblings ...)
2020-11-04 20:33 ` [PATCH v4 11/19] parallel-checkout: make it truly parallel Matheus Tavares
@ 2020-11-04 20:33 ` Matheus Tavares
2020-11-04 20:33 ` [PATCH v4 13/19] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
` (7 subsequent siblings)
19 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-11-04 20:33 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
Original-patch-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
parallel-checkout.c | 34 +++++++++++++++++++++++++++++++---
parallel-checkout.h | 4 +++-
unpack-trees.c | 11 ++++++++---
3 files changed, 42 insertions(+), 7 deletions(-)
diff --git a/parallel-checkout.c b/parallel-checkout.c
index 2d77998f46..72ac93d541 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -2,6 +2,7 @@
#include "entry.h"
#include "parallel-checkout.h"
#include "pkt-line.h"
+#include "progress.h"
#include "run-command.h"
#include "streaming.h"
#include "thread-utils.h"
@@ -16,6 +17,8 @@ struct parallel_checkout {
enum pc_status status;
struct parallel_checkout_item *items;
size_t nr, alloc;
+ struct progress *progress;
+ unsigned int *progress_cnt;
};
static struct parallel_checkout parallel_checkout;
@@ -125,6 +128,20 @@ int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca)
return 0;
}
+size_t pc_queue_size(void)
+{
+ return parallel_checkout.nr;
+}
+
+static void advance_progress_meter(void)
+{
+ if (parallel_checkout.progress) {
+ (*parallel_checkout.progress_cnt)++;
+ display_progress(parallel_checkout.progress,
+ *parallel_checkout.progress_cnt);
+ }
+}
+
static int handle_results(struct checkout *state)
{
int ret = 0;
@@ -173,6 +190,7 @@ static int handle_results(struct checkout *state)
*/
ret |= checkout_entry_ca(pc_item->ce, &pc_item->ca,
state, NULL, NULL);
+ advance_progress_meter();
break;
case PC_ITEM_PENDING:
have_pending = 1;
@@ -500,6 +518,9 @@ static void parse_and_save_result(const char *line, int len,
pc_item->status = res->status;
if (st)
pc_item->st = *st;
+
+ if (res->status != PC_ITEM_COLLIDED)
+ advance_progress_meter();
}
@@ -559,11 +580,16 @@ static void write_items_sequentially(struct checkout *state)
{
size_t i;
- for (i = 0; i < parallel_checkout.nr; i++)
- write_pc_item(&parallel_checkout.items[i], state);
+ for (i = 0; i < parallel_checkout.nr; i++) {
+ struct parallel_checkout_item *pc_item = &parallel_checkout.items[i];
+ write_pc_item(pc_item, state);
+ if (pc_item->status != PC_ITEM_COLLIDED)
+ advance_progress_meter();
+ }
}
-int run_parallel_checkout(struct checkout *state, int num_workers, int threshold)
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold,
+ struct progress *progress, unsigned int *progress_cnt)
{
int ret;
@@ -571,6 +597,8 @@ int run_parallel_checkout(struct checkout *state, int num_workers, int threshold
BUG("cannot run parallel checkout: uninitialized or already running");
parallel_checkout.status = PC_RUNNING;
+ parallel_checkout.progress = progress;
+ parallel_checkout.progress_cnt = progress_cnt;
if (parallel_checkout.nr < num_workers)
num_workers = parallel_checkout.nr;
diff --git a/parallel-checkout.h b/parallel-checkout.h
index 54314ccdc5..8377b179d5 100644
--- a/parallel-checkout.h
+++ b/parallel-checkout.h
@@ -24,13 +24,15 @@ void init_parallel_checkout(void);
* write and return 0.
*/
int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
+size_t pc_queue_size(void);
/*
* Write all the queued entries, returning 0 on success. If the number of
* entries is smaller than the specified threshold, the operation is performed
* sequentially.
*/
-int run_parallel_checkout(struct checkout *state, int num_workers, int threshold);
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold,
+ struct progress *progress, unsigned int *progress_cnt);
/****************************************************************
* Interface with checkout--helper
diff --git a/unpack-trees.c b/unpack-trees.c
index 117ed42370..e05e6ceff2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -471,17 +471,22 @@ static int check_updates(struct unpack_trees_options *o,
struct cache_entry *ce = index->cache[i];
if (ce->ce_flags & CE_UPDATE) {
+ size_t last_pc_queue_size = pc_queue_size();
+
if (ce->ce_flags & CE_WT_REMOVE)
BUG("both update and delete flags are set on %s",
ce->name);
- display_progress(progress, ++cnt);
ce->ce_flags &= ~CE_UPDATE;
errs |= checkout_entry(ce, &state, NULL, NULL);
+
+ if (last_pc_queue_size == pc_queue_size())
+ display_progress(progress, ++cnt);
}
}
- stop_progress(&progress);
if (pc_workers > 1)
- errs |= run_parallel_checkout(&state, pc_workers, pc_threshold);
+ errs |= run_parallel_checkout(&state, pc_workers, pc_threshold,
+ progress, &cnt);
+ stop_progress(&progress);
errs |= finish_delayed_checkout(&state, NULL);
git_attr_set_direction(GIT_ATTR_CHECKIN);
--
2.28.0
^ permalink raw reply related [flat|nested] 154+ messages in thread
* [PATCH v4 13/19] make_transient_cache_entry(): optionally alloc from mem_pool
2020-11-04 20:32 ` [PATCH v4 " Matheus Tavares
` (11 preceding siblings ...)
2020-11-04 20:33 ` [PATCH v4 12/19] parallel-checkout: support progress displaying Matheus Tavares
@ 2020-11-04 20:33 ` Matheus Tavares
2020-11-04 20:33 ` [PATCH v4 14/19] builtin/checkout.c: complete parallel checkout support Matheus Tavares
` (6 subsequent siblings)
19 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-11-04 20:33 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
Allow make_transient_cache_entry() to optionally receive a mem_pool
struct in which it should allocate the entry. This will be used in the
following patch, to store some transient entries which should persist
until parallel checkout finishes.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
builtin/checkout--helper.c | 2 +-
builtin/checkout.c | 2 +-
builtin/difftool.c | 2 +-
cache.h | 10 +++++-----
read-cache.c | 12 ++++++++----
unpack-trees.c | 2 +-
6 files changed, 17 insertions(+), 13 deletions(-)
diff --git a/builtin/checkout--helper.c b/builtin/checkout--helper.c
index a61ed76f0d..5d6f3e71d0 100644
--- a/builtin/checkout--helper.c
+++ b/builtin/checkout--helper.c
@@ -38,7 +38,7 @@ static void packet_to_pc_item(char *line, int len,
}
memset(pc_item, 0, sizeof(*pc_item));
- pc_item->ce = make_empty_transient_cache_entry(fixed_portion->name_len);
+ pc_item->ce = make_empty_transient_cache_entry(fixed_portion->name_len, NULL);
pc_item->ce->ce_namelen = fixed_portion->name_len;
pc_item->ce->ce_mode = fixed_portion->ce_mode;
memcpy(pc_item->ce->name, variant, pc_item->ce->ce_namelen);
diff --git a/builtin/checkout.c b/builtin/checkout.c
index b18b9d6f3c..c0bf5e6711 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -291,7 +291,7 @@ static int checkout_merged(int pos, const struct checkout *state, int *nr_checko
if (write_object_file(result_buf.ptr, result_buf.size, blob_type, &oid))
die(_("Unable to add merge result for '%s'"), path);
free(result_buf.ptr);
- ce = make_transient_cache_entry(mode, &oid, path, 2);
+ ce = make_transient_cache_entry(mode, &oid, path, 2, NULL);
if (!ce)
die(_("make_cache_entry failed for path '%s'"), path);
status = checkout_entry(ce, state, NULL, nr_checkouts);
diff --git a/builtin/difftool.c b/builtin/difftool.c
index dfa22b67eb..5e7a57c8c2 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -323,7 +323,7 @@ static int checkout_path(unsigned mode, struct object_id *oid,
struct cache_entry *ce;
int ret;
- ce = make_transient_cache_entry(mode, oid, path, 0);
+ ce = make_transient_cache_entry(mode, oid, path, 0, NULL);
ret = checkout_entry(ce, state, NULL, NULL);
discard_cache_entry(ce);
diff --git a/cache.h b/cache.h
index ccfeb9ba2b..b5074b2cb2 100644
--- a/cache.h
+++ b/cache.h
@@ -355,16 +355,16 @@ struct cache_entry *make_empty_cache_entry(struct index_state *istate,
size_t name_len);
/*
- * Create a cache_entry that is not intended to be added to an index.
- * Caller is responsible for discarding the cache_entry
- * with `discard_cache_entry`.
+ * Create a cache_entry that is not intended to be added to an index. If mp is
+ * not NULL, the entry is allocated within the given memory pool. Caller is
+ * responsible for discarding the cache_entry with `discard_cache_entry`.
*/
struct cache_entry *make_transient_cache_entry(unsigned int mode,
const struct object_id *oid,
const char *path,
- int stage);
+ int stage, struct mem_pool *mp);
-struct cache_entry *make_empty_transient_cache_entry(size_t name_len);
+struct cache_entry *make_empty_transient_cache_entry(size_t len, struct mem_pool *mp);
/*
* Discard cache entry.
diff --git a/read-cache.c b/read-cache.c
index ecf6f68994..f9bac760af 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -813,8 +813,10 @@ struct cache_entry *make_empty_cache_entry(struct index_state *istate, size_t le
return mem_pool__ce_calloc(find_mem_pool(istate), len);
}
-struct cache_entry *make_empty_transient_cache_entry(size_t len)
+struct cache_entry *make_empty_transient_cache_entry(size_t len, struct mem_pool *mp)
{
+ if (mp)
+ return mem_pool__ce_calloc(mp, len);
return xcalloc(1, cache_entry_size(len));
}
@@ -848,8 +850,10 @@ struct cache_entry *make_cache_entry(struct index_state *istate,
return ret;
}
-struct cache_entry *make_transient_cache_entry(unsigned int mode, const struct object_id *oid,
- const char *path, int stage)
+struct cache_entry *make_transient_cache_entry(unsigned int mode,
+ const struct object_id *oid,
+ const char *path, int stage,
+ struct mem_pool *mp)
{
struct cache_entry *ce;
int len;
@@ -860,7 +864,7 @@ struct cache_entry *make_transient_cache_entry(unsigned int mode, const struct o
}
len = strlen(path);
- ce = make_empty_transient_cache_entry(len);
+ ce = make_empty_transient_cache_entry(len, mp);
oidcpy(&ce->oid, oid);
memcpy(ce->name, path, len);
diff --git a/unpack-trees.c b/unpack-trees.c
index e05e6ceff2..dcb40dc8fa 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1031,7 +1031,7 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
size_t len = traverse_path_len(info, tree_entry_len(n));
struct cache_entry *ce =
is_transient ?
- make_empty_transient_cache_entry(len) :
+ make_empty_transient_cache_entry(len, NULL) :
make_empty_cache_entry(istate, len);
ce->ce_mode = create_ce_mode(n->mode);
--
2.28.0
^ permalink raw reply related [flat|nested] 154+ messages in thread
* [PATCH v4 14/19] builtin/checkout.c: complete parallel checkout support
2020-11-04 20:32 ` [PATCH v4 " Matheus Tavares
` (12 preceding siblings ...)
2020-11-04 20:33 ` [PATCH v4 13/19] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
@ 2020-11-04 20:33 ` Matheus Tavares
2020-11-04 20:33 ` [PATCH v4 15/19] checkout-index: add " Matheus Tavares
` (5 subsequent siblings)
19 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-11-04 20:33 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
There is one code path in builtin/checkout.c which still doesn't benefit
from parallel checkout because it calls checkout_entry() directly,
instead of unpack_trees(). Let's add parallel support for this missing
spot as well. Note: the transient cache entries allocated in
checkout_merged() are now allocated in a mem_pool which is only
discarded after parallel checkout finishes. This is done because the
entries need to be valid when run_parallel_checkout() is called.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
builtin/checkout.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/builtin/checkout.c b/builtin/checkout.c
index c0bf5e6711..ddc4079b85 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -27,6 +27,7 @@
#include "wt-status.h"
#include "xdiff-interface.h"
#include "entry.h"
+#include "parallel-checkout.h"
static const char * const checkout_usage[] = {
N_("git checkout [<options>] <branch>"),
@@ -230,7 +231,8 @@ static int checkout_stage(int stage, const struct cache_entry *ce, int pos,
return error(_("path '%s' does not have their version"), ce->name);
}
-static int checkout_merged(int pos, const struct checkout *state, int *nr_checkouts)
+static int checkout_merged(int pos, const struct checkout *state,
+ int *nr_checkouts, struct mem_pool *ce_mem_pool)
{
struct cache_entry *ce = active_cache[pos];
const char *path = ce->name;
@@ -291,11 +293,10 @@ static int checkout_merged(int pos, const struct checkout *state, int *nr_checko
if (write_object_file(result_buf.ptr, result_buf.size, blob_type, &oid))
die(_("Unable to add merge result for '%s'"), path);
free(result_buf.ptr);
- ce = make_transient_cache_entry(mode, &oid, path, 2, NULL);
+ ce = make_transient_cache_entry(mode, &oid, path, 2, ce_mem_pool);
if (!ce)
die(_("make_cache_entry failed for path '%s'"), path);
status = checkout_entry(ce, state, NULL, nr_checkouts);
- discard_cache_entry(ce);
return status;
}
@@ -359,16 +360,22 @@ static int checkout_worktree(const struct checkout_opts *opts,
int nr_checkouts = 0, nr_unmerged = 0;
int errs = 0;
int pos;
+ int pc_workers, pc_threshold;
+ struct mem_pool ce_mem_pool;
state.force = 1;
state.refresh_cache = 1;
state.istate = &the_index;
+ mem_pool_init(&ce_mem_pool, 0);
+ get_parallel_checkout_configs(&pc_workers, &pc_threshold);
init_checkout_metadata(&state.meta, info->refname,
info->commit ? &info->commit->object.oid : &info->oid,
NULL);
enable_delayed_checkout(&state);
+ if (pc_workers > 1)
+ init_parallel_checkout();
for (pos = 0; pos < active_nr; pos++) {
struct cache_entry *ce = active_cache[pos];
if (ce->ce_flags & CE_MATCHED) {
@@ -384,10 +391,15 @@ static int checkout_worktree(const struct checkout_opts *opts,
&nr_checkouts, opts->overlay_mode);
else if (opts->merge)
errs |= checkout_merged(pos, &state,
- &nr_unmerged);
+ &nr_unmerged,
+ &ce_mem_pool);
pos = skip_same_name(ce, pos) - 1;
}
}
+ if (pc_workers > 1)
+ errs |= run_parallel_checkout(&state, pc_workers, pc_threshold,
+ NULL, NULL);
+ mem_pool_discard(&ce_mem_pool, should_validate_cache_entries());
remove_marked_cache_entries(&the_index, 1);
remove_scheduled_dirs();
errs |= finish_delayed_checkout(&state, &nr_checkouts);
--
2.28.0
^ permalink raw reply related [flat|nested] 154+ messages in thread
* [PATCH v4 15/19] checkout-index: add parallel checkout support
2020-11-04 20:32 ` [PATCH v4 " Matheus Tavares
` (13 preceding siblings ...)
2020-11-04 20:33 ` [PATCH v4 14/19] builtin/checkout.c: complete parallel checkout support Matheus Tavares
@ 2020-11-04 20:33 ` Matheus Tavares
2020-11-04 20:33 ` [PATCH v4 16/19] parallel-checkout: add tests for basic operations Matheus Tavares
` (4 subsequent siblings)
19 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-11-04 20:33 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
builtin/checkout-index.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c
index 9276ed0258..9a2e255f58 100644
--- a/builtin/checkout-index.c
+++ b/builtin/checkout-index.c
@@ -12,6 +12,7 @@
#include "cache-tree.h"
#include "parse-options.h"
#include "entry.h"
+#include "parallel-checkout.h"
#define CHECKOUT_ALL 4
static int nul_term_line;
@@ -169,6 +170,7 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
int force = 0, quiet = 0, not_new = 0;
int index_opt = 0;
int err = 0;
+ int pc_workers, pc_threshold;
struct option builtin_checkout_index_options[] = {
OPT_BOOL('a', "all", &all,
N_("check out all files in the index")),
@@ -223,6 +225,14 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
}
+ if (!to_tempfile)
+ get_parallel_checkout_configs(&pc_workers, &pc_threshold);
+ else
+ pc_workers = 1;
+
+ if (pc_workers > 1)
+ init_parallel_checkout();
+
/* Check out named files first */
for (i = 0; i < argc; i++) {
const char *arg = argv[i];
@@ -262,12 +272,17 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
strbuf_release(&buf);
}
- if (err)
- return 1;
-
if (all)
checkout_all(prefix, prefix_length);
+ if (pc_workers > 1) {
+ err |= run_parallel_checkout(&state, pc_workers, pc_threshold,
+ NULL, NULL);
+ }
+
+ if (err)
+ return 1;
+
if (is_lock_file_locked(&lock_file) &&
write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
die("Unable to write new index file");
--
2.28.0
^ permalink raw reply related [flat|nested] 154+ messages in thread
* [PATCH v4 16/19] parallel-checkout: add tests for basic operations
2020-11-04 20:32 ` [PATCH v4 " Matheus Tavares
` (14 preceding siblings ...)
2020-11-04 20:33 ` [PATCH v4 15/19] checkout-index: add " Matheus Tavares
@ 2020-11-04 20:33 ` Matheus Tavares
2020-11-04 20:33 ` [PATCH v4 17/19] parallel-checkout: add tests related to clone collisions Matheus Tavares
` (3 subsequent siblings)
19 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-11-04 20:33 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
Add tests to populate the working tree during clone and checkout using
the sequential and parallel modes, to confirm that they produce
identical results. Also test basic checkout mechanics, such as checking
for symlinks in the leading directories and the abidance to --force.
Note: some helper functions are added to a common lib file which is only
included by t2080 for now. But it will also be used by other
parallel-checkout tests in the following patches.
Original-patch-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
t/lib-parallel-checkout.sh | 40 +++++++
t/t2080-parallel-checkout-basics.sh | 170 ++++++++++++++++++++++++++++
2 files changed, 210 insertions(+)
create mode 100644 t/lib-parallel-checkout.sh
create mode 100755 t/t2080-parallel-checkout-basics.sh
diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh
new file mode 100644
index 0000000000..4dad9043fb
--- /dev/null
+++ b/t/lib-parallel-checkout.sh
@@ -0,0 +1,40 @@
+# Helpers for t208* tests
+
+# Runs `git -c checkout.workers=$1 -c checkout.thresholdForParallelism=$2 ${@:4}`
+# and checks that the number of workers spawned is equal to $3.
+#
+git_pc()
+{
+ if test $# -lt 4
+ then
+ BUG "too few arguments to git_pc()"
+ fi &&
+
+ workers=$1 threshold=$2 expected_workers=$3 &&
+ shift 3 &&
+
+ rm -f trace &&
+ GIT_TRACE2="$(pwd)/trace" git \
+ -c checkout.workers=$workers \
+ -c checkout.thresholdForParallelism=$threshold \
+ -c advice.detachedHead=0 \
+ "$@" &&
+
+ # Check that the expected number of workers has been used. Note that it
+ # can be different from the requested number in two cases: when the
+ # threshold is not reached; and when there are not enough
+ # parallel-eligible entries for all workers.
+ #
+ local workers_in_trace=$(grep "child_start\[..*\] git checkout--helper" trace | wc -l) &&
+ test $workers_in_trace -eq $expected_workers &&
+ rm -f trace
+}
+
+# Verify that both the working tree and the index were created correctly
+verify_checkout()
+{
+ git -C "$1" diff-index --quiet HEAD -- &&
+ git -C "$1" diff-index --quiet --cached HEAD -- &&
+ git -C "$1" status --porcelain >"$1".status &&
+ test_must_be_empty "$1".status
+}
diff --git a/t/t2080-parallel-checkout-basics.sh b/t/t2080-parallel-checkout-basics.sh
new file mode 100755
index 0000000000..edea88f14f
--- /dev/null
+++ b/t/t2080-parallel-checkout-basics.sh
@@ -0,0 +1,170 @@
+#!/bin/sh
+
+test_description='parallel-checkout basics
+
+Ensure that parallel-checkout basically works on clone and checkout, spawning
+the required number of workers and correctly populating both the index and
+working tree.
+'
+
+TEST_NO_CREATE_REPO=1
+. ./test-lib.sh
+. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
+
+# Test parallel-checkout with different operations (creation, deletion,
+# modification) and entry types. A branch switch from B1 to B2 will contain:
+#
+# - a (file): modified
+# - e/x (file): deleted
+# - b (symlink): deleted
+# - b/f (file): created
+# - e (symlink): created
+# - d (submodule): created
+#
+test_expect_success SYMLINKS 'setup repo for checkout with various operations' '
+ git init various &&
+ (
+ cd various &&
+ git checkout -b B1 &&
+ echo a>a &&
+ mkdir e &&
+ echo e/x >e/x &&
+ ln -s e b &&
+ git add -A &&
+ git commit -m B1 &&
+
+ git checkout -b B2 &&
+ echo modified >a &&
+ rm -rf e &&
+ rm b &&
+ mkdir b &&
+ echo b/f >b/f &&
+ ln -s b e &&
+ git init d &&
+ test_commit -C d f &&
+ git submodule add ./d &&
+ git add -A &&
+ git commit -m B2 &&
+
+ git checkout --recurse-submodules B1
+ )
+'
+
+test_expect_success SYMLINKS 'sequential checkout' '
+ cp -R various various_sequential &&
+ git_pc 1 0 0 -C various_sequential checkout --recurse-submodules B2 &&
+ verify_checkout various_sequential
+'
+
+test_expect_success SYMLINKS 'parallel checkout' '
+ cp -R various various_parallel &&
+ git_pc 2 0 2 -C various_parallel checkout --recurse-submodules B2 &&
+ verify_checkout various_parallel
+'
+
+test_expect_success SYMLINKS 'fallback to sequential checkout (threshold)' '
+ cp -R various various_sequential_fallback &&
+ git_pc 2 100 0 -C various_sequential_fallback checkout --recurse-submodules B2 &&
+ verify_checkout various_sequential_fallback
+'
+
+test_expect_success SYMLINKS 'parallel checkout on clone' '
+ git -C various checkout --recurse-submodules B2 &&
+ git_pc 2 0 2 clone --recurse-submodules various various_parallel_clone &&
+ verify_checkout various_parallel_clone
+'
+
+test_expect_success SYMLINKS 'fallback to sequential checkout on clone (threshold)' '
+ git -C various checkout --recurse-submodules B2 &&
+ git_pc 2 100 0 clone --recurse-submodules various various_sequential_fallback_clone &&
+ verify_checkout various_sequential_fallback_clone
+'
+
+# Just to be paranoid, actually compare the working trees' contents directly.
+test_expect_success SYMLINKS 'compare the working trees' '
+ rm -rf various_*/.git &&
+ rm -rf various_*/d/.git &&
+
+ diff -r various_sequential various_parallel &&
+ diff -r various_sequential various_sequential_fallback &&
+ diff -r various_sequential various_parallel_clone &&
+ diff -r various_sequential various_sequential_fallback_clone
+'
+
+test_cmp_str()
+{
+ echo "$1" >tmp &&
+ test_cmp tmp "$2"
+}
+
+test_expect_success 'parallel checkout respects --[no]-force' '
+ git init dirty &&
+ (
+ cd dirty &&
+ mkdir D &&
+ test_commit D/F &&
+ test_commit F &&
+
+ echo changed >F.t &&
+ rm -rf D &&
+ echo changed >D &&
+
+ # We expect 0 workers because there is nothing to be updated
+ git_pc 2 0 0 checkout HEAD &&
+ test_path_is_file D &&
+ test_cmp_str changed D &&
+ test_cmp_str changed F.t &&
+
+ git_pc 2 0 2 checkout --force HEAD &&
+ test_path_is_dir D &&
+ test_cmp_str D/F D/F.t &&
+ test_cmp_str F F.t
+ )
+'
+
+test_expect_success SYMLINKS 'parallel checkout checks for symlinks in leading dirs' '
+ git init symlinks &&
+ (
+ cd symlinks &&
+ mkdir D E &&
+
+ # Create two entries in D to have enough work for 2 parallel
+ # workers
+ test_commit D/A &&
+ test_commit D/B &&
+ test_commit E/C &&
+ rm -rf D &&
+ ln -s E D &&
+
+ git_pc 2 0 2 checkout --force HEAD &&
+ ! test -L D &&
+ test_cmp_str D/A D/A.t &&
+ test_cmp_str D/B D/B.t
+ )
+'
+
+test_expect_success SYMLINKS,CASE_INSENSITIVE_FS 'symlink colliding with leading dir' '
+ git init colliding-symlink &&
+ (
+ cd colliding-symlink &&
+ file_hex=$(git hash-object -w --stdin </dev/null) &&
+ file_oct=$(echo $file_hex | hex2oct) &&
+
+ sym_hex=$(echo "./D" | git hash-object -w --stdin) &&
+ sym_oct=$(echo $sym_hex | hex2oct) &&
+
+ printf "100644 D/A\0${file_oct}" >tree &&
+ printf "100644 E/B\0${file_oct}" >>tree &&
+ printf "120000 e\0${sym_oct}" >>tree &&
+
+ tree_hex=$(git hash-object -w -t tree --stdin <tree) &&
+ commit_hex=$(git commit-tree -m collisions $tree_hex) &&
+ git update-ref refs/heads/colliding-symlink $commit_hex &&
+
+ git_pc 2 0 2 checkout colliding-symlink &&
+ test_path_is_dir D &&
+ test_path_is_missing D/B
+ )
+'
+
+test_done
--
2.28.0
^ permalink raw reply related [flat|nested] 154+ messages in thread
* [PATCH v4 17/19] parallel-checkout: add tests related to clone collisions
2020-11-04 20:32 ` [PATCH v4 " Matheus Tavares
` (15 preceding siblings ...)
2020-11-04 20:33 ` [PATCH v4 16/19] parallel-checkout: add tests for basic operations Matheus Tavares
@ 2020-11-04 20:33 ` Matheus Tavares
2020-11-04 20:33 ` [PATCH v4 18/19] parallel-checkout: add tests related to .gitattributes Matheus Tavares
` (2 subsequent siblings)
19 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-11-04 20:33 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
Add tests to confirm that path collisions are properly reported during a
clone operation using parallel-checkout.
Original-patch-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
t/lib-parallel-checkout.sh | 4 +-
t/t2081-parallel-checkout-collisions.sh | 98 +++++++++++++++++++++++++
2 files changed, 100 insertions(+), 2 deletions(-)
create mode 100755 t/t2081-parallel-checkout-collisions.sh
diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh
index 4dad9043fb..e62a433eb1 100644
--- a/t/lib-parallel-checkout.sh
+++ b/t/lib-parallel-checkout.sh
@@ -18,7 +18,7 @@ git_pc()
-c checkout.workers=$workers \
-c checkout.thresholdForParallelism=$threshold \
-c advice.detachedHead=0 \
- "$@" &&
+ "$@" 2>&8 &&
# Check that the expected number of workers has been used. Note that it
# can be different from the requested number in two cases: when the
@@ -28,7 +28,7 @@ git_pc()
local workers_in_trace=$(grep "child_start\[..*\] git checkout--helper" trace | wc -l) &&
test $workers_in_trace -eq $expected_workers &&
rm -f trace
-}
+} 8>&2 2>&4
# Verify that both the working tree and the index were created correctly
verify_checkout()
diff --git a/t/t2081-parallel-checkout-collisions.sh b/t/t2081-parallel-checkout-collisions.sh
new file mode 100755
index 0000000000..5cab2dcd2c
--- /dev/null
+++ b/t/t2081-parallel-checkout-collisions.sh
@@ -0,0 +1,98 @@
+#!/bin/sh
+
+test_description='parallel-checkout collisions
+
+When there are path collisions during a clone, Git should report a warning
+listing all of the colliding entries. The sequential code detects a collision
+by calling lstat() before trying to open(O_CREAT) the file. Then, to find the
+colliding pair of an item k, it searches cache_entry[0, k-1].
+
+This is not sufficient in parallel checkout since:
+
+- A colliding file may be created between the lstat() and open() calls;
+- A colliding entry might appear in the second half of the cache_entry array.
+
+The tests in this file make sure that the collision detection code is extended
+for parallel checkout.
+'
+
+. ./test-lib.sh
+. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
+
+TEST_ROOT="$PWD"
+
+test_expect_success CASE_INSENSITIVE_FS 'setup' '
+ file_x_hex=$(git hash-object -w --stdin </dev/null) &&
+ file_x_oct=$(echo $file_x_hex | hex2oct) &&
+
+ attr_hex=$(echo "file_x filter=logger" | git hash-object -w --stdin) &&
+ attr_oct=$(echo $attr_hex | hex2oct) &&
+
+ printf "100644 FILE_X\0${file_x_oct}" >tree &&
+ printf "100644 FILE_x\0${file_x_oct}" >>tree &&
+ printf "100644 file_X\0${file_x_oct}" >>tree &&
+ printf "100644 file_x\0${file_x_oct}" >>tree &&
+ printf "100644 .gitattributes\0${attr_oct}" >>tree &&
+
+ tree_hex=$(git hash-object -w -t tree --stdin <tree) &&
+ commit_hex=$(git commit-tree -m collisions $tree_hex) &&
+ git update-ref refs/heads/collisions $commit_hex &&
+
+ write_script "$TEST_ROOT"/logger_script <<-\EOF
+ echo "$@" >>filter.log
+ EOF
+'
+
+for mode in parallel sequential-fallback
+do
+
+ case $mode in
+ parallel) workers=2 threshold=0 expected_workers=2 ;;
+ sequential-fallback) workers=2 threshold=100 expected_workers=0 ;;
+ esac
+
+ test_expect_success CASE_INSENSITIVE_FS "collision detection on $mode clone" '
+ git_pc $workers $threshold $expected_workers \
+ clone --branch=collisions . $mode 2>$mode.stderr &&
+
+ grep FILE_X $mode.stderr &&
+ grep FILE_x $mode.stderr &&
+ grep file_X $mode.stderr &&
+ grep file_x $mode.stderr &&
+ test_i18ngrep "the following paths have collided" $mode.stderr
+ '
+
+ # The following test ensures that the collision detection code is
+ # correctly looking for colliding peers in the second half of the
+ # cache_entry array. This is done by defining a smudge command for the
+ # *last* array entry, which makes it non-eligible for parallel-checkout.
+ # The last entry is then checked out *before* any worker is spawned,
+ # so that it succeeds and the workers' entries collide with it.
+ #
+ # Note: this test doesn't work on Windows because, on that system,
+ # collision detection uses strcmp() when core.ignoreCase=false. And we
+ # have to set core.ignoreCase=false so that only 'file_x' matches the
+ # pattern of the filter attribute. But it works on OSX, where collision
+ # detection uses inode.
+ #
+ test_expect_success CASE_INSENSITIVE_FS,!MINGW,!CYGWIN "collision detection on $mode clone w/ filter" '
+ git_pc $workers $threshold $expected_workers \
+ -c core.ignoreCase=false \
+ -c filter.logger.smudge="\"$TEST_ROOT/logger_script\" %f" \
+ clone --branch=collisions . ${mode}_with_filter \
+ 2>${mode}_with_filter.stderr &&
+
+ grep FILE_X ${mode}_with_filter.stderr &&
+ grep FILE_x ${mode}_with_filter.stderr &&
+ grep file_X ${mode}_with_filter.stderr &&
+ grep file_x ${mode}_with_filter.stderr &&
+ test_i18ngrep "the following paths have collided" ${mode}_with_filter.stderr &&
+
+ # Make sure only "file_x" was filtered
+ test_path_is_file ${mode}_with_filter/filter.log &&
+ echo file_x >expected.filter.log &&
+ test_cmp ${mode}_with_filter/filter.log expected.filter.log
+ '
+done
+
+test_done
--
2.28.0
^ permalink raw reply related [flat|nested] 154+ messages in thread
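The whole-array search requirement described in t2081's preamble boils down to case-folding every path and looking for duplicates. A rough, hypothetical sketch of that idea in shell — `find_collisions` is an illustrative name, not a real Git helper, and Git's actual detection uses lstat()/inode or strcmp() data as the test comments note:

```shell
# Hypothetical helper: report case-insensitive duplicates among the given
# paths, the way a whole-array collision scan would (unlike the sequential
# code, which only searches entries [0, k-1] for each item k).
find_collisions () {
	printf '%s\n' "$@" | tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

find_collisions FILE_X FILE_x file_X file_x .gitattributes
# prints "file_x"
```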
* [PATCH v4 18/19] parallel-checkout: add tests related to .gitattributes
2020-11-04 20:32 ` [PATCH v4 " Matheus Tavares
` (16 preceding siblings ...)
2020-11-04 20:33 ` [PATCH v4 17/19] parallel-checkout: add tests related to clone collisions Matheus Tavares
@ 2020-11-04 20:33 ` Matheus Tavares
2020-11-04 20:33 ` [PATCH v4 19/19] ci: run test round with parallel-checkout enabled Matheus Tavares
2020-12-16 14:50 ` [PATCH v5 0/9] Parallel Checkout (part I) Matheus Tavares
19 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-11-04 20:33 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
Add tests to confirm that `struct conv_attrs` data is correctly passed
from the main process to the workers, and that they properly smudge
files before writing to the working tree. Also check that
non-parallel-eligible entries, such as regular files that require
external filters, are correctly smudged and written when
parallel-checkout is enabled.
Note: to avoid repeating code, some helper functions are extracted from
t0028 into a common lib file.
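The x2y filter configured in the new t2082 test is just a pair of `tr` commands; stripped of git, the round trip it exercises looks like this (the `clean`/`smudge` function names and variables are illustrative only):

```shell
# clean runs on the way into the object store; smudge reverses it on checkout.
clean () { tr x y; }
smudge () { tr y x; }

stored=$(echo x | clean)               # blob content git would store
checked_out=$(echo "$stored" | smudge) # what checkout writes back
echo "$stored $checked_out"            # prints "y x"
```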
Original-patch-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
t/lib-encoding.sh | 25 ++++
t/t0028-working-tree-encoding.sh | 25 +---
t/t2082-parallel-checkout-attributes.sh | 174 ++++++++++++++++++++++++
3 files changed, 200 insertions(+), 24 deletions(-)
create mode 100644 t/lib-encoding.sh
create mode 100755 t/t2082-parallel-checkout-attributes.sh
diff --git a/t/lib-encoding.sh b/t/lib-encoding.sh
new file mode 100644
index 0000000000..c52ffbbed5
--- /dev/null
+++ b/t/lib-encoding.sh
@@ -0,0 +1,25 @@
+# Encoding helpers used by t0028 and t2082
+
+test_lazy_prereq NO_UTF16_BOM '
+ test $(printf abc | iconv -f UTF-8 -t UTF-16 | wc -c) = 6
+'
+
+test_lazy_prereq NO_UTF32_BOM '
+ test $(printf abc | iconv -f UTF-8 -t UTF-32 | wc -c) = 12
+'
+
+write_utf16 () {
+ if test_have_prereq NO_UTF16_BOM
+ then
+ printf '\376\377'
+ fi &&
+ iconv -f UTF-8 -t UTF-16
+}
+
+write_utf32 () {
+ if test_have_prereq NO_UTF32_BOM
+ then
+ printf '\0\0\376\377'
+ fi &&
+ iconv -f UTF-8 -t UTF-32
+}
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index bfc4fb9af5..4fffc3a639 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -3,33 +3,10 @@
test_description='working-tree-encoding conversion via gitattributes'
. ./test-lib.sh
+. "$TEST_DIRECTORY/lib-encoding.sh"
GIT_TRACE_WORKING_TREE_ENCODING=1 && export GIT_TRACE_WORKING_TREE_ENCODING
-test_lazy_prereq NO_UTF16_BOM '
- test $(printf abc | iconv -f UTF-8 -t UTF-16 | wc -c) = 6
-'
-
-test_lazy_prereq NO_UTF32_BOM '
- test $(printf abc | iconv -f UTF-8 -t UTF-32 | wc -c) = 12
-'
-
-write_utf16 () {
- if test_have_prereq NO_UTF16_BOM
- then
- printf '\376\377'
- fi &&
- iconv -f UTF-8 -t UTF-16
-}
-
-write_utf32 () {
- if test_have_prereq NO_UTF32_BOM
- then
- printf '\0\0\376\377'
- fi &&
- iconv -f UTF-8 -t UTF-32
-}
-
test_expect_success 'setup test files' '
git config core.eol lf &&
diff --git a/t/t2082-parallel-checkout-attributes.sh b/t/t2082-parallel-checkout-attributes.sh
new file mode 100755
index 0000000000..6800574588
--- /dev/null
+++ b/t/t2082-parallel-checkout-attributes.sh
@@ -0,0 +1,174 @@
+#!/bin/sh
+
+test_description='parallel-checkout: attributes
+
+Verify that parallel-checkout correctly creates files that require
+conversions, as specified in .gitattributes. The main point here is
+to check that the conv_attr data is correctly sent to the workers
+and that it contains sufficient information to smudge files
+properly (without access to the index or attribute stack).
+'
+
+TEST_NO_CREATE_REPO=1
+. ./test-lib.sh
+. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
+. "$TEST_DIRECTORY/lib-encoding.sh"
+
+test_expect_success 'parallel-checkout with ident' '
+ git init ident &&
+ (
+ cd ident &&
+ echo "A ident" >.gitattributes &&
+ echo "\$Id\$" >A &&
+ echo "\$Id\$" >B &&
+ git add -A &&
+ git commit -m id &&
+
+ rm A B &&
+ git_pc 2 0 2 reset --hard &&
+ hexsz=$(test_oid hexsz) &&
+ grep -E "\\\$Id: [0-9a-f]{$hexsz} \\\$" A &&
+ grep "\\\$Id\\\$" B
+ )
+'
+
+test_expect_success 'parallel-checkout with re-encoding' '
+ git init encoding &&
+ (
+ cd encoding &&
+ echo text >utf8-text &&
+ cat utf8-text | write_utf16 >utf16-text &&
+
+ echo "A working-tree-encoding=UTF-16" >.gitattributes &&
+ cp utf16-text A &&
+ cp utf16-text B &&
+ git add A B .gitattributes &&
+ git commit -m encoding &&
+
+ # Check that A (and only A) is stored in UTF-8
+ git cat-file -p :A >A.internal &&
+ test_cmp_bin utf8-text A.internal &&
+ git cat-file -p :B >B.internal &&
+ test_cmp_bin utf16-text B.internal &&
+
+ # Check that A is re-encoded during checkout
+ rm A B &&
+ git_pc 2 0 2 checkout A B &&
+ test_cmp_bin utf16-text A
+ )
+'
+
+test_expect_success 'parallel-checkout with eol conversions' '
+ git init eol &&
+ (
+ cd eol &&
+ git config core.autocrlf false &&
+ printf "multi\r\nline\r\ntext" >crlf-text &&
+ printf "multi\nline\ntext" >lf-text &&
+
+ echo "A text eol=crlf" >.gitattributes &&
+ echo "B -text" >>.gitattributes &&
+ cp crlf-text A &&
+ cp crlf-text B &&
+ git add A B .gitattributes &&
+ git commit -m eol &&
+
+ # Check that A (and only A) is stored with LF format
+ git cat-file -p :A >A.internal &&
+ test_cmp_bin lf-text A.internal &&
+ git cat-file -p :B >B.internal &&
+ test_cmp_bin crlf-text B.internal &&
+
+ # Check that A is converted to CRLF during checkout
+ rm A B &&
+ git_pc 2 0 2 checkout A B &&
+ test_cmp_bin crlf-text A
+ )
+'
+
+test_cmp_str () {
+ echo "$1" >tmp &&
+ test_cmp tmp "$2"
+}
+
+# Entries that require an external filter are not eligible for parallel
+# checkout. Check that both the parallel-eligible and non-eligible entries are
+# properly written in a single checkout process.
+#
+test_expect_success 'parallel-checkout and external filter' '
+ git init filter &&
+ (
+ cd filter &&
+ git config filter.x2y.clean "tr x y" &&
+ git config filter.x2y.smudge "tr y x" &&
+ git config filter.x2y.required true &&
+
+ echo "A filter=x2y" >.gitattributes &&
+ echo x >A &&
+ echo x >B &&
+ echo x >C &&
+ git add -A &&
+ git commit -m filter &&
+
+ # Check that A (and only A) was cleaned
+ git cat-file -p :A >A.internal &&
+ test_cmp_str y A.internal &&
+ git cat-file -p :B >B.internal &&
+ test_cmp_str x B.internal &&
+ git cat-file -p :C >C.internal &&
+ test_cmp_str x C.internal &&
+
+ rm A B C *.internal &&
+ git_pc 2 0 2 checkout A B C &&
+ test_cmp_str x A &&
+ test_cmp_str x B &&
+ test_cmp_str x C
+ )
+'
+
+# The delayed queue is independent from the parallel queue, and they should be
+# able to work together in the same checkout process.
+#
+test_expect_success PERL 'parallel-checkout and delayed checkout' '
+ write_script rot13-filter.pl "$PERL_PATH" \
+ <"$TEST_DIRECTORY"/t0021/rot13-filter.pl &&
+ test_config_global filter.delay.process \
+ "\"$(pwd)/rot13-filter.pl\" \"$(pwd)/delayed.log\" clean smudge delay" &&
+ test_config_global filter.delay.required true &&
+
+ echo "a b c" >delay-content &&
+ echo "n o p" >delay-rot13-content &&
+
+ git init delayed &&
+ (
+ cd delayed &&
+ echo "*.a filter=delay" >.gitattributes &&
+ cp ../delay-content test-delay10.a &&
+ cp ../delay-content test-delay11.a &&
+ echo parallel >parallel1.b &&
+ echo parallel >parallel2.b &&
+ git add -A &&
+ git commit -m delayed &&
+
+ # Check that the stored data was cleaned
+ git cat-file -p :test-delay10.a > delay10.internal &&
+ test_cmp delay10.internal ../delay-rot13-content &&
+ git cat-file -p :test-delay11.a > delay11.internal &&
+ test_cmp delay11.internal ../delay-rot13-content &&
+ rm *.internal &&
+
+ rm *.a *.b
+ ) &&
+
+ git_pc 2 0 2 -C delayed checkout -f &&
+ verify_checkout delayed &&
+
+ # Check that the *.a files got to the delay queue and were filtered
+ grep "smudge test-delay10.a .* \[DELAYED\]" delayed.log &&
+ grep "smudge test-delay11.a .* \[DELAYED\]" delayed.log &&
+ test_cmp delayed/test-delay10.a delay-content &&
+ test_cmp delayed/test-delay11.a delay-content
+'
+
+test_done
--
2.28.0
^ permalink raw reply related [flat|nested] 154+ messages in thread
* [PATCH v4 19/19] ci: run test round with parallel-checkout enabled
2020-11-04 20:32 ` [PATCH v4 " Matheus Tavares
` (17 preceding siblings ...)
2020-11-04 20:33 ` [PATCH v4 18/19] parallel-checkout: add tests related to .gitattributes Matheus Tavares
@ 2020-11-04 20:33 ` Matheus Tavares
2020-12-16 14:50 ` [PATCH v5 0/9] Parallel Checkout (part I) Matheus Tavares
19 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-11-04 20:33 UTC (permalink / raw)
To: git; +Cc: gitster, git, chriscool, peff, newren, jrnieder, martin.agren
We already have tests for the basic parallel-checkout operations. But
this code can also run in other commands, such as git-read-tree and
git-sparse-checkout, which are currently not tested with multiple
workers. To promote wider test coverage without duplicating tests:
1. Add the GIT_TEST_CHECKOUT_WORKERS environment variable, to optionally
force parallel-checkout execution during the whole test suite.
2. Include this variable in the second test round of the linux-gcc job
of our ci scripts. This round runs `make test` again with some
optional GIT_TEST_* variables enabled, so there is no additional
overhead in exercising the parallel-checkout code here.
Note: the specific parallel-checkout tests t208* cannot be used in
combination with GIT_TEST_CHECKOUT_WORKERS as they need to set and check
the number of workers by themselves. So skip those tests when this flag
is set.
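The C hunk in this patch implements roughly the following decision table; here is a shell rendition of the same logic, as a rough sketch (`resolve_test_workers` is a made-up name for illustration, and unlike strtol_i() it only accepts non-negative decimal input):

```shell
# Mirrors the logic added to get_parallel_checkout_configs(): a non-numeric
# value is an error, a value < 1 falls back to the CPU count, and the
# threshold is forced to 0 so even small checkouts run in parallel.
resolve_test_workers () {
	value=$1 cpus=$2
	case "$value" in
	''|*[!0-9]*) echo "invalid value: '$value'" >&2; return 1 ;;
	esac
	if test "$value" -lt 1
	then
		value=$cpus
	fi
	echo "$value 0"	# "<num_workers> <threshold>"
}

resolve_test_workers 2 8	# prints "2 0"
resolve_test_workers 0 8	# prints "8 0"
```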
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
ci/run-build-and-tests.sh | 1 +
parallel-checkout.c | 14 ++++++++++++++
t/README | 4 ++++
t/lib-parallel-checkout.sh | 6 ++++++
4 files changed, 25 insertions(+)
diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index 6c27b886b8..aa32ddc361 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -22,6 +22,7 @@ linux-gcc)
export GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1
export GIT_TEST_MULTI_PACK_INDEX=1
export GIT_TEST_ADD_I_USE_BUILTIN=1
+ export GIT_TEST_CHECKOUT_WORKERS=2
make test
;;
linux-clang)
diff --git a/parallel-checkout.c b/parallel-checkout.c
index 72ac93d541..33f36fb1bf 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -32,6 +32,20 @@ enum pc_status parallel_checkout_status(void)
void get_parallel_checkout_configs(int *num_workers, int *threshold)
{
+ char *env_workers = getenv("GIT_TEST_CHECKOUT_WORKERS");
+
+ if (env_workers && *env_workers) {
+ if (strtol_i(env_workers, 10, num_workers)) {
+ die("invalid value for GIT_TEST_CHECKOUT_WORKERS: '%s'",
+ env_workers);
+ }
+ if (*num_workers < 1)
+ *num_workers = online_cpus();
+
+ *threshold = 0;
+ return;
+ }
+
if (git_config_get_int("checkout.workers", num_workers))
*num_workers = 1;
else if (*num_workers < 1)
diff --git a/t/README b/t/README
index 2adaf7c2d2..cd1b15c55a 100644
--- a/t/README
+++ b/t/README
@@ -425,6 +425,10 @@ GIT_TEST_DEFAULT_HASH=<hash-algo> specifies which hash algorithm to
use in the test scripts. Recognized values for <hash-algo> are "sha1"
and "sha256".
+GIT_TEST_CHECKOUT_WORKERS=<n> overrides the 'checkout.workers' setting
+to <n> and 'checkout.thresholdForParallelism' to 0, forcing the
+execution of the parallel-checkout code.
+
Naming Tests
------------
diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh
index e62a433eb1..7b454da375 100644
--- a/t/lib-parallel-checkout.sh
+++ b/t/lib-parallel-checkout.sh
@@ -1,5 +1,11 @@
# Helpers for t208* tests
+if ! test -z "$GIT_TEST_CHECKOUT_WORKERS"
+then
+ skip_all="skipping test, GIT_TEST_CHECKOUT_WORKERS is set"
+ test_done
+fi
+
# Runs `git -c checkout.workers=$1 -c checkout.thesholdForParallelism=$2 ${@:4}`
# and checks that the number of workers spawned is equal to $3.
#
--
2.28.0
^ permalink raw reply related [flat|nested] 154+ messages in thread
* [PATCH v5 0/9] Parallel Checkout (part I)
2020-11-04 20:32 ` [PATCH v4 " Matheus Tavares
` (18 preceding siblings ...)
2020-11-04 20:33 ` [PATCH v4 19/19] ci: run test round with parallel-checkout enabled Matheus Tavares
@ 2020-12-16 14:50 ` Matheus Tavares
2020-12-16 14:50 ` [PATCH v5 1/9] convert: make convert_attrs() and convert structs public Matheus Tavares
` (11 more replies)
19 siblings, 12 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-12-16 14:50 UTC (permalink / raw)
To: git
Cc: christian.couder, gitster, git, chriscool, peff, newren,
jrnieder, martin.agren
The previous rounds got many great suggestions about patches 1 to 10,
but not as many comments on the latter patches. Christian commented that
patches 10 and 11 are too long/complex, making the overall series harder
to review. So he suggested that I eject patches 10 to 19, and send them
later in a separate part. This will hopefully make the series easier to
review and move forward. (I also hope to include a design doc in part 2
to make those two bigger patches more digestible.)
So this part is now composed only of the 9 preparatory patches, which
mainly focus on: (1) adding the 'struct conv_attrs' parameter to some
convert.c and entry.c functions (to avoid re-loading the attributes
during parallel checkout); and (2) making some functions public (for
parallel-checkout.c's later use).
Changes since v4:
General:
- Removed "[matheus.bernardino: ...]" lines from patches 1 to 4.
- Ejected patches 10 to 19, which will now be sent in part 2.
Patch 2:
- Fixed typo on commit message (s/on a index/on an index/).
- Added "_ca" to convert_to_working_tree_internal()'s name.
Patch 4:
- Reworded patch title for better clarity, as suggested by Christian.
Patch 5:
- Mentioned about moving a checkout_entry() comment from entry.c to
entry.h in the patch's message.
Patch 7:
- Make patch's title more explicit and reworded message for clarity.
Patch 8:
- Fixed the last call to write_entry() in checkout_entry(): it should
pass 'ca', not NULL.
Patch 9:
- Defined checkout_entry() -- which is now a wrapper to
checkout_entry_ca() -- as a static inline function instead of a macro.
- Shortened patch's title.
Note: to see the big picture where these patches should fit in, please
check the previous round with the complete series:
https://lore.kernel.org/git/cover.1604521275.git.matheus.bernardino@usp.br/
Jeff Hostetler (4):
convert: make convert_attrs() and convert structs public
convert: add [async_]convert_to_working_tree_ca() variants
convert: add get_stream_filter_ca() variant
convert: add classification for conv_attrs struct
Matheus Tavares (5):
entry: extract a header file for entry.c functions
entry: make fstat_output() and read_blob_entry() public
entry: extract update_ce_after_write() from write_entry()
entry: move conv_attrs lookup up to checkout_entry()
entry: add checkout_entry_ca() taking preloaded conv_attrs
apply.c | 1 +
builtin/checkout-index.c | 1 +
builtin/checkout.c | 1 +
builtin/difftool.c | 1 +
cache.h | 24 -------
convert.c | 143 ++++++++++++++++++++-------------------
convert.h | 96 +++++++++++++++++++++++---
entry.c | 85 +++++++++++++----------
entry.h | 59 ++++++++++++++++
unpack-trees.c | 1 +
10 files changed, 275 insertions(+), 137 deletions(-)
create mode 100644 entry.h
Range-diff against v4:
1: 2726f6dc05 ! 1: fa04185237 convert: make convert_attrs() and convert structs public
@@ Commit message
convert_crlf_action, which is more appropriate for the global namespace.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
- [matheus.bernardino: squash and reword msg]
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
## convert.c ##
2: fc03417592 ! 2: 2d3c1dc0a1 convert: add [async_]convert_to_working_tree_ca() variants
@@ Commit message
Separate the attribute gathering from the actual conversion by adding
_ca() variants of the conversion functions. These variants receive a
- precomputed 'struct conv_attrs', not relying, thus, on a index state.
+ precomputed 'struct conv_attrs', not relying, thus, on an index state.
They will be used in a future patch adding parallel checkout support,
for two reasons:
@@ Commit message
it for the main process, anyway.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
- [matheus.bernardino: squash, remove one function definition and reword]
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
## convert.c ##
@@ convert.c: void convert_to_git_filter_fd(const struct index_state *istate,
}
-static int convert_to_working_tree_internal(const struct index_state *istate,
-+static int convert_to_working_tree_internal(const struct conv_attrs *ca,
- const char *path, const char *src,
- size_t len, struct strbuf *dst,
- int normalizing,
-@@ convert.c: static int convert_to_working_tree_internal(const struct index_state *istate,
- struct delayed_checkout *dco)
+- const char *path, const char *src,
+- size_t len, struct strbuf *dst,
+- int normalizing,
+- const struct checkout_metadata *meta,
+- struct delayed_checkout *dco)
++static int convert_to_working_tree_ca_internal(const struct conv_attrs *ca,
++ const char *path, const char *src,
++ size_t len, struct strbuf *dst,
++ int normalizing,
++ const struct checkout_metadata *meta,
++ struct delayed_checkout *dco)
{
int ret = 0, ret_filter = 0;
- struct conv_attrs ca;
@@ convert.c: static int convert_to_working_tree_internal(const struct index_state
+ void *dco)
{
- return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, dco);
-+ return convert_to_working_tree_internal(ca, path, src, len, dst, 0, meta, dco);
++ return convert_to_working_tree_ca_internal(ca, path, src, len, dst, 0,
++ meta, dco);
}
-int convert_to_working_tree(const struct index_state *istate,
@@ convert.c: static int convert_to_working_tree_internal(const struct index_state
+ const struct checkout_metadata *meta)
{
- return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, NULL);
-+ return convert_to_working_tree_internal(ca, path, src, len, dst, 0, meta, NULL);
++ return convert_to_working_tree_ca_internal(ca, path, src, len, dst, 0,
++ meta, NULL);
}
int renormalize_buffer(const struct index_state *istate, const char *path,
@@ convert.c: static int convert_to_working_tree_internal(const struct index_state
+ int ret;
+
+ convert_attrs(istate, &ca, path);
-+ ret = convert_to_working_tree_internal(&ca, path, src, len, dst, 1, NULL, NULL);
++ ret = convert_to_working_tree_ca_internal(&ca, path, src, len, dst, 1,
++ NULL, NULL);
if (ret) {
src = dst->buf;
len = dst->len;
3: 8ce20f1031 ! 3: 8af17f6a9b convert: add get_stream_filter_ca() variant
@@ Commit message
attributes struct as a parameter.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
- [matheus.bernardino: move header comment to ca() variant and reword msg]
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
## convert.c ##
4: aa1eb461f4 ! 4: f829e2e08f convert: add conv_attrs classification
@@ Metadata
Author: Jeff Hostetler <jeffhost@microsoft.com>
## Commit message ##
- convert: add conv_attrs classification
+ convert: add classification for conv_attrs struct
Create `enum conv_attrs_classification` to express the different ways
that attributes are handled for a blob during checkout.
@@ Commit message
classifying logic is the same).
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
- [matheus.bernardino: use classification in get_stream_filter_ca()]
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
## convert.c ##
5: cb3dea224b ! 5: 934ead9519 entry: extract a header file for entry.c functions
@@ Commit message
reside in cache.h. Although not many, they contribute to the size of
cache.h and, when changed, cause the unnecessary recompilation of
modules that don't really use these functions. So let's move them to a
- new entry.h header.
+ new entry.h header. While at it let's also move a comment related to
+ checkout_entry() from entry.c to entry.h as it's more useful to describe
+ the function there.
Original-patch-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
6: 46ed6274d7 = 6: da6d1d7624 entry: make fstat_output() and read_blob_entry() public
7: a0479d02ff ! 7: def654606d entry: extract cache_entry update from write_entry()
@@ Metadata
Author: Matheus Tavares <matheus.bernardino@usp.br>
## Commit message ##
- entry: extract cache_entry update from write_entry()
+ entry: extract update_ce_after_write() from write_entry()
- This code will be used by the parallel checkout functions, outside
- entry.c, so extract it to a public function.
+ The code that updates the in-memory index information after an entry is
+ written currently resides in write_entry(). Extract it to a public
+ function so that it can be called by the parallel checkout functions,
+ outside entry.c, in a later patch.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
8: 5c993cc27f ! 8: fbde901518 entry: move conv_attrs lookup up to checkout_entry()
@@ entry.c: int checkout_entry(struct cache_entry *ce, const struct checkout *state
+ ca = &ca_buf;
+ }
+
-+ return write_entry(ce, path.buf, NULL, state, 0);
++ return write_entry(ce, path.buf, ca, state, 0);
}
void unlink_entry(const struct cache_entry *ce)
9: aa635bda21 ! 9: 0556d32e8c entry: add checkout_entry_ca() which takes preloaded conv_attrs
@@ Metadata
Author: Matheus Tavares <matheus.bernardino@usp.br>
## Commit message ##
- entry: add checkout_entry_ca() which takes preloaded conv_attrs
+ entry: add checkout_entry_ca() taking preloaded conv_attrs
The parallel checkout machinery will call checkout_entry() for entries
that could not be written in parallel due to path collisions. At this
@@ entry.c: int checkout_entry(struct cache_entry *ce, const struct checkout *state
convert_attrs(state->istate, &ca_buf, ce->name);
ca = &ca_buf;
}
-
-- return write_entry(ce, path.buf, NULL, state, 0);
-+ return write_entry(ce, path.buf, ca, state, 0);
- }
-
- void unlink_entry(const struct cache_entry *ce)
## entry.h ##
@@ entry.h: struct checkout {
@@ entry.h: struct checkout {
*/
-int checkout_entry(struct cache_entry *ce, const struct checkout *state,
- char *topath, int *nr_checkouts);
-+#define checkout_entry(ce, state, topath, nr_checkouts) \
-+ checkout_entry_ca(ce, NULL, state, topath, nr_checkouts)
+int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca,
+ const struct checkout *state, char *topath,
+ int *nr_checkouts);
++static inline int checkout_entry(struct cache_entry *ce,
++ const struct checkout *state, char *topath,
++ int *nr_checkouts)
++{
++ return checkout_entry_ca(ce, NULL, state, topath, nr_checkouts);
++}
void enable_delayed_checkout(struct checkout *state);
int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
10: bc8447cd9c < -: ---------- unpack-trees: add basic support for parallel checkout
11: 815137685a < -: ---------- parallel-checkout: make it truly parallel
12: 2b42621582 < -: ---------- parallel-checkout: support progress displaying
13: 960116579a < -: ---------- make_transient_cache_entry(): optionally alloc from mem_pool
14: fb9f2f580c < -: ---------- builtin/checkout.c: complete parallel checkout support
15: a844451e58 < -: ---------- checkout-index: add parallel checkout support
16: 3733857ffa < -: ---------- parallel-checkout: add tests for basic operations
17: c8a2974f81 < -: ---------- parallel-checkout: add tests related to clone collisions
18: 86fccd57d5 < -: ---------- parallel-checkout: add tests related to .gitattributes
19: 7f3e23cc38 < -: ---------- ci: run test round with parallel-checkout enabled
--
2.29.2
^ permalink raw reply [flat|nested] 154+ messages in thread
* [PATCH v5 1/9] convert: make convert_attrs() and convert structs public
2020-12-16 14:50 ` [PATCH v5 0/9] Parallel Checkout (part I) Matheus Tavares
@ 2020-12-16 14:50 ` Matheus Tavares
2020-12-16 14:50 ` [PATCH v5 2/9] convert: add [async_]convert_to_working_tree_ca() variants Matheus Tavares
` (10 subsequent siblings)
11 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-12-16 14:50 UTC (permalink / raw)
To: git
Cc: christian.couder, gitster, git, chriscool, peff, newren,
jrnieder, martin.agren, Jeff Hostetler
From: Jeff Hostetler <jeffhost@microsoft.com>
Move convert_attrs() declaration from convert.c to convert.h, together
with the conv_attrs struct and the crlf_action enum. This function and
the data structures will be used outside convert.c in the upcoming
parallel checkout implementation. Note that crlf_action is renamed to
convert_crlf_action, which is more appropriate for the global namespace.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
convert.c | 35 ++++++++---------------------------
convert.h | 24 ++++++++++++++++++++++++
2 files changed, 32 insertions(+), 27 deletions(-)
diff --git a/convert.c b/convert.c
index ee360c2f07..f13b001273 100644
--- a/convert.c
+++ b/convert.c
@@ -24,17 +24,6 @@
#define CONVERT_STAT_BITS_TXT_CRLF 0x2
#define CONVERT_STAT_BITS_BIN 0x4
-enum crlf_action {
- CRLF_UNDEFINED,
- CRLF_BINARY,
- CRLF_TEXT,
- CRLF_TEXT_INPUT,
- CRLF_TEXT_CRLF,
- CRLF_AUTO,
- CRLF_AUTO_INPUT,
- CRLF_AUTO_CRLF
-};
-
struct text_stat {
/* NUL, CR, LF and CRLF counts */
unsigned nul, lonecr, lonelf, crlf;
@@ -172,7 +161,7 @@ static int text_eol_is_crlf(void)
return 0;
}
-static enum eol output_eol(enum crlf_action crlf_action)
+static enum eol output_eol(enum convert_crlf_action crlf_action)
{
switch (crlf_action) {
case CRLF_BINARY:
@@ -246,7 +235,7 @@ static int has_crlf_in_index(const struct index_state *istate, const char *path)
}
static int will_convert_lf_to_crlf(struct text_stat *stats,
- enum crlf_action crlf_action)
+ enum convert_crlf_action crlf_action)
{
if (output_eol(crlf_action) != EOL_CRLF)
return 0;
@@ -499,7 +488,7 @@ static int encode_to_worktree(const char *path, const char *src, size_t src_len,
static int crlf_to_git(const struct index_state *istate,
const char *path, const char *src, size_t len,
struct strbuf *buf,
- enum crlf_action crlf_action, int conv_flags)
+ enum convert_crlf_action crlf_action, int conv_flags)
{
struct text_stat stats;
char *dst;
@@ -585,8 +574,8 @@ static int crlf_to_git(const struct index_state *istate,
return 1;
}
-static int crlf_to_worktree(const char *src, size_t len,
- struct strbuf *buf, enum crlf_action crlf_action)
+static int crlf_to_worktree(const char *src, size_t len, struct strbuf *buf,
+ enum convert_crlf_action crlf_action)
{
char *to_free = NULL;
struct text_stat stats;
@@ -1247,7 +1236,7 @@ static const char *git_path_check_encoding(struct attr_check_item *check)
return value;
}
-static enum crlf_action git_path_check_crlf(struct attr_check_item *check)
+static enum convert_crlf_action git_path_check_crlf(struct attr_check_item *check)
{
const char *value = check->value;
@@ -1297,18 +1286,10 @@ static int git_path_check_ident(struct attr_check_item *check)
return !!ATTR_TRUE(value);
}
-struct conv_attrs {
- struct convert_driver *drv;
- enum crlf_action attr_action; /* What attr says */
- enum crlf_action crlf_action; /* When no attr is set, use core.autocrlf */
- int ident;
- const char *working_tree_encoding; /* Supported encoding or default encoding if NULL */
-};
-
static struct attr_check *check;
-static void convert_attrs(const struct index_state *istate,
- struct conv_attrs *ca, const char *path)
+void convert_attrs(const struct index_state *istate,
+ struct conv_attrs *ca, const char *path)
{
struct attr_check_item *ccheck = NULL;
diff --git a/convert.h b/convert.h
index e29d1026a6..5678e99922 100644
--- a/convert.h
+++ b/convert.h
@@ -63,6 +63,30 @@ struct checkout_metadata {
struct object_id blob;
};
+enum convert_crlf_action {
+ CRLF_UNDEFINED,
+ CRLF_BINARY,
+ CRLF_TEXT,
+ CRLF_TEXT_INPUT,
+ CRLF_TEXT_CRLF,
+ CRLF_AUTO,
+ CRLF_AUTO_INPUT,
+ CRLF_AUTO_CRLF
+};
+
+struct convert_driver;
+
+struct conv_attrs {
+ struct convert_driver *drv;
+ enum convert_crlf_action attr_action; /* What attr says */
+ enum convert_crlf_action crlf_action; /* When no attr is set, use core.autocrlf */
+ int ident;
+ const char *working_tree_encoding; /* Supported encoding or default encoding if NULL */
+};
+
+void convert_attrs(const struct index_state *istate,
+ struct conv_attrs *ca, const char *path);
+
extern enum eol core_eol;
extern char *check_roundtrip_encoding;
const char *get_cached_convert_stats_ascii(const struct index_state *istate,
--
2.29.2
^ permalink raw reply related [flat|nested] 154+ messages in thread
* [PATCH v5 2/9] convert: add [async_]convert_to_working_tree_ca() variants
2020-12-16 14:50 ` [PATCH v5 0/9] Parallel Checkout (part I) Matheus Tavares
2020-12-16 14:50 ` [PATCH v5 1/9] convert: make convert_attrs() and convert structs public Matheus Tavares
@ 2020-12-16 14:50 ` Matheus Tavares
2020-12-16 14:50 ` [PATCH v5 3/9] convert: add get_stream_filter_ca() variant Matheus Tavares
` (9 subsequent siblings)
11 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-12-16 14:50 UTC (permalink / raw)
To: git
Cc: christian.couder, gitster, git, chriscool, peff, newren,
jrnieder, martin.agren, Jeff Hostetler
From: Jeff Hostetler <jeffhost@microsoft.com>
Separate the attribute gathering from the actual conversion by adding
_ca() variants of the conversion functions. These variants receive a
precomputed 'struct conv_attrs', not relying, thus, on an index state.
They will be used in a future patch adding parallel checkout support,
for two reasons:
- We will already load the conversion attributes in checkout_entry(),
before conversion, to decide whether a path is eligible for parallel
checkout. Therefore, it would be wasteful to load them again later,
for the actual conversion.
- The parallel workers will be responsible for reading, converting and
writing blobs to the working tree. They won't have access to the main
process' index state, so they cannot load the attributes. Instead,
they will receive the preloaded ones and call the _ca() variant of
the conversion functions. Furthermore, the attributes machinery is
optimized to handle paths in sequential order, so it is best left to
the main process anyway.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
convert.c | 60 +++++++++++++++++++++++++++++--------------------------
convert.h | 37 +++++++++++++++++++++++++---------
2 files changed, 60 insertions(+), 37 deletions(-)
diff --git a/convert.c b/convert.c
index f13b001273..0307374241 100644
--- a/convert.c
+++ b/convert.c
@@ -1447,19 +1447,16 @@ void convert_to_git_filter_fd(const struct index_state *istate,
ident_to_git(dst->buf, dst->len, dst, ca.ident);
}
-static int convert_to_working_tree_internal(const struct index_state *istate,
- const char *path, const char *src,
- size_t len, struct strbuf *dst,
- int normalizing,
- const struct checkout_metadata *meta,
- struct delayed_checkout *dco)
+static int convert_to_working_tree_ca_internal(const struct conv_attrs *ca,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ int normalizing,
+ const struct checkout_metadata *meta,
+ struct delayed_checkout *dco)
{
int ret = 0, ret_filter = 0;
- struct conv_attrs ca;
-
- convert_attrs(istate, &ca, path);
- ret |= ident_to_worktree(src, len, dst, ca.ident);
+ ret |= ident_to_worktree(src, len, dst, ca->ident);
if (ret) {
src = dst->buf;
len = dst->len;
@@ -1469,49 +1466,56 @@ static int convert_to_working_tree_internal(const struct index_state *istate,
* is a smudge or process filter (even if the process filter doesn't
* support smudge). The filters might expect CRLFs.
*/
- if ((ca.drv && (ca.drv->smudge || ca.drv->process)) || !normalizing) {
- ret |= crlf_to_worktree(src, len, dst, ca.crlf_action);
+ if ((ca->drv && (ca->drv->smudge || ca->drv->process)) || !normalizing) {
+ ret |= crlf_to_worktree(src, len, dst, ca->crlf_action);
if (ret) {
src = dst->buf;
len = dst->len;
}
}
- ret |= encode_to_worktree(path, src, len, dst, ca.working_tree_encoding);
+ ret |= encode_to_worktree(path, src, len, dst, ca->working_tree_encoding);
if (ret) {
src = dst->buf;
len = dst->len;
}
ret_filter = apply_filter(
- path, src, len, -1, dst, ca.drv, CAP_SMUDGE, meta, dco);
- if (!ret_filter && ca.drv && ca.drv->required)
- die(_("%s: smudge filter %s failed"), path, ca.drv->name);
+ path, src, len, -1, dst, ca->drv, CAP_SMUDGE, meta, dco);
+ if (!ret_filter && ca->drv && ca->drv->required)
+ die(_("%s: smudge filter %s failed"), path, ca->drv->name);
return ret | ret_filter;
}
-int async_convert_to_working_tree(const struct index_state *istate,
- const char *path, const char *src,
- size_t len, struct strbuf *dst,
- const struct checkout_metadata *meta,
- void *dco)
+int async_convert_to_working_tree_ca(const struct conv_attrs *ca,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta,
+ void *dco)
{
- return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, dco);
+ return convert_to_working_tree_ca_internal(ca, path, src, len, dst, 0,
+ meta, dco);
}
-int convert_to_working_tree(const struct index_state *istate,
- const char *path, const char *src,
- size_t len, struct strbuf *dst,
- const struct checkout_metadata *meta)
+int convert_to_working_tree_ca(const struct conv_attrs *ca,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta)
{
- return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, NULL);
+ return convert_to_working_tree_ca_internal(ca, path, src, len, dst, 0,
+ meta, NULL);
}
int renormalize_buffer(const struct index_state *istate, const char *path,
const char *src, size_t len, struct strbuf *dst)
{
- int ret = convert_to_working_tree_internal(istate, path, src, len, dst, 1, NULL, NULL);
+ struct conv_attrs ca;
+ int ret;
+
+ convert_attrs(istate, &ca, path);
+ ret = convert_to_working_tree_ca_internal(&ca, path, src, len, dst, 1,
+ NULL, NULL);
if (ret) {
src = dst->buf;
len = dst->len;
diff --git a/convert.h b/convert.h
index 5678e99922..a4838b5e5c 100644
--- a/convert.h
+++ b/convert.h
@@ -99,15 +99,34 @@ const char *get_convert_attr_ascii(const struct index_state *istate,
int convert_to_git(const struct index_state *istate,
const char *path, const char *src, size_t len,
struct strbuf *dst, int conv_flags);
-int convert_to_working_tree(const struct index_state *istate,
- const char *path, const char *src,
- size_t len, struct strbuf *dst,
- const struct checkout_metadata *meta);
-int async_convert_to_working_tree(const struct index_state *istate,
- const char *path, const char *src,
- size_t len, struct strbuf *dst,
- const struct checkout_metadata *meta,
- void *dco);
+int convert_to_working_tree_ca(const struct conv_attrs *ca,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta);
+int async_convert_to_working_tree_ca(const struct conv_attrs *ca,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta,
+ void *dco);
+static inline int convert_to_working_tree(const struct index_state *istate,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta)
+{
+ struct conv_attrs ca;
+ convert_attrs(istate, &ca, path);
+ return convert_to_working_tree_ca(&ca, path, src, len, dst, meta);
+}
+static inline int async_convert_to_working_tree(const struct index_state *istate,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta,
+ void *dco)
+{
+ struct conv_attrs ca;
+ convert_attrs(istate, &ca, path);
+ return async_convert_to_working_tree_ca(&ca, path, src, len, dst, meta, dco);
+}
int async_query_available_blobs(const char *cmd,
struct string_list *available_paths);
int renormalize_buffer(const struct index_state *istate,
--
2.29.2
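The refactoring above follows one recurring shape: hoist the attribute lookup into the caller, let a `_ca()` variant work only from the precomputed result, and keep the old entry point as a thin wrapper. A self-contained toy sketch of that shape follows; the structs, the `toy_` names, and the CRLF logic here are illustrative stand-ins, not Git's real types or behavior:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for Git's index_state and conv_attrs. */
struct index_state {
	const char *crlf_paths;	/* space-separated paths that get CRLF conversion */
};

struct conv_attrs {
	int crlf;	/* convert LF to CRLF on checkout? */
};

/* Stand-in for convert_attrs(): the (potentially expensive) lookup. */
static void toy_convert_attrs(const struct index_state *istate,
			      struct conv_attrs *ca, const char *path)
{
	ca->crlf = istate->crlf_paths && strstr(istate->crlf_paths, path) != NULL;
}

/* The _ca() variant: needs only the precomputed attributes, no index. */
static size_t toy_convert_ca(const struct conv_attrs *ca,
			     const char *src, char *dst, size_t dstlen)
{
	size_t n = 0;
	for (; *src && n + 2 < dstlen; src++) {
		if (ca->crlf && *src == '\n')
			dst[n++] = '\r';
		dst[n++] = *src;
	}
	dst[n] = '\0';
	return n;
}

/* The original entry point becomes a thin wrapper, as in convert.h. */
static size_t toy_convert(const struct index_state *istate, const char *path,
			  const char *src, char *dst, size_t dstlen)
{
	struct conv_attrs ca;

	toy_convert_attrs(istate, &ca, path);
	return toy_convert_ca(&ca, src, dst, dstlen);
}
```

A parallel worker would call only `toy_convert_ca()` with attributes it received from the main process, never the wrapper.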
* [PATCH v5 3/9] convert: add get_stream_filter_ca() variant
2020-12-16 14:50 ` [PATCH v5 0/9] Parallel Checkout (part I) Matheus Tavares
2020-12-16 14:50 ` [PATCH v5 1/9] convert: make convert_attrs() and convert structs public Matheus Tavares
2020-12-16 14:50 ` [PATCH v5 2/9] convert: add [async_]convert_to_working_tree_ca() variants Matheus Tavares
@ 2020-12-16 14:50 ` Matheus Tavares
2020-12-16 14:50 ` [PATCH v5 4/9] convert: add classification for conv_attrs struct Matheus Tavares
` (8 subsequent siblings)
11 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-12-16 14:50 UTC (permalink / raw)
To: git
Cc: christian.couder, gitster, git, chriscool, peff, newren,
jrnieder, martin.agren, Jeff Hostetler
From: Jeff Hostetler <jeffhost@microsoft.com>
Like the previous patch, we will also need to call get_stream_filter()
with a precomputed `struct conv_attrs`, when we add support for parallel
checkout workers. So add the _ca() variant which takes the conversion
attributes struct as a parameter.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
convert.c | 28 +++++++++++++++++-----------
convert.h | 2 ++
2 files changed, 19 insertions(+), 11 deletions(-)
diff --git a/convert.c b/convert.c
index 0307374241..9af6aafc5a 100644
--- a/convert.c
+++ b/convert.c
@@ -1942,34 +1942,31 @@ static struct stream_filter *ident_filter(const struct object_id *oid)
}
/*
- * Return an appropriately constructed filter for the path, or NULL if
+ * Return an appropriately constructed filter for the given ca, or NULL if
* the contents cannot be filtered without reading the whole thing
* in-core.
*
* Note that you would be crazy to set CRLF, smudge/clean or ident to a
* large binary blob you would want us not to slurp into the memory!
*/
-struct stream_filter *get_stream_filter(const struct index_state *istate,
- const char *path,
- const struct object_id *oid)
+struct stream_filter *get_stream_filter_ca(const struct conv_attrs *ca,
+ const struct object_id *oid)
{
- struct conv_attrs ca;
struct stream_filter *filter = NULL;
- convert_attrs(istate, &ca, path);
- if (ca.drv && (ca.drv->process || ca.drv->smudge || ca.drv->clean))
+ if (ca->drv && (ca->drv->process || ca->drv->smudge || ca->drv->clean))
return NULL;
- if (ca.working_tree_encoding)
+ if (ca->working_tree_encoding)
return NULL;
- if (ca.crlf_action == CRLF_AUTO || ca.crlf_action == CRLF_AUTO_CRLF)
+ if (ca->crlf_action == CRLF_AUTO || ca->crlf_action == CRLF_AUTO_CRLF)
return NULL;
- if (ca.ident)
+ if (ca->ident)
filter = ident_filter(oid);
- if (output_eol(ca.crlf_action) == EOL_CRLF)
+ if (output_eol(ca->crlf_action) == EOL_CRLF)
filter = cascade_filter(filter, lf_to_crlf_filter());
else
filter = cascade_filter(filter, &null_filter_singleton);
@@ -1977,6 +1974,15 @@ struct stream_filter *get_stream_filter(const struct index_state *istate,
return filter;
}
+struct stream_filter *get_stream_filter(const struct index_state *istate,
+ const char *path,
+ const struct object_id *oid)
+{
+ struct conv_attrs ca;
+ convert_attrs(istate, &ca, path);
+ return get_stream_filter_ca(&ca, oid);
+}
+
void free_stream_filter(struct stream_filter *filter)
{
filter->vtbl->free(filter);
diff --git a/convert.h b/convert.h
index a4838b5e5c..484b50965d 100644
--- a/convert.h
+++ b/convert.h
@@ -179,6 +179,8 @@ struct stream_filter; /* opaque */
struct stream_filter *get_stream_filter(const struct index_state *istate,
const char *path,
const struct object_id *);
+struct stream_filter *get_stream_filter_ca(const struct conv_attrs *ca,
+ const struct object_id *oid);
void free_stream_filter(struct stream_filter *);
int is_null_stream_filter(struct stream_filter *);
--
2.29.2
* [PATCH v5 4/9] convert: add classification for conv_attrs struct
2020-12-16 14:50 ` [PATCH v5 0/9] Parallel Checkout (part I) Matheus Tavares
` (2 preceding siblings ...)
2020-12-16 14:50 ` [PATCH v5 3/9] convert: add get_stream_filter_ca() variant Matheus Tavares
@ 2020-12-16 14:50 ` Matheus Tavares
2020-12-16 14:50 ` [PATCH v5 5/9] entry: extract a header file for entry.c functions Matheus Tavares
` (7 subsequent siblings)
11 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-12-16 14:50 UTC (permalink / raw)
To: git
Cc: christian.couder, gitster, git, chriscool, peff, newren,
jrnieder, martin.agren, Jeff Hostetler
From: Jeff Hostetler <jeffhost@microsoft.com>
Create `enum conv_attrs_classification` to express the different ways
that attributes are handled for a blob during checkout.
This will be used in a later commit when deciding whether to add a file
to the parallel or delayed queue during checkout. For now, we can also
use it in get_stream_filter_ca() to simplify the function (as the
classifying logic is the same).
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
convert.c | 26 +++++++++++++++++++-------
convert.h | 33 +++++++++++++++++++++++++++++++++
2 files changed, 52 insertions(+), 7 deletions(-)
diff --git a/convert.c b/convert.c
index 9af6aafc5a..b9d25d9a47 100644
--- a/convert.c
+++ b/convert.c
@@ -1954,13 +1954,7 @@ struct stream_filter *get_stream_filter_ca(const struct conv_attrs *ca,
{
struct stream_filter *filter = NULL;
- if (ca->drv && (ca->drv->process || ca->drv->smudge || ca->drv->clean))
- return NULL;
-
- if (ca->working_tree_encoding)
- return NULL;
-
- if (ca->crlf_action == CRLF_AUTO || ca->crlf_action == CRLF_AUTO_CRLF)
+ if (classify_conv_attrs(ca) != CA_CLASS_STREAMABLE)
return NULL;
if (ca->ident)
@@ -2016,3 +2010,21 @@ void clone_checkout_metadata(struct checkout_metadata *dst,
if (blob)
oidcpy(&dst->blob, blob);
}
+
+enum conv_attrs_classification classify_conv_attrs(const struct conv_attrs *ca)
+{
+ if (ca->drv) {
+ if (ca->drv->process)
+ return CA_CLASS_INCORE_PROCESS;
+ if (ca->drv->smudge || ca->drv->clean)
+ return CA_CLASS_INCORE_FILTER;
+ }
+
+ if (ca->working_tree_encoding)
+ return CA_CLASS_INCORE;
+
+ if (ca->crlf_action == CRLF_AUTO || ca->crlf_action == CRLF_AUTO_CRLF)
+ return CA_CLASS_INCORE;
+
+ return CA_CLASS_STREAMABLE;
+}
diff --git a/convert.h b/convert.h
index 484b50965d..43e567a59b 100644
--- a/convert.h
+++ b/convert.h
@@ -200,4 +200,37 @@ int stream_filter(struct stream_filter *,
const char *input, size_t *isize_p,
char *output, size_t *osize_p);
+enum conv_attrs_classification {
+ /*
+ * The blob must be loaded into a buffer before it can be
+ * smudged. All smudging is done in-proc.
+ */
+ CA_CLASS_INCORE,
+
+ /*
+ * The blob must be loaded into a buffer, but uses a
+ * single-file driver filter, such as rot13.
+ */
+ CA_CLASS_INCORE_FILTER,
+
+ /*
+ * The blob must be loaded into a buffer, but uses a
+ * long-running driver process, such as LFS. This might or
+ * might not use delayed operations. (The important thing is
+ * that there is a single subordinate long-running process
+ * handling all associated blobs and in case of delayed
+ * operations, may hold per-blob state.)
+ */
+ CA_CLASS_INCORE_PROCESS,
+
+ /*
+ * The blob can be streamed and smudged without needing to
+ * completely read it into a buffer.
+ */
+ CA_CLASS_STREAMABLE,
+};
+
+enum conv_attrs_classification classify_conv_attrs(
+ const struct conv_attrs *ca);
+
#endif /* CONVERT_H */
--
2.29.2
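The decision order in classify_conv_attrs() matters: a long-running process driver wins over a smudge/clean filter, any in-core requirement (working-tree encoding or auto-CRLF) wins over streaming, and only the remainder is streamable. The toy harness below mirrors the fields the real function consults, but the struct and enum definitions are simplified stand-ins, not Git's actual declarations:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified mirrors of what classify_conv_attrs() looks at. */
enum toy_crlf { TOY_CRLF_BINARY, TOY_CRLF_AUTO, TOY_CRLF_AUTO_CRLF };

struct toy_driver { int process, smudge, clean; };

struct toy_conv_attrs {
	const struct toy_driver *drv;
	const char *working_tree_encoding;
	enum toy_crlf crlf_action;
};

enum toy_classification {
	TOY_CLASS_INCORE,		/* must be loaded into a buffer */
	TOY_CLASS_INCORE_FILTER,	/* buffer + single-shot filter */
	TOY_CLASS_INCORE_PROCESS,	/* buffer + long-running process */
	TOY_CLASS_STREAMABLE,		/* can be streamed to the file */
};

/* Same priority as the patch: process > filter > in-core > streamable. */
static enum toy_classification toy_classify(const struct toy_conv_attrs *ca)
{
	if (ca->drv) {
		if (ca->drv->process)
			return TOY_CLASS_INCORE_PROCESS;
		if (ca->drv->smudge || ca->drv->clean)
			return TOY_CLASS_INCORE_FILTER;
	}
	if (ca->working_tree_encoding)
		return TOY_CLASS_INCORE;
	if (ca->crlf_action == TOY_CRLF_AUTO ||
	    ca->crlf_action == TOY_CRLF_AUTO_CRLF)
		return TOY_CLASS_INCORE;
	return TOY_CLASS_STREAMABLE;
}
```

With this classification in hand, get_stream_filter_ca() reduces to a single "is it streamable?" check, exactly as the diff above shows.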
* [PATCH v5 5/9] entry: extract a header file for entry.c functions
2020-12-16 14:50 ` [PATCH v5 0/9] Parallel Checkout (part I) Matheus Tavares
` (3 preceding siblings ...)
2020-12-16 14:50 ` [PATCH v5 4/9] convert: add classification for conv_attrs struct Matheus Tavares
@ 2020-12-16 14:50 ` Matheus Tavares
2020-12-16 14:50 ` [PATCH v5 6/9] entry: make fstat_output() and read_blob_entry() public Matheus Tavares
` (6 subsequent siblings)
11 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-12-16 14:50 UTC (permalink / raw)
To: git
Cc: christian.couder, gitster, git, chriscool, peff, newren,
jrnieder, martin.agren
The declarations of entry.c's public functions and structures currently
reside in cache.h. Although they are not many, they contribute to the
size of cache.h and, when changed, cause unnecessary recompilation of
modules that don't really use these functions. So let's move them to a
new entry.h header. While at it, let's also move a comment related to
checkout_entry() from entry.c to entry.h, as it is more useful to
describe the function there.
Original-patch-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
apply.c | 1 +
builtin/checkout-index.c | 1 +
builtin/checkout.c | 1 +
builtin/difftool.c | 1 +
cache.h | 24 -----------------------
entry.c | 9 +--------
entry.h | 42 ++++++++++++++++++++++++++++++++++++++++
unpack-trees.c | 1 +
8 files changed, 48 insertions(+), 32 deletions(-)
create mode 100644 entry.h
diff --git a/apply.c b/apply.c
index 4a4e9a0158..fda7b4f770 100644
--- a/apply.c
+++ b/apply.c
@@ -21,6 +21,7 @@
#include "quote.h"
#include "rerere.h"
#include "apply.h"
+#include "entry.h"
struct gitdiff_data {
struct strbuf *root;
diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c
index 4bbfc92dce..9276ed0258 100644
--- a/builtin/checkout-index.c
+++ b/builtin/checkout-index.c
@@ -11,6 +11,7 @@
#include "quote.h"
#include "cache-tree.h"
#include "parse-options.h"
+#include "entry.h"
#define CHECKOUT_ALL 4
static int nul_term_line;
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 9b82119129..f92f29bad3 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -26,6 +26,7 @@
#include "unpack-trees.h"
#include "wt-status.h"
#include "xdiff-interface.h"
+#include "entry.h"
static const char * const checkout_usage[] = {
N_("git checkout [<options>] <branch>"),
diff --git a/builtin/difftool.c b/builtin/difftool.c
index 6e18e623fd..ef25729d49 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -23,6 +23,7 @@
#include "lockfile.h"
#include "object-store.h"
#include "dir.h"
+#include "entry.h"
static int trust_exit_code;
diff --git a/cache.h b/cache.h
index 8d279bc110..a3a70cd8a4 100644
--- a/cache.h
+++ b/cache.h
@@ -1711,30 +1711,6 @@ const char *show_ident_date(const struct ident_split *id,
*/
int ident_cmp(const struct ident_split *, const struct ident_split *);
-struct checkout {
- struct index_state *istate;
- const char *base_dir;
- int base_dir_len;
- struct delayed_checkout *delayed_checkout;
- struct checkout_metadata meta;
- unsigned force:1,
- quiet:1,
- not_new:1,
- clone:1,
- refresh_cache:1;
-};
-#define CHECKOUT_INIT { NULL, "" }
-
-#define TEMPORARY_FILENAME_LENGTH 25
-int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath, int *nr_checkouts);
-void enable_delayed_checkout(struct checkout *state);
-int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
-/*
- * Unlink the last component and schedule the leading directories for
- * removal, such that empty directories get removed.
- */
-void unlink_entry(const struct cache_entry *ce);
-
struct cache_def {
struct strbuf path;
int flags;
diff --git a/entry.c b/entry.c
index a0532f1f00..b0b8099699 100644
--- a/entry.c
+++ b/entry.c
@@ -6,6 +6,7 @@
#include "submodule.h"
#include "progress.h"
#include "fsmonitor.h"
+#include "entry.h"
static void create_directories(const char *path, int path_len,
const struct checkout *state)
@@ -429,14 +430,6 @@ static void mark_colliding_entries(const struct checkout *state,
}
}
-/*
- * Write the contents from ce out to the working tree.
- *
- * When topath[] is not NULL, instead of writing to the working tree
- * file named by ce, a temporary file is created by this function and
- * its name is returned in topath[], which must be able to hold at
- * least TEMPORARY_FILENAME_LENGTH bytes long.
- */
int checkout_entry(struct cache_entry *ce, const struct checkout *state,
char *topath, int *nr_checkouts)
{
diff --git a/entry.h b/entry.h
new file mode 100644
index 0000000000..acbbb90220
--- /dev/null
+++ b/entry.h
@@ -0,0 +1,42 @@
+#ifndef ENTRY_H
+#define ENTRY_H
+
+#include "cache.h"
+#include "convert.h"
+
+struct checkout {
+ struct index_state *istate;
+ const char *base_dir;
+ int base_dir_len;
+ struct delayed_checkout *delayed_checkout;
+ struct checkout_metadata meta;
+ unsigned force:1,
+ quiet:1,
+ not_new:1,
+ clone:1,
+ refresh_cache:1;
+};
+#define CHECKOUT_INIT { NULL, "" }
+
+#define TEMPORARY_FILENAME_LENGTH 25
+/*
+ * Write the contents from ce out to the working tree.
+ *
+ * When topath[] is not NULL, instead of writing to the working tree
+ * file named by ce, a temporary file is created by this function and
+ * its name is returned in topath[], which must be able to hold at
+ * least TEMPORARY_FILENAME_LENGTH bytes long.
+ */
+int checkout_entry(struct cache_entry *ce, const struct checkout *state,
+ char *topath, int *nr_checkouts);
+
+void enable_delayed_checkout(struct checkout *state);
+int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
+
+/*
+ * Unlink the last component and schedule the leading directories for
+ * removal, such that empty directories get removed.
+ */
+void unlink_entry(const struct cache_entry *ce);
+
+#endif /* ENTRY_H */
diff --git a/unpack-trees.c b/unpack-trees.c
index 323280dd48..a511fadd89 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -16,6 +16,7 @@
#include "fsmonitor.h"
#include "object-store.h"
#include "promisor-remote.h"
+#include "entry.h"
/*
* Error messages expected by scripts out of plumbing commands such as
--
2.29.2
* [PATCH v5 6/9] entry: make fstat_output() and read_blob_entry() public
2020-12-16 14:50 ` [PATCH v5 0/9] Parallel Checkout (part I) Matheus Tavares
` (4 preceding siblings ...)
2020-12-16 14:50 ` [PATCH v5 5/9] entry: extract a header file for entry.c functions Matheus Tavares
@ 2020-12-16 14:50 ` Matheus Tavares
2020-12-16 14:50 ` [PATCH v5 7/9] entry: extract update_ce_after_write() from write_entry() Matheus Tavares
` (5 subsequent siblings)
11 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-12-16 14:50 UTC (permalink / raw)
To: git
Cc: christian.couder, gitster, git, chriscool, peff, newren,
jrnieder, martin.agren
These two functions will be used by the parallel checkout code, so let's
make them public. Note: fstat_output() is renamed to
fstat_checkout_output() now that it has become public, to avoid future
name collisions.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
entry.c | 8 ++++----
entry.h | 3 +++
2 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/entry.c b/entry.c
index b0b8099699..b36071a610 100644
--- a/entry.c
+++ b/entry.c
@@ -84,7 +84,7 @@ static int create_file(const char *path, unsigned int mode)
return open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
}
-static void *read_blob_entry(const struct cache_entry *ce, unsigned long *size)
+void *read_blob_entry(const struct cache_entry *ce, unsigned long *size)
{
enum object_type type;
void *blob_data = read_object_file(&ce->oid, &type, size);
@@ -109,7 +109,7 @@ static int open_output_fd(char *path, const struct cache_entry *ce, int to_tempf
}
}
-static int fstat_output(int fd, const struct checkout *state, struct stat *st)
+int fstat_checkout_output(int fd, const struct checkout *state, struct stat *st)
{
/* use fstat() only when path == ce->name */
if (fstat_is_reliable() &&
@@ -132,7 +132,7 @@ static int streaming_write_entry(const struct cache_entry *ce, char *path,
return -1;
result |= stream_blob_to_fd(fd, &ce->oid, filter, 1);
- *fstat_done = fstat_output(fd, state, statbuf);
+ *fstat_done = fstat_checkout_output(fd, state, statbuf);
result |= close(fd);
if (result)
@@ -346,7 +346,7 @@ static int write_entry(struct cache_entry *ce,
wrote = write_in_full(fd, new_blob, size);
if (!to_tempfile)
- fstat_done = fstat_output(fd, state, &st);
+ fstat_done = fstat_checkout_output(fd, state, &st);
close(fd);
free(new_blob);
if (wrote < 0)
diff --git a/entry.h b/entry.h
index acbbb90220..60df93ca78 100644
--- a/entry.h
+++ b/entry.h
@@ -39,4 +39,7 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
*/
void unlink_entry(const struct cache_entry *ce);
+void *read_blob_entry(const struct cache_entry *ce, unsigned long *size);
+int fstat_checkout_output(int fd, const struct checkout *state, struct stat *st);
+
#endif /* ENTRY_H */
--
2.29.2
* [PATCH v5 7/9] entry: extract update_ce_after_write() from write_entry()
2020-12-16 14:50 ` [PATCH v5 0/9] Parallel Checkout (part I) Matheus Tavares
` (5 preceding siblings ...)
2020-12-16 14:50 ` [PATCH v5 6/9] entry: make fstat_output() and read_blob_entry() public Matheus Tavares
@ 2020-12-16 14:50 ` Matheus Tavares
2020-12-16 14:50 ` [PATCH v5 8/9] entry: move conv_attrs lookup up to checkout_entry() Matheus Tavares
` (4 subsequent siblings)
11 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-12-16 14:50 UTC (permalink / raw)
To: git
Cc: christian.couder, gitster, git, chriscool, peff, newren,
jrnieder, martin.agren
The code that updates the in-memory index information after an entry is
written currently resides in write_entry(). Extract it to a public
function so that it can be called by the parallel checkout functions,
outside entry.c, in a later patch.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
entry.c | 25 ++++++++++++++++---------
entry.h | 2 ++
2 files changed, 18 insertions(+), 9 deletions(-)
diff --git a/entry.c b/entry.c
index b36071a610..1d2df188e5 100644
--- a/entry.c
+++ b/entry.c
@@ -251,6 +251,18 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts)
return errs;
}
+void update_ce_after_write(const struct checkout *state, struct cache_entry *ce,
+ struct stat *st)
+{
+ if (state->refresh_cache) {
+ assert(state->istate);
+ fill_stat_cache_info(state->istate, ce, st);
+ ce->ce_flags |= CE_UPDATE_IN_BASE;
+ mark_fsmonitor_invalid(state->istate, ce);
+ state->istate->cache_changed |= CE_ENTRY_CHANGED;
+ }
+}
+
static int write_entry(struct cache_entry *ce,
char *path, const struct checkout *state, int to_tempfile)
{
@@ -371,15 +383,10 @@ static int write_entry(struct cache_entry *ce,
finish:
if (state->refresh_cache) {
- assert(state->istate);
- if (!fstat_done)
- if (lstat(ce->name, &st) < 0)
- return error_errno("unable to stat just-written file %s",
- ce->name);
- fill_stat_cache_info(state->istate, ce, &st);
- ce->ce_flags |= CE_UPDATE_IN_BASE;
- mark_fsmonitor_invalid(state->istate, ce);
- state->istate->cache_changed |= CE_ENTRY_CHANGED;
+ if (!fstat_done && lstat(ce->name, &st) < 0)
+ return error_errno("unable to stat just-written file %s",
+ ce->name);
+ update_ce_after_write(state, ce, &st);
}
delayed:
return 0;
diff --git a/entry.h b/entry.h
index 60df93ca78..ea7290bcd5 100644
--- a/entry.h
+++ b/entry.h
@@ -41,5 +41,7 @@ void unlink_entry(const struct cache_entry *ce);
void *read_blob_entry(const struct cache_entry *ce, unsigned long *size);
int fstat_checkout_output(int fd, const struct checkout *state, struct stat *st);
+void update_ce_after_write(const struct checkout *state, struct cache_entry *ce,
+ struct stat *st);
#endif /* ENTRY_H */
--
2.29.2
* [PATCH v5 8/9] entry: move conv_attrs lookup up to checkout_entry()
2020-12-16 14:50 ` [PATCH v5 0/9] Parallel Checkout (part I) Matheus Tavares
` (6 preceding siblings ...)
2020-12-16 14:50 ` [PATCH v5 7/9] entry: extract update_ce_after_write() from write_entry() Matheus Tavares
@ 2020-12-16 14:50 ` Matheus Tavares
2020-12-16 14:50 ` [PATCH v5 9/9] entry: add checkout_entry_ca() taking preloaded conv_attrs Matheus Tavares
` (3 subsequent siblings)
11 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-12-16 14:50 UTC (permalink / raw)
To: git
Cc: christian.couder, gitster, git, chriscool, peff, newren,
jrnieder, martin.agren
In a following patch, checkout_entry() will use conv_attrs to decide
whether an entry should be enqueued for parallel checkout or not. But
the attributes lookup only happens lower in this call stack. To avoid
the unnecessary work of loading the attributes twice, let's move it up
to checkout_entry(), and pass the loaded struct down to write_entry().
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
entry.c | 38 +++++++++++++++++++++++++++-----------
1 file changed, 27 insertions(+), 11 deletions(-)
diff --git a/entry.c b/entry.c
index 1d2df188e5..62ddd348bc 100644
--- a/entry.c
+++ b/entry.c
@@ -263,8 +263,9 @@ void update_ce_after_write(const struct checkout *state, struct cache_entry *ce,
}
}
-static int write_entry(struct cache_entry *ce,
- char *path, const struct checkout *state, int to_tempfile)
+/* Note: ca is used (and required) iff the entry refers to a regular file. */
+static int write_entry(struct cache_entry *ce, char *path, struct conv_attrs *ca,
+ const struct checkout *state, int to_tempfile)
{
unsigned int ce_mode_s_ifmt = ce->ce_mode & S_IFMT;
struct delayed_checkout *dco = state->delayed_checkout;
@@ -281,8 +282,7 @@ static int write_entry(struct cache_entry *ce,
clone_checkout_metadata(&meta, &state->meta, &ce->oid);
if (ce_mode_s_ifmt == S_IFREG) {
- struct stream_filter *filter = get_stream_filter(state->istate, ce->name,
- &ce->oid);
+ struct stream_filter *filter = get_stream_filter_ca(ca, &ce->oid);
if (filter &&
!streaming_write_entry(ce, path, filter,
state, to_tempfile,
@@ -329,14 +329,17 @@ static int write_entry(struct cache_entry *ce,
* Convert from git internal format to working tree format
*/
if (dco && dco->state != CE_NO_DELAY) {
- ret = async_convert_to_working_tree(state->istate, ce->name, new_blob,
- size, &buf, &meta, dco);
+ ret = async_convert_to_working_tree_ca(ca, ce->name,
+ new_blob, size,
+ &buf, &meta, dco);
if (ret && string_list_has_string(&dco->paths, ce->name)) {
free(new_blob);
goto delayed;
}
- } else
- ret = convert_to_working_tree(state->istate, ce->name, new_blob, size, &buf, &meta);
+ } else {
+ ret = convert_to_working_tree_ca(ca, ce->name, new_blob,
+ size, &buf, &meta);
+ }
if (ret) {
free(new_blob);
@@ -442,6 +445,7 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
{
static struct strbuf path = STRBUF_INIT;
struct stat st;
+ struct conv_attrs ca_buf, *ca = NULL;
if (ce->ce_flags & CE_WT_REMOVE) {
if (topath)
@@ -454,8 +458,13 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
return 0;
}
- if (topath)
- return write_entry(ce, topath, state, 1);
+ if (topath) {
+ if (S_ISREG(ce->ce_mode)) {
+ convert_attrs(state->istate, &ca_buf, ce->name);
+ ca = &ca_buf;
+ }
+ return write_entry(ce, topath, ca, state, 1);
+ }
strbuf_reset(&path);
strbuf_add(&path, state->base_dir, state->base_dir_len);
@@ -517,9 +526,16 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
return 0;
create_directories(path.buf, path.len, state);
+
if (nr_checkouts)
(*nr_checkouts)++;
- return write_entry(ce, path.buf, state, 0);
+
+ if (S_ISREG(ce->ce_mode)) {
+ convert_attrs(state->istate, &ca_buf, ce->name);
+ ca = &ca_buf;
+ }
+
+ return write_entry(ce, path.buf, ca, state, 0);
}
void unlink_entry(const struct cache_entry *ce)
--
2.29.2
* [PATCH v5 9/9] entry: add checkout_entry_ca() taking preloaded conv_attrs
2020-12-16 14:50 ` [PATCH v5 0/9] Parallel Checkout (part I) Matheus Tavares
` (7 preceding siblings ...)
2020-12-16 14:50 ` [PATCH v5 8/9] entry: move conv_attrs lookup up to checkout_entry() Matheus Tavares
@ 2020-12-16 14:50 ` Matheus Tavares
2020-12-16 15:27 ` [PATCH v5 0/9] Parallel Checkout (part I) Christian Couder
` (2 subsequent siblings)
11 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2020-12-16 14:50 UTC (permalink / raw)
To: git
Cc: christian.couder, gitster, git, chriscool, peff, newren,
jrnieder, martin.agren
The parallel checkout machinery will call checkout_entry() for entries
that could not be written in parallel due to path collisions. At this
point, we will already be holding the conversion attributes for each
entry, and it would be wasteful to let checkout_entry() load these
again. Instead, let's add the checkout_entry_ca() variant, which
optionally takes a preloaded conv_attrs struct.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
entry.c | 11 ++++++-----
entry.h | 16 ++++++++++++++--
2 files changed, 20 insertions(+), 7 deletions(-)
diff --git a/entry.c b/entry.c
index 62ddd348bc..9d79a5671f 100644
--- a/entry.c
+++ b/entry.c
@@ -440,12 +440,13 @@ static void mark_colliding_entries(const struct checkout *state,
}
}
-int checkout_entry(struct cache_entry *ce, const struct checkout *state,
- char *topath, int *nr_checkouts)
+int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca,
+ const struct checkout *state, char *topath,
+ int *nr_checkouts)
{
static struct strbuf path = STRBUF_INIT;
struct stat st;
- struct conv_attrs ca_buf, *ca = NULL;
+ struct conv_attrs ca_buf;
if (ce->ce_flags & CE_WT_REMOVE) {
if (topath)
@@ -459,7 +460,7 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
}
if (topath) {
- if (S_ISREG(ce->ce_mode)) {
+ if (S_ISREG(ce->ce_mode) && !ca) {
convert_attrs(state->istate, &ca_buf, ce->name);
ca = &ca_buf;
}
@@ -530,7 +531,7 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
if (nr_checkouts)
(*nr_checkouts)++;
- if (S_ISREG(ce->ce_mode)) {
+ if (S_ISREG(ce->ce_mode) && !ca) {
convert_attrs(state->istate, &ca_buf, ce->name);
ca = &ca_buf;
}
diff --git a/entry.h b/entry.h
index ea7290bcd5..b8c0e170dc 100644
--- a/entry.h
+++ b/entry.h
@@ -26,9 +26,21 @@ struct checkout {
* file named by ce, a temporary file is created by this function and
* its name is returned in topath[], which must be able to hold at
* least TEMPORARY_FILENAME_LENGTH bytes long.
+ *
+ * With checkout_entry_ca(), callers can optionally pass a preloaded
+ * conv_attrs struct (to avoid reloading it), when ce refers to a
+ * regular file. If ca is NULL, the attributes will be loaded
+ * internally when (and if) needed.
*/
-int checkout_entry(struct cache_entry *ce, const struct checkout *state,
- char *topath, int *nr_checkouts);
+int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca,
+ const struct checkout *state, char *topath,
+ int *nr_checkouts);
+static inline int checkout_entry(struct cache_entry *ce,
+ const struct checkout *state, char *topath,
+ int *nr_checkouts)
+{
+ return checkout_entry_ca(ce, NULL, state, topath, nr_checkouts);
+}
void enable_delayed_checkout(struct checkout *state);
int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
--
2.29.2
^ permalink raw reply related [flat|nested] 154+ messages in thread
* Re: [PATCH v5 0/9] Parallel Checkout (part I)
2020-12-16 14:50 ` [PATCH v5 0/9] Parallel Checkout (part I) Matheus Tavares
` (8 preceding siblings ...)
2020-12-16 14:50 ` [PATCH v5 9/9] entry: add checkout_entry_ca() taking preloaded conv_attrs Matheus Tavares
@ 2020-12-16 15:27 ` Christian Couder
2020-12-17 1:11 ` Junio C Hamano
2021-03-23 14:19 ` [PATCH v6 0/9] Parallel Checkout (part 1) Matheus Tavares
11 siblings, 0 replies; 154+ messages in thread
From: Christian Couder @ 2020-12-16 15:27 UTC (permalink / raw)
To: Matheus Tavares
Cc: git, Junio C Hamano, Jeff Hostetler, Christian Couder, Jeff King,
Elijah Newren, Jonathan Nieder, Martin Ågren
On Wed, Dec 16, 2020 at 3:50 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> The previous rounds got many great suggestions about patches 1 to 10,
> but not as many comments on the latter patches. Christian commented that
> patches 10 and 11 are too long/complex, making the overall series harder
> to review. So he suggested that I eject patches 10 to 19, and send them
> later in a separate part. This will hopefully make the series easier to
> review and move forward. (I also hope to include a design doc in part 2
> to make those two bigger patches more digestible.)
Thanks, and yeah, sorry I suggested that privately, but should have
done it on the mailing list.
I actually think that patches 10 and 11 (which each one contains 400+
new lines) in the previous series should be mostly alone in a part 2,
with perhaps a part 3 that would have most of the rest, so
improvements and tests.
It might also be possible to split out at least a few things from
patches 10 and 11. For example, I think it's ok to add new
configuration in a separate patch even if it's not used yet; it can
just reserve the name. That could be in part 2 then.
> Changes since v4:
From a quick look at the range-diff, it looks good to me!
Thanks,
Christian.
^ permalink raw reply [flat|nested] 154+ messages in thread
* Re: [PATCH v5 0/9] Parallel Checkout (part I)
2020-12-16 14:50 ` [PATCH v5 0/9] Parallel Checkout (part I) Matheus Tavares
` (9 preceding siblings ...)
2020-12-16 15:27 ` [PATCH v5 0/9] Parallel Checkout (part I) Christian Couder
@ 2020-12-17 1:11 ` Junio C Hamano
2021-03-23 14:19 ` [PATCH v6 0/9] Parallel Checkout (part 1) Matheus Tavares
11 siblings, 0 replies; 154+ messages in thread
From: Junio C Hamano @ 2020-12-17 1:11 UTC (permalink / raw)
To: Matheus Tavares
Cc: git, christian.couder, git, chriscool, peff, newren, jrnieder,
martin.agren
Matheus Tavares <matheus.bernardino@usp.br> writes:
> The previous rounds got many great suggestions about patches 1 to 10,
> but not as many comments on the latter patches. Christian commented that
> patches 10 and 11 are too long/complex, making the overall series harder
> to review. So he suggested that I eject patches 10 to 19, and send them
> later in a separate part. This will hopefully make the series easier to
> review and move forward. (I also hope to include a design doc in part 2
> to make those two bigger patches more digestible.)
>
> So this part is now composed only of the 9 preparatory patches, which
> mainly focus on: (1) adding the 'struct conv_attrs' parameter to some
> convert.c and entry.c functions (to avoid re-loading the attributes
> during parallel checkout); and (2) making some functions public (for
> parallel-checkout.c's later use).
All of these patches look sensible to me.
Will replace. Thanks.
^ permalink raw reply [flat|nested] 154+ messages in thread
* [PATCH v6 0/9] Parallel Checkout (part 1)
2020-12-16 14:50 ` [PATCH v5 0/9] Parallel Checkout (part I) Matheus Tavares
` (10 preceding siblings ...)
2020-12-17 1:11 ` Junio C Hamano
@ 2021-03-23 14:19 ` Matheus Tavares
2021-03-23 14:19 ` [PATCH v6 1/9] convert: make convert_attrs() and convert structs public Matheus Tavares
` (9 more replies)
11 siblings, 10 replies; 154+ messages in thread
From: Matheus Tavares @ 2021-03-23 14:19 UTC (permalink / raw)
To: gitster; +Cc: git
Preparatory API changes for parallel checkout. This version was rebased
on top of 'master'. The only change required by the rebase was
including `entry.h` in `builtin/stash.c`, in the commit that creates
this header file.
Jeff Hostetler (4):
convert: make convert_attrs() and convert structs public
convert: add [async_]convert_to_working_tree_ca() variants
convert: add get_stream_filter_ca() variant
convert: add classification for conv_attrs struct
Matheus Tavares (5):
entry: extract a header file for entry.c functions
entry: make fstat_output() and read_blob_entry() public
entry: extract update_ce_after_write() from write_entry()
entry: move conv_attrs lookup up to checkout_entry()
entry: add checkout_entry_ca() taking preloaded conv_attrs
apply.c | 1 +
builtin/checkout-index.c | 1 +
builtin/checkout.c | 1 +
builtin/difftool.c | 1 +
builtin/stash.c | 1 +
cache.h | 24 -------
convert.c | 143 ++++++++++++++++++++-------------------
convert.h | 96 +++++++++++++++++++++++---
entry.c | 85 +++++++++++++----------
entry.h | 59 ++++++++++++++++
unpack-trees.c | 1 +
11 files changed, 276 insertions(+), 137 deletions(-)
create mode 100644 entry.h
Range-diff against v5:
1: 9909ccee14 = 1: 5e8a1b6a1c convert: make convert_attrs() and convert structs public
2: ec4e645aea = 2: 91d1a3063b convert: add [async_]convert_to_working_tree_ca() variants
3: 1834a54dfd = 3: ab1bf85b75 convert: add get_stream_filter_ca() variant
4: b26022af45 = 4: 01ac6176cc convert: add classification for conv_attrs struct
5: 837fccde5b ! 5: 4d85af1579 entry: extract a header file for entry.c functions
@@ builtin/difftool.c
static int trust_exit_code;
+ ## builtin/stash.c ##
+@@
+ #include "log-tree.h"
+ #include "diffcore.h"
+ #include "exec-cmd.h"
++#include "entry.h"
+
+ #define INCLUDE_ALL_FILES 2
+
+
## cache.h ##
@@ cache.h: const char *show_ident_date(const struct ident_split *id,
*/
6: 231e81fb82 = 6: af8b1691cc entry: make fstat_output() and read_blob_entry() public
7: 2fd8d35242 = 7: b28e518c0d entry: extract update_ce_after_write() from write_entry()
8: 7531ad195b = 8: 87b9d4590a entry: move conv_attrs lookup up to checkout_entry()
9: bf6b7259cb = 9: aa36bfee87 entry: add checkout_entry_ca() taking preloaded conv_attrs
--
2.30.1
^ permalink raw reply [flat|nested] 154+ messages in thread
* [PATCH v6 1/9] convert: make convert_attrs() and convert structs public
2021-03-23 14:19 ` [PATCH v6 0/9] Parallel Checkout (part 1) Matheus Tavares
@ 2021-03-23 14:19 ` Matheus Tavares
2021-03-23 14:19 ` [PATCH v6 2/9] convert: add [async_]convert_to_working_tree_ca() variants Matheus Tavares
` (8 subsequent siblings)
9 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2021-03-23 14:19 UTC (permalink / raw)
To: gitster; +Cc: git, Jeff Hostetler
From: Jeff Hostetler <jeffhost@microsoft.com>
Move convert_attrs() declaration from convert.c to convert.h, together
with the conv_attrs struct and the crlf_action enum. This function and
the data structures will be used outside convert.c in the upcoming
parallel checkout implementation. Note that crlf_action is renamed to
convert_crlf_action, which is more appropriate for the global namespace.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
convert.c | 35 ++++++++---------------------------
convert.h | 24 ++++++++++++++++++++++++
2 files changed, 32 insertions(+), 27 deletions(-)
diff --git a/convert.c b/convert.c
index 2d3a5a713c..d0a1a4ad9b 100644
--- a/convert.c
+++ b/convert.c
@@ -24,17 +24,6 @@
#define CONVERT_STAT_BITS_TXT_CRLF 0x2
#define CONVERT_STAT_BITS_BIN 0x4
-enum crlf_action {
- CRLF_UNDEFINED,
- CRLF_BINARY,
- CRLF_TEXT,
- CRLF_TEXT_INPUT,
- CRLF_TEXT_CRLF,
- CRLF_AUTO,
- CRLF_AUTO_INPUT,
- CRLF_AUTO_CRLF
-};
-
struct text_stat {
/* NUL, CR, LF and CRLF counts */
unsigned nul, lonecr, lonelf, crlf;
@@ -172,7 +161,7 @@ static int text_eol_is_crlf(void)
return 0;
}
-static enum eol output_eol(enum crlf_action crlf_action)
+static enum eol output_eol(enum convert_crlf_action crlf_action)
{
switch (crlf_action) {
case CRLF_BINARY:
@@ -246,7 +235,7 @@ static int has_crlf_in_index(const struct index_state *istate, const char *path)
}
static int will_convert_lf_to_crlf(struct text_stat *stats,
- enum crlf_action crlf_action)
+ enum convert_crlf_action crlf_action)
{
if (output_eol(crlf_action) != EOL_CRLF)
return 0;
@@ -499,7 +488,7 @@ static int encode_to_worktree(const char *path, const char *src, size_t src_len,
static int crlf_to_git(const struct index_state *istate,
const char *path, const char *src, size_t len,
struct strbuf *buf,
- enum crlf_action crlf_action, int conv_flags)
+ enum convert_crlf_action crlf_action, int conv_flags)
{
struct text_stat stats;
char *dst;
@@ -585,8 +574,8 @@ static int crlf_to_git(const struct index_state *istate,
return 1;
}
-static int crlf_to_worktree(const char *src, size_t len,
- struct strbuf *buf, enum crlf_action crlf_action)
+static int crlf_to_worktree(const char *src, size_t len, struct strbuf *buf,
+ enum convert_crlf_action crlf_action)
{
char *to_free = NULL;
struct text_stat stats;
@@ -1247,7 +1236,7 @@ static const char *git_path_check_encoding(struct attr_check_item *check)
return value;
}
-static enum crlf_action git_path_check_crlf(struct attr_check_item *check)
+static enum convert_crlf_action git_path_check_crlf(struct attr_check_item *check)
{
const char *value = check->value;
@@ -1297,18 +1286,10 @@ static int git_path_check_ident(struct attr_check_item *check)
return !!ATTR_TRUE(value);
}
-struct conv_attrs {
- struct convert_driver *drv;
- enum crlf_action attr_action; /* What attr says */
- enum crlf_action crlf_action; /* When no attr is set, use core.autocrlf */
- int ident;
- const char *working_tree_encoding; /* Supported encoding or default encoding if NULL */
-};
-
static struct attr_check *check;
-static void convert_attrs(const struct index_state *istate,
- struct conv_attrs *ca, const char *path)
+void convert_attrs(const struct index_state *istate,
+ struct conv_attrs *ca, const char *path)
{
struct attr_check_item *ccheck = NULL;
diff --git a/convert.h b/convert.h
index e29d1026a6..5678e99922 100644
--- a/convert.h
+++ b/convert.h
@@ -63,6 +63,30 @@ struct checkout_metadata {
struct object_id blob;
};
+enum convert_crlf_action {
+ CRLF_UNDEFINED,
+ CRLF_BINARY,
+ CRLF_TEXT,
+ CRLF_TEXT_INPUT,
+ CRLF_TEXT_CRLF,
+ CRLF_AUTO,
+ CRLF_AUTO_INPUT,
+ CRLF_AUTO_CRLF
+};
+
+struct convert_driver;
+
+struct conv_attrs {
+ struct convert_driver *drv;
+ enum convert_crlf_action attr_action; /* What attr says */
+ enum convert_crlf_action crlf_action; /* When no attr is set, use core.autocrlf */
+ int ident;
+ const char *working_tree_encoding; /* Supported encoding or default encoding if NULL */
+};
+
+void convert_attrs(const struct index_state *istate,
+ struct conv_attrs *ca, const char *path);
+
extern enum eol core_eol;
extern char *check_roundtrip_encoding;
const char *get_cached_convert_stats_ascii(const struct index_state *istate,
--
2.30.1
^ permalink raw reply related [flat|nested] 154+ messages in thread
* [PATCH v6 2/9] convert: add [async_]convert_to_working_tree_ca() variants
2021-03-23 14:19 ` [PATCH v6 0/9] Parallel Checkout (part 1) Matheus Tavares
2021-03-23 14:19 ` [PATCH v6 1/9] convert: make convert_attrs() and convert structs public Matheus Tavares
@ 2021-03-23 14:19 ` Matheus Tavares
2021-03-23 14:19 ` [PATCH v6 3/9] convert: add get_stream_filter_ca() variant Matheus Tavares
` (7 subsequent siblings)
9 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2021-03-23 14:19 UTC (permalink / raw)
To: gitster; +Cc: git, Jeff Hostetler
From: Jeff Hostetler <jeffhost@microsoft.com>
Separate the attribute gathering from the actual conversion by adding
_ca() variants of the conversion functions. These variants receive a
precomputed 'struct conv_attrs', thus not relying on an index state.
They will be used in a future patch adding parallel checkout support,
for two reasons:
- We will already load the conversion attributes in checkout_entry(),
before conversion, to decide whether a path is eligible for parallel
checkout. Therefore, it would be wasteful to load them again later,
for the actual conversion.
- The parallel workers will be responsible for reading, converting and
writing blobs to the working tree. They won't have access to the main
process' index state, so they cannot load the attributes. Instead,
they will receive the preloaded ones and call the _ca() variant of
the conversion functions. Furthermore, the attributes machinery is
optimized to handle paths in sequential order, so it's better to leave
it to the main process anyway.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
convert.c | 60 +++++++++++++++++++++++++++++--------------------------
convert.h | 37 +++++++++++++++++++++++++---------
2 files changed, 60 insertions(+), 37 deletions(-)
diff --git a/convert.c b/convert.c
index d0a1a4ad9b..f2be014af8 100644
--- a/convert.c
+++ b/convert.c
@@ -1446,19 +1446,16 @@ void convert_to_git_filter_fd(const struct index_state *istate,
ident_to_git(dst->buf, dst->len, dst, ca.ident);
}
-static int convert_to_working_tree_internal(const struct index_state *istate,
- const char *path, const char *src,
- size_t len, struct strbuf *dst,
- int normalizing,
- const struct checkout_metadata *meta,
- struct delayed_checkout *dco)
+static int convert_to_working_tree_ca_internal(const struct conv_attrs *ca,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ int normalizing,
+ const struct checkout_metadata *meta,
+ struct delayed_checkout *dco)
{
int ret = 0, ret_filter = 0;
- struct conv_attrs ca;
-
- convert_attrs(istate, &ca, path);
- ret |= ident_to_worktree(src, len, dst, ca.ident);
+ ret |= ident_to_worktree(src, len, dst, ca->ident);
if (ret) {
src = dst->buf;
len = dst->len;
@@ -1468,49 +1465,56 @@ static int convert_to_working_tree_internal(const struct index_state *istate,
* is a smudge or process filter (even if the process filter doesn't
* support smudge). The filters might expect CRLFs.
*/
- if ((ca.drv && (ca.drv->smudge || ca.drv->process)) || !normalizing) {
- ret |= crlf_to_worktree(src, len, dst, ca.crlf_action);
+ if ((ca->drv && (ca->drv->smudge || ca->drv->process)) || !normalizing) {
+ ret |= crlf_to_worktree(src, len, dst, ca->crlf_action);
if (ret) {
src = dst->buf;
len = dst->len;
}
}
- ret |= encode_to_worktree(path, src, len, dst, ca.working_tree_encoding);
+ ret |= encode_to_worktree(path, src, len, dst, ca->working_tree_encoding);
if (ret) {
src = dst->buf;
len = dst->len;
}
ret_filter = apply_filter(
- path, src, len, -1, dst, ca.drv, CAP_SMUDGE, meta, dco);
- if (!ret_filter && ca.drv && ca.drv->required)
- die(_("%s: smudge filter %s failed"), path, ca.drv->name);
+ path, src, len, -1, dst, ca->drv, CAP_SMUDGE, meta, dco);
+ if (!ret_filter && ca->drv && ca->drv->required)
+ die(_("%s: smudge filter %s failed"), path, ca->drv->name);
return ret | ret_filter;
}
-int async_convert_to_working_tree(const struct index_state *istate,
- const char *path, const char *src,
- size_t len, struct strbuf *dst,
- const struct checkout_metadata *meta,
- void *dco)
+int async_convert_to_working_tree_ca(const struct conv_attrs *ca,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta,
+ void *dco)
{
- return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, dco);
+ return convert_to_working_tree_ca_internal(ca, path, src, len, dst, 0,
+ meta, dco);
}
-int convert_to_working_tree(const struct index_state *istate,
- const char *path, const char *src,
- size_t len, struct strbuf *dst,
- const struct checkout_metadata *meta)
+int convert_to_working_tree_ca(const struct conv_attrs *ca,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta)
{
- return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, NULL);
+ return convert_to_working_tree_ca_internal(ca, path, src, len, dst, 0,
+ meta, NULL);
}
int renormalize_buffer(const struct index_state *istate, const char *path,
const char *src, size_t len, struct strbuf *dst)
{
- int ret = convert_to_working_tree_internal(istate, path, src, len, dst, 1, NULL, NULL);
+ struct conv_attrs ca;
+ int ret;
+
+ convert_attrs(istate, &ca, path);
+ ret = convert_to_working_tree_ca_internal(&ca, path, src, len, dst, 1,
+ NULL, NULL);
if (ret) {
src = dst->buf;
len = dst->len;
diff --git a/convert.h b/convert.h
index 5678e99922..a4838b5e5c 100644
--- a/convert.h
+++ b/convert.h
@@ -99,15 +99,34 @@ const char *get_convert_attr_ascii(const struct index_state *istate,
int convert_to_git(const struct index_state *istate,
const char *path, const char *src, size_t len,
struct strbuf *dst, int conv_flags);
-int convert_to_working_tree(const struct index_state *istate,
- const char *path, const char *src,
- size_t len, struct strbuf *dst,
- const struct checkout_metadata *meta);
-int async_convert_to_working_tree(const struct index_state *istate,
- const char *path, const char *src,
- size_t len, struct strbuf *dst,
- const struct checkout_metadata *meta,
- void *dco);
+int convert_to_working_tree_ca(const struct conv_attrs *ca,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta);
+int async_convert_to_working_tree_ca(const struct conv_attrs *ca,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta,
+ void *dco);
+static inline int convert_to_working_tree(const struct index_state *istate,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta)
+{
+ struct conv_attrs ca;
+ convert_attrs(istate, &ca, path);
+ return convert_to_working_tree_ca(&ca, path, src, len, dst, meta);
+}
+static inline int async_convert_to_working_tree(const struct index_state *istate,
+ const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ const struct checkout_metadata *meta,
+ void *dco)
+{
+ struct conv_attrs ca;
+ convert_attrs(istate, &ca, path);
+ return async_convert_to_working_tree_ca(&ca, path, src, len, dst, meta, dco);
+}
int async_query_available_blobs(const char *cmd,
struct string_list *available_paths);
int renormalize_buffer(const struct index_state *istate,
--
2.30.1
^ permalink raw reply related [flat|nested] 154+ messages in thread
* [PATCH v6 3/9] convert: add get_stream_filter_ca() variant
2021-03-23 14:19 ` [PATCH v6 0/9] Parallel Checkout (part 1) Matheus Tavares
2021-03-23 14:19 ` [PATCH v6 1/9] convert: make convert_attrs() and convert structs public Matheus Tavares
2021-03-23 14:19 ` [PATCH v6 2/9] convert: add [async_]convert_to_working_tree_ca() variants Matheus Tavares
@ 2021-03-23 14:19 ` Matheus Tavares
2021-03-23 14:19 ` [PATCH v6 4/9] convert: add classification for conv_attrs struct Matheus Tavares
` (6 subsequent siblings)
9 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2021-03-23 14:19 UTC (permalink / raw)
To: gitster; +Cc: git, Jeff Hostetler
From: Jeff Hostetler <jeffhost@microsoft.com>
Like the previous patch, we will also need to call get_stream_filter()
with a precomputed `struct conv_attrs`, when we add support for parallel
checkout workers. So add the _ca() variant which takes the conversion
attributes struct as a parameter.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
convert.c | 28 +++++++++++++++++-----------
convert.h | 2 ++
2 files changed, 19 insertions(+), 11 deletions(-)
diff --git a/convert.c b/convert.c
index f2be014af8..3f39c62cac 100644
--- a/convert.c
+++ b/convert.c
@@ -1941,34 +1941,31 @@ static struct stream_filter *ident_filter(const struct object_id *oid)
}
/*
- * Return an appropriately constructed filter for the path, or NULL if
+ * Return an appropriately constructed filter for the given ca, or NULL if
* the contents cannot be filtered without reading the whole thing
* in-core.
*
* Note that you would be crazy to set CRLF, smudge/clean or ident to a
* large binary blob you would want us not to slurp into the memory!
*/
-struct stream_filter *get_stream_filter(const struct index_state *istate,
- const char *path,
- const struct object_id *oid)
+struct stream_filter *get_stream_filter_ca(const struct conv_attrs *ca,
+ const struct object_id *oid)
{
- struct conv_attrs ca;
struct stream_filter *filter = NULL;
- convert_attrs(istate, &ca, path);
- if (ca.drv && (ca.drv->process || ca.drv->smudge || ca.drv->clean))
+ if (ca->drv && (ca->drv->process || ca->drv->smudge || ca->drv->clean))
return NULL;
- if (ca.working_tree_encoding)
+ if (ca->working_tree_encoding)
return NULL;
- if (ca.crlf_action == CRLF_AUTO || ca.crlf_action == CRLF_AUTO_CRLF)
+ if (ca->crlf_action == CRLF_AUTO || ca->crlf_action == CRLF_AUTO_CRLF)
return NULL;
- if (ca.ident)
+ if (ca->ident)
filter = ident_filter(oid);
- if (output_eol(ca.crlf_action) == EOL_CRLF)
+ if (output_eol(ca->crlf_action) == EOL_CRLF)
filter = cascade_filter(filter, lf_to_crlf_filter());
else
filter = cascade_filter(filter, &null_filter_singleton);
@@ -1976,6 +1973,15 @@ struct stream_filter *get_stream_filter(const struct index_state *istate,
return filter;
}
+struct stream_filter *get_stream_filter(const struct index_state *istate,
+ const char *path,
+ const struct object_id *oid)
+{
+ struct conv_attrs ca;
+ convert_attrs(istate, &ca, path);
+ return get_stream_filter_ca(&ca, oid);
+}
+
void free_stream_filter(struct stream_filter *filter)
{
filter->vtbl->free(filter);
diff --git a/convert.h b/convert.h
index a4838b5e5c..484b50965d 100644
--- a/convert.h
+++ b/convert.h
@@ -179,6 +179,8 @@ struct stream_filter; /* opaque */
struct stream_filter *get_stream_filter(const struct index_state *istate,
const char *path,
const struct object_id *);
+struct stream_filter *get_stream_filter_ca(const struct conv_attrs *ca,
+ const struct object_id *oid);
void free_stream_filter(struct stream_filter *);
int is_null_stream_filter(struct stream_filter *);
--
2.30.1
^ permalink raw reply related [flat|nested] 154+ messages in thread
* [PATCH v6 4/9] convert: add classification for conv_attrs struct
2021-03-23 14:19 ` [PATCH v6 0/9] Parallel Checkout (part 1) Matheus Tavares
` (2 preceding siblings ...)
2021-03-23 14:19 ` [PATCH v6 3/9] convert: add get_stream_filter_ca() variant Matheus Tavares
@ 2021-03-23 14:19 ` Matheus Tavares
2021-03-23 14:19 ` [PATCH v6 5/9] entry: extract a header file for entry.c functions Matheus Tavares
` (5 subsequent siblings)
9 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2021-03-23 14:19 UTC (permalink / raw)
To: gitster; +Cc: git, Jeff Hostetler
From: Jeff Hostetler <jeffhost@microsoft.com>
Create `enum conv_attrs_classification` to express the different ways
that attributes are handled for a blob during checkout.
This will be used in a later commit when deciding whether to add a file
to the parallel or delayed queue during checkout. For now, we can also
use it in get_stream_filter_ca() to simplify the function (as the
classifying logic is the same).
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
convert.c | 26 +++++++++++++++++++-------
convert.h | 33 +++++++++++++++++++++++++++++++++
2 files changed, 52 insertions(+), 7 deletions(-)
diff --git a/convert.c b/convert.c
index 3f39c62cac..3298e4acff 100644
--- a/convert.c
+++ b/convert.c
@@ -1953,13 +1953,7 @@ struct stream_filter *get_stream_filter_ca(const struct conv_attrs *ca,
{
struct stream_filter *filter = NULL;
- if (ca->drv && (ca->drv->process || ca->drv->smudge || ca->drv->clean))
- return NULL;
-
- if (ca->working_tree_encoding)
- return NULL;
-
- if (ca->crlf_action == CRLF_AUTO || ca->crlf_action == CRLF_AUTO_CRLF)
+ if (classify_conv_attrs(ca) != CA_CLASS_STREAMABLE)
return NULL;
if (ca->ident)
@@ -2015,3 +2009,21 @@ void clone_checkout_metadata(struct checkout_metadata *dst,
if (blob)
oidcpy(&dst->blob, blob);
}
+
+enum conv_attrs_classification classify_conv_attrs(const struct conv_attrs *ca)
+{
+ if (ca->drv) {
+ if (ca->drv->process)
+ return CA_CLASS_INCORE_PROCESS;
+ if (ca->drv->smudge || ca->drv->clean)
+ return CA_CLASS_INCORE_FILTER;
+ }
+
+ if (ca->working_tree_encoding)
+ return CA_CLASS_INCORE;
+
+ if (ca->crlf_action == CRLF_AUTO || ca->crlf_action == CRLF_AUTO_CRLF)
+ return CA_CLASS_INCORE;
+
+ return CA_CLASS_STREAMABLE;
+}
diff --git a/convert.h b/convert.h
index 484b50965d..43e567a59b 100644
--- a/convert.h
+++ b/convert.h
@@ -200,4 +200,37 @@ int stream_filter(struct stream_filter *,
const char *input, size_t *isize_p,
char *output, size_t *osize_p);
+enum conv_attrs_classification {
+ /*
+ * The blob must be loaded into a buffer before it can be
+ * smudged. All smudging is done in-proc.
+ */
+ CA_CLASS_INCORE,
+
+ /*
+ * The blob must be loaded into a buffer, but uses a
+ * single-file driver filter, such as rot13.
+ */
+ CA_CLASS_INCORE_FILTER,
+
+ /*
+ * The blob must be loaded into a buffer, but uses a
+ * long-running driver process, such as LFS. This might or
+ * might not use delayed operations. (The important thing is
+ * that there is a single subordinate long-running process
+ * handling all associated blobs and in case of delayed
+ * operations, may hold per-blob state.)
+ */
+ CA_CLASS_INCORE_PROCESS,
+
+ /*
+ * The blob can be streamed and smudged without needing to
+ * completely read it into a buffer.
+ */
+ CA_CLASS_STREAMABLE,
+};
+
+enum conv_attrs_classification classify_conv_attrs(
+ const struct conv_attrs *ca);
+
#endif /* CONVERT_H */
--
2.30.1
^ permalink raw reply related [flat|nested] 154+ messages in thread
* [PATCH v6 5/9] entry: extract a header file for entry.c functions
2021-03-23 14:19 ` [PATCH v6 0/9] Parallel Checkout (part 1) Matheus Tavares
` (3 preceding siblings ...)
2021-03-23 14:19 ` [PATCH v6 4/9] convert: add classification for conv_attrs struct Matheus Tavares
@ 2021-03-23 14:19 ` Matheus Tavares
2021-03-23 14:19 ` [PATCH v6 6/9] entry: make fstat_output() and read_blob_entry() public Matheus Tavares
` (4 subsequent siblings)
9 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2021-03-23 14:19 UTC (permalink / raw)
To: gitster; +Cc: git
The declarations of entry.c's public functions and structures currently
reside in cache.h. Although not many, they contribute to the size of
cache.h and, when changed, cause the unnecessary recompilation of
modules that don't really use these functions. So let's move them to a
new entry.h header. While at it, let's also move a comment related to
checkout_entry() from entry.c to entry.h as it's more useful to describe
the function there.
Original-patch-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
apply.c | 1 +
builtin/checkout-index.c | 1 +
builtin/checkout.c | 1 +
builtin/difftool.c | 1 +
builtin/stash.c | 1 +
cache.h | 24 -----------------------
entry.c | 9 +--------
entry.h | 42 ++++++++++++++++++++++++++++++++++++++++
unpack-trees.c | 1 +
9 files changed, 49 insertions(+), 32 deletions(-)
create mode 100644 entry.h
diff --git a/apply.c b/apply.c
index 6695a931e9..466f880d73 100644
--- a/apply.c
+++ b/apply.c
@@ -21,6 +21,7 @@
#include "quote.h"
#include "rerere.h"
#include "apply.h"
+#include "entry.h"
struct gitdiff_data {
struct strbuf *root;
diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c
index 023e49e271..c0bf4ac1b2 100644
--- a/builtin/checkout-index.c
+++ b/builtin/checkout-index.c
@@ -11,6 +11,7 @@
#include "quote.h"
#include "cache-tree.h"
#include "parse-options.h"
+#include "entry.h"
#define CHECKOUT_ALL 4
static int nul_term_line;
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 2d6550bc3c..06c90d76e8 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -26,6 +26,7 @@
#include "unpack-trees.h"
#include "wt-status.h"
#include "xdiff-interface.h"
+#include "entry.h"
static const char * const checkout_usage[] = {
N_("git checkout [<options>] <branch>"),
diff --git a/builtin/difftool.c b/builtin/difftool.c
index 6e18e623fd..ef25729d49 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -23,6 +23,7 @@
#include "lockfile.h"
#include "object-store.h"
#include "dir.h"
+#include "entry.h"
static int trust_exit_code;
diff --git a/builtin/stash.c b/builtin/stash.c
index 3477e940e3..0cdcc75618 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -15,6 +15,7 @@
#include "log-tree.h"
#include "diffcore.h"
#include "exec-cmd.h"
+#include "entry.h"
#define INCLUDE_ALL_FILES 2
diff --git a/cache.h b/cache.h
index 6fda8091f1..5d45d145fa 100644
--- a/cache.h
+++ b/cache.h
@@ -1621,30 +1621,6 @@ const char *show_ident_date(const struct ident_split *id,
*/
int ident_cmp(const struct ident_split *, const struct ident_split *);
-struct checkout {
- struct index_state *istate;
- const char *base_dir;
- int base_dir_len;
- struct delayed_checkout *delayed_checkout;
- struct checkout_metadata meta;
- unsigned force:1,
- quiet:1,
- not_new:1,
- clone:1,
- refresh_cache:1;
-};
-#define CHECKOUT_INIT { NULL, "" }
-
-#define TEMPORARY_FILENAME_LENGTH 25
-int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath, int *nr_checkouts);
-void enable_delayed_checkout(struct checkout *state);
-int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
-/*
- * Unlink the last component and schedule the leading directories for
- * removal, such that empty directories get removed.
- */
-void unlink_entry(const struct cache_entry *ce);
-
struct cache_def {
struct strbuf path;
int flags;
diff --git a/entry.c b/entry.c
index 7b9f43716f..c3e511bfb3 100644
--- a/entry.c
+++ b/entry.c
@@ -6,6 +6,7 @@
#include "submodule.h"
#include "progress.h"
#include "fsmonitor.h"
+#include "entry.h"
static void create_directories(const char *path, int path_len,
const struct checkout *state)
@@ -429,14 +430,6 @@ static void mark_colliding_entries(const struct checkout *state,
}
}
-/*
- * Write the contents from ce out to the working tree.
- *
- * When topath[] is not NULL, instead of writing to the working tree
- * file named by ce, a temporary file is created by this function and
- * its name is returned in topath[], which must be able to hold at
- * least TEMPORARY_FILENAME_LENGTH bytes long.
- */
int checkout_entry(struct cache_entry *ce, const struct checkout *state,
char *topath, int *nr_checkouts)
{
diff --git a/entry.h b/entry.h
new file mode 100644
index 0000000000..acbbb90220
--- /dev/null
+++ b/entry.h
@@ -0,0 +1,42 @@
+#ifndef ENTRY_H
+#define ENTRY_H
+
+#include "cache.h"
+#include "convert.h"
+
+struct checkout {
+ struct index_state *istate;
+ const char *base_dir;
+ int base_dir_len;
+ struct delayed_checkout *delayed_checkout;
+ struct checkout_metadata meta;
+ unsigned force:1,
+ quiet:1,
+ not_new:1,
+ clone:1,
+ refresh_cache:1;
+};
+#define CHECKOUT_INIT { NULL, "" }
+
+#define TEMPORARY_FILENAME_LENGTH 25
+/*
+ * Write the contents from ce out to the working tree.
+ *
+ * When topath[] is not NULL, instead of writing to the working tree
+ * file named by ce, a temporary file is created by this function and
+ * its name is returned in topath[], which must be able to hold at
+ * least TEMPORARY_FILENAME_LENGTH bytes.
+ */
+int checkout_entry(struct cache_entry *ce, const struct checkout *state,
+ char *topath, int *nr_checkouts);
+
+void enable_delayed_checkout(struct checkout *state);
+int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
+
+/*
+ * Unlink the last component and schedule the leading directories for
+ * removal, such that empty directories get removed.
+ */
+void unlink_entry(const struct cache_entry *ce);
+
+#endif /* ENTRY_H */
diff --git a/unpack-trees.c b/unpack-trees.c
index 9af8e796b3..f6cc6a8117 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -16,6 +16,7 @@
#include "fsmonitor.h"
#include "object-store.h"
#include "promisor-remote.h"
+#include "entry.h"
/*
* Error messages expected by scripts out of plumbing commands such as
--
2.30.1
^ permalink raw reply related [flat|nested] 154+ messages in thread
* [PATCH v6 6/9] entry: make fstat_output() and read_blob_entry() public
2021-03-23 14:19 ` [PATCH v6 0/9] Parallel Checkout (part 1) Matheus Tavares
` (4 preceding siblings ...)
2021-03-23 14:19 ` [PATCH v6 5/9] entry: extract a header file for entry.c functions Matheus Tavares
@ 2021-03-23 14:19 ` Matheus Tavares
2021-03-23 14:19 ` [PATCH v6 7/9] entry: extract update_ce_after_write() from write_entry() Matheus Tavares
` (3 subsequent siblings)
9 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2021-03-23 14:19 UTC (permalink / raw)
To: gitster; +Cc: git
These two functions will be used by the parallel checkout code, so let's
make them public. Note: fstat_output() is renamed to
fstat_checkout_output() now that it is public, to avoid future name
collisions.
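Why fstat() on the still-open descriptor is preferable to a second
lstat() by path can be illustrated with a small standalone sketch. This
is not Git's code; stat_written_file() and write_and_stat() are
illustrative names, and the real helper additionally checks that the
written path matches ce->name before trusting fstat():

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for fstat_checkout_output(): prefer fstat()
 * on the still-open fd, falling back to lstat() by path.
 */
static int stat_written_file(int fd, const char *path, struct stat *st)
{
	if (fd >= 0)
		return fstat(fd, st); /* no extra path lookup, no race */
	return lstat(path, st);
}

/* Write data to a fresh file and capture its stat data before close(). */
int write_and_stat(const char *path, const char *data, struct stat *st)
{
	int ret;
	int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);

	if (fd < 0)
		return -1;
	if (write(fd, data, strlen(data)) < 0) {
		close(fd);
		return -1;
	}
	ret = stat_written_file(fd, path, st); /* stat before close */
	close(fd);
	return ret;
}
```

Taking the stat data from the open fd avoids re-resolving the path and
avoids racing with anything else that might touch the same pathname
between the write and the stat.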
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
entry.c | 8 ++++----
entry.h | 3 +++
2 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/entry.c b/entry.c
index c3e511bfb3..1e2d9f7baa 100644
--- a/entry.c
+++ b/entry.c
@@ -84,7 +84,7 @@ static int create_file(const char *path, unsigned int mode)
return open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
}
-static void *read_blob_entry(const struct cache_entry *ce, unsigned long *size)
+void *read_blob_entry(const struct cache_entry *ce, unsigned long *size)
{
enum object_type type;
void *blob_data = read_object_file(&ce->oid, &type, size);
@@ -109,7 +109,7 @@ static int open_output_fd(char *path, const struct cache_entry *ce, int to_tempf
}
}
-static int fstat_output(int fd, const struct checkout *state, struct stat *st)
+int fstat_checkout_output(int fd, const struct checkout *state, struct stat *st)
{
/* use fstat() only when path == ce->name */
if (fstat_is_reliable() &&
@@ -132,7 +132,7 @@ static int streaming_write_entry(const struct cache_entry *ce, char *path,
return -1;
result |= stream_blob_to_fd(fd, &ce->oid, filter, 1);
- *fstat_done = fstat_output(fd, state, statbuf);
+ *fstat_done = fstat_checkout_output(fd, state, statbuf);
result |= close(fd);
if (result)
@@ -346,7 +346,7 @@ static int write_entry(struct cache_entry *ce,
wrote = write_in_full(fd, new_blob, size);
if (!to_tempfile)
- fstat_done = fstat_output(fd, state, &st);
+ fstat_done = fstat_checkout_output(fd, state, &st);
close(fd);
free(new_blob);
if (wrote < 0)
diff --git a/entry.h b/entry.h
index acbbb90220..60df93ca78 100644
--- a/entry.h
+++ b/entry.h
@@ -39,4 +39,7 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
*/
void unlink_entry(const struct cache_entry *ce);
+void *read_blob_entry(const struct cache_entry *ce, unsigned long *size);
+int fstat_checkout_output(int fd, const struct checkout *state, struct stat *st);
+
#endif /* ENTRY_H */
--
2.30.1
* [PATCH v6 7/9] entry: extract update_ce_after_write() from write_entry()
2021-03-23 14:19 ` [PATCH v6 0/9] Parallel Checkout (part 1) Matheus Tavares
` (5 preceding siblings ...)
2021-03-23 14:19 ` [PATCH v6 6/9] entry: make fstat_output() and read_blob_entry() public Matheus Tavares
@ 2021-03-23 14:19 ` Matheus Tavares
2021-03-23 14:19 ` [PATCH v6 8/9] entry: move conv_attrs lookup up to checkout_entry() Matheus Tavares
` (2 subsequent siblings)
9 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2021-03-23 14:19 UTC (permalink / raw)
To: gitster; +Cc: git
The code that updates the in-memory index information after an entry is
written currently resides in write_entry(). Extract it to a public
function so that it can be called by the parallel checkout functions,
outside entry.c, in a later patch.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
entry.c | 25 ++++++++++++++++---------
entry.h | 2 ++
2 files changed, 18 insertions(+), 9 deletions(-)
diff --git a/entry.c b/entry.c
index 1e2d9f7baa..4cf0db352f 100644
--- a/entry.c
+++ b/entry.c
@@ -251,6 +251,18 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts)
return errs;
}
+void update_ce_after_write(const struct checkout *state, struct cache_entry *ce,
+ struct stat *st)
+{
+ if (state->refresh_cache) {
+ assert(state->istate);
+ fill_stat_cache_info(state->istate, ce, st);
+ ce->ce_flags |= CE_UPDATE_IN_BASE;
+ mark_fsmonitor_invalid(state->istate, ce);
+ state->istate->cache_changed |= CE_ENTRY_CHANGED;
+ }
+}
+
static int write_entry(struct cache_entry *ce,
char *path, const struct checkout *state, int to_tempfile)
{
@@ -371,15 +383,10 @@ static int write_entry(struct cache_entry *ce,
finish:
if (state->refresh_cache) {
- assert(state->istate);
- if (!fstat_done)
- if (lstat(ce->name, &st) < 0)
- return error_errno("unable to stat just-written file %s",
- ce->name);
- fill_stat_cache_info(state->istate, ce, &st);
- ce->ce_flags |= CE_UPDATE_IN_BASE;
- mark_fsmonitor_invalid(state->istate, ce);
- state->istate->cache_changed |= CE_ENTRY_CHANGED;
+ if (!fstat_done && lstat(ce->name, &st) < 0)
+ return error_errno("unable to stat just-written file %s",
+ ce->name);
+ update_ce_after_write(state, ce, &st);
}
delayed:
return 0;
diff --git a/entry.h b/entry.h
index 60df93ca78..ea7290bcd5 100644
--- a/entry.h
+++ b/entry.h
@@ -41,5 +41,7 @@ void unlink_entry(const struct cache_entry *ce);
void *read_blob_entry(const struct cache_entry *ce, unsigned long *size);
int fstat_checkout_output(int fd, const struct checkout *state, struct stat *st);
+void update_ce_after_write(const struct checkout *state, struct cache_entry *ce,
+ struct stat *st);
#endif /* ENTRY_H */
--
2.30.1
* [PATCH v6 8/9] entry: move conv_attrs lookup up to checkout_entry()
2021-03-23 14:19 ` [PATCH v6 0/9] Parallel Checkout (part 1) Matheus Tavares
` (6 preceding siblings ...)
2021-03-23 14:19 ` [PATCH v6 7/9] entry: extract update_ce_after_write() from write_entry() Matheus Tavares
@ 2021-03-23 14:19 ` Matheus Tavares
2021-03-23 14:19 ` [PATCH v6 9/9] entry: add checkout_entry_ca() taking preloaded conv_attrs Matheus Tavares
2021-03-23 17:34 ` [PATCH v6 0/9] Parallel Checkout (part 1) Junio C Hamano
9 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2021-03-23 14:19 UTC (permalink / raw)
To: gitster; +Cc: git
In a following patch, checkout_entry() will use conv_attrs to decide
whether an entry should be enqueued for parallel checkout or not. But
the attributes lookup only happens lower in this call stack. To avoid
the unnecessary work of loading the attributes twice, let's move it up
to checkout_entry(), and pass the loaded struct down to write_entry().
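The shape of the refactor can be sketched outside Git with a call
counter standing in for the expensive attributes lookup (all names
below are illustrative, not Git's):

```c
/* Counts how many times the "expensive" lookup actually runs. */
static int lookup_calls;

struct attrs { int text; };

/* Stand-in for convert_attrs(): pretend this walks .gitattributes. */
static void load_attrs(struct attrs *out, const char *path)
{
	lookup_calls++;
	out->text = (path[0] != 'b');
}

/*
 * After the refactor, the callee receives preloaded attrs instead of
 * loading them itself (mirroring write_entry() taking conv_attrs).
 */
static int write_path(const char *path, const struct attrs *ca)
{
	(void)path;
	return ca->text;
}

int checkout_path(const char *path)
{
	struct attrs ca;

	load_attrs(&ca, path); /* single lookup at the top ... */
	/*
	 * ... which the caller can inspect (e.g. to decide whether to
	 * enqueue the entry for parallel checkout) and still share
	 * with the callee below, instead of loading it twice.
	 */
	return write_path(path, &ca);
}
```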
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
entry.c | 38 +++++++++++++++++++++++++++-----------
1 file changed, 27 insertions(+), 11 deletions(-)
diff --git a/entry.c b/entry.c
index 4cf0db352f..6339d54843 100644
--- a/entry.c
+++ b/entry.c
@@ -263,8 +263,9 @@ void update_ce_after_write(const struct checkout *state, struct cache_entry *ce,
}
}
-static int write_entry(struct cache_entry *ce,
- char *path, const struct checkout *state, int to_tempfile)
+/* Note: ca is used (and required) iff the entry refers to a regular file. */
+static int write_entry(struct cache_entry *ce, char *path, struct conv_attrs *ca,
+ const struct checkout *state, int to_tempfile)
{
unsigned int ce_mode_s_ifmt = ce->ce_mode & S_IFMT;
struct delayed_checkout *dco = state->delayed_checkout;
@@ -281,8 +282,7 @@ static int write_entry(struct cache_entry *ce,
clone_checkout_metadata(&meta, &state->meta, &ce->oid);
if (ce_mode_s_ifmt == S_IFREG) {
- struct stream_filter *filter = get_stream_filter(state->istate, ce->name,
- &ce->oid);
+ struct stream_filter *filter = get_stream_filter_ca(ca, &ce->oid);
if (filter &&
!streaming_write_entry(ce, path, filter,
state, to_tempfile,
@@ -329,14 +329,17 @@ static int write_entry(struct cache_entry *ce,
* Convert from git internal format to working tree format
*/
if (dco && dco->state != CE_NO_DELAY) {
- ret = async_convert_to_working_tree(state->istate, ce->name, new_blob,
- size, &buf, &meta, dco);
+ ret = async_convert_to_working_tree_ca(ca, ce->name,
+ new_blob, size,
+ &buf, &meta, dco);
if (ret && string_list_has_string(&dco->paths, ce->name)) {
free(new_blob);
goto delayed;
}
- } else
- ret = convert_to_working_tree(state->istate, ce->name, new_blob, size, &buf, &meta);
+ } else {
+ ret = convert_to_working_tree_ca(ca, ce->name, new_blob,
+ size, &buf, &meta);
+ }
if (ret) {
free(new_blob);
@@ -442,6 +445,7 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
{
static struct strbuf path = STRBUF_INIT;
struct stat st;
+ struct conv_attrs ca_buf, *ca = NULL;
if (ce->ce_flags & CE_WT_REMOVE) {
if (topath)
@@ -454,8 +458,13 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
return 0;
}
- if (topath)
- return write_entry(ce, topath, state, 1);
+ if (topath) {
+ if (S_ISREG(ce->ce_mode)) {
+ convert_attrs(state->istate, &ca_buf, ce->name);
+ ca = &ca_buf;
+ }
+ return write_entry(ce, topath, ca, state, 1);
+ }
strbuf_reset(&path);
strbuf_add(&path, state->base_dir, state->base_dir_len);
@@ -517,9 +526,16 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
return 0;
create_directories(path.buf, path.len, state);
+
if (nr_checkouts)
(*nr_checkouts)++;
- return write_entry(ce, path.buf, state, 0);
+
+ if (S_ISREG(ce->ce_mode)) {
+ convert_attrs(state->istate, &ca_buf, ce->name);
+ ca = &ca_buf;
+ }
+
+ return write_entry(ce, path.buf, ca, state, 0);
}
void unlink_entry(const struct cache_entry *ce)
--
2.30.1
* [PATCH v6 9/9] entry: add checkout_entry_ca() taking preloaded conv_attrs
2021-03-23 14:19 ` [PATCH v6 0/9] Parallel Checkout (part 1) Matheus Tavares
` (7 preceding siblings ...)
2021-03-23 14:19 ` [PATCH v6 8/9] entry: move conv_attrs lookup up to checkout_entry() Matheus Tavares
@ 2021-03-23 14:19 ` Matheus Tavares
2021-03-23 17:34 ` [PATCH v6 0/9] Parallel Checkout (part 1) Junio C Hamano
9 siblings, 0 replies; 154+ messages in thread
From: Matheus Tavares @ 2021-03-23 14:19 UTC (permalink / raw)
To: gitster; +Cc: git
The parallel checkout machinery will call checkout_entry() for entries
that could not be written in parallel due to path collisions. At this
point, we will already be holding the conversion attributes for each
entry, and it would be wasteful to let checkout_entry() load these
again. Instead, let's add the checkout_entry_ca() variant, which
optionally takes a preloaded conv_attrs struct.
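The `_ca()` pattern, where an optional preloaded argument may be NULL
to mean "compute it internally", can be sketched standalone
(illustrative names; only checkout_entry() and checkout_entry_ca()
themselves come from the patch):

```c
#include <stddef.h>

/* Stand-in for conv_attrs: per-path data that is costly to compute. */
struct attrs { int crlf; };

static void load_attrs(struct attrs *out, const char *path)
{
	out->crlf = (path[0] == 'a'); /* pretend this is expensive */
}

/*
 * The _ca-style variant: the caller may pass preloaded attrs, or NULL
 * to have them loaded on demand into a local buffer.
 */
int process_path_ca(const char *path, const struct attrs *ca)
{
	struct attrs ca_buf;

	if (!ca) {
		load_attrs(&ca_buf, path);
		ca = &ca_buf;
	}
	return ca->crlf;
}

/* Convenience wrapper mirroring checkout_entry(): always loads. */
static inline int process_path(const char *path)
{
	return process_path_ca(path, NULL);
}
```

Making the old name a thin static inline wrapper keeps every existing
call site working unchanged while new callers can avoid the reload.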
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
entry.c | 11 ++++++-----
entry.h | 16 ++++++++++++++--
2 files changed, 20 insertions(+), 7 deletions(-)
diff --git a/entry.c b/entry.c
index 6339d54843..2ce16414a7 100644
--- a/entry.c
+++ b/entry.c
@@ -440,12 +440,13 @@ static void mark_colliding_entries(const struct checkout *state,
}
}
-int checkout_entry(struct cache_entry *ce, const struct checkout *state,
- char *topath, int *nr_checkouts)
+int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca,
+ const struct checkout *state, char *topath,
+ int *nr_checkouts)
{
static struct strbuf path = STRBUF_INIT;
struct stat st;
- struct conv_attrs ca_buf, *ca = NULL;
+ struct conv_attrs ca_buf;
if (ce->ce_flags & CE_WT_REMOVE) {
if (topath)
@@ -459,7 +460,7 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
}
if (topath) {
- if (S_ISREG(ce->ce_mode)) {
+ if (S_ISREG(ce->ce_mode) && !ca) {
convert_attrs(state->istate, &ca_buf, ce->name);
ca = &ca_buf;
}
@@ -530,7 +531,7 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state,
if (nr_checkouts)
(*nr_checkouts)++;
- if (S_ISREG(ce->ce_mode)) {
+ if (S_ISREG(ce->ce_mode) && !ca) {
convert_attrs(state->istate, &ca_buf, ce->name);
ca = &ca_buf;
}
diff --git a/entry.h b/entry.h
index ea7290bcd5..b8c0e170dc 100644
--- a/entry.h
+++ b/entry.h
@@ -26,9 +26,21 @@ struct checkout {
* file named by ce, a temporary file is created by this function and
* its name is returned in topath[], which must be able to hold at
* least TEMPORARY_FILENAME_LENGTH bytes.
+ *
+ * With checkout_entry_ca(), callers can optionally pass a preloaded
+ * conv_attrs struct (to avoid reloading it), when ce refers to a
+ * regular file. If ca is NULL, the attributes will be loaded
+ * internally when (and if) needed.
*/
-int checkout_entry(struct cache_entry *ce, const struct checkout *state,
- char *topath, int *nr_checkouts);
+int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca,
+ const struct checkout *state, char *topath,
+ int *nr_checkouts);
+static inline int checkout_entry(struct cache_entry *ce,
+ const struct checkout *state, char *topath,
+ int *nr_checkouts)
+{
+ return checkout_entry_ca(ce, NULL, state, topath, nr_checkouts);
+}
void enable_delayed_checkout(struct checkout *state);
int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
--
2.30.1
* Re: [PATCH v6 0/9] Parallel Checkout (part 1)
2021-03-23 14:19 ` [PATCH v6 0/9] Parallel Checkout (part 1) Matheus Tavares
` (8 preceding siblings ...)
2021-03-23 14:19 ` [PATCH v6 9/9] entry: add checkout_entry_ca() taking preloaded conv_attrs Matheus Tavares
@ 2021-03-23 17:34 ` Junio C Hamano
9 siblings, 0 replies; 154+ messages in thread
From: Junio C Hamano @ 2021-03-23 17:34 UTC (permalink / raw)
To: Matheus Tavares; +Cc: git
Matheus Tavares <matheus.bernardino@usp.br> writes:
> Preparatory API changes for parallel checkout. This version was rebased
> on top of 'master'. The only change this rebase required was including
> `entry.h` in `builtin/stash.c`, at the commit that creates this header
> file.
Thanks. I think you had part-2 that depended on this one, so if I
have a(n old) copy I'll rebase them on this round myself.