git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] wildmatch precompilation interface
@ 2018-02-25 20:35 Ævar Arnfjörð Bjarmason
  2018-02-25 20:35 ` [PATCH 1/2] wildmatch: add interface for precompiling wildmatch() patterns Ævar Arnfjörð Bjarmason
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-02-25 20:35 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Ævar Arnfjörð Bjarmason

My recently landed wildmatch test series was in preperation for some
more major wildmatch changes.

Here's another series that prepares for bigger changes in wildmatch,
it adds an interface to pre-compile the patterns. Right now there's no
point in doing this, and it's harmless since none of the codepaths are
that performance sensitive, but down the line this'll save us time as
we'll be able to skip re-parsing the pattern each time with a better
wildmatch backend.

Ævar Arnfjörð Bjarmason (2):
  wildmatch: add interface for precompiling wildmatch() patterns
  wildmatch: use the new precompiling wildmatch()

 builtin/name-rev.c |  7 ++++++-
 builtin/replace.c  |  7 ++++---
 config.c           |  8 ++++++--
 refs.c             |  7 ++++---
 wildmatch.c        | 25 +++++++++++++++++++++++++
 wildmatch.h        | 11 +++++++++++
 6 files changed, 56 insertions(+), 9 deletions(-)

-- 
2.15.1.424.g9478a66081


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] wildmatch: add interface for precompiling wildmatch() patterns
  2018-02-25 20:35 [PATCH 0/2] wildmatch precompilation interface Ævar Arnfjörð Bjarmason
@ 2018-02-25 20:35 ` Ævar Arnfjörð Bjarmason
  2018-02-26 10:53   ` Duy Nguyen
  2018-02-25 20:35 ` [PATCH 2/2] wildmatch: use the new precompiling wildmatch() Ævar Arnfjörð Bjarmason
  2018-02-26 11:01 ` [PATCH 0/2] wildmatch precompilation interface Duy Nguyen
  2 siblings, 1 reply; 9+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-02-25 20:35 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Ævar Arnfjörð Bjarmason

Add the scaffolding necessary for precompiling wildmatch()
patterns.

There is currently no point in doing this with the wildmatch()
function we have now, since it can't make any use of precompiling the
pattern.

But adding this interface and making use of it will make it easy to
refactor the wildmatch() function to parse the pattern into opcodes as
some glob() implementations do, or to drop an alternate wildmatch()
backend in which trades parsing slowness for faster matching, such as
the PCRE v2 conversion function that understands the wildmatch()
syntax.

It's very unlikely that we'll remove the wildmatch() function as a
convenience wrapper even if we end up requiring a separate compilation
step in some future implementation. There are a lot of one-shot
wildmatches in the codebase, in that case most likely wildmatch() will
be kept around as a shorthand for wildmatch_{compile,match,free}().

I modeled this interface on the PCRE v2 interface. I didn't go with a
glob(3) & globfree(3)-like interface because that would require every
wildmatch() user to pass a dummy parameter, which I got rid of in
55d3426929 ("wildmatch: remove unused wildopts parameter",
2017-06-22).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 wildmatch.c | 25 +++++++++++++++++++++++++
 wildmatch.h | 11 +++++++++++
 2 files changed, 36 insertions(+)

diff --git a/wildmatch.c b/wildmatch.c
index d074c1be10..032f339391 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -276,3 +276,28 @@ int wildmatch(const char *pattern, const char *text, unsigned int flags)
 {
 	return dowild((const uchar*)pattern, (const uchar*)text, flags);
 }
+
+struct wildmatch_compiled *wildmatch_compile(const char *pattern,
+					     unsigned int flags)
+{
+	struct wildmatch_compiled *wildmatch_compiled = xmalloc(
+		sizeof(struct wildmatch_compiled));
+	wildmatch_compiled->pattern = xstrdup(pattern);
+	wildmatch_compiled->flags = flags;
+
+	return wildmatch_compiled;
+}
+
+int wildmatch_match(struct wildmatch_compiled *wildmatch_compiled,
+		    const char *text)
+{
+	return wildmatch(wildmatch_compiled->pattern, text,
+			 wildmatch_compiled->flags);
+}
+
+void wildmatch_free(struct wildmatch_compiled *wildmatch_compiled)
+{
+	if (wildmatch_compiled)
+		free((void *)wildmatch_compiled->pattern);
+	free(wildmatch_compiled);
+}
diff --git a/wildmatch.h b/wildmatch.h
index b8c826aa68..2fc00e0ca0 100644
--- a/wildmatch.h
+++ b/wildmatch.h
@@ -10,5 +10,16 @@
 #define WM_ABORT_ALL -1
 #define WM_ABORT_TO_STARSTAR -2
 
+struct wildmatch_compiled {
+	const char *pattern;
+	unsigned int flags;
+};
+
 int wildmatch(const char *pattern, const char *text, unsigned int flags);
+struct wildmatch_compiled *wildmatch_compile(const char *pattern,
+					     unsigned int flags);
+int wildmatch_match(struct wildmatch_compiled *wildmatch_compiled,
+		    const char *text);
+void wildmatch_free(struct wildmatch_compiled *wildmatch_compiled);
+
 #endif
-- 
2.15.1.424.g9478a66081


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] wildmatch: use the new precompiling wildmatch()
  2018-02-25 20:35 [PATCH 0/2] wildmatch precompilation interface Ævar Arnfjörð Bjarmason
  2018-02-25 20:35 ` [PATCH 1/2] wildmatch: add interface for precompiling wildmatch() patterns Ævar Arnfjörð Bjarmason
@ 2018-02-25 20:35 ` Ævar Arnfjörð Bjarmason
  2018-02-26 11:00   ` Duy Nguyen
  2018-02-26 11:01 ` [PATCH 0/2] wildmatch precompilation interface Duy Nguyen
  2 siblings, 1 reply; 9+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-02-25 20:35 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Ævar Arnfjörð Bjarmason

Make some limited use of the wildmatch() interface for "precompiling"
patterns, although the current implementation does not do this yet.

The main hot codepath in dir.c is not being touched yet, but some
other invocations where we repeatedly match the same glob against
multiple strings have been migrated.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/name-rev.c | 7 ++++++-
 builtin/replace.c  | 7 ++++---
 config.c           | 8 ++++++--
 refs.c             | 7 ++++---
 4 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 9e088ebd11..c75ac8d9af 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -128,14 +128,19 @@ static void name_rev(struct commit *commit,
 static int subpath_matches(const char *path, const char *filter)
 {
 	const char *subpath = path;
+	struct wildmatch_compiled *wildmatch_compiled =
+		wildmatch_compile(filter, 0);
 
 	while (subpath) {
-		if (!wildmatch(filter, subpath, 0))
+		if (!wildmatch_match(wildmatch_compiled, subpath)) {
+			wildmatch_free(wildmatch_compiled);
 			return subpath - path;
+		}
 		subpath = strchr(subpath, '/');
 		if (subpath)
 			subpath++;
 	}
+	wildmatch_free(wildmatch_compiled);
 	return -1;
 }
 
diff --git a/builtin/replace.c b/builtin/replace.c
index 83d3235721..9be72f2b7b 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -32,7 +32,7 @@ enum replace_format {
 };
 
 struct show_data {
-	const char *pattern;
+	struct wildmatch_compiled *wildmatch_compiled;
 	enum replace_format format;
 };
 
@@ -41,7 +41,7 @@ static int show_reference(const char *refname, const struct object_id *oid,
 {
 	struct show_data *data = cb_data;
 
-	if (!wildmatch(data->pattern, refname, 0)) {
+	if (!wildmatch_match(data->wildmatch_compiled, refname)) {
 		if (data->format == REPLACE_FORMAT_SHORT)
 			printf("%s\n", refname);
 		else if (data->format == REPLACE_FORMAT_MEDIUM)
@@ -70,7 +70,7 @@ static int list_replace_refs(const char *pattern, const char *format)
 
 	if (pattern == NULL)
 		pattern = "*";
-	data.pattern = pattern;
+	data.wildmatch_compiled = wildmatch_compile(pattern, 0);
 
 	if (format == NULL || *format == '\0' || !strcmp(format, "short"))
 		data.format = REPLACE_FORMAT_SHORT;
@@ -84,6 +84,7 @@ static int list_replace_refs(const char *pattern, const char *format)
 		    format);
 
 	for_each_replace_ref(show_reference, (void *)&data);
+	wildmatch_free(data.wildmatch_compiled);
 
 	return 0;
 }
diff --git a/config.c b/config.c
index b0c20e6cb8..0f595de971 100644
--- a/config.c
+++ b/config.c
@@ -210,6 +210,7 @@ static int include_by_gitdir(const struct config_options *opts,
 	int ret = 0, prefix;
 	const char *git_dir;
 	int already_tried_absolute = 0;
+	struct wildmatch_compiled *wildmatch_compiled = NULL;
 
 	if (opts->git_dir)
 		git_dir = opts->git_dir;
@@ -237,8 +238,10 @@ static int include_by_gitdir(const struct config_options *opts,
 			goto done;
 	}
 
-	ret = !wildmatch(pattern.buf + prefix, text.buf + prefix,
-			 icase ? WM_CASEFOLD : 0);
+	if (!wildmatch_compiled)
+		wildmatch_compiled = wildmatch_compile(pattern.buf + prefix,
+						       icase ? WM_CASEFOLD : 0);
+	ret = !wildmatch_match(wildmatch_compiled, text.buf + prefix);
 
 	if (!ret && !already_tried_absolute) {
 		/*
@@ -257,6 +260,7 @@ static int include_by_gitdir(const struct config_options *opts,
 done:
 	strbuf_release(&pattern);
 	strbuf_release(&text);
+	wildmatch_free(wildmatch_compiled);
 	return ret;
 }
 
diff --git a/refs.c b/refs.c
index 20ba82b434..c631793d1e 100644
--- a/refs.c
+++ b/refs.c
@@ -213,7 +213,7 @@ char *resolve_refdup(const char *refname, int resolve_flags,
 
 /* The argument to filter_refs */
 struct ref_filter {
-	const char *pattern;
+	struct wildmatch_compiled *code;
 	each_ref_fn *fn;
 	void *cb_data;
 };
@@ -291,7 +291,7 @@ static int filter_refs(const char *refname, const struct object_id *oid,
 {
 	struct ref_filter *filter = (struct ref_filter *)data;
 
-	if (wildmatch(filter->pattern, refname, 0))
+	if (wildmatch_match(filter->code, refname))
 		return 0;
 	return filter->fn(refname, oid, flags, filter->cb_data);
 }
@@ -454,12 +454,13 @@ int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
 		strbuf_addch(&real_pattern, '*');
 	}
 
-	filter.pattern = real_pattern.buf;
+	filter.code = wildmatch_compile(real_pattern.buf, 0);
 	filter.fn = fn;
 	filter.cb_data = cb_data;
 	ret = for_each_ref(filter_refs, &filter);
 
 	strbuf_release(&real_pattern);
+	wildmatch_free(filter.code);
 	return ret;
 }
 
-- 
2.15.1.424.g9478a66081


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] wildmatch: add interface for precompiling wildmatch() patterns
  2018-02-25 20:35 ` [PATCH 1/2] wildmatch: add interface for precompiling wildmatch() patterns Ævar Arnfjörð Bjarmason
@ 2018-02-26 10:53   ` Duy Nguyen
  2018-02-26 12:19     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 9+ messages in thread
From: Duy Nguyen @ 2018-02-26 10:53 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano, Jeff King

On Mon, Feb 26, 2018 at 3:35 AM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> Add the scaffolding necessary for precompiling wildmatch()
> patterns.
>
> There is currently no point in doing this with the wildmatch()
> function we have now, since it can't make any use of precompiling the
> pattern.
>
> But adding this interface and making use of it will make it easy to
> refactor the wildmatch() function to parse the pattern into opcodes as
> some glob() implementations do, or to drop an alternate wildmatch()
> backend in which trades parsing slowness for faster matching, such as
> the PCRE v2 conversion function that understands the wildmatch()
> syntax.
>
> It's very unlikely that we'll remove the wildmatch() function as a
> convenience wrapper even if we end up requiring a separate compilation
> step in some future implementation. There are a lot of one-shot
> wildmatches in the codebase, in that case most likely wildmatch() will
> be kept around as a shorthand for wildmatch_{compile,match,free}().
>
> I modeled this interface on the PCRE v2 interface. I didn't go with a
> glob(3) & globfree(3)-like interface because that would require every
> wildmatch() user to pass a dummy parameter, which I got rid of in
> 55d3426929 ("wildmatch: remove unused wildopts parameter",
> 2017-06-22).
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  wildmatch.c | 25 +++++++++++++++++++++++++
>  wildmatch.h | 11 +++++++++++
>  2 files changed, 36 insertions(+)
>
> diff --git a/wildmatch.c b/wildmatch.c
> index d074c1be10..032f339391 100644
> --- a/wildmatch.c
> +++ b/wildmatch.c
> @@ -276,3 +276,28 @@ int wildmatch(const char *pattern, const char *text, unsigned int flags)
>  {
>         return dowild((const uchar*)pattern, (const uchar*)text, flags);
>  }
> +
> +struct wildmatch_compiled *wildmatch_compile(const char *pattern,
> +                                            unsigned int flags)
> +{
> +       struct wildmatch_compiled *wildmatch_compiled = xmalloc(
> +               sizeof(struct wildmatch_compiled));

struct wildmatch_compiled *data = xmalloc(sizeof(*data));

?

It shortens the line a bit. We already use WM_ prefix for wildmatch
flags, perhaps we can use it for wildmatch structs too (e.g.
wm_compiled instead)

> +       wildmatch_compiled->pattern = xstrdup(pattern);
> +       wildmatch_compiled->flags = flags;
> +
> +       return wildmatch_compiled;
> +}
> +
> +int wildmatch_match(struct wildmatch_compiled *wildmatch_compiled,
> +                   const char *text)
> +{
> +       return wildmatch(wildmatch_compiled->pattern, text,
> +                        wildmatch_compiled->flags);
> +}
> +
> +void wildmatch_free(struct wildmatch_compiled *wildmatch_compiled)
> +{
> +       if (wildmatch_compiled)
> +               free((void *)wildmatch_compiled->pattern);

Why do make pattern type "const char *" then remove "const" with
typecast here? Why not just "char *" in wildmatch_compiled?

If the reason is to avoid other users from peeking in and modifying
it, then perhaps you can move struct wildmatch_compiled to wildmatch.c
and keep it an opaque struct pointer.

> +       free(wildmatch_compiled);
> +}
> diff --git a/wildmatch.h b/wildmatch.h
> index b8c826aa68..2fc00e0ca0 100644
> --- a/wildmatch.h
> +++ b/wildmatch.h
> @@ -10,5 +10,16 @@
>  #define WM_ABORT_ALL -1
>  #define WM_ABORT_TO_STARSTAR -2
>
> +struct wildmatch_compiled {
> +       const char *pattern;
> +       unsigned int flags;
> +};
> +
>  int wildmatch(const char *pattern, const char *text, unsigned int flags);
> +struct wildmatch_compiled *wildmatch_compile(const char *pattern,
> +                                            unsigned int flags);
> +int wildmatch_match(struct wildmatch_compiled *wildmatch_compiled,
> +                   const char *text);
> +void wildmatch_free(struct wildmatch_compiled *wildmatch_compiled);
> +
>  #endif
> --
> 2.15.1.424.g9478a66081
>



-- 
Duy

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] wildmatch: use the new precompiling wildmatch()
  2018-02-25 20:35 ` [PATCH 2/2] wildmatch: use the new precompiling wildmatch() Ævar Arnfjörð Bjarmason
@ 2018-02-26 11:00   ` Duy Nguyen
  2018-02-26 12:17     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 9+ messages in thread
From: Duy Nguyen @ 2018-02-26 11:00 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano, Jeff King

On Mon, Feb 26, 2018 at 3:35 AM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> diff --git a/config.c b/config.c
> index b0c20e6cb8..0f595de971 100644
> --- a/config.c
> +++ b/config.c
> @@ -210,6 +210,7 @@ static int include_by_gitdir(const struct config_options *opts,
>         int ret = 0, prefix;
>         const char *git_dir;
>         int already_tried_absolute = 0;
> +       struct wildmatch_compiled *wildmatch_compiled = NULL;
>
>         if (opts->git_dir)
>                 git_dir = opts->git_dir;
> @@ -237,8 +238,10 @@ static int include_by_gitdir(const struct config_options *opts,
>                         goto done;
>         }
>
> -       ret = !wildmatch(pattern.buf + prefix, text.buf + prefix,
> -                        icase ? WM_CASEFOLD : 0);
> +       if (!wildmatch_compiled)
> +               wildmatch_compiled = wildmatch_compile(pattern.buf + prefix,
> +                                                      icase ? WM_CASEFOLD : 0);
> +       ret = !wildmatch_match(wildmatch_compiled, text.buf + prefix);

This is a one-shot matching. Is it worth converting to the new interface?

>
>         if (!ret && !already_tried_absolute) {
>                 /*
> @@ -257,6 +260,7 @@ static int include_by_gitdir(const struct config_options *opts,
>  done:
>         strbuf_release(&pattern);
>         strbuf_release(&text);
> +       wildmatch_free(wildmatch_compiled);
>         return ret;
>  }
>
> diff --git a/refs.c b/refs.c
> index 20ba82b434..c631793d1e 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -213,7 +213,7 @@ char *resolve_refdup(const char *refname, int resolve_flags,
>
>  /* The argument to filter_refs */
>  struct ref_filter {
> -       const char *pattern;
> +       struct wildmatch_compiled *code;

This actually makes me think if we should name this struct wildmatch_pattern.

>         each_ref_fn *fn;
>         void *cb_data;
>  };
> @@ -291,7 +291,7 @@ static int filter_refs(const char *refname, const struct object_id *oid,
>  {
>         struct ref_filter *filter = (struct ref_filter *)data;
>
> -       if (wildmatch(filter->pattern, refname, 0))
> +       if (wildmatch_match(filter->code, refname))
>                 return 0;
>         return filter->fn(refname, oid, flags, filter->cb_data);
>  }
> @@ -454,12 +454,13 @@ int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
>                 strbuf_addch(&real_pattern, '*');
>         }
>
> -       filter.pattern = real_pattern.buf;
> +       filter.code = wildmatch_compile(real_pattern.buf, 0);
>         filter.fn = fn;
>         filter.cb_data = cb_data;
>         ret = for_each_ref(filter_refs, &filter);
>
>         strbuf_release(&real_pattern);
> +       wildmatch_free(filter.code);
>         return ret;
>  }
>
> --
> 2.15.1.424.g9478a66081
>
-- 
Duy

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] wildmatch precompilation interface
  2018-02-25 20:35 [PATCH 0/2] wildmatch precompilation interface Ævar Arnfjörð Bjarmason
  2018-02-25 20:35 ` [PATCH 1/2] wildmatch: add interface for precompiling wildmatch() patterns Ævar Arnfjörð Bjarmason
  2018-02-25 20:35 ` [PATCH 2/2] wildmatch: use the new precompiling wildmatch() Ævar Arnfjörð Bjarmason
@ 2018-02-26 11:01 ` Duy Nguyen
  2018-02-26 12:34   ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 9+ messages in thread
From: Duy Nguyen @ 2018-02-26 11:01 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano, Jeff King

On Mon, Feb 26, 2018 at 3:35 AM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> My recently landed wildmatch test series was in preperation for some
> more major wildmatch changes.
>
> Here's another series that prepares for bigger changes in wildmatch,
> it adds an interface to pre-compile the patterns. Right now there's no
> point in doing this, and it's harmless since none of the codepaths are
> that performance sensitive, but down the line this'll save us time as
> we'll be able to skip re-parsing the pattern each time with a better
> wildmatch backend.

I don't see any big problem with this, but should this be a standalone
series? Some changes look harmless now, but I'd rather see the real
precompile implementation to see how it impacts (or benefits) the
converted call sites.
-- 
Duy

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] wildmatch: use the new precompiling wildmatch()
  2018-02-26 11:00   ` Duy Nguyen
@ 2018-02-26 12:17     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 9+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-02-26 12:17 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Junio C Hamano, Jeff King


On Mon, Feb 26 2018, Duy Nguyen jotted:

> On Mon, Feb 26, 2018 at 3:35 AM, Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> diff --git a/config.c b/config.c
>> index b0c20e6cb8..0f595de971 100644
>> --- a/config.c
>> +++ b/config.c
>> @@ -210,6 +210,7 @@ static int include_by_gitdir(const struct config_options *opts,
>>         int ret = 0, prefix;
>>         const char *git_dir;
>>         int already_tried_absolute = 0;
>> +       struct wildmatch_compiled *wildmatch_compiled = NULL;
>>
>>         if (opts->git_dir)
>>                 git_dir = opts->git_dir;
>> @@ -237,8 +238,10 @@ static int include_by_gitdir(const struct config_options *opts,
>>                         goto done;
>>         }
>>
>> -       ret = !wildmatch(pattern.buf + prefix, text.buf + prefix,
>> -                        icase ? WM_CASEFOLD : 0);
>> +       if (!wildmatch_compiled)
>> +               wildmatch_compiled = wildmatch_compile(pattern.buf + prefix,
>> +                                                      icase ? WM_CASEFOLD : 0);
>> +       ret = !wildmatch_match(wildmatch_compiled, text.buf + prefix);
>
> This is a one-shot matching.

It will match once *or* twice depending on how the goto in that codepath
does.

> Is it worth converting to the new interface?

I moved it more as a showcase, to show how the interface looks like in
common scenarios. It obviously won't be a performance bottleneck or
anything like that for this specific codepath.

>>
>>         if (!ret && !already_tried_absolute) {
>>                 /*
>> @@ -257,6 +260,7 @@ static int include_by_gitdir(const struct config_options *opts,
>>  done:
>>         strbuf_release(&pattern);
>>         strbuf_release(&text);
>> +       wildmatch_free(wildmatch_compiled);
>>         return ret;
>>  }
>>
>> diff --git a/refs.c b/refs.c
>> index 20ba82b434..c631793d1e 100644
>> --- a/refs.c
>> +++ b/refs.c
>> @@ -213,7 +213,7 @@ char *resolve_refdup(const char *refname, int resolve_flags,
>>
>>  /* The argument to filter_refs */
>>  struct ref_filter {
>> -       const char *pattern;
>> +       struct wildmatch_compiled *code;
>
> This actually makes me think if we should name this struct wildmatch_pattern.

Yeah, that's a better name.

>>         each_ref_fn *fn;
>>         void *cb_data;
>>  };
>> @@ -291,7 +291,7 @@ static int filter_refs(const char *refname, const struct object_id *oid,
>>  {
>>         struct ref_filter *filter = (struct ref_filter *)data;
>>
>> -       if (wildmatch(filter->pattern, refname, 0))
>> +       if (wildmatch_match(filter->code, refname))
>>                 return 0;
>>         return filter->fn(refname, oid, flags, filter->cb_data);
>>  }
>> @@ -454,12 +454,13 @@ int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
>>                 strbuf_addch(&real_pattern, '*');
>>         }
>>
>> -       filter.pattern = real_pattern.buf;
>> +       filter.code = wildmatch_compile(real_pattern.buf, 0);
>>         filter.fn = fn;
>>         filter.cb_data = cb_data;
>>         ret = for_each_ref(filter_refs, &filter);
>>
>>         strbuf_release(&real_pattern);
>> +       wildmatch_free(filter.code);
>>         return ret;
>>  }
>>
>> --
>> 2.15.1.424.g9478a66081
>>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] wildmatch: add interface for precompiling wildmatch() patterns
  2018-02-26 10:53   ` Duy Nguyen
@ 2018-02-26 12:19     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 9+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-02-26 12:19 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Junio C Hamano, Jeff King


On Mon, Feb 26 2018, Duy Nguyen jotted:

> On Mon, Feb 26, 2018 at 3:35 AM, Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> Add the scaffolding necessary for precompiling wildmatch()
>> patterns.
>>
>> There is currently no point in doing this with the wildmatch()
>> function we have now, since it can't make any use of precompiling the
>> pattern.
>>
>> But adding this interface and making use of it will make it easy to
>> refactor the wildmatch() function to parse the pattern into opcodes as
>> some glob() implementations do, or to drop an alternate wildmatch()
>> backend in which trades parsing slowness for faster matching, such as
>> the PCRE v2 conversion function that understands the wildmatch()
>> syntax.
>>
>> It's very unlikely that we'll remove the wildmatch() function as a
>> convenience wrapper even if we end up requiring a separate compilation
>> step in some future implementation. There are a lot of one-shot
>> wildmatches in the codebase, in that case most likely wildmatch() will
>> be kept around as a shorthand for wildmatch_{compile,match,free}().
>>
>> I modeled this interface on the PCRE v2 interface. I didn't go with a
>> glob(3) & globfree(3)-like interface because that would require every
>> wildmatch() user to pass a dummy parameter, which I got rid of in
>> 55d3426929 ("wildmatch: remove unused wildopts parameter",
>> 2017-06-22).
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  wildmatch.c | 25 +++++++++++++++++++++++++
>>  wildmatch.h | 11 +++++++++++
>>  2 files changed, 36 insertions(+)
>>
>> diff --git a/wildmatch.c b/wildmatch.c
>> index d074c1be10..032f339391 100644
>> --- a/wildmatch.c
>> +++ b/wildmatch.c
>> @@ -276,3 +276,28 @@ int wildmatch(const char *pattern, const char *text, unsigned int flags)
>>  {
>>         return dowild((const uchar*)pattern, (const uchar*)text, flags);
>>  }
>> +
>> +struct wildmatch_compiled *wildmatch_compile(const char *pattern,
>> +                                            unsigned int flags)
>> +{
>> +       struct wildmatch_compiled *wildmatch_compiled = xmalloc(
>> +               sizeof(struct wildmatch_compiled));
>
> struct wildmatch_compiled *data = xmalloc(sizeof(*data));
>
> ?
>
> It shortens the line a bit. We already use WM_ prefix for wildmatch
> flags, perhaps we can use it for wildmatch structs too (e.g.
> wm_compiled instead)

Thanks, that's better.

>> +       wildmatch_compiled->pattern = xstrdup(pattern);
>> +       wildmatch_compiled->flags = flags;
>> +
>> +       return wildmatch_compiled;
>> +}
>> +
>> +int wildmatch_match(struct wildmatch_compiled *wildmatch_compiled,
>> +                   const char *text)
>> +{
>> +       return wildmatch(wildmatch_compiled->pattern, text,
>> +                        wildmatch_compiled->flags);
>> +}
>> +
>> +void wildmatch_free(struct wildmatch_compiled *wildmatch_compiled)
>> +{
>> +       if (wildmatch_compiled)
>> +               free((void *)wildmatch_compiled->pattern);
>
> Why do make pattern type "const char *" then remove "const" with
> typecast here? Why not just "char *" in wildmatch_compiled?
>
> If the reason is to avoid other users from peeking in and modifying
> it, then perhaps you can move struct wildmatch_compiled to wildmatch.c
> and keep it an opaque struct pointer.

Yes, it's to indicate that "pattern" won't ever be modified. I think
it's more readable / self documenting to have the compiler enforce that
via const, even though it requires the cast to free it.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] wildmatch precompilation interface
  2018-02-26 11:01 ` [PATCH 0/2] wildmatch precompilation interface Duy Nguyen
@ 2018-02-26 12:34   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 9+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-02-26 12:34 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Junio C Hamano, Jeff King


On Mon, Feb 26 2018, Duy Nguyen jotted:

> On Mon, Feb 26, 2018 at 3:35 AM, Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> My recently landed wildmatch test series was in preperation for some
>> more major wildmatch changes.
>>
>> Here's another series that prepares for bigger changes in wildmatch,
>> it adds an interface to pre-compile the patterns. Right now there's no
>> point in doing this, and it's harmless since none of the codepaths are
>> that performance sensitive, but down the line this'll save us time as
>> we'll be able to skip re-parsing the pattern each time with a better
>> wildmatch backend.
>
> I don't see any big problem with this, but should this be a standalone
> series? Some changes look harmless now, but I'd rather see the real
> precompile implementation to see how it impacts (or benefits) the
> converted call sites.

Let's drop this for now sinceyou're on the fence about it, I wasn't all
that sure myself and thought I'd send it in for comments.

I don't have anything ready for submission from the rest of this series,
but figured (if you/others didn't mind) that it might be easier to
review the addition of the interface at first, but indeed, on second
thought that doesn't make sense.

The state of what I have is:

 1. <this tiny series>
 2. <WIP add the wildmatch compile interface to more stuff, notably dir.c>

    The dir.c use (and some tree-related stuff) are the real hot
    codepaths using wildmatch.
 3. <WIP Replace the wildmatch backend with syntax-compatible PCRE>

    This is somewhat of a mess right now. Preliminary tests reveal that
    the pathological case tested for by t/perf/p0100-globbing.sh is
    wildly faster, but that most regular matching is a bit slower,
    although that might be me being stupid with the interface.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-02-26 12:34 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-25 20:35 [PATCH 0/2] wildmatch precompilation interface Ævar Arnfjörð Bjarmason
2018-02-25 20:35 ` [PATCH 1/2] wildmatch: add interface for precompiling wildmatch() patterns Ævar Arnfjörð Bjarmason
2018-02-26 10:53   ` Duy Nguyen
2018-02-26 12:19     ` Ævar Arnfjörð Bjarmason
2018-02-25 20:35 ` [PATCH 2/2] wildmatch: use the new precompiling wildmatch() Ævar Arnfjörð Bjarmason
2018-02-26 11:00   ` Duy Nguyen
2018-02-26 12:17     ` Ævar Arnfjörð Bjarmason
2018-02-26 11:01 ` [PATCH 0/2] wildmatch precompilation interface Duy Nguyen
2018-02-26 12:34   ` Ævar Arnfjörð Bjarmason

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).