git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/2] urlmatch: allow regexp-based matches
@ 2017-01-23 13:06 Patrick Steinhardt
  2017-01-23 13:06 ` [PATCH v1 1/2] mailmap: add Patrick Steinhardt's work address Patrick Steinhardt
                   ` (24 more replies)
  0 siblings, 25 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-23 13:06 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Patrick Steinhardt

Hi,

Short disclaimer: this patch results from work for a client at my
day job at elego Software Solutions GmbH. As such, I'm using my
work mail address and added a new mailmap entry. I wasn't exactly
certain if the mailmap entry should've been created in a separate
commit series, as it has nothing to do with the actual topic --
I can re-send it separately if requested.

This patch is mostly a request for comments. The use case is to
be able to configure an HTTP proxy for all subdomains of a
certain domain where there are hundreds of subdomains. The most
flexible way I could imagine was by using regular expressions for
the matching, which is how I implemented it for now. So users can
now create a configuration key like
`http.?http://.*\\.example\\.com.*` to apply settings for all
subdomains of `example.com`.

I tried to make this feature as backwards-compatible as it can be
by having the '?' prefix. Older clients will barf when trying to
normalize the URL as '?' is not in the set of allowed characters
for a URL, and for newer clients there will be no change in
behavior for previously configured `http.<url>.*` keys.

Regards
Patrick Steinhardt

Patrick Steinhardt (2):
  mailmap: add Patrick Steinhardt's work address
  urlmatch: allow regex-based URL matching

 .mailmap                 |  1 +
 Documentation/config.txt |  6 ++++-
 t/t1300-repo-config.sh   | 31 ++++++++++++++++++++++++++
 urlmatch.c               | 57 ++++++++++++++++++++++++++++++++++++++----------
 4 files changed, 82 insertions(+), 13 deletions(-)

-- 
2.11.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v1 1/2] mailmap: add Patrick Steinhardt's work address
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
@ 2017-01-23 13:06 ` Patrick Steinhardt
  2017-01-23 13:06 ` [PATCH v1 2/2] urlmatch: allow regex-based URL matching Patrick Steinhardt
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-23 13:06 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Patrick Steinhardt

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
---
 .mailmap | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.mailmap b/.mailmap
index 9c87a3840..ea59205b9 100644
--- a/.mailmap
+++ b/.mailmap
@@ -177,6 +177,7 @@ Paolo Bonzini <bonzini@gnu.org> <paolo.bonzini@lu.unisi.ch>
 Pascal Obry <pascal@obry.net> <pascal.obry@gmail.com>
 Pascal Obry <pascal@obry.net> <pascal.obry@wanadoo.fr>
 Pat Notz <patnotz@gmail.com> <pknotz@sandia.gov>
+Patrick Steinhardt <ps@pks.im> <patrick.steinhardt@elego.de>
 Paul Mackerras <paulus@samba.org> <paulus@dorrigo.(none)>
 Paul Mackerras <paulus@samba.org> <paulus@pogo.(none)>
 Peter Baumann <waste.manager@gmx.de> <Peter.B.Baumann@stud.informatik.uni-erlangen.de>
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v1 2/2] urlmatch: allow regex-based URL matching
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
  2017-01-23 13:06 ` [PATCH v1 1/2] mailmap: add Patrick Steinhardt's work address Patrick Steinhardt
@ 2017-01-23 13:06 ` Patrick Steinhardt
  2017-01-23 19:53 ` [PATCH v1 0/2] urlmatch: allow regexp-based matches Junio C Hamano
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-23 13:06 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Patrick Steinhardt

The URL matching function computes for two URLs whether they match not.
The match is performed by splitting up the URL into different parts and
then doing an exact comparison with the to-be-matched URL.

The main user of `urlmatch` is the configuration subsystem. It allows to
set certain configurations based on the URL which is being connected to
via keys like `http.<url>.*`. A common use case for this is to set
proxies for only some remotes which match the given URL. Unfortunately,
having exact matches for all parts of the URL can become quite tedious
in some setups. Imagine for example a corporate network where there are
dozens or even hundreds of subdomains, which would have to be configured
individually.

This commit introduces the ability to have regex-based URL matches. A
user can prefix a configuration key's URL with a question mark ('?') to
use regular expressions instead of exact matches in order to find all
matching URLs. A user can now simply add a key
`http.?http://.*\\.example\\.com.proxy` to set a proxy for all
subdomains of `example.com`. When no question mark is given as a prefix,
then the configuration subsystem will use the old algorithm based on
exact matches.

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
---
 Documentation/config.txt |  6 ++++-
 t/t1300-repo-config.sh   | 31 ++++++++++++++++++++++++++
 urlmatch.c               | 57 ++++++++++++++++++++++++++++++++++++++----------
 3 files changed, 81 insertions(+), 13 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 506431267..23651b19e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1906,7 +1906,11 @@ http.followRedirects::
 
 http.<url>.*::
 	Any of the http.* options above can be applied selectively to some URLs.
-	For a config key to match a URL, each element of the config key is
+	There are two different modes to match URLs: if the config key's URL is
+	prefixed with a `?`, it allows to make use of regular expressions. An
+	example for this is `http.?http://.*\\.example\\.com.*` to match all
+	subdomains of `example.com`.
+	If the key is not prefixed with a `?`, each element of the config key is
 	compared to that of the URL, in the following order:
 +
 --
diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
index 923bfc5a2..fbbc58304 100755
--- a/t/t1300-repo-config.sh
+++ b/t/t1300-repo-config.sh
@@ -1177,6 +1177,37 @@ test_expect_success 'urlmatch' '
 	test_cmp expect actual
 '
 
+test_expect_success 'regex-based urlmatch' '
+	cat >.git/config <<-\EOF &&
+	[http]
+		sslVerify
+	[http "?https://.*\\.example\\.com"]
+		sslVerify = false
+		cookieFile = /tmp/cookie.txt
+	EOF
+
+	test_expect_code 1 git config --bool --get-urlmatch doesnt.exist https://good.example.com >actual &&
+	test_must_be_empty actual &&
+
+	test_expect_code 1 git config --bool --get-urlmatch doesnt.exist https://good-example.com >actual &&
+	test_must_be_empty actual &&
+
+	echo true >expect &&
+	git config --bool --get-urlmatch http.SSLverify https://example.com >actual &&
+	test_cmp expect actual &&
+
+	echo false >expect &&
+	git config --bool --get-urlmatch http.sslverify https://subdomain.example.com >actual &&
+	test_cmp expect actual &&
+
+	{
+		echo http.cookiefile /tmp/cookie.txt &&
+		echo http.sslverify false
+	} >expect &&
+	git config --get-urlmatch HTTP https://subdomain.example.com >actual &&
+	test_cmp expect actual
+'
+
 # good section hygiene
 test_expect_failure 'unsetting the last key in a section removes header' '
 	cat >.git/config <<-\EOF &&
diff --git a/urlmatch.c b/urlmatch.c
index 132d342bc..8ed5047e3 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -490,18 +490,51 @@ int urlmatch_config_entry(const char *var, const char *value, void *cb)
 	}
 	dot = strrchr(key, '.');
 	if (dot) {
-		char *config_url, *norm_url;
-		struct url_info norm_info;
-
-		config_url = xmemdupz(key, dot - key);
-		norm_url = url_normalize(config_url, &norm_info);
-		free(config_url);
-		if (!norm_url)
-			return 0;
-		matched_len = match_urls(url, &norm_info, &user_matched);
-		free(norm_url);
-		if (!matched_len)
-			return 0;
+		/*
+		 * When the configuration key's URL is prefixed
+		 * with a '?', regular expressions are enabled to
+		 * match the URL instead of the exact-match
+		 * algorithm.
+		 */
+		if (starts_with(key, "?")) {
+			char *config_url;
+			regex_t reg;
+			int status;
+
+			config_url = xmemdupz(key + 1, dot - key - 1);
+			if (regcomp(&reg, config_url, REG_EXTENDED)) {
+				warning(_("Cannot prepare URL regexp %s"),
+					config_url);
+				free(config_url);
+				return 0;
+			}
+
+			status = regexec(&reg, url->url, 0, NULL, 0);
+			free(config_url);
+			regfree(&reg);
+
+			if (status) {
+				 if (status != REG_NOMATCH)
+					warning(_("regexec returned %d for input '%s'"),
+						status, url->url);
+
+				return 0;
+			}
+		} else {
+			char *config_url, *norm_url;
+			struct url_info norm_info;
+
+			config_url = xmemdupz(key, dot - key);
+			norm_url = url_normalize(config_url, &norm_info);
+			free(config_url);
+			if (!norm_url)
+				return 0;
+			matched_len = match_urls(url, &norm_info, &user_matched);
+			free(norm_url);
+			if (!matched_len)
+				return 0;
+		}
+
 		key = dot + 1;
 	}
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v1 0/2] urlmatch: allow regexp-based matches
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
  2017-01-23 13:06 ` [PATCH v1 1/2] mailmap: add Patrick Steinhardt's work address Patrick Steinhardt
  2017-01-23 13:06 ` [PATCH v1 2/2] urlmatch: allow regex-based URL matching Patrick Steinhardt
@ 2017-01-23 19:53 ` Junio C Hamano
  2017-01-24 11:29   ` Patrick Steinhardt
  2017-01-24 17:00 ` [PATCH v2 0/4] " Patrick Steinhardt
                   ` (21 subsequent siblings)
  24 siblings, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2017-01-23 19:53 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Patrick Steinhardt

Patrick Steinhardt <patrick.steinhardt@elego.de> writes:

> This patch is mostly a request for comments. The use case is to
> be able to configure an HTTP proxy for all subdomains of a
> certain domain where there are hundreds of subdomains. The most
> flexible way I could imagine was by using regular expressions for
> the matching, which is how I implemented it for now. So users can
> now create a configuration key like
> `http.?http://.*\\.example\\.com.*` to apply settings for all
> subdomains of `example.com`.

While reading 2/2, I got an impression that this is "too" flexible
and possibly operates at a wrong level.  I would have expected that
the wildcarding to be limited to the host part only and hook into
match_urls(), allowing the users of the new feature to still take
advantage of the existing support of "http://me@example.com" that
limits the match to the case that the connection is authenticated
for a user, for example, by newly allowing "http://me@*.example.com"
or something like that.

Because you cannot have a literal '*' in your hostname, I would
imagine that supporting a match pattern "http://me@*.example.com"
would be already backward compatible without requiring a leading
question-mark.

I also personally would prefer these textual matching to be done
with glob not with regexp, by the way, as the above description of
mine shows.

Thanks.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v1 0/2] urlmatch: allow regexp-based matches
  2017-01-23 19:53 ` [PATCH v1 0/2] urlmatch: allow regexp-based matches Junio C Hamano
@ 2017-01-24 11:29   ` Patrick Steinhardt
  0 siblings, 0 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-24 11:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Patrick Steinhardt

[-- Attachment #1: Type: text/plain, Size: 2423 bytes --]

On Mon, Jan 23, 2017 at 11:53:43AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <patrick.steinhardt@elego.de> writes:
> 
> > This patch is mostly a request for comments. The use case is to
> > be able to configure an HTTP proxy for all subdomains of a
> > certain domain where there are hundreds of subdomains. The most
> > flexible way I could imagine was by using regular expressions for
> > the matching, which is how I implemented it for now. So users can
> > now create a configuration key like
> > `http.?http://.*\\.example\\.com.*` to apply settings for all
> > subdomains of `example.com`.
> 
> While reading 2/2, I got an impression that this is "too" flexible
> and possibly operates at a wrong level.  I would have expected that
> the wildcarding to be limited to the host part only and hook into
> match_urls(), allowing the users of the new feature to still take
> advantage of the existing support of "http://me@example.com" that
> limits the match to the case that the connection is authenticated
> for a user, for example, by newly allowing "http://me@*.example.com"
> or something like that.
> 
> Because you cannot have a literal '*' in your hostname, I would
> imagine that supporting a match pattern "http://me@*.example.com"
> would be already backward compatible without requiring a leading
> question-mark.
> 
> I also personally would prefer these textual matching to be done
> with glob not with regexp, by the way, as the above description of
> mine shows.
> 
> Thanks.

Thanks for your feedback. Using globs in the hostname only was my
first intent, as well. I later on took regular expressions
instead so as to allow further flexibility for the user. The
reasoning was that there might be other use cases which cannot
actually be solved with using globs only, even if I myself wasn't
aware of different ones. So this might be indeed over-engineered
when using regular expressions.

There are several questions though regarding semantics with
globs, where I'd like to have additional opinions on.

- should a glob only be allowed for actual subdomains, allowing
  "http://*.example.com" but not "http://*example.com"?

- should a glob also match multiple nestings of subdomains? E.g.
  "http://*.example.com" would match "http://foo.example.com" but
  not "http://foo.bar.example.com"?

I'll send a version 2 soon-ish.

Regards
Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 0/4] urlmatch: allow regexp-based matches
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
                   ` (2 preceding siblings ...)
  2017-01-23 19:53 ` [PATCH v1 0/2] urlmatch: allow regexp-based matches Junio C Hamano
@ 2017-01-24 17:00 ` Patrick Steinhardt
  2017-01-24 17:00 ` [PATCH v2 1/4] mailmap: add Patrick Steinhardt's work address Patrick Steinhardt
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-24 17:00 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt

Hi,

This is version two of my patch series.

The use case is to be able to configure an HTTP proxy for all
subdomains of a domain where there are hundreds of subdomains.

Previously, I have been using complete regular expressions with
an escape-mechanism to match the configuration key's URLs.
According to Junio's comments, I changed this mechanism to a much
simpler one, where the user is only allowed to use globbing for
the host part of the URL. That is a user can now specify a key
`http.https://*.example.com` to match all sub-domains of
`example.com`. For now I've decided to implement it such that a
single `*` matches a single subdomain only, so for example
`https://foo.bar.example.com` would not match in this case. This
is similar to how shell-globbing works usually, so it should not
be of much surprise. It's also highlighted in the documentation.

I did not include an interdiff as too much has changed between
the two versions.

Regards
Patrick

Patrick Steinhardt (4):
  mailmap: add Patrick Steinhardt's work address
  urlmatch: enable normalization of URLs with globs
  urlmatch: split host and port fields in `struct url_info`
  urlmatch: allow globbing for the URL host part

 .mailmap                 |  1 +
 Documentation/config.txt |  5 +++-
 t/t1300-repo-config.sh   | 36 +++++++++++++++++++++++++
 urlmatch.c               | 68 +++++++++++++++++++++++++++++++++++++++++-------
 urlmatch.h               |  9 ++++---
 5 files changed, 104 insertions(+), 15 deletions(-)

-- 
2.11.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 1/4] mailmap: add Patrick Steinhardt's work address
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
                   ` (3 preceding siblings ...)
  2017-01-24 17:00 ` [PATCH v2 0/4] " Patrick Steinhardt
@ 2017-01-24 17:00 ` Patrick Steinhardt
  2017-01-24 17:00 ` [PATCH v2 2/4] urlmatch: enable normalization of URLs with globs Patrick Steinhardt
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-24 17:00 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
---
 .mailmap | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.mailmap b/.mailmap
index 9c87a3840..ea59205b9 100644
--- a/.mailmap
+++ b/.mailmap
@@ -177,6 +177,7 @@ Paolo Bonzini <bonzini@gnu.org> <paolo.bonzini@lu.unisi.ch>
 Pascal Obry <pascal@obry.net> <pascal.obry@gmail.com>
 Pascal Obry <pascal@obry.net> <pascal.obry@wanadoo.fr>
 Pat Notz <patnotz@gmail.com> <pknotz@sandia.gov>
+Patrick Steinhardt <ps@pks.im> <patrick.steinhardt@elego.de>
 Paul Mackerras <paulus@samba.org> <paulus@dorrigo.(none)>
 Paul Mackerras <paulus@samba.org> <paulus@pogo.(none)>
 Peter Baumann <waste.manager@gmx.de> <Peter.B.Baumann@stud.informatik.uni-erlangen.de>
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 2/4] urlmatch: enable normalization of URLs with globs
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
                   ` (4 preceding siblings ...)
  2017-01-24 17:00 ` [PATCH v2 1/4] mailmap: add Patrick Steinhardt's work address Patrick Steinhardt
@ 2017-01-24 17:00 ` Patrick Steinhardt
  2017-01-24 17:00 ` [PATCH v2 3/4] urlmatch: split host and port fields in `struct url_info` Patrick Steinhardt
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-24 17:00 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt

The `url_normalize` function is used to validate and normalize URLs. As
such, it does not allow for some special characters to be part of the
URLs that are to be normalized. As we want to allow using globs in some
configuration keys making use of URLs, namely `http.<url>.<key>`, but
still normalize them, we need to somehow enable some additional allowed
characters.

To do this without having to change all callers of `url_normalize`,
where most do not actually want globbing at all, we split off another
function `url_normalize_1`. This function accepts an additional
parameter `allow_globs`, which is subsequently called by `url_normalize`
with `allow_globs=0`.

As of now, this function is not used with globbing enabled. A caller
will be added in the following commit.

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
---
 urlmatch.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/urlmatch.c b/urlmatch.c
index 132d342bc..d350478c0 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -63,7 +63,7 @@ static int append_normalized_escapes(struct strbuf *buf,
 	return 1;
 }
 
-char *url_normalize(const char *url, struct url_info *out_info)
+static char *url_normalize_1(const char *url, struct url_info *out_info, char allow_globs)
 {
 	/*
 	 * Normalize NUL-terminated url using the following rules:
@@ -191,7 +191,12 @@ char *url_normalize(const char *url, struct url_info *out_info)
 		strbuf_release(&norm);
 		return NULL;
 	}
-	spanned = strspn(url, URL_HOST_CHARS);
+
+	if (allow_globs)
+		spanned = strspn(url, URL_HOST_CHARS "*");
+	else
+		spanned = strspn(url, URL_HOST_CHARS);
+
 	if (spanned < colon_ptr - url) {
 		/* Host name has invalid characters */
 		if (out_info) {
@@ -380,6 +385,11 @@ char *url_normalize(const char *url, struct url_info *out_info)
 	return result;
 }
 
+char *url_normalize(const char *url, struct url_info *out_info)
+{
+	return url_normalize_1(url, out_info, 0);
+}
+
 static size_t url_match_prefix(const char *url,
 			       const char *url_prefix,
 			       size_t url_prefix_len)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 3/4] urlmatch: split host and port fields in `struct url_info`
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
                   ` (5 preceding siblings ...)
  2017-01-24 17:00 ` [PATCH v2 2/4] urlmatch: enable normalization of URLs with globs Patrick Steinhardt
@ 2017-01-24 17:00 ` Patrick Steinhardt
  2017-01-24 17:00 ` [PATCH v2 4/4] urlmatch: allow globbing for the URL host part Patrick Steinhardt
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-24 17:00 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt

The `url_info` structure contains information about a normalized URL
with the URL's components being represented by different fields. The
host and port part though are to be accessed by the same `host` field,
so that getting the host and/or port separately becomes more involved
than really necessary.

To make the port more readily accessible, split up the host and port
fields. Namely, the `host_len` will not include the port length anymore
and a new `port_off` field has been added which includes the offset to
the port, if available.

The only user of these fields is `url_normalize_1`. This change makes it
easier later on to treat host and port differently when introducing
globs for domains.

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
---
 urlmatch.c | 16 ++++++++++++----
 urlmatch.h |  9 +++++----
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/urlmatch.c b/urlmatch.c
index d350478c0..e328905eb 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -104,7 +104,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 	struct strbuf norm;
 	size_t spanned;
 	size_t scheme_len, user_off=0, user_len=0, passwd_off=0, passwd_len=0;
-	size_t host_off=0, host_len=0, port_len=0, path_off, path_len, result_len;
+	size_t host_off=0, host_len=0, port_off=0, port_len=0, path_off, path_len, result_len;
 	const char *slash_ptr, *at_ptr, *colon_ptr, *path_start;
 	char *result;
 
@@ -263,6 +263,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 				return NULL;
 			}
 			strbuf_addch(&norm, ':');
+			port_off = norm.len;
 			strbuf_add(&norm, url, slash_ptr - url);
 			port_len = slash_ptr - url;
 		}
@@ -270,7 +271,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 		url = slash_ptr;
 	}
 	if (host_off)
-		host_len = norm.len - host_off;
+		host_len = norm.len - host_off - (port_len ? port_len + 1 : 0);
 
 
 	/*
@@ -378,6 +379,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 		out_info->passwd_len = passwd_len;
 		out_info->host_off = host_off;
 		out_info->host_len = host_len;
+		out_info->port_off = port_off;
 		out_info->port_len = port_len;
 		out_info->path_off = path_off;
 		out_info->path_len = path_len;
@@ -464,11 +466,17 @@ static int match_urls(const struct url_info *url,
 		usermatched = 1;
 	}
 
-	/* check the host and port */
+	/* check the host */
 	if (url_prefix->host_len != url->host_len ||
 	    strncmp(url->url + url->host_off,
 		    url_prefix->url + url_prefix->host_off, url->host_len))
-		return 0; /* host names and/or ports do not match */
+		return 0; /* host names do not match */
+
+	/* check the port */
+	if (url_prefix->port_len != url->port_len ||
+	    strncmp(url->url + url->port_off,
+		    url_prefix->url + url_prefix->port_off, url->port_len))
+		return 0; /* ports do not match */
 
 	/* check the path */
 	pathmatchlen = url_match_prefix(
diff --git a/urlmatch.h b/urlmatch.h
index 528862adc..0ea812b03 100644
--- a/urlmatch.h
+++ b/urlmatch.h
@@ -18,11 +18,12 @@ struct url_info {
 	size_t passwd_len;	/* length of passwd; if passwd_off != 0 but
 				   passwd_len == 0, an empty passwd was given */
 	size_t host_off;	/* offset into url to start of host name (0 => none) */
-	size_t host_len;	/* length of host name; this INCLUDES any ':portnum';
+	size_t host_len;	/* length of host name;
 				 * file urls may have host_len == 0 */
-	size_t port_len;	/* if a portnum is present (port_len != 0), it has
-				 * this length (excluding the leading ':') at the
-				 * end of the host name (always 0 for file urls) */
+	size_t port_off;	/* offset into url to start of port number (0 => none) */
+	size_t port_len;	/* if a portnum is present (port_off != 0), it has
+				 * this length (excluding the leading ':') starting
+				 * from port_off (always 0 for file urls) */
 	size_t path_off;	/* offset into url to the start of the url path;
 				 * this will always point to a '/' character
 				 * after the url has been normalized */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 4/4] urlmatch: allow globbing for the URL host part
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
                   ` (6 preceding siblings ...)
  2017-01-24 17:00 ` [PATCH v2 3/4] urlmatch: split host and port fields in `struct url_info` Patrick Steinhardt
@ 2017-01-24 17:00 ` Patrick Steinhardt
  2017-01-24 17:52   ` Philip Oakley
  2017-01-25  9:56 ` [PATCH v3 0/4] urlmatch: allow regexp-based matches Patrick Steinhardt
                   ` (16 subsequent siblings)
  24 siblings, 1 reply; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-24 17:00 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt

The URL matching function computes for two URLs whether they match not.
The match is performed by splitting up the URL into different parts and
then doing an exact comparison with the to-be-matched URL.

The main user of `urlmatch` is the configuration subsystem. It allows to
set certain configurations based on the URL which is being connected to
via keys like `http.<url>.*`. A common use case for this is to set
proxies for only some remotes which match the given URL. Unfortunately,
having exact matches for all parts of the URL can become quite tedious
in some setups. Imagine for example a corporate network where there are
dozens or even hundreds of subdomains, which would have to be configured
individually.

This commit introduces the ability to use globbing in the host-part of
the URLs. A user can simply specify a `*` as part of the host name to
match all subdomains at this level. For example adding a configuration
key `http.https://*.example.com.proxy` will match all subdomains of
`https://example.com`.

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
---
 Documentation/config.txt |  5 ++++-
 t/t1300-repo-config.sh   | 36 ++++++++++++++++++++++++++++++++++++
 urlmatch.c               | 38 ++++++++++++++++++++++++++++++++++----
 3 files changed, 74 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 506431267..a78921c2b 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1914,7 +1914,10 @@ http.<url>.*::
   must match exactly between the config key and the URL.
 
 . Host/domain name (e.g., `example.com` in `https://example.com/`).
-  This field must match exactly between the config key and the URL.
+  This field must match between the config key and the URL. It is
+  possible to use globs in the config key to match all subdomains, e.g.
+  `https://*.example.com/` to match all subdomains of `example.com`. Note
+  that a glob only every matches a single part of the hostname.
 
 . Port number (e.g., `8080` in `http://example.com:8080/`).
   This field must match exactly between the config key and the URL.
diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
index 923bfc5a2..ec545e092 100755
--- a/t/t1300-repo-config.sh
+++ b/t/t1300-repo-config.sh
@@ -1177,6 +1177,42 @@ test_expect_success 'urlmatch' '
 	test_cmp expect actual
 '
 
+test_expect_success 'glob-based urlmatch' '
+	cat >.git/config <<-\EOF &&
+	[http]
+		sslVerify
+	[http "https://*.example.com"]
+		sslVerify = false
+		cookieFile = /tmp/cookie.txt
+	EOF
+
+	test_expect_code 1 git config --bool --get-urlmatch doesnt.exist https://good.example.com >actual &&
+	test_must_be_empty actual &&
+
+	echo true >expect &&
+	git config --bool --get-urlmatch http.SSLverify https://example.com >actual &&
+	test_cmp expect actual &&
+
+	echo true >expect &&
+	git config --bool --get-urlmatch http.SSLverify https://good-example.com >actual &&
+	test_cmp expect actual &&
+
+	echo true >expect &&
+	git config --bool --get-urlmatch http.sslverify https://deep.nested.example.com >actual &&
+	test_cmp expect actual &&
+
+	echo false >expect &&
+	git config --bool --get-urlmatch http.sslverify https://good.example.com >actual &&
+	test_cmp expect actual &&
+
+	{
+		echo http.cookiefile /tmp/cookie.txt &&
+		echo http.sslverify false
+	} >expect &&
+	git config --get-urlmatch HTTP https://good.example.com >actual &&
+	test_cmp expect actual
+'
+
 # good section hygiene
 test_expect_failure 'unsetting the last key in a section removes header' '
 	cat >.git/config <<-\EOF &&
diff --git a/urlmatch.c b/urlmatch.c
index e328905eb..53ff972a6 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -63,6 +63,38 @@ static int append_normalized_escapes(struct strbuf *buf,
 	return 1;
 }
 
+static int match_host(const struct url_info *url_info,
+		      const struct url_info *pattern_info)
+{
+	char *url = xmemdupz(url_info->url + url_info->host_off, url_info->host_len);
+	char *pat = xmemdupz(pattern_info->url + pattern_info->host_off, pattern_info->host_len);
+	char *url_tok, *pat_tok, *url_save, *pat_save;
+	int matching;
+
+	url_tok = strtok_r(url, ".", &url_save);
+	pat_tok = strtok_r(pat, ".", &pat_save);
+
+	for (; url_tok && pat_tok; url_tok = strtok_r(NULL, ".", &url_save),
+				   pat_tok = strtok_r(NULL, ".", &pat_save)) {
+		if (!strcmp(pat_tok, "*"))
+			continue; /* a simple glob matches everything */
+
+		if (strcmp(url_tok, pat_tok)) {
+			/* subdomains do not match */
+			matching = 0;
+			break;
+		}
+	}
+
+	/* matching if both URL and pattern are at their ends */
+	matching = (url_tok == NULL && pat_tok == NULL);
+
+	free(url);
+	free(pat);
+
+	return matching;
+}
+
 static char *url_normalize_1(const char *url, struct url_info *out_info, char allow_globs)
 {
 	/*
@@ -467,9 +499,7 @@ static int match_urls(const struct url_info *url,
 	}
 
 	/* check the host */
-	if (url_prefix->host_len != url->host_len ||
-	    strncmp(url->url + url->host_off,
-		    url_prefix->url + url_prefix->host_off, url->host_len))
+	if (!match_host(url, url_prefix))
 		return 0; /* host names do not match */
 
 	/* check the port */
@@ -512,7 +542,7 @@ int urlmatch_config_entry(const char *var, const char *value, void *cb)
 		struct url_info norm_info;
 
 		config_url = xmemdupz(key, dot - key);
-		norm_url = url_normalize(config_url, &norm_info);
+		norm_url = url_normalize_1(config_url, &norm_info, 1);
 		free(config_url);
 		if (!norm_url)
 			return 0;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 4/4] urlmatch: allow globbing for the URL host part
  2017-01-24 17:00 ` [PATCH v2 4/4] urlmatch: allow globbing for the URL host part Patrick Steinhardt
@ 2017-01-24 17:52   ` Philip Oakley
  2017-01-25  9:57     ` Patrick Steinhardt
  0 siblings, 1 reply; 37+ messages in thread
From: Philip Oakley @ 2017-01-24 17:52 UTC (permalink / raw)
  To: Patrick Steinhardt, git
  Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt

From: "Patrick Steinhardt" <patrick.steinhardt@elego.de>

a quick comment on the documentation part ..

> The URL matching function computes for two URLs whether they match not.
> The match is performed by splitting up the URL into different parts and
> then doing an exact comparison with the to-be-matched URL.
>
> The main user of `urlmatch` is the configuration subsystem. It allows to
> set certain configurations based on the URL which is being connected to
> via keys like `http.<url>.*`. A common use case for this is to set
> proxies for only some remotes which match the given URL. Unfortunately,
> having exact matches for all parts of the URL can become quite tedious
> in some setups. Imagine for example a corporate network where there are
> dozens or even hundreds of subdomains, which would have to be configured
> individually.
>
> This commit introduces the ability to use globbing in the host-part of
> the URLs. A user can simply specify a `*` as part of the host name to
> match all subdomains at this level. For example adding a configuration
> key `http.https://*.example.com.proxy` will match all subdomains of
> `https://example.com`.
>
> Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
> ---
> Documentation/config.txt |  5 ++++-
> t/t1300-repo-config.sh   | 36 ++++++++++++++++++++++++++++++++++++
> urlmatch.c               | 38 ++++++++++++++++++++++++++++++++++----
> 3 files changed, 74 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 506431267..a78921c2b 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1914,7 +1914,10 @@ http.<url>.*::
>   must match exactly between the config key and the URL.
>
> . Host/domain name (e.g., `example.com` in `https://example.com/`).
> -  This field must match exactly between the config key and the URL.
> +  This field must match between the config key and the URL. It is
> +  possible to use globs in the config key to match all subdomains, e.g.
> +  `https://*.example.com/` to match all subdomains of `example.com`. Note
> +  that a glob only every matches a single part of the hostname.

[s/every/ever/ ?]

the "match all subdomains" appears to contradict the "a glob only ever 
matches a single part ".

Maybe borrow the example from the 0/4 cover letter
"so for example `https://foo.bar.example.com` would not match in the case of 
`http.https://*.example.com` " (If I understood it correctly.

A simple example often clarifies much better than more words.
--
Philip

>
> . Port number (e.g., `8080` in `http://example.com:8080/`).
>   This field must match exactly between the config key and the URL.
> diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
> index 923bfc5a2..ec545e092 100755
> --- a/t/t1300-repo-config.sh
> +++ b/t/t1300-repo-config.sh
> @@ -1177,6 +1177,42 @@ test_expect_success 'urlmatch' '
>  test_cmp expect actual
> '
>
> +test_expect_success 'glob-based urlmatch' '
> + cat >.git/config <<-\EOF &&
> + [http]
> + sslVerify
> + [http "https://*.example.com"]
> + sslVerify = false
> + cookieFile = /tmp/cookie.txt
> + EOF
> +
> + test_expect_code 1 git config --bool --get-urlmatch doesnt.exist 
> https://good.example.com >actual &&
> + test_must_be_empty actual &&
> +
> + echo true >expect &&
> + git config --bool --get-urlmatch http.SSLverify https://example.com 
>  >actual &&
> + test_cmp expect actual &&
> +
> + echo true >expect &&
> + git config --bool --get-urlmatch http.SSLverify https://good-example.com 
>  >actual &&
> + test_cmp expect actual &&
> +
> + echo true >expect &&
> + git config --bool --get-urlmatch http.sslverify 
> https://deep.nested.example.com >actual &&
> + test_cmp expect actual &&
> +
> + echo false >expect &&
> + git config --bool --get-urlmatch http.sslverify https://good.example.com 
>  >actual &&
> + test_cmp expect actual &&
> +
> + {
> + echo http.cookiefile /tmp/cookie.txt &&
> + echo http.sslverify false
> + } >expect &&
> + git config --get-urlmatch HTTP https://good.example.com >actual &&
> + test_cmp expect actual
> +'
> +
> # good section hygiene
> test_expect_failure 'unsetting the last key in a section removes header' '
>  cat >.git/config <<-\EOF &&
> diff --git a/urlmatch.c b/urlmatch.c
> index e328905eb..53ff972a6 100644
> --- a/urlmatch.c
> +++ b/urlmatch.c
> @@ -63,6 +63,38 @@ static int append_normalized_escapes(struct strbuf 
> *buf,
>  return 1;
> }
>
> +static int match_host(const struct url_info *url_info,
> +       const struct url_info *pattern_info)
> +{
> + char *url = xmemdupz(url_info->url + url_info->host_off, 
> url_info->host_len);
> + char *pat = xmemdupz(pattern_info->url + pattern_info->host_off, 
> pattern_info->host_len);
> + char *url_tok, *pat_tok, *url_save, *pat_save;
> + int matching;
> +
> + url_tok = strtok_r(url, ".", &url_save);
> + pat_tok = strtok_r(pat, ".", &pat_save);
> +
> + for (; url_tok && pat_tok; url_tok = strtok_r(NULL, ".", &url_save),
> +    pat_tok = strtok_r(NULL, ".", &pat_save)) {
> + if (!strcmp(pat_tok, "*"))
> + continue; /* a simple glob matches everything */
> +
> + if (strcmp(url_tok, pat_tok)) {
> + /* subdomains do not match */
> + matching = 0;
> + break;
> + }
> + }
> +
> + /* matching if both URL and pattern are at their ends */
> + matching = (url_tok == NULL && pat_tok == NULL);
> +
> + free(url);
> + free(pat);
> +
> + return matching;
> +}
> +
> static char *url_normalize_1(const char *url, struct url_info *out_info, 
> char allow_globs)
> {
>  /*
> @@ -467,9 +499,7 @@ static int match_urls(const struct url_info *url,
>  }
>
>  /* check the host */
> - if (url_prefix->host_len != url->host_len ||
> -     strncmp(url->url + url->host_off,
> -     url_prefix->url + url_prefix->host_off, url->host_len))
> + if (!match_host(url, url_prefix))
>  return 0; /* host names do not match */
>
>  /* check the port */
> @@ -512,7 +542,7 @@ int urlmatch_config_entry(const char *var, const char 
> *value, void *cb)
>  struct url_info norm_info;
>
>  config_url = xmemdupz(key, dot - key);
> - norm_url = url_normalize(config_url, &norm_info);
> + norm_url = url_normalize_1(config_url, &norm_info, 1);
>  free(config_url);
>  if (!norm_url)
>  return 0;
> -- 
> 2.11.0
>
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v3 0/4] urlmatch: allow regexp-based matches
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
                   ` (7 preceding siblings ...)
  2017-01-24 17:00 ` [PATCH v2 4/4] urlmatch: allow globbing for the URL host part Patrick Steinhardt
@ 2017-01-25  9:56 ` Patrick Steinhardt
  2017-01-25  9:56 ` [PATCH v3 1/4] mailmap: add Patrick Steinhardt's work address Patrick Steinhardt
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-25  9:56 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt, Philip Oakley

Hi,

This is version three of my patch series. The previous version
can be found at [1]. The use case is to be able to configure an
HTTP proxy for all subdomains of a domain where there are
hundreds of subdomains.

This version addresses a comment by Philip Oakley regarding the
documentation. You can find the interdiff below.

Regards
Patrick

[1]: http://public-inbox.org/git/20170124170031.18069-1-patrick.steinhardt@elego.de/T/#u

Patrick Steinhardt (4):
  mailmap: add Patrick Steinhardt's work address
  urlmatch: enable normalization of URLs with globs
  urlmatch: split host and port fields in `struct url_info`
  urlmatch: allow globbing for the URL host part

 .mailmap                 |  1 +
 Documentation/config.txt |  5 +++-
 t/t1300-repo-config.sh   | 36 +++++++++++++++++++++++++
 urlmatch.c               | 68 +++++++++++++++++++++++++++++++++++++++++-------
 urlmatch.h               |  9 ++++---
 5 files changed, 104 insertions(+), 15 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index a78921c2b..078e9b490 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1915,9 +1915,9 @@ http.<url>.*::
 
 . Host/domain name (e.g., `example.com` in `https://example.com/`).
   This field must match between the config key and the URL. It is
-  possible to use globs in the config key to match all subdomains, e.g.
-  `https://*.example.com/` to match all subdomains of `example.com`. Note
-  that a glob only every matches a single part of the hostname.
+  possible to specify a `*` as part of the host name to match all subdomains
+  at this level. `https://*.example.com/` for example would match
+  `https://foo.example.com/`, but not `https://foo.bar.example.com/`.
 
 . Port number (e.g., `8080` in `http://example.com:8080/`).
   This field must match exactly between the config key and the URL.
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 1/4] mailmap: add Patrick Steinhardt's work address
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
                   ` (8 preceding siblings ...)
  2017-01-25  9:56 ` [PATCH v3 0/4] urlmatch: allow regexp-based matches Patrick Steinhardt
@ 2017-01-25  9:56 ` Patrick Steinhardt
  2017-01-25  9:56 ` [PATCH v3 2/4] urlmatch: enable normalization of URLs with globs Patrick Steinhardt
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-25  9:56 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt, Philip Oakley

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
---
 .mailmap | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.mailmap b/.mailmap
index 9c87a3840..ea59205b9 100644
--- a/.mailmap
+++ b/.mailmap
@@ -177,6 +177,7 @@ Paolo Bonzini <bonzini@gnu.org> <paolo.bonzini@lu.unisi.ch>
 Pascal Obry <pascal@obry.net> <pascal.obry@gmail.com>
 Pascal Obry <pascal@obry.net> <pascal.obry@wanadoo.fr>
 Pat Notz <patnotz@gmail.com> <pknotz@sandia.gov>
+Patrick Steinhardt <ps@pks.im> <patrick.steinhardt@elego.de>
 Paul Mackerras <paulus@samba.org> <paulus@dorrigo.(none)>
 Paul Mackerras <paulus@samba.org> <paulus@pogo.(none)>
 Peter Baumann <waste.manager@gmx.de> <Peter.B.Baumann@stud.informatik.uni-erlangen.de>
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 2/4] urlmatch: enable normalization of URLs with globs
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
                   ` (9 preceding siblings ...)
  2017-01-25  9:56 ` [PATCH v3 1/4] mailmap: add Patrick Steinhardt's work address Patrick Steinhardt
@ 2017-01-25  9:56 ` Patrick Steinhardt
  2017-01-25  9:56 ` [PATCH v3 3/4] urlmatch: split host and port fields in `struct url_info` Patrick Steinhardt
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-25  9:56 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt, Philip Oakley

The `url_normalize` function is used to validate and normalize URLs. As
such, it does not allow for some special characters to be part of the
URLs that are to be normalized. As we want to allow using globs in some
configuration keys making use of URLs, namely `http.<url>.<key>`, but
still normalize them, we need to somehow enable some additional allowed
characters.

To do this without having to change all callers of `url_normalize`,
where most do not actually want globbing at all, we split off another
function `url_normalize_1`. This function accepts an additional
parameter `allow_globs`, which is subsequently called by `url_normalize`
with `allow_globs=0`.

As of now, this function is not used with globbing enabled. A caller
will be added in the following commit.

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
---
 urlmatch.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/urlmatch.c b/urlmatch.c
index 132d342bc..d350478c0 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -63,7 +63,7 @@ static int append_normalized_escapes(struct strbuf *buf,
 	return 1;
 }
 
-char *url_normalize(const char *url, struct url_info *out_info)
+static char *url_normalize_1(const char *url, struct url_info *out_info, char allow_globs)
 {
 	/*
 	 * Normalize NUL-terminated url using the following rules:
@@ -191,7 +191,12 @@ char *url_normalize(const char *url, struct url_info *out_info)
 		strbuf_release(&norm);
 		return NULL;
 	}
-	spanned = strspn(url, URL_HOST_CHARS);
+
+	if (allow_globs)
+		spanned = strspn(url, URL_HOST_CHARS "*");
+	else
+		spanned = strspn(url, URL_HOST_CHARS);
+
 	if (spanned < colon_ptr - url) {
 		/* Host name has invalid characters */
 		if (out_info) {
@@ -380,6 +385,11 @@ char *url_normalize(const char *url, struct url_info *out_info)
 	return result;
 }
 
+char *url_normalize(const char *url, struct url_info *out_info)
+{
+	return url_normalize_1(url, out_info, 0);
+}
+
 static size_t url_match_prefix(const char *url,
 			       const char *url_prefix,
 			       size_t url_prefix_len)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 3/4] urlmatch: split host and port fields in `struct url_info`
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
                   ` (10 preceding siblings ...)
  2017-01-25  9:56 ` [PATCH v3 2/4] urlmatch: enable normalization of URLs with globs Patrick Steinhardt
@ 2017-01-25  9:56 ` Patrick Steinhardt
  2017-01-25  9:56 ` [PATCH v3 4/4] urlmatch: allow globbing for the URL host part Patrick Steinhardt
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-25  9:56 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt, Philip Oakley

The `url_info` structure contains information about a normalized URL
with the URL's components being represented by different fields. The
host and port part though are to be accessed by the same `host` field,
so that getting the host and/or port separately becomes more involved
than really necessary.

To make the port more readily accessible, split up the host and port
fields. Namely, the `host_len` will not include the port length anymore
and a new `port_off` field has been added which includes the offset to
the port, if available.

The only user of these fields is `url_normalize_1`. This change makes it
easier later on to treat host and port differently when introducing
globs for domains.

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
---
 urlmatch.c | 16 ++++++++++++----
 urlmatch.h |  9 +++++----
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/urlmatch.c b/urlmatch.c
index d350478c0..e328905eb 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -104,7 +104,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 	struct strbuf norm;
 	size_t spanned;
 	size_t scheme_len, user_off=0, user_len=0, passwd_off=0, passwd_len=0;
-	size_t host_off=0, host_len=0, port_len=0, path_off, path_len, result_len;
+	size_t host_off=0, host_len=0, port_off=0, port_len=0, path_off, path_len, result_len;
 	const char *slash_ptr, *at_ptr, *colon_ptr, *path_start;
 	char *result;
 
@@ -263,6 +263,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 				return NULL;
 			}
 			strbuf_addch(&norm, ':');
+			port_off = norm.len;
 			strbuf_add(&norm, url, slash_ptr - url);
 			port_len = slash_ptr - url;
 		}
@@ -270,7 +271,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 		url = slash_ptr;
 	}
 	if (host_off)
-		host_len = norm.len - host_off;
+		host_len = norm.len - host_off - (port_len ? port_len + 1 : 0);
 
 
 	/*
@@ -378,6 +379,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 		out_info->passwd_len = passwd_len;
 		out_info->host_off = host_off;
 		out_info->host_len = host_len;
+		out_info->port_off = port_off;
 		out_info->port_len = port_len;
 		out_info->path_off = path_off;
 		out_info->path_len = path_len;
@@ -464,11 +466,17 @@ static int match_urls(const struct url_info *url,
 		usermatched = 1;
 	}
 
-	/* check the host and port */
+	/* check the host */
 	if (url_prefix->host_len != url->host_len ||
 	    strncmp(url->url + url->host_off,
 		    url_prefix->url + url_prefix->host_off, url->host_len))
-		return 0; /* host names and/or ports do not match */
+		return 0; /* host names do not match */
+
+	/* check the port */
+	if (url_prefix->port_len != url->port_len ||
+	    strncmp(url->url + url->port_off,
+		    url_prefix->url + url_prefix->port_off, url->port_len))
+		return 0; /* ports do not match */
 
 	/* check the path */
 	pathmatchlen = url_match_prefix(
diff --git a/urlmatch.h b/urlmatch.h
index 528862adc..0ea812b03 100644
--- a/urlmatch.h
+++ b/urlmatch.h
@@ -18,11 +18,12 @@ struct url_info {
 	size_t passwd_len;	/* length of passwd; if passwd_off != 0 but
 				   passwd_len == 0, an empty passwd was given */
 	size_t host_off;	/* offset into url to start of host name (0 => none) */
-	size_t host_len;	/* length of host name; this INCLUDES any ':portnum';
+	size_t host_len;	/* length of host name;
 				 * file urls may have host_len == 0 */
-	size_t port_len;	/* if a portnum is present (port_len != 0), it has
-				 * this length (excluding the leading ':') at the
-				 * end of the host name (always 0 for file urls) */
+	size_t port_off;	/* offset into url to start of port number (0 => none) */
+	size_t port_len;	/* if a portnum is present (port_off != 0), it has
+				 * this length (excluding the leading ':') starting
+				 * from port_off (always 0 for file urls) */
 	size_t path_off;	/* offset into url to the start of the url path;
 				 * this will always point to a '/' character
 				 * after the url has been normalized */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 4/4] urlmatch: allow globbing for the URL host part
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
                   ` (11 preceding siblings ...)
  2017-01-25  9:56 ` [PATCH v3 3/4] urlmatch: split host and port fields in `struct url_info` Patrick Steinhardt
@ 2017-01-25  9:56 ` Patrick Steinhardt
  2017-01-26 20:43   ` Junio C Hamano
  2017-01-27  6:21   ` Patrick Steinhardt
  2017-01-27 10:32 ` [PATCH v4 0/5] urlmatch: allow wildcard-based matches Patrick Steinhardt
                   ` (11 subsequent siblings)
  24 siblings, 2 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-25  9:56 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt, Philip Oakley

The URL matching function computes for two URLs whether they match not.
The match is performed by splitting up the URL into different parts and
then doing an exact comparison with the to-be-matched URL.

The main user of `urlmatch` is the configuration subsystem. It allows to
set certain configurations based on the URL which is being connected to
via keys like `http.<url>.*`. A common use case for this is to set
proxies for only some remotes which match the given URL. Unfortunately,
having exact matches for all parts of the URL can become quite tedious
in some setups. Imagine for example a corporate network where there are
dozens or even hundreds of subdomains, which would have to be configured
individually.

This commit introduces the ability to use globbing in the host-part of
the URLs. A user can simply specify a `*` as part of the host name to
match all subdomains at this level. For example adding a configuration
key `http.https://*.example.com.proxy` will match all subdomains of
`https://example.com`.

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
---
 Documentation/config.txt |  5 ++++-
 t/t1300-repo-config.sh   | 36 ++++++++++++++++++++++++++++++++++++
 urlmatch.c               | 38 ++++++++++++++++++++++++++++++++++----
 3 files changed, 74 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 506431267..078e9b490 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1914,7 +1914,10 @@ http.<url>.*::
   must match exactly between the config key and the URL.
 
 . Host/domain name (e.g., `example.com` in `https://example.com/`).
-  This field must match exactly between the config key and the URL.
+  This field must match between the config key and the URL. It is
+  possible to specify a `*` as part of the host name to match all subdomains
+  at this level. `https://*.example.com/` for example would match
+  `https://foo.example.com/`, but not `https://foo.bar.example.com/`.
 
 . Port number (e.g., `8080` in `http://example.com:8080/`).
   This field must match exactly between the config key and the URL.
diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
index 923bfc5a2..ec545e092 100755
--- a/t/t1300-repo-config.sh
+++ b/t/t1300-repo-config.sh
@@ -1177,6 +1177,42 @@ test_expect_success 'urlmatch' '
 	test_cmp expect actual
 '
 
+test_expect_success 'glob-based urlmatch' '
+	cat >.git/config <<-\EOF &&
+	[http]
+		sslVerify
+	[http "https://*.example.com"]
+		sslVerify = false
+		cookieFile = /tmp/cookie.txt
+	EOF
+
+	test_expect_code 1 git config --bool --get-urlmatch doesnt.exist https://good.example.com >actual &&
+	test_must_be_empty actual &&
+
+	echo true >expect &&
+	git config --bool --get-urlmatch http.SSLverify https://example.com >actual &&
+	test_cmp expect actual &&
+
+	echo true >expect &&
+	git config --bool --get-urlmatch http.SSLverify https://good-example.com >actual &&
+	test_cmp expect actual &&
+
+	echo true >expect &&
+	git config --bool --get-urlmatch http.sslverify https://deep.nested.example.com >actual &&
+	test_cmp expect actual &&
+
+	echo false >expect &&
+	git config --bool --get-urlmatch http.sslverify https://good.example.com >actual &&
+	test_cmp expect actual &&
+
+	{
+		echo http.cookiefile /tmp/cookie.txt &&
+		echo http.sslverify false
+	} >expect &&
+	git config --get-urlmatch HTTP https://good.example.com >actual &&
+	test_cmp expect actual
+'
+
 # good section hygiene
 test_expect_failure 'unsetting the last key in a section removes header' '
 	cat >.git/config <<-\EOF &&
diff --git a/urlmatch.c b/urlmatch.c
index e328905eb..53ff972a6 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -63,6 +63,38 @@ static int append_normalized_escapes(struct strbuf *buf,
 	return 1;
 }
 
+static int match_host(const struct url_info *url_info,
+		      const struct url_info *pattern_info)
+{
+	char *url = xmemdupz(url_info->url + url_info->host_off, url_info->host_len);
+	char *pat = xmemdupz(pattern_info->url + pattern_info->host_off, pattern_info->host_len);
+	char *url_tok, *pat_tok, *url_save, *pat_save;
+	int matching;
+
+	url_tok = strtok_r(url, ".", &url_save);
+	pat_tok = strtok_r(pat, ".", &pat_save);
+
+	for (; url_tok && pat_tok; url_tok = strtok_r(NULL, ".", &url_save),
+				   pat_tok = strtok_r(NULL, ".", &pat_save)) {
+		if (!strcmp(pat_tok, "*"))
+			continue; /* a simple glob matches everything */
+
+		if (strcmp(url_tok, pat_tok)) {
+			/* subdomains do not match */
+			matching = 0;
+			break;
+		}
+	}
+
+	/* matching if both URL and pattern are at their ends */
+	matching = (url_tok == NULL && pat_tok == NULL);
+
+	free(url);
+	free(pat);
+
+	return matching;
+}
+
 static char *url_normalize_1(const char *url, struct url_info *out_info, char allow_globs)
 {
 	/*
@@ -467,9 +499,7 @@ static int match_urls(const struct url_info *url,
 	}
 
 	/* check the host */
-	if (url_prefix->host_len != url->host_len ||
-	    strncmp(url->url + url->host_off,
-		    url_prefix->url + url_prefix->host_off, url->host_len))
+	if (!match_host(url, url_prefix))
 		return 0; /* host names do not match */
 
 	/* check the port */
@@ -512,7 +542,7 @@ int urlmatch_config_entry(const char *var, const char *value, void *cb)
 		struct url_info norm_info;
 
 		config_url = xmemdupz(key, dot - key);
-		norm_url = url_normalize(config_url, &norm_info);
+		norm_url = url_normalize_1(config_url, &norm_info, 1);
 		free(config_url);
 		if (!norm_url)
 			return 0;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 4/4] urlmatch: allow globbing for the URL host part
  2017-01-24 17:52   ` Philip Oakley
@ 2017-01-25  9:57     ` Patrick Steinhardt
  0 siblings, 0 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-25  9:57 UTC (permalink / raw)
  To: Philip Oakley; +Cc: git, Patrick Steinhardt, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 2827 bytes --]

On Tue, Jan 24, 2017 at 05:52:39PM -0000, Philip Oakley wrote:
> From: "Patrick Steinhardt" <patrick.steinhardt@elego.de>
> 
> a quick comment on the documentation part ..
> 
> > The URL matching function computes for two URLs whether they match not.
> > The match is performed by splitting up the URL into different parts and
> > then doing an exact comparison with the to-be-matched URL.
> >
> > The main user of `urlmatch` is the configuration subsystem. It allows to
> > set certain configurations based on the URL which is being connected to
> > via keys like `http.<url>.*`. A common use case for this is to set
> > proxies for only some remotes which match the given URL. Unfortunately,
> > having exact matches for all parts of the URL can become quite tedious
> > in some setups. Imagine for example a corporate network where there are
> > dozens or even hundreds of subdomains, which would have to be configured
> > individually.
> >
> > This commit introduces the ability to use globbing in the host-part of
> > the URLs. A user can simply specify a `*` as part of the host name to
> > match all subdomains at this level. For example adding a configuration
> > key `http.https://*.example.com.proxy` will match all subdomains of
> > `https://example.com`.
> >
> > Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
> > ---
> > Documentation/config.txt |  5 ++++-
> > t/t1300-repo-config.sh   | 36 ++++++++++++++++++++++++++++++++++++
> > urlmatch.c               | 38 ++++++++++++++++++++++++++++++++++----
> > 3 files changed, 74 insertions(+), 5 deletions(-)
> >
> > diff --git a/Documentation/config.txt b/Documentation/config.txt
> > index 506431267..a78921c2b 100644
> > --- a/Documentation/config.txt
> > +++ b/Documentation/config.txt
> > @@ -1914,7 +1914,10 @@ http.<url>.*::
> >   must match exactly between the config key and the URL.
> >
> > . Host/domain name (e.g., `example.com` in `https://example.com/`).
> > -  This field must match exactly between the config key and the URL.
> > +  This field must match between the config key and the URL. It is
> > +  possible to use globs in the config key to match all subdomains, e.g.
> > +  `https://*.example.com/` to match all subdomains of `example.com`. Note
> > +  that a glob only every matches a single part of the hostname.
> 
> [s/every/ever/ ?]
> 
> the "match all subdomains" appears to contradict the "a glob only ever 
> matches a single part ".
> 
> Maybe borrow the example from the 0/4 cover letter
> "so for example `https://foo.bar.example.com` would not match in the case of 
> `http.https://*.example.com` " (If I understood it correctly.
> 
> A simple example often clarifies much better than more words.

Thanks! (Hopefully) improved this in v3.

Regards
Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 4/4] urlmatch: allow globbing for the URL host part
  2017-01-25  9:56 ` [PATCH v3 4/4] urlmatch: allow globbing for the URL host part Patrick Steinhardt
@ 2017-01-26 20:43   ` Junio C Hamano
  2017-01-26 20:49     ` Junio C Hamano
  2017-01-27  6:21   ` Patrick Steinhardt
  1 sibling, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2017-01-26 20:43 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Patrick Steinhardt, Philip Oakley

Patrick Steinhardt <patrick.steinhardt@elego.de> writes:

> The URL matching function computes for two URLs whether they match not.
> The match is performed by splitting up the URL into different parts and
> then doing an exact comparison with the to-be-matched URL.
>
> The main user of `urlmatch` is the configuration subsystem. It allows to
> set certain configurations based on the URL which is being connected to
> via keys like `http.<url>.*`. A common use case for this is to set
> proxies for only some remotes which match the given URL. Unfortunately,
> having exact matches for all parts of the URL can become quite tedious
> in some setups. Imagine for example a corporate network where there are
> dozens or even hundreds of subdomains, which would have to be configured
> individually.
>
> This commit introduces the ability to use globbing in the host-part of
> the URLs. A user can simply specify a `*` as part of the host name to
> match all subdomains at this level. For example adding a configuration
> key `http.https://*.example.com.proxy` will match all subdomains of
> `https://example.com`.

This is probably a useful improvement.

Having said that, when I mentioned "glob", I meant to also support
something like this:

	https://www[1-4].ibm.com/

And when people read "glob", that is what they expect.

So calling this "the ability to use globbing" is misleading.
The last paragraph in the log message above needs a bit of
tweaking, perhaps like this:

	Allow users to write an asterisk '*' in place of any 'host'
	or 'subdomain' label as part of the host name.  For example,
	"http.https://*.example.com.proxy" sets "http.proxy" for all
	direct subdomains of "https://example.com",
	e.g. "https://foo.example.com", but not
	"https://foo.bar.example.com".

Fortunately, your update to config.txt, which is facing the end
users, does not misuse the word and instead is explicit that the
only thing the matcher does is to match '*' to a single hierarchy.
It is clear that even http://www*.ibm.com/ is not supported from
the description, which is good.

>  . Host/domain name (e.g., `example.com` in `https://example.com/`).
> -  This field must match exactly between the config key and the URL.
> +  This field must match between the config key and the URL. It is
> +  possible to specify a `*` as part of the host name to match all subdomains
> +  at this level. `https://*.example.com/` for example would match
> +  `https://foo.example.com/`, but not `https://foo.bar.example.com/`.

This is good as-is.

>  . Port number (e.g., `8080` in `http://example.com:8080/`).
>    This field must match exactly between the config key and the URL.
> diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
> index 923bfc5a2..ec545e092 100755
> --- a/t/t1300-repo-config.sh
> +++ b/t/t1300-repo-config.sh
> @@ -1177,6 +1177,42 @@ test_expect_success 'urlmatch' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'glob-based urlmatch' '

This is not "glob".  A more generic term "wildcard" is OK.

> +	cat >.git/config <<-\EOF &&
> +	[http]
> +		sslVerify
> ...
> +static int match_host(const struct url_info *url_info,
> +		      const struct url_info *pattern_info)
> +{
> +	char *url = xmemdupz(url_info->url + url_info->host_off, url_info->host_len);
> +	char *pat = xmemdupz(pattern_info->url + pattern_info->host_off, pattern_info->host_len);
> +	char *url_tok, *pat_tok, *url_save, *pat_save;
> +	int matching;
> +
> +	url_tok = strtok_r(url, ".", &url_save);
> +	pat_tok = strtok_r(pat, ".", &pat_save);

Hmph, this will be the first use of strtok_r() in our codebase.
Does everybody have it?

For a use like this where your delimiter set is a singleton, it may
be simpler to do the usual strchrnul() or memchr() based loop.  The
attached is my attempt to do so on top of this patch.

> +
> +	for (; url_tok && pat_tok; url_tok = strtok_r(NULL, ".", &url_save),
> +				   pat_tok = strtok_r(NULL, ".", &pat_save)) {
> +		if (!strcmp(pat_tok, "*"))
> +			continue; /* a simple glob matches everything */

s/glob/asterisk/

Other than that, the patch looks OK.



diff --git a/urlmatch.c b/urlmatch.c
index 53ff972a60..8dfc7fd28a 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -63,36 +63,47 @@ static int append_normalized_escapes(struct strbuf *buf,
 	return 1;
 }
 
+static const char *end_of_token(const char *s, int c, size_t n)
+{
+	const char *next = memchr(s, c, n);
+	if (!next)
+		next = s + n;
+	return next;
+}
+
 static int match_host(const struct url_info *url_info,
 		      const struct url_info *pattern_info)
 {
-	char *url = xmemdupz(url_info->url + url_info->host_off, url_info->host_len);
-	char *pat = xmemdupz(pattern_info->url + pattern_info->host_off, pattern_info->host_len);
-	char *url_tok, *pat_tok, *url_save, *pat_save;
-	int matching;
-
-	url_tok = strtok_r(url, ".", &url_save);
-	pat_tok = strtok_r(pat, ".", &pat_save);
-
-	for (; url_tok && pat_tok; url_tok = strtok_r(NULL, ".", &url_save),
-				   pat_tok = strtok_r(NULL, ".", &pat_save)) {
-		if (!strcmp(pat_tok, "*"))
-			continue; /* a simple glob matches everything */
-
-		if (strcmp(url_tok, pat_tok)) {
-			/* subdomains do not match */
-			matching = 0;
-			break;
-		}
+	const char *url = url_info->url + url_info->host_off;
+	const char *pat = pattern_info->url + pattern_info->host_off;
+	int url_len = url_info->host_len;
+	int pat_len = pattern_info->host_len;
+
+	while (url_len && pat_len) {
+		const char *url_next = end_of_token(url, '.', url_len);
+		const char *pat_next = end_of_token(pat, '.', pat_len);
+
+		if (pat_next == pat + 1 && pat[0] == '*')
+			/* wildcard matches anything */
+			;
+		else if ((pat_next - pat) == (url_next - url) &&
+			 !memcmp(url, pat, url_next - url))
+			/* the components are the same */
+			;
+		else
+			return 0; /* found an unmatch */
+
+		if (url_next < url + url_len)
+			url_next++;
+		url_len -= url_next - url;
+		url = url_next;
+		if (pat_next < pat + pat_len)
+			pat_next++;
+		pat_len -= pat_next - pat;
+		pat = pat_next;
 	}
 
-	/* matching if both URL and pattern are at their ends */
-	matching = (url_tok == NULL && pat_tok == NULL);
-
-	free(url);
-	free(pat);
-
-	return matching;
+	return 1;
 }
 
 static char *url_normalize_1(const char *url, struct url_info *out_info, char allow_globs)

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 4/4] urlmatch: allow globbing for the URL host part
  2017-01-26 20:43   ` Junio C Hamano
@ 2017-01-26 20:49     ` Junio C Hamano
  2017-01-26 21:12       ` Junio C Hamano
  0 siblings, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2017-01-26 20:49 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Patrick Steinhardt, Philip Oakley

Junio C Hamano <gitster@pobox.com> writes:

> +	while (url_len && pat_len) {
> +		const char *url_next = end_of_token(url, '.', url_len);
> +		const char *pat_next = end_of_token(pat, '.', pat_len);
> + ...
>  	}
>  
> +	return 1;

Embarrassing.  The last one must be "have they both run out?" i.e.

	return (!url_len && !pat_len);


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 4/4] urlmatch: allow globbing for the URL host part
  2017-01-26 20:49     ` Junio C Hamano
@ 2017-01-26 21:12       ` Junio C Hamano
  0 siblings, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2017-01-26 21:12 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Patrick Steinhardt, Philip Oakley

Junio C Hamano <gitster@pobox.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> +	while (url_len && pat_len) {
>> +		const char *url_next = end_of_token(url, '.', url_len);
>> +		const char *pat_next = end_of_token(pat, '.', pat_len);
>> + ...
>>  	}
>>  
>> +	return 1;
>
> Embarrassing.  The last one must be "have they both run out?" i.e.
>
> 	return (!url_len && !pat_len);

OK, here is my second try.  The added test piece is to catch the
silly mistake I made above.

diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
index ec545e0929..33fd59fbb3 100755
--- a/t/t1300-repo-config.sh
+++ b/t/t1300-repo-config.sh
@@ -1177,7 +1177,7 @@ test_expect_success 'urlmatch' '
 	test_cmp expect actual
 '
 
-test_expect_success 'glob-based urlmatch' '
+test_expect_success 'urlmatch with wildcard' '
 	cat >.git/config <<-\EOF &&
 	[http]
 		sslVerify
@@ -1210,6 +1210,10 @@ test_expect_success 'glob-based urlmatch' '
 		echo http.sslverify false
 	} >expect &&
 	git config --get-urlmatch HTTP https://good.example.com >actual &&
+	test_cmp expect actual &&
+
+	echo http.sslverify >expect &&
+	git config --get-urlmatch HTTP https://more.example.com.au >actual &&
 	test_cmp expect actual
 '
 
diff --git a/urlmatch.c b/urlmatch.c
index 53ff972a60..0e007a3e07 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -63,36 +63,47 @@ static int append_normalized_escapes(struct strbuf *buf,
 	return 1;
 }
 
+static const char *end_of_token(const char *s, int c, size_t n)
+{
+	const char *next = memchr(s, c, n);
+	if (!next)
+		next = s + n;
+	return next;
+}
+
 static int match_host(const struct url_info *url_info,
 		      const struct url_info *pattern_info)
 {
-	char *url = xmemdupz(url_info->url + url_info->host_off, url_info->host_len);
-	char *pat = xmemdupz(pattern_info->url + pattern_info->host_off, pattern_info->host_len);
-	char *url_tok, *pat_tok, *url_save, *pat_save;
-	int matching;
-
-	url_tok = strtok_r(url, ".", &url_save);
-	pat_tok = strtok_r(pat, ".", &pat_save);
-
-	for (; url_tok && pat_tok; url_tok = strtok_r(NULL, ".", &url_save),
-				   pat_tok = strtok_r(NULL, ".", &pat_save)) {
-		if (!strcmp(pat_tok, "*"))
-			continue; /* a simple glob matches everything */
-
-		if (strcmp(url_tok, pat_tok)) {
-			/* subdomains do not match */
-			matching = 0;
-			break;
-		}
+	const char *url = url_info->url + url_info->host_off;
+	const char *pat = pattern_info->url + pattern_info->host_off;
+	int url_len = url_info->host_len;
+	int pat_len = pattern_info->host_len;
+
+	while (url_len && pat_len) {
+		const char *url_next = end_of_token(url, '.', url_len);
+		const char *pat_next = end_of_token(pat, '.', pat_len);
+
+		if (pat_next == pat + 1 && pat[0] == '*')
+			/* wildcard matches anything */
+			;
+		else if ((pat_next - pat) == (url_next - url) &&
+			 !memcmp(url, pat, url_next - url))
+			/* the components are the same */
+			;
+		else
+			return 0; /* found an unmatch */
+
+		if (url_next < url + url_len)
+			url_next++;
+		url_len -= url_next - url;
+		url = url_next;
+		if (pat_next < pat + pat_len)
+			pat_next++;
+		pat_len -= pat_next - pat;
+		pat = pat_next;
 	}
 
-	/* matching if both URL and pattern are at their ends */
-	matching = (url_tok == NULL && pat_tok == NULL);
-
-	free(url);
-	free(pat);
-
-	return matching;
+	return (!url_len && !pat_len);
 }
 
 static char *url_normalize_1(const char *url, struct url_info *out_info, char allow_globs)

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 4/4] urlmatch: allow globbing for the URL host part
  2017-01-25  9:56 ` [PATCH v3 4/4] urlmatch: allow globbing for the URL host part Patrick Steinhardt
  2017-01-26 20:43   ` Junio C Hamano
@ 2017-01-27  6:21   ` Patrick Steinhardt
  2017-01-27 17:45     ` Junio C Hamano
  1 sibling, 1 reply; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-27  6:21 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt, Philip Oakley

[-- Attachment #1: Type: text/plain, Size: 3017 bytes --]

On Thu, Jan 26, 2017 at 12:43:31PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <patrick.steinhardt@elego.de> writes:
> 
> > The URL matching function computes for two URLs whether they match not.
> > The match is performed by splitting up the URL into different parts and
> > then doing an exact comparison with the to-be-matched URL.
> >
> > The main user of `urlmatch` is the configuration subsystem. It allows to
> > set certain configurations based on the URL which is being connected to
> > via keys like `http.<url>.*`. A common use case for this is to set
> > proxies for only some remotes which match the given URL. Unfortunately,
> > having exact matches for all parts of the URL can become quite tedious
> > in some setups. Imagine for example a corporate network where there are
> > dozens or even hundreds of subdomains, which would have to be configured
> > individually.
> >
> > This commit introduces the ability to use globbing in the host-part of
> > the URLs. A user can simply specify a `*` as part of the host name to
> > match all subdomains at this level. For example adding a configuration
> > key `http.https://*.example.com.proxy` will match all subdomains of
> > `https://example.com`.
> 
> This is probably a useful improvement.
> 
> Having said that, when I mentioned "glob", I meant to also support
> something like this:
> 
> 	https://www[1-4].ibm.com/

The problem with additional extended syntax like proposed by you
is that we would indeed need an escaping mechanism here. '[]' are
already allowed inside the host part to enable IPv6 hosts of the
form 'https://[2001:0db8:]/', so the syntax is now ambiguous. So
we have to be cautios which characters to enable for globbing
syntax. As of now, I think we can only safely include '*' and '?'
here without escaping mechanisms.

If additional use cases come up we might still extend the syntax
later on to allow for more special syntax.

> And when people read "glob", that is what they expect.
> 
> So calling this "the ability to use globbing" is misleading.
> The last paragraph in the log message above needs a bit of
> tweaking, perhaps like this:
> 
> 	Allow users to write an asterisk '*' in place of any 'host'
> 	or 'subdomain' label as part of the host name.  For example,
> 	"http.https://*.example.com.proxy" sets "http.proxy" for all
> 	direct subdomains of "https://example.com",
> 	e.g. "https://foo.example.com", but not
> 	"https://foo.bar.example.com".
> 
> Fortunately, your update to config.txt, which is facing the end
> users, does not misuse the word and instead is explicit that the
> only thing the matcher does is to match '*' to a single hierarchy.
> It is clear that even http://www*.ibm.com/ is not supported from
> the description, which is good.

I agree that globbing is the wrong word here. I'll swap in
"wildcard" where applicable.

I'll send a version 4 later on. Thanks again for your feedback
and improvements.

Regards
Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v4 0/5] urlmatch: allow wildcard-based matches
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
                   ` (12 preceding siblings ...)
  2017-01-25  9:56 ` [PATCH v3 4/4] urlmatch: allow globbing for the URL host part Patrick Steinhardt
@ 2017-01-27 10:32 ` Patrick Steinhardt
  2017-01-30 22:00   ` Junio C Hamano
  2017-01-27 10:32 ` [PATCH v4 1/5] mailmap: add Patrick Steinhardt's work address Patrick Steinhardt
                   ` (10 subsequent siblings)
  24 siblings, 1 reply; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-27 10:32 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt, Philip Oakley

Hi,

so this is part four of my patch series. The previous version can
be found at [1]. The use case is to be able to configure an HTTP
proxy for all subdomains of a domain where there are hundreds of
subdomains.

Changes to the previous version:

 - applied Junio's proposed patch to replace `strtok_r` with a
   `memchr`-based loop
 - applied Junio's proposed rewrite of the commit message of
   patch 5
 - I realized that with my patches, "ranking" of URLs was broken.
   Previously, we've always taken the longest matching URL. As
   previously, only the user and path could actually differ, only
   these two components were used for the comparison. I've
   changed this now to also include the host part so that URLs
   with a longer host will take precedence. This resulted in a
   the patch 4.
 - New tests are included which examine if the precedence-rules
   are actually honored correctly. The tests are part of patches
   4 and 5.

You can find the interdiff below.

Regards
Patrick

[1]: http://public-inbox.org/git/20170125095648.4116-1-patrick.steinhardt@elego.de/T/#t

Patrick Steinhardt (5):
  mailmap: add Patrick Steinhardt's work address
  urlmatch: enable normalization of URLs with globs
  urlmatch: split host and port fields in `struct url_info`
  urlmatch: include host and port in urlmatch length
  urlmatch: allow globbing for the URL host part

 .mailmap                 |   1 +
 Documentation/config.txt |   5 +-
 t/t1300-repo-config.sh   | 105 ++++++++++++++++++++++++++++++++++++
 urlmatch.c               | 138 +++++++++++++++++++++++++++++++++++------------
 urlmatch.h               |  12 +++--
 5 files changed, 220 insertions(+), 41 deletions(-)

-- 
2.11.0

diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
index ec545e092..052f12021 100755
--- a/t/t1300-repo-config.sh
+++ b/t/t1300-repo-config.sh
@@ -1177,7 +1177,72 @@ test_expect_success 'urlmatch' '
 	test_cmp expect actual
 '
 
-test_expect_success 'glob-based urlmatch' '
+test_expect_success 'urlmatch favors more specific URLs' '
+	cat >.git/config <<-\EOF &&
+	[http "https://example.com/"]
+		cookieFile = /tmp/root.txt
+	[http "https://example.com/subdirectory"]
+		cookieFile = /tmp/subdirectory.txt
+	[http "https://user@example.com/"]
+		cookieFile = /tmp/user.txt
+	[http "https://averylonguser@example.com/"]
+		cookieFile = /tmp/averylonguser.txt
+	[http "https://preceding.example.com"]
+		cookieFile = /tmp/preceding.txt
+	[http "https://*.example.com"]
+		cookieFile = /tmp/wildcard.txt
+	[http "https://*.example.com/wildcardwithsubdomain"]
+		cookieFile = /tmp/wildcardwithsubdomain.txt
+	[http "https://trailing.example.com"]
+		cookieFile = /tmp/trailing.txt
+	[http "https://user@*.example.com/"]
+		cookieFile = /tmp/wildcardwithuser.txt
+	[http "https://sub.example.com/"]
+		cookieFile = /tmp/sub.txt
+	EOF
+
+	echo http.cookiefile /tmp/root.txt >expect &&
+	git config --get-urlmatch HTTP https://example.com >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/subdirectory.txt >expect &&
+	git config --get-urlmatch HTTP https://example.com/subdirectory >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/subdirectory.txt >expect &&
+	git config --get-urlmatch HTTP https://example.com/subdirectory/nested >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/user.txt >expect &&
+	git config --get-urlmatch HTTP https://user@example.com/ >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/subdirectory.txt >expect &&
+	git config --get-urlmatch HTTP https://averylonguser@example.com/subdirectory >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/preceding.txt >expect &&
+	git config --get-urlmatch HTTP https://preceding.example.com >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/wildcard.txt >expect &&
+	git config --get-urlmatch HTTP https://wildcard.example.com >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/sub.txt >expect &&
+	git config --get-urlmatch HTTP https://sub.example.com/wildcardwithsubdomain >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/trailing.txt >expect &&
+	git config --get-urlmatch HTTP https://trailing.example.com >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/sub.txt >expect &&
+	git config --get-urlmatch HTTP https://user@sub.example.com >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'urlmatch with wildcard' '
 	cat >.git/config <<-\EOF &&
 	[http]
 		sslVerify
@@ -1210,6 +1275,10 @@ test_expect_success 'glob-based urlmatch' '
 		echo http.sslverify false
 	} >expect &&
 	git config --get-urlmatch HTTP https://good.example.com >actual &&
+	test_cmp expect actual &&
+
+	echo http.sslverify >expect &&
+	git config --get-urlmatch HTTP https://more.example.com.au >actual &&
 	test_cmp expect actual
 '
 
diff --git a/urlmatch.c b/urlmatch.c
index 53ff972a6..bb5267732 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -63,36 +63,47 @@ static int append_normalized_escapes(struct strbuf *buf,
 	return 1;
 }
 
+static const char *end_of_token(const char *s, int c, size_t n)
+{
+	const char *next = memchr(s, c, n);
+	if (!next)
+		next = s + n;
+	return next;
+}
+
 static int match_host(const struct url_info *url_info,
 		      const struct url_info *pattern_info)
 {
-	char *url = xmemdupz(url_info->url + url_info->host_off, url_info->host_len);
-	char *pat = xmemdupz(pattern_info->url + pattern_info->host_off, pattern_info->host_len);
-	char *url_tok, *pat_tok, *url_save, *pat_save;
-	int matching;
+	const char *url = url_info->url + url_info->host_off;
+	const char *pat = pattern_info->url + pattern_info->host_off;
+	int url_len = url_info->host_len;
+	int pat_len = pattern_info->host_len;
 
-	url_tok = strtok_r(url, ".", &url_save);
-	pat_tok = strtok_r(pat, ".", &pat_save);
+	while (url_len && pat_len) {
+		const char *url_next = end_of_token(url, '.', url_len);
+		const char *pat_next = end_of_token(pat, '.', pat_len);
 
-	for (; url_tok && pat_tok; url_tok = strtok_r(NULL, ".", &url_save),
-				   pat_tok = strtok_r(NULL, ".", &pat_save)) {
-		if (!strcmp(pat_tok, "*"))
-			continue; /* a simple glob matches everything */
+		if (pat_next == pat + 1 && pat[0] == '*')
+			/* wildcard matches anything */
+			;
+		else if ((pat_next - pat) == (url_next - url) &&
+			 !memcmp(url, pat, url_next - url))
+			/* the components are the same */
+			;
+		else
+			return 0; /* found an unmatch */
 
-		if (strcmp(url_tok, pat_tok)) {
-			/* subdomains do not match */
-			matching = 0;
-			break;
-		}
+		if (url_next < url + url_len)
+			url_next++;
+		url_len -= url_next - url;
+		url = url_next;
+		if (pat_next < pat + pat_len)
+			pat_next++;
+		pat_len -= pat_next - pat;
+		pat = pat_next;
 	}
 
-	/* matching if both URL and pattern are at their ends */
-	matching = (url_tok == NULL && pat_tok == NULL);
-
-	free(url);
-	free(pat);
-
-	return matching;
+	return (!url_len && !pat_len);
 }
 
 static char *url_normalize_1(const char *url, struct url_info *out_info, char allow_globs)
@@ -458,7 +469,7 @@ static size_t url_match_prefix(const char *url,
 
 static int match_urls(const struct url_info *url,
 		      const struct url_info *url_prefix,
-		      int *exactusermatch)
+		      struct urlmatch_item *match)
 {
 	/*
 	 * url_prefix matches url if the scheme, host and port of url_prefix
@@ -477,8 +488,8 @@ static int match_urls(const struct url_info *url,
 	 * contained a user name or false if url_prefix did not have a
 	 * user name.  If there is no match *exactusermatch is left untouched.
 	 */
-	int usermatched = 0;
-	int pathmatchlen;
+	char usermatched = 0;
+	size_t pathmatchlen;
 
 	if (!url || !url_prefix || !url->url || !url_prefix->url)
 		return 0;
@@ -513,22 +524,38 @@ static int match_urls(const struct url_info *url,
 		url->url + url->path_off,
 		url_prefix->url + url_prefix->path_off,
 		url_prefix->url_len - url_prefix->path_off);
+	if (!pathmatchlen)
+		return 0; /* paths do not match */
 
-	if (pathmatchlen && exactusermatch)
-		*exactusermatch = usermatched;
-	return pathmatchlen;
+	if (match) {
+		match->hostmatch_len = url_prefix->host_len;
+		match->pathmatch_len = pathmatchlen;
+		match->user_matched = usermatched;
+	}
+
+	return 1;
+}
+
+static int cmp_matches(const struct urlmatch_item *a,
+		       const struct urlmatch_item *b)
+{
+	if (a->hostmatch_len != b->hostmatch_len)
+		return a->hostmatch_len < b->hostmatch_len ? -1 : 1;
+	if (a->pathmatch_len != b->pathmatch_len)
+		return a->pathmatch_len < b->pathmatch_len ? -1 : 1;
+	if (a->user_matched != b->user_matched)
+		return b->user_matched ? -1 : 1;
+	return 0;
 }
 
 int urlmatch_config_entry(const char *var, const char *value, void *cb)
 {
 	struct string_list_item *item;
 	struct urlmatch_config *collect = cb;
-	struct urlmatch_item *matched;
+	struct urlmatch_item matched;
 	struct url_info *url = &collect->url;
 	const char *key, *dot;
 	struct strbuf synthkey = STRBUF_INIT;
-	size_t matched_len = 0;
-	int user_matched = 0;
 	int retval;
 
 	if (!skip_prefix(var, collect->section, &key) || *(key++) != '.') {
@@ -546,9 +573,9 @@ int urlmatch_config_entry(const char *var, const char *value, void *cb)
 		free(config_url);
 		if (!norm_url)
 			return 0;
-		matched_len = match_urls(url, &norm_info, &user_matched);
+		retval = match_urls(url, &norm_info, &matched);
 		free(norm_url);
-		if (!matched_len)
+		if (!retval)
 			return 0;
 		key = dot + 1;
 	}
@@ -558,24 +585,17 @@ int urlmatch_config_entry(const char *var, const char *value, void *cb)
 
 	item = string_list_insert(&collect->vars, key);
 	if (!item->util) {
-		matched = xcalloc(1, sizeof(*matched));
-		item->util = matched;
+		item->util = xcalloc(1, sizeof(matched));
 	} else {
-		matched = item->util;
-		/*
-		 * Is our match shorter?  Is our match the same
-		 * length, and without user while the current
-		 * candidate is with user?  Then we cannot use it.
-		 */
-		if (matched_len < matched->matched_len ||
-		    ((matched_len == matched->matched_len) &&
-		     (!user_matched && matched->user_matched)))
+		if (cmp_matches(&matched, item->util) <= 0)
+			 /*
+			  * Our match is worse than the old one,
+			  * we cannot use it.
+			  */
 			return 0;
-		/* Otherwise, replace it with this one. */
 	}
 
-	matched->matched_len = matched_len;
-	matched->user_matched = user_matched;
+	memcpy(item->util, &matched, sizeof(matched));
 	strbuf_addstr(&synthkey, collect->section);
 	strbuf_addch(&synthkey, '.');
 	strbuf_addstr(&synthkey, key);
diff --git a/urlmatch.h b/urlmatch.h
index 0ea812b03..37ee5da85 100644
--- a/urlmatch.h
+++ b/urlmatch.h
@@ -34,7 +34,8 @@ struct url_info {
 extern char *url_normalize(const char *, struct url_info *);
 
 struct urlmatch_item {
-	size_t matched_len;
+	size_t hostmatch_len;
+	size_t pathmatch_len;
 	char user_matched;
 };
 

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v4 1/5] mailmap: add Patrick Steinhardt's work address
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
                   ` (13 preceding siblings ...)
  2017-01-27 10:32 ` [PATCH v4 0/5] urlmatch: allow wildcard-based matches Patrick Steinhardt
@ 2017-01-27 10:32 ` Patrick Steinhardt
  2017-01-27 10:32 ` [PATCH v4 2/5] urlmatch: enable normalization of URLs with globs Patrick Steinhardt
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-27 10:32 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt, Philip Oakley

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
---
 .mailmap | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.mailmap b/.mailmap
index 9c87a3840..ea59205b9 100644
--- a/.mailmap
+++ b/.mailmap
@@ -177,6 +177,7 @@ Paolo Bonzini <bonzini@gnu.org> <paolo.bonzini@lu.unisi.ch>
 Pascal Obry <pascal@obry.net> <pascal.obry@gmail.com>
 Pascal Obry <pascal@obry.net> <pascal.obry@wanadoo.fr>
 Pat Notz <patnotz@gmail.com> <pknotz@sandia.gov>
+Patrick Steinhardt <ps@pks.im> <patrick.steinhardt@elego.de>
 Paul Mackerras <paulus@samba.org> <paulus@dorrigo.(none)>
 Paul Mackerras <paulus@samba.org> <paulus@pogo.(none)>
 Peter Baumann <waste.manager@gmx.de> <Peter.B.Baumann@stud.informatik.uni-erlangen.de>
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v4 2/5] urlmatch: enable normalization of URLs with globs
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
                   ` (14 preceding siblings ...)
  2017-01-27 10:32 ` [PATCH v4 1/5] mailmap: add Patrick Steinhardt's work address Patrick Steinhardt
@ 2017-01-27 10:32 ` Patrick Steinhardt
  2017-01-27 10:32 ` [PATCH v4 3/5] urlmatch: split host and port fields in `struct url_info` Patrick Steinhardt
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-27 10:32 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt, Philip Oakley

The `url_normalize` function is used to validate and normalize URLs. As
such, it does not allow for some special characters to be part of the
URLs that are to be normalized. As we want to allow using globs in some
configuration keys making use of URLs, namely `http.<url>.<key>`, but
still normalize them, we need to somehow enable some additional allowed
characters.

To do this without having to change all callers of `url_normalize`,
where most do not actually want globbing at all, we split off another
function `url_normalize_1`. This function accepts an additional
parameter `allow_globs`, which is subsequently called by `url_normalize`
with `allow_globs=0`.

As of now, this function is not used with globbing enabled. A caller
will be added in the following commit.

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
---
 urlmatch.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/urlmatch.c b/urlmatch.c
index 132d342bc..d350478c0 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -63,7 +63,7 @@ static int append_normalized_escapes(struct strbuf *buf,
 	return 1;
 }
 
-char *url_normalize(const char *url, struct url_info *out_info)
+static char *url_normalize_1(const char *url, struct url_info *out_info, char allow_globs)
 {
 	/*
 	 * Normalize NUL-terminated url using the following rules:
@@ -191,7 +191,12 @@ char *url_normalize(const char *url, struct url_info *out_info)
 		strbuf_release(&norm);
 		return NULL;
 	}
-	spanned = strspn(url, URL_HOST_CHARS);
+
+	if (allow_globs)
+		spanned = strspn(url, URL_HOST_CHARS "*");
+	else
+		spanned = strspn(url, URL_HOST_CHARS);
+
 	if (spanned < colon_ptr - url) {
 		/* Host name has invalid characters */
 		if (out_info) {
@@ -380,6 +385,11 @@ char *url_normalize(const char *url, struct url_info *out_info)
 	return result;
 }
 
+char *url_normalize(const char *url, struct url_info *out_info)
+{
+	return url_normalize_1(url, out_info, 0);
+}
+
 static size_t url_match_prefix(const char *url,
 			       const char *url_prefix,
 			       size_t url_prefix_len)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v4 3/5] urlmatch: split host and port fields in `struct url_info`
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
                   ` (15 preceding siblings ...)
  2017-01-27 10:32 ` [PATCH v4 2/5] urlmatch: enable normalization of URLs with globs Patrick Steinhardt
@ 2017-01-27 10:32 ` Patrick Steinhardt
  2017-01-27 10:32 ` [PATCH v4 4/5] urlmatch: include host and port in urlmatch length Patrick Steinhardt
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-27 10:32 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt, Philip Oakley

The `url_info` structure contains information about a normalized URL
with the URL's components being represented by different fields. The
host and port part though are to be accessed by the same `host` field,
so that getting the host and/or port separately becomes more involved
than really necessary.

To make the port more readily accessible, split up the host and port
fields. Namely, the `host_len` will not include the port length anymore
and a new `port_off` field has been added which includes the offset to
the port, if available.

The only user of these fields is `url_normalize_1`. This change makes it
easier later on to treat host and port differently when introducing
globs for domains.

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
---
 urlmatch.c | 16 ++++++++++++----
 urlmatch.h |  9 +++++----
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/urlmatch.c b/urlmatch.c
index d350478c0..e328905eb 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -104,7 +104,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 	struct strbuf norm;
 	size_t spanned;
 	size_t scheme_len, user_off=0, user_len=0, passwd_off=0, passwd_len=0;
-	size_t host_off=0, host_len=0, port_len=0, path_off, path_len, result_len;
+	size_t host_off=0, host_len=0, port_off=0, port_len=0, path_off, path_len, result_len;
 	const char *slash_ptr, *at_ptr, *colon_ptr, *path_start;
 	char *result;
 
@@ -263,6 +263,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 				return NULL;
 			}
 			strbuf_addch(&norm, ':');
+			port_off = norm.len;
 			strbuf_add(&norm, url, slash_ptr - url);
 			port_len = slash_ptr - url;
 		}
@@ -270,7 +271,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 		url = slash_ptr;
 	}
 	if (host_off)
-		host_len = norm.len - host_off;
+		host_len = norm.len - host_off - (port_len ? port_len + 1 : 0);
 
 
 	/*
@@ -378,6 +379,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 		out_info->passwd_len = passwd_len;
 		out_info->host_off = host_off;
 		out_info->host_len = host_len;
+		out_info->port_off = port_off;
 		out_info->port_len = port_len;
 		out_info->path_off = path_off;
 		out_info->path_len = path_len;
@@ -464,11 +466,17 @@ static int match_urls(const struct url_info *url,
 		usermatched = 1;
 	}
 
-	/* check the host and port */
+	/* check the host */
 	if (url_prefix->host_len != url->host_len ||
 	    strncmp(url->url + url->host_off,
 		    url_prefix->url + url_prefix->host_off, url->host_len))
-		return 0; /* host names and/or ports do not match */
+		return 0; /* host names do not match */
+
+	/* check the port */
+	if (url_prefix->port_len != url->port_len ||
+	    strncmp(url->url + url->port_off,
+		    url_prefix->url + url_prefix->port_off, url->port_len))
+		return 0; /* ports do not match */
 
 	/* check the path */
 	pathmatchlen = url_match_prefix(
diff --git a/urlmatch.h b/urlmatch.h
index 528862adc..0ea812b03 100644
--- a/urlmatch.h
+++ b/urlmatch.h
@@ -18,11 +18,12 @@ struct url_info {
 	size_t passwd_len;	/* length of passwd; if passwd_off != 0 but
 				   passwd_len == 0, an empty passwd was given */
 	size_t host_off;	/* offset into url to start of host name (0 => none) */
-	size_t host_len;	/* length of host name; this INCLUDES any ':portnum';
+	size_t host_len;	/* length of host name;
 				 * file urls may have host_len == 0 */
-	size_t port_len;	/* if a portnum is present (port_len != 0), it has
-				 * this length (excluding the leading ':') at the
-				 * end of the host name (always 0 for file urls) */
+	size_t port_off;	/* offset into url to start of port number (0 => none) */
+	size_t port_len;	/* if a portnum is present (port_off != 0), it has
+				 * this length (excluding the leading ':') starting
+				 * from port_off (always 0 for file urls) */
 	size_t path_off;	/* offset into url to the start of the url path;
 				 * this will always point to a '/' character
 				 * after the url has been normalized */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v4 4/5] urlmatch: include host and port in urlmatch length
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
                   ` (16 preceding siblings ...)
  2017-01-27 10:32 ` [PATCH v4 3/5] urlmatch: split host and port fields in `struct url_info` Patrick Steinhardt
@ 2017-01-27 10:32 ` Patrick Steinhardt
  2017-01-27 10:32 ` [PATCH v4 5/5] urlmatch: allow globbing for the URL host part Patrick Steinhardt
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-27 10:32 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt, Philip Oakley

In order to be able to rank positive matches by `urlmatch`, we inspect
the path length and user part to decide whether a match is better than
another match. As all other parts are matched exactly between both URLs,
this is the right thing to do right now.

In the future, though, we want to introduce wild cards for the domain
part. When doing this, though, it does not make sense anymore to only
compare the path lengths. Instead, we also want to compare the domain
lengths to determine which of both URLs matches the host part more
closely.

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
---
 t/t1300-repo-config.sh | 33 ++++++++++++++++++++++++++++
 urlmatch.c             | 59 +++++++++++++++++++++++++++++---------------------
 urlmatch.h             |  3 ++-
 3 files changed, 69 insertions(+), 26 deletions(-)

diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
index 923bfc5a2..6c844d519 100755
--- a/t/t1300-repo-config.sh
+++ b/t/t1300-repo-config.sh
@@ -1177,6 +1177,39 @@ test_expect_success 'urlmatch' '
 	test_cmp expect actual
 '
 
+test_expect_success 'urlmatch favors more specific URLs' '
+	cat >.git/config <<-\EOF &&
+	[http "https://example.com/"]
+		cookieFile = /tmp/root.txt
+	[http "https://example.com/subdirectory"]
+		cookieFile = /tmp/subdirectory.txt
+	[http "https://user@example.com/"]
+		cookieFile = /tmp/user.txt
+	[http "https://averylonguser@example.com/"]
+		cookieFile = /tmp/averylonguser.txt
+	EOF
+
+	echo http.cookiefile /tmp/root.txt >expect &&
+	git config --get-urlmatch HTTP https://example.com >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/subdirectory.txt >expect &&
+	git config --get-urlmatch HTTP https://example.com/subdirectory >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/subdirectory.txt >expect &&
+	git config --get-urlmatch HTTP https://example.com/subdirectory/nested >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/user.txt >expect &&
+	git config --get-urlmatch HTTP https://user@example.com/ >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/subdirectory.txt >expect &&
+	git config --get-urlmatch HTTP https://averylonguser@example.com/subdirectory >actual &&
+	test_cmp expect actual
+'
+
 # good section hygiene
 test_expect_failure 'unsetting the last key in a section removes header' '
 	cat >.git/config <<-\EOF &&
diff --git a/urlmatch.c b/urlmatch.c
index e328905eb..f35d00a6e 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -426,7 +426,7 @@ static size_t url_match_prefix(const char *url,
 
 static int match_urls(const struct url_info *url,
 		      const struct url_info *url_prefix,
-		      int *exactusermatch)
+		      struct urlmatch_item *match)
 {
 	/*
 	 * url_prefix matches url if the scheme, host and port of url_prefix
@@ -445,8 +445,8 @@ static int match_urls(const struct url_info *url,
 	 * contained a user name or false if url_prefix did not have a
 	 * user name.  If there is no match *exactusermatch is left untouched.
 	 */
-	int usermatched = 0;
-	int pathmatchlen;
+	char usermatched = 0;
+	size_t pathmatchlen;
 
 	if (!url || !url_prefix || !url->url || !url_prefix->url)
 		return 0;
@@ -483,22 +483,38 @@ static int match_urls(const struct url_info *url,
 		url->url + url->path_off,
 		url_prefix->url + url_prefix->path_off,
 		url_prefix->url_len - url_prefix->path_off);
+	if (!pathmatchlen)
+		return 0; /* paths do not match */
 
-	if (pathmatchlen && exactusermatch)
-		*exactusermatch = usermatched;
-	return pathmatchlen;
+	if (match) {
+		match->hostmatch_len = url_prefix->host_len;
+		match->pathmatch_len = pathmatchlen;
+		match->user_matched = usermatched;
+	}
+
+	return 1;
+}
+
+static int cmp_matches(const struct urlmatch_item *a,
+		       const struct urlmatch_item *b)
+{
+	if (a->hostmatch_len != b->hostmatch_len)
+		return a->hostmatch_len < b->hostmatch_len ? -1 : 1;
+	if (a->pathmatch_len != b->pathmatch_len)
+		return a->pathmatch_len < b->pathmatch_len ? -1 : 1;
+	if (a->user_matched != b->user_matched)
+		return b->user_matched ? -1 : 1;
+	return 0;
 }
 
 int urlmatch_config_entry(const char *var, const char *value, void *cb)
 {
 	struct string_list_item *item;
 	struct urlmatch_config *collect = cb;
-	struct urlmatch_item *matched;
+	struct urlmatch_item matched;
 	struct url_info *url = &collect->url;
 	const char *key, *dot;
 	struct strbuf synthkey = STRBUF_INIT;
-	size_t matched_len = 0;
-	int user_matched = 0;
 	int retval;
 
 	if (!skip_prefix(var, collect->section, &key) || *(key++) != '.') {
@@ -516,9 +532,9 @@ int urlmatch_config_entry(const char *var, const char *value, void *cb)
 		free(config_url);
 		if (!norm_url)
 			return 0;
-		matched_len = match_urls(url, &norm_info, &user_matched);
+		retval = match_urls(url, &norm_info, &matched);
 		free(norm_url);
-		if (!matched_len)
+		if (!retval)
 			return 0;
 		key = dot + 1;
 	}
@@ -528,24 +544,17 @@ int urlmatch_config_entry(const char *var, const char *value, void *cb)
 
 	item = string_list_insert(&collect->vars, key);
 	if (!item->util) {
-		matched = xcalloc(1, sizeof(*matched));
-		item->util = matched;
+		item->util = xcalloc(1, sizeof(matched));
 	} else {
-		matched = item->util;
-		/*
-		 * Is our match shorter?  Is our match the same
-		 * length, and without user while the current
-		 * candidate is with user?  Then we cannot use it.
-		 */
-		if (matched_len < matched->matched_len ||
-		    ((matched_len == matched->matched_len) &&
-		     (!user_matched && matched->user_matched)))
+		if (cmp_matches(&matched, item->util) <= 0)
+			 /*
+			  * Our match is worse than the old one,
+			  * we cannot use it.
+			  */
 			return 0;
-		/* Otherwise, replace it with this one. */
 	}
 
-	matched->matched_len = matched_len;
-	matched->user_matched = user_matched;
+	memcpy(item->util, &matched, sizeof(matched));
 	strbuf_addstr(&synthkey, collect->section);
 	strbuf_addch(&synthkey, '.');
 	strbuf_addstr(&synthkey, key);
diff --git a/urlmatch.h b/urlmatch.h
index 0ea812b03..37ee5da85 100644
--- a/urlmatch.h
+++ b/urlmatch.h
@@ -34,7 +34,8 @@ struct url_info {
 extern char *url_normalize(const char *, struct url_info *);
 
 struct urlmatch_item {
-	size_t matched_len;
+	size_t hostmatch_len;
+	size_t pathmatch_len;
 	char user_matched;
 };
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v4 5/5] urlmatch: allow globbing for the URL host part
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
                   ` (17 preceding siblings ...)
  2017-01-27 10:32 ` [PATCH v4 4/5] urlmatch: include host and port in urlmatch length Patrick Steinhardt
@ 2017-01-27 10:32 ` Patrick Steinhardt
  2017-01-31  9:01 ` [PATCH v5 0/5] urlmatch: allow wildcard-based matches Patrick Steinhardt
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-27 10:32 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt, Philip Oakley

The URL matching function computes for two URLs whether they match not.
The match is performed by splitting up the URL into different parts and
then doing an exact comparison with the to-be-matched URL.

The main user of `urlmatch` is the configuration subsystem. It allows to
set certain configurations based on the URL which is being connected to
via keys like `http.<url>.*`. A common use case for this is to set
proxies for only some remotes which match the given URL. Unfortunately,
having exact matches for all parts of the URL can become quite tedious
in some setups. Imagine for example a corporate network where there are
dozens or even hundreds of subdomains, which would have to be configured
individually.

Allow users to write an asterisk '*' in place of any 'host' or
'subdomain' label as part of the host name.  For example,
"http.https://*.example.com.proxy" sets "http.proxy" for all direct
subdomains of "https://example.com", e.g. "https://foo.example.com", but
not "https://foo.bar.example.com".

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
Helped-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config.txt |  5 +++-
 t/t1300-repo-config.sh   | 72 ++++++++++++++++++++++++++++++++++++++++++++++++
 urlmatch.c               | 49 +++++++++++++++++++++++++++++---
 3 files changed, 121 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 506431267..078e9b490 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1914,7 +1914,10 @@ http.<url>.*::
   must match exactly between the config key and the URL.
 
 . Host/domain name (e.g., `example.com` in `https://example.com/`).
-  This field must match exactly between the config key and the URL.
+  This field must match between the config key and the URL. It is
+  possible to specify a `*` as part of the host name to match all subdomains
+  at this level. `https://*.example.com/` for example would match
+  `https://foo.example.com/`, but not `https://foo.bar.example.com/`.
 
 . Port number (e.g., `8080` in `http://example.com:8080/`).
   This field must match exactly between the config key and the URL.
diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
index 6c844d519..052f12021 100755
--- a/t/t1300-repo-config.sh
+++ b/t/t1300-repo-config.sh
@@ -1187,6 +1187,18 @@ test_expect_success 'urlmatch favors more specific URLs' '
 		cookieFile = /tmp/user.txt
 	[http "https://averylonguser@example.com/"]
 		cookieFile = /tmp/averylonguser.txt
+	[http "https://preceding.example.com"]
+		cookieFile = /tmp/preceding.txt
+	[http "https://*.example.com"]
+		cookieFile = /tmp/wildcard.txt
+	[http "https://*.example.com/wildcardwithsubdomain"]
+		cookieFile = /tmp/wildcardwithsubdomain.txt
+	[http "https://trailing.example.com"]
+		cookieFile = /tmp/trailing.txt
+	[http "https://user@*.example.com/"]
+		cookieFile = /tmp/wildcardwithuser.txt
+	[http "https://sub.example.com/"]
+		cookieFile = /tmp/sub.txt
 	EOF
 
 	echo http.cookiefile /tmp/root.txt >expect &&
@@ -1207,6 +1219,66 @@ test_expect_success 'urlmatch favors more specific URLs' '
 
 	echo http.cookiefile /tmp/subdirectory.txt >expect &&
 	git config --get-urlmatch HTTP https://averylonguser@example.com/subdirectory >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/preceding.txt >expect &&
+	git config --get-urlmatch HTTP https://preceding.example.com >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/wildcard.txt >expect &&
+	git config --get-urlmatch HTTP https://wildcard.example.com >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/sub.txt >expect &&
+	git config --get-urlmatch HTTP https://sub.example.com/wildcardwithsubdomain >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/trailing.txt >expect &&
+	git config --get-urlmatch HTTP https://trailing.example.com >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/sub.txt >expect &&
+	git config --get-urlmatch HTTP https://user@sub.example.com >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'urlmatch with wildcard' '
+	cat >.git/config <<-\EOF &&
+	[http]
+		sslVerify
+	[http "https://*.example.com"]
+		sslVerify = false
+		cookieFile = /tmp/cookie.txt
+	EOF
+
+	test_expect_code 1 git config --bool --get-urlmatch doesnt.exist https://good.example.com >actual &&
+	test_must_be_empty actual &&
+
+	echo true >expect &&
+	git config --bool --get-urlmatch http.SSLverify https://example.com >actual &&
+	test_cmp expect actual &&
+
+	echo true >expect &&
+	git config --bool --get-urlmatch http.SSLverify https://good-example.com >actual &&
+	test_cmp expect actual &&
+
+	echo true >expect &&
+	git config --bool --get-urlmatch http.sslverify https://deep.nested.example.com >actual &&
+	test_cmp expect actual &&
+
+	echo false >expect &&
+	git config --bool --get-urlmatch http.sslverify https://good.example.com >actual &&
+	test_cmp expect actual &&
+
+	{
+		echo http.cookiefile /tmp/cookie.txt &&
+		echo http.sslverify false
+	} >expect &&
+	git config --get-urlmatch HTTP https://good.example.com >actual &&
+	test_cmp expect actual &&
+
+	echo http.sslverify >expect &&
+	git config --get-urlmatch HTTP https://more.example.com.au >actual &&
 	test_cmp expect actual
 '
 
diff --git a/urlmatch.c b/urlmatch.c
index f35d00a6e..bb5267732 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -63,6 +63,49 @@ static int append_normalized_escapes(struct strbuf *buf,
 	return 1;
 }
 
+static const char *end_of_token(const char *s, int c, size_t n)
+{
+	const char *next = memchr(s, c, n);
+	if (!next)
+		next = s + n;
+	return next;
+}
+
+static int match_host(const struct url_info *url_info,
+		      const struct url_info *pattern_info)
+{
+	const char *url = url_info->url + url_info->host_off;
+	const char *pat = pattern_info->url + pattern_info->host_off;
+	int url_len = url_info->host_len;
+	int pat_len = pattern_info->host_len;
+
+	while (url_len && pat_len) {
+		const char *url_next = end_of_token(url, '.', url_len);
+		const char *pat_next = end_of_token(pat, '.', pat_len);
+
+		if (pat_next == pat + 1 && pat[0] == '*')
+			/* wildcard matches anything */
+			;
+		else if ((pat_next - pat) == (url_next - url) &&
+			 !memcmp(url, pat, url_next - url))
+			/* the components are the same */
+			;
+		else
+			return 0; /* found an unmatch */
+
+		if (url_next < url + url_len)
+			url_next++;
+		url_len -= url_next - url;
+		url = url_next;
+		if (pat_next < pat + pat_len)
+			pat_next++;
+		pat_len -= pat_next - pat;
+		pat = pat_next;
+	}
+
+	return (!url_len && !pat_len);
+}
+
 static char *url_normalize_1(const char *url, struct url_info *out_info, char allow_globs)
 {
 	/*
@@ -467,9 +510,7 @@ static int match_urls(const struct url_info *url,
 	}
 
 	/* check the host */
-	if (url_prefix->host_len != url->host_len ||
-	    strncmp(url->url + url->host_off,
-		    url_prefix->url + url_prefix->host_off, url->host_len))
+	if (!match_host(url, url_prefix))
 		return 0; /* host names do not match */
 
 	/* check the port */
@@ -528,7 +569,7 @@ int urlmatch_config_entry(const char *var, const char *value, void *cb)
 		struct url_info norm_info;
 
 		config_url = xmemdupz(key, dot - key);
-		norm_url = url_normalize(config_url, &norm_info);
+		norm_url = url_normalize_1(config_url, &norm_info, 1);
 		free(config_url);
 		if (!norm_url)
 			return 0;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 4/4] urlmatch: allow globbing for the URL host part
  2017-01-27  6:21   ` Patrick Steinhardt
@ 2017-01-27 17:45     ` Junio C Hamano
  0 siblings, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2017-01-27 17:45 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Patrick Steinhardt, Philip Oakley

Patrick Steinhardt <patrick.steinhardt@elego.de> writes:

>> This is probably a useful improvement.
>> 
>> Having said that, when I mentioned "glob", I meant to also support
>> something like this:
>> 
>> 	https://www[1-4].ibm.com/
>
> The problem with additional extended syntax like proposed by you
> is that we would indeed need an escaping mechanism here.

True.  I think a true shell globbing is overkill (so is regexp) and
just a simple wildcarding with '*' would be a good first step that
is easy to explain and later extend as needed.

Thanks.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 0/5] urlmatch: allow wildcard-based matches
  2017-01-27 10:32 ` [PATCH v4 0/5] urlmatch: allow wildcard-based matches Patrick Steinhardt
@ 2017-01-30 22:00   ` Junio C Hamano
  2017-01-30 22:52     ` Junio C Hamano
  0 siblings, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2017-01-30 22:00 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Patrick Steinhardt, Philip Oakley

Patrick Steinhardt <patrick.steinhardt@elego.de> writes:

>  - I realized that with my patches, "ranking" of URLs was broken.
>    Previously, we've always taken the longest matching URL. As
>    previously, only the user and path could actually differ, only
>    these two components were used for the comparison. I've
>    changed this now to also include the host part so that URLs
>    with a longer host will take precedence. This resulted in a
>    the patch 4.

Good thinking.  I was wondering about this, too.

Thanks.  Will read it through and replace.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 0/5] urlmatch: allow wildcard-based matches
  2017-01-30 22:00   ` Junio C Hamano
@ 2017-01-30 22:52     ` Junio C Hamano
  2017-01-31  8:26       ` Patrick Steinhardt
  0 siblings, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2017-01-30 22:52 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Patrick Steinhardt, Philip Oakley

Junio C Hamano <gitster@pobox.com> writes:

> Patrick Steinhardt <patrick.steinhardt@elego.de> writes:
>
>>  - I realized that with my patches, "ranking" of URLs was broken.
>>    Previously, we've always taken the longest matching URL. As
>>    previously, only the user and path could actually differ, only
>>    these two components were used for the comparison. I've
>>    changed this now to also include the host part so that URLs
>>    with a longer host will take precedence. This resulted in a
>>    the patch 4.
>
> Good thinking.  I was wondering about this, too.
>
> Thanks.  Will read it through and replace.

Ugh.  When applied on top of 4e59582ff7 ("Seventh batch for 2.12",
2017-01-23), Git fails its self-test in t5551 #31 (I do not run with
EXPENSIVE so some tests liks #30 are skipped, if it matters).


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 0/5] urlmatch: allow wildcard-based matches
  2017-01-30 22:52     ` Junio C Hamano
@ 2017-01-31  8:26       ` Patrick Steinhardt
  0 siblings, 0 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-31  8:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Patrick Steinhardt, Philip Oakley

[-- Attachment #1: Type: text/plain, Size: 1373 bytes --]

On Mon, Jan 30, 2017 at 02:52:00PM -0800, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Patrick Steinhardt <patrick.steinhardt@elego.de> writes:
> >
> >>  - I realized that with my patches, "ranking" of URLs was broken.
> >>    Previously, we've always taken the longest matching URL. As
> >>    previously, only the user and path could actually differ, only
> >>    these two components were used for the comparison. I've
> >>    changed this now to also include the host part so that URLs
> >>    with a longer host will take precedence. This resulted in a
> >>    the patch 4.
> >
> > Good thinking.  I was wondering about this, too.
> >
> > Thanks.  Will read it through and replace.
> 
> Ugh.  When applied on top of 4e59582ff7 ("Seventh batch for 2.12",
> 2017-01-23), Git fails its self-test in t5551 #31 (I do not run with
> EXPENSIVE so some tests liks #30 are skipped, if it matters).
> 

Thanks for letting me know. I didn't have any errors on my other
machine, but was actually able to reproduce test failures on this
one.

Embarassingly, I forgot to zero-initialize the `struct
urlmatch_item`, leading to undefined behavior when searching for
configuration keys without a dot inside. This is the case for
nearly all keys, except for example the ones with a URL inside.

Will re-send a fixed version.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v5 0/5] urlmatch: allow wildcard-based matches
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
                   ` (18 preceding siblings ...)
  2017-01-27 10:32 ` [PATCH v4 5/5] urlmatch: allow globbing for the URL host part Patrick Steinhardt
@ 2017-01-31  9:01 ` Patrick Steinhardt
  2017-01-31  9:01 ` [PATCH v5 1/5] mailmap: add Patrick Steinhardt's work address Patrick Steinhardt
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-31  9:01 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano

Hi,

this is version 5 of my patch series. The previous version can
be found at [1]. The use case is to be able to configure an HTTP
proxy for all subdomains of a domain where there are hundreds of
subdomains.

This includes only a single change, interdiff is included below.
The previous version had an embarassing bug because a variable
was not properly initialized in all cases, leading to undefined
behavior. I also verified that the patches work on top of
4e59582ff (Seventh batch for 2.12, 2017-01-23), where Junio
reported the test failures.

Regards
Patrick

Patrick Steinhardt (5):
  mailmap: add Patrick Steinhardt's work address
  urlmatch: enable normalization of URLs with globs
  urlmatch: split host and port fields in `struct url_info`
  urlmatch: include host in urlmatch ranking
  urlmatch: allow globbing for the URL host part

 .mailmap                 |   1 +
 Documentation/config.txt |   5 +-
 t/t1300-repo-config.sh   | 105 ++++++++++++++++++++++++++++++++++++
 urlmatch.c               | 138 +++++++++++++++++++++++++++++++++++------------
 urlmatch.h               |  12 +++--
 5 files changed, 220 insertions(+), 41 deletions(-)

-- 
2.11.0

diff --git a/urlmatch.c b/urlmatch.c
index bb5267732..6c12f1a48 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -552,7 +552,7 @@ int urlmatch_config_entry(const char *var, const char *value, void *cb)
 {
 	struct string_list_item *item;
 	struct urlmatch_config *collect = cb;
-	struct urlmatch_item matched;
+	struct urlmatch_item matched = {0};
 	struct url_info *url = &collect->url;
 	const char *key, *dot;
 	struct strbuf synthkey = STRBUF_INIT;

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 1/5] mailmap: add Patrick Steinhardt's work address
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
                   ` (19 preceding siblings ...)
  2017-01-31  9:01 ` [PATCH v5 0/5] urlmatch: allow wildcard-based matches Patrick Steinhardt
@ 2017-01-31  9:01 ` Patrick Steinhardt
  2017-01-31  9:01 ` [PATCH v5 2/5] urlmatch: enable normalization of URLs with globs Patrick Steinhardt
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-31  9:01 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt

From: Patrick Steinhardt <patrick.steinhardt@elego.de>

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
---
 .mailmap | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.mailmap b/.mailmap
index 9c87a3840..ea59205b9 100644
--- a/.mailmap
+++ b/.mailmap
@@ -177,6 +177,7 @@ Paolo Bonzini <bonzini@gnu.org> <paolo.bonzini@lu.unisi.ch>
 Pascal Obry <pascal@obry.net> <pascal.obry@gmail.com>
 Pascal Obry <pascal@obry.net> <pascal.obry@wanadoo.fr>
 Pat Notz <patnotz@gmail.com> <pknotz@sandia.gov>
+Patrick Steinhardt <ps@pks.im> <patrick.steinhardt@elego.de>
 Paul Mackerras <paulus@samba.org> <paulus@dorrigo.(none)>
 Paul Mackerras <paulus@samba.org> <paulus@pogo.(none)>
 Peter Baumann <waste.manager@gmx.de> <Peter.B.Baumann@stud.informatik.uni-erlangen.de>
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 2/5] urlmatch: enable normalization of URLs with globs
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
                   ` (20 preceding siblings ...)
  2017-01-31  9:01 ` [PATCH v5 1/5] mailmap: add Patrick Steinhardt's work address Patrick Steinhardt
@ 2017-01-31  9:01 ` Patrick Steinhardt
  2017-01-31  9:01 ` [PATCH v5 3/5] urlmatch: split host and port fields in `struct url_info` Patrick Steinhardt
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-31  9:01 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt

From: Patrick Steinhardt <patrick.steinhardt@elego.de>

The `url_normalize` function is used to validate and normalize URLs. As
such, it does not allow for some special characters to be part of the
URLs that are to be normalized. As we want to allow using globs in some
configuration keys making use of URLs, namely `http.<url>.<key>`, but
still normalize them, we need to somehow enable some additional allowed
characters.

To do this without having to change all callers of `url_normalize`,
where most do not actually want globbing at all, we split off another
function `url_normalize_1`. This function accepts an additional
parameter `allow_globs`, which is subsequently called by `url_normalize`
with `allow_globs=0`.

As of now, this function is not used with globbing enabled. A caller
will be added in the following commit.

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
---
 urlmatch.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/urlmatch.c b/urlmatch.c
index 132d342bc..d350478c0 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -63,7 +63,7 @@ static int append_normalized_escapes(struct strbuf *buf,
 	return 1;
 }
 
-char *url_normalize(const char *url, struct url_info *out_info)
+static char *url_normalize_1(const char *url, struct url_info *out_info, char allow_globs)
 {
 	/*
 	 * Normalize NUL-terminated url using the following rules:
@@ -191,7 +191,12 @@ char *url_normalize(const char *url, struct url_info *out_info)
 		strbuf_release(&norm);
 		return NULL;
 	}
-	spanned = strspn(url, URL_HOST_CHARS);
+
+	if (allow_globs)
+		spanned = strspn(url, URL_HOST_CHARS "*");
+	else
+		spanned = strspn(url, URL_HOST_CHARS);
+
 	if (spanned < colon_ptr - url) {
 		/* Host name has invalid characters */
 		if (out_info) {
@@ -380,6 +385,11 @@ char *url_normalize(const char *url, struct url_info *out_info)
 	return result;
 }
 
+char *url_normalize(const char *url, struct url_info *out_info)
+{
+	return url_normalize_1(url, out_info, 0);
+}
+
 static size_t url_match_prefix(const char *url,
 			       const char *url_prefix,
 			       size_t url_prefix_len)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 3/5] urlmatch: split host and port fields in `struct url_info`
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
                   ` (21 preceding siblings ...)
  2017-01-31  9:01 ` [PATCH v5 2/5] urlmatch: enable normalization of URLs with globs Patrick Steinhardt
@ 2017-01-31  9:01 ` Patrick Steinhardt
  2017-01-31  9:01 ` [PATCH v5 4/5] urlmatch: include host in urlmatch ranking Patrick Steinhardt
  2017-01-31  9:01 ` [PATCH v5 5/5] urlmatch: allow globbing for the URL host part Patrick Steinhardt
  24 siblings, 0 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-31  9:01 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt

From: Patrick Steinhardt <patrick.steinhardt@elego.de>

The `url_info` structure contains information about a normalized URL
with the URL's components being represented by different fields. The
host and port part though are to be accessed by the same `host` field,
so that getting the host and/or port separately becomes more involved
than really necessary.

To make the port more readily accessible, split up the host and port
fields. Namely, the `host_len` will not include the port length anymore
and a new `port_off` field has been added which includes the offset to
the port, if available.

The only user of these fields is `url_normalize_1`. This change makes it
easier later on to treat host and port differently when introducing
globs for domains.

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
---
 urlmatch.c | 16 ++++++++++++----
 urlmatch.h |  9 +++++----
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/urlmatch.c b/urlmatch.c
index d350478c0..e328905eb 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -104,7 +104,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 	struct strbuf norm;
 	size_t spanned;
 	size_t scheme_len, user_off=0, user_len=0, passwd_off=0, passwd_len=0;
-	size_t host_off=0, host_len=0, port_len=0, path_off, path_len, result_len;
+	size_t host_off=0, host_len=0, port_off=0, port_len=0, path_off, path_len, result_len;
 	const char *slash_ptr, *at_ptr, *colon_ptr, *path_start;
 	char *result;
 
@@ -263,6 +263,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 				return NULL;
 			}
 			strbuf_addch(&norm, ':');
+			port_off = norm.len;
 			strbuf_add(&norm, url, slash_ptr - url);
 			port_len = slash_ptr - url;
 		}
@@ -270,7 +271,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 		url = slash_ptr;
 	}
 	if (host_off)
-		host_len = norm.len - host_off;
+		host_len = norm.len - host_off - (port_len ? port_len + 1 : 0);
 
 
 	/*
@@ -378,6 +379,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 		out_info->passwd_len = passwd_len;
 		out_info->host_off = host_off;
 		out_info->host_len = host_len;
+		out_info->port_off = port_off;
 		out_info->port_len = port_len;
 		out_info->path_off = path_off;
 		out_info->path_len = path_len;
@@ -464,11 +466,17 @@ static int match_urls(const struct url_info *url,
 		usermatched = 1;
 	}
 
-	/* check the host and port */
+	/* check the host */
 	if (url_prefix->host_len != url->host_len ||
 	    strncmp(url->url + url->host_off,
 		    url_prefix->url + url_prefix->host_off, url->host_len))
-		return 0; /* host names and/or ports do not match */
+		return 0; /* host names do not match */
+
+	/* check the port */
+	if (url_prefix->port_len != url->port_len ||
+	    strncmp(url->url + url->port_off,
+		    url_prefix->url + url_prefix->port_off, url->port_len))
+		return 0; /* ports do not match */
 
 	/* check the path */
 	pathmatchlen = url_match_prefix(
diff --git a/urlmatch.h b/urlmatch.h
index 528862adc..0ea812b03 100644
--- a/urlmatch.h
+++ b/urlmatch.h
@@ -18,11 +18,12 @@ struct url_info {
 	size_t passwd_len;	/* length of passwd; if passwd_off != 0 but
 				   passwd_len == 0, an empty passwd was given */
 	size_t host_off;	/* offset into url to start of host name (0 => none) */
-	size_t host_len;	/* length of host name; this INCLUDES any ':portnum';
+	size_t host_len;	/* length of host name;
 				 * file urls may have host_len == 0 */
-	size_t port_len;	/* if a portnum is present (port_len != 0), it has
-				 * this length (excluding the leading ':') at the
-				 * end of the host name (always 0 for file urls) */
+	size_t port_off;	/* offset into url to start of port number (0 => none) */
+	size_t port_len;	/* if a portnum is present (port_off != 0), it has
+				 * this length (excluding the leading ':') starting
+				 * from port_off (always 0 for file urls) */
 	size_t path_off;	/* offset into url to the start of the url path;
 				 * this will always point to a '/' character
 				 * after the url has been normalized */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 4/5] urlmatch: include host in urlmatch ranking
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
                   ` (22 preceding siblings ...)
  2017-01-31  9:01 ` [PATCH v5 3/5] urlmatch: split host and port fields in `struct url_info` Patrick Steinhardt
@ 2017-01-31  9:01 ` Patrick Steinhardt
  2017-01-31  9:01 ` [PATCH v5 5/5] urlmatch: allow globbing for the URL host part Patrick Steinhardt
  24 siblings, 0 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-31  9:01 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt

From: Patrick Steinhardt <patrick.steinhardt@elego.de>

In order to be able to rank positive matches by `urlmatch`, we inspect
the path length and user part to decide whether a match is better than
another match. As all other parts are matched exactly between both URLs,
this is the correct thing to do right now.

In the future, though, we want to introduce wild cards for the domain
part. When doing this, it does not make sense anymore to only compare
the path lengths. Instead, we also want to compare the domain lengths to
determine which of both URLs matches the host part more closely.

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
---
 t/t1300-repo-config.sh | 33 ++++++++++++++++++++++++++++
 urlmatch.c             | 59 +++++++++++++++++++++++++++++---------------------
 urlmatch.h             |  3 ++-
 3 files changed, 69 insertions(+), 26 deletions(-)

diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
index 923bfc5a2..6c844d519 100755
--- a/t/t1300-repo-config.sh
+++ b/t/t1300-repo-config.sh
@@ -1177,6 +1177,39 @@ test_expect_success 'urlmatch' '
 	test_cmp expect actual
 '
 
+test_expect_success 'urlmatch favors more specific URLs' '
+	cat >.git/config <<-\EOF &&
+	[http "https://example.com/"]
+		cookieFile = /tmp/root.txt
+	[http "https://example.com/subdirectory"]
+		cookieFile = /tmp/subdirectory.txt
+	[http "https://user@example.com/"]
+		cookieFile = /tmp/user.txt
+	[http "https://averylonguser@example.com/"]
+		cookieFile = /tmp/averylonguser.txt
+	EOF
+
+	echo http.cookiefile /tmp/root.txt >expect &&
+	git config --get-urlmatch HTTP https://example.com >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/subdirectory.txt >expect &&
+	git config --get-urlmatch HTTP https://example.com/subdirectory >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/subdirectory.txt >expect &&
+	git config --get-urlmatch HTTP https://example.com/subdirectory/nested >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/user.txt >expect &&
+	git config --get-urlmatch HTTP https://user@example.com/ >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/subdirectory.txt >expect &&
+	git config --get-urlmatch HTTP https://averylonguser@example.com/subdirectory >actual &&
+	test_cmp expect actual
+'
+
 # good section hygiene
 test_expect_failure 'unsetting the last key in a section removes header' '
 	cat >.git/config <<-\EOF &&
diff --git a/urlmatch.c b/urlmatch.c
index e328905eb..f79887825 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -426,7 +426,7 @@ static size_t url_match_prefix(const char *url,
 
 static int match_urls(const struct url_info *url,
 		      const struct url_info *url_prefix,
-		      int *exactusermatch)
+		      struct urlmatch_item *match)
 {
 	/*
 	 * url_prefix matches url if the scheme, host and port of url_prefix
@@ -445,8 +445,8 @@ static int match_urls(const struct url_info *url,
 	 * contained a user name or false if url_prefix did not have a
 	 * user name.  If there is no match *exactusermatch is left untouched.
 	 */
-	int usermatched = 0;
-	int pathmatchlen;
+	char usermatched = 0;
+	size_t pathmatchlen;
 
 	if (!url || !url_prefix || !url->url || !url_prefix->url)
 		return 0;
@@ -483,22 +483,38 @@ static int match_urls(const struct url_info *url,
 		url->url + url->path_off,
 		url_prefix->url + url_prefix->path_off,
 		url_prefix->url_len - url_prefix->path_off);
+	if (!pathmatchlen)
+		return 0; /* paths do not match */
 
-	if (pathmatchlen && exactusermatch)
-		*exactusermatch = usermatched;
-	return pathmatchlen;
+	if (match) {
+		match->hostmatch_len = url_prefix->host_len;
+		match->pathmatch_len = pathmatchlen;
+		match->user_matched = usermatched;
+	}
+
+	return 1;
+}
+
+static int cmp_matches(const struct urlmatch_item *a,
+		       const struct urlmatch_item *b)
+{
+	if (a->hostmatch_len != b->hostmatch_len)
+		return a->hostmatch_len < b->hostmatch_len ? -1 : 1;
+	if (a->pathmatch_len != b->pathmatch_len)
+		return a->pathmatch_len < b->pathmatch_len ? -1 : 1;
+	if (a->user_matched != b->user_matched)
+		return b->user_matched ? -1 : 1;
+	return 0;
 }
 
 int urlmatch_config_entry(const char *var, const char *value, void *cb)
 {
 	struct string_list_item *item;
 	struct urlmatch_config *collect = cb;
-	struct urlmatch_item *matched;
+	struct urlmatch_item matched = {0};
 	struct url_info *url = &collect->url;
 	const char *key, *dot;
 	struct strbuf synthkey = STRBUF_INIT;
-	size_t matched_len = 0;
-	int user_matched = 0;
 	int retval;
 
 	if (!skip_prefix(var, collect->section, &key) || *(key++) != '.') {
@@ -516,9 +532,9 @@ int urlmatch_config_entry(const char *var, const char *value, void *cb)
 		free(config_url);
 		if (!norm_url)
 			return 0;
-		matched_len = match_urls(url, &norm_info, &user_matched);
+		retval = match_urls(url, &norm_info, &matched);
 		free(norm_url);
-		if (!matched_len)
+		if (!retval)
 			return 0;
 		key = dot + 1;
 	}
@@ -528,24 +544,17 @@ int urlmatch_config_entry(const char *var, const char *value, void *cb)
 
 	item = string_list_insert(&collect->vars, key);
 	if (!item->util) {
-		matched = xcalloc(1, sizeof(*matched));
-		item->util = matched;
+		item->util = xcalloc(1, sizeof(matched));
 	} else {
-		matched = item->util;
-		/*
-		 * Is our match shorter?  Is our match the same
-		 * length, and without user while the current
-		 * candidate is with user?  Then we cannot use it.
-		 */
-		if (matched_len < matched->matched_len ||
-		    ((matched_len == matched->matched_len) &&
-		     (!user_matched && matched->user_matched)))
+		if (cmp_matches(&matched, item->util) <= 0)
+			 /*
+			  * Our match is worse than the old one,
+			  * we cannot use it.
+			  */
 			return 0;
-		/* Otherwise, replace it with this one. */
 	}
 
-	matched->matched_len = matched_len;
-	matched->user_matched = user_matched;
+	memcpy(item->util, &matched, sizeof(matched));
 	strbuf_addstr(&synthkey, collect->section);
 	strbuf_addch(&synthkey, '.');
 	strbuf_addstr(&synthkey, key);
diff --git a/urlmatch.h b/urlmatch.h
index 0ea812b03..37ee5da85 100644
--- a/urlmatch.h
+++ b/urlmatch.h
@@ -34,7 +34,8 @@ struct url_info {
 extern char *url_normalize(const char *, struct url_info *);
 
 struct urlmatch_item {
-	size_t matched_len;
+	size_t hostmatch_len;
+	size_t pathmatch_len;
 	char user_matched;
 };
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 5/5] urlmatch: allow globbing for the URL host part
  2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
                   ` (23 preceding siblings ...)
  2017-01-31  9:01 ` [PATCH v5 4/5] urlmatch: include host in urlmatch ranking Patrick Steinhardt
@ 2017-01-31  9:01 ` Patrick Steinhardt
  24 siblings, 0 replies; 37+ messages in thread
From: Patrick Steinhardt @ 2017-01-31  9:01 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano, Patrick Steinhardt

From: Patrick Steinhardt <patrick.steinhardt@elego.de>

The URL matching function computes for two URLs whether they match not.
The match is performed by splitting up the URL into different parts and
then doing an exact comparison with the to-be-matched URL.

The main user of `urlmatch` is the configuration subsystem. It allows to
set certain configurations based on the URL which is being connected to
via keys like `http.<url>.*`. A common use case for this is to set
proxies for only some remotes which match the given URL. Unfortunately,
having exact matches for all parts of the URL can become quite tedious
in some setups. Imagine for example a corporate network where there are
dozens or even hundreds of subdomains, which would have to be configured
individually.

Allow users to write an asterisk '*' in place of any 'host' or
'subdomain' label as part of the host name.  For example,
"http.https://*.example.com.proxy" sets "http.proxy" for all direct
subdomains of "https://example.com", e.g. "https://foo.example.com", but
not "https://foo.bar.example.com".

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
Helped-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config.txt |  5 +++-
 t/t1300-repo-config.sh   | 72 ++++++++++++++++++++++++++++++++++++++++++++++++
 urlmatch.c               | 49 +++++++++++++++++++++++++++++---
 3 files changed, 121 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index af2ae4cc0..ee155d8a6 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1914,7 +1914,10 @@ http.<url>.*::
   must match exactly between the config key and the URL.
 
 . Host/domain name (e.g., `example.com` in `https://example.com/`).
-  This field must match exactly between the config key and the URL.
+  This field must match between the config key and the URL. It is
+  possible to specify a `*` as part of the host name to match all subdomains
+  at this level. `https://*.example.com/` for example would match
+  `https://foo.example.com/`, but not `https://foo.bar.example.com/`.
 
 . Port number (e.g., `8080` in `http://example.com:8080/`).
   This field must match exactly between the config key and the URL.
diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
index 6c844d519..052f12021 100755
--- a/t/t1300-repo-config.sh
+++ b/t/t1300-repo-config.sh
@@ -1187,6 +1187,18 @@ test_expect_success 'urlmatch favors more specific URLs' '
 		cookieFile = /tmp/user.txt
 	[http "https://averylonguser@example.com/"]
 		cookieFile = /tmp/averylonguser.txt
+	[http "https://preceding.example.com"]
+		cookieFile = /tmp/preceding.txt
+	[http "https://*.example.com"]
+		cookieFile = /tmp/wildcard.txt
+	[http "https://*.example.com/wildcardwithsubdomain"]
+		cookieFile = /tmp/wildcardwithsubdomain.txt
+	[http "https://trailing.example.com"]
+		cookieFile = /tmp/trailing.txt
+	[http "https://user@*.example.com/"]
+		cookieFile = /tmp/wildcardwithuser.txt
+	[http "https://sub.example.com/"]
+		cookieFile = /tmp/sub.txt
 	EOF
 
 	echo http.cookiefile /tmp/root.txt >expect &&
@@ -1207,6 +1219,66 @@ test_expect_success 'urlmatch favors more specific URLs' '
 
 	echo http.cookiefile /tmp/subdirectory.txt >expect &&
 	git config --get-urlmatch HTTP https://averylonguser@example.com/subdirectory >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/preceding.txt >expect &&
+	git config --get-urlmatch HTTP https://preceding.example.com >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/wildcard.txt >expect &&
+	git config --get-urlmatch HTTP https://wildcard.example.com >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/sub.txt >expect &&
+	git config --get-urlmatch HTTP https://sub.example.com/wildcardwithsubdomain >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/trailing.txt >expect &&
+	git config --get-urlmatch HTTP https://trailing.example.com >actual &&
+	test_cmp expect actual &&
+
+	echo http.cookiefile /tmp/sub.txt >expect &&
+	git config --get-urlmatch HTTP https://user@sub.example.com >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'urlmatch with wildcard' '
+	cat >.git/config <<-\EOF &&
+	[http]
+		sslVerify
+	[http "https://*.example.com"]
+		sslVerify = false
+		cookieFile = /tmp/cookie.txt
+	EOF
+
+	test_expect_code 1 git config --bool --get-urlmatch doesnt.exist https://good.example.com >actual &&
+	test_must_be_empty actual &&
+
+	echo true >expect &&
+	git config --bool --get-urlmatch http.SSLverify https://example.com >actual &&
+	test_cmp expect actual &&
+
+	echo true >expect &&
+	git config --bool --get-urlmatch http.SSLverify https://good-example.com >actual &&
+	test_cmp expect actual &&
+
+	echo true >expect &&
+	git config --bool --get-urlmatch http.sslverify https://deep.nested.example.com >actual &&
+	test_cmp expect actual &&
+
+	echo false >expect &&
+	git config --bool --get-urlmatch http.sslverify https://good.example.com >actual &&
+	test_cmp expect actual &&
+
+	{
+		echo http.cookiefile /tmp/cookie.txt &&
+		echo http.sslverify false
+	} >expect &&
+	git config --get-urlmatch HTTP https://good.example.com >actual &&
+	test_cmp expect actual &&
+
+	echo http.sslverify >expect &&
+	git config --get-urlmatch HTTP https://more.example.com.au >actual &&
 	test_cmp expect actual
 '
 
diff --git a/urlmatch.c b/urlmatch.c
index f79887825..6c12f1a48 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -63,6 +63,49 @@ static int append_normalized_escapes(struct strbuf *buf,
 	return 1;
 }
 
+static const char *end_of_token(const char *s, int c, size_t n)
+{
+	const char *next = memchr(s, c, n);
+	if (!next)
+		next = s + n;
+	return next;
+}
+
+static int match_host(const struct url_info *url_info,
+		      const struct url_info *pattern_info)
+{
+	const char *url = url_info->url + url_info->host_off;
+	const char *pat = pattern_info->url + pattern_info->host_off;
+	int url_len = url_info->host_len;
+	int pat_len = pattern_info->host_len;
+
+	while (url_len && pat_len) {
+		const char *url_next = end_of_token(url, '.', url_len);
+		const char *pat_next = end_of_token(pat, '.', pat_len);
+
+		if (pat_next == pat + 1 && pat[0] == '*')
+			/* wildcard matches anything */
+			;
+		else if ((pat_next - pat) == (url_next - url) &&
+			 !memcmp(url, pat, url_next - url))
+			/* the components are the same */
+			;
+		else
+			return 0; /* found an unmatch */
+
+		if (url_next < url + url_len)
+			url_next++;
+		url_len -= url_next - url;
+		url = url_next;
+		if (pat_next < pat + pat_len)
+			pat_next++;
+		pat_len -= pat_next - pat;
+		pat = pat_next;
+	}
+
+	return (!url_len && !pat_len);
+}
+
 static char *url_normalize_1(const char *url, struct url_info *out_info, char allow_globs)
 {
 	/*
@@ -467,9 +510,7 @@ static int match_urls(const struct url_info *url,
 	}
 
 	/* check the host */
-	if (url_prefix->host_len != url->host_len ||
-	    strncmp(url->url + url->host_off,
-		    url_prefix->url + url_prefix->host_off, url->host_len))
+	if (!match_host(url, url_prefix))
 		return 0; /* host names do not match */
 
 	/* check the port */
@@ -528,7 +569,7 @@ int urlmatch_config_entry(const char *var, const char *value, void *cb)
 		struct url_info norm_info;
 
 		config_url = xmemdupz(key, dot - key);
-		norm_url = url_normalize(config_url, &norm_info);
+		norm_url = url_normalize_1(config_url, &norm_info, 1);
 		free(config_url);
 		if (!norm_url)
 			return 0;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2017-01-31  9:09 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
2017-01-23 13:06 ` [PATCH v1 1/2] mailmap: add Patrick Steinhardt's work address Patrick Steinhardt
2017-01-23 13:06 ` [PATCH v1 2/2] urlmatch: allow regex-based URL matching Patrick Steinhardt
2017-01-23 19:53 ` [PATCH v1 0/2] urlmatch: allow regexp-based matches Junio C Hamano
2017-01-24 11:29   ` Patrick Steinhardt
2017-01-24 17:00 ` [PATCH v2 0/4] " Patrick Steinhardt
2017-01-24 17:00 ` [PATCH v2 1/4] mailmap: add Patrick Steinhardt's work address Patrick Steinhardt
2017-01-24 17:00 ` [PATCH v2 2/4] urlmatch: enable normalization of URLs with globs Patrick Steinhardt
2017-01-24 17:00 ` [PATCH v2 3/4] urlmatch: split host and port fields in `struct url_info` Patrick Steinhardt
2017-01-24 17:00 ` [PATCH v2 4/4] urlmatch: allow globbing for the URL host part Patrick Steinhardt
2017-01-24 17:52   ` Philip Oakley
2017-01-25  9:57     ` Patrick Steinhardt
2017-01-25  9:56 ` [PATCH v3 0/4] urlmatch: allow regexp-based matches Patrick Steinhardt
2017-01-25  9:56 ` [PATCH v3 1/4] mailmap: add Patrick Steinhardt's work address Patrick Steinhardt
2017-01-25  9:56 ` [PATCH v3 2/4] urlmatch: enable normalization of URLs with globs Patrick Steinhardt
2017-01-25  9:56 ` [PATCH v3 3/4] urlmatch: split host and port fields in `struct url_info` Patrick Steinhardt
2017-01-25  9:56 ` [PATCH v3 4/4] urlmatch: allow globbing for the URL host part Patrick Steinhardt
2017-01-26 20:43   ` Junio C Hamano
2017-01-26 20:49     ` Junio C Hamano
2017-01-26 21:12       ` Junio C Hamano
2017-01-27  6:21   ` Patrick Steinhardt
2017-01-27 17:45     ` Junio C Hamano
2017-01-27 10:32 ` [PATCH v4 0/5] urlmatch: allow wildcard-based matches Patrick Steinhardt
2017-01-30 22:00   ` Junio C Hamano
2017-01-30 22:52     ` Junio C Hamano
2017-01-31  8:26       ` Patrick Steinhardt
2017-01-27 10:32 ` [PATCH v4 1/5] mailmap: add Patrick Steinhardt's work address Patrick Steinhardt
2017-01-27 10:32 ` [PATCH v4 2/5] urlmatch: enable normalization of URLs with globs Patrick Steinhardt
2017-01-27 10:32 ` [PATCH v4 3/5] urlmatch: split host and port fields in `struct url_info` Patrick Steinhardt
2017-01-27 10:32 ` [PATCH v4 4/5] urlmatch: include host and port in urlmatch length Patrick Steinhardt
2017-01-27 10:32 ` [PATCH v4 5/5] urlmatch: allow globbing for the URL host part Patrick Steinhardt
2017-01-31  9:01 ` [PATCH v5 0/5] urlmatch: allow wildcard-based matches Patrick Steinhardt
2017-01-31  9:01 ` [PATCH v5 1/5] mailmap: add Patrick Steinhardt's work address Patrick Steinhardt
2017-01-31  9:01 ` [PATCH v5 2/5] urlmatch: enable normalization of URLs with globs Patrick Steinhardt
2017-01-31  9:01 ` [PATCH v5 3/5] urlmatch: split host and port fields in `struct url_info` Patrick Steinhardt
2017-01-31  9:01 ` [PATCH v5 4/5] urlmatch: include host in urlmatch ranking Patrick Steinhardt
2017-01-31  9:01 ` [PATCH v5 5/5] urlmatch: allow globbing for the URL host part Patrick Steinhardt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).