All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Introduce a filter-path argument to git-daemon, for doing custom path transformations
@ 2009-03-11 15:17 Johan Sørensen
  2009-03-11 15:58 ` Johannes Sixt
  0 siblings, 1 reply; 14+ messages in thread
From: Johan Sørensen @ 2009-03-11 15:17 UTC (permalink / raw)
  To: git; +Cc: Johan Sørensen

The argument is an executable script that will receive the path to the repos
the client wishes to clone as an argument. It is then the responsibility of the
script to return a zero-terminated string on its stdout with the real path of
the target repository.

This buys us a lot of flexibility when it comes to managing different
repositories, possibly located in many different dirs, but with a uniform
url-structure to the outside world.
---
 Documentation/git-daemon.txt |    7 ++++
 daemon.c                     |   75 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 81 insertions(+), 1 deletions(-)

diff --git a/Documentation/git-daemon.txt b/Documentation/git-daemon.txt
index 36f00ae..1eca344 100644
--- a/Documentation/git-daemon.txt
+++ b/Documentation/git-daemon.txt
@@ -13,6 +13,7 @@ SYNOPSIS
 	     [--strict-paths] [--base-path=path] [--base-path-relaxed]
 	     [--user-path | --user-path=path]
 	     [--interpolated-path=pathtemplate]
+	     [--path-filter=executable]
 	     [--reuseaddr] [--detach] [--pid-file=file]
 	     [--enable=service] [--disable=service]
 	     [--allow-override=service] [--forbid-override=service]
@@ -71,6 +72,12 @@ OPTIONS
 	After interpolation, the path is validated against the directory
 	whitelist.
 
+--path-filter=executable::
+	To support a more flexible directory layout a path filter script 
+	can be used. The executable will receive the requested path from
+	the client as arg0. The executable must return a zero-terminated
+	string on stdout which is the real path 'git-daemon' should serve.
+
 --export-all::
 	Allow pulling from all directories that look like GIT repositories
 	(have the 'objects' and 'refs' subdirectories), even if they
diff --git a/daemon.c b/daemon.c
index d93cf96..b2571df 100644
--- a/daemon.c
+++ b/daemon.c
@@ -22,6 +22,7 @@ static const char daemon_usage[] =
 "           [--strict-paths] [--base-path=path] [--base-path-relaxed]\n"
 "           [--user-path | --user-path=path]\n"
 "           [--interpolated-path=path]\n"
+"           [--path-filter=path]\n"
 "           [--reuseaddr] [--detach] [--pid-file=file]\n"
 "           [--[enable|disable|allow-override|forbid-override]=service]\n"
 "           [--inetd | [--listen=host_or_ipaddr] [--port=n]\n"
@@ -58,6 +59,10 @@ static char *canon_hostname;
 static char *ip_address;
 static char *tcp_port;
 
+/* if defined, the script will be executed with the requested path on stdin
+ * and _must_ return with a successful exitcode and the new path on stdout */
+static char *path_filter_script;
+
 static void logreport(int priority, const char *err, va_list params)
 {
 	if (log_syslog) {
@@ -287,9 +292,62 @@ static int git_daemon_config(const char *var, const char *value, void *cb)
 	return 0;
 }
 
+static char *run_path_filter_script(char *requested_dir) {
+	pid_t pid;
+	char result[256]; /* arbitary */
+	char *real_path;
+	int pipe_out[2];
+	int exit_code = 1;
+
+	pipe(pipe_out);
+
+	loginfo("Executing path filter script: '%s %s'", path_filter_script, requested_dir);
+
+	switch ((pid = fork())) {
+		case -1:
+			logerror("path filter script fork() failed: %s", strerror(errno));
+			return NULL;
+		case 0:
+		close(pipe_out[0]);
+		dup2(pipe_out[1], 1);
+		close(pipe_out[1]);
+
+		execl(path_filter_script, path_filter_script, requested_dir, NULL);
+
+		/* execl failed if we got here */
+		logerror("path filter script execl() failed: %s", strerror(errno));
+		return NULL;
+	default:
+		close(pipe_out[1]);
+	
+		while(waitpid(pid, &exit_code, 0) < 0) {
+			switch(errno) {
+			case EINTR:
+			continue;
+			default:
+				logerror("path filter script waitpid() fail: %s", strerror(errno));
+				goto waitpid_error;
+			}
+		}
+		if (WIFEXITED(exit_code) && WEXITSTATUS(exit_code) == 0) {
+			read(pipe_out[0], result, sizeof(result) - 1);
+			loginfo("path filter script result: %s", result);
+		}
+		waitpid_error:
+		close(pipe_out[0]);
+	}
+
+	if (result) {
+		real_path = result;
+		return real_path;
+	}
+	return NULL;
+}
+
 static int run_service(char *dir, struct daemon_service *service)
 {
 	const char *path;
+	char *real_dir;
 	int enabled = service->enabled;
 
 	loginfo("Request %s for '%s'", service->name, dir);
@@ -299,8 +357,19 @@ static int run_service(char *dir, struct daemon_service *service)
 		errno = EACCES;
 		return -1;
 	}
+	loginfo("path_filter_script %s", path_filter_script);
+	if (!path_filter_script) {
+		real_dir = dir;
+	} else {
+		char *tdir;
+		if ((tdir = run_path_filter_script(dir))) {
+			real_dir = tdir;
+		} else {
+			real_dir = dir;
+		}
+	}
 
-	if (!(path = path_ok(dir)))
+	if (!(path = path_ok(real_dir)))
 		return -1;
 
 	/*
@@ -1018,6 +1087,10 @@ int main(int argc, char **argv)
 			pid_file = arg + 11;
 			continue;
 		}
+		if (!prefixcmp(arg, "--path-filter=")) {
+			path_filter_script = arg + 14;
+			continue;
+		}
 		if (!strcmp(arg, "--detach")) {
 			detach = 1;
 			log_syslog = 1;
-- 
1.6.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] Introduce a filter-path argument to git-daemon, for doing custom path transformations
  2009-03-11 15:17 [PATCH] Introduce a filter-path argument to git-daemon, for doing custom path transformations Johan Sørensen
@ 2009-03-11 15:58 ` Johannes Sixt
  2009-03-12 10:13   ` Johan Sørensen
  2009-03-12 10:26   ` Johan Sørensen
  0 siblings, 2 replies; 14+ messages in thread
From: Johannes Sixt @ 2009-03-11 15:58 UTC (permalink / raw)
  To: Johan Sørensen; +Cc: git

Johan Sørensen schrieb:
> The argument is an executable script that will receive the path to the repos
> the client wishes to clone as an argument. It is then the responsibility of the
> script to return a zero-terminated string on its stdout with the real path of
> the target repository.
> 
> This buys us a lot of flexibility when it comes to managing different
> repositories, possibly located in many different dirs, but with a uniform
> url-structure to the outside world.

It's the first time that I see a deamon with this feature - except perhaps
Apache's ModRewrite. Are you sure you are not working around your problem
at the wrong place?

Doesn't --interpolated-path already solve your problem? If not, then you
at least you must describe in the documentation the use-cases when
--path-filter should be preferred.

Your implementation does not pass the target hostname to the script, but
it should; otherwise you lose flexibility (for virtual hosting).

> +static char *run_path_filter_script(char *requested_dir) {
> +	pid_t pid;
> +	char result[256]; /* arbitary */
> +	char *real_path;
> +	int pipe_out[2];
> +	int exit_code = 1;
> +
> +	pipe(pipe_out);
> +
> +	loginfo("Executing path filter script: '%s %s'", path_filter_script, requested_dir);
> +
> +	switch ((pid = fork())) {
> +		case -1:
> +			logerror("path filter script fork() failed: %s", strerror(errno));
> +			return NULL;
> +		case 0:
> +		close(pipe_out[0]);
> +		dup2(pipe_out[1], 1);
> +		close(pipe_out[1]);
> +
> +		execl(path_filter_script, path_filter_script, requested_dir, NULL);

Use start_command()/finish_command() instead of rolling your own fork/exec
combo.

-- Hannes

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] Introduce a filter-path argument to git-daemon, for doing custom path transformations
  2009-03-11 15:58 ` Johannes Sixt
@ 2009-03-12 10:13   ` Johan Sørensen
  2009-03-12 11:29     ` Johannes Schindelin
  2009-03-12 10:26   ` Johan Sørensen
  1 sibling, 1 reply; 14+ messages in thread
From: Johan Sørensen @ 2009-03-12 10:13 UTC (permalink / raw)
  To: git; +Cc: Johan Sørensen

The parameter for filter-path is an executable that will receive the service
name, the client hostname and path to the repos the client requests as as
arguments. It is then the responsibility of the script to return a zero
terminated string on its stdout with the real path of the target repository.
---
 Documentation/git-daemon.txt |   13 ++++++++++
 daemon.c                     |   53 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 65 insertions(+), 1 deletions(-)

diff --git a/Documentation/git-daemon.txt b/Documentation/git-daemon.txt
index 36f00ae..efd1687 100644
--- a/Documentation/git-daemon.txt
+++ b/Documentation/git-daemon.txt
@@ -13,6 +13,7 @@ SYNOPSIS
 	     [--strict-paths] [--base-path=path] [--base-path-relaxed]
 	     [--user-path | --user-path=path]
 	     [--interpolated-path=pathtemplate]
+	     [--path-filter=executable]
 	     [--reuseaddr] [--detach] [--pid-file=file]
 	     [--enable=service] [--disable=service]
 	     [--allow-override=service] [--forbid-override=service]
@@ -71,6 +72,18 @@ OPTIONS
 	After interpolation, the path is validated against the directory
 	whitelist.
 
+--path-filter=executable::
+	To support a more flexible directory layout a path filter script 
+	can be used. The executable will receive the service name (upload-pack, 
+	upload-archive or receive-pack), the client hostname and the request git 
+	directory as arguments. The executable must return a zero-terminated string
+	on stdout which is the real path 'git-daemon' should serve. This is useful
+	when --interpolated-path doesn't buy you enough flexibility. You could for
+	instance keep support for old clone urls if you rename your repository, or
+	fetch a custom url-mapping from a third-party repo manager application, or
+	map deeply nested repository directories to a more sensible layout for the 
+	outside world.
+
 --export-all::
 	Allow pulling from all directories that look like GIT repositories
 	(have the 'objects' and 'refs' subdirectories), even if they
diff --git a/daemon.c b/daemon.c
index d93cf96..e6777c6 100644
--- a/daemon.c
+++ b/daemon.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "pkt-line.h"
 #include "exec_cmd.h"
+#include "run-command.h"
 
 #include <syslog.h>
 
@@ -22,6 +23,7 @@ static const char daemon_usage[] =
 "           [--strict-paths] [--base-path=path] [--base-path-relaxed]\n"
 "           [--user-path | --user-path=path]\n"
 "           [--interpolated-path=path]\n"
+"           [--path-filter=path]\n"
 "           [--reuseaddr] [--detach] [--pid-file=file]\n"
 "           [--[enable|disable|allow-override|forbid-override]=service]\n"
 "           [--inetd | [--listen=host_or_ipaddr] [--port=n]\n"
@@ -58,6 +60,10 @@ static char *canon_hostname;
 static char *ip_address;
 static char *tcp_port;
 
+/* if defined, the script will be executed with the requested path on stdin
+ * and _must_ return with a successful exitcode and the new path on stdout */
+static char *path_filter_script;
+
 static void logreport(int priority, const char *err, va_list params)
 {
 	if (log_syslog) {
@@ -287,6 +293,37 @@ static int git_daemon_config(const char *var, const char *value, void *cb)
 	return 0;
 }
 
+static char *run_path_filter_script(struct daemon_service *s, char *host, char *dir) {
+	char result[256]; /* arbitary */
+	char *real_path;
+	struct child_process filter_cmd;
+	const char *args[] = { path_filter_script, s->name, host, dir, NULL };
+
+	loginfo("Executing path filter script: '%s %s'", path_filter_script, dir);
+	memset(&filter_cmd, 0, sizeof(filter_cmd));
+	filter_cmd.argv = args;
+	filter_cmd.out = -1;
+	
+	if (start_command(&filter_cmd)) {
+		logerror("path filter: unable to fork path_filter_script");
+		return NULL;
+	}
+	
+	read(filter_cmd.out, result, sizeof(result) - 1);
+	
+	close(filter_cmd.out);
+	if (finish_command(&filter_cmd)) {
+		logerror("path filter died with strange error");
+		return NULL;
+	}
+
+	if (result) {
+		real_path = result;
+		return real_path;
+	}
+	return NULL;
+}
+
 static int run_service(char *dir, struct daemon_service *service)
 {
 	const char *path;
@@ -495,6 +532,7 @@ static void parse_extra_args(char *extra_args, int buflen)
 static int execute(struct sockaddr *addr)
 {
 	static char line[1000];
+	char *path;
 	int pktlen, len, i;
 
 	if (addr) {
@@ -553,11 +591,20 @@ static int execute(struct sockaddr *addr)
 		if (!prefixcmp(line, "git-") &&
 		    !strncmp(s->name, line + 4, namelen) &&
 		    line[namelen + 4] == ' ') {
+			path = line + namelen + 5;
+			if (path_filter_script) {
+				loginfo("path_filter_script %s for path %s", path_filter_script, path);
+				char *tdir;
+				if ((tdir = run_path_filter_script(s, hostname, path))) {
+					path = tdir;
+				}
+			}
+			
 			/*
 			 * Note: The directory here is probably context sensitive,
 			 * and might depend on the actual service being performed.
 			 */
-			return run_service(line + namelen + 5, s);
+			return run_service(path, s);
 		}
 	}
 
@@ -1018,6 +1065,10 @@ int main(int argc, char **argv)
 			pid_file = arg + 11;
 			continue;
 		}
+		if (!prefixcmp(arg, "--path-filter=")) {
+			path_filter_script = arg + 14;
+			continue;
+		}
 		if (!strcmp(arg, "--detach")) {
 			detach = 1;
 			log_syslog = 1;
-- 
1.6.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] Introduce a filter-path argument to git-daemon, for doing  custom path transformations
  2009-03-11 15:58 ` Johannes Sixt
  2009-03-12 10:13   ` Johan Sørensen
@ 2009-03-12 10:26   ` Johan Sørensen
  1 sibling, 0 replies; 14+ messages in thread
From: Johan Sørensen @ 2009-03-12 10:26 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git

On Wed, Mar 11, 2009 at 4:58 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
> Johan Sørensen schrieb:
>> This buys us a lot of flexibility when it comes to managing different
>> repositories, possibly located in many different dirs, but with a uniform
>> url-structure to the outside world.
>
> It's the first time that I see a deamon with this feature - except perhaps
> Apache's ModRewrite. Are you sure you are not working around your problem
> at the wrong place?
>
> Doesn't --interpolated-path already solve your problem? If not, then you
> at least you must describe in the documentation the use-cases when
> --path-filter should be preferred.

Maybe I am barking up the wrong tree, but here's my real-world use
case: I'm currently working on some bigger changes for gitorious.org,
where the repository url-structure could potentially change over time,
as a consequence of various features. Using the path-filter script I
can keep the old urls around and still working, and I can map any url
to a on-disk uniquely hashed path, so I don't have to move the files
around, maintain symlinks and so forth for information the gitorious
application already has nicely structured and easy to lookup.

I know these may be highly specialized needs, but so is
interpolated-path for the common user. I think this patch could be
useful for anyone else wanting to set up a flexible repo hosting
system. I think the url-structure is a major part of the UI for any
app exposing them, even for a git-daemon, so the mod_rewrite
comparison isn't too far fetched in my opinion...

> Your implementation does not pass the target hostname to the script, but
> it should; otherwise you lose flexibility (for virtual hosting).

Good point. I've added the hostname as well as the service name as
arguments for the script.

>> +     switch ((pid = fork())) {
[snip]
>
> Use start_command()/finish_command() instead of rolling your own fork/exec
> combo.

Ah nice! I'm sending an updated patch.


>
> -- Hannes
>

Cheers,
JS

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Introduce a filter-path argument to git-daemon, for doing custom path transformations
  2009-03-12 10:13   ` Johan Sørensen
@ 2009-03-12 11:29     ` Johannes Schindelin
  2009-03-12 15:48       ` Johan Sørensen
  2009-03-12 19:06       ` Johan Sørensen
  0 siblings, 2 replies; 14+ messages in thread
From: Johannes Schindelin @ 2009-03-12 11:29 UTC (permalink / raw)
  To: Johan Sørensen; +Cc: git

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4612 bytes --]

Hi,

Disclaimer: if you are offended by constructive criticism, or likely to
answer with insults to the comments I offer, please stop reading this mail
now (and please do not answer my mail, either). :-)

Still with me?  Good.  Nice to meet you.

Just for the record: responding to a patch is my strongest way of saying
that I appreciate your work.

On Thu, 12 Mar 2009, Johan Sørensen wrote:

> The parameter for filter-path is an executable that will receive the 
> service name, the client hostname and path to the repos the client 
> requests as as arguments. It is then the responsibility of the script to 
> return a zero terminated string on its stdout with the real path of the 
> target repository.
> ---

A sign-off is missing...

More importantly, you might want to point out the security concerns of 
running a script with the full permissions of git-daemon.  (AFAICT from 
your patch you are not dropping any privileges at any point.)

Which brings me to another idea: we have quite a few places in Git where 
we use regular expressions.  Would they not be enough for your use case?

> diff --git a/Documentation/git-daemon.txt b/Documentation/git-daemon.txt
> index 36f00ae..efd1687 100644
> --- a/Documentation/git-daemon.txt
> +++ b/Documentation/git-daemon.txt
> @@ -71,6 +72,18 @@ OPTIONS
>  	After interpolation, the path is validated against the directory
>  	whitelist.
>  
> +--path-filter=executable::
> +	To support a more flexible directory layout a path filter script 
> +	can be used. The executable will receive the service name (upload-pack, 
> +	upload-archive or receive-pack), the client hostname and the request git 
> +	directory as arguments. The executable must return a zero-terminated string
> +	on stdout which is the real path 'git-daemon' should serve. This is useful
> +	when --interpolated-path doesn't buy you enough flexibility. You could for
> +	instance keep support for old clone urls if you rename your repository, or
> +	fetch a custom url-mapping from a third-party repo manager application, or
> +	map deeply nested repository directories to a more sensible layout for the 
> +	outside world.

Please keep the lines shorter than 81 characters.

> diff --git a/daemon.c b/daemon.c
> index d93cf96..e6777c6 100644
> --- a/daemon.c
> +++ b/daemon.c
> @@ -287,6 +293,37 @@ static int git_daemon_config(const char *var, const char *value, void *cb)
>  	return 0;
>  }
>  
> +static char *run_path_filter_script(struct daemon_service *s, char *host, char *dir) {

Again, pretty long line.  (I will refrain from saying that for every long 
line, but please cooperate by pretending I did ;-)

But there is more: what about concurrent accesses?

> +	char result[256]; /* arbitary */

Why not PATH_MAX?

> +	char *real_path;
> +	struct child_process filter_cmd;
> +	const char *args[] = { path_filter_script, s->name, host, dir, NULL };
> +
> +	loginfo("Executing path filter script: '%s %s'", path_filter_script, dir);
> +	memset(&filter_cmd, 0, sizeof(filter_cmd));
> +	filter_cmd.argv = args;
> +	filter_cmd.out = -1;
> +	
> +	if (start_command(&filter_cmd)) {
> +		logerror("path filter: unable to fork path_filter_script");
> +		return NULL;
> +	}
> +	
> +	read(filter_cmd.out, result, sizeof(result) - 1);

No error checking?

BTW we do have strbuf_read(), which would solve your "static char *" 
problem nicely.

> +	close(filter_cmd.out);
> +	if (finish_command(&filter_cmd)) {
> +		logerror("path filter died with strange error");
> +		return NULL;
> +	}
> +
> +	if (result) {
> +		real_path = result;
> +		return real_path;
> +	}
> +	return NULL;

What would be the difference if you wrote

	return result;

instead?

> @@ -495,6 +532,7 @@ static void parse_extra_args(char *extra_args, int buflen)
>  static int execute(struct sockaddr *addr)
>  {
>  	static char line[1000];
> +	char *path;

Is it not rather "const char *"?  But that point would be moot should you 
decide to use strbufs.

> @@ -553,11 +591,20 @@ static int execute(struct sockaddr *addr)
>  		if (!prefixcmp(line, "git-") &&
>  		    !strncmp(s->name, line + 4, namelen) &&
>  		    line[namelen + 4] == ' ') {
> +			path = line + namelen + 5;
> +			if (path_filter_script) {
> +				loginfo("path_filter_script %s for path %s", path_filter_script, path);
> +				char *tdir;

Declaration after a call to a function.

> +				if ((tdir = run_path_filter_script(s, hostname, path))) {
> +					path = tdir;
> +				}

Unnecessary curly brackets.

And your code would be even easier to read if your 
run_path_filter_script() would never return NULL, but the unchanged path 
instead.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] Introduce a filter-path argument to git-daemon, for doing custom path transformations
  2009-03-12 11:29     ` Johannes Schindelin
@ 2009-03-12 15:48       ` Johan Sørensen
  2009-03-12 16:50         ` Johannes Schindelin
  2009-03-12 19:06       ` Johan Sørensen
  1 sibling, 1 reply; 14+ messages in thread
From: Johan Sørensen @ 2009-03-12 15:48 UTC (permalink / raw)
  To: git; +Cc: Johan Sørensen

The parameter for filter-path is an executable that will receive the service
name, the client hostname and path to the repos the client requests as as
arguments. It is then the responsibility of the script to return a zero
terminated string on its stdout with the real path of the target repository.

Signed-off-by: Johan Sørensen <johan@johansorensen.com>
---
 Documentation/git-daemon.txt |   15 +++++++++++
 daemon.c                     |   54 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 68 insertions(+), 1 deletions(-)

diff --git a/Documentation/git-daemon.txt b/Documentation/git-daemon.txt
index 36f00ae..bf8d31f 100644
--- a/Documentation/git-daemon.txt
+++ b/Documentation/git-daemon.txt
@@ -13,6 +13,7 @@ SYNOPSIS
 	     [--strict-paths] [--base-path=path] [--base-path-relaxed]
 	     [--user-path | --user-path=path]
 	     [--interpolated-path=pathtemplate]
+	     [--path-filter=executable]
 	     [--reuseaddr] [--detach] [--pid-file=file]
 	     [--enable=service] [--disable=service]
 	     [--allow-override=service] [--forbid-override=service]
@@ -71,6 +72,20 @@ OPTIONS
 	After interpolation, the path is validated against the directory
 	whitelist.
 
+--path-filter=executable::
+	To support a more flexible directory layout a path filter script
+	can be used. The executable will receive the service name (upload-pack,
+	upload-archive or receive-pack), the client hostname and the request git
+	directory as arguments. The executable must return a zero-terminated
+	string on stdout which is the real path 'git-daemon' should serve. This
+	is useful when --interpolated-path doesn't buy you enough flexibility.
+	You could for instance keep support for old clone urls if you rename your
+	repository, or fetch a custom url-mapping from a third-party repo manager
+	application, or	map deeply nested repository directories to a more
+	sensible layout for the outside world.
+	Please be aware that the executable spawned will have the same privileges
+	as the user you are running the git-daemon under.
+
 --export-all::
 	Allow pulling from all directories that look like GIT repositories
 	(have the 'objects' and 'refs' subdirectories), even if they
diff --git a/daemon.c b/daemon.c
index d93cf96..e865e78 100644
--- a/daemon.c
+++ b/daemon.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "pkt-line.h"
 #include "exec_cmd.h"
+#include "run-command.h"
 
 #include <syslog.h>
 
@@ -22,6 +23,7 @@ static const char daemon_usage[] =
 "           [--strict-paths] [--base-path=path] [--base-path-relaxed]\n"
 "           [--user-path | --user-path=path]\n"
 "           [--interpolated-path=path]\n"
+"           [--path-filter=path]\n"
 "           [--reuseaddr] [--detach] [--pid-file=file]\n"
 "           [--[enable|disable|allow-override|forbid-override]=service]\n"
 "           [--inetd | [--listen=host_or_ipaddr] [--port=n]\n"
@@ -58,6 +60,11 @@ static char *canon_hostname;
 static char *ip_address;
 static char *tcp_port;
 
+/* if defined, the script will be executed with the service name, hostname,
+ * and requested path on stdin and _must_ return with a successful exitcode
+ * and the new path on stdout */
+static char *path_filter_script;
+
 static void logreport(int priority, const char *err, va_list params)
 {
 	if (log_syslog) {
@@ -287,6 +294,42 @@ static int git_daemon_config(const char *var, const char *value, void *cb)
 	return 0;
 }
 
+static char *run_path_filter_script(struct daemon_service *s, char *host,
+			    char *dir) {
+	struct strbuf result_buf = STRBUF_INIT;
+	struct child_process filter_cmd;
+	const char *args[] = { path_filter_script, s->name, host, dir, NULL };
+
+	loginfo("Executing path filter script: '%s %s %s %s'",
+					path_filter_script, s->name, host, dir);
+	memset(&filter_cmd, 0, sizeof(filter_cmd));
+	filter_cmd.argv = args;
+	filter_cmd.out = -1;
+
+	if (start_command(&filter_cmd)) {
+		logerror("path filter: unable to fork path_filter_script");
+		return dir;
+	}
+
+	if (strbuf_read(&result_buf, filter_cmd.out, PATH_MAX) < 0) {
+		strbuf_release(&result_buf);
+		close(filter_cmd.out);
+		logerror("path filter: script read returned %s", strerror(errno));
+		return dir;
+	}
+
+	close(filter_cmd.out);
+	if (finish_command(&filter_cmd)) {
+		logerror("path filter script died with strange error");
+		return dir;
+	}
+
+	if (result_buf.len > 0)
+		dir = strbuf_detach(&result_buf, NULL);
+
+	return dir;
+}
+
 static int run_service(char *dir, struct daemon_service *service)
 {
 	const char *path;
@@ -557,7 +600,12 @@ static int execute(struct sockaddr *addr)
 			 * Note: The directory here is probably context sensitive,
 			 * and might depend on the actual service being performed.
 			 */
-			return run_service(line + namelen + 5, s);
+			if (path_filter_script) {
+				return run_service(run_path_filter_script(s, hostname,
+				                   line + namelen + 5), s);
+			} else {
+				return run_service(line + namelen + 5, s);
+			}
 		}
 	}
 
@@ -1018,6 +1066,10 @@ int main(int argc, char **argv)
 			pid_file = arg + 11;
 			continue;
 		}
+		if (!prefixcmp(arg, "--path-filter=")) {
+			path_filter_script = arg + 14;
+			continue;
+		}
 		if (!strcmp(arg, "--detach")) {
 			detach = 1;
 			log_syslog = 1;
-- 
1.6.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] Introduce a filter-path argument to git-daemon, for doing custom path transformations
  2009-03-12 15:48       ` Johan Sørensen
@ 2009-03-12 16:50         ` Johannes Schindelin
  0 siblings, 0 replies; 14+ messages in thread
From: Johannes Schindelin @ 2009-03-12 16:50 UTC (permalink / raw)
  To: Johan Sørensen; +Cc: git

[-- Attachment #1: Type: TEXT/PLAIN, Size: 575 bytes --]

Hi,

On Thu, 12 Mar 2009, Johan Sørensen wrote:

> The parameter for filter-path is an executable that will receive the service
> name, the client hostname and path to the repos the client requests as as
> arguments. It is then the responsibility of the script to return a zero
> terminated string on its stdout with the real path of the target repository.
> 
> Signed-off-by: Johan Sørensen <johan@johansorensen.com>

I see that you addressed some, but not all of my concerns.  Do you think 
that I am wrong?  Then please, by all means, increase my knowledge.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Introduce a filter-path argument to git-daemon, for doing  custom path transformations
  2009-03-12 11:29     ` Johannes Schindelin
  2009-03-12 15:48       ` Johan Sørensen
@ 2009-03-12 19:06       ` Johan Sørensen
  2009-03-14  6:58         ` Junio C Hamano
  1 sibling, 1 reply; 14+ messages in thread
From: Johan Sørensen @ 2009-03-12 19:06 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

On Thu, Mar 12, 2009 at 12:29 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:

(all my comments below refer to my latest patch)

> More importantly, you might want to point out the security concerns of
> running a script with the full permissions of git-daemon.  (AFAICT from
> your patch you are not dropping any privileges at any point.)

Do you really think this is needed? It doesn't seem like running the
hook scripts does anything more than trusting the script author and
permissions of the hook scripts (?). I see the path-filter script
exactly the same way, with the exception of having to double-check the
user supplied path the script receives.

> Which brings me to another idea: we have quite a few places in Git where
> we use regular expressions.  Would they not be enough for your use case?

Hmm, please do elaborate on your idea. If you mean being able to
supply a bunch of regexp mappings when starting the daemon then it
wouldn't cut it for me; I'm in need of something more dynamic.

>> diff --git a/Documentation/git-daemon.txt b/Documentation/git-daemon.txt
>> index 36f00ae..efd1687 100644
>> --- a/Documentation/git-daemon.txt
>> +++ b/Documentation/git-daemon.txt
>> @@ -71,6 +72,18 @@ OPTIONS
[snip]
> Please keep the lines shorter than 81 characters.

I believe the longest line I've added in the docs is 77.

> But there is more: what about concurrent accesses?

The external path-filter script is run from the execute(), which is
forked+exec'ed for each incoming connection to the daemon, so that
would mean a concurrency of one in that child-process, unless I've
missed something in the code path?

>> +     read(filter_cmd.out, result, sizeof(result) - 1);
>
> No error checking?
>
> BTW we do have strbuf_read(), which would solve your "static char *"
> problem nicely.

I'm using strbuf_read() now, but this being my very first git patch, I
may have misunderstood the api slightly?

> And your code would be even easier to read if your
> run_path_filter_script() would never return NULL, but the unchanged path
> instead.

Done.

Thanks for giving my patch the run-through. I'm still curious to hear
what people think about the idea in general though!

>
> Ciao,
> Dscho
>

Cheers,
JS

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Introduce a filter-path argument to git-daemon, for doing  custom path transformations
  2009-03-12 19:06       ` Johan Sørensen
@ 2009-03-14  6:58         ` Junio C Hamano
  2009-03-14 14:39           ` Johan Sørensen
  0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2009-03-14  6:58 UTC (permalink / raw)
  To: Johan Sørensen; +Cc: Johannes Schindelin, git

Johan Sørensen <johan@johansorensen.com> writes:

>> More importantly, you might want to point out the security concerns of
>> running a script with the full permissions of git-daemon.  (AFAICT from
>> your patch you are not dropping any privileges at any point.)
>
> Do you really think this is needed? It doesn't seem like running the
> hook scripts does anything more than trusting the script author and
> permissions of the hook scripts (?). I see the path-filter script
> exactly the same way, with the exception of having to double-check the
> user supplied path the script receives.

If I am not misreading the patch (I only skimmed it), the script is what
is given to the git-daemon process from its command line, so it is under
total control of the site owner.  It is much much much less problematic
than the security worry of allowing random hook scripts to be installed in
the repositories hosted at a hosting site.  I think Dscho is being a bit
too paranoid in this particular case.

However, being paranoid is a good thing when we talk about instructions we
give to the end users.  The site owner who uses this facility needs to be
aware that the script is run as the same user that runs git-daemon, and
that more than one instances of the script can be run at the same time.
The script writer needs to be careful about using the same scratchpad
location for the temporary files the script uses and not letting multiple
instances of scripts stomping on each other's toes.  These things need to
be documented.

Do you run git-daemon from inetd, or standalone, by the way?  I am
wondering how well it would scale if you spawn an external "filter path"
script (by the way, "filter path" sounds as if it checks and conditionally
denies access to, or something like that, which is not what you are using
it for.  It is more about rewriting paths, a la mod_rewrite, and I think
the option is misnamed) every time you get a request.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Introduce a filter-path argument to git-daemon, for doing  custom path transformations
  2009-03-14  6:58         ` Junio C Hamano
@ 2009-03-14 14:39           ` Johan Sørensen
  2009-03-14 18:23             ` Junio C Hamano
  0 siblings, 1 reply; 14+ messages in thread
From: Johan Sørensen @ 2009-03-14 14:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git

On Sat, Mar 14, 2009 at 7:58 AM, Junio C Hamano <gitster@pobox.com> wrote:
> However, being paranoid is a good thing when we talk about instructions we
> give to the end users.  The site owner who uses this facility needs to be
> aware that the script is run as the same user that runs git-daemon, and
> that more than one instances of the script can be run at the same time.
> The script writer needs to be careful about using the same scratchpad
> location for the temporary files the script uses and not letting multiple
> instances of scripts stomping on each other's toes.  These things need to
> be documented.

Will expand the docs further.

> Do you run git-daemon from inetd, or standalone, by the way?

Standalone.

> I am wondering how well it would scale if you spawn an external "filter path"
> script every time you get a request.

A quick test of 250 consecutive requests with ls-remote to localhost
(all without the --verbose flag), slowest run:
- Baseline (no --filter-path agument): 3.39s

$ cat filter.c
#import "stdio.h"
int main (int argc, char const *argv[]) {
	printf("%s", "/existing.git\0");
	return 0;
}
- 3.84s

$ cat filter.rb
#!/usr/bin/ruby
print "/existing.git\0"
- 4.76s

So, obviously highly dependent on how long it takes the script to
launch and how much work it does. And yes, neither of the above really
does anything :) nor takes any increased cpu load into account

Another approach is to keep the external script running and feed it on
stdin, but that would involve a bit more micro-management of the
external process. I will revisit that idea if I find out that's
needed.

> (by the way, "filter path" sounds as if it checks and conditionally
> denies access to, or something like that, which is not what you are using
> it for.  It is more about rewriting paths, a la mod_rewrite, and I think
> the option is misnamed)

Maybe --rewrite-script or --rewrite-command  instead?

Cheers,
JS

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Introduce a filter-path argument to git-daemon, for doing  custom path transformations
  2009-03-14 14:39           ` Johan Sørensen
@ 2009-03-14 18:23             ` Junio C Hamano
  2009-03-19  0:15               ` Johannes Schindelin
  0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2009-03-14 18:23 UTC (permalink / raw)
  To: Johan Sørensen; +Cc: Johannes Schindelin, git

Johan Sørensen <johan@johansorensen.com> writes:

>> Do you run git-daemon from inetd, or standalone, by the way?
>
> Standalone.
>
>> I am wondering how well it would scale if you spawn an external "filter path"
>> script every time you get a request.
>
> A quick test of 250 consecutive requests with ls-remote to localhost
> (all without the --verbose flag), slowest run:
> - Baseline (no --filter-path agument): 3.39s
>
> $ cat filter.c
> #import "stdio.h"
> int main (int argc, char const *argv[]) {
> 	printf("%s", "/existing.git\0");
> 	return 0;
> }
> - 3.84s
>
> $ cat filter.rb
> #!/usr/bin/ruby
> print "/existing.git\0"
> - 4.76s
>
> So, obviously highly dependent on how long it takes the script to
> launch and how much work it does. And yes, neither of the above really
> does anything :) nor takes any increased cpu load into account
>
> Another approach is to keep the external script running and feed it on
> stdin, but that would involve a bit more micro-management of the
> external process. I will revisit that idea if I find out that's
> needed.

I actually was hoping (especially we have Dscho on Cc: list) that somebody
like you would start suggesting a "plug in" approach to load .so files,
which would lead to a easy-to-port dso support with the help from msysgit
folks we can use later in other parts of the system (e.g. customizable
filters used for diff textconv, clean/smudge, etc.)

>> (by the way, "filter path" sounds as if it checks and conditionally
>> denies access to, or something like that, which is not what you are using
>> it for.  It is more about rewriting paths, a la mod_rewrite, and I think
>> the option is misnamed)
>
> Maybe --rewrite-script or --rewrite-command  instead?

Perhaps.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Introduce a filter-path argument to git-daemon, for doing  custom path transformations
  2009-03-14 18:23             ` Junio C Hamano
@ 2009-03-19  0:15               ` Johannes Schindelin
  2009-03-19 13:02                 ` Johan Sørensen
  0 siblings, 1 reply; 14+ messages in thread
From: Johannes Schindelin @ 2009-03-19  0:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Sørensen, git

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2072 bytes --]

Hi,

On Sat, 14 Mar 2009, Junio C Hamano wrote:

> Johan Sørensen <johan@johansorensen.com> writes:
> 
> >> Do you run git-daemon from inetd, or standalone, by the way?
> >
> > Standalone.
> >
> >> I am wondering how well it would scale if you spawn an external 
> >>"filter path" script every time you get a request.
> >
> > A quick test of 250 consecutive requests with ls-remote to localhost
> > (all without the --verbose flag), slowest run:
> > - Baseline (no --filter-path agument): 3.39s
> >
> > $ cat filter.c
> > #import "stdio.h"
> > int main (int argc, char const *argv[]) {
> > 	printf("%s", "/existing.git\0");
> > 	return 0;
> > }
> > - 3.84s
> >
> > $ cat filter.rb
> > #!/usr/bin/ruby
> > print "/existing.git\0"
> > - 4.76s
> >
> > So, obviously highly dependent on how long it takes the script to 
> > launch and how much work it does. And yes, neither of the above really 
> > does anything :) nor takes any increased cpu load into account
> >
> > Another approach is to keep the external script running and feed it on 
> > stdin, but that would involve a bit more micro-management of the 
> > external process. I will revisit that idea if I find out that's 
> > needed.
> 
> I actually was hoping (especially we have Dscho on Cc: list) that somebody
> like you would start suggesting a "plug in" approach to load .so files,
> which would lead to a easy-to-port dso support with the help from msysgit
> folks we can use later in other parts of the system (e.g. customizable
> filters used for diff textconv, clean/smudge, etc.)

I do not like that at all.  Dynamic libraries -- especially on Windows -- 
are a major hassle.

However, I cannot think of anything Johan might want to do that would not 
be possible using a bunch of regular expressions together with 
substitions.

FWIW I have experimental code in my personal tree that sports 
strbuf_regsub(), a function to replace matches of a regular expression 
(possibly with groups) by a given string (which may contain \0 .. \9, 
being replaced with the respective group's contents).

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Introduce a filter-path argument to git-daemon, for doing  custom path transformations
  2009-03-19  0:15               ` Johannes Schindelin
@ 2009-03-19 13:02                 ` Johan Sørensen
  2009-03-20 22:27                   ` Johannes Schindelin
  0 siblings, 1 reply; 14+ messages in thread
From: Johan Sørensen @ 2009-03-19 13:02 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git

2009/3/19 Johannes Schindelin <Johannes.Schindelin@gmx.de>:
>> I actually was hoping (especially we have Dscho on Cc: list) that somebody
>> like you would start suggesting a "plug in" approach to load .so files,
>> which would lead to a easy-to-port dso support with the help from msysgit
>> folks we can use later in other parts of the system (e.g. customizable
>> filters used for diff textconv, clean/smudge, etc.)
>
> I do not like that at all.  Dynamic libraries -- especially on Windows --
> are a major hassle.
>
> However, I cannot think of anything Johan might want to do that would not
> be possible using a bunch of regular expressions together with
> substitions.

Let me reiterate my use-case then: I need to dynamically substitute
one path with another. Perhaps "map" paints a better picture than
"substitute" here. Please refer to my second mail in this thread for
more details.

The only way I can see regexps work, is that if they're read, on a
per-request basis (reloading git-daemon every time they change is just
silly), from somewhere outside the git-daemon. Then, you might as well
take the full-on approach this patch provides.

Cheers,
JS


>
> FWIW I have experimental code in my personal tree that sports
> strbuf_regsub(), a function to replace matches of a regular expression
> (possibly with groups) by a given string (which may contain \0 .. \9,
> being replaced with the respective group's contents).
>
> Ciao,
> Dscho
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Introduce a filter-path argument to git-daemon, for doing  custom path transformations
  2009-03-19 13:02                 ` Johan Sørensen
@ 2009-03-20 22:27                   ` Johannes Schindelin
  0 siblings, 0 replies; 14+ messages in thread
From: Johannes Schindelin @ 2009-03-20 22:27 UTC (permalink / raw)
  To: Johan Sørensen; +Cc: Junio C Hamano, git

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2242 bytes --]

Hi,

On Thu, 19 Mar 2009, Johan Sørensen wrote:

> 2009/3/19 Johannes Schindelin <Johannes.Schindelin@gmx.de>:
> >> I actually was hoping (especially we have Dscho on Cc: list) that 
> >> somebody like you would start suggesting a "plug in" approach to load 
> >> .so files, which would lead to a easy-to-port dso support with the 
> >> help from msysgit folks we can use later in other parts of the system 
> >> (e.g. customizable filters used for diff textconv, clean/smudge, 
> >> etc.)
> >
> > I do not like that at all.  Dynamic libraries -- especially on Windows 
> > -- are a major hassle.
> >
> > However, I cannot think of anything Johan might want to do that would 
> > not be possible using a bunch of regular expressions together with 
> > substitions.
> 
> Let me reiterate my use-case then: I need to dynamically substitute one 
> path with another. Perhaps "map" paints a better picture than 
> "substitute" here. Please refer to my second mail in this thread for 
> more details.
> 
> The only way I can see regexps work, is that if they're read, on a 
> per-request basis (reloading git-daemon every time they change is just 
> silly), from somewhere outside the git-daemon. Then, you might as well 
> take the full-on approach this patch provides.
>
> > FWIW I have experimental code in my personal tree that sports 
> > strbuf_regsub(), a function to replace matches of a regular expression 
> > (possibly with groups) by a given string (which may contain \0 .. \9, 
> > being replaced with the respective group's contents).

Do not get me wrong, I can see your use case.

But I have been cautioning against other possibly regrettably things, and 
it gave me _no_ pleasure at all to be proven correct in hindsight.

I'd rather be called grumpy old Git, be ridiculed and insulted, but at the 
same time have precautions in git.git that prevent having to admit 
mournfully that some change was not so brilliant after all.

So if some rules consisting of regular expressions with appropriate 
substitutions, even if they will have to be updated from time to time, 
solve your case, I'd rather have that than allow a server to run external 
programs that are not exactly well audited against all kinds of attacks.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2009-03-20 22:29 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-11 15:17 [PATCH] Introduce a filter-path argument to git-daemon, for doing custom path transformations Johan Sørensen
2009-03-11 15:58 ` Johannes Sixt
2009-03-12 10:13   ` Johan Sørensen
2009-03-12 11:29     ` Johannes Schindelin
2009-03-12 15:48       ` Johan Sørensen
2009-03-12 16:50         ` Johannes Schindelin
2009-03-12 19:06       ` Johan Sørensen
2009-03-14  6:58         ` Junio C Hamano
2009-03-14 14:39           ` Johan Sørensen
2009-03-14 18:23             ` Junio C Hamano
2009-03-19  0:15               ` Johannes Schindelin
2009-03-19 13:02                 ` Johan Sørensen
2009-03-20 22:27                   ` Johannes Schindelin
2009-03-12 10:26   ` Johan Sørensen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.