All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] a tale of Git 2.5, ssh transport and GIT_* environment variables
@ 2015-09-04 10:52 Giuseppe Bilotta
  2015-09-04 12:54 ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Giuseppe Bilotta @ 2015-09-04 10:52 UTC (permalink / raw)
  To: Git List

Hello all,


I'm going to tell a tale of the oddest bug I've ever come across in
Git and aks for suggestions on how we can improve the user experience
in a sensible way.

I've been stymied for days trying to solve the following issue, which
I came across every time I tried to push to a specific machine when
using git 2.5 (I'm honestly not sure about which version introduced
this, and I haven't runthe necessary bisection due to lack of time,
sorry).

Trying to push any changes with 2.5 resulted in this kind of failure:

user@clientmachine:~/some/git/workdir $ git push
Counting objects: 6, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (5/5), done.
Writing objects: 100% (6/6), 841 bytes | 0 bytes/s, done.
Total 6 (delta 2), reused 0 (delta 0)
fatal: Could not switch to '/home/user/some/git': No such file or directory
error: unpack failed: unpack-objects abnormal exit
fatal: Could not switch to '/home/user/some/git': No such file or directory
To git@remote.machine:remote-repo
! [remote rejected] master -> master (n/a (unpacker error))
error: failed to push some refs to 'git@remote.machine:remote-repo'

Notice two things: the messages refer to the worktree updir of the
CLIENT machine, and even though it's _completely not obvious_ due to
the missing 'remote:' lines, the messages actually come from the
SERVER. The lack of indicator lines _alone_ took me hours of debugging
before I finally understood that they were coming from the other side
(yes, I woud say that it's a bug), but after this I was even more
perplexed: why the heck would the SERVER try to switch to the updir of
the CLIENT worktree?

I still couldn't do much on the SERVER to debug due to a variety of
reasons, but I finally had a suspicion: it was almost as if the SERVER
was getting the GIT_DIR information from the CLIENT. And why the heck
would _that_ be the case?

I then remembered that the server was actually configured to AcceptEnv
GIT_* in sshd_config, for reasons related to git identity preservation
despite single login account (please don't ask). Turning the AcceptEnv
to a stricter GIT_AUTHOR* and GIT_COMMITTER* solved the issue.

Now, a couple of questions:

1. since when have git internals started exporting GIT_* variables
related to the git dir and worktree location?
2. is it worth making sure that these don't get propagated via ssh?

Regarding 2., there are actually webpages around suggesting the
AcceptEnv GIT_* trick, and these setups are _all_ going to break with
the latest git, so this is something that might be worth looking into.


-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] a tale of Git 2.5, ssh transport and GIT_* environment variables
  2015-09-04 10:52 [RFC] a tale of Git 2.5, ssh transport and GIT_* environment variables Giuseppe Bilotta
@ 2015-09-04 12:54 ` Jeff King
  2015-09-04 14:02   ` Giuseppe Bilotta
  2015-09-04 18:18   ` Junio C Hamano
  0 siblings, 2 replies; 12+ messages in thread
From: Jeff King @ 2015-09-04 12:54 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: Git List

On Fri, Sep 04, 2015 at 12:52:45PM +0200, Giuseppe Bilotta wrote:

> Trying to push any changes with 2.5 resulted in this kind of failure:
> 
> user@clientmachine:~/some/git/workdir $ git push
> Counting objects: 6, done.
> Delta compression using up to 8 threads.
> Compressing objects: 100% (5/5), done.
> Writing objects: 100% (6/6), 841 bytes | 0 bytes/s, done.
> Total 6 (delta 2), reused 0 (delta 0)
> fatal: Could not switch to '/home/user/some/git': No such file or directory
> error: unpack failed: unpack-objects abnormal exit
> fatal: Could not switch to '/home/user/some/git': No such file or directory
> To git@remote.machine:remote-repo
> ! [remote rejected] master -> master (n/a (unpacker error))
> error: failed to push some refs to 'git@remote.machine:remote-repo'
>
> Notice two things: the messages refer to the worktree updir of the
> CLIENT machine, and even though it's _completely not obvious_ due to
> the missing 'remote:' lines, the messages actually come from the
> SERVER. The lack of indicator lines _alone_ took me hours of debugging
> before I finally understood that they were coming from the other side

Older versions of receive-pack would let unpack-objects output go
straight to stderr, but that changed in a22e6f8 (receive-pack: send
pack-processing stderr over sideband, 2012-09-21), which is in git
v1.7.12.3. What version of git is running on the remote server?

E.g., even without going over ssh, if I do:

  git init
  echo content >file && git add file && git commit -m foo
  git init --bare dst.git
  # force unpacker to fail
  chmod -w dst.git/objects
  git push dst.git

I get:

  Counting objects: 3, done.
  Writing objects: 100% (3/3), 205 bytes | 0 bytes/s, done.
  Total 3 (delta 0), reused 0 (delta 0)
  remote: error: insufficient permission for adding an object to repository database ./objects
  remote: fatal: failed to write object
  error: unpack failed: unpack-objects abnormal exit
  To dst.git
   ! [remote rejected] master -> master (unpacker error)
  error: failed to push some refs to 'dst.git'

The "unpack failed" line _does_ come from the remote, but comes straight
from receive-pack, not the child unpack-objects. Receive-pack
distinguishes between errors which should go to the client and which
just go to stderr for debugging (remember that not all transports
actually propagate stderr to the client; ssh is special here). It's
possible we could switch the "unpack failed" to go to the client, but it
is redundant with the "unpacker error" which _does_ go to the client.

> I still couldn't do much on the SERVER to debug due to a variety of
> reasons, but I finally had a suspicion: it was almost as if the SERVER
> was getting the GIT_DIR information from the CLIENT. And why the heck
> would _that_ be the case?
> 
> I then remembered that the server was actually configured to AcceptEnv
> GIT_* in sshd_config, for reasons related to git identity preservation
> despite single login account (please don't ask). Turning the AcceptEnv
> to a stricter GIT_AUTHOR* and GIT_COMMITTER* solved the issue.

I couldn't reproduce this problem, either during a local push, or across
an ssh session (where the client has "SendEnv GIT_*" and the server has
"AcceptEnv GIT_*").

In the local case, we explicitly unset GIT_DIR and related variables in
connect.c:git_connect. We don't seem to do so for the ssh case, though.
I can confirm that the variable makes it across to the remote:

  GIT_DIR=$PWD git push \
    --receive-pack='echo >&2 GIT_DIR=$GIT_DIR; git-receive-pack' \
    remote-host:remote-repo

shows our local $PWD on the remote side (though note that you have to
explicitly set $GIT_DIR; git-push does not do so normally).

On the receiving side, git-receive-pack takes an argument for the repo
path, and calls enter_repo. That should result in calling set_git_dir(),
which overwrites $GIT_DIR in the environment. AFAICT, it has always done
so. So I'm not sure how GIT_DIR would leak through, even on an older
version of git.

> 1. since when have git internals started exporting GIT_* variables
> related to the git dir and worktree location?

It has done so for a long time, though the exact rules for doing so
changes from time to time. Browsing "git log", I couldn't find any
recent changes in this area. It would help if you could bisect the
problem, as I can't manage to replicate it.

> 2. is it worth making sure that these don't get propagated via ssh?

It seems like a reasonable idea for git_connect() to do so in the ssh
case, as well as the local case. That _could_ be a regression for
somebody who uses an ssh-wrapper whose behavior changes based on some
$GIT_* variable, but that seems a bit far-fetched.

It shouldn't be necessary for $GIT_DIR, but it makes sense for other git
variables. E.g., with "AcceptEnv GIT_*", "git -c" config is propagated.
E.g.:

  # make a syntactically bogus commit
  commit=$(git cat-file commit HEAD |
           sed 's/>/>>/' |
	   git hash-object -t commit -w --stdin)
  git update-ref HEAD $commit
  # confirm that git complains
  git fsck

  # confirm that it gets blocked by receive.fsckObjects; the "-c" is
  # applied to the receive-pack directly, so we are already in the
  # "remote" repo.
  git push --receive-pack='git -c receive.fsckobjects=true receive-pack' dst.git

  # now let's try it with a local "-c". This doesn't get propagated, as
  # we clean out any environment-level config before running
  # receive-pack in the "remote". As a result, the push succeeds.
  git -c receive.fsckobjects=true push dst.git

  # and now the same thing over ssh; this _will_ complain, as we
  # propagated the config.
  git -c receive.fsckObjects=true push remote:dst.git

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] a tale of Git 2.5, ssh transport and GIT_* environment variables
  2015-09-04 12:54 ` Jeff King
@ 2015-09-04 14:02   ` Giuseppe Bilotta
  2015-09-04 14:26     ` Jeff King
  2015-09-04 18:18   ` Junio C Hamano
  1 sibling, 1 reply; 12+ messages in thread
From: Giuseppe Bilotta @ 2015-09-04 14:02 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

Hello,

On Fri, Sep 4, 2015 at 2:54 PM, Jeff King <peff@peff.net> wrote:
> On Fri, Sep 04, 2015 at 12:52:45PM +0200, Giuseppe Bilotta wrote:
>
>> Notice two things: the messages refer to the worktree updir of the
>> CLIENT machine, and even though it's _completely not obvious_ due to
>> the missing 'remote:' lines, the messages actually come from the
>> SERVER. The lack of indicator lines _alone_ took me hours of debugging
>> before I finally understood that they were coming from the other side
>
> Older versions of receive-pack would let unpack-objects output go
> straight to stderr, but that changed in a22e6f8 (receive-pack: send
> pack-processing stderr over sideband, 2012-09-21), which is in git
> v1.7.12.3. What version of git is running on the remote server?

Sorry, I forgot to mention this in my origina email, I should have.

The remote is quite ancient: git version 1.7.0.4

>> I still couldn't do much on the SERVER to debug due to a variety of
>> reasons, but I finally had a suspicion: it was almost as if the SERVER
>> was getting the GIT_DIR information from the CLIENT. And why the heck
>> would _that_ be the case?
>>
>> I then remembered that the server was actually configured to AcceptEnv
>> GIT_* in sshd_config, for reasons related to git identity preservation
>> despite single login account (please don't ask). Turning the AcceptEnv
>> to a stricter GIT_AUTHOR* and GIT_COMMITTER* solved the issue.
>
> I couldn't reproduce this problem, either during a local push, or across
> an ssh session (where the client has "SendEnv GIT_*" and the server has
> "AcceptEnv GIT_*").

[snip]

> On the receiving side, git-receive-pack takes an argument for the repo
> path, and calls enter_repo. That should result in calling set_git_dir(),
> which overwrites $GIT_DIR in the environment. AFAICT, it has always done
> so. So I'm not sure how GIT_DIR would leak through, even on an older
> version of git.

Hm, aside from the ancient version of git, the other side also has
gitolite2  running (can you tell it's an ancient installation that
needs upgrading?).  However,I've done a quick test by creating an
empty bare git repository outside of gitolite2, and I'm still getting
the same error, so I would exclude gitolite2 interference. In fact,
your testing approach gave me the idea to test by prepending env 1>&2
to the receive pack and this way I noticed that the variable being set
was GIT_WORK_TREE. In fact, using --receive-pack 'export -n
GIT_WORK_TREE ; git-receive-pack' fixes the problem for me.

So, I've at least been able to circumscribe the problem to: server git
1.7.0.4, client git 2.5, GIT_WORK_TREE being sent over the ssh
connection. I'll see if I can find some time to do some proper
bisecting next week.

-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] a tale of Git 2.5, ssh transport and GIT_* environment variables
  2015-09-04 14:02   ` Giuseppe Bilotta
@ 2015-09-04 14:26     ` Jeff King
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff King @ 2015-09-04 14:26 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: Git List

On Fri, Sep 04, 2015 at 04:02:09PM +0200, Giuseppe Bilotta wrote:

> So, I've at least been able to circumscribe the problem to: server git
> 1.7.0.4, client git 2.5, GIT_WORK_TREE being sent over the ssh
> connection. I'll see if I can find some time to do some proper
> bisecting next week.

That sounds like progress. I was hoping I could replicate it with that
information, but I still can't:

  LOCAL_GIT=git.v2.5.0
  REMOTE_GIT=/home/peff/compile/git/bin-wrappers/git

  $ $LOCAL_GIT version
  git version 2.5.0

  $ ssh git.peff.net $REMOTE_GIT version
  git version 1.7.0.4

  $ ssh git.peff.net "rm -rf /tmp/foo.git; $REMOTE_GIT init --bare /tmp/foo.git"
  $ GIT_DIR=$PWD/.git GIT_WORK_TREE=$PWD $LOCAL_GIT push \
      --receive-pack="env | grep GIT >&2; $REMOTE_GIT receive-pack" \
      git.peff.net:/tmp/foo.git HEAD:branch

  GIT_DIR=/home/peff/tmp/.git
  GIT_WORK_TREE=/home/peff/tmp
  GIT_PREFIX=
  Counting objects: 3, done.
  Writing objects: 100% (3/3), 208 bytes | 0 bytes/s, done.
  Total 3 (delta 0), reused 0 (delta 0)
  To git.peff.net:/tmp/foo.git
   * [new branch]      HEAD -> branch

Interestingly, if I try pushing straight to master, the remote side
complains about trying to push the checked-out branch, even though it's
a bare repo (presumably it's getting confused by the GIT_WORK_TREE
setting; so that's _a_ bug, but not the one you are seeing).

So I'm not sure what is different about my setup and yours.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] a tale of Git 2.5, ssh transport and GIT_* environment variables
  2015-09-04 12:54 ` Jeff King
  2015-09-04 14:02   ` Giuseppe Bilotta
@ 2015-09-04 18:18   ` Junio C Hamano
  2015-09-04 21:10     ` Giuseppe Bilotta
  2015-09-04 21:44     ` Jeff King
  1 sibling, 2 replies; 12+ messages in thread
From: Junio C Hamano @ 2015-09-04 18:18 UTC (permalink / raw)
  To: Jeff King; +Cc: Giuseppe Bilotta, Git List

Jeff King <peff@peff.net> writes:

> It shouldn't be necessary for $GIT_DIR, but it makes sense for other git
> variables. E.g., with "AcceptEnv GIT_*", "git -c" config is propagated.
> E.g.:
> ...

Just to make sure I got you correctly, you are saying that "we
propagate, but that is not correct. We should stop doing so", right?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] a tale of Git 2.5, ssh transport and GIT_* environment variables
  2015-09-04 18:18   ` Junio C Hamano
@ 2015-09-04 21:10     ` Giuseppe Bilotta
  2015-09-04 21:44     ` Jeff King
  1 sibling, 0 replies; 12+ messages in thread
From: Giuseppe Bilotta @ 2015-09-04 21:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Git List

On Fri, Sep 4, 2015 at 8:18 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jeff King <peff@peff.net> writes:
>
>> It shouldn't be necessary for $GIT_DIR, but it makes sense for other git
>> variables. E.g., with "AcceptEnv GIT_*", "git -c" config is propagated.
>> E.g.:
>> ...
>
> Just to make sure I got you correctly, you are saying that "we
> propagate, but that is not correct. We should stop doing so", right?

Indeed, we should not propagate them. For what we know so far, at
least GIT_DIR and GIT_WORK_TREE should be removed from the ssh
environment, in case the remote server has AcceptEnv GIT_* and an
older git version.

-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] a tale of Git 2.5, ssh transport and GIT_* environment variables
  2015-09-04 18:18   ` Junio C Hamano
  2015-09-04 21:10     ` Giuseppe Bilotta
@ 2015-09-04 21:44     ` Jeff King
  2015-09-04 22:40       ` [PATCH] git_connect: clear GIT_* environment for ssh Jeff King
  1 sibling, 1 reply; 12+ messages in thread
From: Jeff King @ 2015-09-04 21:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Giuseppe Bilotta, Git List

On Fri, Sep 04, 2015 at 11:18:01AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > It shouldn't be necessary for $GIT_DIR, but it makes sense for other git
> > variables. E.g., with "AcceptEnv GIT_*", "git -c" config is propagated.
> > E.g.:
> > ...
> 
> Just to make sure I got you correctly, you are saying that "we
> propagate, but that is not correct. We should stop doing so", right?

Exactly. We do not propagate config over git:// or http:// (because we
do not share our environment). Nor do we do so over same-machine
connections (because we explicitly clean the environment). So ssh:// is
the odd duck.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] git_connect: clear GIT_* environment for ssh
  2015-09-04 21:44     ` Jeff King
@ 2015-09-04 22:40       ` Jeff King
  2015-09-05 13:50         ` Giuseppe Bilotta
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2015-09-04 22:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Giuseppe Bilotta, Git List

On Fri, Sep 04, 2015 at 05:44:54PM -0400, Jeff King wrote:

> > Just to make sure I got you correctly, you are saying that "we
> > propagate, but that is not correct. We should stop doing so", right?
> 
> Exactly. We do not propagate config over git:// or http:// (because we
> do not share our environment). Nor do we do so over same-machine
> connections (because we explicitly clean the environment). So ssh:// is
> the odd duck.

So here is the patch I would propose. I'm fairly certain it will solve
Giuseppe's problem, though I think there is something else odd going on
here that I don't understand. I'm worried that we're papering over
another regression. Giuseppe, if you can still find time to do that
bisect, it would be helpful.

-- >8 --
Subject: git_connect: clear GIT_* environment for ssh

When we "switch" to another local repository to run the server
side of a fetch or push, we must clear the variables in
local_repo_env so that our local $GIT_DIR, etc, do not
pollute the upload-pack or receive-pack that is executing in
the "remote" repository.

We have never done so for ssh connections. For the most
part, nobody has noticed because ssh will not pass unknown
environment variables by default. However, it is not out of
the question for a user to configure ssh to pass along GIT_*
variables using SendEnv/AcceptEnv.

We can demonstrate the problem by using "git -c" on a local
command and seeing its impact on a remote repository.  This
config ends up in $GIT_CONFIG_PARAMETERS. In the local case,
the config has no impact, but in the ssh transport, it does
(our test script has a fake ssh that passes through all
environment variables; this isn't normal, but does simulate
one possible setup).

Signed-off-by: Jeff King <peff@peff.net>
---
 connect.c                     |  4 ++--
 t/t5507-remote-environment.sh | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 2 deletions(-)
 create mode 100755 t/t5507-remote-environment.sh

diff --git a/connect.c b/connect.c
index c0144d8..962f990 100644
--- a/connect.c
+++ b/connect.c
@@ -721,6 +721,8 @@ struct child_process *git_connect(int fd[2], const char *url,
 		strbuf_addch(&cmd, ' ');
 		sq_quote_buf(&cmd, path);
 
+		/* remove repo-local variables from the environment */
+		conn->env = local_repo_env;
 		conn->in = conn->out = -1;
 		if (protocol == PROTO_SSH) {
 			const char *ssh;
@@ -778,8 +780,6 @@ struct child_process *git_connect(int fd[2], const char *url,
 			}
 			argv_array_push(&conn->args, ssh_host);
 		} else {
-			/* remove repo-local variables from the environment */
-			conn->env = local_repo_env;
 			conn->use_shell = 1;
 		}
 		argv_array_push(&conn->args, cmd.buf);
diff --git a/t/t5507-remote-environment.sh b/t/t5507-remote-environment.sh
new file mode 100755
index 0000000..e614929
--- /dev/null
+++ b/t/t5507-remote-environment.sh
@@ -0,0 +1,34 @@
+#!/bin/sh
+
+test_description='check environment showed to remote side of transports'
+. ./test-lib.sh
+
+test_expect_success 'set up "remote" push situation' '
+	test_commit one &&
+	git config push.default current &&
+	git init remote
+'
+
+test_expect_success 'set up fake ssh' '
+	GIT_SSH_COMMAND="f() {
+		cd \"\$TRASH_DIRECTORY\" &&
+		eval \"\$2\"
+	}; f" &&
+	export GIT_SSH_COMMAND &&
+	export TRASH_DIRECTORY
+'
+
+# due to receive.denyCurrentBranch=true
+test_expect_success 'confirm default push fails' '
+	test_must_fail git push remote
+'
+
+test_expect_success 'config does not travel over same-machine push' '
+	test_must_fail git -c receive.denyCurrentBranch=false push remote
+'
+
+test_expect_success 'config does not travel over ssh push' '
+	test_must_fail git -c receive.denyCurrentBranch=false push host:remote
+'
+
+test_done
-- 
2.5.1.812.ge796bff

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] git_connect: clear GIT_* environment for ssh
  2015-09-04 22:40       ` [PATCH] git_connect: clear GIT_* environment for ssh Jeff King
@ 2015-09-05 13:50         ` Giuseppe Bilotta
  2015-09-08  8:33           ` [PATCH] git_connect: clarify conn->use_shell flag Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Giuseppe Bilotta @ 2015-09-05 13:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Git List

On Sat, Sep 5, 2015 at 12:40 AM, Jeff King <peff@peff.net> wrote:
> So here is the patch I would propose. I'm fairly certain it will solve
> Giuseppe's problem, though I think there is something else odd going on
> here that I don't understand. I'm worried that we're papering over
> another regression. Giuseppe, if you can still find time to do that
> bisect, it would be helpful.

So, as it happens I was able to script the check in a
not-too-convoluted way using the same pair of machines where I came
across the problem,, and it turns out that the culprit is as obvious
as one would expect:

d95138e695d99d32dcad528a2a7974f434c51e79 is the first bad commit
commit d95138e695d99d32dcad528a2a7974f434c51e79
Author: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Date:   Fri Jun 26 17:37:35 2015 +0700

    setup: set env $GIT_WORK_TREE when work tree is set, like $GIT_DIR

which matches my previous findings in that the problem was
GIT_WORK_TREE being leaked. Since setting GIT_WORK_TREE when setting
GIT_DIR is correct, the solution is indeed to clear GIT_* environment
variables for ssh, as you propose. I tested your patch and indeed it
fixes my problem.

I still don't understand why you cannot replicate the issue on your
side. One thing that is _extremely_ important in reproducing the
problem is that the leaked GIT_WORK_TREE point to a non-existent (or
otherwise inaccessible) directory in the server machine. For example,
on my first attempt to reproduce I was trying to use my clone of the
git repo, and it wouldn't work because I _did_ have a ~/src/git on
both machines, even though I was pushing to a
remote.machine:/tmp/test.git

Would it be worth looking at the issue server-side too, as this looks
like something that might have exploit potential? (At the very least,
it could be used to test if/which directories the remote user has
access to.)

> -- >8 --
> Subject: git_connect: clear GIT_* environment for ssh
>
> When we "switch" to another local repository to run the server
> side of a fetch or push, we must clear the variables in
> local_repo_env so that our local $GIT_DIR, etc, do not
> pollute the upload-pack or receive-pack that is executing in
> the "remote" repository.

I think we might want to mention GIT_WORK_TREE explicitly here, since
this seems to be the one causing issues.

> We have never done so for ssh connections. For the most
> part, nobody has noticed because ssh will not pass unknown
> environment variables by default. However, it is not out of
> the question for a user to configure ssh to pass along GIT_*
> variables using SendEnv/AcceptEnv.
>
> We can demonstrate the problem by using "git -c" on a local
> command and seeing its impact on a remote repository.  This
> config ends up in $GIT_CONFIG_PARAMETERS. In the local case,
> the config has no impact, but in the ssh transport, it does
> (our test script has a fake ssh that passes through all
> environment variables; this isn't normal, but does simulate
> one possible setup).
>
> Signed-off-by: Jeff King <peff@peff.net>

The patch works for me, so aside from the suggested commit message
change, I'm all for it.

Reviewed-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>

> ---
>  connect.c                     |  4 ++--
>  t/t5507-remote-environment.sh | 34 ++++++++++++++++++++++++++++++++++
>  2 files changed, 36 insertions(+), 2 deletions(-)
>  create mode 100755 t/t5507-remote-environment.sh
>
> diff --git a/connect.c b/connect.c
> index c0144d8..962f990 100644
> --- a/connect.c
> +++ b/connect.c
> @@ -721,6 +721,8 @@ struct child_process *git_connect(int fd[2], const char *url,
>                 strbuf_addch(&cmd, ' ');
>                 sq_quote_buf(&cmd, path);
>
> +               /* remove repo-local variables from the environment */
> +               conn->env = local_repo_env;

A posteriori, this makes a lot of sense too: why single out a single
protocol in the environment cleanup?

>                 conn->in = conn->out = -1;
>                 if (protocol == PROTO_SSH) {
>                         const char *ssh;
> @@ -778,8 +780,6 @@ struct child_process *git_connect(int fd[2], const char *url,
>                         }
>                         argv_array_push(&conn->args, ssh_host);
>                 } else {
> -                       /* remove repo-local variables from the environment */
> -                       conn->env = local_repo_env;
>                         conn->use_shell = 1;

Of course we're now left with just conn->use_shell = 1 in the non-ssh
case. This could be moved in front of the if, and replaced with
something like conn>use_shell = !(procol == PROTO_SSH), but maybe this
would kill legibility. It's just  that a single line i the else clause
of a large if looks odd.


-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] git_connect: clarify conn->use_shell flag
  2015-09-05 13:50         ` Giuseppe Bilotta
@ 2015-09-08  8:33           ` Jeff King
  2015-09-08 17:18             ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2015-09-08  8:33 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: Junio C Hamano, Git List

On Sat, Sep 05, 2015 at 03:50:08PM +0200, Giuseppe Bilotta wrote:

> which matches my previous findings in that the problem was
> GIT_WORK_TREE being leaked. Since setting GIT_WORK_TREE when setting
> GIT_DIR is correct, the solution is indeed to clear GIT_* environment
> variables for ssh, as you propose. I tested your patch and indeed it
> fixes my problem.

Yeah, that makes sense.

> I still don't understand why you cannot replicate the issue on your
> side. One thing that is _extremely_ important in reproducing the
> problem is that the leaked GIT_WORK_TREE point to a non-existent (or
> otherwise inaccessible) directory in the server machine. For example,
> on my first attempt to reproduce I was trying to use my clone of the
> git repo, and it wouldn't work because I _did_ have a ~/src/git on
> both machines, even though I was pushing to a
> remote.machine:/tmp/test.git

I checked that the path in question is not accessible in my test setup.
I'm also confused why it doesn't replicate for me.

I noticed that your original report shows receive-pack behaving well,
but the child unpack-objects complaining. That process should not care
about GIT_WORK_TREE either, though.

> Would it be worth looking at the issue server-side too, as this looks
> like something that might have exploit potential? (At the very least,
> it could be used to test if/which directories the remote user has
> access to.)

I think if anything, the server side should be clearing GIT_DIR,
GIT_WORK_TREE, etc, when it does enter_repo(). The point of that
function is to enter a repository which is given externally (generally
on the command-line, though in the case of git-daemon, across the
socket). Clearing any existing local-repo environment variables seems
like it should be part of that.

That's as easy as:

diff --git a/path.c b/path.c
index 95acbaf..395a75d 100644
--- a/path.c
+++ b/path.c
@@ -383,6 +383,10 @@ const char *enter_repo(const char *path, int strict)
 {
 	static char used_path[PATH_MAX];
 	static char validated_path[PATH_MAX];
+	const char * const *env;
+
+	for (env = local_repo_env; *env; env++)
+		unsetenv(*env);
 
 	if (!path)
 		return NULL;

but I suspect it would break some esoteric cases. E.g.:

  git clone -u 'git -c some.option=true upload-pack' foo.git

does what you'd expect now. With that patch, upload-pack would clear its
env before entering "foo.git", and would ignore it.

It would probably be OK to clean GIT_DIR (since we overwrite it in
enter_repo anyway) and GIT_WORK_TREE (since upload-pack, etc, should not
care about it). But the others are much more confusing (should we clear
GIT_ALTERNATE_OBJECT_DIRECTORIES? You would not want that to bleed over
between repositories, but setting it manually seems like a legitimate,
if uncommon, thing to do).

So I'm hesitant to do a patch like that because it seems like the wrong
layer. It's at the interface between the old and new repos that we need
clean these up. And we are (with my other patch) doing that on the
client side. If the server side wants to enforce some protection, that's
a good idea, too. But it should be happening at the ssh layer; i.e.,
disabling "AcceptEnv GIT_*".

> > -- >8 --
> > Subject: git_connect: clear GIT_* environment for ssh
> >
> > When we "switch" to another local repository to run the server
> > side of a fetch or push, we must clear the variables in
> > local_repo_env so that our local $GIT_DIR, etc, do not
> > pollute the upload-pack or receive-pack that is executing in
> > the "remote" repository.
> 
> I think we might want to mention GIT_WORK_TREE explicitly here, since
> this seems to be the one causing issues.

Heh. I avoided doing so because I could not convince it to cause an
issue (and even with that aside, the config thing is wrong by itself,
and I _could_ test that).

> The patch works for me, so aside from the suggested commit message
> change, I'm all for it.

Thanks for testing/reviewing.

> > @@ -778,8 +780,6 @@ struct child_process *git_connect(int fd[2], const char *url,
> >                         }
> >                         argv_array_push(&conn->args, ssh_host);
> >                 } else {
> > -                       /* remove repo-local variables from the environment */
> > -                       conn->env = local_repo_env;
> >                         conn->use_shell = 1;
> 
> Of course we're now left with just conn->use_shell = 1 in the non-ssh
> case. This could be moved in front of the if, and replaced with
> something like conn>use_shell = !(procol == PROTO_SSH), but maybe this
> would kill legibility. It's just  that a single line i the else clause
> of a large if looks odd.

Yeah, I noticed that, too, but I wanted to keep the diff minimal. Maybe
it is worth doing this cleanup on top.

-- >8 --
Subject: [PATCH] git_connect: clarify conn->use_shell flag

When executing user-specified programs, we generally always
want to use a shell, for flexibility and consistency. One
big exception is executing $GIT_SSH, which for historical
reasons must not use a shell.

Once upon a time the logic in git_connect looked like:

  if (protocol == PROTO_SSH) {
	  ... setup ssh ...
  } else {
	  ... setup local connection ...
	  conn->use_shell = 1;
  }

But over time the PROTO_SSH block has grown, and the "local"
block has shrunk so that it contains only conn->use_shell;
it's easy to miss at the end of the large block.  Moreover,
PROTO_SSH now also sometimes sets use_shell, when the new
GIT_SSH_COMMAND is used.

To make the flow easier to follow, let's just set
conn->use_shell when we're setting up the "conn" struct, and
unset it (with a comment) in the historical GIT_SSH case.

Note that for clarity we leave "use_shell" on in the case
that we fall back to our default "ssh" This looks like a
behavior change, but in practice run-command.c optimizes
shell invocations without metacharacters into a straight
execve anyway.

Signed-off-by: Jeff King <peff@peff.net>
---
 connect.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/connect.c b/connect.c
index 962f990..6f3f85e 100644
--- a/connect.c
+++ b/connect.c
@@ -723,6 +723,7 @@ struct child_process *git_connect(int fd[2], const char *url,
 
 		/* remove repo-local variables from the environment */
 		conn->env = local_repo_env;
+		conn->use_shell = 1;
 		conn->in = conn->out = -1;
 		if (protocol == PROTO_SSH) {
 			const char *ssh;
@@ -748,15 +749,21 @@ struct child_process *git_connect(int fd[2], const char *url,
 			}
 
 			ssh = getenv("GIT_SSH_COMMAND");
-			if (ssh) {
-				conn->use_shell = 1;
+			if (ssh)
 				putty = 0;
-			} else {
+			else {
 				const char *base;
 				char *ssh_dup;
 
+				/*
+				 * GIT_SSH is the no-shell version of
+				 * GIT_SSH_COMMAND (and must remain so for
+				 * historical compatibility).
+				 */
 				ssh = getenv("GIT_SSH");
-				if (!ssh)
+				if (ssh)
+					conn->use_shell = 0;
+				else
 					ssh = "ssh";
 
 				ssh_dup = xstrdup(ssh);
@@ -779,8 +786,6 @@ struct child_process *git_connect(int fd[2], const char *url,
 				argv_array_push(&conn->args, port);
 			}
 			argv_array_push(&conn->args, ssh_host);
-		} else {
-			conn->use_shell = 1;
 		}
 		argv_array_push(&conn->args, cmd.buf);
 
-- 
2.6.0.rc0.358.gaf70435

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] git_connect: clarify conn->use_shell flag
  2015-09-08  8:33           ` [PATCH] git_connect: clarify conn->use_shell flag Jeff King
@ 2015-09-08 17:18             ` Junio C Hamano
  2015-09-08 21:40               ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2015-09-08 17:18 UTC (permalink / raw)
  To: Jeff King; +Cc: Giuseppe Bilotta, Git List

Jeff King <peff@peff.net> writes:

> Subject: [PATCH] git_connect: clarify conn->use_shell flag
>
> When executing user-specified programs, we generally always
> want to use a shell, for flexibility and consistency. One
> big exception is executing $GIT_SSH, which for historical
> reasons must not use a shell.
>
> Once upon a time the logic in git_connect looked like:
>
>   if (protocol == PROTO_SSH) {
> 	  ... setup ssh ...
>   } else {
> 	  ... setup local connection ...
> 	  conn->use_shell = 1;
>   }
>
> But over time the PROTO_SSH block has grown, and the "local"
> block has shrunk so that it contains only conn->use_shell;
> it's easy to miss at the end of the large block.  Moreover,
> PROTO_SSH now also sometimes sets use_shell, when the new
> GIT_SSH_COMMAND is used.
>
> To make the flow easier to follow, let's just set
> conn->use_shell when we're setting up the "conn" struct, and
> unset it (with a comment) in the historical GIT_SSH case.

Makes perfect sense.  I think another thing that falls into the same
class wrt readability is 'putty'; if the code does putty=0 at the
beginning of "various flavours of SSH", and sets it only when it
checks with "various flavours of plink" when GIT_SSH_COMMAND is not
set, the logic would be even clearer, I suspect.

> Note that for clarity we leave "use_shell" on in the case
> that we fall back to our default "ssh" This looks like a
> behavior change, but in practice run-command.c optimizes
> shell invocations without metacharacters into a straight
> execve anyway.

Hmm, interesting.  I am not sure if that has to be the way, though.
Wouldn't the resulting code become simpler if you do not do that?

That is, is is what I have in mind on top of your patch.  Did I
break something?

 connect.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/connect.c b/connect.c
index 6f3f85e..acd39d7 100644
--- a/connect.c
+++ b/connect.c
@@ -727,7 +727,7 @@ struct child_process *git_connect(int fd[2], const char *url,
 		conn->in = conn->out = -1;
 		if (protocol == PROTO_SSH) {
 			const char *ssh;
-			int putty, tortoiseplink = 0;
+			int putty = 0, tortoiseplink = 0;
 			char *ssh_host = hostandport;
 			const char *port = NULL;
 			get_host_and_port(&ssh_host, &port);
@@ -749,9 +749,7 @@ struct child_process *git_connect(int fd[2], const char *url,
 			}
 
 			ssh = getenv("GIT_SSH_COMMAND");
-			if (ssh)
-				putty = 0;
-			else {
+			if (!ssh) {
 				const char *base;
 				char *ssh_dup;
 
@@ -760,10 +758,10 @@ struct child_process *git_connect(int fd[2], const char *url,
 				 * GIT_SSH_COMMAND (and must remain so for
 				 * historical compatibility).
 				 */
+				conn->use_shell = 0;
+
 				ssh = getenv("GIT_SSH");
-				if (ssh)
-					conn->use_shell = 0;
-				else
+				if (!ssh)
 					ssh = "ssh";
 
 				ssh_dup = xstrdup(ssh);
@@ -771,8 +769,9 @@ struct child_process *git_connect(int fd[2], const char *url,
 
 				tortoiseplink = !strcasecmp(base, "tortoiseplink") ||
 					!strcasecmp(base, "tortoiseplink.exe");
-				putty = !strcasecmp(base, "plink") ||
-					!strcasecmp(base, "plink.exe") || tortoiseplink;
+				putty = tortoiseplink ||
+					!strcasecmp(base, "plink") ||
+					!strcasecmp(base, "plink.exe");
 
 				free(ssh_dup);
 			}

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] git_connect: clarify conn->use_shell flag
  2015-09-08 17:18             ` Junio C Hamano
@ 2015-09-08 21:40               ` Jeff King
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff King @ 2015-09-08 21:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Giuseppe Bilotta, Git List

On Tue, Sep 08, 2015 at 10:18:41AM -0700, Junio C Hamano wrote:

> > To make the flow easier to follow, let's just set
> > conn->use_shell when we're setting up the "conn" struct, and
> > unset it (with a comment) in the historical GIT_SSH case.
> 
> Makes perfect sense.  I think another thing that falls into the same
> class wrt readability is 'putty'; if the code does putty=0 at the
> beginning of "various flavours of SSH", and sets it only when it
> checks with "various flavours of plink" when GIT_SSH_COMMAND is not
> set, the logic would be even clearer, I suspect.

Yeah, I think so.

> > Note that for clarity we leave "use_shell" on in the case
> > that we fall back to our default "ssh" This looks like a
> > behavior change, but in practice run-command.c optimizes
> > shell invocations without metacharacters into a straight
> > execve anyway.
> 
> Hmm, interesting.  I am not sure if that has to be the way, though.
> Wouldn't the resulting code become simpler if you do not do that?

Heh, I originally wrote it that way, and waffled between sending one or
the other.

> That is, is is what I have in mind on top of your patch.  Did I
> break something?
> 
>  connect.c | 17 ++++++++---------
>  1 file changed, 8 insertions(+), 9 deletions(-)

I think both changes are correct, and the result looks nice to read.
Feel free to squash them in as you apply.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2015-09-08 21:40 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-04 10:52 [RFC] a tale of Git 2.5, ssh transport and GIT_* environment variables Giuseppe Bilotta
2015-09-04 12:54 ` Jeff King
2015-09-04 14:02   ` Giuseppe Bilotta
2015-09-04 14:26     ` Jeff King
2015-09-04 18:18   ` Junio C Hamano
2015-09-04 21:10     ` Giuseppe Bilotta
2015-09-04 21:44     ` Jeff King
2015-09-04 22:40       ` [PATCH] git_connect: clear GIT_* environment for ssh Jeff King
2015-09-05 13:50         ` Giuseppe Bilotta
2015-09-08  8:33           ` [PATCH] git_connect: clarify conn->use_shell flag Jeff King
2015-09-08 17:18             ` Junio C Hamano
2015-09-08 21:40               ` Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.