All of lore.kernel.org
 help / color / mirror / Atom feed
* How to use git attributes to configure server-side checks?
@ 2011-09-21 19:32 Michael Haggerty
  2011-09-21 20:02 ` Jay Soffian
  2011-09-21 20:17 ` Junio C Hamano
  0 siblings, 2 replies; 31+ messages in thread
From: Michael Haggerty @ 2011-09-21 19:32 UTC (permalink / raw)
  To: git discussion list

I was thinking of using git attributes to configure a server-side
"update" hook that does some basic sanity checking before accepting a
push.  I thought I could do something like

~/.gitattributes:
    *.c whitespace

~/crappy-vendor-code/.gitattributes:
    # This code doesn't conform to our standards; disable check:
    *.c -whitespace

This would allow fine-grained specification of which checks are applied
to which files, and ensure that the hook configuration is kept
synchronized with changes to the content.

What I can't figure out is how a server-side update hook can inquire
about the gitattributes that were associated with a file *in a
particular commit*, as opposed to in the current working tree.  I would
like to ask questions like "was the "whitespace" attribute set on file F
in commit C?"

I see that there is an (undocumented) API involving
git_attr_set_direction() that seems to let gitattributes to be read out
of the index instead of the working tree.  But I am still confused:

1. The "git check-attr" program doesn't seem to expose the
git_attr_set_direction() functionality.

2. Even if it did, would that be enough?  It seems like the update hook
(assuming a bare repository) would have to "git reset" the index to the
commit that it wants to check.  Is that allowed?  Is the index a scratch
space that the update hook can use however it likes?  If so, is there
some kind of locking that would prevent multiple simultaneous pushes
from overwriting each other's index stat, or would the update script
have to implement its own locking scheme?

Am I going about this entirely the wrong way?

Thanks,
Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-21 19:32 How to use git attributes to configure server-side checks? Michael Haggerty
@ 2011-09-21 20:02 ` Jay Soffian
  2011-09-21 20:17 ` Junio C Hamano
  1 sibling, 0 replies; 31+ messages in thread
From: Jay Soffian @ 2011-09-21 20:02 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git discussion list

On Wed, Sep 21, 2011 at 3:32 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> I was thinking of using git attributes to configure a server-side
> "update" hook that does some basic sanity checking before accepting a
> push.  I thought I could do something like
>
> ~/.gitattributes:
>    *.c whitespace
>
> ~/crappy-vendor-code/.gitattributes:
>    # This code doesn't conform to our standards; disable check:
>    *.c -whitespace
>
> This would allow fine-grained specification of which checks are applied
> to which files, and ensure that the hook configuration is kept
> synchronized with changes to the content.

I do this by running diff --check. I'm actually doing it via Jenkins +
Gerrit, but it's done against a bare repo, so the idea should work
just the same via a git hook. Here's the code I run in Jenkins in case
it's at all helpful - https://gist.github.com/6b230f9bd8d4d2fd9895

j.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-21 19:32 How to use git attributes to configure server-side checks? Michael Haggerty
  2011-09-21 20:02 ` Jay Soffian
@ 2011-09-21 20:17 ` Junio C Hamano
  2011-09-22  8:28   ` Michael Haggerty
  2012-02-17 18:42   ` Michael Haggerty
  1 sibling, 2 replies; 31+ messages in thread
From: Junio C Hamano @ 2011-09-21 20:17 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git discussion list

Michael Haggerty <mhagger@alum.mit.edu> writes:

> I was thinking of using git attributes to configure a server-side
> "update" hook that does some basic sanity checking before accepting a
> push.  I thought I could do something like
>
> ~/.gitattributes:
>     *.c whitespace
>
> ~/crappy-vendor-code/.gitattributes:
>     # This code doesn't conform to our standards; disable check:
>     *.c -whitespace
>
> This would allow fine-grained specification of which checks are applied
> to which files, and ensure that the hook configuration is kept
> synchronized with changes to the content.
>
> What I can't figure out is how a server-side update hook can inquire
> about the gitattributes that were associated with a file *in a
> particular commit*, as opposed to in the current working tree.  I would
> like to ask questions like "was the "whitespace" attribute set on file F
> in commit C?"
>
> I see that there is an (undocumented) API involving
> git_attr_set_direction() that seems to let gitattributes to be read out
> of the index instead of the working tree.  But I am still confused:

The words "server side" automatically mean that there should be no working
tree, and where there is no working tree there should be no index, so the
direction should not make any difference.  The attributes that are used to
help whitespace checks should come from project.git/info/attributes in
such a case [*1*].

As to the actual checking of the pushed contents, your pre-receive hook is
called after all the objects received are placed in the object store, but
before the refs are updated to conclude the push, and you can veto the
push by exiting with non-zero status from the hook. Your hook will get
which ref is being updated from what old commit to what new commit, so you
can either

 (1) grab the new commits introduced to the project using rev-list, and
     invoke "git show --check" on each and every one of them; or

 (2) check only the endpoints, by running "git diff --check" between the
     old and new commits. A pushed series may introduce a breakage early
     in the series which is corrected later in the series and you would
     not catch such a case if you used this method.

[Footnote]

*1* granted, that there are people who make a checkout from their post
update hook, perhaps so that a build robot can be told to work on it or
the web server can deliver individual files. But that is merely a crude
substitute of having a proper "install" procedure. As far as the
"server-side" Git that accepts "git push" is concerned, such a working
tree does not exist.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-21 20:17 ` Junio C Hamano
@ 2011-09-22  8:28   ` Michael Haggerty
  2011-09-22 15:41     ` Jay Soffian
                       ` (2 more replies)
  2012-02-17 18:42   ` Michael Haggerty
  1 sibling, 3 replies; 31+ messages in thread
From: Michael Haggerty @ 2011-09-22  8:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git discussion list, Jay Soffian

On 09/21/2011 10:17 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
>> I was thinking of using git attributes to configure a server-side
>> "update" hook that does some basic sanity checking before accepting a
>> push.  I thought I could do something like
>>
>> ~/.gitattributes:
>>     *.c whitespace
>>
>> ~/crappy-vendor-code/.gitattributes:
>>     # This code doesn't conform to our standards; disable check:
>>     *.c -whitespace
>>
>> This would allow fine-grained specification of which checks are applied
>> to which files, and ensure that the hook configuration is kept
>> synchronized with changes to the content.
>
> The words "server side" automatically mean that there should be no working
> tree, and where there is no working tree there should be no index, so the
> direction should not make any difference.  The attributes that are used to
> help whitespace checks should come from project.git/info/attributes in
> such a case [*1*].

Thanks for the reply and for explaining how the index can(not) be used
for this purpose.  But what you propose is not flexible enough for me.
I would like the checking configuration to be *versioned* along with the
code.  For example, suppose my project decides to enforce a rule that
all Python code needs to be indented with spaces.  It might be that not
all of our old code adheres to this rule, and that we only want to clean
up the code in master.  I would like to be able to

1. Implement a new "check-space-indent" option in our update hook, which
is applied to any file that has the "check-space-indent" attribute.  At
first, no files have this attribute, so the new test has no effect.

2. Start cleaning up code in master.  Each time a subdirectory tree is
cleaned up, add a line like

    *.py check-space-indent

in a .gitattributes file at the root of the subdirectory tree.  While
this procedure proceeds incrementally, git ensures that code that has
been cleaned up stays cleaned up.

3. When all code is cleaned up, add the above line to the top-level
.gitattributes file in the master branch.  But if there are some parts
of the code that we don't want to clean up (for example, code acquired
from elsewhere that uses different coding standards), we can turn off
the check for that subdirectory tree.

Note that during this whole process all code passes the update hook,
because we can configure it to ignore problems in code that hasn't been
cleaned up yet.  And even at the end of the procedure, it is still
possible to commit to older branches where tabs are still used for
indentation because they don't use the new attribute.

For this to be possible, I would need to determine the git attributes to
apply to a particular file in a particular commit; something like

    git check-attr check-space-indent $SHA1:path/to/file

This does not seem to be possible today without writing my own code to
crawl and parse the gitattributes files from a particular commit.

>  (1) grab the new commits introduced to the project using rev-list, and
>      invoke "git show --check" on each and every one of them; or

Where does "git show --check" read its gitattributes (i.e.,
"whitespace") from?

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-22  8:28   ` Michael Haggerty
@ 2011-09-22 15:41     ` Jay Soffian
  2011-09-22 17:13       ` Jeff King
  2011-09-22 17:26     ` Junio C Hamano
  2011-09-22 22:54     ` Jakub Narebski
  2 siblings, 1 reply; 31+ messages in thread
From: Jay Soffian @ 2011-09-22 15:41 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, git discussion list

On Thu, Sep 22, 2011 at 4:28 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> Where does "git show --check" read its gitattributes (i.e.,
> "whitespace") from?

It looks like builtin_checkdiff (diff.c) calls whitespace_rule (ws.c)
which in turn calls git_check_attr (attr.c) which, in a bare repo,
considers $(prefix)/etc/gitattributes, core.attributesfile and
$GIT_DIR/info/attributes, falling back to the default whitespace rule
of trailing_space, space_before_tab, and tab-width of 8
(WS_DEFAULT_RULE in cache.h).

Thank you for this thread. I was under the illusion that diff-tree
--check considered in-tree .gitattributes, but the code seems to
indicate otherwise. :-(

j.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-22 15:41     ` Jay Soffian
@ 2011-09-22 17:13       ` Jeff King
  2011-09-22 18:41         ` Jay Soffian
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff King @ 2011-09-22 17:13 UTC (permalink / raw)
  To: Jay Soffian; +Cc: Michael Haggerty, Junio C Hamano, git discussion list

On Thu, Sep 22, 2011 at 11:41:34AM -0400, Jay Soffian wrote:

> Thank you for this thread. I was under the illusion that diff-tree
> --check considered in-tree .gitattributes, but the code seems to
> indicate otherwise. :-(

No, we definitely don't use in-tree gitattributes. IIRC, there are some
precedence and ordering questions. I think the ordering now is:

  1. Look in $GIT_DIR/info/attributes

  2. If not found, look in per-directory .gitattributes

  3. If not found, look in core.gitattributesfile

Where do in-tree attributes fit in? Between 1 and 2? Or 2 and 3? And
which tree do we look at?

Here are some examples:

  a. If I do "git checkout branch-foo", we should look at branch-foo's
     tree for things like crlf, right?  Do we still fall back to
     per-directory .gitattributes in the working tree? On the one hand,
     they're not really relevant to the commit we're moving to. But they
     are respected in the current code, and can be useful when moving to
     old commits which lack attributes.

     I think this is where the index magic comes in in the current code
     (we do something like "load the index, then respect gitattributes
     from the index").

     So maybe this is solved already.

  b. You're diffing commit $a against commit $b. Whose gitattributes
     have precedence? Is order important? Are gitattributes in the
     working tree and index relevant?

  c. You're diffing commit $a against HEAD. Which gitattributes have
     precedence? Again, is order important (i.e., does "diff -R" look at
     different attributes than "diff")?

I'm sure there are others, too. And I don't think any of these is
insurmountable. But somebody needs to think through a lot of cases and
come up with consistent, sane behavior that does the right thing in each
case (considering both bare repositories and ones with working trees).

-Peff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-22  8:28   ` Michael Haggerty
  2011-09-22 15:41     ` Jay Soffian
@ 2011-09-22 17:26     ` Junio C Hamano
  2011-09-23  8:35       ` Michael Haggerty
  2011-09-22 22:54     ` Jakub Narebski
  2 siblings, 1 reply; 31+ messages in thread
From: Junio C Hamano @ 2011-09-22 17:26 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git discussion list, Jay Soffian

Michael Haggerty <mhagger@alum.mit.edu> writes:

> I would like the checking configuration to be *versioned* along with the
> code.  For example, suppose my project decides to enforce a rule that
> all Python code needs to be indented with spaces.  It might be that not
> all of our old code adheres to this rule, and that we only want to clean
> up the code in master.

You want to sneak in a badly formatted code? Add an entry to the in-tree
attributes file to disable whitespace checking to cover that file!

So even though I agree with you that the check mechanism may need to be
aware of what revision it is checking and adjust which rules are applied
when checking the revision, I do not think using in-tree attribute file is
the right solution to that problem.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-22 17:13       ` Jeff King
@ 2011-09-22 18:41         ` Jay Soffian
  2011-09-22 19:22           ` Junio C Hamano
  2011-09-22 20:58           ` Jeff King
  0 siblings, 2 replies; 31+ messages in thread
From: Jay Soffian @ 2011-09-22 18:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael Haggerty, Junio C Hamano, git discussion list

On Thu, Sep 22, 2011 at 1:13 PM, Jeff King <peff@peff.net> wrote:
>
> No, we definitely don't use in-tree gitattributes. IIRC, there are some
> precedence and ordering questions. I think the ordering now is:
>
>  1. Look in $GIT_DIR/info/attributes
>
>  2. If not found, look in per-directory .gitattributes
>
>  3. If not found, look in core.gitattributesfile
>
> Where do in-tree attributes fit in? Between 1 and 2? Or 2 and 3? And
> which tree do we look at?

attr.c says:

  a. built-in attributes
  b. $(prefix)/etc/gitattributes unless GIT_ATTR_NOSYSTEM is set
  c. core.attributesfile
  d. per-directory .gitattributes
  e. $GIT_DIR/info/attributes

The mechanics of (d) are established by git_attr_direction:

  GIT_ATTR_CHECKIN: working-copy, then index
  GIT_ATTR_CHECKOUT: index, then working-copy
  GIT_ATTR_INDEX: index-only

Where GIT_ATTR_CHECKIN is the default direction and GIT_ATTR_CHECKOUT
is used while checking-out files from the index. (GIT_ATTR_INDEX is
used only by git-archive.)

Note that (d) only occurs in non-bare repos or if direction is GIT_ATTR_INDEX.

> Here are some examples:
>
>  a. If I do "git checkout branch-foo", we should look at branch-foo's
>     tree for things like crlf, right?  Do we still fall back to
>     per-directory .gitattributes in the working tree? On the one hand,
>     they're not really relevant to the commit we're moving to. But they
>     are respected in the current code, and can be useful when moving to
>     old commits which lack attributes.
>
>     I think this is where the index magic comes in in the current code
>     (we do something like "load the index, then respect gitattributes
>     from the index").
>
>     So maybe this is solved already.
>
>  b. You're diffing commit $a against commit $b. Whose gitattributes
>     have precedence? Is order important? Are gitattributes in the
>     working tree and index relevant?
>
>  c. You're diffing commit $a against HEAD. Which gitattributes have
>     precedence? Again, is order important (i.e., does "diff -R" look at
>     different attributes than "diff")?
>
> I'm sure there are others, too. And I don't think any of these is
> insurmountable. But somebody needs to think through a lot of cases and
> come up with consistent, sane behavior that does the right thing in each
> case (considering both bare repositories and ones with working trees).

I think that diff-index --cached or when there is no working tree,
should set the direction to GIT_ATTR_INDEX so that it may be used with
a bare repo. In such case, the .gitattributes would come from the
index.

Consistent with that, when comparing two commits (diff-tree), I think
you look at the .gitattributes in the second commit.

j.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-22 18:41         ` Jay Soffian
@ 2011-09-22 19:22           ` Junio C Hamano
  2011-09-22 20:58           ` Jeff King
  1 sibling, 0 replies; 31+ messages in thread
From: Junio C Hamano @ 2011-09-22 19:22 UTC (permalink / raw)
  To: Jay Soffian; +Cc: Jeff King, Michael Haggerty, git discussion list

Jay Soffian <jaysoffian@gmail.com> writes:

> Consistent with that, when comparing two commits (diff-tree), I think
> you look at the .gitattributes in the second commit.

That would make "diff A B" and "diff -R B A" behave differently.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-22 18:41         ` Jay Soffian
  2011-09-22 19:22           ` Junio C Hamano
@ 2011-09-22 20:58           ` Jeff King
  2011-09-22 21:04             ` Jeff King
  2011-09-23 10:06             ` Michael Haggerty
  1 sibling, 2 replies; 31+ messages in thread
From: Jeff King @ 2011-09-22 20:58 UTC (permalink / raw)
  To: Jay Soffian; +Cc: Michael Haggerty, Junio C Hamano, git discussion list

On Thu, Sep 22, 2011 at 02:41:42PM -0400, Jay Soffian wrote:

> attr.c says:
> 
>   a. built-in attributes
>   b. $(prefix)/etc/gitattributes unless GIT_ATTR_NOSYSTEM is set
>   c. core.attributesfile
>   d. per-directory .gitattributes
>   e. $GIT_DIR/info/attributes
> 
> The mechanics of (d) are established by git_attr_direction:
> 
>   GIT_ATTR_CHECKIN: working-copy, then index
>   GIT_ATTR_CHECKOUT: index, then working-copy
>   GIT_ATTR_INDEX: index-only

Thanks for hunting it down. I had been thinking that "attributes from
tree" would come either before or after (d) above, but the concept
really fits better into the second list (i.e., they're per-directory
attributes, it's just a matter of which set of directories).

> Where GIT_ATTR_CHECKIN is the default direction and GIT_ATTR_CHECKOUT
> is used while checking-out files from the index. (GIT_ATTR_INDEX is
> used only by git-archive.)

Hrm. But git archive works correctly in a bare repo with no index at
all. It looks like we just populate an in-core index and feed that to
git_attr_set_direction. Which is a bit suboptimal for something like
diff, which might not need to look at all parts of the tree (I guess it
is similarly suboptimal for archive with pathspecs).

But still, those are just performance considerations. We could use the
same trick for diff/log in a bare repo. I guess we'd end up refreshing
the index from each commit in a multi-commit log, which might be
noticeably slow.

> Consistent with that, when comparing two commits (diff-tree), I think
> you look at the .gitattributes in the second commit.

That makes some sense to me. As Junio pointed out, there is a catch with
"diff -R". In that case, I would still think you would use the "second"
commit, even though we're reversing the diff. So:

  git diff A B

would not be exactly equivalent to:

  git diff -R B A

in that the second would use attributes from "A" instead of "B".

However, I think this is skirting around a somewhat larger issue, which
is that gitattributes are sometimes about "this is what the file is like
at the current state of history", and sometimes about "this is what this
file is like throughout history".

For example, imagine I've got a repository of binary files. At some
point, I decide to use gitattributes to configure a textconv filter. In
my non-bare repo, when I do "git log" I get nice textconv'd diffs going
back through all of history. But if I push to a bare repo and try to do
a "git log" there, my attributes are ignored. So in this case, I like
that the working tree's attributes are applied retroactively, and I'd
like the bare repo to do the same.

Now consider a different example. I have a repo with a script "foo" that
contains perl code, and I add .gitattributes marking it as such (for
func headers, or maybe I have a magic external diff, or whatever[1]).
Later, I decide that I hate perl and rewrite it in python, updating
gitattributes. Now I _don't_ want the retroactive behavior. When I do a
"git log -p", I want to see the old commits with the perl script shown
using the perl attribute, not the python. But what the heck would it
mean to diff a recent python commit against an old perl one?

Even though we think of the diff attributes as "here's how you diff",
they are really "here are some annotations about the end points". So for
something like a textconv, you care about the attributes of _both_
sides of a diff, applying the attributes of the "from" tree to the paths
in that tree. Something like funcname is harder. I guess you'd probably
want to use the attributes from the "from" tree, since it is about
reporting context from the "from" side.

So the semantics really end up depending on the particular attribute.
Something like "-delta" exists more outside the history or the state of
a particular commit.

So I think doing it "right" would be a lot more complicated than just
reading the commits from a particular tree. I think it would mean adding
more context to all attribute lookups, and having each caller decide how
the results should be used.

However, retroactively applying attributes from the working tree, even
though it is sometimes wrong (e.g., we get the perl/python thing wrong
_now_ if you have a working tree), is often convenient in practice
(because otherwise you end up duplicating your per-directory
gitattributes in your info/attributes file), and rarely is it actually
wrong (because changing the type of a file without changing its path
isn't all that common).

So if the status quo with working trees (i.e., retroactively applying
the current gitattributes to past commits) is good enough, perhaps the
bare-repository solution should be modeled the same way? In other words,
should "git log -p" in a bare repo be looking for attributes not in the
commits being diffed, but rather in HEAD[2]?

That at least would make it consistent with the non-bare behavior. And
then if we want to move forward with making particular attributes more
aware of their context, we can do it in _both_ the bare and non-bare
cases.

-Peff

[1] The perl/python thing is probably not a huge deal, as funcnames are
    the most likely thing to configure. But you can imagine it would be
    much worse if some binary file changes formats, and you were using
    textconv. Or something with word-diff, perhaps.

[2] You can almost do this with:

      git show HEAD:.gitattributes >info/attributes
      git show commit-you-care-about
      rm info/attributes

    except that it won't handle any per-directory attributes files from
    subdirectories. So I think you'd want a "--attributes-from=HEAD"
    diff option or something similar.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-22 20:58           ` Jeff King
@ 2011-09-22 21:04             ` Jeff King
  2011-09-23 10:06             ` Michael Haggerty
  1 sibling, 0 replies; 31+ messages in thread
From: Jeff King @ 2011-09-22 21:04 UTC (permalink / raw)
  To: Jay Soffian; +Cc: Michael Haggerty, Junio C Hamano, git discussion list

On Thu, Sep 22, 2011 at 04:58:56PM -0400, Jeff King wrote:

> That makes some sense to me. As Junio pointed out, there is a catch with
> "diff -R". In that case, I would still think you would use the "second"
> commit, even though we're reversing the diff. So:
> 
>   git diff A B
> 
> would not be exactly equivalent to:
> 
>   git diff -R B A
> 
> in that the second would use attributes from "A" instead of "B".

I misread Junio's comment a bit. Re-reading it, this is exactly the
inconsistency he complained about. However, I consider it somewhat of a
feature. We currently have two ways to express the same thing, and you
arrive at one or the other based on the way you are thinking of the
problem. But we can use that to disambiguate between the two cases; one
is about going from A to B, and one is about inverting the operation of
going from B to A. Right now they're equivalent, but they don't have to
be.

If you read the rest of my message, you will see that I think picking
"first" or "second" arbitrarily like this might be barking up the wrong
tree.  But I just wanted to clarify that point.

-Peff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-22  8:28   ` Michael Haggerty
  2011-09-22 15:41     ` Jay Soffian
  2011-09-22 17:26     ` Junio C Hamano
@ 2011-09-22 22:54     ` Jakub Narebski
  2011-09-23 10:38       ` Michael Haggerty
  2 siblings, 1 reply; 31+ messages in thread
From: Jakub Narebski @ 2011-09-22 22:54 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, git discussion list, Jay Soffian

Michael Haggerty <mhagger@alum.mit.edu> writes:

> Thanks for the reply and for explaining how the index can(not) be used
> for this purpose.  But what you propose is not flexible enough for me.
> I would like the checking configuration to be *versioned* along with the
> code. [...]

[...]

> For this to be possible, I would need to determine the git attributes to
> apply to a particular file in a particular commit; something like
> 
>     git check-attr check-space-indent $SHA1:path/to/file
> 
> This does not seem to be possible today without writing my own code to
> crawl and parse the gitattributes files from a particular commit.

Unfortunately it doesn't seem to be there mechanism to query about
state of gitattributes at given commit.

There is a slight problem from the UI point of view of git-check-attr,
namely that there are _three_ pieces of information: a place to read
.gitattributes from (working tree, index, commit), list of attributes
to check (or --all) and list of files (list of paths).  You can use
"--" to separate _two_ pieces of information.

Nb. the ability to read gitattributes from given commit would be
useful also for gitweb (the `encoding` gitattribute, etc.).

-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-22 17:26     ` Junio C Hamano
@ 2011-09-23  8:35       ` Michael Haggerty
  2011-09-23 12:49         ` Stephen Bash
  0 siblings, 1 reply; 31+ messages in thread
From: Michael Haggerty @ 2011-09-23  8:35 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git discussion list, Jay Soffian, Jeff King, Jakub Narebski

On 09/22/2011 07:26 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
> 
>> I would like the checking configuration to be *versioned* along with the
>> code.  For example, suppose my project decides to enforce a rule that
>> all Python code needs to be indented with spaces.  It might be that not
>> all of our old code adheres to this rule, and that we only want to clean
>> up the code in master.
> 
> You want to sneak in a badly formatted code? Add an entry to the in-tree
> attributes file to disable whitespace checking to cover that file!

No, the scenario that I was trying to describe is a project that wants
to tighten up its code formatting rules after years of laxity.  It is
convenient to support legacy branches that still contain nonconforming
code without having to reformat it all, just as it is convenient to fix
the current code incrementally rather than requiring all of the cleanup
to be done in one big bang.  Thus it is important that new rules not be
enforced retroactively on old code.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-22 20:58           ` Jeff King
  2011-09-22 21:04             ` Jeff King
@ 2011-09-23 10:06             ` Michael Haggerty
  2011-09-23 19:33               ` Jeff King
  1 sibling, 1 reply; 31+ messages in thread
From: Michael Haggerty @ 2011-09-23 10:06 UTC (permalink / raw)
  To: Jeff King
  Cc: Jay Soffian, Junio C Hamano, git discussion list, Jakub Narebski

On 09/22/2011 10:58 PM, Jeff King wrote:
> However, I think this is skirting around a somewhat larger issue, which
> is that gitattributes are sometimes about "this is what the file is like
> at the current state of history", and sometimes about "this is what this
> file is like throughout history".

This is a very dangerous line of thought.  When content is read out of a
historical commit, the content must be identical to what was checked
into that commit.  For example, the EOL characters that I see in an old
file revision must not depend on my current HEAD or index.  This rule
should apply regardless of whether the content is being read to be
checked out, put into an archive, or diffed against some other revision.
 So for these purposes, I think there is no choice but to honor the
gitattributes files in the tree/index/directory from which the content
was read.

If the user really wants to say "this is what this file is like
throughout history" (thereby altering history), then this should be a
local decision that should be stored locally in $GIT_DIR/info/attributes.

There are other attributes that affect "two-argument" operations like
diff.  Here the policy has to depend on the situation.  In the case of
diff, I suggest that the relevant attributes be read from *both*
versions of the file.  If they are the same, then obviously use them.
If they differ, then fall back to a "safe" default (for example, the
diff that would be used if *no* attributes had been specified).
However, as a special case, if an attribute is specified in one version
and unspecified in another, it would *usually* be OK to use the
specified version of the attributes and that might be considered a
reasonable alternative.

By the way, it is possible that the two ends of a diff operation have
different filenames (e.g., with -M or -C).  In this case the attributes
should be looked up using the filename corresponding to the commit.

> So I think doing it "right" would be a lot more complicated than just
> reading the commits from a particular tree. I think it would mean adding
> more context to all attribute lookups, and having each caller decide how
> the results should be used.

I agree.

> So if the status quo with working trees (i.e., retroactively applying
> the current gitattributes to past commits) is good enough, perhaps the
> bare-repository solution should be modeled the same way? In other words,
> should "git log -p" in a bare repo be looking for attributes not in the
> commits being diffed, but rather in HEAD[2]?

This would be a very poor policy, modeled on a status quo that is *not*
good enough.  For example, HEAD might not even be a descendant of the
commit whose content is being interrogated; the commit might have new
files and new gitattributes that would be ignored.  I suppose your
suggestion is better than the status quo, so it would be fine as a
starting point, *provided* it is clear that people should not rely on
this behavior not being improved later.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-22 22:54     ` Jakub Narebski
@ 2011-09-23 10:38       ` Michael Haggerty
  0 siblings, 0 replies; 31+ messages in thread
From: Michael Haggerty @ 2011-09-23 10:38 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: Junio C Hamano, git discussion list, Jay Soffian, Jeff King

On 09/23/2011 12:54 AM, Jakub Narebski wrote:
> There is a slight problem from the UI point of view of git-check-attr,
> namely that there are _three_ pieces of information: a place to read
> .gitattributes from (working tree, index, commit), list of attributes
> to check (or --all) and list of files (list of paths).  You can use
> "--" to separate _two_ pieces of information.

An alternative would be to accept <rev>:<path>, :<n>:<path>, and :<path>
in addition to simple <path>.

I also just remembered that currently the paths passed to
git-check-attr, git_check_attr(), and git_all_attrs() do not have to
correspond to existing objects; the paths are compared as strings.
Presumably we should retain this aspect of the behavior even if a
revision or tree is specified explicitly.

There are other problems in the UI of git-check-attr:

1. Its "-z" option affects how standard input is interpreted; there is
no way to cause the output to be generated with NUL separators.

2. Its output does not distinguish between the special statuses
unspecified/unset/set and possible string values
"unspecified"/"unset"/"set".

Maybe a --porcelain option could be used to overcome these shortcomings.

If necessary we could deprecate "git check-attr" and implement an
improved command ("git get-attr"?) that starts from a clean UI slate.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-23  8:35       ` Michael Haggerty
@ 2011-09-23 12:49         ` Stephen Bash
  2011-09-23 13:31           ` Michael Haggerty
  0 siblings, 1 reply; 31+ messages in thread
From: Stephen Bash @ 2011-09-23 12:49 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: git discussion list, Jay Soffian, Jeff King, Jakub Narebski,
	Junio C Hamano

----- Original Message -----
> From: "Michael Haggerty" <mhagger@alum.mit.edu>
> Sent: Friday, September 23, 2011 4:35:20 AM
> Subject: Re: How to use git attributes to configure server-side checks?
>
> On 09/22/2011 07:26 PM, Junio C Hamano wrote:
> > Michael Haggerty <mhagger@alum.mit.edu> writes:
> >
> >> I would like the checking configuration to be *versioned* along with the
> >> code. For example, suppose my project decides to enforce a rule that
> >> all Python code needs to be indented with spaces. It might be that not
> >> all of our old code adheres to this rule, and that we only want to clean
> >> up the code in master.
> >
> > You want to sneak in a badly formatted code? Add an entry to the
> > in-tree attributes file to disable whitespace checking to cover that file!
> 
> No, the scenario that I was trying to describe is a project that wants
> to tighten up its code formatting rules after years of laxity. It is
> convenient to support legacy branches that still contain nonconforming
> code without having to reformat it all, just as it is convenient to
> fix the current code incrementally rather than requiring all of the
> cleanup to be done in one big bang. Thus it is important that new rules not be
> enforced retroactively on old code.

We're in the process of a similar change over (we're dealing with EOL rather than indents), but I attacked it from a different angle...  I wrote our update script to examine modified files and ensure compliance (diff-tree -r, iterate over blobs).  That way legacy files are left alone (even in master), but active development must live up to the current rules.  Is there a reason you need to go tree-by-tree rather than file-by-file?

Thanks,
Stephen

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-23 12:49         ` Stephen Bash
@ 2011-09-23 13:31           ` Michael Haggerty
  0 siblings, 0 replies; 31+ messages in thread
From: Michael Haggerty @ 2011-09-23 13:31 UTC (permalink / raw)
  To: Stephen Bash
  Cc: git discussion list, Jay Soffian, Jeff King, Jakub Narebski,
	Junio C Hamano

On 09/23/2011 02:49 PM, Stephen Bash wrote:
> We're in the process of a similar change over (we're dealing with EOL
> rather than indents), but I attacked it from a different angle...  I
> wrote our update script to examine modified files and ensure
> compliance (diff-tree -r, iterate over blobs).  That way legacy files
> are left alone (even in master), but active development must live up
> to the current rules.  Is there a reason you need to go tree-by-tree
> rather than file-by-file?

I want to avoid code churn, especially in third-party code.  With your
solution, I believe that we would be forced to entirely clean up any
file that we needed to touch.  The resulting code churn would make
integrating future upstream releases a nightmare.

For some kinds of checks, one could only check that the *lines* changed
satisfy the new rules.

But rather than thinking up workarounds, it seems like a better idea to
fix git to handle .gitattributes correctly.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-23 10:06             ` Michael Haggerty
@ 2011-09-23 19:33               ` Jeff King
  2011-09-23 19:40                 ` Junio C Hamano
  2011-09-24  6:05                 ` Michael Haggerty
  0 siblings, 2 replies; 31+ messages in thread
From: Jeff King @ 2011-09-23 19:33 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Jay Soffian, Junio C Hamano, git discussion list, Jakub Narebski

On Fri, Sep 23, 2011 at 12:06:51PM +0200, Michael Haggerty wrote:

> This is a very dangerous line of thought.  When content is read out of a
> historical commit, the content must be identical to what was checked
> into that commit.  For example, the EOL characters that I see in an old
> file revision must not depend on my current HEAD or index.  This rule
> should apply regardless of whether the content is being read to be
> checked out, put into an archive, or diffed against some other revision.
>  So for these purposes, I think there is no choice but to honor the
> gitattributes files in the tree/index/directory from which the content
> was read.
> 
> If the user really wants to say "this is what this file is like
> throughout history" (thereby altering history), then this should be a
> local decision that should be stored locally in $GIT_DIR/info/attributes.

I think we're in agreement here that this is the right way to do things.
My points are that:

  1. The current behavior is often convenient, even though it is wrong
     in some cases. Using $GIT_DIR/info/attributes is manual and
     annoying[1]. Turning on .gitattributes in the working tree and
     having it impact "git log -p" on older commits is a huge
     convenience. When it's right.

  2. This isn't how we're doing it now, and I don't see a lot of bug
     reports.  That doesn't mean there isn't room for improvement, but
     it may mean the current strategy is good enough, especially if the
     alternative ends up being very complex or hard to understand.

My biggest fear is not that we are wrong in some minority of the cases,
but that there is no good way to tell git it's wrong and to do something
else.

[1] I really wish we had an elegant way of versioning meta-information
    about a repository (like config, info/attributes, etc). I've hacked
    around this before by having a special meta-branch for each repo,
    checkout it out in an alternate directory, and then symlinking bits
    of it into .git. But that's kind of ugly, too.

    I'm not sure what a good solution would look like. There's a real
    can of worms with respect to picking and choosing local versus
    remote bits of meta-information, with some security implications.

> There are other attributes that affect "two-argument" operations like
> diff.  Here the policy has to depend on the situation.  In the case of
> diff, I suggest that the relevant attributes be read from *both*
> versions of the file.  If they are the same, then obviously use them.
> If they differ, then fall back to a "safe" default (for example, the
> diff that would be used if *no* attributes had been specified).
> However, as a special case, if an attribute is specified in one version
> and unspecified in another, it would *usually* be OK to use the
> specified version of the attributes and that might be considered a
> reasonable alternative.

Those are sane ground rules. But specific contexts would want to be
smarter in some cases. For example, if you have two conflicting diff
attributes that both define a textconv, you would not want to fallback
to not doing any textconv (which is what your above text prescribes).
Instead, you want to textconv each side individually according to its
attributes, and then compare the result.

One could argue that textconv is therefore not a "two-argument"
operation, since it works on each side individually. But it is part of
the "diff" attribute, and so it ends up being processed in a
two-argument context (this is possibly a mistake in the original design
of textconv).

> By the way, it is possible that the two ends of a diff operation have
> different filenames (e.g., with -M or -C).  In this case the attributes
> should be looked up using the filename corresponding to the commit.

We do some of this already. For example, textconv will look up each side
based on its individual filename. But the funcname code, for example
does this ("one" is the "from" side of the diff, "two" is the "to"
side):

  pe = diff_funcname_pattern(one);
  if (!pe)
          pe = diff_funcname_pattern(two);

whereas I think your proposed code would do:

  pe1 = diff_funcname_pattern(one);
  pe2 = diff_funcname_pattern(two);
  if (!pe1)
          pe = pe2; /* only one side, or both empty */
  else if (!pe2)
          pe = pe1; /* only one side */
  else if (!pattern_cmp(pe1, pe2))
          pe = pe1; /* identical, pick one */
  else
          pe = NULL; /* conflict, default to nothing */

> > So if the status quo with working trees (i.e., retroactively applying
> > the current gitattributes to past commits) is good enough, perhaps the
> > bare-repository solution should be modeled the same way? In other words,
> > should "git log -p" in a bare repo be looking for attributes not in the
> > commits being diffed, but rather in HEAD[2]?
> 
> This would be a very poor policy, modeled on a status quo that is
> *not* good enough.  For example, HEAD might not even be a descendant
> of the commit whose content is being interrogated; the commit might
> have new files and new gitattributes that would be ignored.

Notice the "if" in my statement. :)

Yes, that is a corner case that is utterly broken. It's also utterly
broken in the current code with a non-bare repository, where we respect
the checked-out .gitattributes (which generally come from HEAD). This
would just be a very simple way of harmonizing bare repositories with
the current non-bare behavior (broken or not).

> I suppose your suggestion is better than the status quo, so it would
> be fine as a starting point, *provided* it is clear that people should
> not rely on this behavior not being improved later.

Sure. If we go down this road, we will be breaking the non-bare case,
too, so we will have to deal with deprecations and all that, anyway.

-Peff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-23 19:33               ` Jeff King
@ 2011-09-23 19:40                 ` Junio C Hamano
  2011-09-23 19:44                   ` Jeff King
  2011-09-24  6:05                 ` Michael Haggerty
  1 sibling, 1 reply; 31+ messages in thread
From: Junio C Hamano @ 2011-09-23 19:40 UTC (permalink / raw)
  To: Jeff King
  Cc: Michael Haggerty, Jay Soffian, git discussion list, Jakub Narebski

Jeff King <peff@peff.net> writes:

> We do some of this already. For example, textconv will look up each side
> based on its individual filename. But the funcname code, for example
> does this ("one" is the "from" side of the diff, "two" is the "to"
> side):
>
>   pe = diff_funcname_pattern(one);
>   if (!pe)
>           pe = diff_funcname_pattern(two);

What text would we see on the actual hunk header line? I had an impression
that we always take from the preimage. I might be wrong, but if that is
indeed the case, shouldn't we be ignoring the attribute tacked on to the
postimage side altogether?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-23 19:40                 ` Junio C Hamano
@ 2011-09-23 19:44                   ` Jeff King
  0 siblings, 0 replies; 31+ messages in thread
From: Jeff King @ 2011-09-23 19:44 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Michael Haggerty, Jay Soffian, git discussion list, Jakub Narebski

On Fri, Sep 23, 2011 at 12:40:51PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > We do some of this already. For example, textconv will look up each side
> > based on its individual filename. But the funcname code, for example
> > does this ("one" is the "from" side of the diff, "two" is the "to"
> > side):
> >
> >   pe = diff_funcname_pattern(one);
> >   if (!pe)
> >           pe = diff_funcname_pattern(two);
> 
> What text would we see on the actual hunk header line? I had an impression
> that we always take from the preimage. I might be wrong, but if that is
> indeed the case, shouldn't we be ignoring the attribute tacked on to the
> postimage side altogether?

Sort of. I think this is Michael's fallback case of "if there was no
attribute before, use the other side". In other words, we are guessing
if there was no attribute before, and there is one now, the reason is
not because they are two different formats but rather because the
attribute simply hadn't been added yet on the other side.

So if you have three commits, and the second one adds the attribute, you
can diff commits 1 and 3 in either direction, and still get the benefit
of the attribute.

Like reading .gitattributes from the current directory to look at old
(or even unrelated) commits, I think this is a convenience and is right
in most cases, but can be spectacularly wrong in some corner cases.

-Peff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-23 19:33               ` Jeff King
  2011-09-23 19:40                 ` Junio C Hamano
@ 2011-09-24  6:05                 ` Michael Haggerty
  2011-09-24  6:15                   ` Jeff King
  1 sibling, 1 reply; 31+ messages in thread
From: Michael Haggerty @ 2011-09-24  6:05 UTC (permalink / raw)
  To: Jeff King
  Cc: Jay Soffian, Junio C Hamano, git discussion list, Jakub Narebski

On 09/23/2011 09:33 PM, Jeff King wrote:
> [1] I really wish we had an elegant way of versioning meta-information
>     about a repository (like config, info/attributes, etc). I've hacked
>     around this before by having a special meta-branch for each repo,
>     checkout it out in an alternate directory, and then symlinking bits
>     of it into .git. But that's kind of ugly, too.
> 
>     I'm not sure what a good solution would look like. There's a real
>     can of worms with respect to picking and choosing local versus
>     remote bits of meta-information, with some security implications.

This could be implemented by having a single local config option that
enables the use of an in-tree .gitconfig file:

    git config core.useTreeConfig true

(or whatever the correct naming convention would be).  This option would
default to false, so the in-tree config would only occur following an
affirmative decision by the cloner.

If finer granularity is required, config options could be classified
into "security-relevant" and "non-security-relevant" options, where the
former group basically includes anything that can cause arbitrary code
to be executed.  Then there could be a "core.useTreeConfig = safeonly"
option to allow only the harmless options.

I think the priority of config options (highest to lowest) should be

       $GIT_DIR/config
       in-tree .gitconfig
       ~/.gitconfig
       $(prefix)/etc/gitconfig

Of course, just like for attributes, it would have to be decided which
version of the .gitconfig to use in which situations.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-24  6:05                 ` Michael Haggerty
@ 2011-09-24  6:15                   ` Jeff King
  2011-09-24 11:03                     ` Michael Haggerty
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff King @ 2011-09-24  6:15 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Jay Soffian, Junio C Hamano, git discussion list, Jakub Narebski

On Sat, Sep 24, 2011 at 08:05:29AM +0200, Michael Haggerty wrote:

> This could be implemented by having a single local config option that
> enables the use of an in-tree .gitconfig file:
> 
>     git config core.useTreeConfig true
> 
> (or whatever the correct naming convention would be).  This option would
> default to false, so the in-tree config would only occur following an
> affirmative decision by the cloner.

But once you've verified that the config looks OK and turned this option
on, how do you protect yourself from malicious config entering the repo
through a fetch?

> If finer granularity is required, config options could be classified
> into "security-relevant" and "non-security-relevant" options, where the
> former group basically includes anything that can cause arbitrary code
> to be executed.  Then there could be a "core.useTreeConfig = safeonly"
> option to allow only the harmless options.

This is the "can of worms" I referred to earlier. You have to start
classifying each option.

> I think the priority of config options (highest to lowest) should be
> 
>        $GIT_DIR/config
>        in-tree .gitconfig
>        ~/.gitconfig
>        $(prefix)/etc/gitconfig

One catch with any precedence scheme is that there are certain config
constructs that cannot be overridden. For example, some options are
respected multiple times to form a list (e.g., remote fetch and push
refspecs). There's no way in you $GIT_DIR/config to say "forget all of
the previous values you saw for this key" that may have come from the
in-tree .gitconfig.

I think you could introduce a special syntax for that, maybe just:

  # make a list with two elements
  [foo]
    bar = one
    bar = two

  # now let's imagine this is our "reset" syntax
    bar =
  # and now we can put in our own new values
    bar = three

where the final value of the config above would be (three).

It's actually not a new problem with in-tree config, and it doesn't tend
to come up all that much because most config values are treated as
simple scalars, and later values overwrite earlier ones.

> Of course, just like for attributes, it would have to be decided which
> version of the .gitconfig to use in which situations.

I'm not sure it makes sense to have it in the tree at all. For
attributes it makes sense, because you are annotating a path at a
_specific_ revision. But config is often much more meta- than that.
Take textconv for an example. The gitattributes say "foo.pdf should use
the 'pdf' diff driver". That makes sense to go in a revision. But the
config will say "pdf files can be converted to text using
/usr/bin/pdftotext". That is not something that is tied to the revision
at all, and should exist outside of any revision. I.e., whether I am
doing a "git show" on the HEAD, or on some ancient commit, I would want
to use the same value, not whatever tool I used to convert PDFs years
ago.

-Peff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-24  6:15                   ` Jeff King
@ 2011-09-24 11:03                     ` Michael Haggerty
  2011-09-26  4:09                       ` Junio C Hamano
  2011-09-26 11:05                       ` Jeff King
  0 siblings, 2 replies; 31+ messages in thread
From: Michael Haggerty @ 2011-09-24 11:03 UTC (permalink / raw)
  To: Jeff King
  Cc: Jay Soffian, Junio C Hamano, git discussion list, Jakub Narebski

On 09/24/2011 08:15 AM, Jeff King wrote:
> On Sat, Sep 24, 2011 at 08:05:29AM +0200, Michael Haggerty wrote:
>> This could be implemented by having a single local config option that
>> enables the use of an in-tree .gitconfig file:
>>
>>     git config core.useTreeConfig true
>>
>> (or whatever the correct naming convention would be).  This option would
>> default to false, so the in-tree config would only occur following an
>> affirmative decision by the cloner.
> 
> But once you've verified that the config looks OK and turned this option
> on, how do you protect yourself from malicious config entering the repo
> through a fetch?

For most software projects, the user does

    git pull
    make

daily.  There is nothing that a nasty .gitconfig can do that can't be
done more easily by a nasty Makefile (or anything else in the build
process).  The moment I pull from Junio's repository and run a build
without having personally done a full code review first, I've given
Junio complete pownership of my account.  For projects in which I invest
such trust, "core.useTreeConfig=true" would be a convenience that does
not appreciably increase my risk.

>> If finer granularity is required, config options could be classified
>> into "security-relevant" and "non-security-relevant" options, where the
>> former group basically includes anything that can cause arbitrary code
>> to be executed.  Then there could be a "core.useTreeConfig = safeonly"
>> option to allow only the harmless options.
> 
> This is the "can of worms" I referred to earlier. You have to start
> classifying each option.

It is not a "can of worms" in the sense of insoluble.  It is only a lot
of work.  We could start today by setting a default that all config
options are dangerous, incrementally opening up options as they are vetted.

>> I think the priority of config options (highest to lowest) should be
>>
>>        $GIT_DIR/config
>>        in-tree .gitconfig
>>        ~/.gitconfig
>>        $(prefix)/etc/gitconfig
> 
> One catch with any precedence scheme is that there are certain config
> constructs that cannot be overridden.

Good point, and good suggestion to allow config constructs to be cleared.

>> Of course, just like for attributes, it would have to be decided which
>> version of the .gitconfig to use in which situations.
> 
> I'm not sure it makes sense to have it in the tree at all. For
> attributes it makes sense, because you are annotating a path at a
> _specific_ revision. But config is often much more meta- than that.
> Take textconv for an example. The gitattributes say "foo.pdf should use
> the 'pdf' diff driver". That makes sense to go in a revision. But the
> config will say "pdf files can be converted to text using
> /usr/bin/pdftotext". That is not something that is tied to the revision
> at all, and should exist outside of any revision. I.e., whether I am
> doing a "git show" on the HEAD, or on some ancient commit, I would want
> to use the same value, not whatever tool I used to convert PDFs years
> ago.

There are obviously config options that should not be versioned.  These
should not be stored in the tree.  (Perhaps they should be stored in
another branch, as you suggested.)  But there's no need to prevent
people from versioning *any* config options just because they shouldn't
version *all* of them.

I think that your "config-file includes" proposal would make all of this
possible or nearly possible, but I'll have to read your proposal more
carefully before I can comment.


While we are on the topic of config settings, I have often thought that
it would be nice for git's default settings to be set via a
well-commented config file, installed along with git, rather than via
values compiled into the code.  This file and Documentation/config.txt
could be generated from a single source file as part of the build
process.  Advantages:

1. It would give a single place for people to see exactly what defaults
are being applied, and a good starting point for seeing what can be
changed locally.

2. It would ensure that the defaults listed in the documentation always
match those being used, and prevent different default values from being
used in different places in the code.

3. It would make it easy to discover what default values have changed
between two git versions (i.e., what new knobs there are to turn).

4. It would make it easy for packagers to tweak the built-in defaults,
and would make it obvious what defaults packagers had changed.  (Of
course packagers can already just include an /etc/gitconfig, but it
would be better to reserve that file for sitewide changes).

The priority would obviously be:

        $GIT_DIR/config
        in-tree .gitconfig
        ~/.gitconfig
        $(prefix)/etc/gitconfig
        git's installed defaults

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-24 11:03                     ` Michael Haggerty
@ 2011-09-26  4:09                       ` Junio C Hamano
  2011-09-26  4:28                         ` Michael Haggerty
  2011-09-26 11:05                       ` Jeff King
  1 sibling, 1 reply; 31+ messages in thread
From: Junio C Hamano @ 2011-09-26  4:09 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Jeff King, Jay Soffian, git discussion list, Jakub Narebski

Michael Haggerty <mhagger@alum.mit.edu> writes:

> On 09/24/2011 08:15 AM, Jeff King wrote:
> For most software projects, the user does
>
>     git pull
>     make
>
> daily.  There is nothing that a nasty .gitconfig can do that can't be
> done more easily by a nasty Makefile (or anything else in the build
> process).  The moment I pull from Junio's repository and run a build
> without having personally done a full code review first, I've given
> Junio complete pownership of my account.

I suspect that argument is somewhat leaky.

Will I be the _only_ one you will be pulling from?  What if I were not so
careful and relay a contaminated in-tree configuration file (which I would
never use myself) to trusting downstream users like you?

>> I'm not sure it makes sense to have it in the tree at all. For
>> attributes it makes sense, because you are annotating a path at a
>> _specific_ revision. But config is often much more meta- than that.
>> Take textconv for an example. The gitattributes say "foo.pdf should use
>> the 'pdf' diff driver". That makes sense to go in a revision. But the
>> config will say "pdf files can be converted to text using
>> /usr/bin/pdftotext". That is not something that is tied to the revision
>> at all, and should exist outside of any revision. I.e., whether I am
>> doing a "git show" on the HEAD, or on some ancient commit, I would want
>> to use the same value, not whatever tool I used to convert PDFs years
>> ago.

I agree 100% with Peff on this point. What pdfviewer is configured to be
used for my repository is tied closely to what I happen to have installed
on my box that I use that repository on. This is not by accident but by
design why attributes only specify what _kind_ of payload is in the path,
and leave it up to the config to say _how_ to handle each kind of payload
on the box).

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-26  4:09                       ` Junio C Hamano
@ 2011-09-26  4:28                         ` Michael Haggerty
  0 siblings, 0 replies; 31+ messages in thread
From: Michael Haggerty @ 2011-09-26  4:28 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Jay Soffian, git discussion list, Jakub Narebski

On 09/26/2011 06:09 AM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
> 
>> On 09/24/2011 08:15 AM, Jeff King wrote:
>> For most software projects, the user does
>>
>>     git pull
>>     make
>>
>> daily.  There is nothing that a nasty .gitconfig can do that can't be
>> done more easily by a nasty Makefile (or anything else in the build
>> process).  The moment I pull from Junio's repository and run a build
>> without having personally done a full code review first, I've given
>> Junio complete pownership of my account.
> 
> I suspect that argument is somewhat leaky.
> 
> Will I be the _only_ one you will be pulling from?  What if I were not so
> careful and relay a contaminated in-tree configuration file (which I would
> never use myself) to trusting downstream users like you?

I'm not saying that trusting in-tree configuration files makes sense for
everybody, and in the open-source world one must be very careful about
doing so.  But I think that among closely-cooperating groups (e.g.,
personal projects, many projects in industry) (1) we don't pull from
outside the group and (2) we are forced to trust each other pretty
completely anyway.  So I think that there are many sane use-cases for
giving users the opportunity to "bless" an in-tree config.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-24 11:03                     ` Michael Haggerty
  2011-09-26  4:09                       ` Junio C Hamano
@ 2011-09-26 11:05                       ` Jeff King
  2011-09-26 14:14                         ` Jakub Narebski
  2011-09-26 15:11                         ` Michael Haggerty
  1 sibling, 2 replies; 31+ messages in thread
From: Jeff King @ 2011-09-26 11:05 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Jay Soffian, Junio C Hamano, git discussion list, Jakub Narebski

On Sat, Sep 24, 2011 at 01:03:50PM +0200, Michael Haggerty wrote:

> > But once you've verified that the config looks OK and turned this option
> > on, how do you protect yourself from malicious config entering the repo
> > through a fetch?
> 
> For most software projects, the user does
> 
>     git pull
>     make
> 
> daily.  There is nothing that a nasty .gitconfig can do that can't be
> done more easily by a nasty Makefile (or anything else in the build
> process).  The moment I pull from Junio's repository and run a build
> without having personally done a full code review first, I've given
> Junio complete pownership of my account.  For projects in which I invest
> such trust, "core.useTreeConfig=true" would be a convenience that does
> not appreciably increase my risk.

True. I think it would be nice for their to be a less risky option, too,
though.  Which was part of the motivation in my earlier proposal for
specifying a ref that contains config. Then you can ignore it, track
your own local config which is derived from upstream, or do the easy but
dangerous thing of just using the remote tracking ref directly.

> There are obviously config options that should not be versioned.  These
> should not be stored in the tree.  (Perhaps they should be stored in
> another branch, as you suggested.)  But there's no need to prevent
> people from versioning *any* config options just because they shouldn't
> version *all* of them.

Do you have an example of a config option whose old value should be
respected when looking at an old version? In general, the split between
attributes and config was meant to split what is tied to a specific
version, and what should apply to all versions. We may not have followed
that in all cases, but if so, the right solution may actually be
migrating the functionality into .gitattributes.

> While we are on the topic of config settings, I have often thought that
> it would be nice for git's default settings to be set via a
> well-commented config file, installed along with git, rather than via
> values compiled into the code.  This file and Documentation/config.txt
> could be generated from a single source file as part of the build
> process.

I think that can be a nice piece of documentation, but there may be some
complications.  I seem to recall that there may be one or two options
whose builtin values cannot be replicated via config (i.e., the "unset"
state means something). But I may be misremembering.

However, I'm not sure what you mean by "rather than via values compiled
into the code".  Would you somehow generate code that sets the default
according to your master file? Would git fail to start if the file is
missing? Or if a specific config option is missing? How would you track
that?

-Peff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-26 11:05                       ` Jeff King
@ 2011-09-26 14:14                         ` Jakub Narebski
  2011-09-26 15:11                         ` Michael Haggerty
  1 sibling, 0 replies; 31+ messages in thread
From: Jakub Narebski @ 2011-09-26 14:14 UTC (permalink / raw)
  To: Jeff King
  Cc: Michael Haggerty, Jay Soffian, Junio C Hamano, git discussion list

Jeff King wrote:
> On Sat, Sep 24, 2011 at 01:03:50PM +0200, Michael Haggerty wrote:
 
> > While we are on the topic of config settings, I have often thought that
> > it would be nice for git's default settings to be set via a
> > well-commented config file, installed along with git, rather than via
> > values compiled into the code.  This file and Documentation/config.txt
> > could be generated from a single source file as part of the build
> > process.
> 
> I think that can be a nice piece of documentation, but there may be some
> complications.  I seem to recall that there may be one or two options
> whose builtin values cannot be replicated via config (i.e., the "unset"
> state means something). But I may be misremembering.
> 
> However, I'm not sure what you mean by "rather than via values compiled
> into the code".  Would you somehow generate code that sets the default
> according to your master file? Would git fail to start if the file is
> missing? Or if a specific config option is missing? How would you track
> that?

There is also problem that it would screw up Git deprecation policy.
 * new behavior is introduced, with a knob that defaults to off
 * after some time knob starts to default to on
 * git uses new feature by default

With Git creating config file with state of config variables frozen at
the state of repository creation this would be not possible.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-26 11:05                       ` Jeff King
  2011-09-26 14:14                         ` Jakub Narebski
@ 2011-09-26 15:11                         ` Michael Haggerty
  1 sibling, 0 replies; 31+ messages in thread
From: Michael Haggerty @ 2011-09-26 15:11 UTC (permalink / raw)
  To: Jeff King
  Cc: Jay Soffian, Junio C Hamano, git discussion list, Jakub Narebski

On 09/26/2011 01:05 PM, Jeff King wrote:
> On Sat, Sep 24, 2011 at 01:03:50PM +0200, Michael Haggerty wrote:
>> There are obviously config options that should not be versioned.  These
>> should not be stored in the tree.  (Perhaps they should be stored in
>> another branch, as you suggested.)  But there's no need to prevent
>> people from versioning *any* config options just because they shouldn't
>> version *all* of them.
> 
> Do you have an example of a config option whose old value should be
> respected when looking at an old version? In general, the split between
> attributes and config was meant to split what is tied to a specific
> version, and what should apply to all versions. We may not have followed
> that in all cases, but if so, the right solution may actually be
> migrating the functionality into .gitattributes.

I hadn't realized this point about the split between gitconfig and
gitattributes.  It makes sense.  I'd been thinking that it would be nice
to be able to share changes to the git config the same way that source
code is shared, but you are right that the git config is versioned
differently than the source code.  I can't think of any counterexamples
right now.

>> While we are on the topic of config settings, I have often thought that
>> it would be nice for git's default settings to be set via a
>> well-commented config file, installed along with git, rather than via
>> values compiled into the code.  This file and Documentation/config.txt
>> could be generated from a single source file as part of the build
>> process.
> 
> I think that can be a nice piece of documentation, but there may be some
> complications.  I seem to recall that there may be one or two options
> whose builtin values cannot be replicated via config (i.e., the "unset"
> state means something). But I may be misremembering.

This is more a reason to implement a way to "unset" options than an
argument against storing git's builtin defaults in a gitconfig-formatted
file.

> However, I'm not sure what you mean by "rather than via values compiled
> into the code".  Would you somehow generate code that sets the default
> according to your master file? Would git fail to start if the file is
> missing? Or if a specific config option is missing? How would you track
> that?

I think that git should read its "as-installed" config file at runtime
to maximize the transparency.  If the file is missing then git should
fail, just as it would if some other important part of git were missing
from an install.  The absence of a required config option would be a bug
and might cause git to abort just like any other bug.

I probably shouldn't mention this lest people realize that I'm stupid
and ugly, but...

Subversion installs a file with all of its defaults into your
~/.subversion directory the first time that it runs.  I find this
feature to be very helpful as a source of examples and to learn new
features.  But they do it a bit differently than I am proposing for
git--the file is written to the user's ~/.subversion, and is
subsequently edited by the user.  Because the file lives across
Subversion upgrades, the options in this file are *commented out* to
prevent them causing trouble in the future; i.e., they are for human
consumption only.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2011-09-21 20:17 ` Junio C Hamano
  2011-09-22  8:28   ` Michael Haggerty
@ 2012-02-17 18:42   ` Michael Haggerty
  2012-02-17 19:26     ` Junio C Hamano
  1 sibling, 1 reply; 31+ messages in thread
From: Michael Haggerty @ 2012-02-17 18:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git discussion list

On 09/21/2011 10:17 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
>> I was thinking of using git attributes to configure a server-side
>> "update" hook that does some basic sanity checking before accepting a
>> push.  I thought I could do something like
>> [...]
>> I see that there is an (undocumented) API involving
>> git_attr_set_direction() that seems to let gitattributes to be read out
>> of the index instead of the working tree.  But I am still confused:
> 
> The words "server side" automatically mean that there should be no working
> tree, and where there is no working tree there should be no index, so the
> direction should not make any difference.  The attributes that are used to
> help whitespace checks should come from project.git/info/attributes in
> such a case [*...*].

I was just alerted by Scott Chacon's blog [1] to the fact that one can
set GIT_INDEX_FILE to an arbitrary filename, thereby causing the index
to be read/written from that file instead of $GIT_DIR/index.  It
occurred to me that this feature, along with the addition of "git
check-attr --cached" in 1.7.8, can be used to give server-side access to
the gitattributes for an arbitrary commit:

    GIT_INDEX_FILE="$(tempfile)"
    export GIT_INDEX_FILE
    git read-tree $COMMIT
    git check-attr --cached attr -- pathname
    ...
    rm "$GIT_INDEX_FILE"

Empirically it seems to work (and it is surprisingly fast).  The use of
a temporary file prevents simultaneous accesses to the repository from
stepping on each other.  This looks like a clean solution to my problem.
 Or is there some hidden pitfall in this approach?

Michael

[1] http://progit.org/2010/04/11/environment.html

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2012-02-17 18:42   ` Michael Haggerty
@ 2012-02-17 19:26     ` Junio C Hamano
  2012-02-17 19:59       ` Junio C Hamano
  0 siblings, 1 reply; 31+ messages in thread
From: Junio C Hamano @ 2012-02-17 19:26 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git discussion list

Michael Haggerty <mhagger@alum.mit.edu> writes:

> I was just alerted by Scott Chacon's blog [1] to the fact that one can
> set GIT_INDEX_FILE to an arbitrary filename, thereby causing the index
> to be read/written from that file instead of $GIT_DIR/index.

That's very old fashioned.  For almost five years, the preferred way to
say that has been "git read-tree --index-output=file" ;-)

But what you outlined should work.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: How to use git attributes to configure server-side checks?
  2012-02-17 19:26     ` Junio C Hamano
@ 2012-02-17 19:59       ` Junio C Hamano
  0 siblings, 0 replies; 31+ messages in thread
From: Junio C Hamano @ 2012-02-17 19:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Michael Haggerty, git discussion list

Junio C Hamano <gitster@pobox.com> writes:

> Michael Haggerty <mhagger@alum.mit.edu> writes:
>
>> I was just alerted by Scott Chacon's blog [1] to the fact that one can
>> set GIT_INDEX_FILE to an arbitrary filename, thereby causing the index
>> to be read/written from that file instead of $GIT_DIR/index.
>
> That's very old fashioned.  For almost five years, the preferred way to
> say that has been "git read-tree --index-output=file" ;-)

Ah, I take that back.

There is any benefit to be had by using "read-tree --index-output" in the
way used in your check-attr example.  Setting up a single GIT_INDEX_FILE
and using it throughout as you did is indeed much more preferrable.

> But what you outlined should work.

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2012-02-17 20:00 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-21 19:32 How to use git attributes to configure server-side checks? Michael Haggerty
2011-09-21 20:02 ` Jay Soffian
2011-09-21 20:17 ` Junio C Hamano
2011-09-22  8:28   ` Michael Haggerty
2011-09-22 15:41     ` Jay Soffian
2011-09-22 17:13       ` Jeff King
2011-09-22 18:41         ` Jay Soffian
2011-09-22 19:22           ` Junio C Hamano
2011-09-22 20:58           ` Jeff King
2011-09-22 21:04             ` Jeff King
2011-09-23 10:06             ` Michael Haggerty
2011-09-23 19:33               ` Jeff King
2011-09-23 19:40                 ` Junio C Hamano
2011-09-23 19:44                   ` Jeff King
2011-09-24  6:05                 ` Michael Haggerty
2011-09-24  6:15                   ` Jeff King
2011-09-24 11:03                     ` Michael Haggerty
2011-09-26  4:09                       ` Junio C Hamano
2011-09-26  4:28                         ` Michael Haggerty
2011-09-26 11:05                       ` Jeff King
2011-09-26 14:14                         ` Jakub Narebski
2011-09-26 15:11                         ` Michael Haggerty
2011-09-22 17:26     ` Junio C Hamano
2011-09-23  8:35       ` Michael Haggerty
2011-09-23 12:49         ` Stephen Bash
2011-09-23 13:31           ` Michael Haggerty
2011-09-22 22:54     ` Jakub Narebski
2011-09-23 10:38       ` Michael Haggerty
2012-02-17 18:42   ` Michael Haggerty
2012-02-17 19:26     ` Junio C Hamano
2012-02-17 19:59       ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.