* Weirdness with git change detection
@ 2017-07-10 23:45 Peter Eckersley
  2017-07-11  4:15 ` Torsten Bögershausen
  2017-07-11 15:57 ` Junio C Hamano
  0 siblings, 2 replies; 11+ messages in thread
From: Peter Eckersley @ 2017-07-10 23:45 UTC
  To: git

I have a local git repo that's in some weird state where changes
appear to be detected by "git diff" and prevent operations like "git
checkout" from switching branches, but those changes are not removed
by a "git reset --hard" or "git stash".

Here's an example of the behaviour, with "git reset --hard" failing to
clear a diff in the index:

https://paste.debian.net/975811/

Happy to collect additional debugging information if it's useful.
-- 
Peter

* Re: Weirdness with git change detection
  2017-07-10 23:45 Weirdness with git change detection Peter Eckersley
@ 2017-07-11  4:15 ` Torsten Bögershausen
  2017-07-11  7:06   ` Jeff King
  2017-07-11 15:57 ` Junio C Hamano
  1 sibling, 1 reply; 11+ messages in thread
From: Torsten Bögershausen @ 2017-07-11  4:15 UTC
  To: Peter Eckersley, git



On 11/07/17 01:45, Peter Eckersley wrote:
> I have a local git repo that's in some weird state where changes
> appear to be detected by "git diff" and prevent operations like "git
> checkout" from switching branches, but those changes are not removed
> by a "git reset --hard" or "git stash".
> 
> Here's an example of the behaviour, with "git reset --hard" failing to
> clear a diff in the index:
> 
> https://paste.debian.net/975811/
> 
> Happy to collect additional debugging information if it's useful.
> 
If possible, we need to clone the repo and debug it ourselves; in other
words, the problem must be reproducible for others.

Is the repo public?
Which OS and Git version are you using?


* Re: Weirdness with git change detection
  2017-07-11  4:15 ` Torsten Bögershausen
@ 2017-07-11  7:06   ` Jeff King
  2017-07-11  7:31     ` Peter Eckersley
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff King @ 2017-07-11  7:06 UTC
  To: Torsten Bögershausen; +Cc: Peter Eckersley, git

On Tue, Jul 11, 2017 at 06:15:17AM +0200, Torsten Bögershausen wrote:

> On 11/07/17 01:45, Peter Eckersley wrote:
> > I have a local git repo that's in some weird state where changes
> > appear to be detected by "git diff" and prevent operations like "git
> > checkout" from switching branches, but those changes are not removed
> > by a "git reset --hard" or "git stash".
> > 
> > Here's an example of the behaviour, with "git reset --hard" failing to
> > clear a diff in the index:
> > 
> > https://paste.debian.net/975811/
> > 
> > Happy to collect additional debugging information if it's useful.
> > 
> If possible, we need to clone the repo and debug it ourselves; in other
> words, the problem must be reproducible for others.
> 
> Is the repo public?

It looks like https://github.com/AI-metrics/AI-metrics.

I notice it has a .gitattributes file with:

  *.ipynb filter=clean_ipynb

There's a config snippet in the repo with:

  [filter "clean_ipynb"]
    clean = ipynb_drop_output
    smudge = cat

and the drop_output script is included. From the paste we can see that
Peter was at commit c464aaa. Checking out that commit and running the
script shows that it produces the differences that Git is showing.

The problem is that the currently committed contents do not match the
output of the clean filter. So even when "git reset --hard" puts the
content from the repository back into the working tree (putting it
through the smudge filter, which is a noop), running the clean filter on
the result will always have a difference. Either the filter needs to be
disabled, or the cleaned contents committed.
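A minimal Python sketch of the mismatch described above, under stated assumptions: `clean` is a hypothetical stand-in for the repo's ipynb_drop_output script (assumed here to empty the "outputs" of every code cell; the real script may do more or less), and `smudge` is the identity, matching `smudge = cat`. A committed notebook that still carries outputs never equals the cleaned version of its own checkout, so it keeps showing as modified; once the cleaned content is committed, the round trip is stable.

```python
import json

def clean(text: str) -> str:
    """Hypothetical stand-in for ipynb_drop_output: drop code-cell outputs."""
    nb = json.loads(text)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
    return json.dumps(nb, indent=1, sort_keys=True)

def smudge(text: str) -> str:
    """`smudge = cat` is the identity function."""
    return text

def looks_modified(committed: str) -> bool:
    """After "git reset --hard" the work tree holds smudge(committed);
    Git compares clean() of that file against the committed blob."""
    return clean(smudge(committed)) != committed

raw = json.dumps(
    {"cells": [{"cell_type": "code", "source": "1+1", "outputs": [{"text": "2"}]}]},
    indent=1, sort_keys=True,
)
print(looks_modified(raw))         # True: committed content was never cleaned
print(looks_modified(clean(raw)))  # False: cleaned content round-trips
```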

-Peff

* Re: Weirdness with git change detection
  2017-07-11  7:06   ` Jeff King
@ 2017-07-11  7:31     ` Peter Eckersley
  2017-07-11  7:34       ` Jeff King
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Eckersley @ 2017-07-11  7:31 UTC
  To: Jeff King; +Cc: Torsten Bögershausen, git

I did try to test that hypothesis by editing the filter to be a no-op,
but it's possible I got that wrong. My apologies for bugging the list!

On 11 July 2017 at 00:06, Jeff King <peff@peff.net> wrote:
> On Tue, Jul 11, 2017 at 06:15:17AM +0200, Torsten Bögershausen wrote:
>
>> On 11/07/17 01:45, Peter Eckersley wrote:
>> > I have a local git repo that's in some weird state where changes
>> > appear to be detected by "git diff" and prevent operations like "git
>> > checkout" from switching branches, but those changes are not removed
>> > by a "git reset --hard" or "git stash".
>> >
>> > Here's an example of the behaviour, with "git reset --hard" failing to
>> > clear a diff in the index:
>> >
>> > https://paste.debian.net/975811/
>> >
>> > Happy to collect additional debugging information if it's useful.
>> >
>> If possible, we need to clone the repo and debug it ourselves; in other
>> words, the problem must be reproducible for others.
>>
>> Is the repo public?
>
> It looks like https://github.com/AI-metrics/AI-metrics.
>
> I notice it has a .gitattributes file with:
>
>   *.ipynb filter=clean_ipynb
>
> There's a config snippet in the repo with:
>
>   [filter "clean_ipynb"]
>     clean = ipynb_drop_output
>     smudge = cat
>
> and the drop_output script is included. From the paste we can see that
> Peter was at commit c464aaa. Checking out that commit and running the
> script shows that it produces the differences that Git is showing.
>
> The problem is that the currently committed contents do not match the
> output of the clean filter. So even when "git reset --hard" puts the
> content from the repository back into the working tree (putting it
> through the smudge filter, which is a noop), running the clean filter on
> the result will always have a difference. Either the filter needs to be
> disabled, or the cleaned contents committed.
>
> -Peff



-- 
Peter

* Re: Weirdness with git change detection
  2017-07-11  7:31     ` Peter Eckersley
@ 2017-07-11  7:34       ` Jeff King
  2017-07-11  8:20         ` Torsten Bögershausen
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff King @ 2017-07-11  7:34 UTC
  To: Peter Eckersley; +Cc: Torsten Bögershausen, git

On Tue, Jul 11, 2017 at 12:31:25AM -0700, Peter Eckersley wrote:

> I did try to test that hypothesis by editing the filter to be a no-op,
> but it's possible I got that wrong. My apologies for bugging the list!

No problem. I actually think it would be interesting if Git could
somehow detect and warn about this situation. But the obvious way to do
that would be to re-run the clean filter directly after checkout. And
doing that all the time is expensive.

Perhaps some kind of "lint" program would be interesting to warn of
possible misconfigurations. Of course people would have to run it for it
to be useful. :)

-Peff

* Re: Weirdness with git change detection
  2017-07-11  7:34       ` Jeff King
@ 2017-07-11  8:20         ` Torsten Bögershausen
  2017-07-11  8:24           ` Jeff King
  0 siblings, 1 reply; 11+ messages in thread
From: Torsten Bögershausen @ 2017-07-11  8:20 UTC
  To: Jeff King, Peter Eckersley; +Cc: git



On 11/07/17 09:34, Jeff King wrote:
> On Tue, Jul 11, 2017 at 12:31:25AM -0700, Peter Eckersley wrote:
> 
>> I did try to test that hypothesis by editing the filter to be a no-op,
>> but it's possible I got that wrong. My apologies for bugging the list!

Actually I like this kind of feedback, to see how people are using Git,
including the problems they have.

> 
> No problem. I actually think it would be interesting if Git could
> somehow detect and warn about this situation. But the obvious way to do
> that would be to re-run the clean filter directly after checkout. And
> doing that all the time is expensive.

Would it be possible to limit the round-trip check to "git reset --hard"?
If so, many users might be willing to pay the price, if Git tells them
"your filters don't round-trip" (including CRLF conversions).

> 
> Perhaps some kind of "lint" program would be interesting to warn of
> possible misconfigurations. Of course people would have to run it for it
> to be useful. :)

What do you have in mind here?
Don't we need to run some content through the filter(s)?

* Re: Weirdness with git change detection
  2017-07-11  8:20         ` Torsten Bögershausen
@ 2017-07-11  8:24           ` Jeff King
  2017-08-18  6:56             ` Michael J Gruber
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff King @ 2017-07-11  8:24 UTC
  To: Torsten Bögershausen; +Cc: Peter Eckersley, git

On Tue, Jul 11, 2017 at 10:20:43AM +0200, Torsten Bögershausen wrote:

> > No problem. I actually think it would be interesting if Git could
> > somehow detect and warn about this situation. But the obvious way to do
> > that would be to re-run the clean filter directly after checkout. And
> > doing that all the time is expensive.
> 
> Would it be possible to limit the round-trip check to "git reset --hard"?
> If so, many users might be willing to pay the price, if Git tells them
> "your filters don't round-trip" (including CRLF conversions).

Anything's possible, I suppose. But I don't think I'd want that feature
turned on myself.

> > Perhaps some kind of "lint" program would be interesting to warn of
> > possible misconfigurations. Of course people would have to run it for it
> > to be useful. :)
> 
> What do you have in mind here?
> Don't we need to run some content through the filter(s)?

I was thinking of a tool that could run a series of checks on the
repository and nag about potential problems. One of them could be doing
a round-trip repo->clean->smudge for each file.
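A rough Python sketch of what such a round-trip check might do, assuming the stored blobs are available as a path-to-bytes mapping and the filters as plain functions. There is no "git lint" command; `round_trip_lint`, `crlf_to_lf` and `identity` are hypothetical names for illustration. The check reports every path where cleaning the smudged blob does not give back the stored blob, i.e. files that would look modified right after a checkout. The example uses a CRLF-normalizing clean side, since line-ending conversion was raised above.

```python
from typing import Callable, Dict, List

Filter = Callable[[bytes], bytes]

def round_trip_lint(blobs: Dict[str, bytes], clean: Filter, smudge: Filter) -> List[str]:
    """Hypothetical lint check: report paths where clean(smudge(blob)) != blob.
    Such paths would show as modified right after "git reset --hard"."""
    problems = []
    for path, blob in sorted(blobs.items()):
        if clean(smudge(blob)) != blob:
            problems.append(path)
    return problems

def crlf_to_lf(data: bytes) -> bytes:
    """Example clean side: normalize CRLF to LF."""
    return data.replace(b"\r\n", b"\n")

def identity(data: bytes) -> bytes:
    """Example smudge side: a no-op, like `smudge = cat`."""
    return data

blobs = {"a.txt": b"one\ntwo\n", "b.txt": b"one\r\ntwo\r\n"}
print(round_trip_lint(blobs, clean=crlf_to_lf, smudge=identity))  # ['b.txt']
```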

Another one might be warning about files that differ only in case.
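A small Python sketch of the case check, assuming the tracked paths are available as a list of strings (for example, from the output of `git ls-files`). `case_collisions` is a hypothetical helper; it uses `str.casefold()` as an approximation, while real case-insensitive filesystems apply their own folding rules, so treat the result as a heuristic.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

def case_collisions(paths: Iterable[str]) -> List[List[str]]:
    """Group tracked paths that differ only in case; on a case-insensitive
    filesystem only one file of each group can exist in the work tree."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    return [sorted(group) for _, group in sorted(groups.items()) if len(group) > 1]

print(case_collisions(["README", "readme", "src/a.c", "Makefile"]))
# [['README', 'readme']]
```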

The idea being that users could run "git lint" if they suspect something
funny is going on. I dunno. It may be a dead-end. Most such
oddities are better detected and handled during actual git operations if
we can. So this would really just be for things that are too expensive
to detect in normal operations.

-Peff

* Re: Weirdness with git change detection
  2017-07-10 23:45 Weirdness with git change detection Peter Eckersley
  2017-07-11  4:15 ` Torsten Bögershausen
@ 2017-07-11 15:57 ` Junio C Hamano
  2017-07-11 16:04   ` Junio C Hamano
  1 sibling, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2017-07-11 15:57 UTC
  To: Peter Eckersley; +Cc: git

Peter Eckersley <peter.eckersley@gmail.com> writes:

> I have a local git repo that's in some weird state where changes
> appear to be detected by "git diff" and prevent operations like "git
> checkout" from switching branches, but those changes are not removed
> by a "git reset --hard" or "git stash".
>
> Here's an example of the behaviour, with "git reset --hard" failing to
> clear a diff in the index:
>
> https://paste.debian.net/975811/
>
> Happy to collect additional debugging information if it's useful.

Do you have any funny clean-smudge pair that does not round-trip?

* Re: Weirdness with git change detection
  2017-07-11 15:57 ` Junio C Hamano
@ 2017-07-11 16:04   ` Junio C Hamano
  0 siblings, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2017-07-11 16:04 UTC
  To: Peter Eckersley; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Peter Eckersley <peter.eckersley@gmail.com> writes:
>
>> I have a local git repo that's in some weird state where changes
>> appear to be detected by "git diff" and prevent operations like "git
>> checkout" from switching branches, but those changes are not removed
>> by a "git reset --hard" or "git stash".
>>
>> Here's an example of the behaviour, with "git reset --hard" failing to
>> clear a diff in the index:
>>
>> https://paste.debian.net/975811/
>>
>> Happy to collect additional debugging information if it's useful.
>
> Do you have any funny clean-smudge pair that does not round-trip?

Ah, never mind.  Peff's analysis looks correct.  Thanks for a report
to provoke a good discussion.

* Re: Weirdness with git change detection
  2017-07-11  8:24           ` Jeff King
@ 2017-08-18  6:56             ` Michael J Gruber
  2017-08-18 11:57               ` Jeff King
  0 siblings, 1 reply; 11+ messages in thread
From: Michael J Gruber @ 2017-08-18  6:56 UTC
  To: Jeff King, Torsten Bögershausen; +Cc: Peter Eckersley, git

Jeff King came, saw, and said on 11.07.2017 10:24:
> On Tue, Jul 11, 2017 at 10:20:43AM +0200, Torsten Bögershausen wrote:
> 
>>> No problem. I actually think it would be interesting if Git could
>>> somehow detect and warn about this situation. But the obvious way to do
>>> that would be to re-run the clean filter directly after checkout. And
>>> doing that all the time is expensive.
>>
>> Would it be possible to limit the round-trip check to "git reset --hard"?
>> If so, many users might be willing to pay the price, if Git tells them
>> "your filters don't round-trip" (including CRLF conversions).
> 
> Anything's possible, I suppose. But I don't think I'd want that feature
> turned on myself.
> 
>>> Perhaps some kind of "lint" program would be interesting to warn of
>>> possible misconfigurations. Of course people would have to run it for it
>>> to be useful. :)
>>
>> What do you have in mind here?
>> Don't we need to run some content through the filter(s)?
> 
> I was thinking of a tool that could run a series of checks on the
> repository and nag about potential problems. One of them could be doing
> a round-trip repo->clean->smudge for each file.
> 
> Another one might be warning about files that differ only in case.
> 
> The idea being that users could run "git lint" if they suspect something
> funny is going on. I dunno. It may be a dead-end. Most such
> oddities are better detected and handled during actual git operations if
> we can. So this would really just be for things that are too expensive
> to detect in normal operations.
> 
> -Peff
> 

Typically, that problem arises when you turn a filter on or off at some
point in your history. Since "attributes" can come from various sources,
especially the versioned ".gitattributes" file, unversioned per-repo
.git/info/attributes, and global attributes, "git diff" may apply
different attributes depending on what you diff (versioned blob, workdir
file, out-of-tree file).

This is not made easier by the fact that unversioned config (per repo,
per user, global) defines the filter action, and that even upgrades of
your filter tools may change the output. So, "filter off/on" is by no
means the only possible source of discrepancies.

I've found that when I decide to use a filter like that, the best
approach is to either apply it retroactively (filter-branch,
unversioned attributes, that is, clean all stored blobs) or make a
commit where I specifically note the switch (versioned .gitattributes
plus affected blob changes) and what config should go along with it.

All of this is difficult to check or correct automatically, since it
depends on user decisions.

About the only thing we could do is checking that
"clean(smudge(foo))=clean(foo)" at a specific "point in time"
(attributes, config) for specific foo, but that wouldn't catch the case
above, even if we iterated over all commits which affect files that the
filter (currently) applies to.
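A minimal Python sketch of why the point-in-time check misses the history case, assuming a CRLF filter pair written as plain functions (`to_lf` as clean, `to_crlf` as smudge; both names are hypothetical). The pair satisfies clean(smudge(foo)) = clean(foo), yet a blob committed with CRLF before the filter was switched on is still stale: cleaning its checkout does not reproduce the stored bytes, so the file shows as modified.

```python
from typing import Callable

Filter = Callable[[bytes], bytes]

def to_lf(data: bytes) -> bytes:
    """Clean side: normalize CRLF line endings to LF."""
    return data.replace(b"\r\n", b"\n")

def to_crlf(data: bytes) -> bytes:
    """Smudge side: normalize first, then emit CRLF, so CRs are never doubled."""
    return to_lf(data).replace(b"\n", b"\r\n")

def filters_consistent(foo: bytes, clean: Filter, smudge: Filter) -> bool:
    """Point-in-time check: clean(smudge(foo)) == clean(foo)."""
    return clean(smudge(foo)) == clean(foo)

# A blob committed with CRLF before the filter was switched on.
old_blob = b"line one\r\nline two\r\n"

print(filters_consistent(old_blob, to_lf, to_crlf))  # True: the filter pair is fine
print(to_lf(to_crlf(old_blob)) == old_blob)           # False: the stored blob is stale
```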

Keep in mind that filters are a killer feature, so if you shoot yourself
in the foot, it could have been worse ;)

Michael

* Re: Weirdness with git change detection
  2017-08-18  6:56             ` Michael J Gruber
@ 2017-08-18 11:57               ` Jeff King
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff King @ 2017-08-18 11:57 UTC
  To: Michael J Gruber; +Cc: Torsten Bögershausen, Peter Eckersley, git

On Fri, Aug 18, 2017 at 08:56:03AM +0200, Michael J Gruber wrote:

> > The idea being that users could run "git lint" if they suspect something
> > funny is going on. I dunno. It may be a dead-end. Most such
> > oddities are better detected and handled during actual git operations if
> > we can. So this would really just be for things that are too expensive
> > to detect in normal operations.
> 
> Typically, that problem arises when you turn a filter on or off at some
> point in your history. Since "attributes" can come from various sources,
> especially the versioned ".gitattributes" file, unversioned per-repo
> .git/info/attributes, and global attributes, "git diff" may apply
> different attributes depending on what you diff (versioned blob, workdir
> file, out-of-tree file).
> [...]

Yeah, I agree that we cannot catch every problematic case (for all the
reasons you give here). It does seem like a large number of problems
people report have to do with checkout of in-tree .gitattributes,
though. So my thinking was if we could cover that case, it might help
people (even if we leave many problems unnoticed).

But...

> I've found that when I decide to use a filter like that, the best
> approach is to either apply it retroactively (filter-branch,
> unversioned attributes, that is, clean all stored blobs) or make a
> commit where I specifically note the switch (versioned .gitattributes
> plus affected blob changes) and what config should go along with it.

One problem is that people need to know to run the lint command. And if
they know enough that this is a problem worthy of checking via a linter,
then they could perhaps just as easily do the in-tree blob changes.

I say "perhaps" because I don't think it's as easy as running a single
"git fix-my-stale-blobs". I wonder if rather than a linter, we ought to
just have an option to "git checkout" or something to ignore stat data
and re-checkout any entries for which convert_to_working_tree() isn't a
noop.

That serves both as a repair tool and as a linter (since running "git
diff" on the result would show you what needs to be fixed). It wouldn't
solve the user-education problem, but at least it would give a simple
solution that could be passed along to users.
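A rough Python simulation of the proposed repair mode, under stated assumptions: the index and work tree are modeled as path-to-bytes dicts, and `to_worktree`/`to_index` are hypothetical stand-ins for Git's internal conversions (the real code path is convert_to_working_tree() in C; nothing here is Git's API). Every entry whose conversion is not a no-op is rewritten regardless of stat data, and a diff-like pass then lists the blobs that would need re-committing.

```python
from typing import Callable, Dict, List

Convert = Callable[[bytes], bytes]

def to_lf(data: bytes) -> bytes:
    """Hypothetical index-side conversion: CRLF -> LF."""
    return data.replace(b"\r\n", b"\n")

def to_crlf(data: bytes) -> bytes:
    """Hypothetical work-tree-side conversion: LF -> CRLF, normalizing first."""
    return to_lf(data).replace(b"\n", b"\r\n")

def refresh_converted_entries(index: Dict[str, bytes], worktree: Dict[str, bytes],
                              to_worktree: Convert) -> List[str]:
    """Ignore stat data: re-checkout every entry whose conversion is not a no-op."""
    rewritten = []
    for path, blob in sorted(index.items()):
        converted = to_worktree(blob)
        if converted == blob:
            continue  # conversion is a no-op; leave the file alone
        worktree[path] = converted
        rewritten.append(path)
    return rewritten

def stale_paths(index: Dict[str, bytes], worktree: Dict[str, bytes],
                to_index: Convert) -> List[str]:
    """What "git diff" would then report: files whose cleaned form differs from the blob."""
    return [p for p, blob in sorted(index.items()) if to_index(worktree[p]) != blob]

index = {"a.txt": b"x\n", "b.txt": b"y\r\n"}
worktree = {"a.txt": b"x\n", "b.txt": b"y\r\n"}  # checked out before the eol setting
print(refresh_converted_entries(index, worktree, to_crlf))  # ['a.txt']
print(stale_paths(index, worktree, to_lf))                  # ['b.txt']
```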

I dunno. I don't do line ending conversion, so I don't really run into
this issue myself.

-Peff

Thread overview: 11+ messages
2017-07-10 23:45 Weirdness with git change detection Peter Eckersley
2017-07-11  4:15 ` Torsten Bögershausen
2017-07-11  7:06   ` Jeff King
2017-07-11  7:31     ` Peter Eckersley
2017-07-11  7:34       ` Jeff King
2017-07-11  8:20         ` Torsten Bögershausen
2017-07-11  8:24           ` Jeff King
2017-08-18  6:56             ` Michael J Gruber
2017-08-18 11:57               ` Jeff King
2017-07-11 15:57 ` Junio C Hamano
2017-07-11 16:04   ` Junio C Hamano