git.vger.kernel.org archive mirror
* Git should preserve modification times at least on request
@ 2018-02-19 21:22 Peter Backes
  2018-02-19 21:58 ` Johannes Schindelin
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Peter Backes @ 2018-02-19 21:22 UTC (permalink / raw)
  To: git

Hello,

please ensure to CC me if you reply as I am not subscribed to the list.

https://git.wiki.kernel.org/index.php/Git_FAQ#Why_isn.27t_Git_preserving_modification_time_on_files.3F 
argues that git isn't preserving modification times because it needs to 
ensure that build tools work properly.

I agree that modification times should not be restored by default, 
because of the principle of least astonishment. But should it be 
impossible? The principle of least astonishment does not mandate this; 
it is not a paternalistic principle.

Thus, I do not get at all
- why git doesn't *store* modification times, perhaps by default, but 
at least on request
- why git doesn't restore modification times *on request*

It is pretty annoying that git cannot, even if I know what I am doing, 
and explicitly want it to, preserve the modification time.

One use case: I have lots of files lying around in my build directory 
and for some of them, the modification time is important information to 
me. Those files are not at all used with the build tool. In contrast to 
git pull, git pull --rebase needs those to be stashed. But after the 
pull and unstash, the mtime is gone. Boo.

Please provide options to store and restore modification times. It 
shouldn't be hard to do, given that other metadata such as the mode is 
already stored. It would make life so much easier. And the fact that 
this has made it into the FAQ clearly suggests that there are many others 
who think so.

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Git should preserve modification times at least on request
  2018-02-19 21:22 Git should preserve modification times at least on request Peter Backes
@ 2018-02-19 21:58 ` Johannes Schindelin
  2018-02-19 22:08   ` Peter Backes
  2018-02-19 22:37   ` Randall S. Becker
  2018-02-20 21:16 ` Jeff King
  2018-02-21 21:03 ` Derek Fawcus
  2 siblings, 2 replies; 28+ messages in thread
From: Johannes Schindelin @ 2018-02-19 21:58 UTC (permalink / raw)
  To: Peter Backes; +Cc: git

Hi Peter,

On Mon, 19 Feb 2018, Peter Backes wrote:

> please ensure to CC me if you reply as I am not subscribed to the list.
> 
> https://git.wiki.kernel.org/index.php/Git_FAQ#Why_isn.27t_Git_preserving_modification_time_on_files.3F 
> argues that git isn't preserving modification times because it needs to 
> ensure that build tools work properly.
> 
> I agree that modification times should not be restored by default, 
> because of the principle of least astonishment. But should it be 
> impossible? The principle of least astonishment does not mandate this; 
> it is not a paternalistic principle.
> 
> Thus, I do not get at all
> - why git doesn't *store* modification times, perhaps by default, but 
> at least on request
> - why git doesn't restore modification times *on request*
> 
> It is pretty annoying that git cannot, even if I know what I am doing, 
> and explicitly want it to, preserve the modification time.
> 
> One use case: I have lots of files lying around in my build directory 
> and for some of them, the modification time is important information to 
> me. Those files are not at all used with the build tool. In contrast to 
> git pull, git pull --rebase needs those to be stashed. But after the 
> pull and unstash, the mtime is gone. Boo.
> 
> Please provide options to store and restore modification times. It 
> shouldn't be hard to do, given that other metadata such as the mode is 
> already stored. It would make life so much easier. And the fact that 
> this has made it into the FAQ clearly suggests that there are many others 
> who think so.

Since you already assessed that it shouldn't be hard to do, you probably
want to put your money where your mouth is and come up with a patch, and
then offer it up for discussion on this here mailing list.

Ciao,
Johannes


* Re: Git should preserve modification times at least on request
  2018-02-19 21:58 ` Johannes Schindelin
@ 2018-02-19 22:08   ` Peter Backes
  2018-02-20  1:22     ` Theodore Ts'o
  2018-02-20 10:46     ` Johannes Schindelin
  2018-02-19 22:37   ` Randall S. Becker
  1 sibling, 2 replies; 28+ messages in thread
From: Peter Backes @ 2018-02-19 22:08 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Hi Johannes,

On Mon, Feb 19, 2018 at 10:58:12PM +0100, Johannes Schindelin wrote:
> Since you already assessed that it shouldn't be hard to do, you probably
> want to put your money where your mouth is and come up with a patch, and
> then offer it up for discussion on this here mailing list.

Well, it would be good to discuss this a bit beforehand, since my time 
is wasted if there's no chance to get it accepted. Perhaps there is 
some counterargument I don't know about.

Is there some existing code that could be used? I think I read 
somewhere that git once did preserve mtimes, but that this code was 
removed because of the build tool issues. Perhaps that code could 
simply be put back in, and surrounded by conditions.

Best wishes
Peter

PS: Given the opportunity, I want to thank you very much for 
maintaining the git repository for my cvsclone tool.

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE


* RE: Git should preserve modification times at least on request
  2018-02-19 21:58 ` Johannes Schindelin
  2018-02-19 22:08   ` Peter Backes
@ 2018-02-19 22:37   ` Randall S. Becker
  2018-02-19 23:22     ` Hilco Wijbenga
  1 sibling, 1 reply; 28+ messages in thread
From: Randall S. Becker @ 2018-02-19 22:37 UTC (permalink / raw)
  To: 'Johannes Schindelin', 'Peter Backes'; +Cc: git

On February 19, 2018 4:58 PM Johannes wrote:
> On Mon, 19 Feb 2018, Peter Backes wrote:
> 
> > please ensure to CC me if you reply as I am not subscribed to the list.
> >
> > https://git.wiki.kernel.org/index.php/Git_FAQ#Why_isn.27t_Git_preservi
> > ng_modification_time_on_files.3F argues that git isn't preserving
> > modification times because it needs to ensure that build tools work
> > properly.
> >
> > I agree that modification times should not be restored by default,
> > because of the principle of least astonishment. But should it be
> > impossible? The principle of least astonishment does not mandate this;
> > it is not a paternalistic principle.
> >
> > Thus, I do not get at all
> > - why git doesn't *store* modification times, perhaps by default, but
> > at least on request
> > - why git doesn't restore modification times *on request*
> >
> > It is pretty annoying that git cannot, even if I know what I am doing,
> > and explicitly want it to, preserve the modification time.
> >
> > One use case: I have lots of files lying around in my build directory
> > and for some of them, the modification time is important information
> > to me. Those files are not at all used with the build tool. In
> > contrast to git pull, git pull --rebase needs those to be stashed. But
> > after the pull and unstash, the mtime is gone. Boo.
> >
> > Please provide options to store and restore modification times. It
> > shouldn't be hard to do, given that other metadata such as the mode is
> > already stored. It would make life so much easier. And the fact that
> > this has made it into the FAQ clearly suggests that there are many others
> > who think so.
> 
> Since you already assessed that it shouldn't be hard to do, you probably
> want to put your money where your mouth is and come up with a patch, and
> then offer it up for discussion on this here mailing list.

Putting my large-production-user hat on, there are (at least) three
conditions that exist in this space:

1. Build systems - these typically need the file modification time to be set
to the time at which git touches a file (e.g., checkout). This permits build
systems to detect that files are modified (even if an older version is
checked out, make, for example, still needs to see the change to initiate a
build). My understanding is that current git behaviour is modeled on this use
case.

2. Commit linkage - in some environments, files that are checked out are set
to the timestamp of the commit rather than the original file time or the
checkout time. This permits a faster production resolution of when changes
were run through the system as a group. I have implemented this strategy
(somewhat grudgingly) in a few places. It is a possible desire for some
users. I particularly dislike this approach because merge/cherry-pick/rebase
can mess with the perceived "when" of a change and if you are going to do
this, make sure that your metadata is suitably managed.

3. Original file times - as Peter asked, storing the original file time has
some legacy advantages. This emulates the behaviour of some legacy SCM
systems and makes people feel better about things. From an audit point of
view, this has value for systems other than git. In git, you use the
hash-object to figure out what the file really is, so there is no real audit
need anymore for timestamps, which can be spoofed at whim anyway. The
hash-object comment applies to 2 also. Same comment here for dealing with
non-touching but modifying. For example: what is the timestamp on a
merge-squash? I would contend that it is the time of the merge-squash, not
the original time. It could also be an interim term, when a conflict was
resolved.

Just remember that #2 and #3 break #1, unless you essentially rebuild from
scratch in every build (ant/maven models). With that said, I have seen many repo
admins who want all of the above, so making them all available would make
their lives easier.

My $0.02. Cheers,
Randall

-- Brief whoami:
  NonStop developer since approximately NonStop(211288444200000000)
  UNIX developer since approximately 421664400
-- In my real life, I talk too much.

* Re: Git should preserve modification times at least on request
  2018-02-19 22:37   ` Randall S. Becker
@ 2018-02-19 23:22     ` Hilco Wijbenga
  2018-02-20 16:42       ` Hilco Wijbenga
  0 siblings, 1 reply; 28+ messages in thread
From: Hilco Wijbenga @ 2018-02-19 23:22 UTC (permalink / raw)
  To: Git Users; +Cc: Johannes Schindelin, Peter Backes, Randall S. Becker

On Mon, Feb 19, 2018 at 2:37 PM, Randall S. Becker
<rsbecker@nexbridge.com> wrote:
> On February 19, 2018 4:58 PM Johannes wrote:
>> On Mon, 19 Feb 2018, Peter Backes wrote:
>>
>> > please ensure to CC me if you reply as I am not subscribed to the list.
>> >
>> > https://git.wiki.kernel.org/index.php/Git_FAQ#Why_isn.27t_Git_preservi
>> > ng_modification_time_on_files.3F argues that git isn't preserving
>> > modification times because it needs to ensure that build tools work
>> > properly.
>> >
>> > I agree that modification times should not be restored by default,
>> > because of the principle of least astonishment. But should it be
>> > impossible? The principle of least astonishment does not mandate this;
>> > it is not a paternalistic principle.
>> >
>> > Thus, I do not get at all
>> > - why git doesn't *store* modification times, perhaps by default, but
>> > at least on request
>> > - why git doesn't restore modification times *on request*
>> >
>> > It is pretty annoying that git cannot, even if I know what I am doing,
>> > and explicitly want it to, preserve the modification time.
>> >
>> > One use case: I have lots of files lying around in my build directory
>> > and for some of them, the modification time is important information
>> > to me. Those files are not at all used with the build tool. In
>> > contrast to git pull, git pull --rebase needs those to be stashed. But
>> > after the pull and unstash, the mtime is gone. Boo.
>> >
>> > Please provide options to store and restore modification times. It
>> > shouldn't be hard to do, given that other metadata such as the mode is
>> > already stored. It would make life so much easier. And the fact that
>> > this has made it into the FAQ clearly suggests that there are many others
>> > who think so.
>>
>> Since you already assessed that it shouldn't be hard to do, you probably
>> want to put your money where your mouth is and come up with a patch, and
>> then offer it up for discussion on this here mailing list.
>
> Putting my large-production-user hat on, there are (at least) three
> conditions that exist in this space:
>
> 1. Build systems - these typically need the file modification time to be set
> to the time at which git touches a file (e.g., checkout). This permits build
> systems to detect that files are modified (even if an older version is
> checked out, make, for example, still needs to see the change to initiate a
> build). My understanding is that current git behaviour is modeled on this use
> case.
>
> 2. Commit linkage - in some environments, files that are checked out are set
> to the timestamp of the commit rather than the original file time or the
> checkout time. This permits a faster production resolution of when changes
> were run through the system as a group. I have implemented this strategy
> (somewhat grudgingly) in a few places. It is a possible desire for some
> users. I particularly dislike this approach because merge/cherry-pick/rebase
> can mess with the perceived "when" of a change and if you are going to do
> this, make sure that your metadata is suitably managed.
>
> 3. Original file times - as Peter asked, storing the original file time has
> some legacy advantages. This emulates the behaviour of some legacy SCM
> systems and makes people feel better about things. From an audit point of
> view, this has value for systems other than git. In git, you use the
> hash-object to figure out what the file really is, so there is no real audit
> need anymore for timestamps, which can be spoofed at whim anyway. The
> hash-object comment applies to 2 also. Same comment here for dealing with
> non-touching but modifying. For example: what is the timestamp on a
> merge-squash? I would contend that it is the time of the merge-squash, not
> the original time. It could also be an interim term, when a conflict was
> resolved.
>
> Just remember that #2 and #3 break #1, unless you essentially rebuild from
> scratch in every build (ant/maven models). With that said, I have seen many repo
> admins who want all of the above, so making them all available would make
> their lives easier.

Aside from exactly which modification times should be used (which I
would love to have a bit more control over as well), something else
I'd like to see is that, when switching between branches, files that
are the same on both branches should not have their modification time
changed.


* Re: Git should preserve modification times at least on request
  2018-02-19 22:08   ` Peter Backes
@ 2018-02-20  1:22     ` Theodore Ts'o
  2018-02-20 10:46     ` Johannes Schindelin
  1 sibling, 0 replies; 28+ messages in thread
From: Theodore Ts'o @ 2018-02-20  1:22 UTC (permalink / raw)
  To: Peter Backes; +Cc: Johannes Schindelin, git

On Mon, Feb 19, 2018 at 11:08:19PM +0100, Peter Backes wrote:
> Is there some existing code that could be used? I think I read 
> somewhere that git once did preserve mtimes, but that this code was 
> removed because of the build tool issues. Perhaps that code could 
> simply be put back in, and surrounded by conditions.

I don't believe that was ever true, because mod times are simply
not *stored* anywhere.

You might want to consider trying to implement it as hook scripts
first, and see how well/poorly it works for you.  I do have a use
case, which is to maintain the timestamps for guilt (a quilt-like
patch management system which uses git).  At the moment I just use a
manual script, save-timestamps, which looks like this:

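#!/bin/sh
# Emit one "touch -d @<mtime> <name>" line per file, skipping editor
# backups (names ending in "~"), sorted from the timestamp field onward.
stat -c "touch -d @%Y %n" * | grep -v "~$" | sort -k 3 > timestamps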

and then I just include the timestamps file in the commit.  When I
unpack the file elsewhere, I just run the command ". timestamps", or
if I am manually editing a single file, I might do:

	grep file-name-of-patch timestamps | sh

This works because the timestamps file has lines which look like
this:

touch -d @1519007593 jbd2-clarify-recovery-checksum-error-msg

I've been too lazy to automate this using a "pre-commit" and
"post-checkout" hook, but it *really* wouldn't be that hard.  Right
now it also only works for files in the top-level of the repo, which
is all I have in my guilt patch repo.  Making this work in a
multiple-directory environment is also left as an exercise to the
reader.  :-)
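
For what it is worth, a minimal sketch of the multi-directory variant might
look like the following. It assumes GNU stat and touch, assumes file names
without whitespace or shell metacharacters (the file is replayed by sourcing
it), and the function names save_timestamps and restore_timestamps are made
up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; not part of git or guilt.

# save_timestamps DIR: write DIR/timestamps with one
# "touch -d @<mtime> <path>" line per regular file below DIR.
# Editor backups (*~), the timestamps file itself and .git are skipped.
save_timestamps() {
    (
        cd "$1" || exit 1
        find . -type f ! -name '*~' ! -name timestamps ! -path './.git/*' \
            -exec stat -c 'touch -d @%Y %n' {} + | sort -k 3 > timestamps
    )
}

# restore_timestamps DIR: replay DIR/timestamps by sourcing it from DIR,
# so the relative paths recorded above resolve correctly.
restore_timestamps() {
    (
        cd "$1" || exit 1
        . ./timestamps
    )
}
```

Running save_timestamps from a "pre-commit" hook and restore_timestamps from
a "post-checkout" hook would be the obvious next step, but that wiring is
not shown here.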

Cheers,

						- Ted

P.S.  Also left to the reader is making it work on legacy OS's like
Windows.  :-)


* Re: Git should preserve modification times at least on request
  2018-02-19 22:08   ` Peter Backes
  2018-02-20  1:22     ` Theodore Ts'o
@ 2018-02-20 10:46     ` Johannes Schindelin
  2018-02-20 11:53       ` Peter Backes
  2018-02-20 21:05       ` Peter Backes
  1 sibling, 2 replies; 28+ messages in thread
From: Johannes Schindelin @ 2018-02-20 10:46 UTC (permalink / raw)
  To: Peter Backes; +Cc: git

Hi Peter,

On Mon, 19 Feb 2018, Peter Backes wrote:

> On Mon, Feb 19, 2018 at 10:58:12PM +0100, Johannes Schindelin wrote:
> > Since you already assessed that it shouldn't be hard to do, you
> > probably want to put your money where your mouth is and come up with a
> > patch, and then offer it up for discussion on this here mailing list.
> 
> Well, it would be good to discuss this a bit beforehand, since my time 
> is wasted if there's no chance to get it accepted. Perhaps there is 
> some counterargument I don't know about.

Oh, sorry. I understood your mail as if you had told the core Git
developers that they should implement the feature you desire. I did not
understand that you hinted at a discussion first, and that you would then
go and implement the feature you asked for.

> Is there some existing code that could be used? I think I read 
> somewhere that git once did preserve mtimes, but that this code was 
> removed because of the build tool issues. Perhaps that code could 
> simply be put back in, and surrounded by conditions.

I don't think that code was ever there. Maybe you heard about some file
mode being preserved overzealously (we stored the octal file mode
verbatim, but then decided to store only 644 or 755).

(This is to add to Theodore's reply, giving a bit more depth.)

As you can see from the code decoding a tree entry:

https://github.com/git-for-windows/git/blob/e1848984d/tree-walk.c#L25-L52

there is no mtime at all in the on-disk format of tree objects. There is
the hash, the mode, and the file name.

As your main use case would be stashing and unstashing (which uses tree
objects as storage format), this means you would have to find a different
way to store the information you desire.

If I were you, and if I had the time to implement this feature, I would go
about it by adding a note (using `git notes` from a script first, but only
for proof of concept, because I saw too many things go wrong with Unix
shell scripts in production) for the tree object, say, in
refs/notes/mtimes. I would probably invent a file format
(`<mtime><TAB><path><LF>`) to store the information, and for starters I
would only store the mtimes of the files that were stashed, then extend
the script into a full Git builtin with a subcommand that can generate
these notes, a subcommand to replay them, and a subcommand to inspect
them.

Then I would extend `git-stash.sh` to take an option (and later, to heed a
new config setting to do this automatically) to generate those mtime notes
for the newest stash's top-level tree object (storing only the times of
the files that were modified by the `stash` command), and to replay them
if such an mtime note is found for the stash that is being applied.
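
A rough proof-of-concept sketch of the note format could look like this. It
assumes GNU stat and touch, paths without newlines, and paths relative to the
current directory; build_mtime_note and apply_mtime_note are hypothetical
names, and the git invocations in the comments are untested illustrations of
how the pieces might be wired to `git notes`.

```shell
#!/bin/sh
# Hypothetical helpers for a "<mtime><TAB><path><LF>" note body.

# build_mtime_note: read paths from stdin (one per line) and print
# "<mtime><TAB><path>" for each path that exists.
build_mtime_note() {
    while IFS= read -r note_path; do
        if [ -e "$note_path" ]; then
            printf '%s\t%s\n' "$(stat -c %Y "$note_path")" "$note_path"
        fi
    done
}

# apply_mtime_note: read a note body from stdin and restore each mtime.
apply_mtime_note() {
    note_tab=$(printf '\t')
    while IFS="$note_tab" read -r note_mtime note_path; do
        if [ -e "$note_path" ]; then
            touch -d "@$note_mtime" "$note_path"
        fi
    done
}

# Possible wiring, with $tree standing for the stash's top-level tree:
#   git diff-tree -r --name-only "$tree" HEAD | build_mtime_note |
#       git notes --ref=mtimes add -F - "$tree"
#   git notes --ref=mtimes show "$tree" | apply_mtime_note
```

Keeping the format line-based makes it easy to inspect by hand, at the price
of not coping with newlines in paths.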

You will not be able to convince the core Git developers to make this the
default, I don't think. But if you make it an opt-in as I outlined above,
I believe your chances would be good to get that feature if you put in the
effort to implement it.

Oh, and if you implement the feature using notes, the same feature can be
used not only for stashing and unstashing. These notes are maintained in
regular Git refs, i.e. they can be shared. And since those notes would be
for tree objects, you could even apply the mtimes on a fresh clone, if
you have a use case for that.

Ciao,
Johannes


* Re: Git should preserve modification times at least on request
  2018-02-20 10:46     ` Johannes Schindelin
@ 2018-02-20 11:53       ` Peter Backes
  2018-02-20 21:05       ` Peter Backes
  1 sibling, 0 replies; 28+ messages in thread
From: Peter Backes @ 2018-02-20 11:53 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Hello Johannes,

On Tue, Feb 20, 2018 at 11:46:38AM +0100, Johannes Schindelin wrote:
> Oh, sorry. I understood your mail as if you had told the core Git
> developers that they should implement the feature you desire. I did not
> understand that you hinted at a discussion first, and that you would then
> go and implement the feature you asked for.

Well, sorry for being unclear. It was my impression from the 
FAQ that the reason for why this feature doesn't exist was a strong 
opinion that it would cause technical problems. The FAQ doesn't mention 
anything like a lack of manpower. As I stated it was my 
impression that this feature would not be too hard to implement.

Because of this my email presupposed it was not manpower that prevented 
this feature.

My statement "Please provide options" was thus targeted at reviewing 
and discussing the perceived technical reasons for not implementing 
this feature at least as an option. It wasn't supposed to demand free 
lunch from anyone.

Of course I can offer to do some work to the best of my abilities if 
that's the issue. That should go without saying for Free Software 
projects. Perhaps even my employer would be happy to pay me for 
implementing the feature during working hours. This shouldn't be the 
issue. The issue is the seemingly dogmatic reply in the FAQ which makes 
me reluctant to put work into this in fear that a patch submission 
would be met with strong rejection.

> You will not be able to convince the core Git developers to make this the
> default, I don't think.

I have stressed very clearly in my mail that I am not asking the 
defaults about mtime restoring to be changed. I agree that those 
defaults are reasonable and in line with the principle of least 
astonishment.

What bugs me is my impression from the FAQ that even as an option, the 
feature might be unwelcome.

Best wishes
Peter
-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE


* Re: Git should preserve modification times at least on request
  2018-02-19 23:22     ` Hilco Wijbenga
@ 2018-02-20 16:42       ` Hilco Wijbenga
  0 siblings, 0 replies; 28+ messages in thread
From: Hilco Wijbenga @ 2018-02-20 16:42 UTC (permalink / raw)
  To: Git Users
  Cc: Johannes Schindelin, Peter Backes, Randall S. Becker, Junio C Hamano

On Mon, Feb 19, 2018 at 3:22 PM, Hilco Wijbenga
<hilco.wijbenga@gmail.com> wrote:
> Aside from exactly which modification times should be used (which I
> would love to have a bit more control over as well), something else
> I'd like to see is that, when switching between branches, files that
> are the same on both branches should not have their modification time
> changed.

As Junio pointed out to me, Git actually already does what I want when
switching branches. To verify, I switched between 5 branches after
setting a specific timestamp on a particular file, and it did not
change throughout the process. Now I'm left wondering when this
changed or whether my memory is faulty. I could have sworn this did
not work previously. :-)


* Re: Git should preserve modification times at least on request
  2018-02-20 10:46     ` Johannes Schindelin
  2018-02-20 11:53       ` Peter Backes
@ 2018-02-20 21:05       ` Peter Backes
  2018-02-20 22:32         ` Johannes Schindelin
  1 sibling, 1 reply; 28+ messages in thread
From: Peter Backes @ 2018-02-20 21:05 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Hi Johannes,

On Tue, Feb 20, 2018 at 11:46:38AM +0100, Johannes Schindelin wrote:
> If I were you [...]

It all seems pretty straightforward, except for

> I would probably invent a file format (`<mtime><TAB><path><LF>`)

I'm stuck there because of <path> being munged.

To obtain or set the mtime of the file, I need the unmunged path.

How to get it?

----

What follows is irrelevant for progress.

> I don't think that code was ever there. Maybe you heard about some file
> mode being preserved overzealously (we stored the octal file mode
> verbatim, but then decided to store only 644 or 755).

I'm not sure. I'm not able to find that source anymore, though.

> As you can see from the code decoding a tree entry:
> 
> https://github.com/git-for-windows/git/blob/e1848984d/tree-walk.c#L25-L52
> 
> there is no mtime at all in the on-disk format of tree objects. There is
> the hash, the mode, and the file name.

I didn't completely get the code in tree-walk.c since the parsing 
architecture seems to pass around pointers via global variables. 
It seems that in addition to hash, mode and file name, the on-disk 
format has at least the object type, see git cat-file -p master^{tree} 
Perhaps I got it wrong.
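
On the object type: as far as I can tell, the type column printed by
git cat-file -p (and git ls-tree) is not stored in the tree entry itself but
derived from the mode. A rough sketch of that mapping, with mode_to_type
being a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: map a tree-entry mode to the object type that
# git prints next to it. The tree entry itself only holds the mode,
# the file name and the hash.
mode_to_type() {
    case "$1" in
        40000|040000)         echo tree ;;    # directory
        160000)               echo commit ;;  # gitlink (submodule)
        100644|100755|120000) echo blob ;;    # file, executable, symlink
        *)                    echo unknown; return 1 ;;
    esac
}
```

So "100644 blob <hash> <name>" in the pretty-printed output would still
correspond to just mode, name and hash on disk.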

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE


* Re: Git should preserve modification times at least on request
  2018-02-19 21:22 Git should preserve modification times at least on request Peter Backes
  2018-02-19 21:58 ` Johannes Schindelin
@ 2018-02-20 21:16 ` Jeff King
  2018-02-20 22:05   ` Peter Backes
  2018-02-20 23:40   ` Junio C Hamano
  2018-02-21 21:03 ` Derek Fawcus
  2 siblings, 2 replies; 28+ messages in thread
From: Jeff King @ 2018-02-20 21:16 UTC (permalink / raw)
  To: Peter Backes; +Cc: git

On Mon, Feb 19, 2018 at 10:22:36PM +0100, Peter Backes wrote:

> please ensure to CC me if you reply as I am not subscribed to the list.
> 
> https://git.wiki.kernel.org/index.php/Git_FAQ#Why_isn.27t_Git_preserving_modification_time_on_files.3F 
> argues that git isn't preserving modification times because it needs to 
> ensure that build tools work properly.

I think there are some references buried somewhere in that wiki, but did
you look at any of the third-party tools that store file metadata
alongside the files in the repository? E.g.:

  https://etckeeper.branchable.com/

or

  https://github.com/przemoc/metastore

I didn't see either of those mentioned in this thread (though I also do
not have personal experience with them, either).

Modification times are a subset of the total metadata you might care
about, so they are solving a much more general problem. Which may also
partially answer your question about why this isn't built into git. The
general problem gets much bigger when you start wanting to carry things
like modes (which git doesn't actually track; we really only care about
the executable bit) or extended attributes (acls, etc).

-Peff


* Re: Git should preserve modification times at least on request
  2018-02-20 21:16 ` Jeff King
@ 2018-02-20 22:05   ` Peter Backes
  2018-02-21  9:48     ` Jacob Keller
  2018-02-20 23:40   ` Junio C Hamano
  1 sibling, 1 reply; 28+ messages in thread
From: Peter Backes @ 2018-02-20 22:05 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Hi Jeff,

On Tue, Feb 20, 2018 at 04:16:34PM -0500, Jeff King wrote:
> I think there are some references buried somewhere in that wiki, but did
> you look at any of the third-party tools that store file metadata
> alongside the files in the repository? E.g.:
> 
>   https://etckeeper.branchable.com/
> 
> or
> 
>   https://github.com/przemoc/metastore
> 
> I didn't see either of those mentioned in this thread (though I also do
> not have personal experience with them, either).
> 
> Modification times are a subset of the total metadata you might care
> about, so they are solving a much more general problem. Which may also
> partially answer your question about why this isn't built into git. The
> general problem gets much bigger when you start wanting to carry things
> like modes (which git doesn't actually track; we really only care about
> the executable bit) or extended attributes (acls, etc).

I know about those, but that's not what I am looking for. Those tools 
serve entirely different purposes, i.e., tracking file system changes. 
I, however, am specifically interested in version control.

In version control, the user checks out his own copy of the tree for 
working. For this purpose, it is thus pointless to track ownership, 
permissions (except for the x bit), xattrs, or any other metadata. In 
fact, it can be considered the wrong thing to do.

The modification time, however, is special. It clearly has its place in 
version control. It tells us when the last modification was actually 
done to the file. I am often working on some feature, and one part is 
finished and is lying around, but I am still working on other parts in 
other files. Then, maybe after some weeks, the other parts are 
finished. Now, when committing, the information about modification time 
is lost. Maybe some weeks later I want to figure out when I last 
modified those files that were committed. But that information is now 
gone, at least in the git repository. Sure, I could do lots of WIP 
commits, but this would clutter up the history unnecessarily and I 
would have lots of versions that might not even compile, let alone run.

As far as I remember, bitkeeper had this distinction between checkins 
and commits. You could check in a file at any time, and any number of 
times, and then group all those checkins together with a commit. Git 
seems to have avoided this principle, or have kept it only 
rudimentarily via git add (but git add cannot add more than one version 
of the same file). Perhaps for simplification of use, perhaps for 
simplification of implementation, I don't know.

I assume, if it were not for the build tool issues, git would have 
tracked mtime from the very start.

Best wishes
Peter
-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE


* Re: Git should preserve modification times at least on request
  2018-02-20 21:05       ` Peter Backes
@ 2018-02-20 22:32         ` Johannes Schindelin
  2018-02-20 22:48           ` Peter Backes
  0 siblings, 1 reply; 28+ messages in thread
From: Johannes Schindelin @ 2018-02-20 22:32 UTC (permalink / raw)
  To: Peter Backes; +Cc: git

Hi Peter,

On Tue, 20 Feb 2018, Peter Backes wrote:

> On Tue, Feb 20, 2018 at 11:46:38AM +0100, Johannes Schindelin wrote:
> 
> > I would probably invent a file format (`<mtime><TAB><path><LF>`)
> 
> I'm stuck there because of <path> being munged.

From which command do you want to get it? If you are looking at `git
diff`, you may want to use the `-z --name-only` options to avoid munging
the paths.

Or are you looking at a different source for the paths?

Ciao,
Johannes


* Re: Git should preserve modification times at least on request
  2018-02-20 22:32         ` Johannes Schindelin
@ 2018-02-20 22:48           ` Peter Backes
  2018-02-21 21:30             ` Phillip Wood
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Backes @ 2018-02-20 22:48 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

On Tue, Feb 20, 2018 at 11:32:23PM +0100, Johannes Schindelin wrote:
> Hi Peter,
> 
> On Tue, 20 Feb 2018, Peter Backes wrote:
> 
> > On Tue, Feb 20, 2018 at 11:46:38AM +0100, Johannes Schindelin wrote:
> > 
> > > I would probably invent a file format (`<mtime><TAB><path><LF>`)
> > 
> > I'm stuck there because of <path> being munged.
> 
> From which command do you want to get it? If you are looking at `git
> diff`, you may want to use the `-z --name-only` options to avoid munging
> the paths.

I plan to use "git diff-tree --name-only $w_tree HEAD" and subtract
all lines from "git diff-index --name-only HEAD" to get the files for 
which the timestamp should be stored.

If I use "-z" I get the non-munged path, but I cannot safely store such 
paths in the proposed file format; they might contain newlines (sigh). 
So at one point I have to munge. Then the same question arises when I 
have to get the actual path from the munged path when restoring the 
timestamps.

If there's no ready-made functionality to munge and unmunge paths, I 
have to write some awk for this. At first I thought this might add one 
more dependency to git, but it seems that awk is already used in 
git-mergetool.sh, so I suppose it's okay to use in git-stash.sh etc, 
too.
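
A minimal sketch of the munge/unmunge step for the proposed
`<mtime><TAB><path><LF>` format, written in Python rather than awk. The
escaping scheme (backslash, tab and newline only) and the names
munge_path, unmunge_path, format_record and parse_record are assumptions
made for illustration; this is not git's own path quoting.

```python
# Hypothetical escaping scheme for illustration, not git's C-style quoting.
_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n"}
_UNESCAPES = {"\\": "\\", "t": "\t", "n": "\n"}


def munge_path(path):
    """Escape backslash, TAB and LF so the path fits on one record line."""
    return "".join(_ESCAPES.get(ch, ch) for ch in path)


def unmunge_path(munged):
    """Reverse munge_path(); reject escapes this scheme never produces."""
    out = []
    chars = iter(munged)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, None)
        if nxt not in _UNESCAPES:
            raise ValueError("malformed escape in %r" % munged)
        out.append(_UNESCAPES[nxt])
    return "".join(out)


def format_record(mtime, path):
    """Build one `<mtime><TAB><path><LF>` record."""
    return "%d\t%s\n" % (mtime, munge_path(path))


def parse_record(line):
    """Split a record back into (mtime, path)."""
    mtime_text, munged = line.rstrip("\n").split("\t", 1)
    return int(mtime_text), unmunge_path(munged)
```

Because every raw TAB and LF in the path is escaped, the first TAB on a
line always separates the mtime from the path, and each record occupies
exactly one line.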

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Git should preserve modification times at least on request
  2018-02-20 21:16 ` Jeff King
  2018-02-20 22:05   ` Peter Backes
@ 2018-02-20 23:40   ` Junio C Hamano
  1 sibling, 0 replies; 28+ messages in thread
From: Junio C Hamano @ 2018-02-20 23:40 UTC (permalink / raw)
  To: Jeff King; +Cc: Peter Backes, git

Jeff King <peff@peff.net> writes:

> Modification times are a subset of the total metadata you might care
> about, so they are solving a much more general problem. Which may also
> partially answer your question about why this isn't built into git. The
> general problem gets much bigger when you start wanting to carry things
> like modes (which git doesn't actually track; we really only care about
> the executable bit) or extended attributes (acls, etc).

"modes" are interesting, especially when you think about group
permissions, as it would make you design how you store group and
owner, which in turn forces you to think how "peter" on one system
relates to "peter" on another system (answer: there generally isn't
any relationship) ;-)


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Git should preserve modification times at least on request
  2018-02-20 22:05   ` Peter Backes
@ 2018-02-21  9:48     ` Jacob Keller
  0 siblings, 0 replies; 28+ messages in thread
From: Jacob Keller @ 2018-02-21  9:48 UTC (permalink / raw)
  To: Peter Backes; +Cc: Jeff King, Git mailing list

On Tue, Feb 20, 2018 at 2:05 PM, Peter Backes <rtc@helen.plasma.xg8.de> wrote:
> Hi Jeff,
>
> On Tue, Feb 20, 2018 at 04:16:34PM -0500, Jeff King wrote:
>> I think there are some references buried somewhere in that wiki, but did
>> you look at any of the third-party tools that store file metadata
>> alongside the files in the repository? E.g.:
>>
>>   https://etckeeper.branchable.com/
>>
>> or
>>
>>   https://github.com/przemoc/metastore
>>
>> I didn't see either of those mentioned in this thread (though I also do
>> not have personal experience with them, either).
>>
>> Modification times are a subset of the total metadata you might care
>> about, so they are solving a much more general problem. Which may also
>> partially answer your question about why this isn't built into git. The
>> general problem gets much bigger when you start wanting to carry things
>> like modes (which git doesn't actually track; we really only care about
>> the executable bit) or extended attributes (acls, etc).
>
> I know about those, but that's not what I am looking for. Those tools
> serve entirely different purposes, ie., tracking file system changes.
> I, however, am specifically interested in version control.
>
> In version control, the user checks out his own copy of the tree for
> working. For this purpose, it is thus pointless to track ownership,
> permissions (except for the x bit), xattrs, or any other metadata. In
> fact, it can be considered the wrong thing to do.
>
> The modification time, however, is special. It clearly has its place in
> version control. It tells us when the last modification was actually
> done to the file. I am often working on some feature, and one part is
> finished and is lying around, but I am still working on other parts in
> other files. Then, maybe after some weeks, the other parts are
> finished. Now, when committing, the information about modification time
> is lost. Maybe some weeks later I want to figure out when I last
> modified those files that were committed. But that information is now
> gone, at least in the git repository. Sure, I could do lots of WIP
> commits, but this would clutter up the history unnecessarily and I
> would have lots of versions that might not even compile, let alone run.

You could have git figure this out by the commit time of the last
commit which modified a file. This gets a bit weird for cherry-picks
or other things like rebase, but that should get what you want.

If you only ever need this information sometimes, you can look it up
by doing something like:

git log -1 --pretty="%cd" -- <path to file>

That should show the commit time of the latest commit which touches
that file, which is "essentially" the modify time of the file in terms
of the version control history.

Obviously, this wouldn't work if you continually amend a change of
multiple files, since git wouldn't track the files separately, and
this only really shows you the time of the last commit.

However, in "version control" sense, this *is* the last time a file
was modified, since it doesn't really care about the stuff that
happens outside of version control.

I'm not really sure if this is enough for you, or if you really want
to store the actual mtime for some reason? (I think you can likely
solve your problem in some other way though).

>
> As far as I remember, bitkeeper had this distinction between checkins
> and commits. You could check in a file at any time, and any number of
> times, and then group all those checkins together with a commit. Git
> seems to have avoided this principle, or have kept it only
> rudimentarily via git add (but git add cannot add more than one version
> of the same file). Perhaps for simplification of use, perhaps for
> simplification of implementation, I don't know.
>

You can do lots of commits on a branch and then one merge commit to
merge it into the main line. This is a common strategy used by many
people.

Thanks,
Jake

> I assume, if it were not for the build tool issues, git would have
> tracked mtime from the very start.
>

Maybe. Personally, I would hate having my mtime not be "the time I
checked the file out", since this is intuitive to me at this point.
I'm sure if I lived in a different world I'd be used to that way also,
though.

The build issue *is* important though, because many build systems rely
on the mtime to figure out what to rebuild, and a complete rebuild
isn't a good idea for very large projects.

Thanks,
Jake

> Best wishes
> Peter
> --
> Peter Backes, rtc@helen.PLASMA.Xg8.DE

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Git should preserve modification times at least on request
  2018-02-19 21:22 Git should preserve modification times at least on request Peter Backes
  2018-02-19 21:58 ` Johannes Schindelin
  2018-02-20 21:16 ` Jeff King
@ 2018-02-21 21:03 ` Derek Fawcus
  2018-02-21 21:33   ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 28+ messages in thread
From: Derek Fawcus @ 2018-02-21 21:03 UTC (permalink / raw)
  To: Peter Backes; +Cc: git

On Mon, Feb 19, 2018 at 10:22:36PM +0100, Peter Backes wrote:
> 
> It is pretty annoying that git cannot, even if I know what I am doing, 
> and explicitly want it to, preserve the modification time.

The use case I've come across where it would be of value is for code
archeology, either importing a bunch of tar files, or importing a
repo from some other VCS.

There preserving the mod times can be useful when one is subsequently
figuring out what changed, and the scope of the 'commits' is too big
(i.e. the granularity of the tar files themselves).

e.g. initial commits are done on tar boundaries, but one may try to
figure out individual changes from a ChangeLog file.  I've done this
a couple of times, but to date it has required keeping the untarred
trees around (or a timestamp list file from each tree), in addition
to the git repo into which one is then synthesizing smaller commits.

DF

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Git should preserve modification times at least on request
  2018-02-20 22:48           ` Peter Backes
@ 2018-02-21 21:30             ` Phillip Wood
  0 siblings, 0 replies; 28+ messages in thread
From: Phillip Wood @ 2018-02-21 21:30 UTC (permalink / raw)
  To: Peter Backes, Johannes Schindelin; +Cc: git

On 20/02/18 22:48, Peter Backes wrote:
> 
> On Tue, Feb 20, 2018 at 11:32:23PM +0100, Johannes Schindelin wrote:
>> Hi Peter,
>>
>> On Tue, 20 Feb 2018, Peter Backes wrote:
>>
>>> On Tue, Feb 20, 2018 at 11:46:38AM +0100, Johannes Schindelin wrote:
>>>
>>>> I would probably invent a file format (`<mtime><TAB><path><LF>`)
>>>
>>> I'm stuck there because of <path> being munged.
>>
>> From which command do you want to get it? If you are looking at `git
>> diff`, you may want to use the `-z --name-only` options to avoid munging
>> the paths.
> 
> I plan to use "git diff-tree --name-only $w_tree HEAD" and subtract
> all lines from "git diff-index --name-only HEAD" to get the files for
> which the timestamp should be stored.
> 
> If I use "-z" I get the non-munged path, but I cannot safely store such
> paths in the proposed file format; they might contain newlines (sigh).
> So at one point I have to munge. Then the same question arises when I
> have to get the actual path from the munged path when restoring the
> timestamps.
> 
> If there's no ready-made functionality to munge and unmunge paths, I
> have to write some awk for this. At first I thought this might add one
> more dependency to git, but it seems that awk is already used in
> git-mergetool.sh, so I suppose it's okay to use in git-stash.sh etc,
> too.

In recent versions of git there's unquote_path() in Git.pm, you could 
possibly use that with perl -e from your script

Best Wishes

Phillip
> Best wishes
> Peter
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Git should preserve modification times at least on request
  2018-02-21 21:03 ` Derek Fawcus
@ 2018-02-21 21:33   ` Ævar Arnfjörð Bjarmason
  2018-02-21 22:14     ` Peter Backes
  0 siblings, 1 reply; 28+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-02-21 21:33 UTC (permalink / raw)
  To: Derek Fawcus; +Cc: Peter Backes, git


On Wed, Feb 21 2018, Derek Fawcus jotted:

> On Mon, Feb 19, 2018 at 10:22:36PM +0100, Peter Backes wrote:
>>
>> It is pretty annoying that git cannot, even if I know what I am doing,
>> and explicitly want it to, preserve the modification time.
>
> The use case I've come across where it would be of value is for code
> archeology, either importing a bunch of tar files, or importing a
> repo from some other VCS.
>
> There preserving the mod times can be useful when one is subsequently
> figuring out what changed, and the scope of the 'commits' is too big
> (i.e. the granularity of the tar files themselves).
>
> e.g. initial commits are done on tar boundaries, but one may try to
> figure out individual changes from a ChangeLog file.  I've done this
> a couple of times, but to date it has required keeping the untarred
> trees around (or a timestamp list file from each tree), in addition
> to the git repo into which one is then synthesizing smaller commits.

This sounds like a sensible job for a git import tool, i.e. import a
target directory into git, and instead of 'git add'-ing the whole thing
it would look at the mtimes, sort files by mtime, then add them in order
and only commit those files that had the same mtime in the same commit
(or within some boundary).
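
The grouping step of such an import tool could be sketched as follows.
This is only an illustration under stated assumptions: group_by_mtime and
its `window` parameter are hypothetical names, the input is a list of
(path, mtime) pairs collected beforehand, and a new batch starts whenever
the gap to the previous file exceeds the window.

```python
def group_by_mtime(entries, window=0):
    """Batch (path, mtime) pairs into would-be commits.

    Files are sorted by mtime (ties broken by path). A file joins the
    current batch when its mtime is at most `window` seconds after the
    previous file's mtime; otherwise it opens a new batch.
    """
    batches = []
    for path, mtime in sorted(entries, key=lambda e: (e[1], e[0])):
        if batches and mtime - batches[-1][-1][1] <= window:
            batches[-1].append((path, mtime))
        else:
            batches.append([(path, mtime)])
    return batches


files = [("b.c", 100), ("a.c", 100), ("c.c", 160), ("d.c", 1000)]
for batch in group_by_mtime(files, window=60):
    print([path for path, _ in batch])
```

With window=0 only files sharing an identical mtime land in the same
batch; a larger window trades precision for fewer, bigger commits.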

The advantage of doing this via such a tool is that you could tweak it
to commit by any criteria you wanted, e.g. not mtime but ctime or even
atime.

You'd get the same thing as you'd get if git's tree format would change
to include mtimes (which isn't going to happen), but with a lot more
flexibility.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Git should preserve modification times at least on request
  2018-02-21 21:33   ` Ævar Arnfjörð Bjarmason
@ 2018-02-21 22:14     ` Peter Backes
  2018-02-21 22:44       ` Ævar Arnfjörð Bjarmason
  2018-02-23 12:28       ` Konstantin Khomoutov
  0 siblings, 2 replies; 28+ messages in thread
From: Peter Backes @ 2018-02-21 22:14 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Derek Fawcus, git

On Wed, Feb 21, 2018 at 10:33:05PM +0100, Ævar Arnfjörð Bjarmason wrote:
> This sounds like a sensible job for a git import tool, i.e. import a
> target directory into git, and instead of 'git add'-ing the whole thing
> it would look at the mtimes, sort files by mtime, then add them in order
> and only commit those files that had the same mtime in the same commit
> (or within some boundary).

I think that this would be The Wrong Thing to do.

The commit time is just that: The time the commit was done. The commit 
is an atomic group of changes to a number of files that hopefully bring 
the tree from one usable state into the next.

The mtime, in contrast, tells us when a file was most recently modified.

It may well be that main.c was most recently modified yesterday, and 
feature.c was modified this morning, and that only both changes taken 
together make sense as a commit, despite the long time in between.

Even worse, it may be that feature A took a long time to implement, so 
we have huge gaps in between the mtimes, but feature B was quickly done 
after A was finished. Such an algorithm would probably split feature A 
incorrectly into several commits, and group the more recently changed 
files of feature A with those of feature B.

And if Feature A and Feature B were developed in parallel, things get 
completely messy.

> The advantage of doing this via such a tool is that you could tweak it
> to commit by any criteria you wanted, e.g. not mtime but ctime or even
> atime.

Maybe, but it would be rather useless to commit by ctime or atime. You 
do one grep -r and the atime is different. You do one chmod or chown 
and the ctime is different. Those timestamps are really only useful for 
very limited purposes.

That ctime exists seems reasonable, since it's only ever updated when 
the inode is written anyway.

atime, in contrast, was clearly one of the rather nonsensical 
innovations of UNIX: Do one write to the disk for each read from the 
disk. C'mon, really? It would have been a lot more reasonable to simply 
provide a generic way for tracing read() system calls instead; then 
userspace could decide what to do with that information and which of it 
is useful and should be kept and perhaps stored on disk. Now we have 
this ugly hack called relatime to deal with the problem.

> You'd get the same thing as you'd get if git's tree format would change
> to include mtimes (which isn't going to happen), but with a lot more
> flexibility.

Well, from basic logic, I don't see how a decision not to implement a 
feature could possibly increase flexibility. The opposite seems to be the
case.

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Git should preserve modification times at least on request
  2018-02-21 22:14     ` Peter Backes
@ 2018-02-21 22:44       ` Ævar Arnfjörð Bjarmason
  2018-02-21 23:12         ` Peter Backes
  2018-02-22 23:24         ` Derek Fawcus
  2018-02-23 12:28       ` Konstantin Khomoutov
  1 sibling, 2 replies; 28+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-02-21 22:44 UTC (permalink / raw)
  To: Peter Backes; +Cc: Derek Fawcus, git, Theodore Ts'o


On Wed, Feb 21 2018, Peter Backes jotted:

> On Wed, Feb 21, 2018 at 10:33:05PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> This sounds like a sensible job for a git import tool, i.e. import a
>> target directory into git, and instead of 'git add'-ing the whole thing
>> it would look at the mtimes, sort files by mtime, then add them in order
>> and only commit those files that had the same mtime in the same commit
>> (or within some boundary).
>
> I think that this would be The Wrong Thing to do.

I'm merely pointing out that if you have the use-case Derek Fawcus
describes you can get per-file mtimes via something similar to the
hook method Theodore Ts'o described today with a simple import tool with
no changes to git or its object format required.

To the extent that there's a convention for this in git that's the
convention, e.g. if you use github or gitlab they'll render the
modification time of a file in the tree view, and that time is the time
of the commit that last touched it: https://github.com/git/git

> The commit time is just that: The time the commit was done. The commit
> is an atomic group of changes to a number of files that hopefully bring
> the tree from one usable state into the next.
>
> The mtime, in contrast, tells us when a file was most recently modified.
>
> It may well be that main.c was most recently modified yesterday, and
> feature.c was modified this morning, and that only both changes taken
> together make sense as a commit, despite the long time in between.
>
> Even worse, it may be that feature A took a long time to implement, so
> we have huge gaps in between the mtimes, but feature B was quickly done
> after A was finished.

...

> Such an algorithm would probably split feature A
> incorrectly into several commits, and group the more recently changed
> files of feature A with those of feature B.

Right, but that's a trade-off you can pick at import time in this
hypothetical tar-to-commits tool, you could decide to do no merging and
suffer to signal loss.

> And if Feature A and Feature B were developed in parallel, things get
> completely messy.
>
>> The advantage of doing this via such a tool is that you could tweak it
>> to commit by any criteria you wanted, e.g. not mtime but ctime or even
>> atime.
>
> Maybe, but it would be rather useless to commit by ctime or atime. You
> do one grep -r and the atime is different. You do one chmod or chown
> and the ctime is different. Those timestamps are really only useful for
> very limited purposes.
>
> That ctime exists seems reasonable, since it's only ever updated when
> the inode is written anyway.
>
> atime, in contrast, was clearly one of the rather nonsensical
> innovations of UNIX: Do one write to the disk for each read from the
> disk. C'mon, really? It would have been a lot more reasonable to simply
> provide a generic way for tracing read() system calls instead; then
> userspace could decide what to do with that information and which of it
> is useful and should be kept and perhaps stored on disk. Now we have
> this ugly hack called relatime to deal with the problem.

Yes, that [ac]time example was a stretch. A better example would be
committing the file mode, or extended attributes, or "this is on a
different FS", or whatever other per-file/dir attribute we're not
currently capturing.

>> You'd get the same thing as you'd get if git's tree format would change
>> to include mtimes (which isn't going to happen), but with a lot more
>> flexibility.
>
> Well, from basic logic, I don't see how a decision not to implement a
> feature could possibly increase flexibility. The opposite seems to be the
> case.

I'm not trying to argue the usefulness of this mtime-per-file thing in
theory, just providing Derek Fawcus with a suggestion for a viable
workaround.

What I meant by this offhand comment, and which you may or may not know
(and I see no references to it from skimming the thread) is that there's
simply no space in the tree objects to add *anything* without breaking
the object format and requiring a major upgrade, although the plan to
switch to a new hash function is relevant to this.

Even if we suppose that git was being implemented today I don't think
this would make any sense as a first-level feature.

Empirical evidence suggests that people use git on a massive scale
largely without caring about this, and the users who do have a
workaround.

If it were added as a first-level feature to git it would present a lot
of UX confusion. E.g. you run "git add" and it'll be showing the mtime
somehow, or you get a formatted patch over E-Mail and it doesn't only
include the commit time but also times for individual files.

The VC systems that had this feature in the past were centralized, so
they could (in theory anyway) ensure that timestamps were monotonically
increasing. This won't be the case with git, we have plenty of timestamp
drift in e.g. linux.git and other git repos.

So if these mtimes were used by default they'd interact badly with stuff
like "make" in those cases, because you might check out a modified
version with a timestamp in the past.

Or maybe I've misunderstood how that worked in CVS/SVN/Bitkeeper, but in
any case, I just wanted to point out a workaround (but then digressed
into critiquing the idea above...).

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Git should preserve modification times at least on request
  2018-02-21 22:44       ` Ævar Arnfjörð Bjarmason
@ 2018-02-21 23:12         ` Peter Backes
  2018-02-21 23:58           ` Randall S. Becker
  2018-02-22 23:24         ` Derek Fawcus
  1 sibling, 1 reply; 28+ messages in thread
From: Peter Backes @ 2018-02-21 23:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derek Fawcus, git, Theodore Ts'o

On Wed, Feb 21, 2018 at 11:44:13PM +0100, Ævar Arnfjörð Bjarmason wrote:
> If it were added as a first-level feature to git it would present a lot
> of UX confusion. E.g. you run "git add" and it'll be showing the mtime
> somehow, or you get a formatted patch over E-Mail and it doesn't only
> include the commit time but also times for individual files.

But that's pretty standard. patch format has timestamp fields for 
precisely this purpose:

% echo a > x  
% echo b > y
% diff -u x y
--- x	2018-02-21 23:56:29.574029523 +0100
+++ y	2018-02-21 23:56:31.430003389 +0100
@@ -1 +1 @@
-a
+b

At present, git simply leaves those fields blank...
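
For illustration, the same header layout can be produced with Python's
standard difflib module, whose unified_diff() accepts optional file dates.
The wrapper name timestamped_diff is hypothetical, and the date strings
are passed through verbatim rather than read from the filesystem.

```python
import difflib


def timestamped_diff(old_lines, new_lines, old_name, new_name,
                     old_date, new_date):
    """Return unified-diff lines whose ---/+++ headers carry the dates."""
    return list(difflib.unified_diff(
        old_lines, new_lines,
        fromfile=old_name, tofile=new_name,
        fromfiledate=old_date, tofiledate=new_date,
    ))


patch = timestamped_diff(
    ["a\n"], ["b\n"], "x", "y",
    "2018-02-21 23:56:29.574029523 +0100",
    "2018-02-21 23:56:31.430003389 +0100",
)
print("".join(patch), end="")
```

The date is separated from the file name by a TAB, as in the diff -u
output above.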

> The VC systems that had this feature in the past were centralized, so
> they could (in theory anyway) ensure that timestamps were monotonically
> increasing. This won't be the case with git, we have plenty of timestamp
> drift in e.g. linux.git and other git repos.

I don't see where monotonicity would be an issue any more than it is 
for centralized version control systems.

Even in the centralized setting, monotonicity is not guaranteed, since 
you might have local timestamps deviating from the repository; you 
might have added a line, compiled, and removed it again later on, 
without running make again. Now if you checkout changes from the 
repository, and it sets the timestamp, that timestamp might be older 
than before the compile, and the file would not be rebuilt if you run 
make. So you cannot avoid those issues in centralized settings either.

> So if these mtimes were used by default they'd interact badly with stuff
> like "make" in those cases, because you might check out a modified
> version with a timestamp in the past.

That's very clearly the case, and I have stressed in my initial email 
that I fully agree with the reasoning of the FAQ in this regard. It is, 
however, merely an argument against *restoring* the timestamps *by 
default*, to comply with the principle of least astonishment. It is, by 
itself, not an argument against *storing* the timestamps, let alone 
against restoring them *on request*.

For the initial checkout, it should not even be harmful to restore the 
timestamps by default.

> any case, I just wanted to point out a workaround (but then digressed
> into critiquing the idea above...).

Well, Johannes's proposed solution seems pretty reasonable and 
realistic to me.  Thanks to Phillip's hint about unquote_path() in 
Git.pm it seems I now have all the needed ingredients to implement this 
feature.

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: Git should preserve modification times at least on request
  2018-02-21 23:12         ` Peter Backes
@ 2018-02-21 23:58           ` Randall S. Becker
  2018-02-22  2:05             ` 'Peter Backes'
  0 siblings, 1 reply; 28+ messages in thread
From: Randall S. Becker @ 2018-02-21 23:58 UTC (permalink / raw)
  To: 'Peter Backes', 'Ævar Arnfjörð Bjarmason'
  Cc: 'Derek Fawcus', git, 'Theodore Ts'o'

On February 21, 2018 6:13 PM, Peter Backes wrote:
> On Wed, Feb 21, 2018 at 11:44:13PM +0100, Ævar Arnfjörð Bjarmason wrote:
> > If it were added as a first-level feature to git it would present a
> > lot of UX confusion. E.g. you run "git add" and it'll be showing the
> > mtime somehow, or you get a formatted patch over E-Mail and it doesn't
> > only include the commit time but also times for individual files.
> 
> But that's pretty standard. patch format has timestamp fields for
> precisely this purpose:
> 
> % echo a > x
> % echo b > y
> % diff -u x y
> --- x	2018-02-21 23:56:29.574029523 +0100
> +++ y	2018-02-21 23:56:31.430003389 +0100

May I suggest storing the date/time in UTC+0 in all cases. I can see
potential issues a couple of times a year where holes exist. I cannot even
fathom what would happen on a merge or edit of history.

Cheers,
Randall


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Git should preserve modification times at least on request
  2018-02-21 23:58           ` Randall S. Becker
@ 2018-02-22  2:05             ` 'Peter Backes'
  2018-02-26 10:56               ` Andreas Krey
  0 siblings, 1 reply; 28+ messages in thread
From: 'Peter Backes' @ 2018-02-22  2:05 UTC (permalink / raw)
  To: Randall S. Becker
  Cc: 'Ævar Arnfjörð Bjarmason',
	'Derek Fawcus', git, 'Theodore Ts'o'

On Wed, Feb 21, 2018 at 06:58:34PM -0500, Randall S. Becker wrote:
> May I suggest storing the date/time in UTC+0 in all cases. I can see
> potential issues a couple of times a year where holes exist. I cannot even
> fathom what would happen on a merge or edit of history.

I consider storing the timestamp simply in the traditional 
seconds-since-epoch UNIX timestamp format. But I'm not entirely sure 
yet (see below).

If a timestamp includes the offset, there shouldn't be any issue with 
holes. UTC+0 is nice, too, of course, though some might want to 
preserve the timezone in which the timestamp was actually created.

The bigger issue is usually to cope with those pesky leap seconds. It
makes a difference whether one uses solar seconds ("posix" style; those 
are more commonly seen) or atomic seconds ("right" style) for the UNIX 
timestamp. Those differences accumulate over time, so you can have 
almost half a minute delta if you are not careful with timestamp 
conversion. If I remember correctly, rcs uses some rather awkward 
iterative convergence algorithm to portably convert from
human-readable date and time to UNIX timestamps.
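
A small sketch of the "posix" style, assuming Python's datetime module
(which follows POSIX conventions and has no notion of leap seconds). The
helper name posix_seconds is hypothetical. It shows two things: the
offset in the human-readable form does not change the resulting
timestamp, and a day containing a leap second still counts as 86400
seconds.

```python
from datetime import datetime


def posix_seconds(text):
    """Parse 'YYYY-MM-DD HH:MM:SS +ZZZZ' into POSIX seconds since the epoch."""
    return int(datetime.strptime(text, "%Y-%m-%d %H:%M:%S %z").timestamp())


# The same instant written with two different UTC offsets.
local = posix_seconds("2018-02-21 23:56:29 +0100")
utc = posix_seconds("2018-02-21 22:56:29 +0000")
print(local == utc)

# A leap second (23:59:60 UTC) was inserted at the end of 2016-12-31,
# yet POSIX time still advances by exactly 86400 across that day.
day_start = posix_seconds("2016-12-31 00:00:00 +0000")
day_end = posix_seconds("2017-01-01 00:00:00 +0000")
print(day_end - day_start)
```

A "right"-style clock would report 86401 for the second difference,
which is where the accumulated delta comes from.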

Thus I'm still not sure whether it will be a UNIX-format timestamp or 
whether a human-readable date/time might be preferable.

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Git should preserve modification times at least on request
  2018-02-21 22:44       ` Ævar Arnfjörð Bjarmason
  2018-02-21 23:12         ` Peter Backes
@ 2018-02-22 23:24         ` Derek Fawcus
  1 sibling, 0 replies; 28+ messages in thread
From: Derek Fawcus @ 2018-02-22 23:24 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Peter Backes, git, Theodore Ts'o

On Wed, Feb 21, 2018 at 11:44:13PM +0100, Ævar Arnfjörð Bjarmason wrote:
> On Wed, Feb 21 2018, Peter Backes jotted:
> > On Wed, Feb 21, 2018 at 10:33:05PM +0100, Ævar Arnfjörð Bjarmason wrote:
> >> This sounds like a sensible job for a git import tool, i.e. import a
> >> target directory into git, and instead of 'git add'-ing the whole thing
> >> it would look at the mtimes, sort files by mtime, then add them in order
> >> and only commit those files that had the same mtime in the same commit
> >> (or within some boundary).
> >
> > I think that this would be The Wrong Thing to do.

Agreed, but probably for a different reason.

> I'm merely pointing out that if you have the use-case Derek Fawcus
> describes you can get per-file mtimes via something similar to the
> hook method Theodore Ts'o described today with a simple import tool with
> no changes to git or its object format required.

Actually, I was not proposing any change to the git objects.
I was simply suggesting a case where I'd have found an optional mechanism
for mtime restoration useful.

What would be useful is a better version of the hook based scheme which
Ted mentioned.  The import could be via a wrapper script, but checkouts
would have to be via a hook such that the original timestamps could then
be applied; and those stamps would have to be part of the tar-file commit.
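
The restoration half of such a hook could be sketched roughly as below.
This is an assumption-laden illustration, not an existing git feature:
apply_mtimes is a hypothetical helper, and `records` stands for whatever
(relative path, mtime) list was committed alongside the tar-file import.
Only os.stat() and os.utime() from the standard library are used.

```python
import os


def apply_mtimes(root, records):
    """Set the recorded mtime on each file below `root`.

    `records` is an iterable of (relative_path, mtime_seconds) pairs.
    The access time is left as it was; files that no longer exist are
    skipped and returned so the caller can report them.
    """
    missing = []
    for relpath, mtime in records:
        target = os.path.join(root, relpath)
        try:
            atime = os.stat(target).st_atime
        except FileNotFoundError:
            missing.append(relpath)
            continue
        os.utime(target, (atime, mtime))
    return missing
```

A checkout hook would call this after the working tree has been updated,
with the working tree's top-level directory as `root`.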

The idea of automatically generating a bunch of commits in time order
would be the wrong thing here. That is because one file could well
contain changes from more than one logical commit (as guided by the
Changelog), and that one logical commit can be spread across a few
files with different mod times, one has to manually tease those apart.

So here the purpose behind restoring the timestamps is as an aid in
guiding the examination of files to find the changes referenced in
the Changelog.

Git is quite useful for this sort of effort, as once a sensible commit
has been synthesized, rebase of the next tar-file commit then helps
reveal the next set of changes.

So what I'm thinking of is for stuff like this: https://github.com/DoctorWkt/unix-jun72
(and the other repos there), where one wishes to figure out and
regenerate a history of changes.  Since git is quite useful for
representing the end result, it is just that other scripting
may make it easier to use for such cases.

DF

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Git should preserve modification times at least on request
  2018-02-21 22:14     ` Peter Backes
  2018-02-21 22:44       ` Ævar Arnfjörð Bjarmason
@ 2018-02-23 12:28       ` Konstantin Khomoutov
  1 sibling, 0 replies; 28+ messages in thread
From: Konstantin Khomoutov @ 2018-02-23 12:28 UTC (permalink / raw)
  To: Peter Backes; +Cc: Ævar Arnfjörð Bjarmason, Derek Fawcus, git

On Wed, Feb 21, 2018 at 11:14:20PM +0100, Peter Backes wrote:

[...]
> atime, in contrast, was clearly one of the rather nonsensical 
> innovations of UNIX: Do one write to the disk for each read from the 
> disk. C'mon, really? It would have been a lot more reasonable to simply 
> provide a generic way for tracing read() system calls instead; then 
> userspace could decide what to do with that information and which of it 
> is useful and should be kept and perhaps stored on disk. Now we have 
> this ugly hack called relatime to deal with the problem.
[...]

IIUC, the purpose of atime can be more apparent if you consider it in
the context of the time it appeared: the systems were multi-user but the
disks were small, so a question "what files are lying there but appear
to be unused" was rather sensical to ask as such files could be found,
reported and then considered for deletion or moving off rotating media
to tapes etc.
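
A minimal sketch of that kind of query, assuming the filesystem still
records access times (with noatime the values are not updated on reads,
and with relatime only occasionally, so results may be stale). The
function name stale_files and its parameters are hypothetical.

```python
import os
import time


def stale_files(root, days, now=None):
    """Return paths under root whose atime is more than `days` days old."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat reads metadata only, so it does not itself
            # update the access time being inspected.
            if os.lstat(path).st_atime < cutoff:
                stale.append(path)
    return sorted(stale)
```

For example, stale_files("/home", 365) would list the files not read
for about a year, i.e. candidates for archiving.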



* Re: Git should preserve modification times at least on request
  2018-02-22  2:05             ` 'Peter Backes'
@ 2018-02-26 10:56               ` Andreas Krey
  2018-02-26 11:04                 ` 'Peter Backes'
  0 siblings, 1 reply; 28+ messages in thread
From: Andreas Krey @ 2018-02-26 10:56 UTC (permalink / raw)
  To: 'Peter Backes'
  Cc: Randall S. Becker,
	'Ævar Arnfjörð Bjarmason',
	'Derek Fawcus', git, 'Theodore Ts'o'

On Thu, 22 Feb 2018 03:05:35 +0000, 'Peter Backes' wrote:
...
> The bigger issue is usually to cope with those pesky leap seconds. It 
> makes a difference whether one uses solar seconds ("posix" style; those 
> are more commonly seen) or atomic seconds ("right" style) for the UNIX 
> timestamp.

Is there any system, unix or otherwise, that uses 'right'-style seconds,
i.e. TAI, as its base?

(I.e. one where (time(0)%60) does not indicate the current position
of the second hand of an accurate clock?)

- Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds <torvalds@*.org>
Date: Fri, 22 Jan 2010 07:29:21 -0800


* Re: Git should preserve modification times at least on request
  2018-02-26 10:56               ` Andreas Krey
@ 2018-02-26 11:04                 ` 'Peter Backes'
  0 siblings, 0 replies; 28+ messages in thread
From: 'Peter Backes' @ 2018-02-26 11:04 UTC (permalink / raw)
  To: Andreas Krey
  Cc: Randall S. Becker,
	'Ævar Arnfjörð Bjarmason',
	'Derek Fawcus', git, 'Theodore Ts'o'

On Mon, Feb 26, 2018 at 11:56:42AM +0100, Andreas Krey wrote:
> > The bigger issue is usually to cope with those pesky leap seconds. It 
> > makes a difference whether one uses solar seconds ("posix" style; those 
> > are more commonly seen) or atomic seconds ("right" style) for the UNIX 
> > timestamp.
> 
> Is there any system, unix or otherwise, that uses 'right'-style seconds,
> i.e. TAI, as its base?

Most certainly there is. This depends on the individual configuration 
of the system. On my Fedora system, the commonly used tzdata package 
off the shelf contains support for 'right' style versions of all 
timezones in /usr/share/zoneinfo/right. If the user links one of those 
timezones to /etc/localtime or manually specifies them (like 
TZ=right/Europe/Berlin ls -l) they will be used.
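
A rough arithmetic sketch of what the difference means for time(0)%60,
assuming 27 leap seconds have been inserted since 1972 (the most recent
at the end of 2016) and that a 'right' timestamp is simply the 'posix'
one plus that count. The helper names are hypothetical, and the
constant is only valid until the next leap second is inserted.

```python
import calendar

# Assumption: 27 leap seconds were inserted between 1972 and
# 2017-01-01 UTC, and none after that as of this thread.
LEAP_SECONDS_SINCE_1972 = 27
LAST_LEAP_POSIX = calendar.timegm((2017, 1, 1, 0, 0, 0))


def posix_to_right(posix_ts):
    """Convert a 'posix' timestamp to a 'right' one (2017 onwards only)."""
    if posix_ts < LAST_LEAP_POSIX:
        raise ValueError("sketch only covers timestamps from 2017-01-01 UTC")
    return posix_ts + LEAP_SECONDS_SINCE_1972


def second_hand(ts):
    """The naive time(0) % 60 reading of a raw timestamp."""
    return ts % 60


# 2018-02-26 10:56:42 UTC
posix_ts = calendar.timegm((2018, 2, 26, 10, 56, 42))
print(second_hand(posix_ts))                  # 42
print(second_hand(posix_to_right(posix_ts)))  # 9
```

So on a system keeping 'right' timestamps, the raw value modulo 60 is
off by 27 seconds from the second hand of a UTC clock, and only the
'right' zoneinfo files map it back to the displayed time.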

You don't find a lot of those systems today, but those who used to use 
the 'right' timestamps might for legacy reasons explicitly configure 
their system to use those timezone variants. I personally did this for 
a number of years, but then converted the filesystem timestamps to 
'posix' and I am now exclusively using 'posix' ones.

Best wishes
Peter
-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE


end of thread, other threads:[~2018-02-26 11:28 UTC | newest]

Thread overview: 28+ messages
2018-02-19 21:22 Git should preserve modification times at least on request Peter Backes
2018-02-19 21:58 ` Johannes Schindelin
2018-02-19 22:08   ` Peter Backes
2018-02-20  1:22     ` Theodore Ts'o
2018-02-20 10:46     ` Johannes Schindelin
2018-02-20 11:53       ` Peter Backes
2018-02-20 21:05       ` Peter Backes
2018-02-20 22:32         ` Johannes Schindelin
2018-02-20 22:48           ` Peter Backes
2018-02-21 21:30             ` Phillip Wood
2018-02-19 22:37   ` Randall S. Becker
2018-02-19 23:22     ` Hilco Wijbenga
2018-02-20 16:42       ` Hilco Wijbenga
2018-02-20 21:16 ` Jeff King
2018-02-20 22:05   ` Peter Backes
2018-02-21  9:48     ` Jacob Keller
2018-02-20 23:40   ` Junio C Hamano
2018-02-21 21:03 ` Derek Fawcus
2018-02-21 21:33   ` Ævar Arnfjörð Bjarmason
2018-02-21 22:14     ` Peter Backes
2018-02-21 22:44       ` Ævar Arnfjörð Bjarmason
2018-02-21 23:12         ` Peter Backes
2018-02-21 23:58           ` Randall S. Becker
2018-02-22  2:05             ` 'Peter Backes'
2018-02-26 10:56               ` Andreas Krey
2018-02-26 11:04                 ` 'Peter Backes'
2018-02-22 23:24         ` Derek Fawcus
2018-02-23 12:28       ` Konstantin Khomoutov
