Hi René,

On Tue, 25 Apr 2017, René Scharfe wrote:

> Am 24.04.2017 um 15:57 schrieb Johannes Schindelin:
> > Git v2.9.2 was released in a hurry to accommodate for platforms like
> > Windows, where the `unsigned long` data type is 32-bit even for 64-bit
> > setups.
> > 
> > The quick fix was to simply disable all the testing with "absurd"
> > future dates.
> > 
> > However, we can do much better than that, as we already make use of
> > 64-bit data types internally. There is no good reason why we should
> > not use the same for timestamps. Hence, let's use uintmax_t for
> > timestamps.
> > 
> > Note: while the `time_t` data type exists and is meant to be used for
> > timestamps, on 32-bit Linux it is *still* 32-bit. An earlier iteration
> > used `time_t` for that reason, but it came with a few serious
> > downsides: as `time_t` can be signed (and indeed, on Windows it is an
> > int64_t), Git's expectation that 0 is the minimal value no longer
> > holds true, introducing its own set of interesting challenges. Besides,
> > if we *can* handle far in the future timestamps (except for formatting
> > them using the system libraries), it is more consistent to do so.
> 
> time_t is signed on Linux and BSDs as well.

s/is/happens to be/.

The point is: we must not rely on time_t to be signed just because it
*happens* to be the case on the setups to which we have access. We want
Git to be portable, not only "portable to our own setups".

> Using an unsigned type gives us the ability to represent times beyond
> the 292 billion years in the future that int64_t would give us, but
> prevents recording events that occurred before the Epoch.  That doesn't
> sound like a good deal to me -- storing historical works (e.g. law
> texts) with real time stamps is probably more interesting than fixing
> the year 292277026596 problem within this decade.

It sounds like a good deal to me, if the alternative is that *I* have to
patch Git's source code to support signed timestamps, when *I* am probably
the only one in this entire thread who does not even want them.

So could y'all please just stop talking about signed timestamps to me? I
feel that they are really, really starting to irritate the hell out of me.

> > The upside of using `uintmax_t` for timestamps is that we do a much
> > better job to support far in the future timestamps across all
> > platforms, including 32-bit ones. The downside is that those platforms
> > that use a 32-bit `time_t` will barf when parsing or formatting those
> > timestamps.
> 
> IIUC this series has two aims: solving the year 2038 problem on 32-bit
> Linux by replacing time_t (int32_t), and solving the year 2106 problem
> on Windows by replacing unsigned long (uint32_t), right?

No. The series has one aim: to stop using `unsigned long` (which is
ill-defined to begin with) for timestamps.
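
To illustrate the idea (a minimal sketch only, not the actual patch): a
single typedef plus matching format and parse helpers replaces every raw
`unsigned long`. `timestamp_t` is the name used in this thread; `PRItime`,
`parse_timestamp` and `format_timestamp()` are names assumed here purely
for illustration.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* One semantically meaningful type instead of raw `unsigned long`. */
typedef uintmax_t timestamp_t;

/* Assumed helper names: keep printing and parsing in sync with the type. */
#define PRItime PRIuMAX
#define parse_timestamp strtoumax

/*
 * Format a timestamp into buf; returns what snprintf() returns.
 * Works the same whether `unsigned long` is 32-bit or 64-bit.
 */
int format_timestamp(char *buf, size_t len, timestamp_t t)
{
	return snprintf(buf, len, "%" PRItime, t);
}
```

With this, a value such as 4294967296 (one past what a 32-bit `unsigned
long` can hold) can be parsed and printed on every platform, because
`uintmax_t` is at least 64 bits wide.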

> The latter one sounds more interesting, because 32-bit platforms would
> still be unable to fully use bigger time values as you wrote above.
> 
> Can we leave time_t alone and just do the part where you replace
> unsigned long with timestamp_t defined as uint64_t?  That should already
> help on Windows, correct?  When/if timestamp_t is later changed to a
> signed type then we could easily convert the time_t cases to timestamp_t
> as well, or the other way around.

This patch series leaves time_t alone already, so your wish has been
fulfilled preemptively.

> > This iteration makes the date_overflows() check more stringent again.
> > 
> > It is arguably a bug to paper over too-large author/committer dates and
> > to replace them with Jan 1 1970 without even telling the user that we do
> > that, but this is the behavior that t4212 verifies, so I reinstated that
> > behavior. The change in behavior was missed because of the missing
> > unsigned_add_overflows() test.
> 
> I can't think of many ways to get future time stamps (broken clock,
> broken CMOS battery, bit rot, time travel), so I wouldn't expect a
> change towards better error reporting to affect a lot of users.  (Not
> necessarily as part of this series, of course.)

If you want to suggest that we should stop verifying overflows when some
complex reasoning can prove that the overflow is not happening in a
billion years, I disagree. Not only is it unnecessarily time-consuming to
ask readers to perform the complex reasoning, and not only is there enough
room for bugs to hide in plain sight (because of the complexity), it also
makes the same code harder to reuse in other software where a different
timestamp data type was chosen (or inherited from previous Git versions).

I'd much rather have code that is easy to reason about and does not cause
head scratching (like the "why do we ignore a too large timestamp?"
triggering `if (date_overflows(date)) date = 0;`) than pretend to be smart
and clever and make everybody else feel stupid by forcing them through
hoops of thinking bubbles until they, too, reach the conclusion that this
actually won't happen. Unless there is a bug in the code.
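
The kind of explicit checks I mean can be sketched like this (a sketch
under assumptions, not the code in this series: the function names and
`TIME_MAX` are made up for illustration, and `time_t` is assumed to be an
integer type, as POSIX requires):

```c
#include <stdint.h>
#include <time.h>

typedef uintmax_t timestamp_t;  /* assumed timestamp type, as discussed */
#define TIME_MAX UINTMAX_MAX     /* hypothetical name for its maximum */

/* Returns 1 if a + b would wrap around in timestamp_t, 0 otherwise. */
int timestamp_add_overflows(timestamp_t a, timestamp_t b)
{
	return b > TIME_MAX - a;
}

/*
 * Returns nonzero if t cannot safely be handed to system functions
 * that expect a time_t, regardless of time_t's width or signedness.
 */
int timestamp_overflows_time_t(timestamp_t t)
{
	time_t sys;

	/* The maximum value is treated as "overflowed while parsing". */
	if (t >= TIME_MAX)
		return 1;

	sys = (time_t)t;
	/*
	 * The round-trip catches truncation; the sign comparison catches
	 * values that wrap to a negative number in a signed time_t.
	 */
	return (timestamp_t)sys != t || (t < 1) != (sys < 1);
}
```

Nobody has to reason about billions of years to see that this is correct,
and the checks keep working if the timestamp type is changed later.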

> >   Documentation/technical/api-parse-options.txt |   8 +-
> >   archive-tar.c                                 |   5 +-
> >   archive-zip.c                                 |  12 ++-
> >   archive.h                                     |   2 +-
> >   builtin/am.c                                  |   4 +-
> >   builtin/blame.c                               |  14 ++--
> >   builtin/fsck.c                                |   6 +-
> >   builtin/gc.c                                  |   2 +-
> >   builtin/log.c                                 |   4 +-
> >   builtin/merge-base.c                          |   2 +-
> >   builtin/name-rev.c                            |   6 +-
> >   builtin/pack-objects.c                        |   4 +-
> >   builtin/prune.c                               |   4 +-
> >   builtin/receive-pack.c                        |  14 ++--
> >   builtin/reflog.c                              |  24 +++---
> >   builtin/rev-list.c                            |   2 +-
> >   builtin/rev-parse.c                           |   2 +-
> >   builtin/show-branch.c                         |   4 +-
> >   builtin/worktree.c                            |   4 +-
> >   bundle.c                                      |   4 +-
> >   cache.h                                       |  14 ++--
> >   commit.c                                      |  18 ++--
> >   commit.h                                      |   2 +-
> >   config.c                                      |   2 +-
> >   credential-cache--daemon.c                    |  12 +--
> >   date.c                                        | 113 ++++++++++++++------------
> >   fetch-pack.c                                  |   8 +-
> >   fsck.c                                        |   2 +-
> >   git-compat-util.h                             |   9 ++
> >   http-backend.c                                |   4 +-
> >   parse-options-cb.c                            |   4 +-
> >   pretty.c                                      |   4 +-
> >   reachable.c                                   |   9 +-
> >   reachable.h                                   |   4 +-
> >   ref-filter.c                                  |  22 ++---
> >   reflog-walk.c                                 |   8 +-
> >   refs.c                                        |  14 ++--
> >   refs.h                                        |   8 +-
> >   refs/files-backend.c                          |   8 +-
> >   revision.c                                    |   6 +-
> >   revision.h                                    |   4 +-
> >   sha1_name.c                                   |   6 +-
> >   t/helper/test-date.c                          |  18 ++--
> >   t/helper/test-parse-options.c                 |   4 +-
> >   t/helper/test-ref-store.c                     |   4 +-
> >   t/t0006-date.sh                               |   4 +-
> >   t/t5000-tar-tree.sh                           |   6 +-
> >   t/test-lib.sh                                 |   3 +
> >   tag.c                                         |   6 +-
> >   tag.h                                         |   2 +-
> >   upload-pack.c                                 |   8 +-
> >   vcs-svn/fast_export.c                         |   8 +-
> >   vcs-svn/fast_export.h                         |   4 +-
> >   vcs-svn/svndump.c                             |   2 +-
> >   wt-status.c                                   |   2 +-
> >   55 files changed, 260 insertions(+), 219 deletions(-)
> 
> How did you find all the pieces of code that need to be touched?

Pain and suffering.

Seriously, for v1 of this patch series, I went painstakingly through `git
grep -Ovi "unsigned long"`. I determined for every single use of `unsigned
long` whether it referred to a timestamp and changed it accordingly.

In subsequent iterations, I went the cheaper route of compiling with
DEVELOPER=1 on Windows and once I even went through replacing the 64-bit
libraries and compiler/linker in my Linux VM with 32-bit ones to imitate
the 32-bit Linux Travis coordinate (because I failed to get the Docker
setup to run in the VM). These build runs identified new callers of
functions whose signature I had to change to avoid `unsigned long` for
timestamps.

This was no fun.

> Is there a regex or something that can be used to spot new such places
> that sneak in, e.g. through in-flight merges?

No, a regex would only work if we already had converted all `unsigned
long` uses to semantically meaningful data types.

Ciao,
Dscho

P.S.: Please remove the quoted interdiff when there is no reason to keep
it around.