All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] handle bogus commit dates
@ 2014-02-24  7:33 Jeff King
  2014-02-24  7:36 ` [PATCH 1/5] t4212: test bogus timestamps with git-log Jeff King
                   ` (4 more replies)
  0 siblings, 5 replies; 48+ messages in thread
From: Jeff King @ 2014-02-24  7:33 UTC (permalink / raw)
  To: git

This series improves our handling of commit dates that are numbers, but
are ridiculously large. The most critical one is the final commit, which
fixes a segfault, but the others clean up various corner cases.

  [1/5]: t4212: test bogus timestamps with git-log
  [2/5]: fsck: report integer overflow in author timestamps
  [3/5]: date: check date overflow against time_t
  [4/5]: pretty: handle integer overflow in timestamps
  [5/5]: log: do not segfault on gmtime errors

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 1/5] t4212: test bogus timestamps with git-log
  2014-02-24  7:33 [PATCH 0/5] handle bogus commit dates Jeff King
@ 2014-02-24  7:36 ` Jeff King
  2014-02-24  7:39 ` [PATCH 2/5] fsck: report integer overflow in author timestamps Jeff King
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 48+ messages in thread
From: Jeff King @ 2014-02-24  7:36 UTC (permalink / raw)
  To: git

When t4212 was originally added by 9dbe7c3d (pretty: handle
broken commit headers gracefully, 2013-04-17), it tested our
handling of commits with broken ident lines in which the
timestamps could not be parsed. It does so using a bogus line
like "Name <email>-<> 1234 -0000", because that simulates an
error that was seen in the wild.

Later, 03818a4 (split_ident: parse timestamp from end of
line, 2013-10-14) made our parser smart enough to actually
find the timestamp on such a line, and t4212 was adjusted to
match. While it's nice that we handle this real-world case,
this meant that we were not actually testing the
bogus-timestamp case anymore.

This patch adds a test with a totally incomprehensible
timestamp to make sure we are testing the code path.

Note that the behavior is slightly different between regular log
output and "--format=%ad". In the former case, we produce a
sentinel value and in the latter, we produce an empty
string. While at first this seems unnecessarily
inconsistent, it matches the original behavior given by
9dbe7c3d.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t4212-log-corrupt.sh | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/t/t4212-log-corrupt.sh b/t/t4212-log-corrupt.sh
index 93c7c36..83de981 100755
--- a/t/t4212-log-corrupt.sh
+++ b/t/t4212-log-corrupt.sh
@@ -44,4 +44,25 @@ test_expect_success 'git log --format with broken author email' '
 	test_cmp expect.err actual.err
 '
 
+munge_author_date () {
+	git cat-file commit "$1" >commit.orig &&
+	sed "s/^\(author .*>\) [0-9]*/\1 $2/" <commit.orig >commit.munge &&
+	git hash-object -w -t commit commit.munge
+}
+
+test_expect_success 'unparsable dates produce sentinel value' '
+	commit=$(munge_author_date HEAD totally_bogus) &&
+	echo "Date:   Thu Jan 1 00:00:00 1970 +0000" >expect &&
+	git log -1 $commit >actual.full &&
+	grep Date <actual.full >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'unparsable dates produce sentinel value (%ad)' '
+	commit=$(munge_author_date HEAD totally_bogus) &&
+	echo >expect &&
+	git log -1 --format=%ad $commit >actual
+	test_cmp expect actual
+'
+
 test_done
-- 
1.8.5.2.500.g8060133

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 2/5] fsck: report integer overflow in author timestamps
  2014-02-24  7:33 [PATCH 0/5] handle bogus commit dates Jeff King
  2014-02-24  7:36 ` [PATCH 1/5] t4212: test bogus timestamps with git-log Jeff King
@ 2014-02-24  7:39 ` Jeff King
  2014-02-24  7:39 ` [PATCH 3/5] date: check date overflow against time_t Jeff King
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 48+ messages in thread
From: Jeff King @ 2014-02-24  7:39 UTC (permalink / raw)
  To: git

When we check commit objects, we complain if commit->date is
ULONG_MAX, which is an indication that we saw integer
overflow when parsing it. However, we do not do any check at
all for author lines, which also contain a timestamp.

Let's actually check the timestamps on each ident line
with strtoul. This catches both author and committer lines,
and we can get rid of the now-redundant commit->date check.

Note that like the existing check, we compare only against
ULONG_MAX. Now that we are calling strtoul at the site of
the check, we could be slightly more careful and also check
that errno is set to ERANGE. However, this will make further
refactoring in future patches a little harder, and it
doesn't really matter in practice.

For 32-bit systems, one would have to create a commit at the
exact wrong second in 2038. But by the time we get close to
that, all systems will hopefully have moved to 64-bit (and
if they haven't, they have a real problem one second later).

For 64-bit systems, by the time we get close to ULONG_MAX,
all systems will hopefully have been consumed in the fiery
wrath of our expanding Sun.

Signed-off-by: Jeff King <peff@peff.net>
---
Note that tags don't get checked here, because we do not feed their
ident lines to fsck_ident at all. This is still a step forward, though,
as if we ever teach them to check ident lines, they'll get this new
check automatically.

 fsck.c          | 12 ++++++------
 t/t1450-fsck.sh | 14 ++++++++++++++
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/fsck.c b/fsck.c
index 99c0497..760e072 100644
--- a/fsck.c
+++ b/fsck.c
@@ -245,6 +245,8 @@ static int fsck_tree(struct tree *item, int strict, fsck_error error_func)
 
 static int fsck_ident(char **ident, struct object *obj, fsck_error error_func)
 {
+	char *end;
+
 	if (**ident == '<')
 		return error_func(obj, FSCK_ERROR, "invalid author/committer line - missing space before email");
 	*ident += strcspn(*ident, "<>\n");
@@ -264,10 +266,11 @@ static int fsck_ident(char **ident, struct object *obj, fsck_error error_func)
 	(*ident)++;
 	if (**ident == '0' && (*ident)[1] != ' ')
 		return error_func(obj, FSCK_ERROR, "invalid author/committer line - zero-padded date");
-	*ident += strspn(*ident, "0123456789");
-	if (**ident != ' ')
+	if (strtoul(*ident, &end, 10) == ULONG_MAX)
+		return error_func(obj, FSCK_ERROR, "invalid author/committer line - date causes integer overflow");
+	if (end == *ident || *end != ' ')
 		return error_func(obj, FSCK_ERROR, "invalid author/committer line - bad date");
-	(*ident)++;
+	*ident = end + 1;
 	if ((**ident != '+' && **ident != '-') ||
 	    !isdigit((*ident)[1]) ||
 	    !isdigit((*ident)[2]) ||
@@ -287,9 +290,6 @@ static int fsck_commit(struct commit *commit, fsck_error error_func)
 	int parents = 0;
 	int err;
 
-	if (commit->date == ULONG_MAX)
-		return error_func(&commit->object, FSCK_ERROR, "invalid author/committer line");
-
 	if (memcmp(buffer, "tree ", 5))
 		return error_func(&commit->object, FSCK_ERROR, "invalid format - expected 'tree' line");
 	if (get_sha1_hex(buffer+5, tree_sha1) || buffer[45] != '\n')
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index d730734..8c739c9 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -142,6 +142,20 @@ test_expect_success '> in name is reported' '
 	grep "error in commit $new" out
 '
 
+# date is 2^64 + 1
+test_expect_success 'integer overflow in timestamps is reported' '
+	git cat-file commit HEAD >basis &&
+	sed "s/^\\(author .*>\\) [0-9]*/\\1 18446744073709551617/" \
+		<basis >bad-timestamp &&
+	new=$(git hash-object -t commit -w --stdin <bad-timestamp) &&
+	test_when_finished "remove_object $new" &&
+	git update-ref refs/heads/bogus "$new" &&
+	test_when_finished "git update-ref -d refs/heads/bogus" &&
+	git fsck 2>out &&
+	cat out &&
+	grep "error in commit $new.*integer overflow" out
+'
+
 test_expect_success 'tag pointing to nonexistent' '
 	cat >invalid-tag <<-\EOF &&
 	object ffffffffffffffffffffffffffffffffffffffff
-- 
1.8.5.2.500.g8060133

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 3/5] date: check date overflow against time_t
  2014-02-24  7:33 [PATCH 0/5] handle bogus commit dates Jeff King
  2014-02-24  7:36 ` [PATCH 1/5] t4212: test bogus timestamps with git-log Jeff King
  2014-02-24  7:39 ` [PATCH 2/5] fsck: report integer overflow in author timestamps Jeff King
@ 2014-02-24  7:39 ` Jeff King
  2014-02-24  7:46 ` [PATCH 4/5] log: handle integer overflow in timestamps Jeff King
  2014-02-24  7:49 ` [PATCH 5/5] log: do not segfault on gmtime errors Jeff King
  4 siblings, 0 replies; 48+ messages in thread
From: Jeff King @ 2014-02-24  7:39 UTC (permalink / raw)
  To: git

When we check whether a timestamp has overflowed, we check
only against ULONG_MAX, meaning that strtoul has overflowed.
However, we also feed these timestamps to system functions
like gmtime, which expect a time_t. On many systems, time_t
is actually smaller than "unsigned long" (e.g., because it
is signed), and we would overflow when using these
functions.  We don't know the actual size or signedness of
time_t, but we can easily check for truncation with a simple
assignment.

Signed-off-by: Jeff King <peff@peff.net>
---
 cache.h |  1 +
 date.c  | 17 +++++++++++++++++
 fsck.c  |  2 +-
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index dc040fb..e405598 100644
--- a/cache.h
+++ b/cache.h
@@ -959,6 +959,7 @@ void datestamp(char *buf, int bufsize);
 unsigned long approxidate_careful(const char *, int *);
 unsigned long approxidate_relative(const char *date, const struct timeval *now);
 enum date_mode parse_date_format(const char *format);
+int date_overflows(unsigned long date);
 
 #define IDENT_STRICT	       1
 #define IDENT_NO_DATE	       2
diff --git a/date.c b/date.c
index 83b4166..90b28f7 100644
--- a/date.c
+++ b/date.c
@@ -1113,3 +1113,20 @@ unsigned long approxidate_careful(const char *date, int *error_ret)
 	gettimeofday(&tv, NULL);
 	return approxidate_str(date, &tv, error_ret);
 }
+
+int date_overflows(unsigned long t)
+{
+	time_t sys;
+
+	/* If we overflowed our unsigned long, that's bad... */
+	if (t == ULONG_MAX)
+		return 1;
+
+	/*
+	 * ...but we also are going to feed the result to system
+	 * functions that expect time_t, which is often "signed long".
+	 * Make sure that we fit into time_t, as well.
+	 */
+	sys = t;
+	return t != sys || (t < 1) != (sys < 1);
+}
diff --git a/fsck.c b/fsck.c
index 760e072..64bf279 100644
--- a/fsck.c
+++ b/fsck.c
@@ -266,7 +266,7 @@ static int fsck_ident(char **ident, struct object *obj, fsck_error error_func)
 	(*ident)++;
 	if (**ident == '0' && (*ident)[1] != ' ')
 		return error_func(obj, FSCK_ERROR, "invalid author/committer line - zero-padded date");
-	if (strtoul(*ident, &end, 10) == ULONG_MAX)
+	if (date_overflows(strtoul(*ident, &end, 10)))
 		return error_func(obj, FSCK_ERROR, "invalid author/committer line - date causes integer overflow");
 	if (end == *ident || *end != ' ')
 		return error_func(obj, FSCK_ERROR, "invalid author/committer line - bad date");
-- 
1.8.5.2.500.g8060133

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 4/5] log: handle integer overflow in timestamps
  2014-02-24  7:33 [PATCH 0/5] handle bogus commit dates Jeff King
                   ` (2 preceding siblings ...)
  2014-02-24  7:39 ` [PATCH 3/5] date: check date overflow against time_t Jeff King
@ 2014-02-24  7:46 ` Jeff King
  2014-02-24 19:50   ` Junio C Hamano
  2014-02-24  7:49 ` [PATCH 5/5] log: do not segfault on gmtime errors Jeff King
  4 siblings, 1 reply; 48+ messages in thread
From: Jeff King @ 2014-02-24  7:46 UTC (permalink / raw)
  To: git

If an ident line has a ridiculous date value like (2^64)+1,
we currently just pass ULONG_MAX along to the date code,
which can produce nonsensical dates.

On systems with a signed long time_t (e.g., 64-bit glibc
systems), this actually doesn't end up too bad. The
ULONG_MAX is converted to -1, we apply the timezone field to
that, and the result ends up somewhere between Dec 31, 1969
and Jan 1, 1970.

However, there is still a few good reasons to detect the
overflow explicitly:

  1. On systems where "unsigned long" is smaller than
     time_t, we get a nonsensical date in the future.

  2. Even where it would produce "Dec 31, 1969", it's easier
     to recognize "midnight Jan 1" as a consistent sentinel
     value for "we could not parse this".

  3.  Values which do not overflow strtoul but do overflow a
      signed time_t produce nonsensical values in the past.
      For example, on a 64-bit system with a signed long
      time_t, a timestamp of 18446744073000000000 produces a
      date in 1947.

We also recognize overflow in the timezone field, which
could produce nonsensical results. In this case we show the
parsed date, but in UTC.

Signed-off-by: Jeff King <peff@peff.net>
---
A note on these tests. They are designed for 64-bit systems, but should
run fine on 32-bit systems (they are both just overflow, and the second
test is not doing anything novel that the first is not).

However, the second test relies on finding a value that fits into an
"unsigned long" but does not fit into a time_t. For systems with a
64-bit signed time_t, that's fine. But if a system has an unsigned
64-bit time_t, the test will fail (it will actually produce some value
300 billion years from now).

I'm inclined to include it as-is, as I do not know of any such system
(and it would be kind of lame, since it could not represent dates before
1970). If somebody comes up with such a system, we can allow either
output (we could do it preemptively, but somebody would have to
calculate the exact date/time billions of years in the future; we would
be just as likely to disagree with the system's gmtime about something
silly like leapseconds).

 pretty.c               | 10 ++++++++--
 t/t4212-log-corrupt.sh | 16 ++++++++++++++++
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/pretty.c b/pretty.c
index 87db08b..3b811ed 100644
--- a/pretty.c
+++ b/pretty.c
@@ -401,8 +401,14 @@ static const char *show_ident_date(const struct ident_split *ident,
 
 	if (ident->date_begin && ident->date_end)
 		date = strtoul(ident->date_begin, NULL, 10);
-	if (ident->tz_begin && ident->tz_end)
-		tz = strtol(ident->tz_begin, NULL, 10);
+	if (date_overflows(date))
+		date = 0;
+	else {
+		if (ident->tz_begin && ident->tz_end)
+			tz = strtol(ident->tz_begin, NULL, 10);
+		if (tz == LONG_MAX || tz == LONG_MIN)
+			tz = 0;
+	}
 	return show_date(date, tz, mode);
 }
 
diff --git a/t/t4212-log-corrupt.sh b/t/t4212-log-corrupt.sh
index 83de981..ba25a2e 100755
--- a/t/t4212-log-corrupt.sh
+++ b/t/t4212-log-corrupt.sh
@@ -65,4 +65,20 @@ test_expect_success 'unparsable dates produce sentinel value (%ad)' '
 	test_cmp expect actual
 '
 
+# date is 2^64 + 1
+test_expect_success 'date parser recognizes integer overflow' '
+	commit=$(munge_author_date HEAD 18446744073709551617) &&
+	echo "Thu Jan 1 00:00:00 1970 +0000" >expect &&
+	git log -1 --format=%ad $commit >actual &&
+	test_cmp expect actual
+'
+
+# date is 2^64 - 2
+test_expect_success 'date parser recognizes time_t overflow' '
+	commit=$(munge_author_date HEAD 18446744073709551614) &&
+	echo "Thu Jan 1 00:00:00 1970 +0000" >expect &&
+	git log -1 --format=%ad $commit >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
1.8.5.2.500.g8060133

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 5/5] log: do not segfault on gmtime errors
  2014-02-24  7:33 [PATCH 0/5] handle bogus commit dates Jeff King
                   ` (3 preceding siblings ...)
  2014-02-24  7:46 ` [PATCH 4/5] log: handle integer overflow in timestamps Jeff King
@ 2014-02-24  7:49 ` Jeff King
  2014-03-22  9:32   ` René Scharfe
  2014-03-26 11:05   ` Charles Bailey
  4 siblings, 2 replies; 48+ messages in thread
From: Jeff King @ 2014-02-24  7:49 UTC (permalink / raw)
  To: git

Many code paths assume that show_date and show_ident_date
cannot return NULL. For the most part, we handle missing or
corrupt timestamps by showing the epoch time t=0.

However, we might still return NULL if gmtime rejects the
time_t we feed it, resulting in a segfault. Let's catch this
case and just format t=0.

Signed-off-by: Jeff King <peff@peff.net>
---
This test is of questionable portability, since we are depending on
gmtime's arbitrary point to decide that our input is crazy and return
NULL. The value is sufficiently large that I'd expect most to do so,
though, so it may be safe.

On 32-bit systems, of course, the test does nothing (it is just hitting
the integer overflow code path). But that's OK, since the output is the
same for both cases.

 date.c                 | 6 ++++--
 t/t4212-log-corrupt.sh | 8 ++++++++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/date.c b/date.c
index 90b28f7..e1a2cee 100644
--- a/date.c
+++ b/date.c
@@ -184,8 +184,10 @@ const char *show_date(unsigned long time, int tz, enum date_mode mode)
 		tz = local_tzoffset(time);
 
 	tm = time_to_tm(time, tz);
-	if (!tm)
-		return NULL;
+	if (!tm) {
+		tm = time_to_tm(0, 0);
+		tz = 0;
+	}
 
 	strbuf_reset(&timebuf);
 	if (mode == DATE_SHORT)
diff --git a/t/t4212-log-corrupt.sh b/t/t4212-log-corrupt.sh
index ba25a2e..3fa1715 100755
--- a/t/t4212-log-corrupt.sh
+++ b/t/t4212-log-corrupt.sh
@@ -81,4 +81,12 @@ test_expect_success 'date parser recognizes time_t overflow' '
 	test_cmp expect actual
 '
 
+# date is within 2^63-1, but enough to choke glibc's gmtime
+test_expect_success 'absurdly far-in-future dates produce sentinel' '
+	commit=$(munge_author_date HEAD 999999999999999999) &&
+	echo "Thu Jan 1 00:00:00 1970 +0000" >expect &&
+	git log -1 --format=%ad $commit >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
1.8.5.2.500.g8060133

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/5] log: handle integer overflow in timestamps
  2014-02-24  7:46 ` [PATCH 4/5] log: handle integer overflow in timestamps Jeff King
@ 2014-02-24 19:50   ` Junio C Hamano
  2014-02-24 19:58     ` Jeff King
  0 siblings, 1 reply; 48+ messages in thread
From: Junio C Hamano @ 2014-02-24 19:50 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> If an ident line has a ridiculous date value like (2^64)+1,
> we currently just pass ULONG_MAX along to the date code,
> which can produce nonsensical dates.
>
> On systems with a signed long time_t (e.g., 64-bit glibc
> systems), this actually doesn't end up too bad. The
> ULONG_MAX is converted to -1, we apply the timezone field to
> that, and the result ends up somewhere between Dec 31, 1969
> and Jan 1, 1970.
> ...
> We also recognize overflow in the timezone field, which
> could produce nonsensical results. In this case we show the
> parsed date, but in UTC.

Both are good measures to fallback to sanity, but why is that
if/else?  In other words...

> +	if (date_overflows(date))
> +		date = 0;
> +	else {
> +		if (ident->tz_begin && ident->tz_end)
> +			tz = strtol(ident->tz_begin, NULL, 10);
> +		if (tz == LONG_MAX || tz == LONG_MIN)
> +			tz = 0;
> +	}

... don't we want to fix an input having a bogus timestamp and also
a bogus tz recorded in it?

>  	return show_date(date, tz, mode);
>  }
>  
> diff --git a/t/t4212-log-corrupt.sh b/t/t4212-log-corrupt.sh
> index 83de981..ba25a2e 100755
> --- a/t/t4212-log-corrupt.sh
> +++ b/t/t4212-log-corrupt.sh
> @@ -65,4 +65,20 @@ test_expect_success 'unparsable dates produce sentinel value (%ad)' '
>  	test_cmp expect actual
>  '
>  
> +# date is 2^64 + 1
> +test_expect_success 'date parser recognizes integer overflow' '
> +	commit=$(munge_author_date HEAD 18446744073709551617) &&
> +	echo "Thu Jan 1 00:00:00 1970 +0000" >expect &&
> +	git log -1 --format=%ad $commit >actual &&
> +	test_cmp expect actual
> +'
> +
> +# date is 2^64 - 2
> +test_expect_success 'date parser recognizes time_t overflow' '
> +	commit=$(munge_author_date HEAD 18446744073709551614) &&
> +	echo "Thu Jan 1 00:00:00 1970 +0000" >expect &&
> +	git log -1 --format=%ad $commit >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/5] log: handle integer overflow in timestamps
  2014-02-24 19:50   ` Junio C Hamano
@ 2014-02-24 19:58     ` Jeff King
  2014-02-24 20:21       ` Junio C Hamano
  0 siblings, 1 reply; 48+ messages in thread
From: Jeff King @ 2014-02-24 19:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Mon, Feb 24, 2014 at 11:50:00AM -0800, Junio C Hamano wrote:

> > We also recognize overflow in the timezone field, which
> > could produce nonsensical results. In this case we show the
> > parsed date, but in UTC.
> 
> Both are good measures to fallback to sanity, but why is that
> if/else?  In other words...
> 
> > +	if (date_overflows(date))
> > +		date = 0;
> > +	else {
> > +		if (ident->tz_begin && ident->tz_end)
> > +			tz = strtol(ident->tz_begin, NULL, 10);
> > +		if (tz == LONG_MAX || tz == LONG_MIN)
> > +			tz = 0;
> > +	}
> 
> ... don't we want to fix an input having a bogus timestamp and also
> a bogus tz recorded in it?

If there is a bogus timestamp, then we do not want to look at tz at all.
We leave it at "0", so that we get a true sentinel:

  Thu Jan 1 00:00:00 1970 +0000

and not:

  Wed Dec 31 19:00:00 1969 -0500

If the timestamp is valid, then we bother to parse the zone, and handle
overflow there.

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/5] log: handle integer overflow in timestamps
  2014-02-24 19:58     ` Jeff King
@ 2014-02-24 20:21       ` Junio C Hamano
  2014-02-24 20:37         ` Jeff King
  0 siblings, 1 reply; 48+ messages in thread
From: Junio C Hamano @ 2014-02-24 20:21 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> On Mon, Feb 24, 2014 at 11:50:00AM -0800, Junio C Hamano wrote:
>
>> > We also recognize overflow in the timezone field, which
>> > could produce nonsensical results. In this case we show the
>> > parsed date, but in UTC.
>> 
>> Both are good measures to fallback to sanity, but why is that
>> if/else?  In other words...
>> 
>> > +	if (date_overflows(date))
>> > +		date = 0;
>> > +	else {
>> > +		if (ident->tz_begin && ident->tz_end)
>> > +			tz = strtol(ident->tz_begin, NULL, 10);
>> > +		if (tz == LONG_MAX || tz == LONG_MIN)
>> > +			tz = 0;
>> > +	}
>> 
>> ... don't we want to fix an input having a bogus timestamp and also
>> a bogus tz recorded in it?
>
> If there is a bogus timestamp, then we do not want to look at tz at all.
> We leave it at "0", so that we get a true sentinel:

Ah, OK, I missed the initialization to 0 at the beginning.

It might have been more clear if "int tz" declaration were left
uninitialized, and the variable were explicitly cleared to 0 in the
"date-overflows" error codepath, but it is not a big deal.

Thanks for clarification.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/5] log: handle integer overflow in timestamps
  2014-02-24 20:21       ` Junio C Hamano
@ 2014-02-24 20:37         ` Jeff King
  2014-02-24 21:01           ` Junio C Hamano
  0 siblings, 1 reply; 48+ messages in thread
From: Jeff King @ 2014-02-24 20:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Mon, Feb 24, 2014 at 12:21:33PM -0800, Junio C Hamano wrote:

> >> > +	if (date_overflows(date))
> >> > +		date = 0;
> >> > +	else {
> >> > +		if (ident->tz_begin && ident->tz_end)
> >> > +			tz = strtol(ident->tz_begin, NULL, 10);
> >> > +		if (tz == LONG_MAX || tz == LONG_MIN)
> >> > +			tz = 0;
> >> > +	}
> >> 
> >> ... don't we want to fix an input having a bogus timestamp and also
> >> a bogus tz recorded in it?
> >
> > If there is a bogus timestamp, then we do not want to look at tz at all.
> > We leave it at "0", so that we get a true sentinel:
> 
> Ah, OK, I missed the initialization to 0 at the beginning.
> 
> It might have been more clear if "int tz" declaration were left
> uninitialized, and the variable were explicitly cleared to 0 in the
> "date-overflows" error codepath, but it is not a big deal.

It might be, but I think it would end up cumbersome. The initialization
was already there from the previous version, which was hitting the else
for "ident->tz_begin". Without fallback initializations, you end up with:

  if (ident->date_begin && ident->date_end) {
          date = strtoul(ident->date_begin, NULL, 10);
          if (date_overflows(date)) {
                  date = 0;
                  tz = 0;
          } else {
                  if (ident->tz_begin && ident->tz_end) {
                          tz = strtol(ident->tz_begin, NULL, 10);
                          if (tz == LONG_MAX || tz == LONG_MIN)
                                  tz = 0;
                  } else
                          tz = 0;
          }
  } else {
          date = 0;
          tz = 0;
  }

which I think is much more confusing (and hard to verify that the
variables are always set). Checking !date as an error condition would
make it a little more readable:

  if (ident->date_begin && ident->date_end) {
          date = strtoul(ident->date_begin, NULL, 10);
          if (date_overflows(date))
                  date = 0;
  } else
          date = 0;

  if (date) {
          if (ident->tz_begin && ident->tz_end) {
                  tz = strtol(ident->tz_begin, NULL, 10);
                  if (tz == LONG_MAX || tz == LONG_MIN)
                          tz = 0;
          } else
                  tz = 0;
  } else
          tz = 0;

but then we treat date==0 as a sentinel, and can never correctly parse
dates on Jan 1, 1970.

So I'd be in favor of keeping it as-is, but feel free to mark it up if
you feel strongly.

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/5] log: handle integer overflow in timestamps
  2014-02-24 20:37         ` Jeff King
@ 2014-02-24 21:01           ` Junio C Hamano
  0 siblings, 0 replies; 48+ messages in thread
From: Junio C Hamano @ 2014-02-24 21:01 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> On Mon, Feb 24, 2014 at 12:21:33PM -0800, Junio C Hamano wrote:
>
>> >> > +	if (date_overflows(date))
>> >> > +		date = 0;
>> >> > +	else {
>> >> > +		if (ident->tz_begin && ident->tz_end)
>> >> > +			tz = strtol(ident->tz_begin, NULL, 10);
>> >> > +		if (tz == LONG_MAX || tz == LONG_MIN)
>> >> > +			tz = 0;
>> >> > +	}
>> >> 
>> >> ... don't we want to fix an input having a bogus timestamp and also
>> >> a bogus tz recorded in it?
>> >
>> > If there is a bogus timestamp, then we do not want to look at tz at all.
>> > We leave it at "0", so that we get a true sentinel:
>> 
>> Ah, OK, I missed the initialization to 0 at the beginning.
>> 
>> It might have been more clear if "int tz" declaration were left
>> uninitialized, and the variable were explicitly cleared to 0 in the
>> "date-overflows" error codepath, but it is not a big deal.
>
> It might be, but I think it would end up cumbersome.
> ...
> So I'd be in favor of keeping it as-is, but feel free to mark it up if
> you feel strongly.

I'd be in favor of keeping it as-is.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/5] log: do not segfault on gmtime errors
  2014-02-24  7:49 ` [PATCH 5/5] log: do not segfault on gmtime errors Jeff King
@ 2014-03-22  9:32   ` René Scharfe
  2014-03-24 21:33     ` Jeff King
  2014-03-26 11:05   ` Charles Bailey
  1 sibling, 1 reply; 48+ messages in thread
From: René Scharfe @ 2014-03-22  9:32 UTC (permalink / raw)
  To: Jeff King, git

Am 24.02.2014 08:49, schrieb Jeff King:
> Many code paths assume that show_date and show_ident_date
> cannot return NULL. For the most part, we handle missing or
> corrupt timestamps by showing the epoch time t=0.
>
> However, we might still return NULL if gmtime rejects the
> time_t we feed it, resulting in a segfault. Let's catch this
> case and just format t=0.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> This test is of questionable portability, since we are depending on
> gmtime's arbitrary point to decide that our input is crazy and return
> NULL. The value is sufficiently large that I'd expect most to do so,
> though, so it may be safe.

Just came around to testing on FreeBSD 10 amd64; the new test in t4212 
fails there:

--- expect      2014-03-22 08:29:44.000000000 +0000
+++ actual      2014-03-22 08:29:44.000000000 +0000
@@ -1 +1 @@
-Thu Jan 1 00:00:00 1970 +0000
+Sun Jan 0 00:00:00 1900 -0700
not ok 9 - absurdly far-in-future dates produce sentinel
#
#               commit=$(munge_author_date HEAD 999999999999999999) &&
#               echo "Thu Jan 1 00:00:00 1970 +0000" >expect &&
#               git log -1 --format=%ad $commit >actual &&
#               test_cmp expect actual
#

# failed 1 among 9 test(s)

Looks like we get a cleared struct tm instead of a NULL pointer.  It 
seems to be a long-standing bug; 
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/145341 was filed in 
April 2010.

> On 32-bit systems, of course, the test does nothing (it is just hitting
> the integer overflow code path). But that's OK, since the output is the
> same for both cases.
>
>   date.c                 | 6 ++++--
>   t/t4212-log-corrupt.sh | 8 ++++++++
>   2 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/date.c b/date.c
> index 90b28f7..e1a2cee 100644
> --- a/date.c
> +++ b/date.c
> @@ -184,8 +184,10 @@ const char *show_date(unsigned long time, int tz, enum date_mode mode)
>   		tz = local_tzoffset(time);
>
>   	tm = time_to_tm(time, tz);
> -	if (!tm)
> -		return NULL;
> +	if (!tm) {

Would it make sense to work around the FreeBSD issue by adding a check 
like this?

	if (!tm || tm->tm_year < 70) {

> +		tm = time_to_tm(0, 0);
> +		tz = 0;
> +	}
>
>   	strbuf_reset(&timebuf);
>   	if (mode == DATE_SHORT)
> diff --git a/t/t4212-log-corrupt.sh b/t/t4212-log-corrupt.sh
> index ba25a2e..3fa1715 100755
> --- a/t/t4212-log-corrupt.sh
> +++ b/t/t4212-log-corrupt.sh
> @@ -81,4 +81,12 @@ test_expect_success 'date parser recognizes time_t overflow' '
>   	test_cmp expect actual
>   '
>
> +# date is within 2^63-1, but enough to choke glibc's gmtime
> +test_expect_success 'absurdly far-in-future dates produce sentinel' '
> +	commit=$(munge_author_date HEAD 999999999999999999) &&
> +	echo "Thu Jan 1 00:00:00 1970 +0000" >expect &&
> +	git log -1 --format=%ad $commit >actual &&
> +	test_cmp expect actual
> +'
> +
>   test_done
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/5] log: do not segfault on gmtime errors
  2014-03-22  9:32   ` René Scharfe
@ 2014-03-24 21:33     ` Jeff King
  2014-03-24 22:03       ` René Scharfe
  0 siblings, 1 reply; 48+ messages in thread
From: Jeff King @ 2014-03-24 21:33 UTC (permalink / raw)
  To: René Scharfe; +Cc: git

On Sat, Mar 22, 2014 at 10:32:37AM +0100, René Scharfe wrote:

> >This test is of questionable portability, since we are depending on
> >gmtime's arbitrary point to decide that our input is crazy and return
> >NULL. The value is sufficiently large that I'd expect most to do so,
> >though, so it may be safe.
> 
> Just came around to testing on FreeBSD 10 amd64; the new test in t4212 fails
> there:

Thanks for the report. I don't think the problem is in the test here,
but rather that we should do a more thorough job of detecting gmtime's
"I don't know what to do with this" response.

> >@@ -184,8 +184,10 @@ const char *show_date(unsigned long time, int tz, enum date_mode mode)
> >  		tz = local_tzoffset(time);
> >
> >  	tm = time_to_tm(time, tz);
> >-	if (!tm)
> >-		return NULL;
> >+	if (!tm) {
> 
> Would it make sense to work around the FreeBSD issue by adding a check like
> this?
> 
> 	if (!tm || tm->tm_year < 70) {

That feels like a bit of a maintenance headache.  Right now we do not
internally represent times prior to the epoch, since we mostly use
"unsigned long" through the code. So anything < 70 has to be an error.
But it would be nice one day to consistently use a 64-bit signed time_t
throughout git, and represent even ancient timestamps (e.g., for people
doing historical projects, like importing laws into git). This would set
quite a trap for people working later on the code.

If the result is all-zeroes, can we check for that case instead? I
suppose that will eventually create a "trap" at midnight on January 1st
of the year 0 (though I am not sure such a date is even meaningful,
given the history of our calendars).

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/5] log: do not segfault on gmtime errors
  2014-03-24 21:33     ` Jeff King
@ 2014-03-24 22:03       ` René Scharfe
  2014-03-24 22:11         ` Jeff King
  0 siblings, 1 reply; 48+ messages in thread
From: René Scharfe @ 2014-03-24 22:03 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Am 24.03.2014 22:33, schrieb Jeff King:
> On Sat, Mar 22, 2014 at 10:32:37AM +0100, René Scharfe wrote:
>>> @@ -184,8 +184,10 @@ const char *show_date(unsigned long time, int tz, enum date_mode mode)
>>>   		tz = local_tzoffset(time);
>>>
>>>   	tm = time_to_tm(time, tz);
>>> -	if (!tm)
>>> -		return NULL;
>>> +	if (!tm) {
>>
>> Would it make sense to work around the FreeBSD issue by adding a check like
>> this?
>>
>> 	if (!tm || tm->tm_year < 70) {
>
> That feels like a bit of a maintenance headache.  Right now we do not
> internally represent times prior to the epoch, since we mostly use
> "unsigned long" through the code. So anything < 70 has to be an error.
> But it would be nice one day to consistently use a 64-bit signed time_t
> throughout git, and represent even ancient timestamps (e.g., for people
> doing historical projects, like importing laws into git). This would set
> quite a trap for people working later on the code.
>
> If the result is all-zeroes, can we check for that case instead? I
> suppose that will eventually create a "trap" at midnight on January 1st
> of the year 0 (though I am not sure such a date is even meaningful,
> given the history of our calendars).

Makes sense. That would be "Sun Jan 0 00:00:00 1900", however -- days 
are 1-based and years 1900-based in struct tm.  Since a zeroth day is 
invalid, would this suffice:

	if (!tm || !tm->tm_mday) {

(Yes, I'm lazy. :)

René

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/5] log: do not segfault on gmtime errors
  2014-03-24 22:03       ` René Scharfe
@ 2014-03-24 22:11         ` Jeff King
  0 siblings, 0 replies; 48+ messages in thread
From: Jeff King @ 2014-03-24 22:11 UTC (permalink / raw)
  To: René Scharfe; +Cc: git

On Mon, Mar 24, 2014 at 11:03:42PM +0100, René Scharfe wrote:

> >If the result is all-zeroes, can we check for that case instead? I
> >suppose that will eventually create a "trap" at midnight on January 1st
> >of the year 0 (though I am not sure such a date is even meaningful,
> >given the history of our calendars).
> 
> Makes sense. That would be "Sun Jan 0 00:00:00 1900", however -- days are
> 1-based and years 1900-based in struct tm.

Oh right, I was stupidly forgetting about being 1900-based.

> Since a zeroth day is invalid, would this suffice:
>
> 	if (!tm || !tm->tm_mday) {
> 
> (Yes, I'm lazy. :)

That looks perfect (and I like that it is quick to check, too, since
this case should be extremely rare).

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/5] log: do not segfault on gmtime errors
  2014-02-24  7:49 ` [PATCH 5/5] log: do not segfault on gmtime errors Jeff King
  2014-03-22  9:32   ` René Scharfe
@ 2014-03-26 11:05   ` Charles Bailey
  2014-03-26 18:21     ` Jeff King
  1 sibling, 1 reply; 48+ messages in thread
From: Charles Bailey @ 2014-03-26 11:05 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Mon, Feb 24, 2014 at 02:49:05AM -0500, Jeff King wrote:
> +# date is within 2^63-1, but enough to choke glibc's gmtime
> +test_expect_success 'absurdly far-in-future dates produce sentinel' '
> +	commit=$(munge_author_date HEAD 999999999999999999) &&
> +	echo "Thu Jan 1 00:00:00 1970 +0000" >expect &&
> +	git log -1 --format=%ad $commit >actual &&
> +	test_cmp expect actual
> +'

Git on AIX seems happy to convert this to Thu Oct 24 18:46:39
162396404 -0700. I don't know if this is correct for the given input
but the test fails.

Charles.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/5] log: do not segfault on gmtime errors
  2014-03-26 11:05   ` Charles Bailey
@ 2014-03-26 18:21     ` Jeff King
  2014-03-26 18:51       ` [PATCH] t4212: handle systems with post-apocalyptic gmtime Jeff King
  2014-03-26 18:58       ` [PATCH 5/5] log: do not segfault on gmtime errors Junio C Hamano
  0 siblings, 2 replies; 48+ messages in thread
From: Jeff King @ 2014-03-26 18:21 UTC (permalink / raw)
  To: Charles Bailey; +Cc: git

On Wed, Mar 26, 2014 at 11:05:59AM +0000, Charles Bailey wrote:

> On Mon, Feb 24, 2014 at 02:49:05AM -0500, Jeff King wrote:
> > +# date is within 2^63-1, but enough to choke glibc's gmtime
> > +test_expect_success 'absurdly far-in-future dates produce sentinel' '
> > +	commit=$(munge_author_date HEAD 999999999999999999) &&
> > +	echo "Thu Jan 1 00:00:00 1970 +0000" >expect &&
> > +	git log -1 --format=%ad $commit >actual &&
> > +	test_cmp expect actual
> > +'
> 
> Git on AIX seems happy to convert this to Thu Oct 24 18:46:39
> 162396404 -0700. I don't know if this is correct for the given input
> but the test fails.

Ick. I am not sure that dates at that range are even meaningful (will
October really exist in the year 162396404? Did they account for all the
leap-seconds between then and now?). But at the same time, I cannot
fault their gmtime for just extrapolating the current rules out as far
as we ask them to.

Unlike the FreeBSD thing that René brought up, this is not a problem in
the code, but just in the test. So I think our options are basically:

  1. Scrap the test as unportable.

  2. Hard-code a few expected values. I'd be unsurprised if some other
     system comes up with a slightly different date in 162396404, so we
     may end up needing several of these.

I think I'd lean towards (2), just because it is testing an actual
feature of the code, and I'd like to continue doing so. And while we may
end up with a handful of values, there's probably not _that_ many
independent implementations of gmtime in the wild.

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH] t4212: handle systems with post-apocalyptic gmtime
  2014-03-26 18:21     ` Jeff King
@ 2014-03-26 18:51       ` Jeff King
  2014-03-26 19:18         ` Junio C Hamano
  2014-03-26 18:58       ` [PATCH 5/5] log: do not segfault on gmtime errors Junio C Hamano
  1 sibling, 1 reply; 48+ messages in thread
From: Jeff King @ 2014-03-26 18:51 UTC (permalink / raw)
  To: Charles Bailey; +Cc: Junio C Hamano, git

One of the tests in t4212 checks our behavior when we feed
gmtime a date so far in the future that it gives up and
returns NULL.

But some gmtime implementations just refuse to quit. They
soldier on, giving us a glimpse of a chilly October evening
some 160 million years in the future (and presumably filled
with our leather-clad horseback-riding ape descendants).

Let's allow the test to match either the sentinel value
(i.e., what we want when gmtime gives up) or any reasonable
value returned by known implementations.

Reported-by: Charles Bailey <cbailey32@bloomberg.net>
Signed-off-by: Jeff King <peff@peff.net>
---
On top of jk/commit-dates-parsing-fix (though the test is already in
v1.9.1).

 t/t4212-log-corrupt.sh | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/t/t4212-log-corrupt.sh b/t/t4212-log-corrupt.sh
index 85c6df4..08f982c 100755
--- a/t/t4212-log-corrupt.sh
+++ b/t/t4212-log-corrupt.sh
@@ -76,12 +76,27 @@ test_expect_success 'date parser recognizes time_t overflow' '
 	test_cmp expect actual
 '
 
-# date is within 2^63-1, but enough to choke glibc's gmtime
-test_expect_success 'absurdly far-in-future dates produce sentinel' '
+cmp_one_of () {
+	for candidate in "$@"; do
+		echo "$candidate" >expect &&
+		test_cmp expect actual &&
+		return 0
+	done
+	return 1
+}
+
+# date is within 2^63-1, but enough to choke glibc's gmtime.
+# We check that either the date broke gmtime (and we return the
+# usual epoch value), or gmtime gave us some sensible value.
+#
+# The sensible values are determined experimentally. The first
+# is from AIX.
+test_expect_success 'absurdly far-in-future dates' '
 	commit=$(munge_author_date HEAD 999999999999999999) &&
-	echo "Thu Jan 1 00:00:00 1970 +0000" >expect &&
 	git log -1 --format=%ad $commit >actual &&
-	test_cmp expect actual
+	cmp_one_of \
+		"Thu Jan 1 00:00:00 1970 +0000" \
+		"Thu Oct 24 18:46:39 162396404 -0700"
 '
 
 test_done
-- 
1.9.1.656.ge8a0637

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/5] log: do not segfault on gmtime errors
  2014-03-26 18:21     ` Jeff King
  2014-03-26 18:51       ` [PATCH] t4212: handle systems with post-apocalyptic gmtime Jeff King
@ 2014-03-26 18:58       ` Junio C Hamano
  2014-03-26 19:01         ` Jeff King
  1 sibling, 1 reply; 48+ messages in thread
From: Junio C Hamano @ 2014-03-26 18:58 UTC (permalink / raw)
  To: Jeff King; +Cc: Charles Bailey, git

Jeff King <peff@peff.net> writes:

> On Wed, Mar 26, 2014 at 11:05:59AM +0000, Charles Bailey wrote:
>
>> On Mon, Feb 24, 2014 at 02:49:05AM -0500, Jeff King wrote:
>> > +# date is within 2^63-1, but enough to choke glibc's gmtime
>> > +test_expect_success 'absurdly far-in-future dates produce sentinel' '
>> > +	commit=$(munge_author_date HEAD 999999999999999999) &&
>> > +	echo "Thu Jan 1 00:00:00 1970 +0000" >expect &&
>> > +	git log -1 --format=%ad $commit >actual &&
>> > +	test_cmp expect actual
>> > +'
>> 
>> Git on AIX seems happy to convert this to Thu Oct 24 18:46:39
>> 162396404 -0700. I don't know if this is correct for the given input
>> but the test fails.
>
> Ick. I am not sure that dates at that range are even meaningful (will
> October really exist in the year 162396404? Did they account for all the
> leap-seconds between then and now?). But at the same time, I cannot
> fault their gmtime for just extrapolating the current rules out as far
> as we ask them to.
>
> Unlike the FreeBSD thing that René brought up, this is not a problem in
> the code, but just in the test. So I think our options are basically:
>
>   1. Scrap the test as unportable.
>
>   2. Hard-code a few expected values. I'd be unsurprised if some other
>      system comes up with a slightly different date in 162396404, so we
>      may end up needing several of these.
>
> I think I'd lean towards (2), just because it is testing an actual
> feature of the code, and I'd like to continue doing so. And while we may
> end up with a handful of values, there's probably not _that_ many
> independent implementations of gmtime in the wild.

Or "3. Just make sure that 'git log' does not segfault"?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/5] log: do not segfault on gmtime errors
  2014-03-26 18:58       ` [PATCH 5/5] log: do not segfault on gmtime errors Junio C Hamano
@ 2014-03-26 19:01         ` Jeff King
  2014-03-26 21:01           ` Junio C Hamano
  0 siblings, 1 reply; 48+ messages in thread
From: Jeff King @ 2014-03-26 19:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Charles Bailey, git

On Wed, Mar 26, 2014 at 11:58:49AM -0700, Junio C Hamano wrote:

> > Unlike the FreeBSD thing that René brought up, this is not a problem in
> > the code, but just in the test. So I think our options are basically:
> >
> >   1. Scrap the test as unportable.
> >
> >   2. Hard-code a few expected values. I'd be unsurprised if some other
> >      system comes up with a slightly different date in 162396404, so we
> >      may end up needing several of these.
> >
> > I think I'd lean towards (2), just because it is testing an actual
> > feature of the code, and I'd like to continue doing so. And while we may
> > end up with a handful of values, there's probably not _that_ many
> > independent implementations of gmtime in the wild.
> 
> Or "3. Just make sure that 'git log' does not segfault"?

That would not test the FreeBSD case, which does not segfault, but
returns a bogus sentinel value.

I don't know how important that is. This is such a minor feature that it
is not worth a lot of maintenance headache in the test. But I also do
not know if this is going to be the last report, or we will have a bunch
of other systems that need their own special values put into the test.

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] t4212: handle systems with post-apocalyptic gmtime
  2014-03-26 18:51       ` [PATCH] t4212: handle systems with post-apocalyptic gmtime Jeff King
@ 2014-03-26 19:18         ` Junio C Hamano
  2014-03-26 19:25           ` Jeff King
  0 siblings, 1 reply; 48+ messages in thread
From: Junio C Hamano @ 2014-03-26 19:18 UTC (permalink / raw)
  To: Jeff King; +Cc: Charles Bailey, git

Jeff King <peff@peff.net> writes:

> +cmp_one_of () {
> +	for candidate in "$@"; do

Style ;-)

> +		echo "$candidate" >expect &&
> +		test_cmp expect actual &&
> +		return 0
> +	done
> +	return 1
> +}

It actually may be easier to understand if you write a trivial case
statement at the sole calling site of this helper function, though.

In any case, would it really be essential to make sure that the
output shows a phony (or a seemingly real) datestamp for this test?

The primary thing you wanted to achieve by the "gmtime gave us NULL,
let's substitute it with an arbitrary value to avoid dereferencing
the NULL" change was *not* that we see that same arbitrary value
comes out of the system, but that we do not die by attempting to
reference the NULL, I think.  Not dying is the primary thing we want
to (and we already do) test, no?

> +# date is within 2^63-1, but enough to choke glibc's gmtime.
> +# We check that either the date broke gmtime (and we return the
> +# usual epoch value), or gmtime gave us some sensible value.
> +#
> +# The sensible values are determined experimentally. The first
> +# is from AIX.
> +test_expect_success 'absurdly far-in-future dates' '
>  	commit=$(munge_author_date HEAD 999999999999999999) &&
> -	echo "Thu Jan 1 00:00:00 1970 +0000" >expect &&
>  	git log -1 --format=%ad $commit >actual &&
> -	test_cmp expect actual
> +	cmp_one_of \
> +		"Thu Jan 1 00:00:00 1970 +0000" \
> +		"Thu Oct 24 18:46:39 162396404 -0700"
>  '
>  
>  test_done

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] t4212: handle systems with post-apocalyptic gmtime
  2014-03-26 19:18         ` Junio C Hamano
@ 2014-03-26 19:25           ` Jeff King
  2014-03-26 19:33             ` Jeff King
  0 siblings, 1 reply; 48+ messages in thread
From: Jeff King @ 2014-03-26 19:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Charles Bailey, git

On Wed, Mar 26, 2014 at 12:18:25PM -0700, Junio C Hamano wrote:

> > +		echo "$candidate" >expect &&
> > +		test_cmp expect actual &&
> > +		return 0
> > +	done
> > +	return 1
> > +}
> 
> It actually may be easier to understand if you write a trivial case
> statement at the sole calling site of this helper function, though.

Yeah, perhaps. I wanted to build on test_cmp because it has nice output
for the failure case, but it is probably simple enough to do:

  output=$(cat actual)
  case "$output" in
  ...
  *) echo >&2 "unrecognized date: $output"

> In any case, would it really be essential to make sure that the
> output shows a phony (or a seemingly real) datestamp for this test?
> 
> The primary thing you wanted to achieve by the "gmtime gave us NULL,
> let's substitute it with an arbitrary value to avoid dereferencing
> the NULL" change was *not* that we see that same arbitrary value
> comes out of the system, but that we do not die by attempting to
> reference the NULL, I think.  Not dying is the primary thing we want
> to (and we already do) test, no?

I think there are really two separate behaviors we are testing here (and
in the surrounding tests):

  1. Don't segfault if gmtime returns NULL.

  2. Whenever we cannot process a date (either because gmtime fails, or
     because we fail before even getting the value to gmtime),
     consistently return the sentinel date (so the reader can easily
     know it's bogus).

Having the test be particular about its output helped us find a case
where FreeBSD did not trigger (1), but did trigger (2), by returning a
blanked "struct tm".

I'm open to the argument that (2) is not worth worrying about that much
if it is a hassle to test. But I don't think it is that much hassle
(yet, anyway).

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] t4212: handle systems with post-apocalyptic gmtime
  2014-03-26 19:25           ` Jeff King
@ 2014-03-26 19:33             ` Jeff King
  2014-03-26 19:40               ` Jeff King
  2014-03-26 21:22               ` Charles Bailey
  0 siblings, 2 replies; 48+ messages in thread
From: Jeff King @ 2014-03-26 19:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Charles Bailey, git

On Wed, Mar 26, 2014 at 03:25:36PM -0400, Jeff King wrote:

> > The primary thing you wanted to achieve by the "gmtime gave us NULL,
> > let's substitute it with an arbitrary value to avoid dereferencing
> > the NULL" change was *not* that we see that same arbitrary value
> > comes out of the system, but that we do not die by attempting to
> > reference the NULL, I think.  Not dying is the primary thing we want
> > to (and we already do) test, no?
> 
> I think there are really two separate behaviors we are testing here (and
> in the surrounding tests):
> 
>   1. Don't segfault if gmtime returns NULL.
> 
>   2. Whenever we cannot process a date (either because gmtime fails, or
>      because we fail before even getting the value to gmtime),
>      consistently return the sentinel date (so the reader can easily
>      know it's bogus).
> 
> Having the test be particular about its output helped us find a case
> where FreeBSD did not trigger (1), but did trigger (2), by returning a
> blanked "struct tm".
> 
> I'm open to the argument that (2) is not worth worrying about that much
> if it is a hassle to test. But I don't think it is that much hassle
> (yet, anyway).

That being said, is the AIX value actually right? I did not look closely
at first, but just assumed that it was vaguely right. But:

  999999999999999999 / (86400 * 365)

is something like 31 billion years in the future, not 160 million.
A real date calculation will have a few tweaks (leap years, etc), but
that is orders of magnitude off.

So I am not sure that AIX is not actually just giving us utter crap. In
that case, the test is not wrong; it's pickiness is actually finding a
real problem. But I am not sure it is a problem worth solving. I do not
want to get into heuristics deciding whether a particular platform's
gmtime output is crap or not. That pushes this into the realm of "it's
not worth testing", and we should stick to just testing that we did not
segfault.

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] t4212: handle systems with post-apocalyptic gmtime
  2014-03-26 19:33             ` Jeff King
@ 2014-03-26 19:40               ` Jeff King
  2014-03-26 20:36                 ` Charles Bailey
  2014-03-26 21:22               ` Charles Bailey
  1 sibling, 1 reply; 48+ messages in thread
From: Jeff King @ 2014-03-26 19:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Charles Bailey, git

On Wed, Mar 26, 2014 at 03:33:59PM -0400, Jeff King wrote:

> That being said, is the AIX value actually right? I did not look closely
> at first, but just assumed that it was vaguely right. But:
> 
>   999999999999999999 / (86400 * 365)
> 
> is something like 31 billion years in the future, not 160 million.
> A real date calculation will have a few tweaks (leap years, etc), but
> that is orders of magnitude off.

Assuming my math is right, then here is the most sensible patch, IMHO.

-- >8 --
Subject: t4212: loosen far-in-future test for AIX

One of the tests in t4212 checks our behavior when we feed
gmtime a date so far in the future that it gives up and
returns NULL. Some implementations, like AIX, may actually
just provide us a bogus result instead.

It's not worth it for us to come up with heuristics that
guess whether the return value is sensible or not. On good
platforms where gmtime reports the problem to us with NULL,
we will print the epoch value. On bad platforms, we will
print garbage.  But our test should be written for the
lowest common denominator so that it passes everywhere.

Reported-by: Charles Bailey <cbailey32@bloomberg.net>
Signed-off-by: Jeff King <peff@peff.net>
---
 t/t4212-log-corrupt.sh | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/t/t4212-log-corrupt.sh b/t/t4212-log-corrupt.sh
index 85c6df4..bb843ab 100755
--- a/t/t4212-log-corrupt.sh
+++ b/t/t4212-log-corrupt.sh
@@ -77,11 +77,14 @@ test_expect_success 'date parser recognizes time_t overflow' '
 '
 
 # date is within 2^63-1, but enough to choke glibc's gmtime
-test_expect_success 'absurdly far-in-future dates produce sentinel' '
+#
+# Ideally we would check the output to make sure we replaced it with
+# a useful sentinel value, but some platforms will actually hand us back
+# a nonsensical date. It is not worth our time to try to evaluate these
+# dates, so just make sure we didn't segfault or otherwise abort.
+test_expect_success 'absurdly far-in-future dates' '
 	commit=$(munge_author_date HEAD 999999999999999999) &&
-	echo "Thu Jan 1 00:00:00 1970 +0000" >expect &&
-	git log -1 --format=%ad $commit >actual &&
-	test_cmp expect actual
+	git log -1 --format=%ad $commit
 '
 
 test_done
-- 
1.9.1.656.ge8a0637

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH] t4212: handle systems with post-apocalyptic gmtime
  2014-03-26 19:40               ` Jeff King
@ 2014-03-26 20:36                 ` Charles Bailey
  2014-03-26 20:38                   ` Jeff King
  0 siblings, 1 reply; 48+ messages in thread
From: Charles Bailey @ 2014-03-26 20:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

On Wed, Mar 26, 2014 at 03:40:43PM -0400, Jeff King wrote:
> On Wed, Mar 26, 2014 at 03:33:59PM -0400, Jeff King wrote:
> 
> > That being said, is the AIX value actually right? I did not look closely
> > at first, but just assumed that it was vaguely right. But:
> > 
> >   999999999999999999 / (86400 * 365)
> > 
> > is something like 31 billion years in the future, not 160 million.
> > A real date calculation will have a few tweaks (leap years, etc), but
> > that is orders of magnitude off.
> 
> Assuming my math is right, then here is the most sensible patch, IMHO.
> 

Perhaps hold onto this one for a little while longer. Splitting things
out from the test is giving me some inconsistent results, there may be
something else going wrong in our environment here.

Charles.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] t4212: handle systems with post-apocalyptic gmtime
  2014-03-26 20:36                 ` Charles Bailey
@ 2014-03-26 20:38                   ` Jeff King
  2014-03-26 20:41                     ` Charles Bailey
  0 siblings, 1 reply; 48+ messages in thread
From: Jeff King @ 2014-03-26 20:38 UTC (permalink / raw)
  To: Charles Bailey; +Cc: Junio C Hamano, git

On Wed, Mar 26, 2014 at 08:36:18PM +0000, Charles Bailey wrote:

> On Wed, Mar 26, 2014 at 03:40:43PM -0400, Jeff King wrote:
> > On Wed, Mar 26, 2014 at 03:33:59PM -0400, Jeff King wrote:
> > 
> > > That being said, is the AIX value actually right? I did not look closely
> > > at first, but just assumed that it was vaguely right. But:
> > > 
> > >   999999999999999999 / (86400 * 365)
> > > 
> > > is something like 31 billion years in the future, not 160 million.
> > > A real date calculation will have a few tweaks (leap years, etc), but
> > > that is orders of magnitude off.
> > 
> > Assuming my math is right, then here is the most sensible patch, IMHO.
> > 
> 
> Perhaps hold onto this one for a little while longer. Splitting things
> out from the test is giving me some inconsistent results, there may be
> something else going wrong in our environment here.

By the way, can you confirm that this is a 64-bit system? On a 32-bit
system, we should be triggering different code paths (we fail at the
strtoul level). Those should be checked by the previous tests, but I'd
like to make sure.

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] t4212: handle systems with post-apocalyptic gmtime
  2014-03-26 20:38                   ` Jeff King
@ 2014-03-26 20:41                     ` Charles Bailey
  0 siblings, 0 replies; 48+ messages in thread
From: Charles Bailey @ 2014-03-26 20:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

On Wed, Mar 26, 2014 at 04:38:30PM -0400, Jeff King wrote:
> 
> By the way, can you confirm that this is a 64-bit system? On a 32-bit
> system, we should be triggering different code paths (we fail at the
> strtoul level). Those should be checked by the previous tests, but I'd
> like to make sure.

Yes, we're only building 64-bit at the moment.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/5] log: do not segfault on gmtime errors
  2014-03-26 19:01         ` Jeff King
@ 2014-03-26 21:01           ` Junio C Hamano
  2014-03-26 21:09             ` Jeff King
  0 siblings, 1 reply; 48+ messages in thread
From: Junio C Hamano @ 2014-03-26 21:01 UTC (permalink / raw)
  To: Jeff King; +Cc: Charles Bailey, git

Jeff King <peff@peff.net> writes:

> On Wed, Mar 26, 2014 at 11:58:49AM -0700, Junio C Hamano wrote:
>
>> > Unlike the FreeBSD thing that René brought up, this is not a problem in
>> > the code, but just in the test. So I think our options are basically:
>> >
>> >   1. Scrap the test as unportable.
>> >
>> >   2. Hard-code a few expected values. I'd be unsurprised if some other
>> >      system comes up with a slightly different date in 162396404, so we
>> >      may end up needing several of these.
>> >
>> > I think I'd lean towards (2), just because it is testing an actual
>> > feature of the code, and I'd like to continue doing so. And while we may
>> > end up with a handful of values, there's probably not _that_ many
>> > independent implementations of gmtime in the wild.
>> 
>> Or "3. Just make sure that 'git log' does not segfault"?
>
> That would not test the FreeBSD case, which does not segfault, but
> returns a bogus sentinel value.
>
> I don't know how important that is. This is such a minor feature that it
> is not worth a lot of maintenance headache in the test. But I also do
> not know if this is going to be the last report, or we will have a bunch
> of other systems that need their own special values put into the test.

I didn't quite realize that your objective of the change was to
signal the failure of gmtime for all implementations, i.e. both (1)
the ones that signal an error by giving NULL back to us and (2) the
ones that fail to signal an error but leave bogus result (FreeBSD,
but it appears AIX might be also giving us another bogus value), by
using a single sane sentinel value.  If we can do that consistently
on every platform, that would indeed be great.

But if that is the case, we should not have to maintain special case
values in the test code, I would think.  The test instead should
expect the output to have that single sentinel value, and if the
workaround code fails to detect a breakage in the platform gmtime(),
the special case logic to catch these platform-specific breakages
should go that "timestamp that cannot be grokked by gmtime---replace
it with a sentinel" logic, no?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/5] log: do not segfault on gmtime errors
  2014-03-26 21:01           ` Junio C Hamano
@ 2014-03-26 21:09             ` Jeff King
  0 siblings, 0 replies; 48+ messages in thread
From: Jeff King @ 2014-03-26 21:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Charles Bailey, git

On Wed, Mar 26, 2014 at 02:01:21PM -0700, Junio C Hamano wrote:

> > I don't know how important that is. This is such a minor feature that it
> > is not worth a lot of maintenance headache in the test. But I also do
> > not know if this is going to be the last report, or we will have a bunch
> > of other systems that need their own special values put into the test.
> 
> I didn't quite realize that your objective of the change was to
> signal the failure of gmtime for all implementations,

I didn't realize it at the time I wrote the test, either. It was my goal
to recognize such failures, but I didn't now at the time that there were
failures that would be signaled by anything besides a NULL return.

> the ones that signal an error by giving NULL back to us and (2) the
> ones that fail to signal an error but leave bogus result (FreeBSD,
> but it appears AIX might be also giving us another bogus value), by
> using a single sane sentinel value.  If we can do that consistently
> on every platform, that would indeed be great.

Agreed. I am not sure we can do so. For FreeBSD, I think it is not hard.
I am not sure what the pattern is for AIX. I am hoping Charles will
reveal some pattern, but there may not be one.

> But if that is the case, we should not have to maintain special case
> values in the test code, I would think.

Right. All of my suggestions to accommodate special values in the test
were assuming that the system gmtime was smarter than we are. That it
produced a good value for this crazy time and _didn't_ fail.

But after having done the basic math, I don't think that is what is
going on here.

> The test instead should expect the output to have that single sentinel
> value, and if the workaround code fails to detect a breakage in the
> platform gmtime(), the special case logic to catch these
> platform-specific breakages should go that "timestamp that cannot be
> grokked by gmtime---replace it with a sentinel" logic, no?

Yes, exactly. That is the preferable solution, if we can come up with
such logic. Personally I am not optimistic.

The fallback is to accept that we cannot cover all cases, and just
loosen the test (i.e., the second patch I posted today).

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] t4212: handle systems with post-apocalyptic gmtime
  2014-03-26 19:33             ` Jeff King
  2014-03-26 19:40               ` Jeff King
@ 2014-03-26 21:22               ` Charles Bailey
  2014-03-26 21:57                 ` Jeff King
  1 sibling, 1 reply; 48+ messages in thread
From: Charles Bailey @ 2014-03-26 21:22 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

On Wed, Mar 26, 2014 at 03:33:59PM -0400, Jeff King wrote:
> 
> That being said, is the AIX value actually right? I did not look closely
> at first, but just assumed that it was vaguely right. But:
> 
>   999999999999999999 / (86400 * 365)
> 
> is something like 31 billion years in the future, not 160 million.
> A real date calculation will have a few tweaks (leap years, etc), but
> that is orders of magnitude off.

Well, this is embarrassing, while moving this through the corporate
firewall (aka typing on one machine while looking at another), I
munged the date. It still doesn't seem right but at least you can now
see the actual data.

I stopped the test with --immediate and found the dangling commit that
the test created and dumped it with the previous version of Git (well
a 1.8.5.5 build)

  ibm: trash directory.t4212-log-corrupt $ git log -1 --pretty=raw 1fc17e734e4487c31bdfe05bb3d15618b69c4dca
  commit 1fc17e734e4487c31bdfe05bb3d15618b69c4dca
  tree 64fd3796c57084e7b8cbae358ce37970b8e954f6
  author A U Thor <author@example.com> 999999999999999999 -0700
  committer C O Mitter <committer@example.com> 1112911993 -0700

      foo

  ibm: trash directory.t4212-log-corrupt $ git log -1 1fc17e734e4487c31bdfe05bb3d15618b69c4dca
  commit 1fc17e734e4487c31bdfe05bb3d15618b69c4dca
  Author: A U Thor <author@example.com>
  Date:   Thu Oct 24 18:46:39 1623969404 -0700

      foo

Same commit but dumped from a linux machine:

  linux: trash directory.t4212-log-corrupt $ git log -1 1fc17e734e4487c31bdfe05bb3d15618b69c4dca
  commit 1fc17e734e4487c31bdfe05bb3d15618b69c4dca
  Author: A U Thor <author@example.com>
  Date:   (null)

      foo

Charles.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] t4212: handle systems with post-apocalyptic gmtime
  2014-03-26 21:22               ` Charles Bailey
@ 2014-03-26 21:57                 ` Jeff King
  2014-03-26 22:46                   ` Charles Bailey
  0 siblings, 1 reply; 48+ messages in thread
From: Jeff King @ 2014-03-26 21:57 UTC (permalink / raw)
  To: Charles Bailey; +Cc: Junio C Hamano, git

On Wed, Mar 26, 2014 at 09:22:27PM +0000, Charles Bailey wrote:

> On Wed, Mar 26, 2014 at 03:33:59PM -0400, Jeff King wrote:
> > 
> > That being said, is the AIX value actually right? I did not look closely
> > at first, but just assumed that it was vaguely right. But:
> > 
> >   999999999999999999 / (86400 * 365)
> > 
> > is something like 31 billion years in the future, not 160 million.
> > A real date calculation will have a few tweaks (leap years, etc), but
> > that is orders of magnitude off.
> 
> Well, this is embarrassing, while moving this through the corporate
> firewall (aka typing on one machine while looking at another), I
> munged the date. It still doesn't seem right but at least you can now
> see the actual data.

Hmm, so the year you got is actually: 1623969404. That still seems off
to me by a factor 20. I don't know if this is really worth digging into
that much further, but I wonder what you would get for timestamps of:

  99999999999999999
  9999999999999999
  999999999999999
  etc.

Do we start generating weird values at some particular size? Or is AIX
gmtime really more clever than I am, and is accounting for wobble of the
Earth or something over the next billion years?

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] t4212: handle systems with post-apocalyptic gmtime
  2014-03-26 21:57                 ` Jeff King
@ 2014-03-26 22:46                   ` Charles Bailey
  2014-03-27 22:48                     ` Jeff King
  0 siblings, 1 reply; 48+ messages in thread
From: Charles Bailey @ 2014-03-26 22:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

On Wed, Mar 26, 2014 at 05:57:41PM -0400, Jeff King wrote:
> Hmm, so the year you got is actually: 1623969404. That still seems off
> to me by a factor 20. I don't know if this is really worth digging into
> that much further, but I wonder what you would get for timestamps of:
> 
>   99999999999999999
>   9999999999999999
>   999999999999999
>   etc.
> 

AIX goes negative at about the same time Linux and Solaris segfault:

9999999 Sun Apr 26 10:46:39 1970 -0700
99999999 Sat Mar 3 02:46:39 1973 -0700
999999999 Sat Sep 8 18:46:39 2001 -0700
9999999999 Sat Nov 20 10:46:39 2286 -0700
99999999999 Wed Nov 16 02:46:39 5138 -0700
999999999999 Thu Sep 26 18:46:39 33658 -0700
9999999999999 Sun May 20 10:46:39 318857 -0700
99999999999999 Sat Nov 7 02:46:39 3170843 -0700
999999999999999 Sat Jul 4 18:46:39 31690708 -0700
9999999999999999 Sat Jan 25 10:46:39 316889355 -0700
99999999999999999 Wed Sep 6 02:46:39 -1126091476 -0700
999999999999999999 Thu Oct 24 18:46:39 1623969404 -0700

So, very bogus values.

Charles.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] t4212: handle systems with post-apocalyptic gmtime
  2014-03-26 22:46                   ` Charles Bailey
@ 2014-03-27 22:48                     ` Jeff King
  2014-03-28 16:41                       ` Junio C Hamano
  0 siblings, 1 reply; 48+ messages in thread
From: Jeff King @ 2014-03-27 22:48 UTC (permalink / raw)
  To: Charles Bailey; +Cc: Junio C Hamano, git

On Wed, Mar 26, 2014 at 10:46:16PM +0000, Charles Bailey wrote:

> On Wed, Mar 26, 2014 at 05:57:41PM -0400, Jeff King wrote:
> > Hmm, so the year you got is actually: 1623969404. That still seems off
> > to me by a factor 20. I don't know if this is really worth digging into
> > that much further, but I wonder what you would get for timestamps of:
> > 
> >   99999999999999999
> >   9999999999999999
> >   999999999999999
> >   etc.
> > 
> 
> AIX goes negative at about the same time Linux and Solaris segfault:
> 
> 9999999 Sun Apr 26 10:46:39 1970 -0700
> 99999999 Sat Mar 3 02:46:39 1973 -0700
> 999999999 Sat Sep 8 18:46:39 2001 -0700
> 9999999999 Sat Nov 20 10:46:39 2286 -0700
> 99999999999 Wed Nov 16 02:46:39 5138 -0700
> 999999999999 Thu Sep 26 18:46:39 33658 -0700
> 9999999999999 Sun May 20 10:46:39 318857 -0700
> 99999999999999 Sat Nov 7 02:46:39 3170843 -0700
> 999999999999999 Sat Jul 4 18:46:39 31690708 -0700
> 9999999999999999 Sat Jan 25 10:46:39 316889355 -0700
> 99999999999999999 Wed Sep 6 02:46:39 -1126091476 -0700
> 999999999999999999 Thu Oct 24 18:46:39 1623969404 -0700

Thanks. Given the value where it fails, it kind of looks like there is
some signed 32-bit value at work (~300 million years is OK, but 10 times
that, rather than yielding ~3 billion, gets us -1 billion). Perhaps
tm.tm_year is 32-bit.

So what do we want to do? I think the options are:

  1. Try to guess when we have a bogus timestamp value with an arbitrary
     cutoff like "greater than 1 million years from now" (and enforce it
     via time_t seconds, and avoid gmtime entirely). That is made-up and
     arbitrary, but it also is sufficiently far that it won't ever
     matter, and sufficiently close that any gmtime should behave
     sensibly with it.

  2. Accept that we can't guess at every broken gmtime's output, and
     just loosen the test to make sure we don't segfault.

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] t4212: handle systems with post-apocalyptic gmtime
  2014-03-27 22:48                     ` Jeff King
@ 2014-03-28 16:41                       ` Junio C Hamano
  2014-03-28 18:47                         ` Jeff King
  0 siblings, 1 reply; 48+ messages in thread
From: Junio C Hamano @ 2014-03-28 16:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Charles Bailey, git

Jeff King <peff@peff.net> writes:

>> 9999999999999999 Sat Jan 25 10:46:39 316889355 -0700
>> 99999999999999999 Wed Sep 6 02:46:39 -1126091476 -0700
>> 999999999999999999 Thu Oct 24 18:46:39 1623969404 -0700
>
> Thanks. Given the value where it fails, it kind of looks like there is
> some signed 32-bit value at work (~300 million years is OK, but 10 times
> that, rather than yielding ~3 billion, gets us -1 billion). Perhaps
> tm.tm_year is 32-bit.

That is what it looks like to me, too.  It makes me wonder if some
other platforms may have similar breakage using 16-bit signed
tm_year and how such a breakage would look like, though ;-)

> So what do we want to do? I think the options are:
>
>   1. Try to guess when we have a bogus timestamp value with an arbitrary
>      cutoff like "greater than 1 million years from now" (and enforce it
>      via time_t seconds, and avoid gmtime entirely). That is made-up and
>      arbitrary, but it also is sufficiently far that it won't ever
>      matter, and sufficiently close that any gmtime should behave
>      sensibly with it.

I think that two assumptions here are that

 (1) we would be able to find a single insane value (like 3 billion
     years from now expressed in time_t seconds) the test script
     would be able to feed and expect it to fail on all platforms we
     care about, even though how exactly that value causes various
     platforms fail may be different (glibc may give us a NULL from
     gmtime, FreeBSD may leave their own sentinel in gmtime we can
     recognize, and some others may simply wrap around the years);
     and that

 (2) the only other class of failure mode we haven't considered
     bevore Charles's report is tm_year wrapping around 32-bit
     signed int.

Offhand, the three possible failure modes this thread identified
sounds to me like the only plausible ones, and I think the best way
forward might be to

 - teach the "is the result sane, even though we may have got a
   non-NULL from gmtime?  otherwise let's signal a failure by
   replacing it with a known sentinel value" codepath the new
   failure mode Charles's report suggests---if we feed a positive
   timestamp and gmtime gave us back a tm_year+1900 < 0, that is
   certainly an overflow; and

 - Use that 3-billion years timestamp from Charles's report in the
   test and make sure we convert it to the known sentinel value.

perhaps?

>   2. Accept that we can't guess at every broken gmtime's output, and
>      just loosen the test to make sure we don't segfault.

Of course that is a simpler option, but we may have identified all
plausible failure modes, in which case we can afford to go with your
original plan to validate that we not just avoid segfaulting on one
of the three failure modes from gmtime, but also cover the other two
failure modes and signal any of them with a sentinel.  That way may
allow us to identify the fourth failure mode we haven't anticipated.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] t4212: handle systems with post-apocalyptic gmtime
  2014-03-28 16:41                       ` Junio C Hamano
@ 2014-03-28 18:47                         ` Jeff King
  2014-03-28 19:02                           ` Junio C Hamano
  0 siblings, 1 reply; 48+ messages in thread
From: Jeff King @ 2014-03-28 18:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Charles Bailey, git

On Fri, Mar 28, 2014 at 09:41:53AM -0700, Junio C Hamano wrote:

> Offhand, the three possible failure modes this thread identified
> sounds to me like the only plausible ones, and I think the best way
> forward might be to
> 
>  - teach the "is the result sane, even though we may have got a
>    non-NULL from gmtime?  otherwise let's signal a failure by
>    replacing it with a known sentinel value" codepath the new
>    failure mode Charles's report suggests---if we feed a positive
>    timestamp and gmtime gave us back a tm_year+1900 < 0, that is
>    certainly an overflow; and

I don't think we can analyze the output from gmtime. If it wraps the
year at N, then won't N+2014 look like a valid value?

If we are going to do something trustworthy I think it has to be before
we hand off to gmtime. Like:

diff --git a/date.c b/date.c
index e1a2cee..e0c43c4 100644
--- a/date.c
+++ b/date.c
@@ -57,6 +57,8 @@ static time_t gm_time_t(unsigned long time, int tz)
 static struct tm *time_to_tm(unsigned long time, int tz)
 {
 	time_t t = gm_time_t(time, tz);
+	if (t > 9999999999999999)
+		return NULL;
 	return gmtime(&t);
 }

I suspect that would handle the FreeBSD case, as well.

By the way, I have a suspicion that the gm_time_t above can overflow if
you specially craft a value at the edge of what time_t can handle (we
check that our value will not overflow time_t earlier, but now we might
be adding up to 86400 seconds to it). <sigh>

-Peff

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH] t4212: handle systems with post-apocalyptic gmtime
  2014-03-28 18:47                         ` Jeff King
@ 2014-03-28 19:02                           ` Junio C Hamano
  2014-03-28 19:05                             ` Jeff King
  0 siblings, 1 reply; 48+ messages in thread
From: Junio C Hamano @ 2014-03-28 19:02 UTC (permalink / raw)
  To: Jeff King; +Cc: Charles Bailey, git

Jeff King <peff@peff.net> writes:

> On Fri, Mar 28, 2014 at 09:41:53AM -0700, Junio C Hamano wrote:
>
>> Offhand, the three possible failure modes this thread identified
>> sounds to me like the only plausible ones, and I think the best way
>> forward might be to
>> 
>>  - teach the "is the result sane, even though we may have got a
>>    non-NULL from gmtime?  otherwise let's signal a failure by
>>    replacing it with a known sentinel value" codepath the new
>>    failure mode Charles's report suggests---if we feed a positive
>>    timestamp and gmtime gave us back a tm_year+1900 < 0, that is
>>    certainly an overflow; and
>
> I don't think we can analyze the output from gmtime. If it wraps the
> year at N, then won't N+2014 look like a valid value?

Yes, but I was hoping that there are small number of possible N's
;-)

> If we are going to do something trustworthy I think it has to be before
> we hand off to gmtime. Like:
>
> diff --git a/date.c b/date.c
> index e1a2cee..e0c43c4 100644
> --- a/date.c
> +++ b/date.c
> @@ -57,6 +57,8 @@ static time_t gm_time_t(unsigned long time, int tz)
>  static struct tm *time_to_tm(unsigned long time, int tz)
>  {
>  	time_t t = gm_time_t(time, tz);
> +	if (t > 9999999999999999)
> +		return NULL;
>  	return gmtime(&t);
>  }
>
> I suspect that would handle the FreeBSD case, as well.
>
> By the way, I have a suspicion that the gm_time_t above can overflow if
> you specially craft a value at the edge of what time_t can handle (we
> check that our value will not overflow time_t earlier, but now we might
> be adding up to 86400 seconds to it). <sigh>

Yuck.  Let's not go there.

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] t4212: handle systems with post-apocalyptic gmtime
  2014-03-28 19:02                           ` Junio C Hamano
@ 2014-03-28 19:05                             ` Jeff King
  2014-03-28 19:30                               ` Junio C Hamano
  0 siblings, 1 reply; 48+ messages in thread
From: Jeff King @ 2014-03-28 19:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Charles Bailey, git

On Fri, Mar 28, 2014 at 12:02:46PM -0700, Junio C Hamano wrote:

> >>  - teach the "is the result sane, even though we may have got a
> >>    non-NULL from gmtime?  otherwise let's signal a failure by
> >>    replacing it with a known sentinel value" codepath the new
> >>    failure mode Charles's report suggests---if we feed a positive
> >>    timestamp and gmtime gave us back a tm_year+1900 < 0, that is
> >>    certainly an overflow; and
> >
> > I don't think we can analyze the output from gmtime. If it wraps the
> > year at N, then won't N+2014 look like a valid value?
> 
> Yes, but I was hoping that there are small number of possible N's
> ;-)

I'm not sure I understand. Even if we know N, we've lost information
during the truncation done by time_t (we cannot distingiuish true M from
N+M).

> > diff --git a/date.c b/date.c
> > index e1a2cee..e0c43c4 100644
> > --- a/date.c
> > +++ b/date.c
> > @@ -57,6 +57,8 @@ static time_t gm_time_t(unsigned long time, int tz)
> >  static struct tm *time_to_tm(unsigned long time, int tz)
> >  {
> >  	time_t t = gm_time_t(time, tz);
> > +	if (t > 9999999999999999)
> > +		return NULL;
> >  	return gmtime(&t);
> >  }
> >
> > I suspect that would handle the FreeBSD case, as well.
> >
> > By the way, I have a suspicion that the gm_time_t above can overflow if
> > you specially craft a value at the edge of what time_t can handle (we
> > check that our value will not overflow time_t earlier, but now we might
> > be adding up to 86400 seconds to it). <sigh>
> 
> Yuck.  Let's not go there.

Do you mean "let's not worry about the absurdly specific overflow case",
or "let's not do this gross time_to_tm hack?"

This (non-)issue has consumed a lot more brain power than it is probably
worth. I'd like to figure out which patch to go with and be done. :)

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] t4212: handle systems with post-apocalyptic gmtime
  2014-03-28 19:05                             ` Jeff King
@ 2014-03-28 19:30                               ` Junio C Hamano
  2014-04-01  7:38                                 ` Jeff King
  0 siblings, 1 reply; 48+ messages in thread
From: Junio C Hamano @ 2014-03-28 19:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Charles Bailey, git

Jeff King <peff@peff.net> writes:

> This (non-)issue has consumed a lot more brain power than it is probably
> worth. I'd like to figure out which patch to go with and be done. :)

Let's just deal with a simple known cases (like FreeBSD) in the real
code that everybody exercises at runtime, and have the new test only
check we do not segfault on a value we used to segfault upon seeing.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] t4212: handle systems with post-apocalyptic gmtime
  2014-03-28 19:30                               ` Junio C Hamano
@ 2014-04-01  7:38                                 ` Jeff King
  2014-04-01  7:42                                   ` [PATCH 1/2] date: recognize bogus FreeBSD gmtime output Jeff King
                                                     ` (3 more replies)
  0 siblings, 4 replies; 48+ messages in thread
From: Jeff King @ 2014-04-01  7:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, Charles Bailey, git

On Fri, Mar 28, 2014 at 12:30:02PM -0700, Junio C Hamano wrote:

> Let's just deal with a simple known cases (like FreeBSD) in the real
> code that everybody exercises at runtime, and have the new test only
> check we do not segfault on a value we used to segfault upon seeing.

OK. Here it is, with the other option as an "alt" patch for reference.

  [1/2]: date: recognize bogus FreeBSD gmtime output
  [2/2]: t4212: loosen far-in-future test for AIX
  [2alt/2]: work around unreliable gmtime errors on AIX

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 1/2] date: recognize bogus FreeBSD gmtime output
  2014-04-01  7:38                                 ` Jeff King
@ 2014-04-01  7:42                                   ` Jeff King
  2014-04-01 17:42                                     ` René Scharfe
  2014-04-01  7:43                                   ` [PATCH 2/2] t4212: loosen far-in-future test for AIX Jeff King
                                                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 48+ messages in thread
From: Jeff King @ 2014-04-01  7:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, Charles Bailey, git

Most gmtime implementations return a NULL value when they
encounter an error (and this behavior is specified by ANSI C
and POSIX).  FreeBSD's implementation, however, will
indicate an error by returning a pointer to a "struct tm"
with all fields set to zero. Let's also recognize this and
convert it to a NULL (with this patch, t4212 should pass on
FreeBSD).

Reported-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
---
There are actually a few callers to gmtime and gmtime_r, so I pushed
this fix up into a compat wrapper rather than in time_to_tm to get them
all. It's possible that localtime() would want to receive the same
treatment, too.  It's not strictly necessary to make the wrapper
conditional, but it was easy to do so. We could also just run this code
all the time.

I don't have a FreeBSD VM handy to test this, so confirmation that it
passes the test would be nice.

 Makefile          |  8 ++++++++
 compat/gmtime.c   | 26 ++++++++++++++++++++++++++
 config.mak.uname  |  1 +
 git-compat-util.h |  7 +++++++
 4 files changed, 42 insertions(+)
 create mode 100644 compat/gmtime.c

diff --git a/Makefile b/Makefile
index 3646391..2f3758c 100644
--- a/Makefile
+++ b/Makefile
@@ -338,6 +338,9 @@ all::
 # Define TEST_GIT_INDEX_VERSION to 2, 3 or 4 to run the test suite
 # with a different indexfile format version.  If it isn't set the index
 # file format used is index-v[23].
+#
+# Define GMTIME_UNRELIABLE_ERRORS if your gmtime() function does not
+# return NULL when it receives a bogus time_t.
 
 GIT-VERSION-FILE: FORCE
 	@$(SHELL_PATH) ./GIT-VERSION-GEN
@@ -1489,6 +1492,11 @@ ifneq (,$(XDL_FAST_HASH))
 	BASIC_CFLAGS += -DXDL_FAST_HASH
 endif
 
+ifdef GMTIME_UNRELIABLE_ERRORS
+	COMPAT_OBJS += compat/gmtime.o
+	BASIC_CFLAGS += -DGMTIME_UNRELIABLE_ERRORS
+endif
+
 ifeq ($(TCLTK_PATH),)
 NO_TCLTK = NoThanks
 endif
diff --git a/compat/gmtime.c b/compat/gmtime.c
new file mode 100644
index 0000000..ffcabf4
--- /dev/null
+++ b/compat/gmtime.c
@@ -0,0 +1,26 @@
+#include "../git-compat-util.h"
+#undef gmtime
+#undef gmtime_r
+
+struct tm *git_gmtime(const time_t *timep)
+{
+	static struct tm result;
+	return git_gmtime_r(timep, &result);
+}
+
+struct tm *git_gmtime_r(const time_t *timep, struct tm *result)
+{
+	struct tm *ret;
+
+	ret = gmtime_r(timep, result);
+
+	/*
+	 * Rather than NULL, FreeBSD gmtime will return a "struct tm" with all
+	 * fields zeroed. Since "mday" cannot otherwise be zero, we can test
+	 * this very quickly.
+	 */
+	if (ret && !ret->tm_mday)
+		ret = NULL;
+
+	return ret;
+}
diff --git a/config.mak.uname b/config.mak.uname
index 6069a44..0e22ac0 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -187,6 +187,7 @@ ifeq ($(uname_S),FreeBSD)
 	endif
 	PYTHON_PATH = /usr/local/bin/python
 	HAVE_PATHS_H = YesPlease
+	GMTIME_UNRELIABLE_ERRORS = UnfortunatelyYes
 endif
 ifeq ($(uname_S),OpenBSD)
 	NO_STRCASESTR = YesPlease
diff --git a/git-compat-util.h b/git-compat-util.h
index 892032b..5191866 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -716,4 +716,11 @@ void warn_on_inaccessible(const char *path);
 /* Get the passwd entry for the UID of the current process. */
 struct passwd *xgetpwuid_self(void);
 
+#ifdef GMTIME_UNRELIABLE_ERRORS
+struct tm *git_gmtime(const time_t *);
+struct tm *git_gmtime_r(const time_t *, struct tm *);
+#define gmtime git_gmtime
+#define gmtime_r git_gmtime_r
+#endif
+
 #endif
-- 
1.9.1.656.ge8a0637

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 2/2] t4212: loosen far-in-future test for AIX
  2014-04-01  7:38                                 ` Jeff King
  2014-04-01  7:42                                   ` [PATCH 1/2] date: recognize bogus FreeBSD gmtime output Jeff King
@ 2014-04-01  7:43                                   ` Jeff King
  2014-04-01  7:45                                   ` [PATCH 2alt/2] work around unreliable gmtime errors on AIX Jeff King
  2014-04-01 19:07                                   ` [PATCH] t4212: handle systems with post-apocalyptic gmtime Junio C Hamano
  3 siblings, 0 replies; 48+ messages in thread
From: Jeff King @ 2014-04-01  7:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, Charles Bailey, git

One of the tests in t4212 checks our behavior when we feed
gmtime a date so far in the future that it gives up and
returns NULL. Some implementations, like AIX, may actually
just provide us a bogus result instead.

It's not worth it for us to come up with heuristics that
guess whether the return value is sensible or not. On good
platforms where gmtime reports the problem to us with NULL,
we will print the epoch value. On bad platforms, we will
print garbage.  But our test should be written for the
lowest common denominator so that it passes everywhere.

Reported-by: Charles Bailey <cbailey32@bloomberg.net>
Signed-off-by: Jeff King <peff@peff.net>
---
 t/t4212-log-corrupt.sh | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/t/t4212-log-corrupt.sh b/t/t4212-log-corrupt.sh
index 3fa1715..58b792b 100755
--- a/t/t4212-log-corrupt.sh
+++ b/t/t4212-log-corrupt.sh
@@ -82,11 +82,9 @@ test_expect_success 'date parser recognizes time_t overflow' '
 '
 
 # date is within 2^63-1, but enough to choke glibc's gmtime
-test_expect_success 'absurdly far-in-future dates produce sentinel' '
+test_expect_success 'absurdly far-in-future date' '
 	commit=$(munge_author_date HEAD 999999999999999999) &&
-	echo "Thu Jan 1 00:00:00 1970 +0000" >expect &&
-	git log -1 --format=%ad $commit >actual &&
-	test_cmp expect actual
+	git log -1 --format=%ad $commit
 '
 
 test_done
-- 
1.9.1.656.ge8a0637

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 2alt/2] work around unreliable gmtime errors on AIX
  2014-04-01  7:38                                 ` Jeff King
  2014-04-01  7:42                                   ` [PATCH 1/2] date: recognize bogus FreeBSD gmtime output Jeff King
  2014-04-01  7:43                                   ` [PATCH 2/2] t4212: loosen far-in-future test for AIX Jeff King
@ 2014-04-01  7:45                                   ` Jeff King
  2014-04-01 19:07                                   ` [PATCH] t4212: handle systems with post-apocalyptic gmtime Junio C Hamano
  3 siblings, 0 replies; 48+ messages in thread
From: Jeff King @ 2014-04-01  7:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, Charles Bailey, git

AIX's gmtime will happily overflow the tm_year field. Let's
catch this error before handing the value to gmtime.

Signed-off-by: Jeff King <peff@peff.net>
---
This is an alternative to loosening the test in t4212.

It's really not _that_ ugly.  The "LL" here may not be portable, though.
32-bit systems can't represent this timestamp at all (so they're safe),
but I don't know what would be the best way to conditionally compile
here.

 compat/gmtime.c  | 10 ++++++++++
 config.mak.uname |  1 +
 2 files changed, 11 insertions(+)

diff --git a/compat/gmtime.c b/compat/gmtime.c
index 75a5835..f95ba50 100644
--- a/compat/gmtime.c
+++ b/compat/gmtime.c
@@ -12,6 +12,16 @@ struct tm *git_gmtime_r(const time_t *timep, struct tm *result)
 {
 	struct tm *ret;
 
+	/*
+	 * Some systems, like AIX, will happily overflow the tm_year field.
+	 * So let's recognize obviously out-of-bound data before it hits gmtime
+	 * and just mark it as an error. This date is ~316 million years in the
+	 * future, which is far enough that nobody should care, but close
+	 * enough for the year to fit into a 32-bit tm_year.
+	 */
+	if (*timep > 9999999999999999LL)
+		return NULL;
+
 	ret = gmtime_r(timep, result);
 
 	/*
diff --git a/config.mak.uname b/config.mak.uname
index 0e22ac0..c1110ad 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -236,6 +236,7 @@ ifeq ($(uname_S),AIX)
 		INLINE = ''
 	endif
 	GIT_TEST_CMP = cmp
+	GMTIME_UNRELIABLE_ERRORS = UnfortunatelyYes
 endif
 ifeq ($(uname_S),GNU)
 	# GNU/Hurd
-- 
1.9.1.656.ge8a0637

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] date: recognize bogus FreeBSD gmtime output
  2014-04-01  7:42                                   ` [PATCH 1/2] date: recognize bogus FreeBSD gmtime output Jeff King
@ 2014-04-01 17:42                                     ` René Scharfe
  2014-04-01 19:08                                       ` Junio C Hamano
  0 siblings, 1 reply; 48+ messages in thread
From: René Scharfe @ 2014-04-01 17:42 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano; +Cc: Charles Bailey, git

Am 01.04.2014 09:42, schrieb Jeff King:
> diff --git a/compat/gmtime.c b/compat/gmtime.c
> new file mode 100644
> index 0000000..ffcabf4
> --- /dev/null
> +++ b/compat/gmtime.c
> @@ -0,0 +1,26 @@
> +#include "../git-compat-util.h"
> +#undef gmtime
> +#undef gmtime_r
> +
> +struct tm *git_gmtime(const time_t *timep)
> +{
> +	static struct tm result;
> +	return git_gmtime_r(timep, &result);
> +}
> +
> +struct tm *git_gmtime_r(const time_t *timep, struct tm *result)
> +{
> +	struct tm *ret;
> +
> +	ret = gmtime_r(timep, result);
> +
> +	/*
> +	 * Rather than NULL, FreeBSD gmtime will return a "struct tm" with all
> +	 * fields zeroed. Since "mday" cannot otherwise be zero, we can test
> +	 * this very quickly.
> +	 */
> +	if (ret && !ret->tm_mday)
> +		ret = NULL;
> +
> +	return ret;
> +}

http://pubs.opengroup.org/onlinepubs/009695399/functions/gmtime.html 
says that errno shall be set on error and only mentions EOVERFLOW as a 
possible error code.

René

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] t4212: handle systems with post-apocalyptic gmtime
  2014-04-01  7:38                                 ` Jeff King
                                                     ` (2 preceding siblings ...)
  2014-04-01  7:45                                   ` [PATCH 2alt/2] work around unreliable gmtime errors on AIX Jeff King
@ 2014-04-01 19:07                                   ` Junio C Hamano
  2014-04-01 19:46                                     ` Jeff King
  3 siblings, 1 reply; 48+ messages in thread
From: Junio C Hamano @ 2014-04-01 19:07 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, Charles Bailey, git

Jeff King <peff@peff.net> writes:

> On Fri, Mar 28, 2014 at 12:30:02PM -0700, Junio C Hamano wrote:
>
>> Let's just deal with a simple known cases (like FreeBSD) in the real
>> code that everybody exercises at runtime, and have the new test only
>> check we do not segfault on a value we used to segfault upon seeing.
>
> OK. Here it is, with the other option as an "alt" patch for reference.
>
>   [1/2]: date: recognize bogus FreeBSD gmtime output
>   [2/2]: t4212: loosen far-in-future test for AIX
>   [2alt/2]: work around unreliable gmtime errors on AIX
>
> -Peff

Thanks.  2alt does not look too bad, but on the other hand, we are
replacing a value that can produce the right result on correctly
implemented gmtime with a completely bogus value only because we
know there exists one broken implementation---which does not sound a
very good trade-off, given that we would get a result that does not
correspond to the input anyway with or without the change on the
broken implementation.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] date: recognize bogus FreeBSD gmtime output
  2014-04-01 17:42                                     ` René Scharfe
@ 2014-04-01 19:08                                       ` Junio C Hamano
  2014-04-01 21:17                                         ` René Scharfe
  0 siblings, 1 reply; 48+ messages in thread
From: Junio C Hamano @ 2014-04-01 19:08 UTC (permalink / raw)
  To: René Scharfe; +Cc: Jeff King, Charles Bailey, git

René Scharfe <l.s.r@web.de> writes:

> Am 01.04.2014 09:42, schrieb Jeff King:
>> diff --git a/compat/gmtime.c b/compat/gmtime.c
>> new file mode 100644
>> index 0000000..ffcabf4
>> --- /dev/null
>> +++ b/compat/gmtime.c
>> @@ -0,0 +1,26 @@
>> +#include "../git-compat-util.h"
>> +#undef gmtime
>> +#undef gmtime_r
>> +
>> +struct tm *git_gmtime(const time_t *timep)
>> +{
>> +	static struct tm result;
>> +	return git_gmtime_r(timep, &result);
>> +}
>> +
>> +struct tm *git_gmtime_r(const time_t *timep, struct tm *result)
>> +{
>> +	struct tm *ret;
>> +
>> +	ret = gmtime_r(timep, result);
>> +
>> +	/*
>> +	 * Rather than NULL, FreeBSD gmtime will return a "struct tm" with all
>> +	 * fields zeroed. Since "mday" cannot otherwise be zero, we can test
>> +	 * this very quickly.
>> +	 */
>> +	if (ret && !ret->tm_mday)
>> +		ret = NULL;
>> +
>> +	return ret;
>> +}
>
> http://pubs.opengroup.org/onlinepubs/009695399/functions/gmtime.html
> says that errno shall be set on error and only mentions EOVERFLOW as a
> possible error code.

So are you saying we should set EOVERFLOW ourselves, or does FreeBSD
set EOVERFLOW for us in this case and we do not have to worry?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] t4212: handle systems with post-apocalyptic gmtime
  2014-04-01 19:07                                   ` [PATCH] t4212: handle systems with post-apocalyptic gmtime Junio C Hamano
@ 2014-04-01 19:46                                     ` Jeff King
  0 siblings, 0 replies; 48+ messages in thread
From: Jeff King @ 2014-04-01 19:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, Charles Bailey, git

On Tue, Apr 01, 2014 at 12:07:22PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > On Fri, Mar 28, 2014 at 12:30:02PM -0700, Junio C Hamano wrote:
> >
> >> Let's just deal with a simple known cases (like FreeBSD) in the real
> >> code that everybody exercises at runtime, and have the new test only
> >> check we do not segfault on a value we used to segfault upon seeing.
> >
> > OK. Here it is, with the other option as an "alt" patch for reference.
> >
> >   [1/2]: date: recognize bogus FreeBSD gmtime output
> >   [2/2]: t4212: loosen far-in-future test for AIX
> >   [2alt/2]: work around unreliable gmtime errors on AIX
> >
> > -Peff
> 
> Thanks.  2alt does not look too bad, but on the other hand, we are
> replacing a value that can produce the right result on correctly
> implemented gmtime with a completely bogus value only because we
> know there exists one broken implementation---which does not sound a
> very good trade-off, given that we would get a result that does not
> correspond to the input anyway with or without the change on the
> broken implementation.

One other option is to push _all_ callers through a git_gmtime()
wrapper, and then use Makefile knobs to turn on specific quirks, like:

    struct tm *git_gmtime_r(const time_t *timep, struct tm *result)
    {
    #ifdef GMTIME_OVERFLOWS_32BIT
              if (*timep > 999999999999999)
                        return NULL;
    #endif

              ret = gmtime_r(timep, result);

    #ifdef GMTIME_ERROR_BLANK
              if (ret && !ret->tm_mday)
                        return NULL;
    #endif

              return ret;
    }

and then each platform can turn the knobs as appropriate.

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] date: recognize bogus FreeBSD gmtime output
  2014-04-01 19:08                                       ` Junio C Hamano
@ 2014-04-01 21:17                                         ` René Scharfe
  2014-04-01 21:28                                           ` Jeff King
  0 siblings, 1 reply; 48+ messages in thread
From: René Scharfe @ 2014-04-01 21:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Charles Bailey, git

Am 01.04.2014 21:08, schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> Am 01.04.2014 09:42, schrieb Jeff King:
>>> diff --git a/compat/gmtime.c b/compat/gmtime.c
>>> new file mode 100644
>>> index 0000000..ffcabf4
>>> --- /dev/null
>>> +++ b/compat/gmtime.c
>>> @@ -0,0 +1,26 @@
>>> +#include "../git-compat-util.h"
>>> +#undef gmtime
>>> +#undef gmtime_r
>>> +
>>> +struct tm *git_gmtime(const time_t *timep)
>>> +{
>>> +	static struct tm result;
>>> +	return git_gmtime_r(timep, &result);
>>> +}
>>> +
>>> +struct tm *git_gmtime_r(const time_t *timep, struct tm *result)
>>> +{
>>> +	struct tm *ret;
>>> +
>>> +	ret = gmtime_r(timep, result);
>>> +
>>> +	/*
>>> +	 * Rather than NULL, FreeBSD gmtime will return a "struct tm" with all
>>> +	 * fields zeroed. Since "mday" cannot otherwise be zero, we can test
>>> +	 * this very quickly.
>>> +	 */
>>> +	if (ret && !ret->tm_mday)
>>> +		ret = NULL;
>>> +
>>> +	return ret;
>>> +}
>>
>> http://pubs.opengroup.org/onlinepubs/009695399/functions/gmtime.html
>> says that errno shall be set on error and only mentions EOVERFLOW as a
>> possible error code.
>
> So are you saying we should set EOVERFLOW ourselves, or does FreeBSD
> set EOVERFLOW for us in this case and we do not have to worry?

If we correct the return value then we should correct errno as well. 
gmtime() on FreeBSD 10 leaves errno untouched when it encounters an 
overflow.

While testing this again I just noticed that FreeBSD doesn't strictly 
return a pointer to a cleared struct tm.  It simply leaves its static 
buffer untouched.  We should probably clear it (memset in git_gmtime_r).

Demo program:

-- >8 --
#include <stdio.h>
#include <time.h>

const static time_t epoch;

int main(int argc, char **argv)
{
         time_t t = 99999999999999999;

         printf("%s", ctime(&t));
         printf("%s", asctime(gmtime(&t)));

         printf("%s", ctime(&t));
         gmtime(&epoch);
         printf("%s", asctime(gmtime(&t)));

         return 0;
}
-- 8< --

Results on FreeBSD 10:

	Wed Sep  6 11:46:39     -1126091476
	Wed Sep  6 11:46:39     -1126091476
	Wed Sep  6 11:46:39     -1126091476
	Thu Jan  1 00:00:00 1970

René

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] date: recognize bogus FreeBSD gmtime output
  2014-04-01 21:17                                         ` René Scharfe
@ 2014-04-01 21:28                                           ` Jeff King
  0 siblings, 0 replies; 48+ messages in thread
From: Jeff King @ 2014-04-01 21:28 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, Charles Bailey, git

On Tue, Apr 01, 2014 at 11:17:14PM +0200, René Scharfe wrote:

> >So are you saying we should set EOVERFLOW ourselves, or does FreeBSD
> >set EOVERFLOW for us in this case and we do not have to worry?
> 
> If we correct the return value then we should correct errno as well.
> gmtime() on FreeBSD 10 leaves errno untouched when it encounters an
> overflow.
> 
> While testing this again I just noticed that FreeBSD doesn't strictly return
> a pointer to a cleared struct tm.  It simply leaves its static buffer
> untouched.  We should probably clear it (memset in git_gmtime_r).

Thanks, that all makes sense (we do not ever care about gmtime's errno
in our code, but it does not hurt to be thorough to avoid surprising any
new callers). Here's a replacement for patch 1.

-- >8 --
Subject: date: recognize bogus FreeBSD gmtime output

Most gmtime implementations return a NULL value when they
encounter an error (and this behavior is specified by ANSI C
and POSIX).  FreeBSD's implementation, however, will simply
leave the "struct tm" untouched.  Let's also recognize this
and convert it to a NULL (with this patch, t4212 should pass
on FreeBSD).

Reported-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
---
 Makefile          |  8 ++++++++
 compat/gmtime.c   | 29 +++++++++++++++++++++++++++++
 config.mak.uname  |  1 +
 git-compat-util.h |  7 +++++++
 4 files changed, 45 insertions(+)
 create mode 100644 compat/gmtime.c

diff --git a/Makefile b/Makefile
index 3646391..2f3758c 100644
--- a/Makefile
+++ b/Makefile
@@ -338,6 +338,9 @@ all::
 # Define TEST_GIT_INDEX_VERSION to 2, 3 or 4 to run the test suite
 # with a different indexfile format version.  If it isn't set the index
 # file format used is index-v[23].
+#
+# Define GMTIME_UNRELIABLE_ERRORS if your gmtime() function does not
+# return NULL when it receives a bogus time_t.
 
 GIT-VERSION-FILE: FORCE
 	@$(SHELL_PATH) ./GIT-VERSION-GEN
@@ -1489,6 +1492,11 @@ ifneq (,$(XDL_FAST_HASH))
 	BASIC_CFLAGS += -DXDL_FAST_HASH
 endif
 
+ifdef GMTIME_UNRELIABLE_ERRORS
+	COMPAT_OBJS += compat/gmtime.o
+	BASIC_CFLAGS += -DGMTIME_UNRELIABLE_ERRORS
+endif
+
 ifeq ($(TCLTK_PATH),)
 NO_TCLTK = NoThanks
 endif
diff --git a/compat/gmtime.c b/compat/gmtime.c
new file mode 100644
index 0000000..e8362dd
--- /dev/null
+++ b/compat/gmtime.c
@@ -0,0 +1,29 @@
+#include "../git-compat-util.h"
+#undef gmtime
+#undef gmtime_r
+
+struct tm *git_gmtime(const time_t *timep)
+{
+	static struct tm result;
+	return git_gmtime_r(timep, &result);
+}
+
+struct tm *git_gmtime_r(const time_t *timep, struct tm *result)
+{
+	struct tm *ret;
+
+	memset(result, 0, sizeof(*result));
+	ret = gmtime_r(timep, result);
+
+	/*
+	 * Rather than NULL, FreeBSD gmtime simply leaves the "struct tm"
+	 * untouched when it encounters overflow. Since "mday" cannot otherwise
+	 * be zero, we can test this very quickly.
+	 */
+	if (ret && !ret->tm_mday) {
+		ret = NULL;
+		errno = EOVERFLOW;
+	}
+
+	return ret;
+}
diff --git a/config.mak.uname b/config.mak.uname
index 6069a44..0e22ac0 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -187,6 +187,7 @@ ifeq ($(uname_S),FreeBSD)
 	endif
 	PYTHON_PATH = /usr/local/bin/python
 	HAVE_PATHS_H = YesPlease
+	GMTIME_UNRELIABLE_ERRORS = UnfortunatelyYes
 endif
 ifeq ($(uname_S),OpenBSD)
 	NO_STRCASESTR = YesPlease
diff --git a/git-compat-util.h b/git-compat-util.h
index 892032b..5191866 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -716,4 +716,11 @@ void warn_on_inaccessible(const char *path);
 /* Get the passwd entry for the UID of the current process. */
 struct passwd *xgetpwuid_self(void);
 
+#ifdef GMTIME_UNRELIABLE_ERRORS
+struct tm *git_gmtime(const time_t *);
+struct tm *git_gmtime_r(const time_t *, struct tm *);
+#define gmtime git_gmtime
+#define gmtime_r git_gmtime_r
+#endif
+
 #endif
-- 
1.9.1.656.ge8a0637

^ permalink raw reply related	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2014-04-01 21:28 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-24  7:33 [PATCH 0/5] handle bogus commit dates Jeff King
2014-02-24  7:36 ` [PATCH 1/5] t4212: test bogus timestamps with git-log Jeff King
2014-02-24  7:39 ` [PATCH 2/5] fsck: report integer overflow in author timestamps Jeff King
2014-02-24  7:39 ` [PATCH 3/5] date: check date overflow against time_t Jeff King
2014-02-24  7:46 ` [PATCH 4/5] log: handle integer overflow in timestamps Jeff King
2014-02-24 19:50   ` Junio C Hamano
2014-02-24 19:58     ` Jeff King
2014-02-24 20:21       ` Junio C Hamano
2014-02-24 20:37         ` Jeff King
2014-02-24 21:01           ` Junio C Hamano
2014-02-24  7:49 ` [PATCH 5/5] log: do not segfault on gmtime errors Jeff King
2014-03-22  9:32   ` René Scharfe
2014-03-24 21:33     ` Jeff King
2014-03-24 22:03       ` René Scharfe
2014-03-24 22:11         ` Jeff King
2014-03-26 11:05   ` Charles Bailey
2014-03-26 18:21     ` Jeff King
2014-03-26 18:51       ` [PATCH] t4212: handle systems with post-apocalyptic gmtime Jeff King
2014-03-26 19:18         ` Junio C Hamano
2014-03-26 19:25           ` Jeff King
2014-03-26 19:33             ` Jeff King
2014-03-26 19:40               ` Jeff King
2014-03-26 20:36                 ` Charles Bailey
2014-03-26 20:38                   ` Jeff King
2014-03-26 20:41                     ` Charles Bailey
2014-03-26 21:22               ` Charles Bailey
2014-03-26 21:57                 ` Jeff King
2014-03-26 22:46                   ` Charles Bailey
2014-03-27 22:48                     ` Jeff King
2014-03-28 16:41                       ` Junio C Hamano
2014-03-28 18:47                         ` Jeff King
2014-03-28 19:02                           ` Junio C Hamano
2014-03-28 19:05                             ` Jeff King
2014-03-28 19:30                               ` Junio C Hamano
2014-04-01  7:38                                 ` Jeff King
2014-04-01  7:42                                   ` [PATCH 1/2] date: recognize bogus FreeBSD gmtime output Jeff King
2014-04-01 17:42                                     ` René Scharfe
2014-04-01 19:08                                       ` Junio C Hamano
2014-04-01 21:17                                         ` René Scharfe
2014-04-01 21:28                                           ` Jeff King
2014-04-01  7:43                                   ` [PATCH 2/2] t4212: loosen far-in-future test for AIX Jeff King
2014-04-01  7:45                                   ` [PATCH 2alt/2] work around unreliable gmtime errors on AIX Jeff King
2014-04-01 19:07                                   ` [PATCH] t4212: handle systems with post-apocalyptic gmtime Junio C Hamano
2014-04-01 19:46                                     ` Jeff King
2014-03-26 18:58       ` [PATCH 5/5] log: do not segfault on gmtime errors Junio C Hamano
2014-03-26 19:01         ` Jeff King
2014-03-26 21:01           ` Junio C Hamano
2014-03-26 21:09             ` Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.