All of lore.kernel.org
 help / color / mirror / Atom feed
* PATCH: Improved support for ISO 8601 timezones
@ 2010-05-17 19:07 Marcus Comstedt
  2010-05-17 19:07 ` [PATCH 1/2] Added "Z" as an alias for the timezone "UTC" Marcus Comstedt
  2010-05-17 19:07 ` [PATCH 2/2] Accept the timezone specifiers [+-]hh:mm and [+-]hh in addition to [+-]hhmm Marcus Comstedt
  0 siblings, 2 replies; 6+ messages in thread
From: Marcus Comstedt @ 2010-05-17 19:07 UTC (permalink / raw)
  To: git

Hi.

I discovered that git's date parser does not understand "Z" to mean
the "UTC" timezone.  This is unfortunate, because the use of "Z" is
prescribed by ISO 8601.

I made a small patch to add "Z" as an alias for "UTC", which enables
standard ISO 8601 timestamps to be parsed correctly.  Also, it fixes
a bug that at least three characters of the timezone name had to match,
which is of course impossible when the name of the timezone is shorter
than three characters.  There was already such a timezone before ("NT")
which could not be selected due to the bug.

The second patch, which is perhaps less essential, adds support for
the remaining numerical timezone indicators defined by ISO 8601 not
already supported by git (only +-hhmm was supported, but ISO 8601
also specifies that +-hh:mm and +-hh are ok as well).

Thanks


  // Marcus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] Added "Z" as an alias for the timezone "UTC"
  2010-05-17 19:07 PATCH: Improved support for ISO 8601 timezones Marcus Comstedt
@ 2010-05-17 19:07 ` Marcus Comstedt
  2010-05-17 20:32   ` Jay Soffian
  2010-05-17 19:07 ` [PATCH 2/2] Accept the timezone specifiers [+-]hh:mm and [+-]hh in addition to [+-]hhmm Marcus Comstedt
  1 sibling, 1 reply; 6+ messages in thread
From: Marcus Comstedt @ 2010-05-17 19:07 UTC (permalink / raw)
  To: git; +Cc: Marcus Comstedt

The name "Z" for the UTC timezone is required to properly parse
ISO 8601 times.  Added it to the list of recignozed timezones.

Also, fixed the bug that timezone names shorter than 3 characters
can never be matched by match_alpha().  Prior to the introduction
of the "Z" zone, this affected the timezone "NT" (Nome).

Signed-off-by: Marcus Comstedt <marcus@mc.pp.se>
---
:100644 100644 002aa3c... 6bae49c... M	date.c
 date.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/date.c b/date.c
index 002aa3c..6bae49c 100644
--- a/date.c
+++ b/date.c
@@ -229,6 +229,7 @@ static const struct {
 
 	{ "GMT",    0, 0, },	/* Greenwich Mean */
 	{ "UTC",    0, 0, },	/* Universal (Coordinated) */
+	{ "Z",      0, 0, },    /* Zulu, alias for UTC */
 
 	{ "WET",    0, 0, },	/* Western European */
 	{ "BST",    0, 1, },	/* British Summer */
@@ -305,7 +306,7 @@ static int match_alpha(const char *date, struct tm *tm, int *offset)
 
 	for (i = 0; i < ARRAY_SIZE(timezone_names); i++) {
 		int match = match_string(date, timezone_names[i].name);
-		if (match >= 3) {
+		if (match >= 3 || match == strlen(timezone_names[i].name)) {
 			int off = timezone_names[i].offset;
 
 			/* This is bogus, but we like summer */
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] Accept the timezone specifiers [+-]hh:mm and [+-]hh in addition to [+-]hhmm
  2010-05-17 19:07 PATCH: Improved support for ISO 8601 timezones Marcus Comstedt
  2010-05-17 19:07 ` [PATCH 1/2] Added "Z" as an alias for the timezone "UTC" Marcus Comstedt
@ 2010-05-17 19:07 ` Marcus Comstedt
  2010-05-19 14:31   ` Junio C Hamano
  1 sibling, 1 reply; 6+ messages in thread
From: Marcus Comstedt @ 2010-05-17 19:07 UTC (permalink / raw)
  To: git; +Cc: Marcus Comstedt

ISO 8601 specifies three syntaxes for timezones other than "Z".
git already supports the +-hhmm syntax.  This patch adds support
for the other two: +-hh:mm and +-hh.

Signed-off-by: Marcus Comstedt <marcus@mc.pp.se>
---
:100644 100644 6bae49c... f83e46e... M	date.c
 date.c |   23 +++++++++++++++++++++++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/date.c b/date.c
index 6bae49c..f83e46e 100644
--- a/date.c
+++ b/date.c
@@ -555,6 +555,18 @@ static int match_tz(const char *date, int *offp)
 	int min, hour;
 	int n = end - date - 1;
 
+	/* Check for HH:MM format, allowed by ISO 8601 */
+	if (n == 2 && date[3] == ':') {
+		char *end2;
+		min = strtoul(date+4, &end2, 10);
+		/* If we have two digits after the colon too, assume HH:MM */
+		if (end2 == date+6) {
+			offset = offset*100 + min;
+			end = end2;
+			n = end - date - 1;
+		}
+	}
+
 	min = offset % 100;
 	hour = offset / 100;
 
@@ -570,6 +582,17 @@ static int match_tz(const char *date, int *offp)
 
 		*offp = offset;
 	}
+	/*
+	 * Also accept just the hour, allowed by ISO 8601
+	 */
+	else if (n == 2 && hour == 0 && min < 24) {
+		offset = min*60;
+		if (*date == '-')
+			offset = -offset;
+
+		*offp = offset;
+	}
+
 	return end - date;
 }
 
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] Added "Z" as an alias for the timezone "UTC"
  2010-05-17 19:07 ` [PATCH 1/2] Added "Z" as an alias for the timezone "UTC" Marcus Comstedt
@ 2010-05-17 20:32   ` Jay Soffian
  0 siblings, 0 replies; 6+ messages in thread
From: Jay Soffian @ 2010-05-17 20:32 UTC (permalink / raw)
  To: Marcus Comstedt; +Cc: git

On Mon, May 17, 2010 at 3:07 PM, Marcus Comstedt <marcus@mc.pp.se> wrote:
> The name "Z" for the UTC timezone is required to properly parse
> ISO 8601 times.  Added it to the list of recignozed timezones.

s/Added/Add/; s/recignozed/recognized/

> Also, fixed the bug that timezone names shorter than 3 characters

s/fixed the/fix a/

j.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] Accept the timezone specifiers [+-]hh:mm and [+-]hh in addition to [+-]hhmm
  2010-05-17 19:07 ` [PATCH 2/2] Accept the timezone specifiers [+-]hh:mm and [+-]hh in addition to [+-]hhmm Marcus Comstedt
@ 2010-05-19 14:31   ` Junio C Hamano
  2010-05-19 17:21     ` Marcus Comstedt
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2010-05-19 14:31 UTC (permalink / raw)
  To: Marcus Comstedt; +Cc: git

Marcus Comstedt <marcus@mc.pp.se> writes:

> ISO 8601 specifies three syntaxes for timezones other than "Z".
> git already supports the +-hhmm syntax.  This patch adds support
> for the other two: +-hh:mm and +-hh.
>
> Signed-off-by: Marcus Comstedt <marcus@mc.pp.se>
> ---
> :100644 100644 6bae49c... f83e46e... M	date.c
>  date.c |   23 +++++++++++++++++++++++
>  1 files changed, 23 insertions(+), 0 deletions(-)
>
> diff --git a/date.c b/date.c
> index 6bae49c..f83e46e 100644
> --- a/date.c
> +++ b/date.c
> @@ -555,6 +555,18 @@ static int match_tz(const char *date, int *offp)
>  	int min, hour;
>  	int n = end - date - 1;
>  
> +	/* Check for HH:MM format, allowed by ISO 8601 */
> +	if (n == 2 && date[3] == ':') {
> +		char *end2;
> +		min = strtoul(date+4, &end2, 10);
> +		/* If we have two digits after the colon too, assume HH:MM */
> +		if (end2 == date+6) {
> +			offset = offset*100 + min;
> +			end = end2;
> +			n = end - date - 1;
> +		}
> +	}
> +
>  	min = offset % 100;
>  	hour = offset / 100;
>  
> @@ -570,6 +582,17 @@ static int match_tz(const char *date, int *offp)
>  
>  		*offp = offset;
>  	}
> +	/*
> +	 * Also accept just the hour, allowed by ISO 8601
> +	 */
> +	else if (n == 2 && hour == 0 && min < 24) {
> +		offset = min*60;
> +		if (*date == '-')
> +			offset = -offset;
> +
> +		*offp = offset;
> +	}
> +

I don't recall seeing in ISO 8601 that +hh or -hh without minute
resolution was allowed, but I don't have my copy of ISO 8601 with me (they
are packed and are still in transit with my household goods) so I'll take
your word for it for now [*1*].

But the placement of this second hunk is somewhat curious.  Why doesn't the
updated function look like this?

        int offset = strtoul(date + 1, &end, 10);
        int min, hour;
        int n = end - date - 1;

        if (n == 2 && offset <= 14) {
                /* +HH:MM (ISO 8601) or +HH (ISO 8601 abbreviated) */
                hour = offset;
                if (n == 2 && date[3] == ':') {
                        min = strtoul(date + 4, &end, 10);
                        if (end != date + 6)
                                return 0; /* Bad CRAP */
                } else {
                        min = 0;
                }
        } else if (n < 3) {
                return 0; /* we want at least 3 digits */
        } else {
                min = offset % 100;
                hour = offset / 100;
        }

        if (60 <= min)
                return 0; /* invalid minute */

        offset = hour * 60 + min;
        if (*date == '-')
                offset = -offset;
        *offp = offset;
        return end - date;


[Footnote]

*1* Appendix A of RFC3339 seems to agree with you.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] Accept the timezone specifiers [+-]hh:mm and [+-]hh in addition to [+-]hhmm
  2010-05-19 14:31   ` Junio C Hamano
@ 2010-05-19 17:21     ` Marcus Comstedt
  0 siblings, 0 replies; 6+ messages in thread
From: Marcus Comstedt @ 2010-05-19 17:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git


Hi Junio.

Thanks for reviewing this patch.


Junio C Hamano <gitster@pobox.com> writes:

> I don't recall seeing in ISO 8601 that +hh or -hh without minute
> resolution was allowed, but I don't have my copy of ISO 8601 with me (they
> are packed and are still in transit with my household goods) so I'll take
> your word for it for now [*1*].

In the final draft of 8601:2000 (which is the only version I have),
section 5.3.4.1 states that "[...] the representation of the difference
can be expressed in hours and minutes, or hours only."  Examples of
this then follow in that section and the next one.  Maybe they changed
it in the final version (or it differs from another release of the
standard)?  I wish you could "git log -S" ISO standards...  :-)
Wikipedia also agrees that it is allowed by the standard though.


> But the placement of this second hunk is somewhat curious.  Why doesn't the
> updated function look like this?
[...]

I was perhaps treading a bit over-cautiously.  The placement allowed
me to leave the existing code both syntactically and semantically
unaltered.  After all, there was nothing wrong with the old code per
se, I was just adding new functionality.  I also wanted the two
changes independent, in case you wanted one but not the other.

I can concede that your variant leaves a more appealing end result
though.  (Except for the fact that "n == 2" is needlessly tested in
the inner if. ;)

One thing though:  Shouldn't 1 be returned for bad crap rather than 0?
Seems to me parse_date will get stuck otherwise, because the sign will
never be consumed.  In fact, the old code would consume both the sign
and the initial sequence of digits in the crap case.  Consuming just
the sign would leave the digits to be handled by match_digit, which
may or may not regard it as non-crap.  Good or bad, I don't know.  But
it might cause regressions.

I'll play around a little with the code and perform some new unit
tests, and then resubmit a new patch with the suggested structure.


  // Marcus

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2010-05-19 17:22 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-05-17 19:07 PATCH: Improved support for ISO 8601 timezones Marcus Comstedt
2010-05-17 19:07 ` [PATCH 1/2] Added "Z" as an alias for the timezone "UTC" Marcus Comstedt
2010-05-17 20:32   ` Jay Soffian
2010-05-17 19:07 ` [PATCH 2/2] Accept the timezone specifiers [+-]hh:mm and [+-]hh in addition to [+-]hhmm Marcus Comstedt
2010-05-19 14:31   ` Junio C Hamano
2010-05-19 17:21     ` Marcus Comstedt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.