All of lore.kernel.org
 help / color / mirror / Atom feed
* approxidate parsing for bad time units
@ 2012-09-06 16:24 Jeffrey Middleton
  2012-09-06 20:36 ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Jeffrey Middleton @ 2012-09-06 16:24 UTC (permalink / raw)
  To: git

In telling someone what date formats git accepts, and how to verify it
understands, I noticed this weirdness:

$ export TEST_DATE_NOW=`date -u +%s --date='September 10'`;
./test-date approxidate now; for i in `seq 1 10`; do ./test-date
approxidate "$i frobbles ago"; done
now -> 2012-09-10 00:00:00 +0000
1 frobbles ago -> 2012-09-02 00:00:00 +0000
2 frobbles ago -> 2012-09-03 00:00:00 +0000
3 frobbles ago -> 2012-09-04 00:00:00 +0000
4 frobbles ago -> 2012-09-05 00:00:00 +0000
5 frobbles ago -> 2012-09-06 00:00:00 +0000
6 frobbles ago -> 2012-09-07 00:00:00 +0000
7 frobbles ago -> 2012-09-08 00:00:00 +0000
8 frobbles ago -> 2012-09-09 00:00:00 +0000
9 frobbles ago -> 2012-09-10 00:00:00 +0000
10 frobbles ago -> 2012-09-11 00:00:00 +0000

Which gets more concerning once you realize the same thing happens no
matter what fake unit of time you use... including things like "yaers"
and "moths". Perhaps approxidate could be a little stricter?

Thanks,
Jeffrey

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: approxidate parsing for bad time units
  2012-09-06 16:24 approxidate parsing for bad time units Jeffrey Middleton
@ 2012-09-06 20:36 ` Junio C Hamano
  2012-09-06 21:01   ` Jeffrey Middleton
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2012-09-06 20:36 UTC (permalink / raw)
  To: Jeffrey Middleton; +Cc: git

Jeffrey Middleton <jefromi@gmail.com> writes:

> In telling someone what date formats git accepts, and how to verify it
> understands, I noticed this weirdness:
>
> $ export TEST_DATE_NOW=`date -u +%s --date='September 10'`;
> ./test-date approxidate now; for i in `seq 1 10`; do ./test-date
> approxidate "$i frobbles ago"; done
> now -> 2012-09-10 00:00:00 +0000
> 1 frobbles ago -> 2012-09-02 00:00:00 +0000
> ...
> 10 frobbles ago -> 2012-09-11 00:00:00 +0000
>
> Which gets more concerning once you realize the same thing happens no
> matter what fake unit of time you use... including things like "yaers"
> and "moths". Perhaps approxidate could be a little stricter?

"Could be stricter", perhaps.

Do we care deeply?  I doubt it, and for a good reason.  The fuzzy
parsing is primarily [*1*] for humans getting interactive results
who are expected to be able to notice when the fuzziness went far
off.

As long as we have ways for scripts and humans to feed its input in
a more strict and unambiguous way [*2*], it does not hurt anybody if
the fuzzy parser ignored crufts that it does not understand.


[Footnotes]

*1* ... and of course some coding fun and easter egg values. Think
of it as our own Eliza or Zork parser ;-).

*2* And of course we do.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: approxidate parsing for bad time units
  2012-09-06 20:36 ` Junio C Hamano
@ 2012-09-06 21:01   ` Jeffrey Middleton
  2012-09-07 13:54     ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Jeffrey Middleton @ 2012-09-06 21:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

I'm generally very happy with the fuzzy parsing. It's a great feature
that is designed to and in general does save users a lot of time and
thought. In this case I don't think it does. The problems are:
(1) It's not ignoring things it can't understand, it's silently
interpreting them in a useless way. I'm pretty sure that "n units ago"
is equivalent to "the same time of day on the last day of the previous
month, plus n days."
(2) Though in some cases it's really obvious, in others it's quite
possible not to notice, e.g. if `git rev-list --since=5.dyas.ago` is
silently the same as `git rev-list --since=4.days.ago`.

So I do think it's worth improving. (Yes, I know, send patches; I'll
think about it.)


On Thu, Sep 6, 2012 at 1:36 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jeffrey Middleton <jefromi@gmail.com> writes:
>
>> In telling someone what date formats git accepts, and how to verify it
>> understands, I noticed this weirdness:
>>
>> $ export TEST_DATE_NOW=`date -u +%s --date='September 10'`;
>> ./test-date approxidate now; for i in `seq 1 10`; do ./test-date
>> approxidate "$i frobbles ago"; done
>> now -> 2012-09-10 00:00:00 +0000
>> 1 frobbles ago -> 2012-09-02 00:00:00 +0000
>> ...
>> 10 frobbles ago -> 2012-09-11 00:00:00 +0000
>>
>> Which gets more concerning once you realize the same thing happens no
>> matter what fake unit of time you use... including things like "yaers"
>> and "moths". Perhaps approxidate could be a little stricter?
>
> "Could be stricter", perhaps.
>
> Do we care deeply?  I doubt it, and for a good reason.  The fuzzy
> parsing is primarily [*1*] for humans getting interactive results
> who are expected to be able to notice when the fuzziness went far
> off.
>
> As long as we have ways for scripts and humans to feed its input in
> a more strict and unambiguous way [*2*], it does not hurt anybody if
> the fuzzy parser ignored crufts that it does not understand.
>
>
> [Footnotes]
>
> *1* ... and of course some coding fun and easter egg values. Think
> of it as our own Eliza or Zork parser ;-).
>
> *2* And of course we do.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: approxidate parsing for bad time units
  2012-09-06 21:01   ` Jeffrey Middleton
@ 2012-09-07 13:54     ` Jeff King
  2012-09-10 21:07       ` Jeffrey Middleton
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2012-09-07 13:54 UTC (permalink / raw)
  To: Jeffrey Middleton; +Cc: Junio C Hamano, git

On Thu, Sep 06, 2012 at 02:01:30PM -0700, Jeffrey Middleton wrote:

> I'm generally very happy with the fuzzy parsing. It's a great feature
> that is designed to and in general does save users a lot of time and
> thought. In this case I don't think it does. The problems are:
> (1) It's not ignoring things it can't understand, it's silently
> interpreting them in a useless way.

Right, but we would then need to come up with a list of things it _does_
understand. So right now I can say "6 June" or "6th of June" or even "6
de June", and it works because we just ignore the cruft in the middle.

So I think you'd need to either whitelist what everybody is typing, or
blacklist some common typos (or convince people to be stricter in what
they type).

> So I do think it's worth improving. (Yes, I know, send patches; I'll
> think about it.)

You read my mind. :)

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: approxidate parsing for bad time units
  2012-09-07 13:54     ` Jeff King
@ 2012-09-10 21:07       ` Jeffrey Middleton
  2012-09-10 21:19         ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Jeffrey Middleton @ 2012-09-10 21:07 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

As you mentioned, parsing "n ... [month]", and even "...n..." (e.g.
"the 3rd") as the nth day of a month is great, but in this case, I
think "n ... ago" is a pretty strong sign that that's not the intended
behavior.

My first thought was just to make it an error if the string ends in
"ago" but the date is parsed as a day of the month. You don't actually
have to come up with any typos to blacklist, just keep the "ago" from
being silently ignored. I suspect "n units ago" is by far the most
common use of the approxidate parsing in the wild, since it's
documented and has been popularized online. So throwing an error just
in that case would save essentially everyone. I hadn't even realized
it worked without "ago" until I looked at the code.

If that doesn't sound like a good plan, then yes, I agree, it'd be
tricky to catch it in the general case without breaking things.
(Levenshtein distance to the target strings instead of exact matching,
I guess, so that it could say "did you mean..." like for misspelled
commands.)

On Fri, Sep 7, 2012 at 6:54 AM, Jeff King <peff@peff.net> wrote:
>
> On Thu, Sep 06, 2012 at 02:01:30PM -0700, Jeffrey Middleton wrote:
>
> > I'm generally very happy with the fuzzy parsing. It's a great feature
> > that is designed to and in general does save users a lot of time and
> > thought. In this case I don't think it does. The problems are:
> > (1) It's not ignoring things it can't understand, it's silently
> > interpreting them in a useless way.
>
> Right, but we would then need to come up with a list of things it _does_
> understand. So right now I can say "6 June" or "6th of June" or even "6
> de June", and it works because we just ignore the cruft in the middle.
>
> So I think you'd need to either whitelist what everybody is typing, or
> blacklist some common typos (or convince people to be stricter in what
> they type).
>
> > So I do think it's worth improving. (Yes, I know, send patches; I'll
> > think about it.)
>
> You read my mind. :)
>
> -Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: approxidate parsing for bad time units
  2012-09-10 21:07       ` Jeffrey Middleton
@ 2012-09-10 21:19         ` Jeff King
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2012-09-10 21:19 UTC (permalink / raw)
  To: Jeffrey Middleton; +Cc: Junio C Hamano, git

On Mon, Sep 10, 2012 at 02:07:02PM -0700, Jeffrey Middleton wrote:

> As you mentioned, parsing "n ... [month]", and even "...n..." (e.g.
> "the 3rd") as the nth day of a month is great, but in this case, I
> think "n ... ago" is a pretty strong sign that that's not the intended
> behavior.

Yeah, agreed. We are really talking about two distinct cases:

  1. An absolute date ("the 3rd of June", "last tuesday") whose exact
     location may need to be inferred from the context of the current
     date.

  2. A relative unit difference from the current time ("7 days ago")

However, I'm not sure that the word "ago" is always present when
choosing the latter. For example, you can say "7 days" and approxidate
will treat it like "7 days ago". Nor is it simply using a unit like
"days". You can even say "7 tuesdays" to go backwards that many Tuesdays
(e.g., the 24th of July from today).

So you can use "ago" as a sign that you are definitely in case (2), but
cannot assume that its absence means you are in case (1).

That means we can catch "3 dasy ago" as nonsensical, but not "3 dasy",
as the latter simply looks like "the 3rd" from approxidate's
perspective. Still, something is better than nothing, and it means if
you are careful to always say "ago", you can catch some errors (of
course, you might typo "ago"... :) ).

> My first thought was just to make it an error if the string ends in
> "ago" but the date is parsed as a day of the month. You don't actually
> have to come up with any typos to blacklist, just keep the "ago" from
> being silently ignored. I suspect "n units ago" is by far the most
> common use of the approxidate parsing in the wild, since it's
> documented and has been popularized online. So throwing an error just
> in that case would save essentially everyone. I hadn't even realized
> it worked without "ago" until I looked at the code.

Yeah, I think that would work, and would provide some safety. And it
shouldn't be too hard to implement.

> If that doesn't sound like a good plan, then yes, I agree, it'd be
> tricky to catch it in the general case without breaking things.
> (Levenshtein distance to the target strings instead of exact matching,
> I guess, so that it could say "did you mean..." like for misspelled
> commands.)

Gross. :)

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-09-10 21:19 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-06 16:24 approxidate parsing for bad time units Jeffrey Middleton
2012-09-06 20:36 ` Junio C Hamano
2012-09-06 21:01   ` Jeffrey Middleton
2012-09-07 13:54     ` Jeff King
2012-09-10 21:07       ` Jeffrey Middleton
2012-09-10 21:19         ` Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.