All of lore.kernel.org
 help / color / mirror / Atom feed
* Small trivial annoyance with the nice new builtin "git am"
@ 2016-07-28 23:35 Linus Torvalds
  2016-07-28 23:47 ` Junio C Hamano
  2016-07-29  0:21 ` Jeff King
  0 siblings, 2 replies; 26+ messages in thread
From: Linus Torvalds @ 2016-07-28 23:35 UTC (permalink / raw)
  To: Junio C Hamano, Paul Tan; +Cc: Git Mailing List

Ok, it's no longer *that* new, but I only now noticed..

So I noticed that when I applied the last patch-bomb series from
Andrew, all the commit date-stamps are idential.

Now, it would be lovely if the new builtin git-am really was *so* fast
that it applies a 100+-patch series in under a second, but no, that's
not it. It's just that it only looks up the current time once.

That seems entirely accidental, I think that what happened is that
"ident_default_date()" just ends up initializing the default date
string once, and then the date is cached there, because it's now run
as a single process for the whole series.

I think I'd rather get the "real" commit dates, even if they show that
git only does a handful of commits a second rather than hundreds of
commits..

Something that just clears git_default_date in between "git am"
iterations, perhaps?

                  Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Small trivial annoyance with the nice new builtin "git am"
  2016-07-28 23:35 Small trivial annoyance with the nice new builtin "git am" Linus Torvalds
@ 2016-07-28 23:47 ` Junio C Hamano
  2016-07-29  0:29   ` Jeff King
  2016-07-29  0:21 ` Jeff King
  1 sibling, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2016-07-28 23:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Tan, Git Mailing List

On Thu, Jul 28, 2016 at 4:35 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> Ok, it's no longer *that* new, but I only now noticed..
>
> So I noticed that when I applied the last patch-bomb series from
> Andrew, all the commit date-stamps are idential.
> ...
> That seems entirely accidental, I think that what happened is that
> "ident_default_date()" just ends up initializing the default date
> string once, and then the date is cached there, because it's now run
> as a single process for the whole series.

Ouch, sorry, that certainly sounds utterly broken. I have a suspicion
that we would see more and more breakages like this in the near
future, as there are folks trying to redo multi-commit commands in
a single process. We need to be very careful to avoid the same
mistake.

Also makes me wonder if "git cherry-pick A..B" shares the same
breakage.

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Small trivial annoyance with the nice new builtin "git am"
  2016-07-28 23:35 Small trivial annoyance with the nice new builtin "git am" Linus Torvalds
  2016-07-28 23:47 ` Junio C Hamano
@ 2016-07-29  0:21 ` Jeff King
  2016-07-29  0:32   ` Linus Torvalds
  1 sibling, 1 reply; 26+ messages in thread
From: Jeff King @ 2016-07-29  0:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Paul Tan, Git Mailing List

On Thu, Jul 28, 2016 at 04:35:58PM -0700, Linus Torvalds wrote:

> Now, it would be lovely if the new builtin git-am really was *so* fast
> that it applies a 100+-patch series in under a second, but no, that's
> not it. It's just that it only looks up the current time once.
> 
> That seems entirely accidental, I think that what happened is that
> "ident_default_date()" just ends up initializing the default date
> string once, and then the date is cached there, because it's now run
> as a single process for the whole series.
> 
> I think I'd rather get the "real" commit dates, even if they show that
> git only does a handful of commits a second rather than hundreds of
> commits..
> 
> Something that just clears git_default_date in between "git am"
> iterations, perhaps?

Yeah, your analysis makes sense and I think clearing the date between
patches is a reasonable solution.

I do wonder if you would be happier giving each commit a "fake"
monotonically increasing time, so they are correctly ordered by commit
date. I think that runs into some bizarre corner cases, though, like
adding 100 patches in 5 seconds, and ending up with commit timestamps
just slightly in the future (which is fine if you're then quiet, but
skews if you then follow-up in the next 95 seconds with another commit).
Compared to skew, having chunks of 20 commits with identical time stamps
is probably slightly less bad. At least it reflects reality.

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Small trivial annoyance with the nice new builtin "git am"
  2016-07-28 23:47 ` Junio C Hamano
@ 2016-07-29  0:29   ` Jeff King
  2016-07-29  0:37     ` Linus Torvalds
  2016-07-29 17:06     ` Junio C Hamano
  0 siblings, 2 replies; 26+ messages in thread
From: Jeff King @ 2016-07-29  0:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Paul Tan, Git Mailing List

On Thu, Jul 28, 2016 at 04:47:17PM -0700, Junio C Hamano wrote:

> Also makes me wonder if "git cherry-pick A..B" shares the same
> breakage.

Probably.

I guess we want something like:

diff --git a/ident.c b/ident.c
index 139c528..e20a772 100644
--- a/ident.c
+++ b/ident.c
@@ -184,6 +184,11 @@ static const char *ident_default_date(void)
 	return git_default_date.buf;
 }
 
+void reset_ident_date(void)
+{
+	strbuf_reset(&git_default_date);
+}
+
 static int crud(unsigned char c)
 {
 	return  c <= 32  ||

and then to sprinkle calls liberally through builtin-ified programs when
they move from one unit of work to the next.

We could also support resetting the whole ident string, but I don't
think there's any reason to (and unlike the datestamp, it has to do a
bit more looking around to come up with its values).

-Peff

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: Small trivial annoyance with the nice new builtin "git am"
  2016-07-29  0:21 ` Jeff King
@ 2016-07-29  0:32   ` Linus Torvalds
  2016-07-29  8:19     ` Christian Couder
  2016-07-29 17:18     ` Junio C Hamano
  0 siblings, 2 replies; 26+ messages in thread
From: Linus Torvalds @ 2016-07-29  0:32 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Paul Tan, Git Mailing List

On Thu, Jul 28, 2016 at 5:21 PM, Jeff King <peff@peff.net> wrote:
>
> I do wonder if you would be happier giving each commit a "fake"
> monotonically increasing time, so they are correctly ordered by commit
> date.

No, that would be nasty, partly for the corner case you mention, but
partly because I just think it's wrong to try to lie about reality.

The reason I noticed this in the first place was actually that I just
wanted to take a look whether things had gotten slower or faster over
time, and see how many patches per second I get from the patch-bombs
Andrew sends me.

So getting real time was what I was looking for.

Also, before somebody asks: the reason git has always cached the
"default time" string is because there's a reverse annoying thing,
which is looking up time twice, and getting a difference of a second
between author times and committer times just because of bad luck.
That makes no sense either, and is actually why we have that
"ident_default_date()" cache thing going on.

So we do want to cache things for a single commit, it's just that for
things like "git am" (or, like Junio wondered, "git rebase" - I didn't
check) we probabyl just just flush the cache in between commits.

            Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Small trivial annoyance with the nice new builtin "git am"
  2016-07-29  0:29   ` Jeff King
@ 2016-07-29  0:37     ` Linus Torvalds
  2016-07-29 15:50       ` Jeff King
  2016-07-29 18:10       ` Small trivial annoyance with the nice new builtin "git am" Junio C Hamano
  2016-07-29 17:06     ` Junio C Hamano
  1 sibling, 2 replies; 26+ messages in thread
From: Linus Torvalds @ 2016-07-29  0:37 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Paul Tan, Git Mailing List

On Thu, Jul 28, 2016 at 5:29 PM, Jeff King <peff@peff.net> wrote:
>
> I guess we want something like:
>
> +void reset_ident_date(void)
> +{
> +       strbuf_reset(&git_default_date);
> +}

Looks sane.

> and then to sprinkle calls liberally through builtin-ified programs when
> they move from one unit of work to the next.

Maybe we can just add it to the end of commit_tree_extended(), and
just say "the cache is reset between commits".

That way there is no sprinking in random places.

           Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Small trivial annoyance with the nice new builtin "git am"
  2016-07-29  0:32   ` Linus Torvalds
@ 2016-07-29  8:19     ` Christian Couder
  2016-07-29 17:18     ` Junio C Hamano
  1 sibling, 0 replies; 26+ messages in thread
From: Christian Couder @ 2016-07-29  8:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, Junio C Hamano, Paul Tan, Git Mailing List

On Fri, Jul 29, 2016 at 2:32 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Jul 28, 2016 at 5:21 PM, Jeff King <peff@peff.net> wrote:
>>
>> I do wonder if you would be happier giving each commit a "fake"
>> monotonically increasing time, so they are correctly ordered by commit
>> date.
>
> No, that would be nasty, partly for the corner case you mention, but
> partly because I just think it's wrong to try to lie about reality.
>
> The reason I noticed this in the first place was actually that I just
> wanted to take a look whether things had gotten slower or faster over
> time, and see how many patches per second I get from the patch-bombs
> Andrew sends me.
>
> So getting real time was what I was looking for.

Great!

> Also, before somebody asks: the reason git has always cached the
> "default time" string is because there's a reverse annoying thing,
> which is looking up time twice, and getting a difference of a second
> between author times and committer times just because of bad luck.
> That makes no sense either, and is actually why we have that
> "ident_default_date()" cache thing going on.
>
> So we do want to cache things for a single commit, it's just that for
> things like "git am" (or, like Junio wondered, "git rebase" - I didn't
> check) we probabyl just just flush the cache in between commits.

When Junio wrote:

> I have a suspicion that we would see more and more breakages like
> this in the near future, as there are folks trying to redo multi-commit
> commands in a single process.

he was maybe talking about my patch series to libify `git apply` and
use that in `git am` (which is itself used by `git rebase` when not in
interactive mode):

https://public-inbox.org/git/20160627182429.31550-1-chriscool%40tuxfamily.org/

that will likely mean more patches with the same real time anyway
especially with split-index on (see "Performance numbers" in that
email).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Small trivial annoyance with the nice new builtin "git am"
  2016-07-29  0:37     ` Linus Torvalds
@ 2016-07-29 15:50       ` Jeff King
  2016-07-29 17:15         ` Junio C Hamano
  2016-07-29 18:10       ` Small trivial annoyance with the nice new builtin "git am" Junio C Hamano
  1 sibling, 1 reply; 26+ messages in thread
From: Jeff King @ 2016-07-29 15:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Paul Tan, Git Mailing List

On Thu, Jul 28, 2016 at 05:37:08PM -0700, Linus Torvalds wrote:

> > and then to sprinkle calls liberally through builtin-ified programs when
> > they move from one unit of work to the next.
> 
> Maybe we can just add it to the end of commit_tree_extended(), and
> just say "the cache is reset between commits".
> 
> That way there is no sprinking in random places.

Hmm, yeah, that might work. As you mentioned, there are cases where we
really do want the timestamps to match (especially between author and
committer). So we would not want this reset to kick in where callers
would not want it.

So I'm trying to play devil's advocate and think of a case where
somebody would not want the time reset after creating a commit.

One obvious impact would be reflog entries, since we would reset the
time between the object creation and the ref write (so your reflog entry
would sometimes be a second or more after the commit time it writes).
I'm not sure how much anybody _cares_ about that; they're much less
intimate than author/committer times.

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Small trivial annoyance with the nice new builtin "git am"
  2016-07-29  0:29   ` Jeff King
  2016-07-29  0:37     ` Linus Torvalds
@ 2016-07-29 17:06     ` Junio C Hamano
  1 sibling, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2016-07-29 17:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Paul Tan, Git Mailing List

Jeff King <peff@peff.net> writes:

> On Thu, Jul 28, 2016 at 04:47:17PM -0700, Junio C Hamano wrote:
>
>> Also makes me wonder if "git cherry-pick A..B" shares the same
>> breakage.
>
> Probably.

It seems that "cherry-pick A..B" leads to sequencer.c::run_git_commit()
that uses run_command_v_opt() to drive "git commit", so we are safe.

> I guess we want something like:
>
> +void reset_ident_date(void)
> +{
> +	strbuf_reset(&git_default_date);
> +}
> +
>
> and then to sprinkle calls liberally through builtin-ified programs when
> they move from one unit of work to the next.

ident_default_date() is currently the only one that sets this to be
cached, and that is to be used only when there is no user-specified
date.

When I saw the suggestion first time, I was worried if this had
interaction with things like GIT_COMMITTER_DATE environment (because
I didn't have easy access to the source) but it is not the case, so
the change looks very sensible.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Small trivial annoyance with the nice new builtin "git am"
  2016-07-29 15:50       ` Jeff King
@ 2016-07-29 17:15         ` Junio C Hamano
  2016-07-29 18:05           ` [PATCH] reset cached ident date before creating objects Jeff King
  0 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2016-07-29 17:15 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Paul Tan, Git Mailing List

Jeff King <peff@peff.net> writes:

> On Thu, Jul 28, 2016 at 05:37:08PM -0700, Linus Torvalds wrote:
>
>> > and then to sprinkle calls liberally through builtin-ified programs when
>> > they move from one unit of work to the next.
>> 
>> Maybe we can just add it to the end of commit_tree_extended(), and
>> just say "the cache is reset between commits".
>> 
>> That way there is no sprinking in random places.
>
> Hmm, yeah, that might work. As you mentioned, there are cases where we
> really do want the timestamps to match (especially between author and
> committer). So we would not want this reset to kick in where callers
> would not want it.
>
> So I'm trying to play devil's advocate and think of a case where
> somebody would not want the time reset after creating a commit.
>
> One obvious impact would be reflog entries, since we would reset the
> time between the object creation and the ref write (so your reflog entry
> would sometimes be a second or more after the commit time it writes).
> I'm not sure how much anybody _cares_ about that; they're much less
> intimate than author/committer times.

As long as it is understood that a commit object is created and then
a ref is updated to point at it in this order, I do not think there
is any confusion on the party who reads the reflog, I would think.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Small trivial annoyance with the nice new builtin "git am"
  2016-07-29  0:32   ` Linus Torvalds
  2016-07-29  8:19     ` Christian Couder
@ 2016-07-29 17:18     ` Junio C Hamano
  1 sibling, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2016-07-29 17:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, Paul Tan, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> So we do want to cache things for a single commit, it's just that for
> things like "git am" (or, like Junio wondered, "git rebase" - I didn't
> check) we probabyl just just flush the cache in between commits.

What I cautioned was indeed "git rebase", and the codepath that uses
"git am" internally would be fixed when "git am" is fixed, so it is
OK.

What I wondered was actually "git cherry-pick A..B", but I think it
runs "git commit" via the run_command() API as external process, so
it should also be safe.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH] reset cached ident date before creating objects
  2016-07-29 17:15         ` Junio C Hamano
@ 2016-07-29 18:05           ` Jeff King
  2016-07-29 18:12             ` Linus Torvalds
                               ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Jeff King @ 2016-07-29 18:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Paul Tan, Git Mailing List

On Fri, Jul 29, 2016 at 10:15:26AM -0700, Junio C Hamano wrote:

> > One obvious impact would be reflog entries, since we would reset the
> > time between the object creation and the ref write (so your reflog entry
> > would sometimes be a second or more after the commit time it writes).
> > I'm not sure how much anybody _cares_ about that; they're much less
> > intimate than author/committer times.
> 
> As long as it is understood that a commit object is created and then
> a ref is updated to point at it in this order, I do not think there
> is any confusion on the party who reads the reflog, I would think.

Actually, I think we can trivially keep this the same.

Linus suggested resetting the timestamp after making the commit, to
clear the way for the next commit. But if we reset any cached value
_before_ making the commit, this has a few advantages:

  - the cached timestamp remains the same afterwards for anything which
    wants to look at it (like reflog updates, but also potentially
    anything that wants to report the ident it just used).

  - this gives a more accurate timestamp if the distance between caching
    and the actual commit creation is more than a second (and this does
    happen; we call git_committer_info() in many places besides creating
    an actual commit).

So here's a patch. I gave tags the same treatment, though I don't know
if there are any cases that create a series of tags. I grepped through
all the calls to git_committer_info(), and I didn't see any others that
would want to reset the date.

It does feel a little backwards to cache by default, and then try to
catch all the places that want to reset. Another way of thinking about
it would be to almost _never_ cache, but let a few callsites like (the
commit object creation) explicitly ask for a stable timestamp between
the author and committer. That would be a lot more invasive, though. And
it just gives us the opposite problem: finding all sites which care
about stability and annotating them.

(In fact, even this patch may regress some cases that want stability,
though I could not think of any. The test suite does not complain, but
that's not surprising; it has to avoid looking at this kind of thing
entirely, or it would be racy).

-- >8 --
Subject: reset cached ident date before creating objects

When we compute the date to put in author/committer lines of
commits, or tagger lines of tags, we get the current date
once and then cache it for the rest of the program.  This is
a good thing in some cases, like "git commit", because it
means we do not racily assign different times to the
author/committer fields of a single commit object.

But as more programs start to make many commits in a single
process (e.g., the recently builtin "git am"), it means that
you'll get long strings of commits with identical committer
timestamps (whereas before, we invoked "git commit" many
times and got true timestamps).

This patch addresses it by letting callers reset the cached
time, which means they'll get a fresh time on their next
call to git_committer_info() or git_author_info().  We do so
automatically before filling in the ident fields of commit
and tag objects. That retains the property that committers
and authors in a single object will match, but means that
separate objects we create should always get their own
fresh timestamps.

There's no automated test, because it would be inherently
racy (it depends on whether the program takes multiple
seconds to run). But you can see the effect with something
like:

  # make a fake 100-patch series; use --first-parent
  # so that we pretend merges are just more patches
  top=$(git rev-parse HEAD)
  bottom=$(git rev-list --first-parent -100 HEAD | tail -n 1)
  git log --format=email --reverse --first-parent \
          --binary -m -p $bottom..$top >patch

  # now apply it; this presumably takes multiple seconds
  git checkout --detach $bottom
  git am <patch

  # now count the number of distinct committer times;
  # prior to this patch, there would only be one, but
  # now we'd typically see several.
  git log --format=%ct $bottom.. | sort -u

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/tag.c | 1 +
 cache.h       | 1 +
 commit.c      | 1 +
 ident.c       | 5 +++++
 4 files changed, 8 insertions(+)

diff --git a/builtin/tag.c b/builtin/tag.c
index 50e4ae5..3025e7f 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -225,6 +225,7 @@ static void create_tag(const unsigned char *object, const char *tag,
 	if (type <= OBJ_NONE)
 	    die(_("bad object type."));
 
+	reset_ident_date();
 	header_len = snprintf(header_buf, sizeof(header_buf),
 			  "object %s\n"
 			  "type %s\n"
diff --git a/cache.h b/cache.h
index b5f76a4..31e65f9 100644
--- a/cache.h
+++ b/cache.h
@@ -1269,6 +1269,7 @@ extern const char *ident_default_email(void);
 extern const char *git_editor(void);
 extern const char *git_pager(int stdout_is_tty);
 extern int git_ident_config(const char *, const char *, void *);
+extern void reset_ident_date(void);
 
 struct ident_split {
 	const char *name_begin;
diff --git a/commit.c b/commit.c
index 71a360d..7ddbffe 100644
--- a/commit.c
+++ b/commit.c
@@ -1548,6 +1548,7 @@ int commit_tree_extended(const char *msg, size_t msg_len,
 	}
 
 	/* Person/date information */
+	reset_ident_date();
 	if (!author)
 		author = git_author_info(IDENT_STRICT);
 	strbuf_addf(&buffer, "author %s\n", author);
diff --git a/ident.c b/ident.c
index 139c528..e20a772 100644
--- a/ident.c
+++ b/ident.c
@@ -184,6 +184,11 @@ static const char *ident_default_date(void)
 	return git_default_date.buf;
 }
 
+void reset_ident_date(void)
+{
+	strbuf_reset(&git_default_date);
+}
+
 static int crud(unsigned char c)
 {
 	return  c <= 32  ||
-- 
2.9.2.666.g67a7da4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: Small trivial annoyance with the nice new builtin "git am"
  2016-07-29  0:37     ` Linus Torvalds
  2016-07-29 15:50       ` Jeff King
@ 2016-07-29 18:10       ` Junio C Hamano
  1 sibling, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2016-07-29 18:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, Paul Tan, Git Mailing List

On Thu, Jul 28, 2016 at 5:37 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Jul 28, 2016 at 5:29 PM, Jeff King <peff@peff.net> wrote:
>>
>> I guess we want something like:
>>
>> +void reset_ident_date(void)
>> +{
>> +       strbuf_reset(&git_default_date);
>> +}
>
> Looks sane.
>
>> and then to sprinkle calls liberally through builtin-ified programs when
>> they move from one unit of work to the next.
>
> Maybe we can just add it to the end of commit_tree_extended(), and
> just say "the cache is reset between commits".
>
> That way there is no sprinking in random places.

Hmph, wouldn't that equally work well if we cleared at the beginning of the
function, instead of at the end?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] reset cached ident date before creating objects
  2016-07-29 18:05           ` [PATCH] reset cached ident date before creating objects Jeff King
@ 2016-07-29 18:12             ` Linus Torvalds
  2016-07-29 18:15             ` Junio C Hamano
  2016-07-30  2:11             ` Paul Tan
  2 siblings, 0 replies; 26+ messages in thread
From: Linus Torvalds @ 2016-07-29 18:12 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Paul Tan, Git Mailing List

On Fri, Jul 29, 2016 at 11:05 AM, Jeff King <peff@peff.net> wrote:
>
> Linus suggested resetting the timestamp after making the commit, to
> clear the way for the next commit. But if we reset any cached value
> _before_ making the commit, this has a few advantages:

Looks fine to me. It should trivially fix the git-am issue, and I
can't see what it could break. Famous last words.

              Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] reset cached ident date before creating objects
  2016-07-29 18:05           ` [PATCH] reset cached ident date before creating objects Jeff King
  2016-07-29 18:12             ` Linus Torvalds
@ 2016-07-29 18:15             ` Junio C Hamano
  2016-07-29 18:25               ` Jeff King
  2016-07-30  2:11             ` Paul Tan
  2 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2016-07-29 18:15 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Paul Tan, Git Mailing List

Jeff King <peff@peff.net> writes:

> Linus suggested resetting the timestamp after making the commit, to
> clear the way for the next commit. But if we reset any cached value
> _before_ making the commit, this has a few advantages:

I guess our mails crossed ;-).

> It does feel a little backwards to cache by default, and then try to
> catch all the places that want to reset. Another way of thinking about
> it would be to almost _never_ cache, but let a few callsites like (the
> commit object creation) explicitly ask for a stable timestamp between
> the author and committer.

... and the reflog?

I would say that the approach taken by this version is perfectly
sensible, if we don't look at it as a "cache" and instead look at it
as a "snapshot" of the clock for the duration of the operation.
"reset" is like "now we are starting another operation, so grab a
snapshot please".

The changes to both tagging and committing look sensible.  Thanks.

> -- >8 --
> Subject: reset cached ident date before creating objects
>
> When we compute the date to put in author/committer lines of
> commits, or tagger lines of tags, we get the current date
> once and then cache it for the rest of the program.  This is
> a good thing in some cases, like "git commit", because it
> means we do not racily assign different times to the
> author/committer fields of a single commit object.
>
> But as more programs start to make many commits in a single
> process (e.g., the recently builtin "git am"), it means that
> you'll get long strings of commits with identical committer
> timestamps (whereas before, we invoked "git commit" many
> times and got true timestamps).
>
> This patch addresses it by letting callers reset the cached
> time, which means they'll get a fresh time on their next
> call to git_committer_info() or git_author_info().  We do so
> automatically before filling in the ident fields of commit
> and tag objects. That retains the property that committers
> and authors in a single object will match, but means that
> separate objects we create should always get their own
> fresh timestamps.
>
> There's no automated test, because it would be inherently
> racy (it depends on whether the program takes multiple
> seconds to run). But you can see the effect with something
> like:
>
>   # make a fake 100-patch series; use --first-parent
>   # so that we pretend merges are just more patches
>   top=$(git rev-parse HEAD)
>   bottom=$(git rev-list --first-parent -100 HEAD | tail -n 1)
>   git log --format=email --reverse --first-parent \
>           --binary -m -p $bottom..$top >patch
>
>   # now apply it; this presumably takes multiple seconds
>   git checkout --detach $bottom
>   git am <patch
>
>   # now count the number of distinct committer times;
>   # prior to this patch, there would only be one, but
>   # now we'd typically see several.
>   git log --format=%ct $bottom.. | sort -u
>
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/tag.c | 1 +
>  cache.h       | 1 +
>  commit.c      | 1 +
>  ident.c       | 5 +++++
>  4 files changed, 8 insertions(+)
>
> diff --git a/builtin/tag.c b/builtin/tag.c
> index 50e4ae5..3025e7f 100644
> --- a/builtin/tag.c
> +++ b/builtin/tag.c
> @@ -225,6 +225,7 @@ static void create_tag(const unsigned char *object, const char *tag,
>  	if (type <= OBJ_NONE)
>  	    die(_("bad object type."));
>  
> +	reset_ident_date();
>  	header_len = snprintf(header_buf, sizeof(header_buf),
>  			  "object %s\n"
>  			  "type %s\n"
> diff --git a/cache.h b/cache.h
> index b5f76a4..31e65f9 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1269,6 +1269,7 @@ extern const char *ident_default_email(void);
>  extern const char *git_editor(void);
>  extern const char *git_pager(int stdout_is_tty);
>  extern int git_ident_config(const char *, const char *, void *);
> +extern void reset_ident_date(void);
>  
>  struct ident_split {
>  	const char *name_begin;
> diff --git a/commit.c b/commit.c
> index 71a360d..7ddbffe 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -1548,6 +1548,7 @@ int commit_tree_extended(const char *msg, size_t msg_len,
>  	}
>  
>  	/* Person/date information */
> +	reset_ident_date();
>  	if (!author)
>  		author = git_author_info(IDENT_STRICT);
>  	strbuf_addf(&buffer, "author %s\n", author);
> diff --git a/ident.c b/ident.c
> index 139c528..e20a772 100644
> --- a/ident.c
> +++ b/ident.c
> @@ -184,6 +184,11 @@ static const char *ident_default_date(void)
>  	return git_default_date.buf;
>  }
>  
> +void reset_ident_date(void)
> +{
> +	strbuf_reset(&git_default_date);
> +}
> +
>  static int crud(unsigned char c)
>  {
>  	return  c <= 32  ||

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] reset cached ident date before creating objects
  2016-07-29 18:15             ` Junio C Hamano
@ 2016-07-29 18:25               ` Jeff King
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2016-07-29 18:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Paul Tan, Git Mailing List

On Fri, Jul 29, 2016 at 11:15:39AM -0700, Junio C Hamano wrote:

> > It does feel a little backwards to cache by default, and then try to
> > catch all the places that want to reset. Another way of thinking about
> > it would be to almost _never_ cache, but let a few callsites like (the
> > commit object creation) explicitly ask for a stable timestamp between
> > the author and committer.
> 
> ... and the reflog?

I left that part out. I can't decide if it is a bug or a feature that
the reflog retains the same timestamp.

I.e., I'm not sure it would be wrong to do:

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 12290d2..465cfc5 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2886,6 +2886,8 @@ static int log_ref_write_1(const char *refname, const unsigned char *old_sha1,
 	logfd = open(logfile->buf, oflags);
 	if (logfd < 0)
 		return 0;
+
+	reset_ident_date();
 	result = log_ref_write_fd(logfd, old_sha1, new_sha1,
 				  git_committer_info(0), msg);
 	if (result) {

on top. If somebody writes a lot of refs without updating any commit
objects, should those all have a stable timestamp or not?

On the one hand, moving the timestamp reflects reality.

On the other, I have done dirty things in the past like "undoing" the
results of somebody's mistaken:

  git clone ...
  git push --mirror ;# oops, this deletes everything except master!

by grouping ref updates according to their reflog mtimes.

So I kind of punted by not changing it but also not making any claims
either way. :)

-Peff

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] reset cached ident date before creating objects
  2016-07-29 18:05           ` [PATCH] reset cached ident date before creating objects Jeff King
  2016-07-29 18:12             ` Linus Torvalds
  2016-07-29 18:15             ` Junio C Hamano
@ 2016-07-30  2:11             ` Paul Tan
  2016-07-30  2:41               ` Jeff King
  2 siblings, 1 reply; 26+ messages in thread
From: Paul Tan @ 2016-07-30  2:11 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Linus Torvalds, Git Mailing List

Hi Jeff,

On Sat, Jul 30, 2016 at 2:05 AM, Jeff King <peff@peff.net> wrote:
> When we compute the date to put in author/committer lines of
> commits, or tagger lines of tags, we get the current date
> once and then cache it for the rest of the program.  This is
> a good thing in some cases, like "git commit", because it
> means we do not racily assign different times to the
> author/committer fields of a single commit object.

So commits created with "git commit" should have the same author and
committer timestamps...

> diff --git a/commit.c b/commit.c
> index 71a360d..7ddbffe 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -1548,6 +1548,7 @@ int commit_tree_extended(const char *msg, size_t msg_len,
>         }
>
>         /* Person/date information */
> +       reset_ident_date();
>         if (!author)
>                 author = git_author_info(IDENT_STRICT);
>         strbuf_addf(&buffer, "author %s\n", author);

But since builtin/commit.c constructs its author ident string before
calling the editor and then commit_tree_extended(), this would cause
the resulting commits to have committer timestamps which differ from
their author timestamps.

So maybe we would have to put reset_ident_date() at the end of the
function instead, at least after git_committer_info() is called.

Regards,
Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] reset cached ident date before creating objects
  2016-07-30  2:11             ` Paul Tan
@ 2016-07-30  2:41               ` Jeff King
  2016-08-01 17:49                 ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2016-07-30  2:41 UTC (permalink / raw)
  To: Paul Tan; +Cc: Junio C Hamano, Linus Torvalds, Git Mailing List

On Sat, Jul 30, 2016 at 10:11:56AM +0800, Paul Tan wrote:

> > diff --git a/commit.c b/commit.c
> > index 71a360d..7ddbffe 100644
> > --- a/commit.c
> > +++ b/commit.c
> > @@ -1548,6 +1548,7 @@ int commit_tree_extended(const char *msg, size_t msg_len,
> >         }
> >
> >         /* Person/date information */
> > +       reset_ident_date();
> >         if (!author)
> >                 author = git_author_info(IDENT_STRICT);
> >         strbuf_addf(&buffer, "author %s\n", author);
> 
> But since builtin/commit.c constructs its author ident string before
> calling the editor and then commit_tree_extended(), this would cause
> the resulting commits to have committer timestamps which differ from
> their author timestamps.

Hrm, yeah. I assumed it would only pass in the author string for things
like "-c" or "--amend". But it looks like it unconditionally passes in
the author.  And it would be slightly difficult to have it pass NULL,
because it may actually have _part_ of an author (e.g., "--author" will
come up with name and email but not the date), so it has to sometimes
combine those bits with things like ident_default_date() itself.

I guess one option would be to commit_tree_extended() to take the
broken-down author bits and call fmt_ident() itself. That's what all of
the callers are doing (that, or just passing NULL). It would make the
interface a bit clunkier, but I think the end result would be more
flexible.

I suppose that would be tricky for git-commit, because in addition to
passing the result of fmt_ident() to commit_tree_extended(), it wants to
take the pieces and put them in the environment for hooks to see. And if
the data is available only inside commit_tree_extended(), we don't have
it for the hooks.

> So maybe we would have to put reset_ident_date() at the end of the
> function instead, at least after git_committer_info() is called.

Yes, although "reset and end" still feels a bit weird to me.

I'd almost prefer to just have long-running programs insert resets at
strategic points.

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] reset cached ident date before creating objects
  2016-07-30  2:41               ` Jeff King
@ 2016-08-01 17:49                 ` Junio C Hamano
  2016-08-01 17:58                   ` Junio C Hamano
  2016-08-01 18:00                   ` Jeff King
  0 siblings, 2 replies; 26+ messages in thread
From: Junio C Hamano @ 2016-08-01 17:49 UTC (permalink / raw)
  To: Jeff King; +Cc: Paul Tan, Linus Torvalds, Git Mailing List

Jeff King <peff@peff.net> writes:

>> So maybe we would have to put reset_ident_date() at the end of the
>> function instead, at least after git_committer_info() is called.
>
> Yes, although "reset and end" still feels a bit weird to me.
>
> I'd almost prefer to just have long-running programs insert resets at
> strategic points.

Certainly "reset at the end" feels weird but it can be explained as
"for a one-shot thing we use the first time of the default date and
it gives a consistent timestamp; conceptually, things that make
multiple commits are like doing that one-shot thing multiple times
in a row."

When viewed that way, it is not _too_ bad, I would guess.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] reset cached ident date before creating objects
  2016-08-01 17:49                 ` Junio C Hamano
@ 2016-08-01 17:58                   ` Junio C Hamano
  2016-08-01 18:12                     ` Jeff King
  2016-08-01 18:00                   ` Jeff King
  1 sibling, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2016-08-01 17:58 UTC (permalink / raw)
  To: Jeff King; +Cc: Paul Tan, Linus Torvalds, Git Mailing List

Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
>
>>> So maybe we would have to put reset_ident_date() at the end of the
>>> function instead, at least after git_committer_info() is called.
>>
>> Yes, although "reset and end" still feels a bit weird to me.
>>
>> I'd almost prefer to just have long-running programs insert resets at
>> strategic points.
>
> Certainly "reset at the end" feels weird but it can be explained as
> "for a one-shot thing we use the first time of the default date and
> it gives a consistent timestamp; conceptually, things that make
> multiple commits are like doing that one-shot thing multiple times
> in a row."
>
> When viewed that way, it is not _too_ bad, I would guess.

An interdiff to what we queued previously would look like this:

 builtin/tag.c | 11 ++++++++++-
 commit.c      | 15 ++++++++++++---
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/builtin/tag.c b/builtin/tag.c
index 5dccae3..e852ded 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -225,7 +225,6 @@ static void create_tag(const unsigned char *object, const char *tag,
 	if (type <= OBJ_NONE)
 	    die(_("bad object type."));
 
-	reset_ident_date();
 	header_len = snprintf(header_buf, sizeof(header_buf),
 			  "object %s\n"
 			  "type %s\n"
@@ -287,6 +286,16 @@ static void create_tag(const unsigned char *object, const char *tag,
 		unlink_or_warn(path);
 		free(path);
 	}
+
+	/*
+	 * Reset the default timestamp for the next call to create tag/commit
+	 * object, so that they get their own fresh timestamp.
+	 *
+	 * NOTE NOTE NOTE! if you are libifying this function later by
+	 * turning exit/die in the above code to return an error, be
+	 * sure to jump here to make this call happen.
+	 */
+	reset_ident_date();
 }
 
 struct msg_arg {
diff --git a/commit.c b/commit.c
index b02f3c4..db24013 100644
--- a/commit.c
+++ b/commit.c
@@ -1543,7 +1543,6 @@ int commit_tree_extended(const char *msg, size_t msg_len,
 	}
 
 	/* Person/date information */
-	reset_ident_date();
 	if (!author)
 		author = git_author_info(IDENT_STRICT);
 	strbuf_addf(&buffer, "author %s\n", author);
@@ -1564,11 +1563,21 @@ int commit_tree_extended(const char *msg, size_t msg_len,
 	if (encoding_is_utf8 && !verify_utf8(&buffer))
 		fprintf(stderr, commit_utf8_warn);
 
-	if (sign_commit && do_sign_commit(&buffer, sign_commit))
-		return -1;
+	if (sign_commit && do_sign_commit(&buffer, sign_commit)) {
+		result = -1;
+		goto out;
+	}
 
 	result = write_sha1_file(buffer.buf, buffer.len, commit_type, ret);
+
+out:
 	strbuf_release(&buffer);
+
+	/*
+	 * Reset the default timestamp for the next call to create tag/commit
+	 * object, so that they get their own fresh timestamp.
+	 */
+	reset_ident_date();
 	return result;
 }
 

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] reset cached ident date before creating objects
  2016-08-01 17:49                 ` Junio C Hamano
  2016-08-01 17:58                   ` Junio C Hamano
@ 2016-08-01 18:00                   ` Jeff King
  2016-08-01 19:37                     ` Jeff King
  1 sibling, 1 reply; 26+ messages in thread
From: Jeff King @ 2016-08-01 18:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Paul Tan, Linus Torvalds, Git Mailing List

On Mon, Aug 01, 2016 at 10:49:12AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >> So maybe we would have to put reset_ident_date() at the end of the
> >> function instead, at least after git_committer_info() is called.
> >
> > Yes, although "reset and end" still feels a bit weird to me.
> >
> > I'd almost prefer to just have long-running programs insert resets at
> > strategic points.
> 
> Certainly "reset at the end" feels weird but it can be explained as
> "for a one-shot thing we use the first time of the default date and
> it gives a consistent timestamp; conceptually, things that make
> multiple commits are like doing that one-shot thing multiple times
> in a row."
> 
> When viewed that way, it is not _too_ bad, I would guess.

I think what bothers me is that you could use similar logic for "reset
at the beginning". The problem is that we don't know when "the
beginning" is. I thought it was inside commit_tree_extended(), but for
some operations, it's not, as Paul showed.

So when is "the end"? Is it at the end of commit_tree_extended()? I'm
not sure. Already we know it depends on whether you consider the ref
update part of the same operation. Whether that matters is debatable.
Are there other cases (an obvious one would be the human-readable bits
printed by git-commit, but it looks like those are done beforehand)?
Even if we do check every case, though, it seems like a fragile
assumption to be making.

So at this point I think I'd lean towards programs which have multiple
logical commit operations explicitly saying "I am starting a new
operation". It may be that we end up attaching more to that in the long
run than just timestamp resets, too (i.e., what other cached data might
there be that used to happen in separate sub-processes?).

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] reset cached ident date before creating objects
  2016-08-01 17:58                   ` Junio C Hamano
@ 2016-08-01 18:12                     ` Jeff King
  2016-08-01 18:17                       ` Jeff King
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2016-08-01 18:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Paul Tan, Linus Torvalds, Git Mailing List

On Mon, Aug 01, 2016 at 10:58:47AM -0700, Junio C Hamano wrote:

> diff --git a/commit.c b/commit.c
> index b02f3c4..db24013 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -1543,7 +1543,6 @@ int commit_tree_extended(const char *msg, size_t msg_len,
>  	}
>  
>  	/* Person/date information */
> -	reset_ident_date();
>  	if (!author)
>  		author = git_author_info(IDENT_STRICT);
>  	strbuf_addf(&buffer, "author %s\n", author);
> @@ -1564,11 +1563,21 @@ int commit_tree_extended(const char *msg, size_t msg_len,
>  	if (encoding_is_utf8 && !verify_utf8(&buffer))
>  		fprintf(stderr, commit_utf8_warn);
>  
> -	if (sign_commit && do_sign_commit(&buffer, sign_commit))
> -		return -1;
> +	if (sign_commit && do_sign_commit(&buffer, sign_commit)) {
> +		result = -1;
> +		goto out;
> +	}
>  
>  	result = write_sha1_file(buffer.buf, buffer.len, commit_type, ret);
> +
> +out:
>  	strbuf_release(&buffer);
> +
> +	/*
> +	 * Reset the default timestamp for the next call to create tag/commit
> +	 * object, so that they get their own fresh timestamp.
> +	 */
> +	reset_ident_date();
>  	return result;
>  }

Before I switched to "reset at the beginning" in my original patch, I
also noticed this issue, but decided the other way: to only reset after
a successful creation.

I think in most cases it wouldn't matter anyway, because the caller will
generally abort as soon as this returns failure anyway. But I wondered
about something like:

  author = prepare_author_info();
  if (commit_tree_extended(..., author, ...) < 0) {
	/* oops, we failed. Do a thing and try again. */
	possible_fix();
	if (commit_tree_extended(..., author, ...) < 0)
		die("giving up");
  }

In the second call (but only the second call!) the committer and author
can diverge.

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] reset cached ident date before creating objects
  2016-08-01 18:12                     ` Jeff King
@ 2016-08-01 18:17                       ` Jeff King
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2016-08-01 18:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Paul Tan, Linus Torvalds, Git Mailing List

On Mon, Aug 01, 2016 at 02:12:34PM -0400, Jeff King wrote:

> Before I switched to "reset at the beginning" in my original patch, I
> also noticed this issue, but decided the other way: to only reset after
> a successful creation.
> 
> I think in most cases it wouldn't matter anyway, because the caller will
> generally abort as soon as this returns failure anyway. But I wondered
> about something like:
> 
>   author = prepare_author_info();
>   if (commit_tree_extended(..., author, ...) < 0) {
> 	/* oops, we failed. Do a thing and try again. */
> 	possible_fix();
> 	if (commit_tree_extended(..., author, ...) < 0)
> 		die("giving up");
>   }
> 
> In the second call (but only the second call!) the committer and author
> can diverge.

To be clear, I checked all of the callers and nobody actually does this.
Every caller proceeds straight to a die() except the one in
notes_cache_write(), which silently ignores error (which is the correct
thing to do).

This is more "it seems like a fragile pattern to me".

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] reset cached ident date before creating objects
  2016-08-01 18:00                   ` Jeff King
@ 2016-08-01 19:37                     ` Jeff King
  2016-08-01 21:54                       ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2016-08-01 19:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Paul Tan, Linus Torvalds, Git Mailing List

On Mon, Aug 01, 2016 at 02:00:47PM -0400, Jeff King wrote:

> So at this point I think I'd lean towards programs which have multiple
> logical commit operations explicitly saying "I am starting a new
> operation". It may be that we end up attaching more to that in the long
> run than just timestamp resets, too (i.e., what other cached data might
> there be that used to happen in separate sub-processes?).

Here's that patch.

I still just put in a bare "reset_ident_date()" in git-am. This _could_
be wrapped in: begin_logical_commit() or something, but it would just be
a wrapper around resetting the ident_date. So I'm inclined not to worry
about such semantic obfuscation until we actually have a second thing to
reset. :)

I also didn't go looking for other spots which might want similar
treatment. Junio mentioned that the sequencer code still uses an
external commit process, so cherry-pick/revert are OK. git-fast-import
creates a stream of commits, but already deals with this issue in a
separate way. And I couldn't think of other programs that want to make a
string of commits.

-- >8 --
Subject: [PATCH] am: reset cached ident date for each patch

When we compute the date to go in author/committer lines of
commits, or tagger lines of tags, we get the current date
once and then cache it for the rest of the program.  This is
a good thing in some cases, like "git commit", because it
means we do not racily assign different times to the
author/committer fields of a single commit object.

But as more programs start to make many commits in a single
process (e.g., the recently builtin "git am"), it means that
you'll get long strings of commits with identical committer
timestamps (whereas before, we invoked "git commit" many
times and got true timestamps).

This patch addresses it by letting callers reset the cached
time, which means they'll get a fresh time on their next
call to git_committer_info() or git_author_info(). The first
caller to do so is "git am", which resets the time for each
patch it applies.

It would be nice if we could just do this automatically
before filling in the ident fields of commit and tag
objects. Unfortunately, it's hard to know where a particular
logical operation begins and ends.

For instance, if commit_tree_extended() were to call
reset_ident_date() before getting the committer/author
ident, that doesn't quite work; sometimes the author info is
passed in to us as a parameter, and it may or may not have
come from a previous call to ident_default_date(). So in
those cases, we lose the property that the committer and the
author timestamp always match.

You could similarly put a date-reset at the end of
commit_tree_extended(). That actually works in the current
code base, but it's fragile. It makes the assumption that
after commit_tree_extended() finishes, the caller has no
other operations that would logically want to fall into the
same timestamp.

So instead we provide the tool to easily do the reset, and
let the high-level callers use it to annotate their own
logical operations.

There's no automated test, because it would be inherently
racy (it depends on whether the program takes multiple
seconds to run). But you can see the effect with something
like:

  # make a fake 100-patch series
  top=$(git rev-parse HEAD)
  bottom=$(git rev-list --first-parent -100 HEAD | tail -n 1)
  git log --format=email --reverse --first-parent \
          --binary -m -p $bottom..$top >patch

  # now apply it; this presumably takes multiple seconds
  git checkout --detach $bottom
  git am <patch

  # now count the number of distinct committer times;
  # prior to this patch, there would only be one, but
  # now we'd typically see several.
  git log --format=%ct $bottom.. | sort -u

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Helped-by: Paul Tan <pyokagan@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/am.c | 2 ++
 cache.h      | 1 +
 ident.c      | 5 +++++
 3 files changed, 8 insertions(+)

diff --git a/builtin/am.c b/builtin/am.c
index b77bf11..8aa9b5b 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1840,6 +1840,8 @@ static void am_run(struct am_state *state, int resume)
 		const char *mail = am_path(state, msgnum(state));
 		int apply_status;
 
+		reset_ident_date();
+
 		if (!file_exists(mail))
 			goto next;
 
diff --git a/cache.h b/cache.h
index b5f76a4..31e65f9 100644
--- a/cache.h
+++ b/cache.h
@@ -1269,6 +1269,7 @@ extern const char *ident_default_email(void);
 extern const char *git_editor(void);
 extern const char *git_pager(int stdout_is_tty);
 extern int git_ident_config(const char *, const char *, void *);
+extern void reset_ident_date(void);
 
 struct ident_split {
 	const char *name_begin;
diff --git a/ident.c b/ident.c
index 139c528..e20a772 100644
--- a/ident.c
+++ b/ident.c
@@ -184,6 +184,11 @@ static const char *ident_default_date(void)
 	return git_default_date.buf;
 }
 
+void reset_ident_date(void)
+{
+	strbuf_reset(&git_default_date);
+}
+
 static int crud(unsigned char c)
 {
 	return  c <= 32  ||
-- 
2.9.2.670.gf6ce898


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] reset cached ident date before creating objects
  2016-08-01 19:37                     ` Jeff King
@ 2016-08-01 21:54                       ` Junio C Hamano
  2016-08-01 22:00                         ` Jeff King
  0 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2016-08-01 21:54 UTC (permalink / raw)
  To: Jeff King; +Cc: Paul Tan, Linus Torvalds, Git Mailing List

Jeff King <peff@peff.net> writes:

> I also didn't go looking for other spots which might want similar
> treatment. Junio mentioned that the sequencer code still uses an
> external commit process, so cherry-pick/revert are OK. git-fast-import
> creates a stream of commits, but already deals with this issue in a
> separate way. And I couldn't think of other programs that want to make a
> string of commits.

Thanks.  I wonder if do_commit() may be a more appropriate place to
reset this, but the difference only would matter in "am --resolved",
and a call to do_commit() it has will always be the first commit in
the process, so there would not be anything to clear, which would in
turn mean that it would not make any difference.

Will queue.




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] reset cached ident date before creating objects
  2016-08-01 21:54                       ` Junio C Hamano
@ 2016-08-01 22:00                         ` Jeff King
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2016-08-01 22:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Paul Tan, Linus Torvalds, Git Mailing List

On Mon, Aug 01, 2016 at 02:54:47PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I also didn't go looking for other spots which might want similar
> > treatment. Junio mentioned that the sequencer code still uses an
> > external commit process, so cherry-pick/revert are OK. git-fast-import
> > creates a stream of commits, but already deals with this issue in a
> > separate way. And I couldn't think of other programs that want to make a
> > string of commits.
> 
> Thanks.  I wonder if do_commit() may be a more appropriate place to
> reset this, but the difference only would matter in "am --resolved",
> and a call to do_commit() it has will always be the first commit in
> the process, so there would not be anything to clear, which would in
> turn mean that it would not make any difference.

Yeah, you may be right. I didn't actually find that spot. I was focused
on the looping aspect. And seeing how complex the loop was, my thought
was to just put it at the top, so we know it happens once per iteration.
But I think do_commit() is a reasonble spot, too (as long as it comes
before the call to fmt_ident(), of course).

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2016-08-01 22:01 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-28 23:35 Small trivial annoyance with the nice new builtin "git am" Linus Torvalds
2016-07-28 23:47 ` Junio C Hamano
2016-07-29  0:29   ` Jeff King
2016-07-29  0:37     ` Linus Torvalds
2016-07-29 15:50       ` Jeff King
2016-07-29 17:15         ` Junio C Hamano
2016-07-29 18:05           ` [PATCH] reset cached ident date before creating objects Jeff King
2016-07-29 18:12             ` Linus Torvalds
2016-07-29 18:15             ` Junio C Hamano
2016-07-29 18:25               ` Jeff King
2016-07-30  2:11             ` Paul Tan
2016-07-30  2:41               ` Jeff King
2016-08-01 17:49                 ` Junio C Hamano
2016-08-01 17:58                   ` Junio C Hamano
2016-08-01 18:12                     ` Jeff King
2016-08-01 18:17                       ` Jeff King
2016-08-01 18:00                   ` Jeff King
2016-08-01 19:37                     ` Jeff King
2016-08-01 21:54                       ` Junio C Hamano
2016-08-01 22:00                         ` Jeff King
2016-07-29 18:10       ` Small trivial annoyance with the nice new builtin "git am" Junio C Hamano
2016-07-29 17:06     ` Junio C Hamano
2016-07-29  0:21 ` Jeff King
2016-07-29  0:32   ` Linus Torvalds
2016-07-29  8:19     ` Christian Couder
2016-07-29 17:18     ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.