All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] prune: keep files created after process start
@ 2016-06-19  3:13 Eric Wong
  2016-06-19  3:58 ` Junio C Hamano
  2016-06-19  7:11 ` Andreas Schwab
  0 siblings, 2 replies; 6+ messages in thread
From: Eric Wong @ 2016-06-19  3:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Avoid pruning files which were written after the prune process
starts, as it possible to concurrently create new objects while
"git prune" is running.

Tested on git.git by starting "git prune" in one terminal,
creating a random loose object via "git hash-object --stdin -w"
in a different terminal, and ensuring the loose object remains
after "git prune" completes.

Signed-off-by: Eric Wong <e@80x24.org>
---
 I'm somewhat surprised this check didn't already exist;
 but maybe nobody else runs prune manually, anymore.

 builtin/prune.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/builtin/prune.c b/builtin/prune.c
index 8f4f052..d4cd054 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -14,6 +14,7 @@ static const char * const prune_usage[] = {
 static int show_only;
 static int verbose;
 static unsigned long expire;
+static time_t start;
 static int show_progress = -1;
 
 static int prune_tmp_file(const char *fullpath)
@@ -21,7 +22,7 @@ static int prune_tmp_file(const char *fullpath)
 	struct stat st;
 	if (lstat(fullpath, &st))
 		return error("Could not stat '%s'", fullpath);
-	if (st.st_mtime > expire)
+	if (st.st_mtime > expire || st.st_ctime >= start)
 		return 0;
 	if (show_only || verbose)
 		printf("Removing stale temporary file %s\n", fullpath);
@@ -47,7 +48,7 @@ static int prune_object(const unsigned char *sha1, const char *fullpath,
 		error("Could not stat '%s'", fullpath);
 		return 0;
 	}
-	if (st.st_mtime > expire)
+	if (st.st_mtime > expire || st.st_ctime >= start)
 		return 0;
 	if (show_only || verbose) {
 		enum object_type type = sha1_object_info(sha1, NULL);
@@ -111,6 +112,7 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 	};
 	char *s;
 
+	start = time(NULL);
 	expire = ULONG_MAX;
 	save_commit_buffer = 0;
 	check_replace_refs = 0;

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] prune: keep files created after process start
  2016-06-19  3:13 [PATCH] prune: keep files created after process start Eric Wong
@ 2016-06-19  3:58 ` Junio C Hamano
  2016-06-19  4:32   ` Eric Wong
  2016-06-19  9:34   ` Jeff King
  2016-06-19  7:11 ` Andreas Schwab
  1 sibling, 2 replies; 6+ messages in thread
From: Junio C Hamano @ 2016-06-19  3:58 UTC (permalink / raw)
  To: Eric Wong; +Cc: git

Eric Wong <e@80x24.org> writes:

> Avoid pruning files which were written after the prune process
> starts, as it possible to concurrently create new objects while
> "git prune" is running.
>
> Tested on git.git by starting "git prune" in one terminal,
> creating a random loose object via "git hash-object --stdin -w"
> in a different terminal, and ensuring the loose object remains
> after "git prune" completes.
>
> Signed-off-by: Eric Wong <e@80x24.org>
> ---
>  I'm somewhat surprised this check didn't already exist;
>  but maybe nobody else runs prune manually, anymore.

The only time an end user would run "git prune" in their
repositories with working trees these days is "git repack" followed
by "git prune", I would guess.

The current behaviour is totally expected and that is why you do not
run "prune" without any grace period in an active repository.

Adding the proposed change however would not make anything worse, I
would think.  Those who run "git prune" without grace period accept
that the repository must be quiescent or otherwise the newly created
objects will be lost.  So they are unlikely to be doing anything to
create such objects, hence "things newer than 'start'" are unlikely
to appear, and their expectation that "repack && prune" will remove
all loose objects will not be broken.

You generally cannot compare a timestamp you read from the
filesystem and the timestamp you obtain from time(2) when network
filesystems are involved, so I am not sure the implementation is
quite right, though.

> diff --git a/builtin/prune.c b/builtin/prune.c
> index 8f4f052..d4cd054 100644
> --- a/builtin/prune.c
> +++ b/builtin/prune.c
> @@ -14,6 +14,7 @@ static const char * const prune_usage[] = {
>  static int show_only;
>  static int verbose;
>  static unsigned long expire;
> +static time_t start;
>  static int show_progress = -1;
>  
>  static int prune_tmp_file(const char *fullpath)
> @@ -21,7 +22,7 @@ static int prune_tmp_file(const char *fullpath)
>  	struct stat st;
>  	if (lstat(fullpath, &st))
>  		return error("Could not stat '%s'", fullpath);
> -	if (st.st_mtime > expire)
> +	if (st.st_mtime > expire || st.st_ctime >= start)
>  		return 0;
>  	if (show_only || verbose)
>  		printf("Removing stale temporary file %s\n", fullpath);
> @@ -47,7 +48,7 @@ static int prune_object(const unsigned char *sha1, const char *fullpath,
>  		error("Could not stat '%s'", fullpath);
>  		return 0;
>  	}
> -	if (st.st_mtime > expire)
> +	if (st.st_mtime > expire || st.st_ctime >= start)
>  		return 0;
>  	if (show_only || verbose) {
>  		enum object_type type = sha1_object_info(sha1, NULL);
> @@ -111,6 +112,7 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
>  	};
>  	char *s;
>  
> +	start = time(NULL);
>  	expire = ULONG_MAX;
>  	save_commit_buffer = 0;
>  	check_replace_refs = 0;

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] prune: keep files created after process start
  2016-06-19  3:58 ` Junio C Hamano
@ 2016-06-19  4:32   ` Eric Wong
  2016-06-19  9:34   ` Jeff King
  1 sibling, 0 replies; 6+ messages in thread
From: Eric Wong @ 2016-06-19  4:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano <gitster@pobox.com> wrote:
> Eric Wong <e@80x24.org> writes:
> >  I'm somewhat surprised this check didn't already exist;
> >  but maybe nobody else runs prune manually, anymore.
> 
> The only time an end user would run "git prune" in their
> repositories with working trees these days is "git repack" followed
> by "git prune", I would guess.

Right, I wanted to drop some sensitive data with that.

> You generally cannot compare a timestamp you read from the
> filesystem and the timestamp you obtain from time(2) when network
> filesystems are involved, so I am not sure the implementation is
> quite right, though.

Yes, but I'm not aware of a good way to deal with this;
I would expect machines on the same network would have
synchronized times.

Perhaps having a small slack time (one second?) could mitigate
some problems with machines being slightly off:

       start = time(NULL) - 1;

And then warning if it encountered files within the slack
period and asking the user to rerun "prune" if needed.
But I'm not sure it's worth it.

Any other suggestions?

Thanks.

Having prune prevent object creation entirely while running
seems unacceptable.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] prune: keep files created after process start
  2016-06-19  3:13 [PATCH] prune: keep files created after process start Eric Wong
  2016-06-19  3:58 ` Junio C Hamano
@ 2016-06-19  7:11 ` Andreas Schwab
  2016-06-19  8:48   ` Eric Wong
  1 sibling, 1 reply; 6+ messages in thread
From: Andreas Schwab @ 2016-06-19  7:11 UTC (permalink / raw)
  To: Eric Wong; +Cc: Junio C Hamano, git

Eric Wong <e@80x24.org> writes:

> @@ -21,7 +22,7 @@ static int prune_tmp_file(const char *fullpath)
>  	struct stat st;
>  	if (lstat(fullpath, &st))
>  		return error("Could not stat '%s'", fullpath);
> -	if (st.st_mtime > expire)
> +	if (st.st_mtime > expire || st.st_ctime >= start)

That will also mean objects created (or their inode changed) up to a
second before the start of prune will not be removed.  For example,
objects ejected out of a pack by a previous repack may be affected.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] prune: keep files created after process start
  2016-06-19  7:11 ` Andreas Schwab
@ 2016-06-19  8:48   ` Eric Wong
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2016-06-19  8:48 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Junio C Hamano, git

Andreas Schwab <schwab@linux-m68k.org> wrote:
> Eric Wong <e@80x24.org> writes:
> 
> > @@ -21,7 +22,7 @@ static int prune_tmp_file(const char *fullpath)
> >  	struct stat st;
> >  	if (lstat(fullpath, &st))
> >  		return error("Could not stat '%s'", fullpath);
> > -	if (st.st_mtime > expire)
> > +	if (st.st_mtime > expire || st.st_ctime >= start)
> 
> That will also mean objects created (or their inode changed) up to a
> second before the start of prune will not be removed.  For example,
> objects ejected out of a pack by a previous repack may be affected.

True, but I prefer to err on the side of keeping data rather than
removing it.   But keeping it can also be a liability (as it was
in my case :)  So, perhaps warn users instead:

diff --git a/builtin/prune.c b/builtin/prune.c
index d4cd054..c1642d1 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -16,14 +16,22 @@ static int verbose;
 static unsigned long expire;
 static time_t start;
 static int show_progress = -1;
+static unsigned long ctime_matches;
 
 static int prune_tmp_file(const char *fullpath)
 {
 	struct stat st;
 	if (lstat(fullpath, &st))
 		return error("Could not stat '%s'", fullpath);
-	if (st.st_mtime > expire || st.st_ctime >= start)
+	if (st.st_mtime > expire || st.st_ctime > start)
 		return 0;
+
+	if (st.st_ctime == start) {
+		ctime_matches++;
+		warning("Keeping %s since it changed as we started", fullpath);
+		return 0;
+	}
+
 	if (show_only || verbose)
 		printf("Removing stale temporary file %s\n", fullpath);
 	if (!show_only)
@@ -48,8 +56,13 @@ static int prune_object(const unsigned char *sha1, const char *fullpath,
 		error("Could not stat '%s'", fullpath);
 		return 0;
 	}
-	if (st.st_mtime > expire || st.st_ctime >= start)
+	if (st.st_mtime > expire || st.st_ctime > start)
 		return 0;
+	if (st.st_ctime == start) {
+		ctime_matches++;
+		warning("Keeping %s since it changed as we started", fullpath);
+		return 0;
+	}
 	if (show_only || verbose) {
 		enum object_type type = sha1_object_info(sha1, NULL);
 		printf("%s %s\n", sha1_to_hex(sha1),
@@ -155,5 +168,12 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 	if (is_repository_shallow())
 		prune_shallow(show_only);
 
+	if (ctime_matches) {
+		warning("%lu files kept since they changed as prune started",
+			ctime_matches);
+		warning("rerun prune after %s",
+			show_date(start, 0, DATE_MODE(NORMAL)));
+	}
+
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] prune: keep files created after process start
  2016-06-19  3:58 ` Junio C Hamano
  2016-06-19  4:32   ` Eric Wong
@ 2016-06-19  9:34   ` Jeff King
  1 sibling, 0 replies; 6+ messages in thread
From: Jeff King @ 2016-06-19  9:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Wong, git

On Sat, Jun 18, 2016 at 08:58:03PM -0700, Junio C Hamano wrote:

> Eric Wong <e@80x24.org> writes:
> 
> > Avoid pruning files which were written after the prune process
> > starts, as it possible to concurrently create new objects while
> > "git prune" is running.
> >
> > Tested on git.git by starting "git prune" in one terminal,
> > creating a random loose object via "git hash-object --stdin -w"
> > in a different terminal, and ensuring the loose object remains
> > after "git prune" completes.
> >
> > Signed-off-by: Eric Wong <e@80x24.org>
> > ---
> >  I'm somewhat surprised this check didn't already exist;
> >  but maybe nobody else runs prune manually, anymore.
> 
> The only time an end user would run "git prune" in their
> repositories with working trees these days is "git repack" followed
> by "git prune", I would guess.
> 
> The current behaviour is totally expected and that is why you do not
> run "prune" without any grace period in an active repository.
> 
> Adding the proposed change however would not make anything worse, I
> would think.  Those who run "git prune" without grace period accept
> that the repository must be quiescent or otherwise the newly created
> objects will be lost.  So they are unlikely to be doing anything to
> create such objects, hence "things newer than 'start'" are unlikely
> to appear, and their expectation that "repack && prune" will remove
> all loose objects will not be broken.

I agree it does not making anything worse from a correctness standpoint,
but I am mildly negative in that it adds complexity without any real
safety. Even after this patch, running git prune without a grace period
isn't safe, as you are racing simultaneous processes that started before
prune, or started after prune but before it marks the "start" timestamp.

I know it isn't much code, but it makes things harder to reason about.
I think it also runs contrary to the logic in 3d27b9b (date.c: add
parse_expiry_date(), 2013-04-17)[1].

-Peff

[1] More context in http://mid.gmane.org/7vbo9ceqb3.fsf@alter.siamese.dyndns.org
    and the surrounding thread.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-06-19  9:34 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-19  3:13 [PATCH] prune: keep files created after process start Eric Wong
2016-06-19  3:58 ` Junio C Hamano
2016-06-19  4:32   ` Eric Wong
2016-06-19  9:34   ` Jeff King
2016-06-19  7:11 ` Andreas Schwab
2016-06-19  8:48   ` Eric Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.