All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] repack: Add options to preserve and prune old pack files
@ 2017-03-07 16:40 James Melvin
  2017-03-07 20:33 ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: James Melvin @ 2017-03-07 16:40 UTC (permalink / raw)
  To: git; +Cc: nasserg, mfick, peff, sbeller, James Melvin

The new --preserve-oldpacks option moves old pack files into the
preserved subdirectory instead of deleting them after repacking.

The new --prune-preserved option prunes old pack files from the
preserved subdirectory after repacking, but before potentially
moving the latest old packfiles to this subdirectory.

These options are designed to prevent stale file handle exceptions
during git operations which can happen on users of NFS repos when
repacking is done on them. The strategy is to preserve old pack files
around until the next repack with the hopes that they will become
unreferenced by then and not cause any exceptions to running processes
when they are finally deleted (pruned).

Signed-off-by: James Melvin <jmelvin@codeaurora.org>
---
 Documentation/git-repack.txt |  9 +++++++++
 builtin/repack.c             | 38 ++++++++++++++++++++++++++++++++++++--
 2 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 26afe6ed5..0b19b761f 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -143,6 +143,15 @@ other objects in that pack they already have locally.
 	being removed. In addition, any unreachable loose objects will
 	be packed (and their loose counterparts removed).
 
+--preserve-oldpacks::
+	 Move old pack files into the preserved subdirectory instead
+	 of deleting them after repacking.
+
+--prune-preserved::
+	 Prune old pack files from the preserved subdirectory after
+	 repacking, but before potentially moving the latest old
+	 packfiles to this subdirectory
+
 Configuration
 -------------
 
diff --git a/builtin/repack.c b/builtin/repack.c
index 677bc7c81..f1a0c97f3 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -10,8 +10,10 @@
 
 static int delta_base_offset = 1;
 static int pack_kept_objects = -1;
+static int preserve_oldpacks = 0;
+static int prune_preserved = 0;
 static int write_bitmaps;
-static char *packdir, *packtmp;
+static char *packdir, *packtmp, *preservedir;
 
 static const char *const git_repack_usage[] = {
 	N_("git repack [<options>]"),
@@ -108,6 +110,27 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list)
 	closedir(dir);
 }
 
+static void preserve_pack(const char *file_path, const char *file_name,  const char *file_ext)
+{
+	char *fname_old;
+
+	if (mkdir(preservedir, 0700) && errno != EEXIST)
+		error(_("failed to create preserve directory"));
+
+	fname_old = mkpathdup("%s/%s.old-%s", preservedir, file_name, ++file_ext);
+	rename(file_path, fname_old);
+
+	free(fname_old);
+}
+
+static void remove_preserved_dir(void) {
+	struct strbuf buf = STRBUF_INIT;
+
+	strbuf_addstr(&buf, preservedir);
+	remove_dir_recursively(&buf, 0);
+	strbuf_release(&buf);
+}
+
 static void remove_redundant_pack(const char *dir_name, const char *base_name)
 {
 	const char *exts[] = {".pack", ".idx", ".keep", ".bitmap"};
@@ -121,7 +144,10 @@ static void remove_redundant_pack(const char *dir_name, const char *base_name)
 	for (i = 0; i < ARRAY_SIZE(exts); i++) {
 		strbuf_setlen(&buf, plen);
 		strbuf_addstr(&buf, exts[i]);
-		unlink(buf.buf);
+		if (preserve_oldpacks)
+			preserve_pack(buf.buf, base_name, exts[i]);
+		else
+			unlink(buf.buf);
 	}
 	strbuf_release(&buf);
 }
@@ -194,6 +220,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				N_("maximum size of each packfile")),
 		OPT_BOOL(0, "pack-kept-objects", &pack_kept_objects,
 				N_("repack objects in packs marked with .keep")),
+		OPT_BOOL(0, "preserve-oldpacks", &preserve_oldpacks,
+				N_("move old pack files into the preserved subdirectory")),
+		OPT_BOOL(0, "prune-preserved", &prune_preserved,
+				N_("prune old pack files from the preserved subdirectory after repacking")),
 		OPT_END()
 	};
 
@@ -217,6 +247,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 
 	packdir = mkpathdup("%s/pack", get_object_directory());
 	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
+	preservedir = mkpathdup("%s/preserved", packdir);
 
 	sigchain_push_common(remove_pack_on_signal);
 
@@ -404,6 +435,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 
 	/* End of pack replacement. */
 
+	if (prune_preserved)
+		remove_preserved_dir();
+
 	if (delete_redundant) {
 		int opts = 0;
 		string_list_sort(&names);
-- 
2.12.0.190.gab6997d48.dirty


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] repack: Add options to preserve and prune old pack files
  2017-03-07 16:40 [PATCH] repack: Add options to preserve and prune old pack files James Melvin
@ 2017-03-07 20:33 ` Junio C Hamano
  2017-03-09 17:50   ` jmelvin
  2017-03-10 21:52   ` jmelvin
  0 siblings, 2 replies; 5+ messages in thread
From: Junio C Hamano @ 2017-03-07 20:33 UTC (permalink / raw)
  To: James Melvin; +Cc: git, nasserg, mfick, peff, sbeller

James Melvin <jmelvin@codeaurora.org> writes:

> These options are designed to prevent stale file handle exceptions
> during git operations which can happen on users of NFS repos when
> repacking is done on them. The strategy is to preserve old pack files
> around until the next repack with the hopes that they will become
> unreferenced by then and not cause any exceptions to running processes
> when they are finally deleted (pruned).

I find it a very sensible strategy to work around NFS, but it does
not explain why the directory the old ones are moved to need to be
configurable.  It feels to me that a boolean that causes the old
ones renamed s/^pack-/^old-&/ in the same directory (instead of
pruning them right away) would risk less chances of mistakes (e.g.
making "preserved" subdirectory on a separate device mounted there
in a hope to reduce disk usage of the primary repository, which
may defeat the whole point of moving the still-active file around
instead of removing them).

And if we make "preserve-old" a boolean, perhaps the presence of
"prune-preserved" would serve as a substitute for it, iow, perhaps
we may only need --prune-preserved option (and repack.prunePreserved
configuration variable)?

> diff --git a/builtin/repack.c b/builtin/repack.c
> index 677bc7c81..f1a0c97f3 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -10,8 +10,10 @@
>  
>  static int delta_base_offset = 1;
>  static int pack_kept_objects = -1;
> +static int preserve_oldpacks = 0;
> +static int prune_preserved = 0;

We avoid initializing statics to 0 or NULL and instead let BSS take
care of them...

>  static int write_bitmaps;
> -static char *packdir, *packtmp;
> +static char *packdir, *packtmp, *preservedir;

... just like what you did here.

> @@ -108,6 +110,27 @@ static void get_non_kept_pack_filenames(struct s
> ...
> +static void preserve_pack(const char *file_path, const char *file_name,  const char *file_ext)
> +{
> +	char *fname_old;
> +
> +	if (mkdir(preservedir, 0700) && errno != EEXIST)
> +		error(_("failed to create preserve directory"));

You do not want to do the rest of this function after issuing this
error, no?  Because ...

> +
> +	fname_old = mkpathdup("%s/%s.old-%s", preservedir, file_name, ++file_ext);
> +	rename(file_path, fname_old);

... this rename(2) would fail, whose error return you would catch
and act on.

> +	free(fname_old);
> +}
> +
> +static void remove_preserved_dir(void) {
> +	struct strbuf buf = STRBUF_INIT;
> +
> +	strbuf_addstr(&buf, preservedir);
> +	remove_dir_recursively(&buf, 0);

This is a wrong helper function to use on files and directories
inside .git/; the function is about removing paths in the working
tree.

> @@ -121,7 +144,10 @@ static void remove_redundant_pack(const char *dir_name, const char *base_name)
>  	for (i = 0; i < ARRAY_SIZE(exts); i++) {
>  		strbuf_setlen(&buf, plen);
>  		strbuf_addstr(&buf, exts[i]);
> -		unlink(buf.buf);
> +		if (preserve_oldpacks)
> +			preserve_pack(buf.buf, base_name, exts[i]);
> +		else
> +			unlink(buf.buf);

OK.

> @@ -194,6 +220,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  				N_("maximum size of each packfile")),
>  		OPT_BOOL(0, "pack-kept-objects", &pack_kept_objects,
>  				N_("repack objects in packs marked with .keep")),
> +		OPT_BOOL(0, "preserve-oldpacks", &preserve_oldpacks,
> +				N_("move old pack files into the preserved subdirectory")),
> +		OPT_BOOL(0, "prune-preserved", &prune_preserved,
> +				N_("prune old pack files from the preserved subdirectory after repacking")),
>  		OPT_END()
>  	};
>  
> @@ -217,6 +247,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  
>  	packdir = mkpathdup("%s/pack", get_object_directory());
>  	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
> +	preservedir = mkpathdup("%s/preserved", packdir);
>  
>  	sigchain_push_common(remove_pack_on_signal);
>  
> @@ -404,6 +435,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  
>  	/* End of pack replacement. */
>  
> +	if (prune_preserved)
> +		remove_preserved_dir();

I am not sure if I understand your design.  Your model looks to me
like there are two modes of operation.  #1 uses "--preserve-old" and
sends old ones to purgatory instead of removing them and #2 uses
"--prune-preserved" to remove all the ones in the purgatory
immediately.  A few things that come to my mind immediately:

 * When "--prune-preseved" is used, it removes both ancient ones and
   more recent ones indiscriminately.  Would it make more sense to
   "expire" only the older ones while keeping the more recent ones?

 * It appears that the main reason you would want --prune-preserved
   in this design is after running with "--preserve-old" number of
   times, you want to remove really old ones that have accumulated,
   and I would imagine that at that point of time, you are only
   interested in repack, but the code structure tells me that this
   will force the users to first run a repack before pruning.

I suspect that a design that is simpler to explain to the users may
be to add a command line option "--preserve-pruned=<expiration>" and
a matching configuration variable repack.preservePruned, which
defaults to "immediate" (i.e. no preserving), and

 - When the value of preserve_pruned is not "immediate", use
   preserve_pack() instead of unlink();

 - At the end, find preserved packs that are older than the value in
   preserve_pruned and unlink() them.

It also may make sense to add another command line option
"--prune-preserved-packs-only" (without matching configuration
variable) that _ONLY_ does the "find older preserved packs and
unlink them" part, without doing any repack.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] repack: Add options to preserve and prune old pack files
  2017-03-07 20:33 ` Junio C Hamano
@ 2017-03-09 17:50   ` jmelvin
  2017-03-09 18:45     ` Martin Fick
  2017-03-10 21:52   ` jmelvin
  1 sibling, 1 reply; 5+ messages in thread
From: jmelvin @ 2017-03-09 17:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, nasserg, mfick, peff, sbeller

On 2017-03-07 13:33, Junio C Hamano wrote:
> James Melvin <jmelvin@codeaurora.org> writes:
> 
>> These options are designed to prevent stale file handle exceptions
>> during git operations which can happen on users of NFS repos when
>> repacking is done on them. The strategy is to preserve old pack files
>> around until the next repack with the hopes that they will become
>> unreferenced by then and not cause any exceptions to running processes
>> when they are finally deleted (pruned).
> 
> I find it a very sensible strategy to work around NFS, but it does
> not explain why the directory the old ones are moved to need to be
> configurable.  It feels to me that a boolean that causes the old
> ones renamed s/^pack-/^old-&/ in the same directory (instead of
> pruning them right away) would risk less chances of mistakes (e.g.
> making "preserved" subdirectory on a separate device mounted there
> in a hope to reduce disk usage of the primary repository, which
> may defeat the whole point of moving the still-active file around
> instead of removing them).

Moving the preserved pack files to a separate directory only helped make 
the pack directory cleaner, but I agree that having the old* pack files 
in the same directory is a better approach as it would ensure that it's 
still on the same mounted device. I'll update the logic to reflect that.

As for the naming convention of the preserved pack files, there is 
already some logic to remove "old-" files in repack. Currently this is 
the naming convention I have for them:

pack-<sha1>.old-<ext>
pack-7412ee739b8a20941aa1c2fd03abcc7336b330ba.old-pack

One advantage of that is the extension is no longer an expected one, 
differentiating it from current pack files.

That said, if that is not a concern, I could prefix them with 
"preserved" instead of "old" to differentiate them from the other logic 
that cleans up "old-*". What are your thoughts on that?

preserved-<sha1>.<ext>
preserved-7412ee739b8a20941aa1c2fd03abcc7336b330ba.pack


> And if we make "preserve-old" a boolean, perhaps the presence of
> "prune-preserved" would serve as a substitute for it, iow, perhaps
> we may only need --prune-preserved option (and repack.prunePreserved
> configuration variable)?
> 
>> diff --git a/builtin/repack.c b/builtin/repack.c
>> index 677bc7c81..f1a0c97f3 100644
>> --- a/builtin/repack.c
>> +++ b/builtin/repack.c
>> @@ -10,8 +10,10 @@
>> 
>>  static int delta_base_offset = 1;
>>  static int pack_kept_objects = -1;
>> +static int preserve_oldpacks = 0;
>> +static int prune_preserved = 0;
> 
> We avoid initializing statics to 0 or NULL and instead let BSS take
> care of them...

Will do

>>  static int write_bitmaps;
>> -static char *packdir, *packtmp;
>> +static char *packdir, *packtmp, *preservedir;
> 
> ... just like what you did here.
> 
>> @@ -108,6 +110,27 @@ static void get_non_kept_pack_filenames(struct s
>> ...
>> +static void preserve_pack(const char *file_path, const char 
>> *file_name,  const char *file_ext)
>> +{
>> +	char *fname_old;
>> +
>> +	if (mkdir(preservedir, 0700) && errno != EEXIST)
>> +		error(_("failed to create preserve directory"));
> 
> You do not want to do the rest of this function after issuing this
> error, no?  Because ...

Agreed, I'll update.

>> +
>> +	fname_old = mkpathdup("%s/%s.old-%s", preservedir, file_name, 
>> ++file_ext);
>> +	rename(file_path, fname_old);
> 
> ... this rename(2) would fail, whose error return you would catch
> and act on.
> 
>> +	free(fname_old);
>> +}
>> +
>> +static void remove_preserved_dir(void) {
>> +	struct strbuf buf = STRBUF_INIT;
>> +
>> +	strbuf_addstr(&buf, preservedir);
>> +	remove_dir_recursively(&buf, 0);
> 
> This is a wrong helper function to use on files and directories
> inside .git/; the function is about removing paths in the working
> tree.

Will update.

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] repack: Add options to preserve and prune old pack files
  2017-03-09 17:50   ` jmelvin
@ 2017-03-09 18:45     ` Martin Fick
  0 siblings, 0 replies; 5+ messages in thread
From: Martin Fick @ 2017-03-09 18:45 UTC (permalink / raw)
  To: jmelvin; +Cc: Junio C Hamano, git, nasserg, peff, sbeller

On Thursday, March 09, 2017 10:50:21 AM 
jmelvin@codeaurora.org wrote:
> On 2017-03-07 13:33, Junio C Hamano wrote:
> > James Melvin <jmelvin@codeaurora.org> writes:
> >> These options are designed to prevent stale file handle
> >> exceptions during git operations which can happen on
> >> users of NFS repos when repacking is done on them. The
> >> strategy is to preserve old pack files around until
> >> the next repack with the hopes that they will become
> >> unreferenced by then and not cause any exceptions to
> >> running processes when they are finally deleted
> >> (pruned).
> > 
> > I find it a very sensible strategy to work around NFS,
> > but it does not explain why the directory the old ones
> > are moved to need to be configurable.  It feels to me
> > that a boolean that causes the old ones renamed
> > s/^pack-/^old-&/ in the same directory (instead of
> > pruning them right away) would risk less chances of
> > mistakes (e.g. making "preserved" subdirectory on a
> > separate device mounted there in a hope to reduce disk
> > usage of the primary repository, which may defeat the
> > whole point of moving the still-active file around
> > instead of removing them).
> 
> Moving the preserved pack files to a separate directory
> only helped make the pack directory cleaner, but I agree
> that having the old* pack files in the same directory is
> a better approach as it would ensure that it's still on
> the same mounted device. I'll update the logic to reflect
> that.
> 
> As for the naming convention of the preserved pack files,
> there is already some logic to remove "old-" files in
> repack. Currently this is the naming convention I have
> for them:
> 
> pack-<sha1>.old-<ext>
> pack-7412ee739b8a20941aa1c2fd03abcc7336b330ba.old-pack
> 
> One advantage of that is the extension is no longer an
> expected one, differentiating it from current pack files.
> 
> That said, if that is not a concern, I could prefix them
> with "preserved" instead of "old" to differentiate them
> from the other logic that cleans up "old-*". What are
> your thoughts on that?
> 
> preserved-<sha1>.<ext>
> preserved-7412ee739b8a20941aa1c2fd03abcc7336b330ba.pack

Some other proposals so that the preserved files do not get 
returned by naive finds based on their extensions,

 preserved-<sha1>.<ext>-preserved
 preserved-7412ee739b8a20941aa1c2fd03abcc7336b330ba.pack-
preserved

or:

 preserved-<sha1>.preserved-<ext>
 preserved-7412ee739b8a20941aa1c2fd03abcc7336b330ba.preserved-
pack

or maybe even just:

 preserved-<ext>-<sha1>
 preserved-pack-7412ee739b8a20941aa1c2fd03abcc7336b330ba


-Martin
-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] repack: Add options to preserve and prune old pack files
  2017-03-07 20:33 ` Junio C Hamano
  2017-03-09 17:50   ` jmelvin
@ 2017-03-10 21:52   ` jmelvin
  1 sibling, 0 replies; 5+ messages in thread
From: jmelvin @ 2017-03-10 21:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, nasserg, mfick, peff, sbeller

On 2017-03-07 13:33, Junio C Hamano wrote:
> James Melvin <jmelvin@codeaurora.org> writes:
> 
...
> 
> I am not sure if I understand your design.  Your model looks to me
> like there are two modes of operation.  #1 uses "--preserve-old" and
> sends old ones to purgatory instead of removing them and #2 uses
> "--prune-preserved" to remove all the ones in the purgatory
> immediately.  A few things that come to my mind immediately:
> 
>  * When "--prune-preseved" is used, it removes both ancient ones and
>    more recent ones indiscriminately.  Would it make more sense to
>    "expire" only the older ones while keeping the more recent ones?
> 
>  * It appears that the main reason you would want --prune-preserved
>    in this design is after running with "--preserve-old" number of
>    times, you want to remove really old ones that have accumulated,
>    and I would imagine that at that point of time, you are only
>    interested in repack, but the code structure tells me that this
>    will force the users to first run a repack before pruning.
> 
> I suspect that a design that is simpler to explain to the users may
> be to add a command line option "--preserve-pruned=<expiration>" and
> a matching configuration variable repack.preservePruned, which
> defaults to "immediate" (i.e. no preserving), and
> 
>  - When the value of preserve_pruned is not "immediate", use
>    preserve_pack() instead of unlink();
> 
>  - At the end, find preserved packs that are older than the value in
>    preserve_pruned and unlink() them.
> 
> It also may make sense to add another command line option
> "--prune-preserved-packs-only" (without matching configuration
> variable) that _ONLY_ does the "find older preserved packs and
> unlink them" part, without doing any repack.

I like the idea of only having a single option to do both the preserving 
and the pruning. It makes the operation more cycle based, which is more 
closely tied to the use case this is intended for. Ie. Run repack while 
preserving old pack files so that any open file handles to those pack 
files will continue to work, while the next time repack is run it is 
highly likely those will no longer be needed, so they can be cleaned up. 
Obviously this depends on how frequent repack is run, but something an 
Administrator can determine.

I'll push a patch that combines that functionality into a single option 
to review.


-James

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-03-10 21:52 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-07 16:40 [PATCH] repack: Add options to preserve and prune old pack files James Melvin
2017-03-07 20:33 ` Junio C Hamano
2017-03-09 17:50   ` jmelvin
2017-03-09 18:45     ` Martin Fick
2017-03-10 21:52   ` jmelvin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.