All of lore.kernel.org
 help / color / mirror / Atom feed
From: Quentin Casasnovas <quentin.casasnovas@oracle.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: Refreshing index timestamps without reading content
Date: Mon, 9 Jan 2017 13:17:24 +0100	[thread overview]
Message-ID: <20170109121724.GC7000@chrystal.oracle.com> (raw)
In-Reply-To: <CACsJy8BRfJG6L49VyC+qsrQ9Arz0gCGpMATpK9uLq61Lx6_Jtg@mail.gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 1736 bytes --]

On Mon, Jan 09, 2017 at 07:02:45PM +0700, Duy Nguyen wrote:
> On Thu, Jan 5, 2017 at 6:23 PM, Quentin Casasnovas
> <quentin.casasnovas@oracle.com> wrote:
> 
> > If not, I am willing to implement a --assume-content-unchanged to the git
> > update-index if you guys don't see something fundamentally wrong with this
> > approach.
> 
> If you do that, I think you should go with either of the following options
> 
> - Extend git-update-index --index-info to take stat info as well (or
> maybe make a new option instead). Then you can feed stat info directly
> to git without a use-case-specific "assume-content-unchanged".
> 
> - Add "git update-index --touch" that does what "touch" does. In this
> case, it blindly updates stat info to latest. But like touch, we can
> also specify  mtime from command line if we need to. It's a bit less
> generic than the above option, but easier to use.
> 
> Caveat: The options I'm proposing can be rejected. So maybe wait a bit
> to see how people feel and perhaps send an RFC patch, again to gauge
> the reception.


Hey Duy,

Thanks for your answer.

I've played with this a bit last week and added an extra command, which I
called --refresh-stat, which works like your suggested --index.  It works
very well with my use case and improves the performances very significantly
on some of our use cases.

It is attached to this e-mail to gather comments, as you suggest, and is
really not meant to be reviewed for inclusion yet as it lacks test cases,
documentation changes, etc.  It is just a convenient way to show what I
need and receive comments.

The logic is simple enugh and will just skip calling ie_modified() when
refreshing the index.

Cheers,
Q

[-- Attachment #1.2: 0001-git-update-index-add-refresh-stat-option.patch --]
[-- Type: text/x-diff, Size: 5346 bytes --]

From 461b4f75056fc476db6bbdf8bc3ff6e91a8ad08d Mon Sep 17 00:00:00 2001
From: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Date: Sat, 7 Jan 2017 09:26:29 +0100
Subject: [PATCH] git-update-index: add --refresh-stat option.

git-update-index --refresh, contrary to what the documentation says, not
only will refresh the stat information but will also update the sha1 of the
objects in the working tree if the stat information is found to be
out-of-date.

We add a --refresh-stat option, which like --refresh will update the stat
information, but will assume that any file in the working tree that is
present in the index matches the index - hence prevents unecessarily
reading the files in the working tree.  It is different from
--assume-unchanged or --skip-worktree in that the new stat information is
recorded in the index, hence subsequent git update-index --refresh will not
read the files if their mtime hasn't changed.

This sounds like a very dangerous option to give to the user, since it
could potentially cause changed files to be missed, but can be needed when
we are sure the working tree files have not changed and are matching the
index.

Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
---
 builtin/update-index.c | 10 ++++++++++
 cache.h                |  3 +++
 read-cache.c           |  7 +++++--
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index f3f07e7f1cb2..1215b6a09687 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -777,6 +777,12 @@ static int really_refresh_callback(const struct option *opt,
 	return refresh(opt->value, REFRESH_REALLY);
 }
 
+static int refresh_stat_callback(const struct option *opt,
+				const char *arg, int unset)
+{
+	return refresh(opt->value, REFRESH_STAT_ONLY);
+}
+
 static int chmod_callback(const struct option *opt,
 				const char *arg, int unset)
 {
@@ -934,6 +940,10 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 			N_("refresh stat information"),
 			PARSE_OPT_NOARG | PARSE_OPT_NONEG,
 			refresh_callback},
+		{OPTION_CALLBACK, 0, "refresh-stat", &refresh_args, NULL,
+			N_("refresh only stat information (assume content has not changed)"),
+			PARSE_OPT_NOARG | PARSE_OPT_NONEG,
+			refresh_stat_callback},
 		{OPTION_CALLBACK, 0, "really-refresh", &refresh_args, NULL,
 			N_("like --refresh, but ignore assume-unchanged setting"),
 			PARSE_OPT_NOARG | PARSE_OPT_NONEG,
diff --git a/cache.h b/cache.h
index a50a61a19787..57d0f9fffbe5 100644
--- a/cache.h
+++ b/cache.h
@@ -611,6 +611,8 @@ extern void *read_blob_data_from_index(struct index_state *, const char *, unsig
 #define CE_MATCH_IGNORE_MISSING		0x08
 /* enable stat refresh */
 #define CE_MATCH_REFRESH		0x10
+/* only stat refresh, assume content hasn't changed */
+#define CE_MATCH_REFRESH_STAT_ONLY	0x20
 extern int ie_match_stat(const struct index_state *, const struct cache_entry *, struct stat *, unsigned int);
 extern int ie_modified(const struct index_state *, const struct cache_entry *, struct stat *, unsigned int);
 
@@ -643,6 +645,7 @@ extern void fill_stat_cache_info(struct cache_entry *ce, struct stat *st);
 #define REFRESH_IGNORE_MISSING	0x0008	/* ignore non-existent */
 #define REFRESH_IGNORE_SUBMODULES	0x0010	/* ignore submodules */
 #define REFRESH_IN_PORCELAIN	0x0020	/* user friendly output, not "needs update" */
+#define REFRESH_STAT_ONLY	0x0040	/* do not check content but update stat */
 extern int refresh_index(struct index_state *, unsigned int flags, const struct pathspec *pathspec, char *seen, const char *header_msg);
 extern struct cache_entry *refresh_cache_entry(struct cache_entry *, unsigned int);
 
diff --git a/read-cache.c b/read-cache.c
index db5d91064266..e9334094c6f0 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1101,6 +1101,7 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
 	int ignore_valid = options & CE_MATCH_IGNORE_VALID;
 	int ignore_skip_worktree = options & CE_MATCH_IGNORE_SKIP_WORKTREE;
 	int ignore_missing = options & CE_MATCH_IGNORE_MISSING;
+	int assume_content_unchanged = options & CE_MATCH_REFRESH_STAT_ONLY;
 
 	if (!refresh || ce_uptodate(ce))
 		return ce;
@@ -1161,7 +1162,7 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
 		}
 	}
 
-	if (ie_modified(istate, ce, &st, options)) {
+	if (!assume_content_unchanged && ie_modified(istate, ce, &st, options)) {
 		if (err)
 			*err = EINVAL;
 		return NULL;
@@ -1206,11 +1207,13 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	int quiet = (flags & REFRESH_QUIET) != 0;
 	int not_new = (flags & REFRESH_IGNORE_MISSING) != 0;
 	int ignore_submodules = (flags & REFRESH_IGNORE_SUBMODULES) != 0;
+	int refresh_stat_only = (flags & REFRESH_STAT_ONLY) != 0;
 	int first = 1;
 	int in_porcelain = (flags & REFRESH_IN_PORCELAIN);
 	unsigned int options = (CE_MATCH_REFRESH |
 				(really ? CE_MATCH_IGNORE_VALID : 0) |
-				(not_new ? CE_MATCH_IGNORE_MISSING : 0));
+				(not_new ? CE_MATCH_IGNORE_MISSING : 0) |
+				(refresh_stat_only ? CE_MATCH_REFRESH_STAT_ONLY : 0));
 	const char *modified_fmt;
 	const char *deleted_fmt;
 	const char *typechange_fmt;
-- 
2.11.0


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

  reply	other threads:[~2017-01-09 12:13 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-05 11:23 Refreshing index timestamps without reading content Quentin Casasnovas
2017-01-09 12:02 ` Duy Nguyen
2017-01-09 12:17   ` Quentin Casasnovas [this message]
2017-01-09 12:22     ` Quentin Casasnovas
2017-01-09 15:01   ` Junio C Hamano
2017-01-09 15:55     ` Quentin Casasnovas
2017-01-10 14:17       ` Quentin Casasnovas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170109121724.GC7000@chrystal.oracle.com \
    --to=quentin.casasnovas@oracle.com \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.